Descriptive Analytics
Business report: Visualisation and Statistical Analysis of a Real-World Data Set
Housing Market Case Study
Aim: For this coursework you will act as a junior business analytics consultant working for an estate agency in the hometown of one of the group members (i.e. the town or city where he/she grew up). Your task is to carry out a study of the housing market in that town. The manager of the estate agency wants a general view of the housing market, focusing on house prices and on the type and size of houses for sale.
The Data Set
You will collect your data from a reliable source using the first part of your hometown's postcode, or a UK postcode of your choice (e.g. if your postcode is B17 8GH, use only B17 to collect the data).
Select information for a sample of around 80 to 100 houses, making sure that the sample is representative of the different types of house (flat, terraced, detached, semi-detached) and sizes (i.e. number of bedrooms). The total number of houses in your postcode area (i.e. the population) should be more than 100. If it is not, extend your search to a neighbouring postcode so that the population from which you draw the sample of 80 to 100 houses is large enough. For each house, collect the price and a maximum of four characteristics, such as house type, number of bedrooms, number of bathrooms and distance from the nearest railway station (if you think distance is not a good choice of characteristic, you can use another, but please justify your choice). With the exception of price, these characteristics are only suggestions. You need to think about which features of houses in your postcode area are important and could influence people's purchasing behaviour.
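One way to make the sample representative of the different house types is proportional stratified random sampling: split the population of listings by house type, then draw at random from each type in proportion to its share of the population. The sketch below is a minimal Python illustration, assuming each listing has been recorded as a dictionary with a hypothetical `house_type` field; the population counts are made up. Because each type's allocation is rounded, the total can differ from the requested size by a house or two.

```python
import random
from collections import defaultdict


def stratified_sample(listings, key, sample_size, seed=None):
    """Proportional stratified random sample (sketch).

    listings    -- list of dicts, one per house in the population
    key         -- field that defines the strata, e.g. "house_type"
    sample_size -- total number of houses wanted (e.g. 80 to 100)
    """
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    strata = defaultdict(list)
    for listing in listings:
        strata[listing[key]].append(listing)

    total = len(listings)
    sample = []
    for group in strata.values():
        # Share of the sample proportional to the stratum's share of the
        # population, rounded, with at least one house per stratum.
        n = max(1, round(sample_size * len(group) / total))
        sample.extend(rng.sample(group, min(n, len(group))))
    return sample


# Made-up population of 150 listings, for illustration only.
counts = {"terraced": 60, "semi-detached": 45, "flat": 30, "detached": 15}
population = [
    {"id": f"{house_type}-{i}", "house_type": house_type}
    for house_type, count in counts.items()
    for i in range(count)
]
sample = stratified_sample(population, "house_type", 90, seed=1)
print(len(sample))
```

The same function applies to the car case study below, for example stratifying by fuel type or age band.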
Once you have completed your statistical analysis, write a report that provides the manager of the company with the information he/she needs. Remember, however, that managers are numerate but not statisticians! You therefore need to produce a professionally written report in two parts: the main part and an appendix. The main part (around 5 pages) should contain no statistical terminology, so that managers who are not experts in statistics can understand it. All technical information and analysis must be included in the appendix.
Statistical Analysis
The report should contain, amongst other things, the following information:
1. A description of the problem, the source of your data, its limitations and any problems encountered in using it. Discuss your sampling method and justify why the sample is representative of the population.
2. Produce at least three visualisations of different types, for either the population or the sample, to describe the data. At least one of them must include more than one house characteristic. For each visualisation, give a clear summary of the information it provides and justify the choice of type. Evaluate the visualisations by critiquing their adequacy, and ensure that appropriate principles and guidelines are followed when drawing them. At least one visualisation should reflect the housing market in that location as a whole.
3. A clear summary and table of the descriptive statistics and the information which can be obtained from these statistics.
4. Assuming that house prices are normally distributed, present your manager with the 95% or 99% confidence interval for the average house price of each house type, and explain what these intervals mean.
5. Undertake the necessary analysis to provide your manager with a summary of whether the average price of each type of house in your sample is in line with the average price in the UK (or your region/county).
6. Carry out a correlation analysis (i.e. a correlation matrix) between price (the dependent variable) and all the other house characteristics you think affect the price of a house (the independent variables). Interpret the results. It is usually useful to know which variables are highly correlated, but it is also sometimes useful to draw attention to correlations that are unexpectedly low.
7. Carry out regression analysis and derive the most parsimonious model. Comment on the significance of the effect of the independent variables (e.g. size, house type, number of bedrooms etc.) on the dependent variable (price). Comment on the magnitude of the effect of the independent variables on the dependent variable. Provide the reasoning behind the steps taken to identify the most parsimonious model and the reason for choosing your model (i.e. model selection criterion).
8. Carry out the residual analysis for the final (most parsimonious) model. There is no need to carry out residual analysis for any other model, since it is a waste of time; do not show me any residual analysis for any model other than the final one. Comment on the final model with regard to its adequacy, goodness of fit and suitability. If the model is inadequate, explain what this means and what should be done to address the problem. (Hint: residual analysis refers to checking the five regression assumptions.)
9. Write down the derived statistical model and give an example of its use.
10. Any analysis undertaken should be justified by answering the following questions: Why is this method used? What does this method do? What information is obtained?
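Item 4 above asks for a confidence interval for the average price of each house type. Under the stated normality assumption, the usual interval is the sample mean plus or minus a t critical value (with n - 1 degrees of freedom) multiplied by the standard error s/√n, where s is the sample standard deviation. The following is a minimal Python sketch, assuming SciPy is available for the t critical value; the prices are made up for illustration only.

```python
import statistics

from scipy import stats


def mean_confidence_interval(prices, confidence=0.95):
    """t-based confidence interval for the mean price.

    Assumes the prices are a random sample from a normal distribution,
    as the brief instructs.
    """
    n = len(prices)
    mean = statistics.fmean(prices)
    sd = statistics.stdev(prices)  # sample standard deviation (divides by n - 1)
    se = sd / n ** 0.5  # standard error of the mean
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return mean - t_crit * se, mean + t_crit * se


# Made-up prices (GBP) for illustration only; replace with your own sample.
sample_by_type = {
    "flat": [145_000, 160_000, 152_500, 139_950, 170_000, 158_000],
    "terraced": [210_000, 225_000, 199_995, 240_000, 218_500, 232_000],
}
for house_type, prices in sample_by_type.items():
    low, high = mean_confidence_interval(prices, confidence=0.95)
    print(f"{house_type}: 95% CI {low:,.0f} to {high:,.0f} GBP")
```

A 95% interval can be read as follows: if the sampling were repeated many times, about 95% of the intervals built this way would contain the true average price for that house type. A 99% interval is wider for the same data.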
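For item 5 above, one common approach is a two-sided one-sample t-test for each house type, with the null hypothesis that the sample's average price equals the published UK (or regional) average. The sketch below assumes SciPy; the sample prices and the benchmark figure are hypothetical placeholders, and in the report the benchmark must come from a referenced source.

```python
from scipy import stats


def compare_with_benchmark(prices, benchmark, alpha=0.05):
    """Two-sided one-sample t-test of H0: the mean price equals `benchmark`.

    Returns the t statistic, the p-value and whether H0 is rejected at `alpha`.
    """
    result = stats.ttest_1samp(prices, popmean=benchmark)
    return result.statistic, result.pvalue, result.pvalue < alpha


# Made-up sample prices and a hypothetical benchmark average (GBP).
terraced_prices = [210_000, 225_000, 199_995, 240_000, 218_500, 232_000]
hypothetical_average = 230_000  # replace with the referenced published figure
t_stat, p_value, reject = compare_with_benchmark(terraced_prices, hypothetical_average)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}, differs from benchmark: {reject}")
```

A p-value below the chosen significance level (here 5%) suggests the sample's average differs from the benchmark by more than sampling variation alone would explain; otherwise the prices can be described as in line with it.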
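For item 7 above, the brief leaves the model selection criterion to you. The sketch below uses adjusted R² (which penalises extra predictors) and, because only a handful of characteristics are collected, simply fits every subset of predictors; backward elimination using p-values, or AIC, are other possible criteria. It is a minimal illustration assuming NumPy, with categorical variables already coded as 0/1 indicator variables. It does not compute p-values for individual coefficients; those are normally read from the regression output of your statistical package. The data are synthetic and the names `x1` and `x2` are placeholders.

```python
from itertools import combinations

import numpy as np


def fit_ols(y, X):
    """Ordinary least squares with an intercept; returns (coefficients, adjusted R^2)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)  # one column per predictor
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    centred = y - y.mean()
    r2 = 1 - (residuals @ residuals) / (centred @ centred)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return coef, adj_r2


def best_subset_by_adjusted_r2(y, predictors):
    """Fit every non-empty subset of predictors and keep the highest adjusted R^2.

    Subsets are tried from smallest to largest and only a strictly better
    score replaces the current best, so ties favour the simpler model.
    """
    names = list(predictors)
    best = None
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            X = np.column_stack([predictors[name] for name in subset])
            coef, adj_r2 = fit_ols(y, X)
            if best is None or adj_r2 > best[2]:
                best = (subset, coef, adj_r2)
    return best


# Synthetic data for illustration only: y depends on x1; x2 carries no information.
x1 = np.arange(1, 9, dtype=float)
x2 = np.array([1, -1, 1, -1, -1, 1, -1, 1], dtype=float)
y = 100 + 20 * x1 + np.array([1, -1, -1, 1, 1, -1, -1, 1], dtype=float)

subset, coef, adj_r2 = best_subset_by_adjusted_r2(y, {"x1": x1, "x2": x2})
print(subset, coef.round(2), round(adj_r2, 4))
# Using the selected model to estimate y for a new case with x1 = 5 (item 9):
print(coef[0] + coef[1] * 5)
```

Each coefficient gives the estimated change in price for a one-unit increase in that characteristic, holding the others fixed, which is the "magnitude of the effect" the brief asks you to comment on.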
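Residual analysis for item 8 above is usually done with plots (residuals against fitted values, a histogram and a normal probability plot of the residuals), and the brief leaves the list of the five assumptions to the course material. The following sketch only complements those plots with a few numerical checks that commonly accompany them: the mean of the residuals, the Durbin-Watson statistic for independence, a Shapiro-Wilk test for normality, and a rough constant-variance heuristic (not a formal test). It assumes NumPy and SciPy, and the data are synthetic.

```python
import numpy as np
from scipy import stats


def residual_checks(y, X):
    """Numerical residual diagnostics for an OLS model with an intercept (sketch)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    resid = y - fitted
    _, shapiro_p = stats.shapiro(resid)  # normality of the residuals
    return {
        # Zero mean: close to 0 by construction when an intercept is included.
        "mean_residual": float(resid.mean()),
        # Independence: Durbin-Watson near 2 suggests no first-order
        # autocorrelation (only meaningful if the rows have a natural order).
        "durbin_watson": float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)),
        "shapiro_p": float(shapiro_p),
        # Constant variance (rough heuristic, not a formal test): correlation
        # between fitted values and the size of the residuals.
        "abs_resid_vs_fitted_r": float(np.corrcoef(fitted, np.abs(resid))[0, 1]),
    }


# Synthetic data for illustration only.
x1 = np.arange(1, 9, dtype=float)
y = 100 + 20 * x1 + np.array([1, -2, 0, 1, 1, 0, -2, 1], dtype=float)
for name, value in residual_checks(y, x1).items():
    print(f"{name}: {value:.3f}")
```

A small Shapiro-Wilk p-value, a Durbin-Watson value far from 2, or a clear trend in residual size against fitted values would each point to an assumption that needs attention, for example by transforming price or adding a missing variable.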
Writing the Report
a) The Course Manager is numerate but has no knowledge of statistics so any statistical terminology must be explained within the report.
b) The Course Manager has no knowledge of statistical packages. You should mention which package has been used for the analysis (Excel/SPSS/R) but no more; references to SPSS/Excel/R procedures will be lost on him/her. Explanations should use the real variable names, not column numbers, which have no meaning outside the specific spreadsheet.
c) The main body of the report should contain only brief explanations of technical terms, e.g. ‘the standard deviation is a measure of the spread or variability of the data about the sample mean’.
d) Do not use technical terms that have not been explained beforehand. A discussion of a regression line is meaningless if the reader has no idea what regression is or does.
e) Number the pages of the report, use sections and make sure that you include a table of contents.
f) Make sure that you pay attention to the following rules for professional report writing.
Coursework - Second-Hand Car Market Case Study
Aim: Each student has to carry out a study of the second-hand car market in the UK for a specific model and manufacturer, on behalf of a market research company that is acting for Car4all. The market research company wishes to determine which variables are most important in determining the price of a second-hand car. The company is also interested in a statistical model it can use to estimate the market value of a car knowing only its mileage, engine size and other optional features (e.g. body type such as 3/5-door or hatchback, central locking, whether the mileage is verified, full service history, number of previous owners, etc.).
The data set
You can collect your data on ONE MODEL of ONE specific MAKE of car from any reliable source, such as specialised websites or similar sources in your home country.
You need to derive a population of approximately 300 cars and from it select between 80 and 100 cars of the same make and model, e.g. FORD FOCUS or TOYOTA AURIS.
For each car, collect the price and a maximum of four characteristics, such as mileage, engine size, colour and other optional features (e.g. body type such as 3/5-door or hatchback, central locking, whether the mileage is verified, full service history, number of previous owners, etc.).
It is also advisable to check that the age range of the cars does not exceed 5 years.
Once you have completed your statistical analysis, write a report that provides the manager of the company with the information he/she needs. Remember, however, that managers are numerate but not statisticians! You therefore need to produce a professionally written report in two parts: the main part and an appendix. The main part (around 5 pages) should contain no statistical terminology, so that managers who are not experts in statistics can understand it. All technical information and analysis must be included in the appendix.
Statistical analysis
The report should contain, amongst other things, the following information:
1. A description of the problem, the source of your data, its limitations and any problems encountered in using it. Discuss your sampling method and justify why the sample is representative of the population. Explain why the 5-year age range was recommended.
2. Produce at least three visualisations of different types, for either the population or the sample, to describe the data. You may include cars of other models, not necessarily related to the model used in the subsequent statistical analysis. At least one of them must include more than one car characteristic. For each visualisation, give a clear summary of the information it provides and justify the choice of type. Evaluate the visualisations by critiquing their adequacy, and ensure that appropriate principles and guidelines are followed when drawing them. At least one visualisation should reflect the second-hand car market in that location as a whole, not just the specific make and model on which the subsequent statistical analysis will be based.
3. A clear summary table of the descriptive statistics and the information they provide. When carrying out the descriptive analysis, be aware of the different types of data. You should also analyse continuous variables when they are divided according to categorical variables; for example, carry out the descriptive analysis of price broken down by fuel type. Comment on the results. Make sure that your summary table provides statistical measures that are useful and meaningful for the type of data considered.
4. Assuming that second-hand car prices are normally distributed, present your manager with the 95% or 99% confidence interval for the average second-hand car price, and explain what it means.
5. Undertake the necessary analysis to provide your manager with a summary of whether the average price of a second-hand car of your choice in your sample is in line with the average price in the UK (or your region/county), for either the same model or one in a similar category of cars. Make sure that you reference the source from which you obtained the average price. Carry out any further hypothesis tests that you find appropriate. Interpret the results.
6. Carry out a correlation analysis (i.e. a correlation matrix) between price (the dependent variable) and all the other car characteristics you think affect the price of a car (the independent variables). Interpret the results. It is usually useful to know which variables are highly correlated, but it is also sometimes useful to draw attention to correlations that are unexpectedly low.
7. Carry out regression analysis and derive the most parsimonious model. Comment on both the significance and the magnitude of the effect of the independent variables (e.g. mileage, fuel type, gearbox type etc.) on the dependent variable (price). Provide the reasoning behind the steps taken to identify the most parsimonious model and the reason for choosing your model (i.e. the model selection criterion).
8. Carry out the residual analysis for the final (most parsimonious) model. There is no need to carry out residual analysis for any other model, since it is a waste of time; do not show me any residual analysis for any model other than the final one. Comment on the final model with regard to its adequacy, goodness of fit and suitability. If the model is inadequate, explain what this means and what should be done to address the problem. (Hint: residual analysis refers to checking the five regression assumptions.)
9. Write down the derived statistical model and give an example of its use.
10. Any analysis undertaken should be justified by answering the following questions: Why is this method used? What does this method do? What information is obtained?
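Item 3 above asks for descriptive statistics of a continuous variable broken down by a categorical one, using measures that suit each type of data. For a categorical variable such as fuel type, counts and shares are meaningful; for a continuous variable such as price, the mean, median, standard deviation and range are. The sketch below uses only the Python standard library; the field names `price` and `fuel_type` and the records are hypothetical.

```python
import statistics
from collections import defaultdict


def describe_by_group(records, value_key, group_key):
    """Summary statistics of a continuous variable within each category.

    records   -- list of dicts, one per car
    value_key -- continuous variable, e.g. "price"
    group_key -- categorical variable, e.g. "fuel_type"
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])

    total = len(records)
    summary = {}
    for category, values in groups.items():
        summary[category] = {
            "count": len(values),
            "share": len(values) / total,  # proportion of cars in this category
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
            # Sample standard deviation needs at least two values.
            "sd": statistics.stdev(values) if len(values) > 1 else float("nan"),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Made-up records for illustration only.
cars = [
    {"price": 10_000, "fuel_type": "petrol"},
    {"price": 12_000, "fuel_type": "petrol"},
    {"price": 14_000, "fuel_type": "petrol"},
    {"price": 9_000, "fuel_type": "diesel"},
    {"price": 11_000, "fuel_type": "diesel"},
]
for fuel, row in describe_by_group(cars, "price", "fuel_type").items():
    print(fuel, row)
```

A large gap between the mean and the median within a group suggests skewed prices, in which case the median is usually the better summary for the manager.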
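Item 5 above invites further hypothesis tests. One natural example is whether average prices differ between two categories, such as petrol and diesel cars. The sketch below uses Welch's two-sample t-test, which does not assume the two groups have equal variances; it assumes SciPy, and the prices are made up for illustration only.

```python
from scipy import stats


def compare_two_groups(prices_a, prices_b, alpha=0.05):
    """Welch's two-sample t-test of H0: both groups have the same mean price.

    Welch's version does not assume the two groups have equal variances.
    """
    result = stats.ttest_ind(prices_a, prices_b, equal_var=False)
    return result.statistic, result.pvalue, result.pvalue < alpha


# Made-up prices (GBP) for illustration only.
petrol = [10_500, 11_200, 9_800, 12_400, 10_900, 11_700]
diesel = [9_200, 10_100, 8_900, 9_900, 10_400, 9_500]
t_stat, p_value, reject = compare_two_groups(petrol, diesel)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, means differ: {reject}")
```

As with the other tests, a p-value below the chosen significance level indicates a difference in average price that is unlikely to be due to sampling variation alone.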