Thursday, December 5, 2019

Car Data Statistical Analysis

Question: Describe the report on car data statistical analysis.

Answer: Statistical data analysis plays an important role in decision making. Here, we analyse data for different types of cars. We study the price of the cars and other related variables, check whether any variable has a significant effect on the price, and check whether a linear relationship exists between the mileage of a car and its price. We also report descriptive statistics for the given variables.

The data were collected for 73 cars using random sampling. All selected cars are of the same make and model: the make is Chevrolet and the model is Cavalier. From the data set we select four characteristics or variables: price, mileage, trim and type. The data were accessed from different online and offline sources, including websites such as https://www.autotrader.co.uk/ and https://www.carsource.co.uk/. The data are given in the appendix section.

After data collection, the next important step is statistical analysis. For this purpose we used the SPSS statistical software to produce the descriptive statistics, graphs and tests. We use descriptive statistics to summarise the variables in the data set, graphical analysis to examine the relationships between the variables, and inferential statistics (hypothesis testing) to check claims about the car data. After the statistical analysis, we draw conclusions about the car data.
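As a rough illustration of the descriptive-statistics step, the sketch below computes the kind of summary columns a descriptives table usually contains. The report itself used SPSS; the use of Python, the helper name `describe` and the sample values are assumptions made only for illustration and are not the 73 cars in the data set.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Mean": statistics.mean(values),
        # Sample standard deviation (n - 1 in the denominator).
        "Std. Deviation": statistics.stdev(values),
    }


# Hypothetical prices for illustration only; not the data from the report.
sample_prices = [9000.0, 12000.0, 15500.0, 21000.0, 30000.0]

summary = describe(sample_prices)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The same helper could be applied to the mileage column, or to any other numeric variable, once the real data are loaded.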
The scatter diagram for the price and the mileage of the car shows a negative linear association. The correlation coefficient between price and mileage is -0.84, which indicates a strong negative linear relationship between the two variables: as mileage increases, price tends to decrease, and vice versa. The coefficient of determination, or R square, is 0.706, which means that about 70.6% of the variation in price is explained by the mileage of the car.

The average price of the cars is 19431, with a minimum of 8500 and a maximum of 32950. The average mileage is 44992 units, with a minimum of 3558 units and a maximum of 154000 units.

The regression equation for this model is given below:

Price = 24920.172 - 0.122 * mileage

We use this regression model because the correlation between price and mileage is the highest, although the other correlations are also significant.

Some descriptive statistics for the price of the cars are summarised below:

Descriptive Statistics
                     N    Minimum    Maximum     Sum          Mean          Std. Deviation
Price                73   8500.00    32950.00    1418463.00   19431.0000    5112.65422
Valid N (listwise)   73

The average price of the cars is approximately 19431, the minimum observed price is 8500 and the maximum observed price is 32950. The standard deviation is 5112.65. The values for skewness and kurtosis are both negative, so we conclude that the price data are skewed to the negative (left) side of the mean.

The descriptive statistics for the mileage of the cars are summarised below:

Descriptive Statistics
                     N    Minimum    Maximum     Sum          Mean          Std. Deviation
Mileage              73   3558.00    154000.00   3284459.00   44992.5890    35203.72920
Valid N (listwise)   73

The five-number summary for the price of the car is given below:

Five-Number Summary
Minimum          10546.78
First Quartile   12126.9
Median           12944.94
Third Quartile   13725.45
Maximum          15053.93

The five-number summary for the mileage of the car is given below:

Five-Number Summary
Minimum          1160
First Quartile   15794
Median           20043
Third Quartile   25031
Maximum          39946

The residual analysis for the regression model is summarised below:

Residuals Statistics (a)
                       Minimum       Maximum      Mean         Std. Deviation   N
Predicted Value        6131.9106     24486.0898   19431.0000   4294.91472       73
Residual               -7507.92676   8818.44629   .00000       2773.61510       73
Std. Predicted Value   -3.096        1.177        .000         1.000            73
Std. Residual          -2.688        3.157        .000         .993             73
a. Dependent Variable: Price

The residual mean is zero with a standard deviation of about 2773.6, which suggests that the model is balanced.

To check the claim that mileage is the same for different engine sizes, we use a one-way analysis of variance (ANOVA). The hypotheses are:

Null hypothesis: The mean mileage is the same for the different engine sizes.
Alternative hypothesis: The mean mileage is not the same for the different engine sizes.

The p-value for this ANOVA test is 0.482, which is greater than the level of significance (alpha = 0.05), so we do not reject the null hypothesis.

As noted above, the correlation coefficient between price and mileage is -0.84, a strong negative linear relationship. The correlation coefficient between price and engine size is 0.278, and the correlation coefficient between price and gearbox size is 0.161. The correlation coefficient between price and age (year) is 0.736, which is a high positive correlation.
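The fitted regression line can be sanity-checked against the residual statistics table. The sketch below assumes a negative slope (consistent with the negative correlation of -0.84) and assumes that the largest predicted price corresponds to the smallest mileage and the smallest predicted price to the largest mileage. The function name `predict_price` is illustrative, and small differences are expected because the slope is quoted to only three decimal places.

```python
# Coefficients as quoted in the regression equation (slope assumed negative).
INTERCEPT = 24920.172
SLOPE = -0.122  # change in price per unit of mileage


def predict_price(mileage):
    """Predicted price from the simple linear regression on mileage."""
    return INTERCEPT + SLOPE * mileage


# (mileage, predicted value reported in the residual statistics table)
reported = {
    "minimum mileage": (3558.0, 24486.0898),
    "maximum mileage": (154000.0, 6131.9106),
    "mean mileage": (44992.5890, 19431.0),
}

differences = {}
for label, (mileage, table_value) in reported.items():
    fitted = predict_price(mileage)
    differences[label] = fitted - table_value
    print(f"{label}: fitted {fitted:.2f}, reported {table_value:.2f}")
```

Under these assumptions, each fitted value agrees with the reported predicted value to within about one price unit, and the prediction at the mean mileage is close to the mean price of 19431.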
Of the variables considered, mileage and age show strong correlations with price. The correlation matrix is given in the appendix section, along with the detailed data analysis, figures and complete data.

Conclusions: The correlation coefficient between price and mileage is -0.84, which means there is a strong negative linear relationship between the price and the mileage of the car: as mileage increases, price tends to decrease, and vice versa. The value of the coefficient of determination, or R square, is 0.706, which means that about 70.6% of the variation in price is explained by the mileage of the car. The correlation coefficient between price and age is 0.736, which means there is a strong linear relationship between price and age. The two most strongly correlated variables are price and mileage. The one-way ANOVA gives no evidence that the average mileage differs between cars with different engine sizes.
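The link between the correlation coefficient and R square can be checked by hand. In a simple linear regression with one predictor, R square equals the squared correlation, and it also equals the squared ratio of the standard deviation of the predicted values to the standard deviation of the response. The short sketch below applies both routes to the figures reported in the analysis; the variable names are illustrative.

```python
# Reported correlation between price and mileage.
r = -0.84
r_squared = r ** 2

# Standard deviations reported for the predicted values and for price.
sd_predicted = 4294.91472
sd_price = 5112.65422
r_squared_from_sd = (sd_predicted / sd_price) ** 2

print(f"R square from r: {r_squared:.4f}")
print(f"R square from standard deviations: {r_squared_from_sd:.4f}")
```

Both routes round to 0.706, which matches the R square of 0.706 reported in the analysis.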
