Statistical Data Analysis – Statistics Assignment Sample

Assignment on Data Analysis





Task 1

1. Descriptive statistics

The descriptive statistics (mean, median, mode, standard deviation, variance and range) for the start-up costs of the five businesses in the given data are shown below:

Table 1: Descriptive Statistics

Measure             Pizza (X1)  Baker (X2)  Shoe Store (X3)  Gift Shop (X4)  Pet Store (X5)
Mean                83.00       92.09       72.30            87.00           51.63
Median              80          87          70               97.5            49
Mode                35          Nil         Nil              100             30
Standard Deviation  34.13       38.89       31.37            35.90           27.07
Sample Variance     1165.17     1512.69     983.79           1289.11         733.05
Range               105         120         90               115             90
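The measures in Table 1 could be reproduced with Python's standard library along the following lines. This is only a sketch: the raw start-up costs are not reproduced in this report, so `sample_costs` is a made-up list and `describe` is a hypothetical helper name. The mode is returned as `None` when no value repeats, which corresponds to the "Nil" entries in the table.

```python
import statistics
from collections import Counter

def describe(values):
    """Return the descriptive measures used in Table 1 for one sample."""
    value, count = Counter(values).most_common(1)[0]
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # No mode when every value occurs only once.
        "mode": value if count > 1 else None,
        # Sample (n - 1) versions, matching "Sample Variance" in Table 1.
        "std_dev": statistics.stdev(values),
        "variance": statistics.variance(values),
        "range": max(values) - min(values),
    }

# Hypothetical costs, used only to illustrate the calculation.
sample_costs = [30, 45, 45, 60, 75, 90]
summary = describe(sample_costs)
for name, result in summary.items():
    print(f"{name}: {result}")
```

The standard deviation is the square root of the sample variance, which is also why the two rows of Table 1 agree (for example, 34.13 squared is about 1165).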

2. Frequency distribution and histogram

(a) Frequency and relative frequency distribution of various businesses

(i) Frequency distribution of business X1 (Pizza)

Bin    Class Interval  Frequency  Relative Frequency
30     0-30            0          0.0
60     30-60           4          0.333
90     60-90           3          0.250
120    90-120          3          0.250
150    120-150         2          0.167
Total                  12         1.0

(ii) Frequency distribution of business X2 (Baker)

Bin    Class Interval  Frequency  Relative Frequency
30     0-30            0          0.0
60     30-60           3          0.27
90     60-90           4          0.36
120    90-120          2          0.18
150    120-150         1          0.09
180    150-180         1          0.09
Total                  11         1

(iii) Frequency distribution of business X3 (Shoe store)

Bin    Class Interval  Frequency  Relative Frequency
30     0-30            0          0
60     30-60           4          0.4
90     60-90           3          0.3
120    90-120          2          0.2
150    120-150         1          0.1
Total                  10         1

(iv) Frequency distribution of business X4 (Gift shop)

Bin    Class Interval  Frequency  Relative Frequency
30     0-30            0          0
60     30-60           3          0.3
90     60-90           1          0.1
120    90-120          5          0.5
150    120-150         1          0.1
Total                  10         1

(v) Frequency distribution of business X5 (Pet store)

Bin    Class Interval  Frequency  Relative Frequency
30     0-30            6          0.38
60     30-60           5          0.31
90     60-90           4          0.25
120    90-120          1          0.06
Total                  16         1.00
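The two steps behind these tables (counting costs per class, then dividing by the total) can be sketched in Python as follows. The function names are hypothetical, and the rule that a cost equal to a bin value falls in that bin is an assumption about how the bins were applied. The pizza frequencies are copied from table (i); the raw costs passed to `bin_counts` are invented purely for illustration.

```python
from bisect import bisect_left

def bin_counts(values, upper_bounds):
    """Count values per class, assuming each class includes its upper bound."""
    counts = [0] * len(upper_bounds)
    for value in values:
        index = bisect_left(upper_bounds, value)
        if index == len(upper_bounds):
            raise ValueError(f"{value} exceeds the last bin")
        counts[index] += 1
    return counts

def relative_frequencies(frequencies):
    """Divide each class frequency by the total frequency."""
    total = sum(frequencies)
    return [count / total for count in frequencies]

BINS = [30, 60, 90, 120, 150]

# Frequencies copied from the pizza (X1) table.
pizza_rel = relative_frequencies([0, 4, 3, 3, 2])
print([round(r, 3) for r in pizza_rel])

# Hypothetical raw costs, only to show the binning step.
example_counts = bin_counts([35, 35, 60, 61, 90, 140], BINS)
print(example_counts)
```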

(b) Relative frequency histogram

(i) Relative frequency histogram of business X1 (Pizza)

(ii) Relative frequency histogram of business X2 (Baker)

(iii) Relative frequency histogram of business X3 (Shoe store)

(iv) Relative frequency histogram of business X4 (Gift shop)

(v) Relative frequency histogram of business X5 (Pet store)
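As a rough stand-in for a chart, a relative frequency histogram can be drawn as text bars. The sketch below uses the pet store (X5) relative frequencies from the table in part (a); `text_histogram` and its `width` parameter are hypothetical names chosen for this illustration.

```python
def text_histogram(class_labels, rel_freqs, width=40):
    """Render one bar of '#' characters per class, scaled to `width`."""
    lines = []
    for label, rel_freq in zip(class_labels, rel_freqs):
        bar = "#" * round(rel_freq * width)
        lines.append(f"{label:>8} | {bar} {rel_freq:.2f}")
    return "\n".join(lines)

# Class intervals and relative frequencies for the pet store (X5).
pet_labels = ["0-30", "30-60", "60-90", "90-120"]
pet_rel = [0.38, 0.31, 0.25, 0.06]

chart = text_histogram(pet_labels, pet_rel)
print(chart)
```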

3. Discussion on results of descriptive statistics and histograms

Descriptive statistics

  • The mean represents the average start-up cost of each business. It is lowest for the pet store (51.63) and highest for the baker business (92.09).
  • The median is the middle value of the start-up costs. It is lowest for the pet store (49) and highest for the gift shop (97.5).
  • The mode is the cost that occurs most often. Among the businesses that have a mode, the pet store has the lowest (30) and the gift shop the highest (100). The baker and shoe-store data have no mode.
  • The standard deviation shows how far costs typically vary from the average cost, which can be read as a measure of risk. The pet store has the lowest standard deviation (27.07), which implies the lowest level of risk. The baker business has the highest standard deviation (38.89), which implies the highest level of risk.
  • Variance also represents risk, as it is the square of the standard deviation. The business with the highest variance therefore has the highest risk level.
  • Range is the difference between the largest and smallest values. The pet store and the shoe store share the lowest range (90), and the baker business has the highest range (120).

Overall, the analysis shows that the pet store requires the lowest investment and also carries the lowest risk.

Frequency and relative frequency distribution and histograms

The frequency distribution shows the number of times the costs lie within a class interval. The relative frequency distribution shows the frequency in relative terms with respect to total frequency.

(i) For the pizza business, no costs fall in the 0-30 class. The most costs fall in the 30-60 class and the fewest in the 120-150 class.

(ii) For the baker business, no costs fall in the 0-30 class. The most costs fall in the 60-90 class and the fewest in the 120-150 and 150-180 classes.

(iii) For the shoe-store business, no costs fall in the 0-30 class. The most costs fall in the 30-60 class and the fewest in the 120-150 class.

(iv) For the gift shop business, no costs fall in the 0-30 class. The most costs fall in the 90-120 class and the fewest in the 60-90 and 120-150 classes.

(v) For the pet store business, the most costs fall in the 0-30 class and the fewest in the 90-120 class.

4. Test of significance (ANOVA)

The significance of the difference in start-up costs across the five businesses has been tested with one-way ANOVA.

(a) Hypothesis

H0: µ1 = µ2 = µ3 = µ4 = µ5

There is no significant difference in the mean start-up cost of the five businesses.

H1: At least one of µ1, µ2, µ3, µ4, µ5 differs from the others

There is a significant difference in the mean start-up cost of the five businesses.

Where

µ1 shows the average start-up cost of pizza business

µ2 shows the average start-up cost of baker business

µ3 shows the average start-up cost of shoe store business

µ4 shows the average start-up cost of gift shop business

µ5 shows the average start-up cost of pet shop business

The Excel output is given below:

SUMMARY

Groups  Count  Sum   Average   Variance
X1      13     1079  83        1165.167
X2      11     1013  92.09091  1512.691
X3      10     723   72.3      983.7889
X4      10     870   87        1289.111
X5      16     826   51.625    733.05

ANOVA

Source of Variation  SS           df  MS        F         P-value   F critical
Between Groups       14298.22424  4   3574.556  3.246336  0.018391  2.539689
Within Groups        60560.75909  55  1101.105
Total                74858.98333  59

The null hypothesis is rejected because the p-value (0.018) is less than 0.05. Equivalently, the F statistic (3.25) exceeds the critical value (2.54). It means there is a significant difference in the mean start-up cost of the five businesses.
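The ANOVA table can be rebuilt from the SUMMARY block alone, since the between-group and within-group sums of squares depend only on each group's count, sum and variance. The sketch below copies those figures from the Excel output and compares the resulting F statistic with the reported critical value. The variable names are chosen for this illustration, and the p-value is not recomputed here because that would need an F-distribution routine from outside the standard library.

```python
# (count, sum, sample variance) per group, copied from the SUMMARY table.
groups = {
    "X1": (13, 1079, 1165.167),
    "X2": (11, 1013, 1512.691),
    "X3": (10, 723, 983.7889),
    "X4": (10, 870, 1289.111),
    "X5": (16, 826, 733.05),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(total for _, total, _ in groups.values()) / n_total

# Between groups: spread of group means around the grand mean.
ss_between = sum(n * (total / n - grand_mean) ** 2 for n, total, _ in groups.values())
# Within groups: pooled spread inside each group.
ss_within = sum((n - 1) * variance for n, _, variance in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

F_CRITICAL = 2.539689  # copied from the Excel output
reject_h0 = f_stat > F_CRITICAL
print(f"F = {f_stat:.4f}, reject H0: {reject_h0}")
```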

Task 2

For All Greens Pty Ltd., six variables are given, of which sales is the dependent variable and the remaining five are independent variables.

(i) Excel output and estimated regression equation:

The Excel output of the multiple regression model is shown below:

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.9966
R Square           0.9932
Adjusted R Square  0.9916
Standard Error     17.6492
Observations       27

ANOVA

            df  SS         MS        F            Significance F
Regression  5   952538.9   190507.8  611.5903672  5.39731E-22
Residual    21  6541.4     311.5
Total       26  959080.35

           Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept  -18.859       30.150          -0.626  0.538    -81.560    43.841     -81.560      43.841
X2         16.202        3.544           4.571   0.000    8.831      23.573     8.831        23.573
X3         0.175         0.058           3.032   0.006    0.055      0.294      0.055        0.294
X4         11.526        2.532           4.552   0.000    6.260      16.792     6.260        16.792
X5         13.580        1.770           7.671   0.000    9.898      17.262     9.898        17.262
X6         -5.311        1.705           -3.114  0.005    -8.858     -1.764     -8.858       -1.764

The regression equation of sales on the five independent variables is as follows:

Y = -18.86 + 16.20*X2 + 0.18*X3 + 11.53*X4 + 13.58*X5 - 5.31*X6

Where Y = annual net sales, X2 = area, X3 = inventory, X4 = advertisement expense, X5 = size of sales district, X6 = number of competing stores

(ii) Fitness of the regression equation

The value of R-square indicates the goodness of fit of a model. As a rule of thumb, the model is considered a good fit if R-square lies between 70% and 100%. R-square for this model is 0.99, so the model fits well: about 99% of the variation in sales is explained by the independent variables considered.
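R-square and adjusted R-square follow directly from the sums of squares in the regression ANOVA table. The sketch below recomputes both from the figures in the Excel output. The variable names are chosen for this illustration, and the total is rebuilt from the regression and residual sums of squares.

```python
# Figures copied from the regression ANOVA table.
ss_regression = 952538.9
ss_residual = 6541.4
n_observations = 27
n_predictors = 5

ss_total = ss_regression + ss_residual

# Share of the variation in sales explained by the model.
r_squared = ss_regression / ss_total

# Adjusted R-square penalises the number of predictors used.
adjusted_r_squared = 1 - (1 - r_squared) * (n_observations - 1) / (
    n_observations - n_predictors - 1
)

print(f"R Square = {r_squared:.4f}")
print(f"Adjusted R Square = {adjusted_r_squared:.4f}")
```

Both results agree with the Regression Statistics block (0.9932 and 0.9916).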

(iii) Test of significance

H0: There is no significant relationship between sales and other given variables.

H1: There is significant relationship between sales and other given variables.

The null hypothesis is rejected because the p-value (Significance F = 5.40E-22, effectively 0) is less than 0.05. It means there is a significant relationship between sales and the other given variables.
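The F statistic behind this test is the ratio of the regression mean square to the residual mean square. A minimal sketch of the calculation and the decision rule follows. The sums of squares, degrees of freedom and Significance F are copied from the Excel output rather than recomputed, and the variable names are illustrative.

```python
# Figures copied from the regression ANOVA table.
ss_regression, df_regression = 952538.9, 5
ss_residual, df_residual = 6541.4, 21
significance_f = 5.39731e-22  # p-value reported by Excel
ALPHA = 0.05

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

reject_h0 = significance_f < ALPHA
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```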

(iv) Interpretation of slope coefficient

The slope coefficient shows how much the dependent variable changes for a unit change in an independent variable, holding the other independent variables constant.

(a) The slope coefficient of area is 16.20. It means that for every unit increase in area, sales increase by 16.20 units.

(b) The slope coefficient of inventory is 0.175. It means that for every unit increase in inventory, sales increase by 0.175 units.

(c) The slope coefficient of advertisement is 11.53. It means that for every unit increase in advertisement expense, sales increase by 11.53 units.

(d) The slope coefficient of sales district size is 13.58. It means that for every unit increase in size, sales increase by 13.58 units.

(e) The slope coefficient of number of competing stores is -5.31. It means that for every unit increase in the number of competing stores, sales decline by 5.31 units.

(v) Confidence intervals at 95%

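The lower and upper limits of the intervals are given below. The critical value uses the residual degrees of freedom from the ANOVA table (21), so TINV(0.05, 21) = 2.08. The limits agree with the Lower 95% and Upper 95% columns of the Excel output.

Variable  Coefficient  TINV(0.05, 21)  SE    TINV(0.05, 21) * SE  Lower Limit  Upper Limit
X2        16.20        2.08            3.54  7.37                 8.83         23.57
X3        0.17         2.08            0.06  0.12                 0.06         0.29
X4        11.53        2.08            2.53  5.27                 6.26         16.79
X5        13.58        2.08            1.77  3.68                 9.90         17.26
X6        -5.31        2.08            1.71  3.55                 -8.86        -1.76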
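Each interval is the coefficient plus or minus the critical t value times the standard error. The sketch below applies that rule to the coefficients and standard errors from the regression output. The critical value 2.0796 is an assumption taken from a standard t table for a two-tailed 5% test with 21 degrees of freedom (27 observations, 5 predictors, 1 intercept), and `confidence_interval` is a hypothetical helper name.

```python
# (coefficient, standard error) copied from the regression output.
estimates = {
    "X2": (16.202, 3.544),
    "X3": (0.175, 0.058),
    "X4": (11.526, 2.532),
    "X5": (13.580, 1.770),
    "X6": (-5.311, 1.705),
}

# Assumed two-tailed 5% critical value of t with 21 degrees of freedom.
T_CRITICAL = 2.0796

def confidence_interval(coefficient, std_error, t_critical=T_CRITICAL):
    """Return (lower, upper) limits: coefficient -/+ t * standard error."""
    margin = t_critical * std_error
    return coefficient - margin, coefficient + margin

intervals = {
    name: confidence_interval(coefficient, std_error)
    for name, (coefficient, std_error) in estimates.items()
}
for name, (lower, upper) in intervals.items():
    print(f"{name}: {lower:.2f} to {upper:.2f}")
```

None of the intervals contains zero, which is consistent with every slope being significant at the 5% level.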

(vi) Test of significance level of slope coefficient

Variable                    p-value  Interpretation
Area (Square feet)          0.000    P-value is less than 0.05. It means the slope of area is significant.
Inventory                   0.006    P-value is less than 0.05. It means the slope of inventory is significant.
Advertisement expense       0.000    P-value is less than 0.05. It means the slope of advertisement expense is significant.
Size of sales district      0.000    P-value is less than 0.05. It means the slope of size of sales district is significant.
Number of competing stores  0.005    P-value is less than 0.05. It means the slope of number of competing stores is significant.
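The t Stat column of the regression output is each coefficient divided by its standard error, and a slope is treated as significant when its p-value is below 0.05. The sketch below shows both steps using figures copied from the Excel output. The p-values are taken as reported rather than recomputed, and small differences from the printed t statistics come from rounding in the displayed coefficients.

```python
# (coefficient, standard error, p-value) copied from the regression output.
slopes = {
    "X2": (16.202, 3.544, 0.000),
    "X3": (0.175, 0.058, 0.006),
    "X4": (11.526, 2.532, 0.000),
    "X5": (13.580, 1.770, 0.000),
    "X6": (-5.311, 1.705, 0.005),
}
ALPHA = 0.05

# t statistic: how many standard errors the coefficient lies from zero.
t_stats = {name: coef / se for name, (coef, se, _) in slopes.items()}

# Decision rule: keep a variable when its p-value is below ALPHA.
significant = [name for name, (_, _, p_value) in slopes.items() if p_value < ALPHA]

for name in slopes:
    print(f"{name}: t = {t_stats[name]:.2f}, significant = {name in significant}")
```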

(vii) No insignificant variable

All variables are significant, as the p-value of every independent variable is less than 0.05. It means there is no insignificant variable that needs to be deleted, so the regression equation determined earlier remains the same.

(viii) Estimation of sales

Sales are to be estimated for a franchisee that has 1,000 sq ft of floor area, $150,000 of inventory, $5,000 spent on advertising, 5,000 families in the area of operation and 2 competitors. Area, inventory, advertising and families are entered in thousands, so X2 = 1, X3 = 150, X4 = 5 and X5 = 5, with X6 = 2.

Y = -18.86 + 16.20*X2 + 0.18*X3 + 11.53*X4 + 13.58*X5 - 5.31*X6

Y = -18.86 + 16.20*(1) + 0.18*(150) + 11.53*(5) + 13.58*(5) - 5.31*(2)

= -18.86 + 16.20 + 27.00 + 57.65 + 67.90 - 10.62

= 139.27
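The same substitution can be checked with a few lines of Python. The sketch below uses the rounded coefficients from the regression equation, and `predict_sales` is a hypothetical helper name. The inputs are entered in the same scaled units as in the calculation above.

```python
# Rounded coefficients from the estimated regression equation.
INTERCEPT = -18.86
COEFFICIENTS = {"X2": 16.20, "X3": 0.18, "X4": 11.53, "X5": 13.58, "X6": -5.31}

def predict_sales(inputs):
    """Evaluate the regression equation for one set of inputs."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in inputs.items()
    )

# Franchisee values as substituted above (area, inventory, advertising,
# families, competitors).
franchisee = {"X2": 1, "X3": 150, "X4": 5, "X5": 5, "X6": 2}
estimate = predict_sales(franchisee)
print(f"{estimate:.2f}")  # prints 139.27
```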
