Business Analytics

Chapter 2 — Descriptive Statistics

Q10 Cumulative Frequency Distribution

Consider the following frequency distribution. Construct a cumulative frequency distribution.

Answer

Class	Frequency	Cumulative Frequency
10–19	10	10
20–29	14	24
30–39	17	41
40–49	7	48
50–59	2	50
Total	50	50

Each cumulative frequency is the running total of the frequencies up to and including that class. The cumulative frequency of the last class always equals the total sample size, here n = 50.
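The running total can be reproduced with a minimal Python sketch. The variable names are illustrative, and `itertools.accumulate` from the standard library is one convenient way to do it, not the only one.

```python
from itertools import accumulate

# Class labels and frequencies copied from the table above.
classes = ["10–19", "20–29", "30–39", "40–49", "50–59"]
frequencies = [10, 14, 17, 7, 2]

# accumulate() yields the running total of the frequencies.
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(classes, frequencies, cumulative):
    print(f"{label}\t{freq}\t{cum}")

# The last cumulative frequency equals the sample size n.
print("n =", cumulative[-1])
```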

Q13 Mean & Median

a) Data: 10, 20, 12, 17, 16 — Compute mean and median.
b) Add value 12 → Data: 10, 20, 12, 17, 16, 12 — Compute and compare.

Part a

x̄ = (10+20+12+17+16) / 5 = 75 / 5 = 15

Sorted: 10, 12, [16], 17, 20 → Median = 16 (middle of n=5)

Mean = 15 | Median = 16

Part b

x̄ = (10+20+12+17+16+12) / 6 = 87 / 6 = 14.5

Sorted: 10, 12, [12, 16], 17, 20 → Median = (12+16)/2 = 14

Mean = 14.5 | Median = 14

Adding 12 (below original mean of 15) pulls both values lower. Mean: 15→14.5 · Median: 16→14.
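As a quick check of both parts, here is a small sketch using Python's standard `statistics` module. The variable names are assumptions made for illustration.

```python
from statistics import mean, median

data_a = [10, 20, 12, 17, 16]   # part a
data_b = data_a + [12]          # part b: same data with the extra value 12

mean_a, median_a = mean(data_a), median(data_a)
mean_b, median_b = mean(data_b), median(data_b)

# median() sorts internally; for an even n it averages the two middle values.
print(f"Part a: mean = {mean_a}, median = {median_a}")
print(f"Part b: mean = {mean_b}, median = {median_b}")
```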

Q14 Percentiles (20th, 25th, 65th, 75th)

Data: 27, 25, 20, 15, 30, 34, 28, 25 — Compute the 20th, 25th, 65th, and 75th percentiles.

Method

Sort the data: 15, 20, 25, 25, 27, 28, 30, 34 (n = 8)
Compute location: L = (P / 100) × n
Non-integer L → round up, take that position's value
Integer L → average of position L and L+1

Results

Percentile	L = (P/100)×8	Rule	Answer
20th	1.6	Non-integer → position 2	20
25th (Q1)	2.0	Integer → avg(pos 2, 3)	22.5
65th	5.2	Non-integer → position 6	28
75th (Q3)	6.0	Integer → avg(pos 6, 7)	29
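The location rule from the Method steps can be sketched in Python as follows. The function name `percentile` is illustrative, and it assumes an integer percentile p with 0 < p < 100. Spreadsheet and library percentile functions often use other interpolation conventions, so their results may differ from this rule.

```python
def percentile(values, p):
    """p-th percentile using the location rule L = (p / 100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    # Integer arithmetic: 'whole' is floor(L); 'remainder' is nonzero
    # exactly when L is not an integer.
    whole, remainder = divmod(p * n, 100)
    if remainder:
        # Non-integer L: round up to position whole + 1 (1-based),
        # which is index 'whole' in a 0-based list.
        return ordered[whole]
    # Integer L: average the values at positions L and L + 1 (1-based).
    return (ordered[whole - 1] + ordered[whole]) / 2


data = [27, 25, 20, 15, 30, 34, 28, 25]
for p in (20, 25, 65, 75):
    print(f"{p}th percentile: {percentile(data, p)}")
```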

Q15 Mean, Median & Mode

Data: 53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53 — Compute mean, median, mode.

Answer

x̄ = 657 / 11 = 59.73

Sorted: 53,53,53,55,57,[57],58,64,68,69,70 → Median = 57 (6th of 11)

Mode = 53 (appears 3 times — highest frequency)

Mean = 59.73 | Median = 57 | Mode = 53
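The three measures can be verified with a short sketch. It assumes Python's standard library only; the `Counter` tally is added here just to make the mode's frequency visible.

```python
from collections import Counter
from statistics import mean, median, mode

data = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]

mean_value = mean(data)
median_value = median(data)
mode_value = mode(data)

# Counter shows how often each value occurs, which is what the mode is based on.
counts = Counter(data)

print(f"mean = {mean_value:.2f}")
print(f"median = {median_value}")
print(f"mode = {mode_value} (appears {counts[mode_value]} times)")
```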

Q16 Geometric Mean: Annual Growth Rate

Asset declines from $5,000 to $3,500 over nine years. What is the mean annual growth rate?

Formula & Calculation

x̄g = (End Value / Start Value)^(1/n) − 1

x̄g = (3500 / 5000)^(1/9) − 1 = (0.7)^0.1111 − 1 = 0.9611 − 1 = −0.0389

Annual Growth Rate = −3.89%

The asset loses about 3.89% of its value per year on average. The geometric mean is used because growth compounds: an arithmetic average of period-to-period percentage changes does not reproduce the ending value.
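The formula can be checked with a short sketch. The function name `mean_annual_growth_rate` is illustrative and not from the text. The last lines confirm that compounding the computed rate over nine years recovers the ending value.

```python
def mean_annual_growth_rate(start_value, end_value, years):
    """Geometric mean growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


rate = mean_annual_growth_rate(5000, 3500, 9)
print(f"Mean annual growth rate: {rate:.2%}")

# Sanity check: applying the rate for nine years should give back $3,500.
recovered = 5000 * (1 + rate) ** 9
print(f"Value after 9 years at that rate: {recovered:.2f}")
```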

Q17 Mutual Fund Comparison

Stivers fund: $10,000 → $18,000 over 8 years.
Trippi fund: $5,000 → $10,600 over 8 years.
Which mutual fund performed better?

Geometric Mean Annual Return

Stivers: (18000/10000)^(1/8) − 1 = (1.8)^0.125 − 1 = 7.62%

Trippi: (10600/5000)^(1/8) − 1 = (2.12)^0.125 − 1 = 9.85%

Stivers = 7.62% | Trippi = 9.85%

Trippi performed better: a 9.85% annual return versus 7.62% for Stivers. Despite a lower initial investment, Trippi achieved a higher compound growth rate over the 8 years.
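A minimal sketch of the comparison, assuming the same geometric-mean formula for both funds. The dictionary layout and the helper name `annual_return` are illustrative choices.

```python
def annual_return(start_value, end_value, years):
    """Geometric mean annual return over the holding period."""
    return (end_value / start_value) ** (1 / years) - 1


# (start value, end value, years) for each fund, from the problem statement.
funds = {
    "Stivers": (10_000, 18_000, 8),
    "Trippi": (5_000, 10_600, 8),
}

returns = {name: annual_return(*figures) for name, figures in funds.items()}
best = max(returns, key=returns.get)

for name, value in returns.items():
    print(f"{name}: {value:.2%}")
print("Better performer:", best)
```

Comparing rates rather than dollar gains is what makes the two funds comparable despite their different starting amounts.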

Q19 Wait-Tracking System Analysis

Compare patient wait times (minutes) for offices with vs. without a wait-tracking system.
a) Mean & median b) Variance & std dev c & d) Box plots e) Conclusion

a & b — Summary Statistics

Statistic	With System	Without System
Data (sorted)	9,11,12,12,13,14,15,18,31,37	12,16,17,20,23,24,31,37,44,67
Mean (x̄)	17.2	29.1
Median	13.5	23.5
Variance (s²)	86.18	275.66
Std Dev (s)	9.28	16.60
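The summary statistics in the table can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n − 1) formulas. The variable names below are assumptions for illustration.

```python
from statistics import mean, median, stdev, variance

with_system = [9, 11, 12, 12, 13, 14, 15, 18, 31, 37]
without_system = [12, 16, 17, 20, 23, 24, 31, 37, 44, 67]

for label, waits in (("With system", with_system),
                     ("Without system", without_system)):
    print(label)
    print(f"  mean     = {mean(waits):.1f}")
    print(f"  median   = {median(waits):.1f}")
    print(f"  variance = {variance(waits):.2f}")  # sample variance, divides by n - 1
    print(f"  std dev  = {stdev(waits):.2f}")     # square root of the sample variance
```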

c & d — Box Plot Five-Number Summary

Statistic	With System	Without System
Min	9	12
Q1	12	17
Median	13.5	23.5
Q3	18	37
IQR	6	20
Upper Fence	27	67
Outliers	31, 37	None
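The fences and outliers can be checked with the sketch below. It assumes the quartiles are found with the location rule L = (P / 100) × n, which is consistent with the Q1 and Q3 values in the table, and that the fences sit 1.5 × IQR beyond the quartiles. The function names are illustrative.

```python
def percentile(values, p):
    """p-th percentile via the location rule L = (p / 100) * n (integer p)."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        return ordered[whole]                          # non-integer L: round up
    return (ordered[whole - 1] + ordered[whole]) / 2   # integer L: average two positions


def box_plot_summary(values):
    """Return Q1, Q3, IQR, both fences, and any values outside the fences."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "q3": q3, "iqr": iqr,
            "lower_fence": lower, "upper_fence": upper, "outliers": outliers}


with_system = [9, 11, 12, 12, 13, 14, 15, 18, 31, 37]
without_system = [12, 16, 17, 20, 23, 24, 31, 37, 44, 67]

summary_with = box_plot_summary(with_system)
summary_without = box_plot_summary(without_system)
print("With system:   ", summary_with)
print("Without system:", summary_without)
```

A value exactly on a fence is not counted as an outlier here, which is why 67 in the second group is not flagged.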

e) Yes. Offices with a wait-tracking system have shorter and more consistent wait times: the mean is 17.2 vs 29.1 minutes, the median is 13.5 vs 23.5 minutes, and the standard deviation is 9.28 vs 16.60. In this sample, the system is associated with lower and less variable patient wait times.

Q24 Covariance & Correlation Coefficient

x = {4, 6, 11, 3, 16} | y = {50, 50, 40, 60, 30}
a) Scatter diagram b) Relationship c) Covariance d) Correlation coefficient

a — Scatter Diagram

b — Relationship

The scatter diagram shows a negative relationship — as x increases, y tends to decrease.

c — Sample Covariance

x̄ = 8 | ȳ = 46

xᵢ	yᵢ	xᵢ − x̄	yᵢ − ȳ	(xᵢ−x̄)(yᵢ−ȳ)
4	50	−4	4	−16
6	50	−2	4	−8
11	40	3	−6	−18
3	60	−5	14	−70
16	30	8	−16	−128
Σ				−240

s_xy = −240 / (5−1) = −60

d — Correlation Coefficient

sₓ = √(118/4) = 5.43 | s_y = √(520/4) = 11.40

r_xy = −60 / (5.43 × 11.40) ≈ −0.97

Covariance = −60 | Correlation r = −0.97

r = −0.97 indicates a very strong negative linear relationship. Values close to −1 mean near-perfect inverse correlation between x and y.
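Parts c and d can be reproduced step by step with a small sketch that mirrors the hand calculation. The variable names are illustrative, and only the standard library is assumed.

```python
from math import sqrt

x = [4, 6, 11, 3, 16]
y = [50, 50, 40, 60, 30]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: sum of cross-deviations divided by n - 1.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations, also with the n - 1 divisor.
s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Correlation coefficient: covariance scaled by both standard deviations.
r_xy = s_xy / (s_x * s_y)

print(f"x̄ = {x_bar}, ȳ = {y_bar}")
print(f"s_xy = {s_xy}, s_x = {s_x:.2f}, s_y = {s_y:.2f}")
print(f"r_xy = {r_xy:.2f}")
```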

Chapter 3 — Data Visualization

Q11 Vehicle Production: Line & Clustered Bar

OICA: Toyota, GM, Volkswagen, Hyundai vehicle production (millions) over 5 years.
a) Line chart b) Discuss trends c) Clustered bar — leading manufacturer each year?

a — Line Chart

b — Discussion

GM led in Years 1–2 but declined sharply in Year 4. Toyota showed consistent performance and led from Year 3 onward. Hyundai showed the strongest growth — nearly doubling from Year 1 to Year 5. The Year 4 dip likely reflects a major global market disruption.

c — Clustered Bar Chart

Leading manufacturer by year: Year 1: GM (8.97M) · Year 2: GM (9.35M) · Year 3: Toyota (9.24M) · Year 4: Toyota (7.23M) · Year 5: Toyota (8.56M)

Q13 Insurance Sales: Column Chart

Top 6 salespeople: Harish(24), David(41), Kristina(19), Steven(23), Tim(53), Mona(39).
a) Column chart b) Sort most→fewest c) Add data labels

a, b & c — Column Chart (sorted, with labels)

Tim leads with 53 contracts. Order: Tim(53) → David(41) → Mona(39) → Harish(24) → Steven(23) → Kristina(19).
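The sorting and labelling steps (parts b and c) can be sketched without a charting library. The text-based bars below are only a stand-in for the column chart, and the names are illustrative.

```python
# Contracts sold, from the problem statement.
sales = {"Harish": 24, "David": 41, "Kristina": 19,
         "Steven": 23, "Tim": 53, "Mona": 39}

# Part b: sort from most to fewest contracts.
ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)

# Part c: print each bar with its data label at the end.
for name, contracts in ranked:
    print(f"{name:<9}{'#' * contracts} {contracts}")
```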

Q16 Smartphone Ownership by Age

Survey: smartphone ownership % by age group (18–24 to 65+).
a) Stacked column b) Clustered column c) Which is better and why?

a — Stacked Column Chart

b — Clustered Column Chart

c) As age increases, smartphone ownership drops sharply while "No Cell Phone" rises. The clustered chart is better for comparing individual categories across age groups. The stacked chart shows totals well but makes segment comparisons harder. Clustered is preferred here.

Q17 Store Manager Time Allocation

Logan Outdoor Equipment Co. — 6 locations, 4 task categories (% of time).
a) Stacked bar b) Clustered bar c) Multiple bar d) Which is preferable? e) Inferences

a — Stacked Bar Chart

b — Clustered Bar Chart

d — Preferable form?

The clustered bar chart is preferable — it allows direct comparison of each task type across all six locations simultaneously.

e — Inferences

Customer interaction dominates at Boise (64%) and Olympia (54%). Portland spends the most time in required meetings (52%). Missoula has the highest idle time (30%), suggesting potential inefficiency. Time allocation varies considerably across locations, which may point to differences in management practices or in local demands.