1
Chapter 2 Descriptive Statistics
2-1 Overview
2-2 Summarizing Data
2-3 Pictures of Data
2-4 Measures of Central Tendency
2-5 Measures of Variation
2-6 Measures of Position
2-7 Exploratory Data Analysis
Review and Projects

2
2-1 Overview
Descriptive Statistics summarizes or describes the important characteristics of a known set of population data
Inferential Statistics uses sample data to make inferences about a population

3
Important Characteristics of Data
1. Nature or shape of the distribution, such as bell-shaped, uniform, or skewed
2. Representative score, such as an average
3. Measure of scattering or variation

4
2-2 Summarizing Data With Frequency Tables
Frequency Table: lists categories (or classes) of scores, along with counts (or frequencies) of the number of scores that fall into each category

5
Table 2-1 Axial Loads of 0.0109 in. Cans
270 278 250 290 274 242 269
257 272 265 263 234 273 277 294 279 268 230 262 273 201 275 260 286 272 284 282 278 268 263 285 289 208 292 279 276 242 258 264 281 262 278 265 241 267 295 283 209 276 273 263 218 271 289 223 217 225 292 270 204 265 271 273 283 275 276 282 270 256 268 259 272 269 251 208 290 220 277 293 254 223 263 274 262 200 272 268 206 280 287 257 284 279 252 215 281 291 276 285 297 290 228 274 277 286 251 278 289 269 267 276 206 284 268 291 293 280 282 230 275 236 295 289 283 261 262 252 277 204 286 270 278 272 281 288 248 266 256 292

6
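Table 2-2 Frequency Table of Axial Loads of Aluminum Cans
Axial Load    Frequency
200 – 209         9
210 – 219         3
220 – 229         5
230 – 239         4
240 – 249         4
250 – 259        14
260 – 269        32
270 – 279        52
280 – 289        38
290 – 299        14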

7
Frequency Table Definitions
Class: An interval.
Lower Class Limit: The left endpoint of a class.
Upper Class Limit: The right endpoint of a class.
Class Mark: The midpoint of the class.
Class Width: The difference between two consecutive lower class limits.

8
Definition values for the example (Table 2-2)
Lower Class Limits: 200, 210, …
Upper Class Limits: 209, 219, …
Class Marks: 204.5 = (200 + 209)/2, 214.5, …
Class Width: 210 – 200 = 10

9
Determine the Definition Values for this Frequency Table
Quiz Scores    Frequency
0 – 4              2
5 – 9              5
10 – 14            8
15 – 19           11
20 – 24            7
Determine: Classes, Lower Class Limits, Upper Class Limits, Class Marks, Class Width

10
Constructing A Frequency Table
1. Decide on the number of classes.
2. Determine the class width by dividing the range by the number of classes (range = highest score – lowest score) and round up.
   class width = round up of (range / number of classes)
3. Select for the first lower limit either the lowest score or a convenient value slightly less than the lowest score.
4. Add the class width to the starting point to get the second lower class limit.
5. List the lower class limits in a vertical column and enter the upper class limits.
6. Represent each score by a tally mark in the appropriate class. Total tally marks to find the total frequency for each class.
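The six steps can be sketched in Python as follows. This is only an illustration: the function name `frequency_table`, the sample scores, and the assumption of integer scores are not from the slides, and the extra widening check is a safeguard added here for the case where the range divides evenly.

```python
import math

def frequency_table(scores, num_classes):
    """Return (lower limit, upper limit, frequency) rows for integer scores."""
    lowest, highest = min(scores), max(scores)
    # Step 2: class width = (range / number of classes), rounded up.
    width = math.ceil((highest - lowest) / num_classes)
    # Safeguard (an assumption, not in the slides): widen by 1 if the
    # last class would otherwise stop short of the highest score.
    if lowest + width * num_classes - 1 < highest:
        width += 1
    rows = []
    for i in range(num_classes):
        lower = lowest + i * width   # Steps 3-4: successive lower class limits
        upper = lower + width - 1    # Step 5: matching upper class limit
        freq = sum(lower <= s <= upper for s in scores)  # Step 6: tally
        rows.append((lower, upper, freq))
    return rows

scores = [3, 7, 8, 12, 15, 15, 18, 21, 22, 24]  # hypothetical data
table = frequency_table(scores, 5)
for lower, upper, freq in table:
    print(f"{lower:3d} - {upper:3d}  {freq}")
```

Starting at the lowest score is one of the two choices allowed by step 3; a convenient value slightly below it would work equally well.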

11
Guidelines For Frequency Tables
1. Classes should be mutually exclusive.
2. Include all classes, even if the frequency is zero.
3. Try to use the same width for all classes.
4. Select convenient numbers for class limits.
5. Use between 5 and 20 classes.
6. The sum of the class frequencies must equal the number of original data values.

12
Relative Frequency Table
relative frequency = class frequency / sum of all frequencies

13
Relative Frequency Table
Table 2-3
Axial Load    Relative Frequency
200 – 209         0.051
210 – 219         0.017
220 – 229         0.029
230 – 239         0.023
240 – 249         0.023
250 – 259         0.080
260 – 269         0.183
270 – 279         0.297
280 – 289         0.217
290 – 299         0.080
9/175 = 0.051
3/175 = 0.017
5/175 = 0.029

14
Cumulative Frequency Table
Table 2-4
Axial Load       Cumulative Frequency
Less than 210          9
Less than 220         12
Less than 230         17
Less than 240         21
Less than 250         25
Less than 260         39
Less than 270         71
Less than 280        123
Less than 290        161
Less than 300        175
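Relative and cumulative frequencies both follow directly from the class frequencies. The sketch below uses the frequencies for classes 200–209 through 290–299, taken here as the successive differences of the cumulative frequencies listed above; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies for 200-209, 210-219, ..., 290-299
frequencies = [9, 3, 5, 4, 4, 14, 32, 52, 38, 14]

total = sum(frequencies)  # number of original data values

# Relative frequency = class frequency / sum of all frequencies
relative = [round(f / total, 3) for f in frequencies]

# Cumulative frequency = running total of the class frequencies
cumulative = list(accumulate(frequencies))

print(total)
print(relative)
print(cumulative)
```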

15
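Frequency Tables
Table 2-2 (Frequency), Table 2-3 (Relative Frequency), Table 2-4 (Cumulative Frequency)
Axial Load    Frequency    Relative Frequency    Axial Load       Cumulative Frequency
200 – 209         9            0.051             Less than 210          9
210 – 219         3            0.017             Less than 220         12
220 – 229         5            0.029             Less than 230         17
230 – 239         4            0.023             Less than 240         21
240 – 249         4            0.023             Less than 250         25
250 – 259        14            0.080             Less than 260         39
260 – 269        32            0.183             Less than 270         71
270 – 279        52            0.297             Less than 280        123
280 – 289        38            0.217             Less than 290        161
290 – 299        14            0.080             Less than 300        175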

16
FIGURE 2-7 Mean as a Balance Point

17
Notation
$\Sigma$ denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
$\bar{x}$ is pronounced ‘x-bar’ and denotes the mean of a set of sample values
µ is pronounced ‘mu’ and denotes the mean of all values in a population

18
Definitions
Mean: the value obtained by adding the scores and dividing the total by the number of scores
Sample: $\bar{x} = \frac{\Sigma x}{n}$
Population: $\mu = \frac{\Sigma x}{N}$
Calculators can calculate the mean of data

19
Definitions
Median: the middle value when scores are arranged in (ascending or descending) order
often denoted by $\tilde{x}$ (pronounced ‘x-tilde’)
is not affected by an extreme value

20
Finding the median (scores in order)
Odd number of scores: there is an exact middle score. MEDIAN is 4
Even number of scores: no exact middle, shared by two numbers. MEDIAN is (4 + 5)/2 = 4.5
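The two cases can be sketched in Python. The function name `median_of` and both data lists are hypothetical, chosen so that the results match the medians of 4 and 4.5 quoted on the slide.

```python
def median_of(scores):
    """Middle value of the ordered scores; mean of the two middle values if n is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # middle shared by two numbers

odd_case = median_of([9, 2, 4, 3, 5])       # hypothetical data, n = 5
even_case = median_of([1, 9, 4, 5, 3, 6])   # hypothetical data, n = 6
print(odd_case, even_case)
```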

21
Definitions
Mode: the score that occurs most frequently
Bimodal
Multimodal
No Mode
the only measure of central tendency that can be used with nominal data

23
Examples
a. Mode is 5
b. Bimodal
c. No Mode
d. Mode is 3
e. No Mode
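A minimal Python sketch of the three outcomes (one mode, bimodal, no mode). The function name `modes` and the data lists are hypothetical, since the original example data did not survive on the slide. The convention assumed here is that a data set has no mode when no score repeats.

```python
from collections import Counter

def modes(scores):
    """Return the list of modes; an empty list means no mode."""
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:
        return []  # no score occurs more than once: no mode
    return sorted(value for value, count in counts.items() if count == top)

one_mode = modes([5, 5, 5, 3, 1, 5, 1, 4, 3, 5])       # hypothetical data
bimodal = modes([1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9])  # hypothetical data
no_mode = modes([1, 2, 3, 6, 7, 8, 9, 10])             # hypothetical data
print(one_mode, bimodal, no_mode)
```

Because the function only counts occurrences, it also works for nominal data such as a list of color names.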

24
Definitions
Midrange: the value halfway between the highest and lowest scores
Midrange = (highest score + lowest score) / 2

25
Round-off rule for measures of central tendency
Carry one more decimal place than is present in the original set of data

26
An Example of Skewness
Dataset 1: 3, 4, 4, 5, 5, 5, 6, 6, 7. Mean = 5, Median = 5. Symmetric
Dataset 2: 3, 4, 4, 5, 5, 5, 7, 7, 9. Mean = 5.444, Median = 5. Skewed right
Dataset 3: 2, 3, 3, 5, 5, 5, 6, 6, 7. Mean = 4.667, Median = 5. Skewed left
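The three data sets can be checked with Python's standard `statistics` module. The helper `shape` is a hypothetical name, and comparing the mean with the median is only a rough indicator of skewness, used here because it matches the labels on the slide.

```python
from statistics import mean, median

datasets = {
    "Dataset 1": [3, 4, 4, 5, 5, 5, 6, 6, 7],
    "Dataset 2": [3, 4, 4, 5, 5, 5, 7, 7, 9],
    "Dataset 3": [2, 3, 3, 5, 5, 5, 6, 6, 7],
}

def shape(scores):
    """Rough label from comparing mean and median (a heuristic, not a test)."""
    m, med = mean(scores), median(scores)
    if m > med:
        return "skewed right"
    if m < med:
        return "skewed left"
    return "symmetric"

for name, scores in datasets.items():
    print(name, round(mean(scores), 3), median(scores), shape(scores))
```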

27
Skewness
Figure 2-8 (a) SKEWED LEFT (negatively)
Figure 2-8 (b) SYMMETRIC: Mode = Mean = Median
Figure 2-8 (c) SKEWED RIGHT (positively)

29
Mean from a Frequency Table
use class mark of classes for variable x
$\bar{x} = \frac{\Sigma (f \cdot x)}{\Sigma f}$   Formula 2-2
x = class mark
f = frequency
$\Sigma f = n$

30
Quiz Scores    Frequency    Class Marks
0 – 4              2             2
5 – 9              5             7
10 – 14            8            12
15 – 19           11            17
20 – 24            7            22
Mean of this frequency table = 14.4
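The value 14.4 can be reproduced with a few lines of Python that weight each class mark by its frequency, using the frequencies and class marks from the quiz-score table. The variable names are illustrative.

```python
class_marks = [2, 7, 12, 17, 22]   # x = class mark
frequencies = [2, 5, 8, 11, 7]     # f = frequency

n = sum(frequencies)                                           # sum of f
weighted_sum = sum(f * x for f, x in zip(frequencies, class_marks))  # sum of f * x
table_mean = weighted_sum / n

print(weighted_sum, n, round(table_mean, 1))
```

Here the weighted sum is 476 and n is 33, so the mean is 476/33, which rounds to 14.4.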

31
Measure of Variation
Range = highest score – lowest score

32
Measure of Variation
Standard Deviation: a measure of variation of the scores about the mean (average deviation from the mean)

33
Sample Standard Deviation Formula
$s = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n - 1}}$   Formula 2-4
calculators can calculate sample standard deviation of data

34
Find the standard deviation of the sample data: 2, 3, 4, 5, 5, 5.
$s^2 = 8/5 = 1.6$, $s = 1.26$.
Use the shortcut formula to find the standard deviations of the above data, and the waiting times at the two banks.
1) Above data: $\Sigma x^2 = 104$.
2) Jefferson Valley Bank: $\Sigma x^2 = 513.27$, $\Sigma x = 71.5$, $s = 0.48$.
3) Bank of Providence: $\Sigma x^2 = 541.09$, $\Sigma x = 71.5$, $s = 1.82$.
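The first result can be traced step by step in Python. This sketch applies the definition of the sample standard deviation to the data 2, 3, 4, 5, 5, 5 and cross-checks it against `statistics.stdev`; the variable names are illustrative.

```python
import math
from statistics import stdev

data = [2, 3, 4, 5, 5, 5]
n = len(data)

x_bar = sum(data) / n                               # sample mean = 4.0
squared_devs = [(x - x_bar) ** 2 for x in data]     # (x - x_bar)^2 for each score
sample_variance = sum(squared_devs) / (n - 1)       # 8 / 5
s = math.sqrt(sample_variance)

print(sum(squared_devs), sample_variance, round(s, 2))
```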

35
Population Standard Deviation
$\sigma = \sqrt{\frac{\Sigma (x - \mu)^2}{N}}$
calculators can calculate the population standard deviation of data

36
Symbols for Standard Deviation
                                Sample               Population
Textbook                        $s$                  $\sigma$
Some graphics calculators       $S_x$                $\sigma_x$
Some nongraphics calculators    $x\sigma_{n-1}$      $x\sigma_n$

37
Measure of Variation
Variance: standard deviation squared
Notation: $s^2$ (sample variance), $\sigma^2$ (population variance)
use square key on calculator

38
Variance
Sample Variance: $s^2 = \frac{\Sigma (x - \bar{x})^2}{n - 1}$
Population Variance: $\sigma^2 = \frac{\Sigma (x - \mu)^2}{N}$

39
Round-off Rule for measures of variation
Carry one more decimal place than was present in the original data

40
Standard Deviation Shortcut Formula
$s = \sqrt{\frac{n(\Sigma x^2) - (\Sigma x)^2}{n(n - 1)}}$   Formula 2-6
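A short Python sketch of Formula 2-6. The function name `shortcut_sd` is hypothetical. For the two banks, the sample size n = 10 is an assumption made here, since the slides do not state it; with that value the formula reproduces the quoted results.

```python
import math

def shortcut_sd(n, sum_x, sum_x2):
    """Sample standard deviation from n, the sum of x, and the sum of x squared."""
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

# Data 2, 3, 4, 5, 5, 5: n = 6, sum of x = 24, sum of x^2 = 104
data = [2, 3, 4, 5, 5, 5]
s_data = shortcut_sd(len(data), sum(data), sum(x * x for x in data))

# Bank waiting times: n = 10 is assumed, not given on the slides
s_jefferson = shortcut_sd(10, 71.5, 513.27)
s_providence = shortcut_sd(10, 71.5, 541.09)

print(round(s_data, 2), round(s_jefferson, 2), round(s_providence, 2))
```

The shortcut needs only three running totals, which is why it suits hand calculation.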

41
FIGURE 2-10 Same Means ($\bar{x} = 4$), Different Standard Deviations
Panels: s = 0, s = 0.8, s = 1.0, s = 3.0 (vertical axis: Frequency)
Standard deviation gets larger as spread of data increases.

44
The Empirical Rule (applies to bell shaped distributions)
FIGURE 2-10
68% within 1 standard deviation of the mean ($\bar{x} - s$ to $\bar{x} + s$)
95% within 2 standard deviations of the mean ($\bar{x} - 2s$ to $\bar{x} + 2s$)
99.7% of data are within 3 standard deviations of the mean ($\bar{x} - 3s$ to $\bar{x} + 3s$)
Areas on each side of the mean: 0.340 (within 1s), 0.135 (between 1s and 2s), 0.024 (between 2s and 3s), 0.001 (beyond 3s)
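The three percentages can be checked against the normal distribution, which is one standard model of a bell-shaped distribution (an assumption of this sketch, not a claim from the slides). It uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Proportion of values within k standard deviations of the mean
within = {k: round(z.cdf(k) - z.cdf(-k), 3) for k in (1, 2, 3)}

print(within)
```

The results are close to the rounded 68%, 95%, and 99.7% figures quoted in the rule.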

45
Range Rule of Thumb
$\text{Range} \approx 4s$ or $s \approx \frac{\text{Range}}{4}$
(minimum) $\approx \bar{x} - 2s$
(maximum) $\approx \bar{x} + 2s$

46
Chebyshev’s Theorem
applies to distributions of any shape
the proportion (or fraction) of any set of data lying within k standard deviations of the mean is always at least $1 - 1/k^2$, where k is any positive number greater than 1.
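To make the bound concrete, the sketch below evaluates it for a few values of k. The function name `chebyshev_bound` is hypothetical.

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

bounds = {k: round(chebyshev_bound(k), 3) for k in (1.5, 2, 3)}
print(bounds)
```

For k = 2 the bound is 0.75, so at least 75% of any data set lies within 2 standard deviations of the mean, compared with about 95% for bell-shaped data under the empirical rule.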

47
Measures of Variation Summary
For typical data sets, it is unusual for a score to differ from the mean by more than 2 or 3 standard deviations.

48
An application of measures of variation
There are two brands of car tires, A and B. Both have a mean lifetime of 60,000 miles, but Brand A has a standard deviation of lifetime of 1000 miles and Brand B has a standard deviation of lifetime of 3000 miles. Which brand would you prefer?

49
Quartiles
Q1, Q2, Q3 divide ranked scores into four equal parts
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

50
Percentiles
99 percentiles divide ranked scores into 100 equal parts

51
Finding the Percentile of a Given Score
Percentile of score x = (number of scores less than x / total number of scores) • 100
Sorted Axial Loads of 175 Aluminum Cans
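The formula translates directly into Python. The function name `percentile_of_score` and the ten scores are hypothetical; they stand in for the sorted axial loads, which are not listed here.

```python
def percentile_of_score(scores, x):
    """Percentile of x = (number of scores less than x / total number of scores) * 100."""
    below = sum(s < x for s in scores)
    return below / len(scores) * 100

scores = [201, 204, 208, 215, 223, 230, 242, 256, 263, 270]  # hypothetical data
p = percentile_of_score(scores, 242)
print(p)
```

Six of the ten scores are less than 242, so 242 sits at the 60th percentile of this small data set.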

52
Finding the Value of the kth Percentile
1. Rank the data. (Arrange the data in order of lowest to highest.)
2. Compute $L = \left(\frac{k}{100}\right) n$, where n = number of scores and k = percentile in question.
3. Is L a whole number?
   Yes: The value of the kth percentile is midway between the Lth score and the next higher score in the ranked data. Find $P_k$ by adding the Lth score and the next higher score and dividing the total by 2.
   No: Change L by rounding it up to the next larger whole number. The value of $P_k$ is the Lth score, counting from the lowest.
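The flowchart can be sketched as a small Python function. The names `locator` and `percentile_value` and the ten sample scores are hypothetical. Integer arithmetic is used to test whether L is a whole number, which assumes k is an integer from 1 to 99.

```python
def locator(n, k):
    """Return (L, is_whole) for L = (k / 100) * n, rounding L up when it is not whole."""
    quotient, remainder = divmod(k * n, 100)
    if remainder == 0:
        return quotient, True
    return quotient + 1, False

def percentile_value(scores, k):
    """Value of the kth percentile, following the steps above."""
    ordered = sorted(scores)               # rank the data, lowest to highest
    L, is_whole = locator(len(ordered), k)
    if is_whole:
        # midway between the Lth score and the next higher score
        return (ordered[L - 1] + ordered[L]) / 2
    return ordered[L - 1]                  # the Lth score, counting from the lowest

scores = [201, 204, 208, 215, 223, 230, 242, 256, 263, 270]  # hypothetical data
p25 = percentile_value(scores, 25)  # L = 2.5, rounded up to 3
p50 = percentile_value(scores, 50)  # L = 5, a whole number
print(p25, p50, locator(175, 10), locator(175, 25))
```

For n = 175 the locator gives 18 for the 10th percentile and 44 for the 25th percentile.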

53
Sorted Axial Loads of 175 Aluminum Cans
The 10th percentile: L = 175 × 10/100 = 17.5, rounded up to 18. So the 10th percentile is the 18th score in the sorted data, i.e., 230.
The 25th percentile: L = 175 × 25/100 = 43.75, rounded up to 44. So the 25th percentile is the 44th score in the sorted data, i.e., 262.

54
Interquartile Range: $Q_3 - Q_1$
Semi-interquartile Range: $\frac{Q_3 - Q_1}{2}$
Midquartile: $\frac{Q_1 + Q_3}{2}$
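A brief Python illustration of the three quantities, using hypothetical quartile values Q1 = 60 and Q3 = 78 chosen only for this example.

```python
q1, q3 = 60, 78  # hypothetical quartile values

interquartile_range = q3 - q1
semi_interquartile_range = (q3 - q1) / 2
midquartile = (q1 + q3) / 2

print(interquartile_range, semi_interquartile_range, midquartile)
```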

55
Exploratory Data Analysis
Used to explore data at a preliminary level
Few or no assumptions are made about the data
Tends to involve relatively simple calculations and graphs

56
Exploratory Data Analysis
Used to explore data at a preliminary level
Few or no assumptions are made about the data
Tends to involve relatively simple calculations and graphs
Traditional Statistics
Used to confirm final conclusions about data
Typically requires some very important assumptions about the data
Calculations are often complex, and graphs are often unnecessary

57
Boxplots (Box-and-Whisker Diagram)
5-number summary:
1. Minimum
2. First quartile Q1
3. Median
4. Third quartile Q3
5. Maximum

58
Boxplots (Box-and-Whisker Diagram)
Figure: Boxplot of Pulse Rates (Beats per minute) of Smokers
Values marked on the plot: 52, 60, 68.5, 78, 90

59
Figure: Boxplots for Normal, Uniform, and Skewed distributions

60
Outliers
Values that are very far away from most of the data

61
Class Survey Data
Boxplots for the heights of those who never broke a bone and those who did

62
When comparing two or more boxplots, it is necessary to use the same scale.
Figure: Boxplots of PULSE (scale 40 to 100) by SMOKE: 1 (yes), 2 (No)
