
1

Chapter 2 Descriptive Statistics

2-1 Overview
2-2 Summarizing Data
2-3 Pictures of Data
2-4 Measures of Central Tendency
2-5 Measures of Variation
2-6 Measures of Position
2-7 Exploratory Data Analysis
Review and Projects

2

2-1 Overview

Descriptive Statistics: summarizes or describes the important characteristics of a known set of population data.
Inferential Statistics: uses sample data to make inferences about a population.

3

Important Characteristics of Data

1. Nature or shape of the distribution, such as bell-shaped, uniform, or skewed
2. Representative score, such as an average
3. Measure of scattering or variation

4

2-2 Summarizing Data With Frequency Tables

Frequency Table: lists categories (or classes) of scores, along with counts (or frequencies) of the number of scores that fall into each category.

5

Table 2-1 Axial Loads of 0.0109 in. Cans

270 278 250 290 274 242 269

257 272 265 263 234 273 277 294 279 268 230 262 273 201 275 260 286 272 284 282 278 268 263 285 289 208 292 279 276 242 258 264 281 262 278 265 241 267 295 283 209 276 273 263 218 271 289 223 217 225 292 270 204 265 271 273 283 275 276 282 270 256 268 259 272 269 251 208 290 220 277 293 254 223 263 274 262 200 272 268 206 280 287 257 284 279 252 215 281 291 276 285 297 290 228 274 277 286 251 278 289 269 267 276 206 284 268 291 293 280 282 230 275 236 295 289 283 261 262 252 277 204 286 270 278 272 281 288 248 266 256 292

6

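Frequency Table of Axial Loads of Aluminum Cans

Table 2-2

Score     Frequency
200-209   9
210-219   3
220-229   5
230-239   4
240-249   4
250-259   14
260-269   32
270-279   52
280-289   38
290-299   14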

7

Frequency Table Definitions

Class: an interval.
Lower Class Limit: the left endpoint of a class.
Upper Class Limit: the right endpoint of a class.
Class Mark: the midpoint of the class.
Class Width: the difference between two consecutive lower class limits.

8

Definition values for the example

For the frequency table of axial loads (Table 2-2):

Lower class limits: 200, 210, …
Upper class limits: 209, 219, …
Class marks: 204.5 = (200 + 209)/2, 214.5, …
Class width: 210 - 200 = 10

9

Determine the Definition Values for this Frequency Table

Quiz Scores   Frequency
0 – 4         2
5 – 9         5
10 – 14       8
15 – 19       11
20 – 24       7

Find: the classes, lower class limits, upper class limits, class marks, and class width.

10

Constructing A Frequency Table

1. Decide on the number of classes.
2. Determine the class width by dividing the range by the number of classes and rounding up, where range = highest score - lowest score:
   class width = round up of (range / number of classes)
3. Select for the first lower limit either the lowest score or a convenient value slightly less than the lowest score.
4. Add the class width to the starting point to get the second lower class limit.
5. List the lower class limits in a vertical column and enter the upper class limits.
6. Represent each score by a tally mark in the appropriate class. Total the tally marks to find the frequency for each class.
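The construction steps can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name `frequency_table`, its parameters, and the `quiz` scores are hypothetical, and it assumes integer scores and a first lower limit no greater than the lowest score.

```python
import math

def frequency_table(scores, num_classes, first_lower=None):
    """Tally integer scores into equal-width classes."""
    low, high = min(scores), max(scores)
    # Step 2: class width = range / number of classes, rounded up.
    width = math.ceil((high - low) / num_classes)
    # Step 3: start at the lowest score unless a convenient value is supplied.
    start = low if first_lower is None else first_lower
    # Widen the classes if rounding up still leaves the highest score
    # uncovered (this happens when the division comes out exact).
    while start + width * num_classes - 1 < high:
        width += 1
    # Steps 4-5: lower limits are `width` apart; each upper limit is one
    # less than the next lower limit (integer scores assumed).
    table = {}
    for i in range(num_classes):
        lower = start + i * width
        table[(lower, lower + width - 1)] = 0
    # Step 6: tally each score into its class.
    for x in scores:
        lower = start + ((x - start) // width) * width
        table[(lower, lower + width - 1)] += 1
    return table

quiz = [2, 5, 7, 8, 11, 12, 14, 17, 19, 23]  # hypothetical scores
table = frequency_table(quiz, num_classes=5, first_lower=0)
for (lower, upper), count in table.items():
    print(f"{lower:>2} - {upper:<2}  {count}")
```

Here the range is 21, so 21/5 = 4.2 rounds up to a class width of 5, and the class frequencies add up to the number of scores.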

11

Guidelines For Frequency Tables

1. Classes should be mutually exclusive.
2. Include all classes, even if the frequency is zero.
3. Try to use the same width for all classes.
4. Select convenient numbers for class limits.
5. Use between 5 and 20 classes.
6. The sum of the class frequencies must equal the number of original data values.

12

Relative Frequency Table

relative frequency = class frequency / sum of all frequencies

13

Relative Frequency Table

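Table 2-3 (relative frequencies computed from the frequencies in Table 2-2; the frequencies sum to 175)

Axial Load   Frequency   Relative Frequency
200-209      9           0.051
210-219      3           0.017
220-229      5           0.029
230-239      4           0.023
240-249      4           0.023
250-259      14          0.080
260-269      32          0.183
270-279      52          0.297
280-289      38          0.217
290-299      14          0.080

For example: 9/175 = 0.051, 3/175 = 0.017, 5/175 = 0.029.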

14

Cumulative Frequency Table

Table 2-4

Axial Load       Cumulative Frequency
Less than 210    9
Less than 220    12
Less than 230    17
Less than 240    21
Less than 250    25
Less than 260    39
Less than 270    71
Less than 280    123
Less than 290    161
Less than 300    175

15

Frequency Tables (Table 2-2, Table 2-3, and Table 2-4 combined)

Axial Load   Frequency   Relative Frequency   Cumulative Frequency
200-209      9           0.051                Less than 210: 9
210-219      3           0.017                Less than 220: 12
220-229      5           0.029                Less than 230: 17
230-239      4           0.023                Less than 240: 21
240-249      4           0.023                Less than 250: 25
250-259      14          0.080                Less than 260: 39
260-269      32          0.183                Less than 270: 71
270-279      52          0.297                Less than 280: 123
280-289      38          0.217                Less than 290: 161
290-299      14          0.080                Less than 300: 175
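As a check on the three tables, the relative and cumulative columns can be recomputed from the class frequencies. This is a sketch only: the ten-entry `frequencies` list is inferred from the cumulative column (9, 12, 17, 21, 25, 39, 71, 123, 161, 175) and is an assumption about the original table.

```python
from itertools import accumulate

# Class frequencies for 200-209, 210-219, ..., 290-299 (inferred values).
frequencies = [9, 3, 5, 4, 4, 14, 32, 52, 38, 14]

total = sum(frequencies)
# Relative frequency = class frequency / sum of all frequencies.
relative = [round(f / total, 3) for f in frequencies]
# Cumulative frequency = running total of the class frequencies.
cumulative = list(accumulate(frequencies))

for f, r, c in zip(frequencies, relative, cumulative):
    print(f"{f:>3}  {r:.3f}  {c:>3}")
```

The last cumulative frequency equals the total number of scores, which is a quick consistency check for any cumulative table.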

16

FIGURE 2-7 Mean as a Balance Point

17

Notation

Σ denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
x̄ is pronounced ‘x-bar’ and denotes the mean of a set of sample values
µ is pronounced ‘mu’ and denotes the mean of all values in a population

18

Definitions

Mean: the value obtained by adding the scores and dividing the total by the number of scores.

Sample: $\bar{x} = \frac{\sum x}{n}$
Population: $\mu = \frac{\sum x}{N}$

Calculators can calculate the mean of data.

19

Definitions

Median: the middle value when scores are arranged in (ascending or descending) order.
It is often denoted by x̃ (pronounced ‘x-tilde’).
It is not affected by an extreme value.

20

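Finding the median (scores in order)

Odd number of scores: there is an exact middle value, and it is the median. In the slide's example, MEDIAN is 4.
Even number of scores: there is no exact middle; it is shared by two numbers, so the median is their mean. In the slide's example, MEDIAN is (4 + 5)/2 = 4.5.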
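The two median cases can be sketched as follows. The data lists are hypothetical, since the slide's own scores did not survive extraction; they are chosen so that the results match the medians quoted on the slide (4 and 4.5).

```python
def median(scores):
    """Middle value of the ordered scores, or the mean of the two middle values."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: an exact middle value exists.
        return ordered[mid]
    # Even count: the middle is shared by two numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = [6, 1, 4, 9, 3]       # hypothetical scores
even_case = [6, 1, 4, 9, 3, 5]   # hypothetical scores
print(median(odd_case), median(even_case))
```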

21

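Definitions

Mode: the score that occurs most frequently.
Bimodal: two scores share the greatest frequency.
Multimodal: more than two scores share the greatest frequency.
No Mode: no score is repeated.
The mode is the only measure of central tendency that can be used with nominal data.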

22

Examples

a. Mode is 5
b. Bimodal
c. No Mode
d. Mode is 3
e. No Mode
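A small sketch of how the mode cases might be detected. The function name `modes` and the sample lists are hypothetical (the slide's own data did not survive extraction). The function returns every score tied for the greatest frequency, or an empty list when no score is repeated.

```python
from collections import Counter

def modes(scores):
    """Return all scores with the greatest frequency; [] if nothing repeats."""
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:
        return []  # no score is repeated: no mode
    return sorted(value for value, count in counts.items() if count == top)

print(modes([5, 5, 5, 3, 1, 5, 1, 4, 3, 5]))   # one mode
print(modes([1, 2, 2, 2, 3, 4, 5, 6, 6, 6]))   # bimodal
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))        # no mode
print(modes(["red", "blue", "red", "green"]))  # nominal data
```

Because the function only counts occurrences, it also works for nominal data such as colour names, where a mean or median would be meaningless.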

24

Definitions

Midrange: the value halfway between the highest and lowest scores.

$\text{Midrange} = \frac{\text{highest score} + \text{lowest score}}{2}$

25

Round-off rule for measures of central tendency

Carry one more decimal place than is present in the original set of data.

26

An Example of Skewness

Dataset 1: 3, 4, 4, 5, 5, 5, 6, 6, 7. Mean = 5, Median = 5. Symmetric.
Dataset 2: 3, 4, 4, 5, 5, 5, 7, 7, 9. Mean = 5.444, Median = 5. Skewed right.
Dataset 3: 2, 3, 3, 5, 5, 5, 6, 6, 7. Mean = 4.667, Median = 5. Skewed left.
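The three datasets can be checked with Python's standard `statistics` module. The `describe` helper is a hypothetical name, and comparing the mean with the median is only a rough heuristic for the direction of skew, not a formal skewness measure.

```python
from statistics import mean, median

def describe(data):
    """Return (mean rounded to 3 places, median, rough shape label)."""
    m, md = mean(data), median(data)
    if m > md:
        shape = "skewed right"   # the long right tail pulls the mean up
    elif m < md:
        shape = "skewed left"    # the long left tail pulls the mean down
    else:
        shape = "symmetric"
    return round(m, 3), md, shape

dataset1 = [3, 4, 4, 5, 5, 5, 6, 6, 7]
dataset2 = [3, 4, 4, 5, 5, 5, 7, 7, 9]
dataset3 = [2, 3, 3, 5, 5, 5, 6, 6, 7]
for data in (dataset1, dataset2, dataset3):
    print(describe(data))
```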

27

Skewness

Figure 2-8 (a) SKEWED LEFT (negatively): Mean, Median, Mode from left to right
Figure 2-8 (b) SYMMETRIC: Mode = Mean = Median
Figure 2-8 (c) SKEWED RIGHT (positively): Mode, Median, Mean from left to right

28

Best Measure of Central Tendency

Table 2-6: Advantages and Disadvantages

29

Mean from a Frequency Table

Use the class marks of the classes for the variable x.

$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$  (Formula 2-2)

x = class mark
f = frequency
$\sum f = n$

30

Mean of this frequency table = 14.4

Quiz Scores   Frequency   Class Marks
0 – 4         2           2
5 – 9         5           7
10 – 14       8           12
15 – 19       11          17
20 – 24       7           22
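The quoted mean of 14.4 can be reproduced from the frequencies and class marks with Formula 2-2. In this sketch the class limits after 5 – 9 are inferred from the class marks 12, 17, and 22, so treat them as an assumption.

```python
# Quiz-score classes (limits after 5-9 are inferred from the class marks).
classes = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24)]
frequencies = [2, 5, 8, 11, 7]

# Class mark = midpoint of the class.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

# Formula 2-2: mean = sum(f * x) / sum(f), with x the class mark.
weighted_sum = sum(f * x for f, x in zip(frequencies, class_marks))
n = sum(frequencies)
table_mean = weighted_sum / n

print(class_marks, weighted_sum, n, round(table_mean, 1))
```

The weighted sum is 476 over 33 scores, which gives about 14.42 and rounds to the 14.4 on the slide.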

31

Measure of Variation

Range = highest score - lowest score

32

Measure of Variation

Standard Deviation: a measure of variation of the scores about the mean (average deviation from the mean).

33

Sample Standard Deviation Formula

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$  (Formula 2-4)

Calculators can calculate the sample standard deviation of data.

34

Find the standard deviation of the sample data: 2, 3, 4, 5, 5, 5.

$s^2 = 8/5 = 1.6$, $s = 1.26$.

Use the shortcut formula to find the standard deviations of the above data, and of the waiting times at the two banks.
1) The above data: $\sum x^2 = 104$.
2) Jefferson Valley Bank: $\sum x^2 = 513.27$, $\sum x = 71.5$, s = 0.48.
3) Bank of Providence: $\sum x^2 = 541.09$, $\sum x = 71.5$, s = 1.82.
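The first result can be verified directly from the definition of the sample standard deviation. This is a sketch: `sample_std` is a hypothetical helper name, and the standard library's `statistics.stdev` is used only as a cross-check.

```python
import math
from statistics import stdev

def sample_std(data):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(data)
    x_bar = sum(data) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))

data = [2, 3, 4, 5, 5, 5]
s = sample_std(data)
print(round(s ** 2, 2), round(s, 2))
```

The mean is 4, the squared deviations sum to 8, and dividing by n - 1 = 5 gives the variance 1.6.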

35

Population Standard Deviation

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

Calculators can calculate the population standard deviation of data.

36

Symbols for Standard Deviation

Source                         Sample    Population
Textbook                       s         σ
Some graphics calculators      Sx        σx
Some nongraphics calculators   xσn–1     xσn

37

Measure of Variation

Variance: the standard deviation squared (use the square key on a calculator).

Notation: s² for the sample variance, σ² for the population variance.

38

Variance

Sample Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Population Variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

39

Round-off Rule for measures of variation

Carry one more decimal place than was present in the original data

40

Standard Deviation Shortcut Formula

$s = \sqrt{\frac{n(\sum x^2) - (\sum x)^2}{n(n - 1)}}$  (Formula 2-6)
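The shortcut formula needs only n, Σx, and Σx², so it can be applied to the sums quoted in the earlier standard deviation exercise. A sketch under one loud assumption: the slides do not state the sample size for the two banks, and n = 10 is assumed here because it reproduces the quoted values of s. The function name `shortcut_std` is hypothetical.

```python
import math

def shortcut_std(n, sum_x, sum_x2):
    """Formula 2-6: s = sqrt((n * sum(x^2) - (sum x)^2) / (n * (n - 1)))."""
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

# Sample 2, 3, 4, 5, 5, 5: n = 6, sum x = 24, sum x^2 = 104.
s_sample = shortcut_std(6, 24, 104)

# Bank waiting times; n = 10 is an ASSUMPTION, not given on the slides.
s_jefferson = shortcut_std(10, 71.5, 513.27)
s_providence = shortcut_std(10, 71.5, 541.09)

print(round(s_sample, 2), round(s_jefferson, 2), round(s_providence, 2))
```

Both banks have the same Σx, hence the same mean, yet the shortcut formula shows very different spreads.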

41

FIGURE 2-10 Same Means (x̄ = 4), Different Standard Deviations (s = 0, s = 0.8, s = 1.0, s = 3.0). Horizontal axis: scores from 1 to 7; vertical axis: Frequency.

Standard deviation gets larger as spread of data increases.

42

FIGURE 2-10 The Empirical Rule (applies to bell-shaped distributions)

68% of data are within 1 standard deviation of the mean (x̄ - s to x̄ + s)
95% of data are within 2 standard deviations of the mean (x̄ - 2s to x̄ + 2s)
99.7% of data are within 3 standard deviations of the mean (x̄ - 3s to x̄ + 3s)

Proportions marked on the figure, on each side of the mean:
0.340 between x̄ and 1 standard deviation away
0.135 between 1 and 2 standard deviations away
0.024 between 2 and 3 standard deviations away
0.001 beyond 3 standard deviations
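The empirical rule turns a mean and a standard deviation into three intervals. The sketch below uses hypothetical values (mean 100, s = 15) purely for illustration, and the resulting percentages are meaningful only if the data are roughly bell-shaped.

```python
def empirical_rule_intervals(mean, s):
    """Intervals expected to hold about 68%, 95%, and 99.7% of bell-shaped data."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {label: (mean - k * s, mean + k * s) for k, label in coverage.items()}

# Hypothetical example values, not taken from the slides.
intervals = empirical_rule_intervals(100, 15)
for label, (low, high) in intervals.items():
    print(f"about {label} of scores between {low} and {high}")
```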

45

Range Rule of Thumb

Range ≈ 4s, or s ≈ Range / 4

minimum ≈ x̄ - 2s, maximum ≈ x̄ + 2s

46

Chebyshev’s Theorem

Applies to distributions of any shape. The proportion (or fraction) of any set of data lying within k standard deviations of the mean is always at least $1 - 1/k^2$, where k is any positive number greater than 1.
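A short sketch of the bound for a few values of k. The function name `chebyshev_lower_bound` is hypothetical; the arithmetic is just 1 - 1/k².

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 3))
```

For k = 2 the bound is 0.75 and for k = 3 it is about 0.889. These guarantees hold for any distribution, which is why they are weaker than the 95% and 99.7% of the empirical rule for bell-shaped data.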

47

Measures of Variation Summary

For typical data sets, it is unusual for a score to differ from the mean by more than 2 or 3 standard deviations.

48

An application of measure of variation

There are two brands of car tires, A and B. Both have a mean lifetime of 60,000 miles, but Brand A has a standard deviation of lifetime of 1000 miles and Brand B has a standard deviation of lifetime of 3000 miles. Which brand would you prefer?

49

Quartiles

Q1, Q2, Q3 divide ranked scores into four equal parts (25% of the scores in each part).

50

Percentiles

99 percentiles divide ranked scores into 100 equal parts.

51

Finding the Percentile of a Given Score

Percentile of score x = (number of scores less than x / total number of scores) · 100

(Table: Sorted Axial Loads of 175 Aluminum Cans)
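The percentile formula is a count followed by a division. The sketch below uses a short hypothetical list rather than the 175 axial loads, which are not reproduced in full here; `percentile_of_score` is likewise a hypothetical name.

```python
def percentile_of_score(scores, x):
    """Percentile of x = (number of scores less than x / total scores) * 100."""
    below = sum(1 for score in scores if score < x)
    return below / len(scores) * 100

# Hypothetical scores for illustration only.
scores = [200, 204, 206, 208, 209, 215, 217, 218, 220, 223]
print(percentile_of_score(scores, 215))
```

Five of the ten scores are less than 215, so 215 sits at the 50th percentile of this list.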

52

Finding the Value of the kth Percentile

1. Rank the data. (Arrange the data in order of lowest to highest.)
2. Compute L = (k/100) · n, where n = number of scores and k = percentile in question.
3. Is L a whole number?
   Yes: The value of the kth percentile is midway between the Lth score and the next higher score in the sorted set of data. Find Pk by adding the Lth score and the next higher score and dividing the total by 2.
   No: Change L by rounding it up to the next larger whole number. The value of Pk is the Lth score, counting from the lowest.

53

Sorted Axial Loads of 175 Aluminum Cans

The 10th percentile: L = 175 · 10/100 = 17.5, rounded up to 18. So the 10th percentile is the 18th score in the sorted data, i.e., 230.
The 25th percentile: L = 175 · 25/100 = 43.75, rounded up to 44. So the 25th percentile is the 44th score in the sorted data, i.e., 262.
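The locator procedure can be sketched in Python: compute L = (k/100) · n; if L is whole, average the Lth score and the next higher one; otherwise round L up and take that score. The function names and the short `scores` list are hypothetical. The sketch assumes k is a whole number from 1 to 99 and uses integer arithmetic to avoid floating-point surprises when testing whether L is whole.

```python
def locator(n, k):
    """Return (position, is_whole) for L = k/100 * n, rounding L up if needed."""
    whole_part, remainder = divmod(k * n, 100)
    if remainder == 0:
        return whole_part, True      # L is a whole number
    return whole_part + 1, False     # round L up to the next whole number

def percentile_value(scores, k):
    """Value of the kth percentile under the locator procedure."""
    ordered = sorted(scores)
    position, is_whole = locator(len(ordered), k)
    if is_whole:
        # Midway between the Lth score and the next higher score.
        return (ordered[position - 1] + ordered[position]) / 2
    # The Lth score, counting from the lowest (1-based position).
    return ordered[position - 1]

# Positions for the 175 axial loads used in the worked example.
print(locator(175, 10), locator(175, 25))

scores = [2, 4, 5, 7, 8, 10, 12, 15]   # hypothetical data
print(percentile_value(scores, 25), percentile_value(scores, 40))
```

For n = 175 this gives positions 18 and 44 for the 10th and 25th percentiles, in line with L = 17.5 and L = 43.75 rounded up.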

54

Interquartile Range: Q3 – Q1
Semi-interquartile Range: (Q3 – Q1)/2
Midquartile: (Q1 + Q3)/2

55

Exploratory Data Analysis

Used to explore data at a preliminary level
Few or no assumptions are made about the data
Tends to involve relatively simple calculations and graphs

Traditional Statistics

Used to confirm final conclusions about data
Typically requires some very important assumptions about the data
Calculations are often complex, and graphs are often unnecessary

57

Boxplots (Box-and-Whisker Diagram)

5-number summary:
Minimum
First quartile Q1
Median
Third quartile Q3
Maximum

58

Boxplots Box-and-Whisker Diagram

Figure: Boxplot of Pulse Rates (Beats per minute) of Smokers. Values marked on the plot: minimum 52, Q1 60, median 68.5, Q3 78, maximum 90.

59

Figure: Boxplots for Normal, Uniform, and Skewed distributions

60

Outliers

Values that are very far away from most of the data

61

Class Survey Data Boxplots for the heights of those who never broke a bone and those who did

62

When comparing two or more boxplots, it is necessary to use the same scale.

Figure: Boxplots of PULSE (common scale from 40 to 100) by SMOKE: 1 (yes), 2 (No)
