1
AP STATISTICS Summer Institute 2016 Day 2
Lance Belin JJ Pearce High School – Richardson, TX Years AP Statistics Reader/Table Leader – College Board Years Masters in Statistics – University of Texas – Dallas

2
APSI schedule: August 3, 2016
8:30 AM to 4:00 PM
Morning Break: 10:15 AM
Lunch: 11:30-12:00
Afternoon Break: 2:00 PM

4
2009 AP STATISTICS EXAM Question #6

5
2009 AP STATISTICS EXAM Question #6

8
Simulating the Sampling Distribution of p-hat
Sampling variability
Biased Sampling
Why we never “Accept” the null
Introduction to Inference
Type I and Type II Error
Modeling power

9
Introducing Inference Page 40
What proportion of beads are blue? Maybe 50%? Maybe a little less than 50%?

10
Sampling Distribution of p-hat, or building a histogram
Samples of size n = 30 and n = 100

11
Answer the questions #1-10 on pages 40-45
1. 2. 3. 4. and 8. n = 30 Sample Proportion p-hat Sampling Variability Stay the same 9. 10.

12
Answer the questions #11-16 on pages 40-45
11. 12. 13.

13
Answer the questions #11-16 on pages 40-45 14.
15. InvNorm(.05) = 0.05

14
Answer the questions #11-16 on pages 40-45
16. 17. n = 100 18.

15
Introducing Inference New Data…
Ho: Pblue = 0.50
Ha: Pblue < 0.50
For samples of size n = 30, a sample with 10 or fewer blue beads will reject the null in favor of the alternative.

16
Introducing Inference
What do we conclude?
31% of the time we rejected the null hypothesis in favor of the alternative hypothesis.
Which means 69% of the time we failed to reject the null hypothesis.
So how many of each color bead are really in the box?
There are 10,000 blue beads! But there are 15,000 red beads!
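The class result can be mimicked with a short simulation. This is only a sketch in Python (the workshop itself used physical beads and scoops): the box contents, the sample size of 30, and the "10 or fewer blue" rule come from the slides, while the number of simulated scoops and the random seed are assumptions chosen for illustration.

```python
import random

# Box contents taken from the slide: 10,000 blue and 15,000 red beads.
BLUE, RED = 10_000, 15_000
SAMPLE_SIZE = 30      # beads per scoop (from the slides)
CUTOFF = 10           # reject Ho when the scoop has 10 or fewer blue beads
NUM_SCOOPS = 5_000    # assumed number of simulated scoops (not from the slides)


def count_blue(rng):
    """Draw one scoop without replacement and count the blue beads."""
    # Beads numbered 0..BLUE-1 are treated as blue, the rest as red.
    scoop = rng.sample(range(BLUE + RED), SAMPLE_SIZE)
    return sum(1 for bead in scoop if bead < BLUE)


rng = random.Random(2016)  # fixed seed (an assumption) so the run is repeatable
rejections = sum(1 for _ in range(NUM_SCOOPS) if count_blue(rng) <= CUTOFF)
reject_rate = rejections / NUM_SCOOPS

print(f"Rejected Ho in {reject_rate:.1%} of scoops")
print(f"Failed to reject Ho in {1 - reject_rate:.1%} of scoops")
```

The simulated rejection rate should land in the neighborhood of the class result, though it will not match it exactly because each set of scoops is random.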

17
Introducing Inference
Considering the hypotheses
Ho: Pblue = 0.50
Ha: Pblue < 0.50
Ha is true, since the true proportion of blue beads is 0.40 (10,000 blue out of 25,000 beads).

18
Errors and Power of the Test
How often did we commit a Type II error? 69% of the time!
So much for “Accepting” the null! (See this year’s AP test about interpreting confidence intervals!)
The power of the test is approximately 31%.
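The error rates can also be worked out exactly instead of estimated from scoops. The sketch below assumes the blue count in a sample of 30 is well modeled by a binomial distribution (reasonable here because the box holds 25,000 beads); the helper name `binom_cdf` is made up for this example.

```python
from math import comb


def binom_cdf(c, n, p):
    """P(X <= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


n, cutoff = 30, 10  # sample size and rejection rule from the slides

type_1 = binom_cdf(cutoff, n, 0.50)  # chance of rejecting Ho when Ho is true
power = binom_cdf(cutoff, n, 0.40)   # chance of rejecting Ho when p = 0.40
type_2 = 1 - power                   # chance of failing to reject when p = 0.40

print(f"P(Type I error)  = {type_1:.3f}")
print(f"Power            = {power:.3f}")
print(f"P(Type II error) = {type_2:.3f}")
```

The exact values should be close to, but not identical to, the percentages the class got from its own samples.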

19
Sampling Variability
What happens to our samples when we increase the sample size?
This was the concept tested by question 3b from the 2014 AP Exam!
The sampling variability decreases!
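One way to put numbers on this is the standard deviation of the sample proportion, sqrt(p(1 - p)/n). The sketch below is illustrative only: it uses the true proportion 0.40 and the two sample sizes from the bead activity, and the function name `sd_p_hat` is an assumption.

```python
from math import sqrt


def sd_p_hat(p, n):
    """Standard deviation of the sample proportion for samples of size n."""
    return sqrt(p * (1 - p) / n)


p_true = 0.40  # true proportion of blue beads in the box
for n in (30, 100):
    print(f"n = {n:3d}: SD of p-hat = {sd_p_hat(p_true, n):.4f}")
```

The larger sample size gives the smaller standard deviation, which is the "sampling variability decreases" idea in numeric form.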

20
Error and Power of the Test

21
Errors and Power of the Test
How often did we commit a Type II error? Only 38% of the time now!
The power of the test has increased to approximately 62%.
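The jump in power can be checked with an exact binomial calculation. The slides give the rejection rule only for n = 30, so this sketch makes two assumptions: the test is run at roughly the 5% level, and the rejection cutoff for each sample size is the largest blue count whose probability under Ho stays at or below 5%. The helper names are hypothetical.

```python
from math import comb


def binom_cdf(c, n, p):
    """P(X <= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def rejection_cutoff(n, p0=0.50, alpha=0.05):
    """Largest count c with P(X <= c | p0) <= alpha (assumed 5% level)."""
    c = -1
    while binom_cdf(c + 1, n, p0) <= alpha:
        c += 1
    return c


for n in (30, 100):
    c = rejection_cutoff(n)
    power = binom_cdf(c, n, 0.40)  # true proportion of blue beads is 0.40
    print(f"n = {n:3d}: reject if blue <= {c}, "
          f"power = {power:.2f}, P(Type II) = {1 - power:.2f}")
```

For n = 30 this rule reproduces the "10 or fewer blue" cutoff used in class, which is some reassurance that the assumed rule is in the right spirit.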

22
What do the beads help illustrate?
How to conduct a one-proportion z-test (FRQ 2005 # 4)
The sampling distribution is less variable when the sample size increases (FRQ 2014 # 3 part b, FRQ 2015 # 6 part d, FRQ 2007 # 3 part a)
An understanding that the power of the test increases when the sample size increases (FRQ 2009 Form B # 4 part b)
What a Type II error is (FRQ 2012 # 5 part a, FRQ 2009 # 5 part c, FRQ 2013 Secure Exam # 6b)
Convenience samples may not be representative of the population (FRQ 2013 # 2a, FRQ 2013 Secure Exam # 5c)
We will never “accept” the null hypothesis (FRQ 2014 # 1c, FRQ 2012 # 5b)

23
Where can you get the scoops?

24
Where can you get the scoops?
Go to

25
2013 Free Response # 2
An administrator at a large university wants to conduct a survey to estimate the proportion of students who are satisfied with the appearance of the university buildings and grounds. The administrator is considering three methods of obtaining a sample of 500 students from the 70,000 students at the university.
(a) Because of financial constraints, the first method the administrator is considering consists of taking a convenience sample to keep the expenses low. A very large number of students will attend the first football game of the season, and the first 500 students who enter the football stadium could be used as a sample. Why might such a sampling method be biased in producing an estimate of the proportion of students who are satisfied with the appearance of the buildings and grounds?

26
2013 Free Response # 2
Essentially correct (E) if the response correctly includes the following three components:
1. Provides a reasonable explanation for why the sample might not be representative of the population;
2. Mentions a link between the nonrepresentative nature of the convenience sample and the variable of interest (opinion about appearance of university buildings and grounds);
3. Indicates a plausible direction for the bias of the estimator.
Solution Part (a): The first 500 students who enter the football stadium were not likely to be representative of the population of all students at the university. In other words, these 500 students were likely to differ systematically from the population with regard to many variables. For example, these 500 students might have more school pride than the population of students as a whole, which might be related to their opinions about the appearance of university buildings and grounds. Perhaps their school pride is related to having more positive opinions about the appearance of university buildings and grounds, in which case the sample proportion of students who were satisfied would be biased toward overestimating the population proportion of students who were satisfied.

27
2013 Free Response # 2
(b) Because of the large number of students at the university, the second method the administrator is considering consists of using a computer with a random number generator to select a simple random sample of 500 students from a list of 70,000 student names. Describe how to implement such a method.
Essentially correct (E) if the response correctly includes the following three components:
1. Assigns numbers to the student names;
2. Uses a computer random number generator to randomly generate 500 distinct/unique numbers between 1 and 70,000;
3. Selects students whose names correspond to the 500 random numbers for the sample.
Solution Part (b): Obtain a list of all 70,000 students at the university. Assign an identification number from 1 to 70,000 to each student. Then use a computer to generate 500 random integers between 1 and 70,000 without replacement. The students whose ID numbers correspond to those numbers were then selected for the sample.
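The three steps in the solution can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the scoring guidelines: the ID numbers stand in for the list of student names, and the seed is an assumption added so the example is repeatable.

```python
import random

POPULATION_SIZE = 70_000  # students at the university
SAMPLE_SIZE = 500         # students to select

# Step 1: assign each student an ID number from 1 to 70,000.
student_ids = range(1, POPULATION_SIZE + 1)

# Step 2: generate 500 distinct random IDs (sampling without replacement).
rng = random.Random(2013)  # fixed seed is an assumption, for repeatability
sample_ids = sorted(rng.sample(student_ids, SAMPLE_SIZE))

# Step 3: the students with these IDs make up the sample.
print(sample_ids[:10])
```

`random.sample` never returns the same ID twice, which matches the "distinct/unique numbers" component of the rubric.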

28
2013 Free Response # 2
(c) Because stratification can often provide a more precise estimate than a simple random sample, the third method the administrator is considering consists of selecting a stratified random sample of 500 students. The university has two campuses with male and female students at each campus. Under what circumstance(s) would stratification by campus provide a more precise estimate of the proportion of students who are satisfied with the appearance of the university buildings and grounds than stratification by gender?
Essentially correct (E) if the response correctly notes that the circumstance described requires more variability in opinions about appearance of university buildings and grounds between the two campuses than between the two genders.
Solution Part (c): Stratifying by campus would be more advantageous than stratifying by gender provided that opinions about appearance of university buildings and grounds between the two campuses differ more than the opinions about appearance of university buildings and grounds between the two genders.

29
4.1 Random Sampling – Show Me the Money! Page 12
Name your favorite movie of all time.
On average, how much money do you think a movie grosses (earns) in theaters?
Take a guess at the title of the top-grossing movie of 2015.
What do you think was the maximum amount grossed by a movie in 2015?
#1-9 Non-random selection
#10-14 SRS

31
Random Rectangles – Page 56

33
The Guessing Game (Page 14)
At right 13 famous people are listed. Your job is to estimate the ages of these famous people. Write your estimates in the LAST COLUMN provided. After everyone has estimated, I will give you the actual ages. Fill them in the table. On the axes below, construct a scatterplot using the above data. Put the Actual age on the x-axis and the Estimated age on the y-axis.

34
The Guessing Game (Page 14)
At right 13 famous people are listed. Your job is to estimate the ages of these famous people. Write your estimates in the LAST COLUMN provided. After everyone has estimated, I will give you the actual ages. Fill them in the table.
On the axes below, construct a scatterplot using the above data. Put the Actual age on the x-axis and the Estimated age on the y-axis. And describe your graph.
47 22 70 86 41 30 34 43 57 52 52 34 46
FORM/PATTERN – DIRECTION – STRENGTH – CONTEXT

35
2016 AP STATISTICS EXAM Question #6

36
2016 AP STATISTICS EXAM Question #6

37
Student Response 6A1

38
Student Response 6A2

39
Student Response 6A3

40
What would you look like if your name was Tai Shan?
Parents’ names:
Mother: Mei Xiang
Father: Tian Tian

41
Background: born on July 9, 2005
On August 2, learned gender: male – 1.82 lbs
Named on day 100 – Tai Shan means “peaceful mountain”
Dec he moved to China
August 8 at 2.6 pounds

42
Just over 1 month old

43
October 12 – 96 days old

45
So what kind of graph is appropriate?
The Data

Age in days    Weight (lbs)
62             21.25
73             22.5
84             24.7
96             25.5
105            27.1
115            28.5
125            28.6
136            31.2
158            36.6
187            37
249            44.4

Categorical or Quantitative?
How many variables?
data source:

46
Cut data just after this picture was taken.
Tai began eating bamboo rather than just nursing, so the growth rate changed.

48

49
Strength – Direction – Form – in Context
There appears to be a strong, positive, linear relationship between age and weight.

50
How to get a line of best fit
First draw an ellipse around the points. Then cut it in half.

51
How to get a line of best fit
But another student may do this. The equation (slope) may be drastically different.

52
To get the equation of the BEST fit line using a calculator.
Push STAT, then EDIT, to get to the lists.
First enter the data into List 1 and List 2 (L1 and L2).

53
If a list already has data you need to delete, use the arrow buttons to highlight the LIST NAME at the top. Then push CLEAR, ENTER.
Now enter the data.
Handy side note: 2nd – QUIT will always get you “home”.

54
Push 2nd, then y=, to get to the statplots.
Set up your graph.
Note: L1 and L2 are found above the numbers 1 and 2. Push 2nd and then the number to enter a list name.

55
Go home! Well, push 2nd – QUIT.
From here, you may push GRAPH, but you probably won’t see it.
We need a proper window. Push ZOOM 9.

56
Ta Da!

57
Now to get the equation of the linear regression line
(Or best fit line, if you want)
Push STAT, arrow over to CALC, then choose 4: LinReg(ax+b).

58
Then either push ENTER for the default lists (L1 and L2)
OR
Enter the list names separated by a comma (the comma key is above 7), or enter the list names into the appropriate spaces.
And push ENTER.

59
So what’s all this?
The equation: ŷ = .127x + 13.74
If you didn’t get r and r² and you want them, push 2nd, 0 (CATALOG), go down to DiagnosticOn, and hit ENTER twice. Then try again.
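For anyone without a calculator handy, the same least-squares line can be computed from the formulas directly. This is a minimal sketch in Python, not the calculator's own routine: the data are the 11 (age, weight) pairs from the data slide, and the variable names are made up for the example.

```python
# Tai Shan data from the slides: age in days and weight in pounds.
ages = [62, 73, 84, 96, 105, 115, 125, 136, 158, 187, 249]
weights = [21.25, 22.5, 24.7, 25.5, 27.1, 28.5, 28.6, 31.2, 36.6, 37, 44.4]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(weights) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in ages)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, weights))

slope = sxy / sxx                      # b in y-hat = a + bx
intercept = mean_y - slope * mean_x    # a in y-hat = a + bx

print(f"y-hat = {slope:.3f}x + {intercept:.2f}")
```

The slope and intercept should agree with the LinReg output on the calculator up to rounding.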

60
What does the slope mean? What would you write?
ŷ = .127x + 13.74

61
ŷ = .127x + 13.74
First, make the slope a fraction: .127/1.
For every 1-day increase in age, the weight increases by .127 pounds, on average.
y-intercept? If the baby panda were 0 days old, he would weigh about 13.74 pounds.
Well, that’s a silly extrapolation!

62
FYI: r is called the correlation coefficient and is ALWAYS between -1 and 1. The closer it is to -1 or 1, the more the points line up. So r = .989 suggests a very strong, positive, linear relationship between age and weight.
r² is called the coefficient of determination and tells us the amount of variation the two variables have in common. r² = .978 means that 97.8% of the variation in weight is related to the variation in age.
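Both numbers can be reproduced from the data with the usual formula for the correlation coefficient. The sketch below is an illustration in Python using the 11 (age, weight) pairs from the data slide; the variable names are assumptions.

```python
from math import sqrt

# Tai Shan data from the slides: age in days and weight in pounds.
ages = [62, 73, 84, 96, 105, 115, 125, 136, 158, 187, 249]
weights = [21.25, 22.5, 24.7, 25.5, 27.1, 28.5, 28.6, 31.2, 36.6, 37, 44.4]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(weights) / n

sxx = sum((x - mean_x) ** 2 for x in ages)
syy = sum((y - mean_y) ** 2 for y in weights)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, weights))

r = sxy / sqrt(sxx * syy)  # correlation coefficient
r_squared = r ** 2         # coefficient of determination

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```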

63
So why is all this a big deal?

64
Now we can use our equation to make predictions.
How much would you predict Tai Shan weighed at 348 days?
ŷ = .127x + 13.74
ŷ = .127(348) + 13.74
ŷ = 57.9 pounds
y = 54 pounds
residual = y – ŷ = 54 – 57.9 = –3.9

65
How much would you predict Tai Shan weighed at 3 years?
ŷ = .127x + 13.74
ŷ = .127(1095) + 13.74
ŷ ≈ 152.8 pounds
y ≈ 200 pounds
residual = y – ŷ ≈ 200 – 152.8 = 47.2
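The two predictions and their residuals can be checked with a couple of small functions. This sketch assumes the rounded equation from the slides (slope .127, intercept 13.74) and the actual weights quoted there (54 pounds at 348 days, roughly 200 pounds at 3 years); the function names are made up for the example.

```python
SLOPE = 0.127      # pounds per day, rounded value from the slides
INTERCEPT = 13.74  # pounds, rounded value from the slides


def predict(age_days):
    """Predicted weight in pounds from the fitted line."""
    return SLOPE * age_days + INTERCEPT


def residual(age_days, actual_weight):
    """Residual = actual - predicted."""
    return actual_weight - predict(age_days)


for age, actual in [(348, 54), (1095, 200)]:
    print(f"age {age:4d} days: predicted {predict(age):6.1f} lb, "
          f"actual {actual} lb, residual {residual(age, actual):6.1f} lb")
```

The much larger residual at 3 years reflects predicting far outside the ages used to fit the line (62 to 249 days).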

66
To determine if a linear model is really appropriate, we should check the residual plot.
Go back into your statplot: 2nd, y=, 1
To put RESID into the Ylist: 2nd, STAT, RESID
Note: RESID will only come up IF you have just previously done the linear regression.

67
Residual Plot (residual vs. age in days)
The residual plot shows no obvious pattern, so a linear model is a good choice.
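The residuals behind the plot can be listed without a calculator. This is a sketch in Python that refits the least-squares line to the 11 (age, weight) pairs from the data slide and prints one residual per point; it prints numbers rather than drawing the plot, and the variable names are assumptions.

```python
# Tai Shan data from the slides: age in days and weight in pounds.
ages = [62, 73, 84, 96, 105, 115, 125, 136, 158, 187, 249]
weights = [21.25, 22.5, 24.7, 25.5, 27.1, 28.5, 28.6, 31.2, 36.6, 37, 44.4]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(weights) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, weights))
         / sum((x - mean_x) ** 2 for x in ages))
intercept = mean_y - slope * mean_x

# Residual = actual weight - predicted weight, one per data point.
residuals = [y - (intercept + slope * x) for x, y in zip(ages, weights)]

for x, res in zip(ages, residuals):
    print(f"age {x:3d} days: residual {res:+.2f} lb")
```

Least-squares residuals always sum to zero (up to rounding), so what matters for judging the linear model is whether they show a curve or a fan shape when plotted against age.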

68
Tai Shan just celebrated his 11th birthday
at his new home in China.
