



STAT 11

April 17, 2008



A few sample problems…
1. Confidence interval for a mean.
Mars Rover Two measures 64 daily high temperatures, and finds that the average of the 64 measurements is –42.5 degrees C. Assume that the standard deviation of daily temperatures is exactly 16 degrees C, based on data from Mars Rover One.
a. In this problem, what are n, μ, x̄, σ, and s?
b. What is the standard error of the mean temperature?
c. Give a 95% confidence interval for μ.
d. Give an 82% confidence interval for μ.

2. Confidence interval for a proportion.
You randomly selected 100 of your latest homemade stink bombs, and very carefully tested them. Unfortunately(?) 80 of them failed. Let p be the “true” failure rate (for all stink bombs, not just the ones in the sample).


a. What is p̂?
b. What is the SE of p̂?
c. What is a 95% CI for the true failure rate?
d. If you felt like using Wilson’s method for this problem, what would change?

3. Confidence interval for a difference of means.
You found that 30 randomly selected Sunoco stations had an average gas price of $ 3.10, with s = $ 0.05.
Also, 40 randomly selected Lukoil stations had an average price of $ 3.12, with s = $ 0.08.
Let D = (average of all Lukoil prices) minus (average of all Sunoco prices).
What is a 95% confidence interval for D? What can you conclude about the relative prices at Lukoil and Sunoco stations generally?

4. Interpreting a scatterplot.
a. Look at this scatterplot, and estimate…
The mean of the “x” variable;

The standard deviation of the “x” variable;

The mean of the “y” variable;

The standard deviation of the “y” variable;


The correlation coefficient (r) for the two variables.
b. Describe the relationship in words.



5. More on standard errors.
Science News reported that the average length of a junk-DNA sequence in Speciesus Inventedus is 128 bases, based on a sample of 100 measurements.
a. What is the SE of the sample mean as an estimate of μ, the true average length?
b. Actually, Science News gave the SE: They said it’s 8. What is the standard deviation of the original sample?

6. One-way chi-square problem.
A poker-dealing machine is supposed to deal cards at random, as if from an infinite deck.
In a test, you counted 1600 cards, and observed the following:
Spades 404

Hearts 420

Diamonds 400

Clubs 376


Could it be that the suits are equally likely? Or are these discrepancies too much to be random?

7. Another one-way chi-square problem.
Same as before, but this time jokers are included, and you counted 1662 cards, with these results:
Spades 404

Hearts 420

Diamonds 400

Clubs 356

Jokers 82


a. If a deck contains 54 cards and two of them are jokers, what is the probability that any particular randomly-chosen card would be a joker?
b. How many jokers would you expect out of 1662 random cards? How many of each suit?
c. Is it possible that the cards are really random? Or are the discrepancies too large?
-----------------------------------------------------
SOLUTIONS

1a. n = 64

μ = the true (long-run) average daily temperature at the Mars Rover Two site (we don’t know the value)

x̄ = –42.5 degrees C

σ = 16 degrees C (by assumption)

s = the sample standard deviation; this could be computed from the sample, but it isn’t given in the problem

1b. SE = σ/√n = 16/√64 = 16/8 = 2.00 exactly
1c. Use z* = 1.96, since you are NOT using s as a substitute for σ. MOE = 1.96 × 2.00 = 3.92, call it 3.9 degrees C, so the CI is

[ -42.5 – 3.9, -42.5 + 3.9 ]

or

[ -46.4, -38.6 ]
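A minimal Python sketch of the arithmetic in 1b and 1c, assuming σ is known exactly so the z procedure applies. The function name `z_interval_known_sigma` is only illustrative:

```python
import math

def z_interval_known_sigma(xbar, sigma, n, z_star=1.96):
    """z confidence interval for a mean when sigma is known."""
    se = sigma / math.sqrt(n)   # standard error of the sample mean
    moe = z_star * se           # margin of error
    return se, moe, (xbar - moe, xbar + moe)

se, moe, ci = z_interval_known_sigma(xbar=-42.5, sigma=16, n=64)
print(se, round(moe, 2), tuple(round(v, 2) for v in ci))  # 2.0 3.92 (-46.42, -38.58)
```

Rounding the MOE to 3.9 before adding and subtracting gives the [ -46.4, -38.6 ] reported above.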


1d. The problem here is to compute z* when C = 0.82. With α = 1 – C = 0.18, you want to leave a probability of α/2 = 0.09 in each tail, so you could either…

Find 0.09 under PHI(z) in the z table, and see that it corresponds to z = –1.34, which is –z*, or

Find 0.91 under PHI(z) in the z table, and see that it corresponds to z = +1.34, which is +z*.


Using z* = 1.34, we get MOE = 1.34 × 2.00 = 2.68, call it 2.7 degrees, so the CI is

[ -45.2, -39.8 ] .
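Outside an exam, z* for any confidence level can be found without a table. This sketch uses the inverse normal CDF from Python's standard-library `statistics.NormalDist`; the helper name `z_star` is hypothetical:

```python
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* leaving (1 - C)/2 in each tail of N(0, 1)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

z = z_star(0.82)      # about 1.3408; the z table rounds this to 1.34
moe = z * 2.00        # SE from part 1b
ci = (-42.5 - moe, -42.5 + moe)
print(round(z, 2), round(moe, 1), tuple(round(v, 1) for v in ci))  # 1.34 2.7 (-45.2, -39.8)
```

The same helper returns about 1.96 for C = 0.95, matching part 1c.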


2a. p̂ = 0.80 ( = 80 divided by 100 )


2b. SE = √(p̂(1 – p̂)/n) = √(0.80 × 0.20 / 100) = √0.0016 = 0.04 exactly.
2c. Use z* = 1.96 (always use z* for a proportion) . The MOE is 1.96 × 0.04 = 0.08, so the CI is [ 0.72, 0.88 ].
(Be careful here, if you use percentages. It would be ok to write [ 72%, 88% ], but if you write in percentage points, then be sure to do the same thing with the MOE. In particular, don’t write (WRONG) [ 79.92, 80.08 ]. )
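The large-sample interval for a proportion from 2a to 2c, as a small Python sketch (the function name `proportion_ci` is illustrative). Everything stays on the 0-to-1 scale, which sidesteps the percentage-point mix-up warned about above:

```python
import math

def proportion_ci(successes, n, z_star=1.96):
    """Large-sample z interval for a proportion, centered at p-hat."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # SE of p-hat
    moe = z_star * se
    return p_hat, se, (p_hat - moe, p_hat + moe)

p_hat, se, (lo, hi) = proportion_ci(80, 100)
print(p_hat, round(se, 4), round(lo, 2), round(hi, 2))  # 0.8 0.04 0.72 0.88
```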
2d. The center of the CI would still be p̂ = 0.80 (exactly), but you would also compute the Wilson estimate p̃ = (80 + 2)/(100 + 4) ≈ 0.788, and you would use it to compute SE:
SE = √(p̃(1 – p̃)/(n + 4)) ≈ 0.0400.
The resulting confidence interval would be very slightly larger than the one in 2c.
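A sketch of the extra Wilson computation. It assumes that the course's "Wilson estimate" is the plus-four version, p̃ = (X + 2)/(n + 4) with SE = √(p̃(1 − p̃)/(n + 4)), as in some introductory texts; that formula is an assumption here. The sketch only compares the two standard errors:

```python
import math

def plus_four_se(successes, n):
    """Assumed Wilson/plus-four estimate: add 2 successes and 2 failures."""
    p_tilde = (successes + 2) / (n + 4)
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde, se

p_tilde, se_tilde = plus_four_se(80, 100)
se_plain = math.sqrt(0.8 * 0.2 / 100)    # the SE from part 2b
print(round(p_tilde, 4), round(se_tilde, 5), round(se_plain, 5))
```

Under this assumption p̃ ≈ 0.7885 and the SE is about 0.04005 instead of 0.04. That is a very slightly larger value, as the solution says.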

3. There’s no reason to pool the “s” values here, and that isn’t supposed to be part of the course material anyway. So compute the two SE’s separately:


Sunoco:

n = 30, x̄ = 3.10, s = 0.05, SE = s/sqrt(n) = $ 0.00913

Lukoil:

n = 40, x̄ = 3.12, s = 0.08, SE = s/sqrt(n) = $ 0.01265


Combined SE:
√(0.00913² + 0.01265²) = $ 0.01560
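MOE: Use z* = 1.96 again, so MOE = (1.96)(0.01560) = $ 0.0306, call it 3 cents. (Strictly, because s stands in for σ here, t* would be the more careful choice; with the conservative df = 29, t* = 2.045 and MOE ≈ $ 0.032, still about 3 cents, so the conclusion is the same.)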
The observed difference is
x̄(Lukoil) – x̄(Sunoco) = 0.02 (that’s 3.12 – 3.10)
so the confidence interval for the difference is
[ –$0.01, +$0.05 ].
Because this interval contains 0, you can’t tell from these reports (at least not with 95% confidence) whether Sunoco or Lukoil has generally higher prices.
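The unpooled two-sample calculation in problem 3, as a Python sketch. It uses z* = 1.96 to match the solution, and the function name `diff_means_ci` is illustrative:

```python
import math

def diff_means_ci(x1, s1, n1, x2, s2, n2, z_star=1.96):
    """CI for (mean 2) - (mean 1) using separate, unpooled standard errors."""
    se1 = s1 / math.sqrt(n1)
    se2 = s2 / math.sqrt(n2)
    se = math.sqrt(se1**2 + se2**2)   # independent SEs add in quadrature
    diff = x2 - x1
    moe = z_star * se
    return se, (diff - moe, diff + moe)

# Group 1 = Sunoco, group 2 = Lukoil, so this estimates D = Lukoil - Sunoco
se, (lo, hi) = diff_means_ci(3.10, 0.05, 30, 3.12, 0.08, 40)
print(round(se, 5), round(lo, 3), round(hi, 3))
```

The interval comes out as roughly (−0.011, 0.051) dollars. It straddles 0, and that is why no conclusion about which chain is cheaper can be drawn.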

4a. The true values are as follows:


mean of x: 2.086

sd of x: 2.933 (using n-1)

mean of y: 4.300

sd of y: 1.925


correlation: –0.832
4b. In words: The relationship is STRONG, NEGATIVE, LINEAR.

(Although really, how strong it is depends on context. You might ask, strong compared to what? )


(The same is true of linearity. You might detect a slight downward bending at the left edge of the picture; or, you might decide that the apparent bending is caused by just one point and is probably accidental. In fact, the data were generated using a linear equation with normally distributed errors.)

5a. Trick question: you can’t tell what the SE is, because you would need to know σ or s.


5b. You have SE = σ/sqrt(100) = σ/10. Since SE = 8, you have 8 = σ/10, so they must be using 80 for σ. (You still can’t tell whether they got this from the sample or estimated it in some other way.)
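Part 5b just inverts SE = σ/√n. A one-line Python sketch with an illustrative helper name:

```python
import math

def sd_from_se(se, n):
    """Invert SE = sd / sqrt(n) to recover the standard deviation that was used."""
    return se * math.sqrt(n)

print(sd_from_se(8, 100))  # 80.0
```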
6.
Suit        Observed   Expected       Expected   z
                       (proportion)   (count)
Spades        404        0.25           400       0.200
Hearts        420        0.25           400       1.000
Diamonds      400        0.25           400       0.000
Clubs         376        0.25           400      -1.200

chi-square ->      2.480
critical value ->  7.815

Compute each z from its own row as (observed-expected)/sqrt(expected). Be sure to use the counts in this formula, not the percentages. The chi-square statistic is the sum of the squares of the z-values.


The number of degrees of freedom is 3 (number of categories minus 1).
The critical value is from a table you’ll have on the exam (using α = 0.05). But you don’t need it in this case. The chi-square value is about what you would expect with 3 degrees of freedom (a chi-square statistic averages out to its degrees of freedom), and none of the z statistics are out of line (not even as large as 2 in absolute value, certainly not beyond 4). So, DO NOT REJECT the null hypothesis. There is no reason to suspect that the cards are not random.
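The table for problem 6 can be reproduced with a short Python sketch. Each z is (observed − expected)/√expected, computed on counts, and the statistic is the sum of the squared z values. The helper name `chi_square_gof` is illustrative, and the critical value is copied from a chi-square table rather than computed:

```python
import math

def chi_square_gof(observed, probs):
    """One-way chi-square: per-category z = (O - E)/sqrt(E); statistic = sum of z**2."""
    total = sum(observed)
    expected = [total * p for p in probs]            # expected COUNTS, not proportions
    zs = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]
    return expected, zs, sum(z * z for z in zs)

observed = [404, 420, 400, 376]                      # spades, hearts, diamonds, clubs
expected, zs, chi2 = chi_square_gof(observed, [0.25] * 4)
df = len(observed) - 1                               # categories minus 1
critical_05 = 7.815                                  # table value for df = 3, alpha = 0.05
print([round(z, 3) for z in zs], round(chi2, 3), chi2 > critical_05)
```

The output is z values of 0.2, 1.0, 0.0, and −1.2 and a statistic of 2.48. That statistic is below 7.815, so the null hypothesis is not rejected.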
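7a. P(joker) = 2/54 ≈ 0.0370.
7b. Expected jokers = 1662 × 2/54 ≈ 61.6; expected count for each suit = 1662 × 13/54 ≈ 400.1.
7c.
Suit        Observed   Expected       Expected   z
                       (proportion)   (count)
Spades        404        0.2407         400.1     0.194
Hearts        420        0.2407         400.1     0.994
Diamonds      400        0.2407         400.1    -0.006
Clubs         356        0.2407         400.1    -2.205
Jokers         82        0.0370          61.6     2.606
Total        1662                      1662

chi-square ->     12.680
critical value ->  9.488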

This time, the chi-square statistic (12.68) is above the α = 0.05 critical value (9.488, with 4 degrees of freedom), so you could reject the null hypothesis and declare that the cards are not random. The problem is clearly that there are too many jokers at the expense of clubs; you can see that from the z statistics.


On the other hand, the p-value is only 0.013 (you can’t compute that during an exam) so the test isn’t totally convincing. You wouldn’t want to arrest the machine designer on this evidence.
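The p-value quoted for problem 7 can be checked with a short sketch. For an even number of degrees of freedom, the chi-square upper tail has a closed form, P(X > x) = e^(−x/2) · Σ_{k=0}^{df/2 − 1} (x/2)^k / k!, so no statistics library is needed. The helper name `chi2_sf_even_df` is hypothetical:

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with EVEN df (closed form)."""
    if df % 2:
        raise ValueError("this closed form only covers even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

observed = [404, 420, 400, 356, 82]        # spades, hearts, diamonds, clubs, jokers
probs = [13 / 54] * 4 + [2 / 54]           # 54-card deck with two jokers
total = sum(observed)
expected = [total * p for p in probs]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = chi2_sf_even_df(chi2, df=len(observed) - 1)
print(round(chi2, 2), round(p_value, 3))
```

This gives a statistic of about 12.68 and a p-value of about 0.013. As a sanity check, the same function gives about 0.05 at the table's critical value of 9.488.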

(end)





