Chapter 7 Sampling Methods
and the Central Limit Theorem
Homework #7 (Hilary Term Week 2): Chapter 7, Exercises 2, 10, 12 & 14.
Chapter 1: What is Statistics?
Chapter 2: Describing Data: Frequency Distributions and Graphic Presentation
Chapter 3: Describing Data: Numerical Measures
Summary: Chapters 1 to 3 emphasise techniques that allow us to describe data from a population or sample. That is, we describe something that has already happened.
Chapter 4: A Survey of Probability Concepts
Chapter 5: Discrete Probability Distributions
Chapter 6: The Normal Probability Distribution
Summary: Probability distributions encompass all possible outcomes of an experiment and the probability associated with each outcome. We use probability distributions to describe (e.g. calculate the probability of) something that might occur in the future.
Chapter 7: Sampling Methods and the Central Limit Theorem
Summary: We will construct a distribution of the sample mean in order to understand how, and to what extent, the sample means tend to cluster around the population mean.
Chapter 8: Estimation and Confidence Intervals
Summary: We will use the sample statistic to infer information about the population parameter. For example, the sample mean (or sample proportion) can be used to estimate the population mean (or population proportion).
$\bar{X} = (X_1 + X_2 + \dots + X_n)/n$
"One of the most important
concepts in statistical inference is the probability distribution of the mean of
a random sample, since we often use the sample mean to tell us something about
an associated population."
(Aside: Unfortunately, n is
sometimes used to denote both the sample size and the population size. More
precisely, N represents the population size while n represents the sample size.
In general, it is assumed that n << N.)
Theorem: Any linear combination of independent, Normally distributed random variables is itself Normally distributed.
i.e. $X_i \sim N(\mu, \sigma^2) \Rightarrow \bar{X} \sim N(?, ?)$
$E(\bar{X})$?
$\mathrm{Var}(\bar{X})$?
$E(\bar{X})$?
$E(\bar{X}) = E[(X_1 + X_2 + \dots + X_n)/n]$
$= (1/n)\,E(X_1 + X_2 + \dots + X_n)$
$= (1/n)\,[E(X_1) + E(X_2) + \dots + E(X_n)]$
$= (1/n)\,[\mu + \mu + \dots + \mu] = (1/n)\,[n\mu]$
$= \mu$ (as should be expected)
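$\mathrm{Var}(\bar{X})$?
$\mathrm{Var}(\bar{X}) = \mathrm{Var}[(X_1 + X_2 + \dots + X_n)/n]$
$= (1/n^2)\,\mathrm{Var}(X_1 + X_2 + \dots + X_n)$
$= (1/n^2)\,[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)]$ (since the $X_i$ are independent)
$= (1/n^2)\,[\sigma^2 + \sigma^2 + \dots + \sigma^2]$
$= (1/n^2)\,[n\sigma^2]$
$= \sigma^2/n$ (a little, but not too, surprising)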
i.e. $X_i \sim N(\mu, \sigma^2) \Rightarrow \bar{X} \sim N(\mu, \sigma^2/n)$
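The result can be checked numerically. The following is a minimal simulation sketch, assuming Python 3.8+ and only the standard library; the function name, seed, number of trials and the values of mu, sigma and n are illustrative choices, not part of the notes.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` samples of size n from N(mu, sigma^2); return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]

# Illustrative values (assumed for this sketch).
mu, sigma, n = 100.0, 16.0, 10
means = simulate_sample_means(mu, sigma, n, trials=20_000)

print("mean of sample means:    ", statistics.fmean(means))
print("variance of sample means:", statistics.pvariance(means))
print("theoretical sigma^2 / n: ", sigma**2 / n)
```

The simulated mean should sit close to mu and the simulated variance close to sigma^2/n, up to sampling noise.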
Note: The standard deviation of the sample mean, $\sigma/\sqrt{n}$, is often referred to as the "standard error" to distinguish it from $\sigma$, the standard deviation of the population.
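To see how quickly the standard error shrinks as the sample grows, here is a small sketch. The helper name standard_error and the sample sizes are assumptions made for illustration; sigma = 16 is borrowed from the IQ example in these notes.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the mean of n independent draws with sd sigma."""
    return sigma / math.sqrt(n)

# Quadrupling n halves the standard error, since it scales as 1/sqrt(n).
for n in (1, 4, 16, 64):
    print(n, standard_error(16.0, n))
```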
Central Limit Theorem: The mean $\bar{X}$ of a random sample drawn from a population with mean $\mu$ and variance $\sigma^2$ has a sampling distribution which approaches a Normal distribution with mean $\mu$ and variance $\sigma^2/n$ as the sample size $n$ approaches infinity (in practice, $n > 30$ is often taken to be large enough).
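i.e. $X_i \sim\ ?(\mu, \sigma^2) \Rightarrow \bar{X} \sim N(\mu, \sigma^2/n)$ (approximately, for large $n$)
$\Rightarrow Z = (\bar{X} - \mu)/(\sigma/\sqrt{n}) \sim N(0, 1)$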
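The theorem can be illustrated with a population that is clearly not Normal. The sketch below assumes an Exponential population with rate 1 (mean 1, standard deviation 1), a sample size of 30 and 20,000 repetitions; all of these, and the function name, are illustrative assumptions rather than anything specified in the notes.

```python
import math
import random
import statistics

def standardised_means(n, trials, seed=1):
    """Return Z = (xbar - mu) / (sigma / sqrt(n)) for repeated exponential samples."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and sd of an Exponential(rate=1) population
    se = sigma / math.sqrt(n)
    zs = []
    for _ in range(trials):
        xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        zs.append((xbar - mu) / se)
    return zs

z = standardised_means(n=30, trials=20_000)

# If Z is roughly N(0, 1), about 95% of values fall within +/- 1.96.
inside = sum(abs(v) < 1.96 for v in z) / len(z)
print("mean of Z:", statistics.fmean(z))
print("sd of Z:  ", statistics.pstdev(z))
print("fraction with |Z| < 1.96:", inside)
```

Even though individual exponential draws are strongly skewed, the standardised sample means should have mean near 0, standard deviation near 1, and roughly 95% of values inside ±1.96.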
IQ (the intelligence
quotient) is Normally distributed with (population) mean 100 and (population)
standard deviation 16.
(b) What proportion of the
population has IQ between 90 and 110?
$P(90 < IQ < 110) = P([90 - 100]/16 < Z < [110 - 100]/16) = P(-0.625 < Z < 0.625) = 1 - 2\,P(Z > 0.625) = 1 - 2(0.266) \approx 0.47$
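The table lookup in part (b) can be cross-checked in code. This is a minimal sketch assuming Python 3.8+, whose standard library provides statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

# Population model from the example: IQ ~ N(100, 16^2).
iq = NormalDist(mu=100, sigma=16)

# P(90 < IQ < 110) as a difference of cumulative probabilities.
p_between = iq.cdf(110) - iq.cdf(90)
print(p_between)
```

The value should agree with the hand calculation (about 0.47) up to table rounding.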
IQ Example
(Continued)
Ten adults are selected at
random from the population and their IQs measured.
Standard error of the sample mean: $\sigma/\sqrt{n} = 16/\sqrt{10} \approx 5.06$
(d) What is the probability
that the (sample) mean IQ lies within the range 90 to 110? How does this answer
compare to the answer to part (b)?
$P(90 < \overline{IQ} < 110) = P\{[90 - 100]/(16/\sqrt{10}) < Z < [110 - 100]/(16/\sqrt{10})\} = P(-1.98 < Z < 1.98) = 1 - 2\,P(Z > 1.98) = 1 - 2(0.0239) = 0.9522 \gg 0.47$
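The comparison between a single IQ and the mean of ten IQs can be reproduced in a few lines. This is a sketch assuming Python 3.8+ and statistics.NormalDist from the standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 16, 10

# A single IQ has sd sigma; the mean of n IQs has sd sigma / sqrt(n).
single = NormalDist(mu, sigma)
sample_mean = NormalDist(mu, sigma / math.sqrt(n))

p_single = single.cdf(110) - single.cdf(90)
p_mean = sample_mean.cdf(110) - sample_mean.cdf(90)

print("P(90 < IQ < 110):          ", p_single)
print("P(90 < mean of 10 < 110):  ", p_mean)
```

The second probability is much larger because the standard error 16/√10 is much smaller than the population standard deviation 16, so sample means cluster more tightly around 100 than individual IQs do.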