Chapter 7 Sampling Methods and the Central Limit Theorem

Homework #7 (Hilary Term Week 2): Chapter 7, Exercises 2, 10, 12 & 14.

Brief Review

Chapter 1: What is Statistics?

Chapter 2: Describing Data: Frequency Distributions and Graphic Presentation

Chapter 3: Describing Data: Numerical Measures

Summary: Chapters 1 to 3 emphasise techniques that allow us to describe data from a population or sample. That is, we describe something that has already happened.

Chapter 4: A Survey of Probability Concepts

Chapter 5: Discrete Probability Distributions

Chapter 6: The Normal Probability Distribution

Summary: Probability distributions encompass all possible outcomes of an experiment and the probability associated with each outcome. We use probability distributions to describe (e.g. calculate the probability of) something that might occur in the future.

Chapter 7: Sampling Methods and the Central Limit Theorem

Summary: We will construct a distribution of the sample mean in order to understand how, and to what extent, the sample means tend to cluster around the population mean.

Chapter 8: Estimation and Confidence Intervals

Summary: We will use the sample statistic to infer information about the population parameter. For example, the sample mean (or sample proportion) can be used to estimate the population mean (or population proportion).

Sample Mean As A Random Variable

X̄ = (X₁ + X₂ + … + X_n)/n

"One of the most important concepts in statistical inference is the probability distribution of the mean of a random sample, since we often use the sample mean to tell us something about an associated population."

(Aside: Unfortunately, n is sometimes used to denote both the sample size and the population size. More precisely, N represents the population size while n represents the sample size. In general, it is assumed that n << N.)

Theorem: Any linear combination of independent, Normally distributed random variables is itself Normally distributed.

i.e. if X_i ~ N(μ, σ²) ⇒ X̄ ~ N(?, ?)

E(X̄)?

Var(X̄)?

E(X̄)?

E(X̄) = E[(X₁ + X₂ + … + X_n)/n]

= (1/n).E(X₁ + X₂ + … + X_n)

= (1/n).[E(X₁) + E(X₂) + … + E(X_n)]

= (1/n).[μ + μ + … + μ] = (1/n).[n.μ]

= μ (as should be expected)

Var(X̄)?

Var(X̄) = Var[(X₁ + X₂ + … + X_n)/n]

= (1/n²).Var(X₁ + X₂ + … + X_n)

= (1/n²).[Var(X₁) + Var(X₂) + … + Var(X_n)] (since the X_i are independent)

= (1/n²).[σ² + σ² + … + σ²]

= (1/n²).[n.σ²]

= σ²/n (a little, but not too, surprising)

i.e. if X_i ~ N(μ, σ²) ⇒ X̄ ~ N(μ, σ²/n)
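The result can be checked numerically. The sketch below is an illustration only: the population mean (50), standard deviation (10), sample size (25) and number of trials are assumed values, not taken from these notes, and the helper name simulate_sample_means is hypothetical. It draws many samples from a Normal population and compares the mean and variance of the resulting sample means with the population mean and with the population variance divided by n.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` samples of size n from N(mu, sigma^2); return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]

# Assumed, illustrative values (not from the notes).
mu, sigma, n = 50.0, 10.0, 25
means = simulate_sample_means(mu, sigma, n, trials=20_000)

# Theory: the sample means have mean mu = 50 and variance sigma^2 / n = 4.
print("mean of sample means:    ", round(statistics.fmean(means), 3))
print("variance of sample means:", round(statistics.pvariance(means), 3))
```

The two printed figures should land close to 50 and 4 respectively, with small differences due to simulation noise.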

Note: The standard deviation of the sample mean, σ/√n, is often referred to as the "standard error" to distinguish it from σ, the standard deviation of the population.
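To see how the standard error shrinks with sample size, the following sketch tabulates it for a few sample sizes. The function name standard_error is hypothetical, and the population standard deviation of 16 is simply an assumed value for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)

sigma = 16.0  # assumed population standard deviation
for n in (1, 4, 16, 64):
    # Quadrupling the sample size halves the standard error.
    print(n, standard_error(sigma, n))
```

Because the sample size enters through its square root, halving the standard error requires four times as many observations.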

Central Limit Theorem

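The sample mean X̄ of a random sample of size n, drawn from a population with mean μ and variance σ², has a sampling distribution which approaches a Normal distribution with mean μ and variance σ²/n as the sample size increases, whatever the shape of the population distribution (n > 30 is often taken as large enough in practice).

i.e. X_i ~ ?(μ, σ²) ⇒ X̄ ~ N(μ, σ²/n) (approximately, for large n)

⇒ Z = (X̄ – μ)/(σ/√n) ~ N(0, 1)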
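A small simulation can make the theorem concrete. The sketch below is a rough illustration under assumed settings: the population is taken to be Exponential with rate 1 (mean 1, standard deviation 1), which is strongly skewed and not Normal; the sample size of 40 and the number of trials are also assumptions, and the function name standardised_means is hypothetical. It standardises each sample mean and checks what share falls within ±1.96, which is about 0.95 for a standard Normal variable.

```python
import math
import random

def standardised_means(n, trials, seed=1):
    """Standardise the means of `trials` samples of size n from Exponential(1)."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0          # mean and sd of the Exponential(rate=1) population
    se = sigma / math.sqrt(n)     # standard error of the sample mean
    zs = []
    for _ in range(trials):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        zs.append((xbar - mu) / se)
    return zs

zs = standardised_means(n=40, trials=20_000)
within = sum(abs(z) < 1.96 for z in zs) / len(zs)
print("share of |Z| < 1.96:", round(within, 3))  # N(0,1) would give about 0.95
```

Rerunning with a much smaller n (say 3) should show a poorer match, since the skewness of the population has not yet averaged out.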

IQ Example

IQ (the intelligence quotient) is Normally distributed with (population) mean 100 and (population) standard deviation 16.

IQ ~ N(100, 16²) ⇒ (IQ – 100)/16 ~ N(0, 1)

(b) What proportion of the population has IQ between 90 and 110?

P(90 < IQ < 110) = P([90–100]/16 < Z < [110–100]/16) = P(-.625 < Z < .625) = 1 – {2 x P(Z > .625)} = 1 – {2 x .266} = .47
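The table lookup in part (b) can be cross-checked in software. This is a minimal sketch using NormalDist from Python's standard statistics module (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

# Population of individual IQ scores: mean 100, standard deviation 16.
iq = NormalDist(mu=100, sigma=16)

# P(90 < IQ < 110) as a difference of cumulative probabilities.
p_b = iq.cdf(110) - iq.cdf(90)
print(round(p_b, 2))
```

The result should agree with the hand calculation to two decimal places.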

IQ Example (Continued)

Ten adults are selected at random from the population and their IQs measured.

I̅Q̅ (the sample mean IQ) ~ N(100, 16²/10) ⇒ (I̅Q̅ – 100)/(16/√10) ~ N(0, 1)

(d) What is the probability that the (sample) mean IQ lies within the range 90 to 110? How does this answer compare to the answer to part (b)?

P(90 < I̅Q̅ < 110) =

P{[90–100]/(16/√10) < Z < [110–100]/(16/√10)} =

P(–1.98 < Z < 1.98) = 1 – {2 x P(Z > 1.98)} = 1 – {2 x .0239} = .9522 >> .47
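The comparison between a single IQ score and the mean of ten scores can also be sketched in code. The helper name prob_mean_between is hypothetical; the calculation assumes, as in the example, a Normal population with mean 100 and standard deviation 16, and uses NormalDist from Python's standard statistics module.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(lo, hi, mu, sigma, n):
    """P(lo < sample mean < hi) for samples of size n from N(mu, sigma^2)."""
    sampling_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))  # sd is the standard error
    return sampling_dist.cdf(hi) - sampling_dist.cdf(lo)

p_single = prob_mean_between(90, 110, mu=100, sigma=16, n=1)   # one person
p_ten = prob_mean_between(90, 110, mu=100, sigma=16, n=10)     # mean of ten people

print("n = 1: ", round(p_single, 2))
print("n = 10:", round(p_ten, 2))
```

The two figures should come out at roughly .47 and .95. Any small difference from .9522 in the last decimal places arises because the hand calculation rounds z to 1.98 before using the table.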