Chapter 8 Random Sampling

Learning Objectives

Explain what is meant by a sample, a population and statistical inference.
Define a random sample from a distribution of a random variable.
Explain what is meant by a statistic and its sampling distribution.
Determine the mean and variance of a sample mean and the mean of a sample variance in terms of the population mean, variance and sample size.
State and use the basic sampling distributions for the sample mean and the sample variance for random samples from a normal distribution.
State and use the distribution of the \(t\)-statistic for random samples from a normal distribution.
State and use the \(F\) distribution for the ratio of two sample variances from independent samples taken from normal distributions.

Population: This refers to the entire group of individuals or items that we are interested in studying. For example, all car insurance policies in a company.
Sample: This is a subset of the population from which data is collected. It’s a manageable group used to draw conclusions about the larger, often impractical-to-observe, population.
Statistical Inference: This is the process of drawing conclusions about a population based on data obtained from a sample. It extends inferential analysis from a sample to the wider population. This involves estimating population parameters and testing hypotheses.

A random sample is a set of random variables, X₁, X₂, …, Xₙ, that are independent and identically distributed (IID) from a specific population.
Characteristics of a random sample:
- Each item in the population has a probability of inclusion in the sample that is proportional to its frequency in the parent population.
- The inclusion or exclusion of different items operates independently.

Statistic: A statistic is a function derived solely from a random sample X. For instance, the sample mean (X̄ = ΣXᵢ / n) and sample variance (S² = Σ(Xᵢ - X̄)² / (n-1)) are examples of statistics.
Sampling Distribution: The distribution of a statistic is known as its sampling distribution.
Standard Error: The standard deviation of a sampling statistic is specifically termed its standard error.

For a random sample of size n from a population with mean μ and variance σ²:

Mean of the Sample Mean (E[X̄]):
- E[X̄] = μ.
- Derivation: E[X̄] = E[(1/n) ΣXᵢ] = (1/n) ΣE[Xᵢ] = (1/n) Σμ = (1/n) * nμ = μ.
Variance of the Sample Mean (Var[X̄]):
- Var[X̄] = σ²/n.
- Derivation: Var[X̄] = Var[(1/n) ΣXᵢ] = (1/n)² ΣVar[Xᵢ] (due to independence) = (1/n)² Σσ² = (1/n)² * nσ² = σ²/n.
Mean of the Sample Variance (E[S²]):
- E[S²] = σ². This indicates that the sample variance S² is an unbiased estimator of the population variance σ².
- Derivation: E[S²] = E[(1/(n-1)) Σ(Xᵢ - X̄)²] = E[(1/(n-1)) (ΣXᵢ² - nX̄²)]. Using E[Xᵢ²] = Var[Xᵢ] + E[Xᵢ]² = σ² + μ² and E[X̄²] = Var[X̄] + E[X̄]² = (σ²/n) + μ², the derivation simplifies to E[S²] = σ².

Sample Mean (X̄) when population variance (σ²) is known:
- If a random sample is drawn from a Normal(μ, σ²) distribution, then the sample mean X̄ follows a Normal(μ, σ²/n) distribution exactly.
- The pivotal quantity (X̄ - μ) / (σ/√n) follows a Standard Normal (N(0,1)) distribution. This also holds approximately for large samples from any distribution due to the Central Limit Theorem.
Sample Variance (S²) from a Normal population:
- If S² is the sample variance from a Normal(μ, σ²) distribution, then (n-1)S²/σ² follows a Chi-squared (χ²) distribution with (n-1) degrees of freedom.
- Important Note: This result is strictly for samples from normal distributions and does not hold for non-normal distributions.

Sample Mean (X̄) when population variance (σ²) is unknown:
- If S is the sample standard deviation from a Normal(μ, σ²) distribution, then the t-statistic (X̄ - μ) / (S/√n) follows a t-distribution with (n-1) degrees of freedom.
- This distribution is more spread out than the standard normal due to the additional uncertainty from estimating σ² with S.

Definition of F-distribution: If U and V are independent random variables following Chi-squared (χ²) distributions with m and n degrees of freedom respectively, then the ratio (U/m) / (V/n) follows an F-distribution with m and n degrees of freedom (Fₘ,ₙ).
Application for variance ratios: For two independent random samples from Normal(μ₁, σ₁²) and Normal(μ₂, σ₂²) distributions, with sample variances S₁² and S₂² respectively:
- The ratio (S₁²/σ₁²) / (S₂²/σ₂²) follows an F-distribution with (n₁-1) and (n₂-1) degrees of freedom (F(n₁-1),(n₂-1)).
- This is typically used to test for the equality of two population variances (i.e., H₀: σ₁² = σ₂²).

Keep these critical points at your fingertips, and you’ll be well on your way to acing your CS1 exam!