Chapter 8: Sampling Distributions and Estimation (Part 1)

- Sampling Variation
- Estimators and Sampling Distributions
- Sample Mean and the Central Limit Theorem
- Confidence Interval for a Mean ($\mu$) with Known $\sigma$
- Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$
- Confidence Interval for a Proportion ($\pi$)

McGraw-Hill/Irwin. 2007 The McGraw-Hill Companies, Inc. All rights reserved.

Sampling Variation

Sample statistic: a random variable whose value depends on which population items happen to be included in the random sample.

Depending on the sample size, the sample statistic may represent the population well or differ greatly from it. This sampling variation is easy to illustrate.

Consider eight random samples of size $n = 5$ from a large population of GMAT scores for MBA applicants.

The sample means ($\bar{x}_i$) tend to be close to the population mean ($\mu = 520.78$).

[Figure: dot plot of eight samples of size n = 5, and dot plot of the eight sample means]

Estimators and Sampling Distributions

Some Terminology

Estimator: a statistic derived from a sample to infer the value of a population parameter.

Estimate: the value of the estimator in a particular sample.

Population parameters are represented by Greek letters and the corresponding statistics by Roman letters.

Examples of Estimators

[Table: examples of estimators]

Sampling Distributions

The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of size $n$ is taken. An estimator is a random variable since samples vary.

Sampling error $= \hat{\theta} - \theta$

Bias

Bias is the difference between the expected value of the estimator and the true parameter:

Bias $= E(\hat{\theta}) - \theta$

An estimator is unbiased if $E(\hat{\theta}) = \theta$. On average, an unbiased estimator neither overstates nor understates the true parameter.
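A small enumeration can make the definition of bias concrete. The sketch below is an illustration only: it assumes a hypothetical population {0, 1, 2, 3} sampled with replacement at n = 2 and compares two variance estimators (divisor n - 1 versus divisor n). The helper name `expected_value` is made up for this example.

```python
from itertools import product
from statistics import pvariance, variance

# Hypothetical population chosen for illustration (values equally likely).
population = [0, 1, 2, 3]
true_variance = pvariance(population)  # population variance (divisor N)

# Every equally likely sample of size 2 drawn with replacement.
samples = list(product(population, repeat=2))

def expected_value(estimator):
    """Average an estimator over all equally likely samples."""
    return sum(estimator(s) for s in samples) / len(samples)

# Bias = E(estimator) - true parameter.
bias_unbiased = expected_value(variance) - true_variance   # divisor n - 1
bias_biased = expected_value(pvariance) - true_variance    # divisor n

print(f"true variance         = {true_variance}")
print(f"bias with divisor n-1 = {bias_unbiased}")
print(f"bias with divisor n   = {bias_biased}")
```

In this enumeration the divisor n - 1 estimator has zero bias, while the divisor n estimator understates the variance on average, which is a systematic error rather than a random one.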

Sampling error is random whereas bias is systematic. An unbiased estimator avoids systematic error.

Efficiency

Efficiency refers to the variance of the estimator's sampling distribution. A more efficient estimator has smaller variance.

Consistency

A consistent estimator converges toward the parameter being estimated as the sample size increases.

Sample Mean and the Central Limit Theorem

The sample mean is an unbiased estimator of $\mu$; therefore,

$E(\bar{X}) = E(X) = \mu$

The standard error of the mean is the standard deviation of the sampling error of $\bar{x}$:

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

If the population is exactly normal, then the sample mean follows a normal distribution.

For example, the average price, $\mu$, of a 5 GB MP3 player is \$80.00 with a standard deviation, $\sigma$, equal to \$10.00. What will be the mean and standard error from a sample of 20 players?

$E(\bar{X}) = E(X) = \mu = 80.00$

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{10}{\sqrt{20}} = 2.236$

If the distribution of prices for these players is a normal distribution, then the sampling distribution of $\bar{x}$ is $N(80.00, 2.236)$, that is, normal with mean \$80.00 and standard error \$2.236.
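The arithmetic in the MP3 player example can be checked with a few lines of Python. This is a minimal sketch; the function name `standard_error` is chosen here for illustration and is not part of the slides.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Values from the MP3 player example.
mu, sigma, n = 80.00, 10.00, 20
se = standard_error(sigma, n)

print(f"E(X-bar) = {mu:.2f}, standard error = {se:.3f}")
```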

Central Limit Theorem (CLT) for a Mean

If a random sample of size $n$ is drawn from a population with mean $\mu$ and standard deviation $\sigma$, the distribution of the sample mean $\bar{x}$ approaches a normal distribution with mean $\mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ as the sample size increases. If the population is normal, the distribution of the sample mean is normal regardless of sample size.
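One way to see the theorem at work is a small simulation. The sketch below assumes a hypothetical, strongly skewed exponential population with mean 3 (so $\sigma$ is also 3); the population, the seed, and the number of replications are illustrative choices, not values from the slides.

```python
import math
import random
import statistics

random.seed(8)  # fixed seed so the sketch is repeatable

MEAN = 3.0             # hypothetical population mean (exponential: sigma = mean)
N = 16                 # sample size
REPLICATIONS = 20_000  # number of samples drawn

def sample_mean(n: int) -> float:
    """Mean of one random sample of size n from the skewed population."""
    return statistics.fmean([random.expovariate(1 / MEAN) for _ in range(n)])

means = [sample_mean(N) for _ in range(REPLICATIONS)]

predicted_se = MEAN / math.sqrt(N)       # sigma / sqrt(n)
observed_mean = statistics.fmean(means)
observed_se = statistics.pstdev(means)

print(f"mean of sample means: {observed_mean:.3f} (CLT predicts {MEAN})")
print(f"std. dev. of means:   {observed_se:.3f} (CLT predicts {predicted_se:.3f})")
```

Even though individual observations are skewed, the simulated sample means should center near the population mean with a spread close to $\sigma/\sqrt{n}$.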

Symmetric Population: Uniform Distribution

Rule of thumb: a sample size of $n > 30$ is usually enough for the sample mean to be approximately normal. A much smaller $n$ will suffice if the population is symmetric. For example, consider a uniform population $U(0, 1000)$, which has mean $\mu = 500$ and standard deviation $\sigma = 1000/\sqrt{12} = 288.7$.

With $\sigma = 288.7$ (the standard deviation of a uniform population on 0 to 1000, whose mean is 500), the central limit theorem predicts that sample means drawn from this population will be centered at 500 with a standard error of the mean of $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

Predicted S.E. for
- $n = 1$: $288.7/\sqrt{1} = 288.7$
- $n = 2$: $288.7/\sqrt{2} = 204.1$
- $n = 4$: $288.7/\sqrt{4} = 144.3$
- $n = 16$: $288.7/\sqrt{16} = 72.2$

[Figure: histograms of sample means from the uniform population]

Skewed Population: Waiting Time

Consider a strongly skewed population of waiting times at airport security screening with $\mu = 2.983$ minutes and $\sigma = 2.451$ minutes.

The CLT predicts that sample means drawn from this population will be centered at 2.983 minutes with a standard error of the mean of $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

Predicted S.E. for
- $n = 1$: $2.451/\sqrt{1} = 2.451$
- $n = 2$: $2.451/\sqrt{2} = 1.733$
- $n = 4$: $2.451/\sqrt{4} = 1.226$
- $n = 16$: $2.451/\sqrt{16} = 0.613$

[Figure: histograms of sample means from the skewed population]

Range of Sample Means

The CLT permits us to define a range or interval within which the sample means are expected to fall:

$\mu \pm z \dfrac{\sigma}{\sqrt{n}}$

where $z$ is from the standard normal table. If we know $\mu$ and $\sigma$, the ranges of sample means for samples of size $n$ are predicted to be:

- 90% interval: $\mu \pm 1.645 \dfrac{\sigma}{\sqrt{n}}$
- 95% interval: $\mu \pm 1.960 \dfrac{\sigma}{\sqrt{n}}$
- 99% interval: $\mu \pm 2.576 \dfrac{\sigma}{\sqrt{n}}$

Illustration: GMAT Scores

For samples of size $n = 5$ applicants, within what range would GMAT means be expected to fall? The parameters are $\mu = 520.78$ and $\sigma = 86.8$. The predicted range for 95% of the sample means is:

$\mu \pm 1.960 \dfrac{\sigma}{\sqrt{n}} = 520.78 \pm 1.960 \dfrac{86.8}{\sqrt{5}} = 520.78 \pm 76.08$

that is, from 444.70 to 596.86.
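The GMAT range can be reproduced, and extended to other levels, with Python's standard library. This is a sketch under the assumption that `statistics.NormalDist` is an acceptable stand-in for the standard normal table; the function name `sample_mean_range` is made up for this example.

```python
import math
from statistics import NormalDist

def sample_mean_range(mu, sigma, n, level):
    """Range mu +/- z*sigma/sqrt(n) expected to contain `level` of sample means."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-tailed z-value
    half_width = z * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width

# GMAT parameters from the illustration.
mu, sigma, n = 520.78, 86.8, 5
for level in (0.90, 0.95, 0.99):
    low, high = sample_mean_range(mu, sigma, n, level)
    print(f"{level:.0%}: {low:.2f} to {high:.2f}")
```

The function computes $z$ from the confidence level rather than hard-coding 1.645, 1.960, or 2.576, so other levels can be tried without a table lookup.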

Sample Size and Standard Error

The standard error declines as $n$ increases, but at a decreasing rate. Make the interval $\mu \pm z\,\sigma/\sqrt{n}$ small by increasing $n$. The distribution of sample means collapses at the true population mean $\mu$ as $n$ increases.

Illustration: All Possible Samples from a Uniform Population

Consider a discrete uniform population consisting of the integers {0, 1, 2, 3}.

The population parameters are: $\mu = 1.5$, $\sigma = 1.118$.

All possible samples of size $n = 2$, with replacement, are given below along with their means.

Sample: mean
(0,0): 0.0   (0,1): 0.5   (0,2): 1.0   (0,3): 1.5
(1,0): 0.5   (1,1): 1.0   (1,2): 1.5   (1,3): 2.0
(2,0): 1.0   (2,1): 1.5   (2,2): 2.0   (2,3): 2.5
(3,0): 1.5   (3,1): 2.0   (3,2): 2.5   (3,3): 3.0

The population is uniform, yet the distribution of all possible sample means has a peaked triangular shape.

The CLT's predictions for the mean and standard error are

$\mu_{\bar{x}} = \mu = 1.5$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 1.118/\sqrt{2} = 0.7906$
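Because the population {0, 1, 2, 3} is so small, the CLT's predictions can be checked by brute force. The sketch below enumerates all 16 samples of size 2 (with replacement) and compares the mean and standard deviation of the sample means with the predicted values; variable names are illustrative.

```python
from itertools import product
from statistics import fmean, pstdev

population = [0, 1, 2, 3]
n = 2

# All 4**2 = 16 equally likely samples drawn with replacement, and their means.
samples = list(product(population, repeat=n))
sample_means = [fmean(s) for s in samples]

mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)             # divisor 16: every possible mean is included
predicted_se = pstdev(population) / n ** 0.5   # sigma / sqrt(n)

print(f"mean of means = {mean_of_means}")
print(f"SD of means   = {sd_of_means:.4f}")
print(f"CLT predicts  = {predicted_se:.4f}")
```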

The mean of means is

$\bar{\bar{x}} = \dfrac{1(0.0) + 2(0.5) + 3(1.0) + 4(1.5) + 3(2.0) + 2(2.5) + 1(3.0)}{16} = 1.5$

The standard deviation of the means is

$\sigma_{\bar{x}} = \sqrt{\dfrac{\sum (\bar{x}_i - 1.5)^2}{16}} = \sqrt{\dfrac{10}{16}} = 0.7906$

which agrees with the CLT's prediction of $\sigma/\sqrt{n}$.

Confidence Interval for a Mean ($\mu$) with Known $\sigma$

What is a Confidence Interval?

A sample mean $\bar{x}$ is a point estimate of the population mean $\mu$. A confidence interval for the mean is a range

$\mu_{\text{lower}} < \mu < \mu_{\text{upper}}$

The confidence level is the probability that the confidence interval contains the true population mean. The confidence level (usually expressed as a %) is the area under the curve of the sampling distribution.

The confidence interval for $\mu$ with known $\sigma$ is:

$\bar{x} \pm z \dfrac{\sigma}{\sqrt{n}}$

Choosing a Confidence Level

A higher confidence level leads to a wider confidence interval. Greater confidence implies loss of precision. 95% confidence is most often used.

Interpretation

A confidence interval either does or does not contain $\mu$.

The confidence level quantifies the risk. Out of 100 confidence intervals constructed at the 95% level, approximately 95 would contain $\mu$, while approximately 5 would not.

Is $\sigma$ Ever Known?

Yes, but not very often. In quality control applications with ongoing manufacturing processes, it is often reasonable to assume that $\sigma$ stays the same over time. In this case, confidence intervals are used to construct control charts to track the mean of a process over time.

Confidence Interval for a Mean ($\mu$) with Unknown $\sigma$

Student's t Distribution

Use the Student's t distribution instead of the normal distribution when the population is normal but the standard deviation $\sigma$ is unknown and the sample size is small.

The confidence interval for $\mu$ (unknown $\sigma$) is

$\bar{x} \pm t \dfrac{s}{\sqrt{n}}$, that is, $\bar{x} - t \dfrac{s}{\sqrt{n}} < \mu < \bar{x} + t \dfrac{s}{\sqrt{n}}$

The t distributions are symmetric and shaped like the standard normal distribution. The t distribution depends on the size of the sample.

Degrees of Freedom

Degrees of freedom (d.f.) is a parameter based on the sample size that is used to determine the value of the t statistic. Degrees of freedom tell how many observations are used to calculate $s$, less the number of intermediate estimates used in the calculation:

$\nu = n - 1$

As $n$ increases, the t distribution approaches the shape of the normal distribution. For a given confidence level, $t$ is always larger than $z$, so a confidence interval based on $t$ is always wider than if $z$ were used.

Comparison of z and t

For very small samples, t-values differ substantially from the normal. As degrees of freedom increase, the t-values approach the normal z-values. For example, for $n = 31$, the degrees of freedom are $\nu = 31 - 1 = 30$. What would the t-value be for a 90% confidence interval? For $\nu = 30$, the t-value is 1.697, while the corresponding z-value is 1.645.

Example: GMAT Scores Again

Here are the GMAT scores from 20 applicants to an MBA program:

[Table: GMAT scores of the 20 applicants]

Construct a 90% confidence interval for the mean GMAT score of all MBA applicants.

$\bar{x} = 510$, $s = 73.77$

Since $\sigma$ is unknown, use the Student's t for the confidence interval with $\nu = 20 - 1 = 19$ d.f. First find $t_{0.90}$ from Appendix D.


The 90% confidence interval is:

$\bar{x} - t \dfrac{s}{\sqrt{n}} < \mu < \bar{x} + t \dfrac{s}{\sqrt{n}}$

$510 - 1.729 \dfrac{73.77}{\sqrt{20}} < \mu < 510 + 1.729 \dfrac{73.77}{\sqrt{20}}$

$510 - 28.52 < \mu < 510 + 28.52$

We are 90% confident that the true mean GMAT score is within the interval $481.48 < \mu < 538.52$.
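The GMAT interval can be recomputed from the summary statistics. This sketch takes the t-value as an input (1.729 for 19 d.f. at 90% confidence, as used in the example) rather than computing it, since Python's standard library has no Student's t quantile function; the name `t_interval` is illustrative.

```python
import math

def t_interval(xbar, s, n, t):
    """Confidence interval xbar +/- t * s / sqrt(n) for a supplied t-value."""
    margin = t * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Summary statistics from the GMAT example; t = 1.729 is the value used there.
lower, upper = t_interval(xbar=510, s=73.77, n=20, t=1.729)

print(f"{lower:.2f} < mu < {upper:.2f}")
```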

Confidence Interval Width

Confidence interval width reflects
- the sample size,
- the confidence level, and
- the standard deviation.

To obtain a narrower interval and more precision
- increase the sample size or
- lower the confidence level (e.g., from 90% to 80% confidence).

A Good Sample

Here are five different samples of 25 births from a population of N = 4,409 births and their 95% CIs.

[Figure: five samples of 25 births with their 95% confidence intervals]

An examination of the samples shows that sample 5 has an outlier. The outlier is a warning that the resulting confidence interval may not be trustworthy. In this case, a larger sample size is needed.

Using Appendix D

Beyond $\nu = 50$, Appendix D shows $\nu$ in steps of 5 or 10. If the table does not give the exact degrees of freedom, use the t-value for the next lower $\nu$. This is a conservative procedure since it causes the interval to be slightly wider. For d.f. above 150, use the z-value.

Using Excel

Use Excel's function =TINV(probability, d.f.) to obtain a two-tailed value of t. Here, probability is 1 minus the confidence level. For example, =TINV(0.10, 19) gives the t-value 1.729 for 90% confidence with 19 d.f.

Using MegaStat

MegaStat gives you a choice of z or t and does all calculations for you.

Using MINITAB

MINITAB also gives confidence intervals for the median and standard deviation.

Confidence Interval for a Proportion ($\pi$)

A proportion is a mean of data whose only values are 0 or 1. The Central Limit Theorem (CLT) states that the distribution of a sample proportion $p = x/n$ approaches a normal distribution with mean $\pi$ and standard deviation

$\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$

as the sample size increases. $p = x/n$ is a consistent estimator of $\pi$.

Illustration: Internet Hotel Reservations

Management of the Pan-Asian Hotel System tracks the percent of hotel reservations made over the Internet. The binary data are:
- 1: reservation is made over the Internet
- 0: reservation is not made over the Internet

After the data were collected, it was determined that the proportion of Internet reservations is $\pi = .20$.

Here are five random samples of $n = 20$.

[Table: five random samples of n = 20 with their sample proportions]

Each $p$ is a point estimate of $\pi$. Notice the sampling variation in the value of $p$.
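The sampling variation in $p$ can be imitated with simulated data. The sketch below draws five hypothetical samples of 20 reservations each, assuming a population proportion of .20; the seed and the simulated values are illustrative and are not the hotel's actual samples.

```python
import random

random.seed(20)   # arbitrary seed so the sketch is repeatable

PI = 0.20         # population proportion from the illustration
N = 20            # sample size
SAMPLES = 5       # number of samples to draw

def sample_proportion(n: int, pi: float) -> float:
    """Draw n binary observations (1 = Internet reservation) and return p = x/n."""
    x = sum(1 for _ in range(n) if random.random() < pi)
    return x / n

proportions = [sample_proportion(N, PI) for _ in range(SAMPLES)]
for i, p in enumerate(proportions, start=1):
    print(f"sample {i}: p = {p:.2f}")
```

With $n = 20$, each simulated $p$ can only take values in steps of .05, which is one reason small-sample proportions look lumpy rather than bell shaped.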

Applying the CLT

The distribution of a sample proportion $p = x/n$ is symmetric if $\pi = .50$ and, regardless of $\pi$, approaches symmetry as $n$ increases.

As $n$ increases, the statistic $p = x/n$ more closely resembles a continuous random variable. As $n$ increases, the distribution becomes more symmetric and bell shaped. As $n$ increases, the range of the sample proportion $p = x/n$ narrows.

The sampling variation can be reduced by increasing the sample size $n$.

When is it Safe to Assume Normality?

Rule of thumb: the sample proportion $p = x/n$ may be assumed to be normal if both $n\pi > 10$ and $n(1-\pi) > 10$.

[Table: sample size to assume normality]

Standard Error of the Proportion

The standard error of the proportion $\sigma_p$ depends on $\pi$, as well as $n$. It is largest when $\pi$ is near .50 and smaller when $\pi$ is near 0 or 1. The formula for the standard error is symmetric about $\pi = .50$. Enlarging $n$ reduces the standard error $\sigma_p$ but at a diminishing rate.

Confidence Interval for $\pi$

The confidence interval for $\pi$ is

$p \pm z \sqrt{\dfrac{\pi(1-\pi)}{n}}$

where $z$ is based on the desired confidence. Since $\pi$ is unknown, the confidence interval based on $p = x/n$ (assuming a large sample) is

$p \pm z \sqrt{\dfrac{p(1-p)}{n}}$
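The large-sample interval for a proportion is easy to wrap in a function. This is a minimal sketch, assuming the normal approximation is appropriate; the function name `proportion_interval` is made up here, and the counts used (24 cash payments out of 75 purchases) are the ones from the auditing example in this section.

```python
import math
from statistics import NormalDist

def proportion_interval(x: int, n: int, level: float = 0.95):
    """Large-sample interval p +/- z*sqrt(p(1-p)/n), with p = x/n."""
    p = x / n
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-tailed z-value
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Counts from the auditing example: 24 of 75 purchases paid in cash.
lower, upper = proportion_interval(x=24, n=75, level=0.95)

print(f"{lower:.3f} < pi < {upper:.3f}")
```

The sample proportion $p$ stands in for the unknown $\pi$ inside the square root, which is why the sketch is only suitable for large samples.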

The value of $z$ can be chosen for any confidence level. For example, $z = 1.645$ for 90% confidence, $z = 1.960$ for 95% confidence, and $z = 2.576$ for 99% confidence.

Example: Auditing

A sample of 75 retail in-store purchases showed that 24 were paid in cash. What is $p$?

$p = x/n = 24/75 = .32$

Is $p$ normally distributed?

$np = (75)(.32) = 24$
$n(1-p) = (75)(.68) = 51$

Both are > 10, so we may conclude normality.

The 95% confidence interval for the proportion of retail in-store purchases that are paid in cash is:

$p \pm z \sqrt{\dfrac{p(1-p)}{n}} = .32 \pm 1.96 \sqrt{\dfrac{.32(1-.32)}{75}} = .32 \pm .106$

$.214 < \pi < .426$

We are 95% confident that this interval contains the true population proportion.

Narrowing the Interval

The width of the confidence interval for $\pi$ depends on
- the sample size
- the confidence level
- the sample proportion $p$

To obtain a narrower interval (i.e., more precision) either
- increase the sample size
- reduce the confidence level

Using Excel and MegaStat

To find a confidence interval for a proportion in Excel, use (for example)

=0.15-NORMSINV(.95)*SQRT(0.15*(1-0.15)/200)
=0.15+NORMSINV(.95)*SQRT(0.15*(1-0.15)/200)

Here NORMSINV(.95) returns $z = 1.645$, so these two formulas give the lower and upper limits of a 90% confidence interval for $p = 0.15$ and $n = 200$.

In MegaStat, enter $p$ and $n$ to obtain the confidence interval for a proportion. MegaStat always assumes normality.

If the sample is small, the distribution of $p$ may not be well approximated by the normal. Confidence limits around $p$ can be constructed by using the binomial distribution.

Polls and Margin of Error

In polls and surveys, the half-width of the confidence interval when $\pi = .5$ is called the margin of error. Below are some margins of error for a 95% confidence interval assuming $\pi = .50$.
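Margins of error of this kind can be generated for any sample size. The sketch below assumes a 95% confidence level and a proportion of .50, and treats the margin of error as the plus-or-minus amount $z\sqrt{.5(1-.5)/n}$; the sample sizes listed are hypothetical choices, not the ones tabulated on the slide.

```python
import math
from statistics import NormalDist

def margin_of_error(n: int, pi: float = 0.50, level: float = 0.95) -> float:
    """Plus-or-minus amount z*sqrt(pi(1-pi)/n) for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-tailed z-value
    return z * math.sqrt(pi * (1 - pi) / n)

# Hypothetical sample sizes chosen for illustration.
for n in (100, 400, 1600):
    print(f"n = {n:>5}: +/- {margin_of_error(n):.1%}")
```

Because $n$ sits under a square root, quadrupling the sample size only halves the margin of error.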

Each reduction in the margin of error requires a disproportionately larger sample size.

Rule of Three

If in $n$ independent trials no events occur, the upper 95% confidence bound is approximately $3/n$.

Very Quick Rule

A Very Quick Rule (VQR) for a 95% confidence interval when $p$ is near .50 is

$p \pm \dfrac{1}{\sqrt{n}}$

Applied Statistics in Business and Economics
End of Part 1 of Chapter 8