# Sample Statistic

A sample is a group of elements selected from a population. The characteristics that describe the population are called parameters, and the characteristics of the sample data are known as statistics. Inferential statistics provides methods for generalizing to population characteristics from sample statistics.

Suppose a report states that "50% of kindergarten children in the US watch TV for more than 20 hours per week". This conclusion is not reached by surveying the TV-watching habits of every kindergarten child in the US. Instead, it is an estimate of a population parameter made from sample data. It is generally not practical to find population parameters by studying every element of the population. In most cases they are estimated from the statistics of representative, unbiased samples.

## Sample Statistic Definition

A sample statistic is a numerical descriptive measure of a sample. A statistic is usually derived from measurements of the individuals in the sample. The statistic is a characteristic of a sample data distribution like mean, median, mode, standard deviation and proportions. A sample statistic could be the measure of any characteristic of the sample.

### Sample Statistic Symbol

The following table shows the notation used for parameters and statistics.

| Characteristic | Sample Statistic | Population Parameter |
|---|---|---|
| Mean | $\overline{x}$ | $\mu$ |
| Standard deviation | s | $\sigma$ |
| Proportion | $\hat{p}$ | p |

The sample size is indicated by the lower case letter 'n' and the population size by the upper case letter 'N'.

## Computing Sample Statistics

Computing the sample mean is no different from computing the population mean. We add all the data values and divide the sum by the number of values in the data set.

### Formula for Sample Mean

$\overline{x}=\frac{\sum_{i=1}^{n}x_{i}}{n}$

The standard deviation is different. If the formula $\sqrt{\frac{\sum (x_{i}-\overline{x})^{2}}{n}}$, which divides by n, is applied to a small sample, the result tends to underestimate the population standard deviation. To correct for this, the n in the denominator is replaced by n - 1. With this change the sample variance $s^{2}$ is an unbiased estimate of the population variance $\sigma^{2}$.

### Formula for Sample Standard Deviation

$s=\sqrt{\frac{\sum (x_{i}-\overline{x})^{2}}{n-1}}$
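As a minimal sketch of the two formulas, the following Python snippet computes the sample mean, the divide-by-n standard deviation, and the sample standard deviation s for a small data set. The five data values are hypothetical and chosen only for illustration; the results are cross-checked against the standard library's `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics

# Hypothetical sample of five measurements (not taken from the text).
sample = [2.0, 4.0, 4.0, 5.0, 7.0]
n = len(sample)

# Sample mean: sum of the values divided by the number of values.
x_bar = sum(sample) / n

# Sum of squared deviations from the sample mean.
ss = sum((x - x_bar) ** 2 for x in sample)

# Dividing by n gives the smaller value; dividing by n - 1 gives s.
sd_divide_by_n = math.sqrt(ss / n)
s = math.sqrt(ss / (n - 1))

# Cross-check against the standard library implementations.
assert math.isclose(sd_divide_by_n, statistics.pstdev(sample))
assert math.isclose(s, statistics.stdev(sample))

print(f"mean = {x_bar:.3f}, divide-by-n SD = {sd_divide_by_n:.3f}, s = {s:.3f}")
```

For any sample with at least two distinct values, the divide-by-n result is smaller than s, and the gap shrinks as n grows.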

## Population Parameter Sample Statistic

A measure found by analyzing sample data is a sample statistic. An important aspect of inferential statistics is estimating population parameters from sample statistics. The mean of an unbiased sample collected by random methods can be used as an estimator of the mean of the population that the sample represents, assuming the population is approximately normal.

Specifically, if the sample mean $\overline{x}$ is used as the estimate of the population mean $\mu$, we say that $\overline{x}$ is a point estimate of $\mu$. The sample mean $\overline{x}$ is a better estimator of the population mean $\mu$ than the sample median or the sample mode. The reason is that the means of many samples from the same population vary less than the medians and modes of those samples.
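The claim that sample means vary less than sample medians can be explored with a small simulation. The sketch below assumes a normal population with mean 50 and standard deviation 10, a sample size of 25, and 2000 repeated samples; all of these numbers are illustrative choices, not values from the text.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def draw_sample(n):
    """Draw n values from the assumed normal population N(50, 10^2)."""
    return [rng.gauss(50.0, 10.0) for _ in range(n)]

means = []
medians = []
for _ in range(2000):
    sample = draw_sample(25)
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Spread of each estimator across the repeated samples.
spread_of_means = statistics.stdev(means)
spread_of_medians = statistics.stdev(medians)

print(f"SD of sample means:   {spread_of_means:.3f}")
print(f"SD of sample medians: {spread_of_medians:.3f}")
```

Under these assumptions the sample means cluster more tightly around 50 than the sample medians do. The comparison can come out differently for heavy-tailed populations, so this illustrates the normal case only.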

### Properties of a Good Estimator

1. The estimator should be unbiased. This means that the mean of estimates obtained from different samples of the same size should be equal to the parameter being estimated.
2. The estimator should be consistent. The estimator should approach the parameter value as the sample size is increased.
3. The estimator should be efficient. Amongst the various estimators of a population parameter, the relatively efficient estimator should have the smallest variance.
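The unbiasedness property can be illustrated by simulation. The sketch below assumes a normal population with variance 4 and repeatedly draws samples of size 5 (both values are illustrative assumptions). It averages two variance estimates over many samples: one dividing by n - 1 and one dividing by n.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

TRUE_VARIANCE = 4.0     # assumed population: normal, mean 0, SD 2
n = 5                   # size of each sample
repetitions = 10000     # number of samples drawn

sum_unbiased = 0.0
sum_biased = 0.0
for _ in range(repetitions):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    x_bar = sum(sample) / n
    ss = sum((x - x_bar) ** 2 for x in sample)
    sum_unbiased += ss / (n - 1)  # divide by n - 1
    sum_biased += ss / n          # divide by n

mean_unbiased = sum_unbiased / repetitions
mean_biased = sum_biased / repetitions

print(f"true variance:                {TRUE_VARIANCE}")
print(f"average of divide-by-(n-1):   {mean_unbiased:.3f}")
print(f"average of divide-by-n:       {mean_biased:.3f}")
```

The average of the n - 1 estimates should land close to the true variance, while the average of the divide-by-n estimates should fall near (n - 1)/n times the true variance, which is 3.2 in this setup.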

## Confidence Intervals for Parameter Estimation

The population mean can also be estimated with an interval estimate, which is constructed for a chosen confidence level. The confidence level states, as a percentage, how confident we are that the interval estimate contains the population parameter. It is fixed before the interval estimate is computed.

Using the confidence level and the z-score table (when the population standard deviation is known or the sample size is large) or the Student's t-table, the maximum error of estimate is calculated. The following table shows which formula applies in each case.

| Case | Maximum Error of Estimate E |
|---|---|
| Population standard deviation $\sigma$ is known | $E=Z_{\frac{\alpha }{2}}(\frac{\sigma }{\sqrt{n}})$ |
| Population standard deviation is unknown and sample size n ≥ 30 | $E=Z_{\frac{\alpha }{2}}(\frac{s}{\sqrt{n}})$ |
| Population standard deviation is unknown and sample size n < 30 | $E=t_{\frac{\alpha }{2}}(\frac{s}{\sqrt{n}})$ |

α in the formulas is equal to 1 - confidence level, with the confidence level expressed as a decimal.
$Z_{\frac{\alpha }{2}}$ is the Z-score that leaves a total area of α in the two tails of the standard normal distribution (α/2 in each tail), and
$t_{\frac{\alpha }{2}}$ is the t-score that leaves a total two-tail area of α in the Student's t-distribution with n - 1 degrees of freedom.

When the population standard deviation is unknown, the sample standard deviation s, which is itself a sample statistic, is used in the formula to compute the maximum error of estimate.

The interval estimate for the population mean will be
$\overline{x}-E < \mu <\overline{x}+E$
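The Z-based interval can be sketched in Python using only the standard library. The function names `z_critical` and `z_interval` are hypothetical helpers introduced here for illustration. The sketch covers only the cases that use the Z-score; the small-sample case needs a t-score, which the standard library does not provide, so that value would have to come from a t-table or a statistics package.

```python
import math
from statistics import NormalDist

def z_critical(confidence_level):
    """Z-score leaving alpha/2 in each tail of the standard normal curve."""
    alpha = 1.0 - confidence_level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def z_interval(x_bar, sd, n, confidence_level):
    """Return (lower, upper) = (x_bar - E, x_bar + E).

    sd is sigma when it is known, or s when n >= 30 and sigma is unknown.
    """
    e = z_critical(confidence_level) * sd / math.sqrt(n)
    return x_bar - e, x_bar + e

# Illustrative numbers only: mean 100, SD 15, n = 36, 95% confidence.
lower, upper = z_interval(100.0, 15.0, 36, 0.95)
print(f"{lower:.2f} < mu < {upper:.2f}")
```

Computing the Z-score from the confidence level avoids a table lookup, at the cost of small differences from results that use rounded table values such as 1.65 or 1.96.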

## Hypothesis Testing

A statistical hypothesis is a claim or conjecture about a population parameter. Hypothesis testing is the decision-making process for this conjecture. For this purpose, a test statistic is computed from one or more sample statistics and compared with the critical values determined for the test. Inferences are drawn from this comparison and a decision is reached.
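One common instance of this process is a two-tailed Z test for a population mean, sketched below. The text does not prescribe a particular test, so the choice of test, the helper name `z_test_two_tailed`, the significance level of 0.05, and the input numbers are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

def z_test_two_tailed(x_bar, mu_0, sd, n, alpha=0.05):
    """Compare the Z test statistic with the two-tailed critical value.

    x_bar: sample mean, mu_0: hypothesized population mean,
    sd: standard deviation (sigma, or s for a large sample), n: sample size.
    Returns (z, critical_value, reject_null).
    """
    z = (x_bar - mu_0) / (sd / math.sqrt(n))
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z, critical, abs(z) > critical

# Illustrative numbers only: is the population mean different from 50?
z, critical, reject = z_test_two_tailed(x_bar=52.0, mu_0=50.0, sd=6.0, n=36)
print(f"z = {z:.3f}, critical value = {critical:.3f}, reject H0: {reject}")
```

The decision rule is the comparison in the last element of the returned tuple: the null hypothesis is rejected when the test statistic falls beyond the critical value in either tail.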

## Sample Statistic Problem

Let us look at an example where we use two sample statistics, the mean and the standard deviation, to find an interval estimate for the population parameter.

### Solved Example

Question: The following are the GPA scores of 30 high school students. Find the 90% confidence interval for the true mean.

3.1 2.9 2.8 2.9 3.8 4.8 4.2 3.9 3.4 2.5 4.2 3.7 3.3 2.1 3.8 3 3.7 4 2.7 3.8 3.2 3.5 3.5 3.6 2.2 3.1 3.5 4 2.7 4.5

Solution:

Using the formulas to compute the sample mean and standard deviation we get,

Sample mean

$\overline{x}=\frac{\sum x}{n}=\frac{102.4}{30}=3.41$

Sample Standard deviation

$s=\sqrt{\frac{\sum (x_{i}-\overline{x})^{2}}{n-1}}=0.65$

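The population standard deviation is unknown and n = 30, so the Z formula with s applies. The value of $Z_{\frac{\alpha }{2}}$ for a 90% confidence level is 1.65 (1.645 rounded to two decimals).

The maximum error of estimate is $E=Z_{\frac{\alpha }{2}}(\frac{s}{\sqrt{n}})=1.65(\frac{0.65}{\sqrt{30}})= 0.20$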

Hence the 90% confidence interval for the population mean is

$\overline{x}-E < \mu < \overline{x}+E$

$3.41-0.20 < \mu < 3.41+0.20$

$3.21 < \mu < 3.61$
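The arithmetic in this example can be checked with a short script. The sketch below uses the 30 GPA values from the question and the table value Z = 1.65 from the solution. It rounds each intermediate result to two decimals, as the worked solution does; carrying full precision through to the end can shift the last digit of an endpoint.

```python
import math
import statistics

# The 30 GPA scores given in the question.
gpa = [3.1, 2.9, 2.8, 2.9, 3.8, 4.8, 4.2, 3.9, 3.4, 2.5,
       4.2, 3.7, 3.3, 2.1, 3.8, 3.0, 3.7, 4.0, 2.7, 3.8,
       3.2, 3.5, 3.5, 3.6, 2.2, 3.1, 3.5, 4.0, 2.7, 4.5]

n = len(gpa)

# Sample statistics, rounded to two decimals as in the worked solution.
x_bar = round(statistics.mean(gpa), 2)
s = round(statistics.stdev(gpa), 2)   # divides by n - 1

# Table value of Z for a 90% confidence level, as used in the text.
z = 1.65

# Maximum error of estimate and the interval endpoints.
E = round(z * s / math.sqrt(n), 2)
lower = round(x_bar - E, 2)
upper = round(x_bar + E, 2)

print(f"n = {n}, mean = {x_bar:.2f}, s = {s:.2f}, E = {E:.2f}")
print(f"{lower:.2f} < mu < {upper:.2f}")
```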
