# COMPARING MEANS USING T-TESTS

There are three types of t-tests in this chapter.

**1). One-Sample t-test**, which is used to compare a single mean to a fixed number or "gold standard"

**2). Two-Sample t-test**, which is used to compare two population means based on independent samples from two populations or groups

**3). Paired Sample t-test**, which is used to compare two means based on samples that are related in some way

# One-Sample t-Test

The one-sample t-test is used for comparing sample results with a known value. Specifically, in this type of test, a single sample is collected, and the resulting sample mean is compared with a value of interest, sometimes a “gold standard,” that is not based on the current sample. For example, this specified value might be one of the following:

The weight indicated on a can of vegetables

The advertised breaking strength of a type of steel pipe

Government specification on the percentage of fruit juice that must be in a drink before it can be advertised as “fruit juice”

The purpose of the one-sample t-test is to determine whether there is sufficient evidence to
conclude that the mean of the population from which the sample is taken is different from
the specified value.

Related to the one-sample t-test is a confidence interval on the mean. The confidence
interval is usually applied when you are not testing against a specified value of the
population mean but instead want to know a range of plausible values of the unknown
mean of the population from which the sample was selected.

### Appropriate Applications for a One-Sample t-Test

The following are examples of situations in which a one-sample t-test would be
appropriate:

### Design Considerations for a One-Sample t-Test

The key assumption underlying the one-sample t-test is that the population from which the
sample is selected is normal. However, this assumption is rarely if ever precisely true in
practice, so it is important to know how concerned you should be about apparent
nonnormality in your data. The following are rules of thumb (Moore & McCabe, 2012):

You will see variations of these rules throughout the literature. In particular, some statisticians will add that the one-sample t-test may not be appropriate if the data are skewed (even with a large sample size), or if there are substantial outliers. In these cases, a nonparametric test might be more desirable.

# Hypotheses for a One-Sample t-Test

When performing a one-sample t-test, you may or may not have a preconceived assumption about the direction of your findings. Depending on the design of your study, you may decide to perform a one- or two-tailed test.

### Two-Tailed t-Tests

The basic hypotheses for the one-sample t-test are given below, where µ denotes the mean of the population from which the sample was selected and µ0 denotes the hypothesized value of this mean. Note again that µ0 is a value that does not depend on the current sample.

H0: µ = µ0 (in words, the population mean is equal to the hypothesized value µ0).

Ha: µ ≠ µ0 (the population mean is not equal to µ0).
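For reference, the test statistic used for these hypotheses can be sketched in its standard textbook form. The symbols x̄ (sample mean), s (sample standard deviation), and n (sample size) are notation introduced here for illustration; they are not used elsewhere in this chapter.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1
```

Under H0, and assuming a normal population, this statistic follows a t distribution with n − 1 degrees of freedom. Values of t far from zero in either direction count as evidence against H0 in a two-tailed test.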

### One-Tailed t-Tests

If you are only interested in rejecting the null hypothesis if the population mean differs from the hypothesized value in a direction of interest, you may want to use a one-tailed (sometimes called a one-sided) test. If, for example, you want to reject the null hypothesis only if there is sufficient evidence that the mean is larger than the value hypothesized under the null (i.e., µ0), the hypotheses become the following:

H0: µ = µ0 (the population mean is equal to the hypothesized value µ0).

Ha: µ > µ0 (the population mean is greater than µ0).

Analogous hypotheses could be specified for the case in which you want to reject H0 only if there is sufficient evidence that the population mean is less than µ0.

SPSS always reports a two-tailed p-value. To obtain a one-tailed p-value, divide the reported p-value by 2, provided that your results are consistent with the direction specified in the alternative hypothesis and that the decision to use a one-tailed test was made a priori.

# Hypothetical Example

A certain pen is designed to be 4 inches in length. The lengths (in inches) of a random sample of 15 pens are 4, 3.95, 4.01, 3.95, 4, 3.98, 3.97, 3.97, 4.01, 3.98, 3.99, 4.01, 4.02, 4.02, and 3.98. Test whether the pens deviate from the design length.

In this example, the pens are out of design when they are too short or too long. Therefore, in the one-sample t-test, we test the two-tailed hypotheses:

Null hypothesis (H0): µ = 4 (the population mean is equal to 4 inches).

Alternative hypothesis (Ha): µ ≠ 4 (the population mean is not equal to 4 inches).

**Sample Output**

One-Sample Statistics

| | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|
| Length | 15 | 3.9893 | 0.02314 | 0.00597 |

One-Sample Test (Test Value = 4)

| | t | df | Sig. (2-tailed) | Mean Difference | 95% Confidence Interval of the Difference: Lower | 95% Confidence Interval of the Difference: Upper |
|---|---|---|---|---|---|---|
| Length | -1.786 | 14 | 0.096 | -0.01067 | -0.0235 | 0.0021 |

### Interpretation:

The sample mean length is 3.9893 inches with a standard deviation of 0.02314, so the randomly selected pens vary little in length. The p-value for the test is 0.096, which is greater than the significance level (0.05 by default). Hence, we do not reject the null hypothesis, and we do not conclude that there is a problem with the lengths of the pens.

Although the sample mean length (3.9893) differs from the required length (4) by 0.0107, this difference is not statistically significant. The 95% confidence interval for the difference, from -0.0235 to 0.0021, includes 0. There is therefore no significant evidence that the population mean pen length is different from 4, and we do not conclude that the pens are out of design. Note that failing to reject H0 does not prove that the population mean is exactly 4; it only means that this sample does not provide sufficient evidence of a difference.
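The SPSS results for the pen data can be cross-checked outside SPSS. The following is a minimal sketch in Python, assuming the SciPy library is available; it is not part of the original SPSS workflow, and the variable names are chosen here for illustration.

```python
from scipy import stats

# Pen lengths (inches) from the example above
lengths = [4, 3.95, 4.01, 3.95, 4, 3.98, 3.97, 3.97,
           4.01, 3.98, 3.99, 4.01, 4.02, 4.02, 3.98]
mu0 = 4.0  # design length (hypothesized mean)

# Two-tailed one-sample t-test
result = stats.ttest_1samp(lengths, popmean=mu0)
t_stat, p_value = result.statistic, result.pvalue

# 95% confidence interval for the mean difference (sample mean - mu0)
n = len(lengths)
mean_diff = sum(lengths) / n - mu0
std_err = stats.sem(lengths)            # s / sqrt(n), with s using n - 1
t_crit = stats.t.ppf(0.975, df=n - 1)   # two-sided 95% critical value
ci_low = mean_diff - t_crit * std_err
ci_high = mean_diff + t_crit * std_err

print(f"t = {t_stat:.3f}, df = {n - 1}, p = {p_value:.3f}")
print(f"mean difference = {mean_diff:.5f}")
print(f"95% CI = ({ci_low:.4f}, {ci_high:.4f})")
```

The printed values should agree with the One-Sample Test table up to rounding.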

# Two-Sample t-Test

The two-sample (independent groups) t-test is used to determine whether the unknown
means of two populations are different from each other based on independent samples
from each population. If the two-sample means are sufficiently different from each other,
then the population means are declared to be different. A related test, the paired t-test, to
be discussed in the next section, is used to compare two population means using samples
that are paired in some way.

The samples for a two-sample t-test can be obtained from a single population that has been
randomly divided into two subgroups, with each subgroup subjected to one of two
treatments (e.g., two medications) or from two separate populations (e.g., male and
female). In either case, for the two sample t-test to be valid, it is necessary that the two
samples are independent (i.e., unrelated to each other).

### Appropriate Applications for a Two-Sample t-Test

In each of the following examples, the two-sample (independent groups) t-test is used to
determine whether the population means of the two groups are different.

(1) meeting once a week with taped lessons provided on a CD or on the Internet and

(2) having three sessions a week using standard lectures by the same professor. Students are randomly placed into one of the two sections at the time of registration. Using results from a standardized final exam, the researcher compares mean differences between the learning obtained in the two types of classes.

### Design Considerations for a Two-Sample t-Test

The characteristics of the t-tests in the above examples are the following:

#### A Two-Sample t-Test Compares Means

In an experiment designed to use the two-sample t-test, you want to compare means from a quantitative variable such as height, weight, amount spent, or grade. In other words, it should make sense to calculate the mean of the observations. This measurement is called your “response” or “outcome” variable. Note: The outcome measure should not be a categorical (nominal/discrete) variable such as hair color, gender, or occupational level, even if the data have been numerically coded.

#### You Are Comparing Independent Samples

The two groups contain subjects (or objects) that are not paired or matched in any way.
These subjects typically are obtained in one of two ways:

#### The t-Test Assumes Normality

A standard assumption for the t-test to be valid when you have small sample sizes is that the outcome variable measurements are normally distributed. That is, when each sample is graphed as a histogram, the shape approximates a bell curve. When the distribution of the data is markedly skewed, the mean is a poor representation of central tendency, and the normality assumption of this test is violated.

#### Are the Variances Equal?

Another consideration that should be addressed before using the t-test is whether the population variances can be considered to be equal. The two-sample t-test is robust against moderate departures from the normality and variance assumption, but independence of samples must not be violated. For specifics, see the section below titled “Deciding Which Version of the t-Test Statistic to Use.”

# Hypotheses for a Two-Sample t-Test

As with any version of the t-test, when performing a two-sample t-test, you may or may not have a preconceived assumption about the direction of your findings. Depending on the design of your study, you may decide to perform a one- or two-tailed test.

### Two-Tailed Tests

In this setting, there are two populations, and we are interested in testing whether the population means (i.e., µ1 and µ2) are equal. The hypotheses for the comparison of the means in a two-sample t-test are as follows:

H0: µ1 = µ2 (the population means of the two groups are the same).

Ha: µ1 ≠ µ2 (the population means of the two groups are different).

### One-Tailed Tests

If your experiment is designed so that you are only interested in detecting whether one mean is larger than the other, you may choose to perform a one-tailed (sometimes called one-sided) t-test. For example, when you are only interested in detecting whether the population mean of the second group is larger than the population mean of the first group, the hypotheses become the following:

H0: µ1 = µ2 (the population means of the two groups are the same).

Ha: µ2 > µ1 (the population mean of the second group is larger than the population mean of the first group).

Since SPSS always reports a two-tailed p-value, you must modify the reported p-value to fit a one-tailed test by dividing it by 2. Thus, if the p-value reported for a two-tailed t-test is 0.06, then the p-value for this one-sided test would be 0.03 if the results are supportive of the alternative hypothesis (i.e., if X̄2 > X̄1). If the one-sided hypotheses above are tested and X̄2 < X̄1, then the p-value would actually be greater than 0.50, and the null hypothesis should not be rejected.
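The conversion described above can be written as a small helper. This is a sketch only: the function name `one_sided_p` and its arguments are hypothetical, and it relies on the symmetry of the t distribution, so that a result in the "wrong" direction has a one-sided p-value of one minus half the two-tailed value.

```python
def one_sided_p(two_sided_p: float, direction_matches: bool) -> float:
    """Convert a two-tailed t-test p-value to a one-tailed p-value.

    direction_matches is True when the sample result points in the
    direction stated in the alternative hypothesis (a direction that
    was chosen before looking at the data).
    """
    if not 0.0 <= two_sided_p <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    half = two_sided_p / 2
    # Wrong direction: the one-sided p-value exceeds 0.50
    return half if direction_matches else 1 - half


# Two-tailed p = 0.06, as in the text
print(f"{one_sided_p(0.06, True):.2f}")   # results support Ha
print(f"{one_sided_p(0.06, False):.2f}")  # results contradict Ha
```

With the two-tailed value of 0.06 from the text, the helper gives 0.03 when the sample means support the alternative hypothesis and 0.97 when they do not.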

# Hypothetical Example

A researcher wants to know whether one chemical (Brand 1) causes plants to grow faster than another brand of chemical (Brand 2). Starting with seeds, he grows plants in identical conditions and randomly assigns chemical "Brand 1" to seven plants and chemical "Brand 2" to six plants. The data for this experiment are shown below, where the outcome measurement is the height of the plant (in cm) after 3 weeks of growth.

**Chemical Data**

| Chemical Brand 1 (cm) | Chemical Brand 2 (cm) |
|---|---|
| 51 | 54 |
| 53.3 | 56.1 |
| 55.6 | 52.1 |
| 51 | 56.4 |
| 55.5 | 54 |
| 53 | 52.9 |
| 52.1 | |

Since either chemical brand could be superior, a two-sided t-test is appropriate. The hypotheses for this test are H0: µ1 = µ2 versus Ha: µ1 ≠ µ2 or, in words, the following:

Null hypothesis (H0): The mean growth heights of the plants using the two different chemical brands are the same.

Alternative hypothesis (Ha): The mean growth heights of the plants using the two chemical brands are different.

**Sample Output: Two-Sample t-Test Output for Chemical Data**

Group Statistics

| | Type | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Height | 1 | 7 | 53.0714 | 1.90938 | 0.72168 |
| | 2 | 6 | 54.2500 | 1.70968 | 0.69797 |

**Independent Samples Test**

| | | Levene's Test: F | Levene's Test: Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference: Lower | 95% CI of the Difference: Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| Height | Equal Variances Assumed | 0.075 | 0.790 | -1.163 | 11 | 0.269 | -1.17857 | 1.01329 | -3.40881 | 1.05166 |
| | Equal Variances Not Assumed | | | -1.174 | 10.963 | 0.265 | -1.17857 | 1.00398 | -3.38922 | 1.03208 |

### Interpretation

The F-test (Levene’s test) for evaluating the equality of variances gives a p-value of 0.79, which indicates that the variances are not significantly different.

Therefore, if you are comfortable with this information, the appropriate t-test is the one that assumes equal variances. However, if you choose to go with the conservative approach, you would use the “Equal variances not assumed” t-test. In this case, your final decision for the significance of the t-test would not be different.

Making a decision based on the p-value. The p-value for the equal variances t-test is p = 0.269. Since this p-value is greater than 0.05, the decision would be that there is no significant difference between the means of the two groups. (Do not reject the null hypothesis.) Thus, there is not enough evidence to conclude that the mean heights are different. If you use the approach in which equal variances are not assumed, the p-value is p = 0.265, which is almost identical to the “equal variance” p-value. Thus, your decision would be the same.

Making a decision based on the confidence interval. The 95% confidence intervals for the difference
in means are given in the last two columns of Independent Samples Test Table. The interval associated with the assumption
of equal variances is [–3.41 to 1.05], while the confidence interval when equal variances are not
assumed is [–3.39 to 1.03]. Since these intervals include 0 (zero), we again conclude that there is no
significant difference between the means using either assumption regarding the variances. Thus, you
would make the same decisions discussed in the p-value section above. The confidence interval gives
more information than a simple p-value. Each interval above indicates that plausible values of the
mean difference lie between about –3.4 and 1.0. Depending on the nature of your experiment, the
information about the range of the possible mean differences may be useful in your decision-making
process.

In summary, we do not have enough evidence to reject the null hypothesis. Although the sample means differ (53.07 versus 54.25), the difference is not statistically significant, so we cannot conclude that the mean growth heights of the plants under the two chemical brands are different. This is not the same as showing that the two population means are equal.
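The two-sample analysis above can also be sketched in Python, assuming the SciPy library is available. This is an illustrative cross-check rather than part of the SPSS procedure. The option `center="mean"` is an assumption made here so that Levene's test uses the mean-centered form; SciPy's default centers on the median.

```python
from scipy import stats

# Plant heights (cm) after 3 weeks, from the Chemical Data table
brand1 = [51, 53.3, 55.6, 51, 55.5, 53, 52.1]
brand2 = [54, 56.1, 52.1, 56.4, 54, 52.9]

# Levene's test for equality of variances (mean-centered form)
lev_stat, lev_p = stats.levene(brand1, brand2, center="mean")

# Pooled-variance t-test (equal variances assumed)
pooled = stats.ttest_ind(brand1, brand2, equal_var=True)

# Welch t-test (equal variances not assumed)
welch = stats.ttest_ind(brand1, brand2, equal_var=False)

print(f"Levene: F = {lev_stat:.3f}, p = {lev_p:.3f}")
print(f"Pooled: t = {pooled.statistic:.3f}, p = {pooled.pvalue:.3f}")
print(f"Welch:  t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}")
```

Both versions of the t-test are computed so that the two decisions can be compared side by side, as in the interpretation above.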

# Paired t-Test

The paired t-test (also called a dependent samples t-test) is appropriate for data in which the two samples are correlated or related in some way. This type of analysis is appropriate for the three separate data collection scenarios:

In all cases, the data to be analyzed are the differences within pairs (e.g., the right eye measurement minus the left eye measurement). The difference scores are then analyzed as a one-sample t-test.

### Appropriate Applications for a Paired t-Test

The following are examples of paired data that would properly be analyzed using a paired
t-test.

### Design Considerations for a Paired t-Test

#### Pairing Observations May Increase the Ability to Detect Differences

A paired t-test is recommended when variability between groups may be sufficiently large to mask any mean differences that might exist between the groups. Pairing is a method for obtaining a more direct measurement on the difference being examined. For example, in the diet example above, one method of assessing the performance of the diet would be to select 30 subjects and randomly assign 15 to go on the diet and 15 to eat regularly for the next month (i.e., the control group). At the end of the month, the weights of the subjects on the diet could be compared with those in the control group to determine whether there is evidence of a difference in average weights. Clearly, this is not a desirable design since the variability of weights of subjects within the two groups will likely mask any differences that might be produced by one month on the diet. A better design would be to select 15 subjects (or even better, 30 subjects) and measure the weights of these subjects before and after the month on the diet. The 15 differences between the before and after weights for the subjects provide much more focused measurements of the effect of the diet than would independent samples.

#### Paired t-Test Analysis Is Performed on the Difference Scores

The data to be analyzed in a paired t-test are the differences between pairs (e.g., the before minus after weight for each subject in a diet study or differences between matched pairs in the study of teaching methods). The difference scores are then analyzed using a one-sample t-test.
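The equivalence between a paired t-test and a one-sample t-test on the difference scores can be illustrated with a short Python sketch, assuming the SciPy library is available. The weights below are hypothetical values invented for illustration; they are not the data from any example in this chapter.

```python
from scipy import stats

# Hypothetical before/after weights for six subjects (illustrative only)
before = [201.0, 185.5, 172.0, 210.3, 195.8, 188.1]
after = [197.2, 186.0, 168.4, 204.9, 193.0, 187.5]

# Difference scores: before minus after for each subject
diffs = [b - a for b, a in zip(before, after)]

# Paired t-test on the two related samples
paired = stats.ttest_rel(before, after)

# One-sample t-test of the differences against a mean of zero
one_sample = stats.ttest_1samp(diffs, popmean=0.0)

print(f"paired:     t = {paired.statistic:.4f}, p = {paired.pvalue:.4f}")
print(f"one-sample: t = {one_sample.statistic:.4f}, p = {one_sample.pvalue:.4f}")
```

The two calls return the same t statistic and p-value, which is why the paired design only requires the differences to be approximately normal.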

#### The Paired t-Test Assumes Normality of the Differences

The basic assumption for the paired t-test to be valid when you have small sample sizes is that the difference scores are normally distributed and that the observed differences represent a random sample from the population of differences. Also, using difference scores can be misleading in cases where there are ceiling or floor limits to the individual values, when there is a substantial “regression to the mean” for observations, or when initial values are not representative of the sample (particularly in longitudinal data).

# Hypotheses for a Paired t-Test

The hypotheses to be tested in a paired t-test are similar to those used in a two-sample t-test. In the case of paired data, µ1 and µ2 refer to the population means of the before and after measurements on a single group of subjects or to the first and second members of each pair in the case of matched subjects. The null hypothesis may be stated as H0: µ1 = µ2. However, in the case of paired data, it is common practice to make use of the fact that the difference between the two population means (i.e., µ1 – µ2) is equal to the population mean of the difference scores, denoted µd. In this case, the hypotheses are written as follows:

H0: µd = 0 (the population mean of the differences is zero).

Ha: µd ≠ 0 (the population mean of the differences is not zero).

# Hypothetical Example

The data for this example include two variables reporting before and after weights for 15 randomly selected subjects who participated in a test of a new diet for a 1-month period. In this case, we want to determine whether there is evidence that the diet works. That is, if we calculate differences as di = “before” weight minus “after” weight, then we should test the following hypotheses:

H0: µd = 0 (the mean of the differences is zero; i.e., the diet is ineffective).

Ha: µd > 0 (the mean of the differences is positive; i.e., the diet is effective).

**Sample Output: Paired t-Test**

Paired Samples Statistics

| | | Mean | N | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Pair 1 | Before | 193.40 | 15 | 22.232 | 5.740 |
| | After | 189.87 | 15 | 21.250 | 5.487 |

**Paired Samples Test**

| | | Mean | Std. Deviation | Std. Error Mean | 95% CI of the Difference: Lower | 95% CI of the Difference: Upper | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|---|---|
| Pair 1 | Before - After | 3.533 | 5.330 | 1.376 | 0.582 | 6.485 | 2.567 | 14 | 0.022 |

### Interpretation

In this output, the sample mean of the difference scores is 3.533, with a standard deviation of the differences given by 5.330. The calculated t-statistic (with 14 df) is given by 2.567, which has a p-value of 0.022. When interpreting these results, notice that the mean of the “before minus after” differences is positive, which is supportive of the alternative hypothesis that µd > 0. Since this experiment from its inception was only interested in detecting a weight loss, it can be viewed as a one-sided test. Thus, for a one-sided hypothesis test, the reported p-value should be one half of the p-value given in the computer output (i.e., p = 0.011). That is, at the α = 0.05 level, we reject H0 and conclude that the weight loss diet is effective.

It should be noted that the 95% confidence interval in the output is two-sided. In this case, the fact that the interval [0.58, 6.49] contains only positive values suggests that µd > 0 (i.e., that the diet is effective).
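Because the raw before and after weights are not listed here, the paired results can only be cross-checked from the summary statistics in the output. The following Python sketch, assuming the SciPy library is available, recomputes the t statistic, both p-values, and the confidence interval from the reported mean difference, standard deviation, and sample size. The variable names are chosen for illustration.

```python
import math
from scipy import stats

# Summary statistics reported in the Paired Samples Test table
mean_diff = 3.533   # mean of the "before minus after" differences
sd_diff = 5.330     # standard deviation of the differences
n = 15              # number of subjects (pairs)

df = n - 1
std_err = sd_diff / math.sqrt(n)   # standard error of the mean difference
t_stat = mean_diff / std_err       # test of H0: mu_d = 0

# Two-sided p-value (as reported by SPSS) and one-sided p-value (Ha: mu_d > 0)
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)
p_one_sided = stats.t.sf(t_stat, df)

# Two-sided 95% confidence interval for the mean difference
t_crit = stats.t.ppf(0.975, df)
ci = (mean_diff - t_crit * std_err, mean_diff + t_crit * std_err)

print(f"t = {t_stat:.3f}, df = {df}")
print(f"two-sided p = {p_two_sided:.3f}, one-sided p = {p_one_sided:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Since the inputs are rounded summary values, small differences in the last decimal place relative to the SPSS output are to be expected.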