Inferential Statistical Analysis (Chapter - 1: Comparing Means using t-Tests)

COMPARING MEANS USING T-TESTS

There are three types of t-tests in this chapter.

1) One-sample t-test, which is used to compare a single mean to a fixed number or "gold standard"

2) Two-sample t-test, which is used to compare two population means based on independent samples from two populations or groups

3) Paired-sample t-test, which is used to compare two means based on samples that are related in some way



One-Sample t-Test

The one-sample t-test is used for comparing sample results with a known value. Specifically, in this type of test, a single sample is collected, and the resulting sample mean is compared with a value of interest, sometimes a “gold standard,” that is not based on the current sample. For example, this specified value might be one of the following:



  • The weight indicated on a can of vegetables
  • The advertised breaking strength of a type of steel pipe
  • Government specification on the percentage of fruit juice that must be in a drink before it can be advertised as “fruit juice”


The purpose of the one-sample t-test is to determine whether there is sufficient evidence to conclude that the mean of the population from which the sample is taken is different from the specified value.

Related to the one-sample t-test is a confidence interval on the mean. The confidence interval is usually applied when you are not testing against a specified value of the population mean but instead want to know a range of plausible values of the unknown mean of the population from which the sample was selected.
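For reference, the test statistic and confidence interval described above are usually written as follows. This is the standard textbook form rather than anything taken from SPSS output; the symbols are assumptions introduced here: x̄ is the sample mean, s the sample standard deviation, n the sample size, µ0 the specified value, and t* the critical value of the t distribution with n − 1 degrees of freedom.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1

\text{confidence interval for } \mu: \quad \bar{x} \pm t^{*}_{n-1} \, \frac{s}{\sqrt{n}}
```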



Appropriate Applications for a One-Sample t-Test

The following are examples of situations in which a one-sample t-test would be appropriate:

  • Are the soft drink bottles full? Does the average volume of liquid in filled soft drink bottles match the 12 ounces advertised on the label?
  • Does the diet work? Is the mean weight loss more than 5 pounds after 3 months for men ages 50 to 60 years who are given a brochure and training describing a low-carbohydrate diet?
  • Have SAT scores fallen? Based on a random sample of 200 students, can we conclude that the average SAT score this year is lower than the national average from 3 years ago?


    Design Considerations for a One-Sample t-Test

    The key assumption underlying the one-sample t-test is that the population from which the sample is selected is normal. However, this assumption is rarely if ever precisely true in practice, so it is important to know how concerned you should be about apparent nonnormality in your data. The following are rules of thumb (Moore & McCabe, 2012):

  • If the sample size is small (less than 15), then you should not use the one-sample t-test if the data are clearly skewed or if outliers are present.
  • If the sample size is moderate (at least 15), then the one-sample t-test can be safely used except when there are severe outliers.
  • If the sample size is large (at least 40), then the one-sample t-test can be safely used without regard to skewness or outliers.

    You will see variations of these rules throughout the literature. In particular, some statisticians will add that the one-sample t-test may not be appropriate if the data are skewed (even with a large sample size), or if there are substantial outliers. In these cases, a nonparametric test might be more desirable.
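The rules of thumb above can be written down as a small decision helper. This is only a sketch: the function name, argument names, and return strings are hypothetical, and deciding whether data are "clearly skewed" or contain "severe" outliers is still left to the analyst (for example, from a histogram or boxplot).

```python
def one_sample_t_guidance(n, clearly_skewed=False, outliers=False, severe_outliers=False):
    """Return a rough recommendation based on the sample-size rules of thumb.

    All names here are illustrative assumptions, not part of any library.
    """
    if n < 15:
        # Small sample: avoid the test if data are clearly skewed or have outliers.
        if clearly_skewed or outliers or severe_outliers:
            return "avoid t-test"
        return "t-test reasonable"
    if n < 40:
        # Moderate sample: only severe outliers are a concern.
        return "avoid t-test" if severe_outliers else "t-test reasonable"
    # Large sample: the rule of thumb allows the test regardless.
    return "t-test reasonable"


print(one_sample_t_guidance(10, clearly_skewed=True))
print(one_sample_t_guidance(20, clearly_skewed=True))
print(one_sample_t_guidance(20, severe_outliers=True))
print(one_sample_t_guidance(200, severe_outliers=True))
```

The helper encodes only the three sample-size rules; the more cautious view mentioned above (avoiding the test for skewed data even in large samples) would need a stricter final branch.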



    Hypotheses for a One-Sample t-Test

    When performing a one-sample t-test, you may or may not have a preconceived assumption about the direction of your findings. Depending on the design of your study, you may decide to perform a one- or two-tailed test.


    Two-Tailed t-Tests

    The basic hypotheses for the one-sample t-test are given below, where µ denotes the mean of the population from which the sample was selected and µ0 denotes the hypothesized value of this mean. It should be reiterated that µ0 is a value that does not depend on the current sample.

    H0 : µ = µ0 (in words, the population mean is equal to the hypothesized value µ0).

    Ha : µ ≠ µ0 (the population mean is not equal to µ0 ).


    One-Tailed t-Tests

    If you are only interested in rejecting the null hypothesis if the population mean differs from the hypothesized value in a direction of interest, you may want to use a one-tailed (sometimes called a one-sided) test. If, for example, you want to reject the null hypothesis only if there is sufficient evidence that the mean is larger than the value hypothesized under the null (i.e., µ0 ), the hypotheses become the following:

    H0 : µ = µ0 (the population mean is equal to the hypothesized value µ0 ).

    Ha : µ > µ0 (the population mean is greater than µ0 ).

    Analogous hypotheses could be specified for the case in which you want to reject H0 only if there is sufficient evidence that the population mean is less than µ0 .

    SPSS always reports a two-tailed p-value, so you should modify the reported p-value to fit a one-tailed test by dividing it by 2 if your results are consistent with the direction specified in the alternative hypothesis and an a priori decision was made that a one-tailed test was appropriate.



    Hypothetical Example


    A certain pen is designed to be 4 inches in length. The lengths of a random sample of 15 pens are 4, 3.95, 4.01, 3.95, 4, 3.98, 3.97, 3.97, 4.01, 3.98, 3.99, 4.01, 4.02, 4.02, and 3.98. Test whether the mean pen length differs from the design length.


    In this example, the pens are out of design if they are either too short or too long. Therefore, in the one-sample t-test, we test the two-tailed hypotheses:

    Null hypothesis (H0 ): µ = 4 (the population mean is equal to 4 inches).

    Alternative hypothesis (Ha ): µ ≠ 4 (the population mean is not equal to 4 inches).

    Sample Output

    One-Sample Statistics

              N    Mean     Std. Deviation   Std. Error Mean
    Length    15   3.9893   0.02314          0.00597

    One-Sample Test (Test Value = 4)

                                                                95% Confidence Interval
                                                                of the Difference
              t        df   Sig. (2-tailed)   Mean Difference   Lower     Upper
    Length    -1.786   14   0.096             -0.01067          -0.0235   0.0021


    Interpretation:

    The sample mean length is 3.9893 with a standard deviation of 0.02314, so the randomly selected sample values vary little. The p-value for the test is 0.096, which is greater than the significance level (0.05 by default). Hence, we do not reject the null hypothesis, and we do not conclude that there is a problem with the lengths of the pens.

    Although the sample mean length (3.9893) differs numerically from the required length (4), this difference of 0.0107 is not statistically significant. Failing to reject the null hypothesis does not prove that the population mean length is exactly 4; it means only that the sample provides no significant evidence that the mean pen length is different from 4. On this evidence, the pens are not judged to be out of design.
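The figures in the sample output can be checked by hand from the 15 pen lengths. The following is a minimal sketch using only the Python standard library, not the SPSS procedure itself; the helper names `t_pdf` and `two_tailed_p` are hypothetical, and the p-value is approximated by numerically integrating the t density with Simpson's rule rather than by a library routine.

```python
import math
import statistics

lengths = [4, 3.95, 4.01, 3.95, 4, 3.98, 3.97, 3.97, 4.01, 3.98,
           3.99, 4.01, 4.02, 4.02, 3.98]
mu0 = 4.0  # design length being tested against


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus twice the area from 0 to |t| (Simpson's rule)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


n = len(lengths)
mean = statistics.mean(lengths)
sd = statistics.stdev(lengths)      # sample standard deviation (n - 1 divisor)
se = sd / math.sqrt(n)              # standard error of the mean
t_stat = (mean - mu0) / se
df = n - 1
p_value = two_tailed_p(t_stat, df)

print(f"mean={mean:.4f} sd={sd:.5f} se={se:.5f}")
print(f"t={t_stat:.3f} df={df} p={p_value:.3f}")
```

Run as written, this should agree with the SPSS table to the displayed precision (mean 3.9893, t about -1.786 on 14 degrees of freedom, two-tailed p about 0.096).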



    Two-Sample t-Test

    The two-sample (independent groups) t-test is used to determine whether the unknown means of two populations are different from each other based on independent samples from each population. If the two sample means are sufficiently different from each other, then the population means are declared to be different. A related test, the paired t-test, to be discussed in the next section, is used to compare two population means using samples that are paired in some way.

    The samples for a two-sample t-test can be obtained from a single population that has been randomly divided into two subgroups, with each subgroup subjected to one of two treatments (e.g., two medications), or from two separate populations (e.g., male and female). In either case, for the two-sample t-test to be valid, it is necessary that the two samples are independent (i.e., unrelated to each other).



    Appropriate Applications for a Two-Sample t-Test

    In each of the following examples, the two-sample (independent groups) t-test is used to determine whether the population means of the two groups are different.

  • How can my flour make more dough? Distributors often pay extra to have products placed in prime locations in grocery stores. The manufacturer of a new brand of whole-grain flour wants to determine if placing the product on the top shelf or on the eye-level shelf produces better sales. From 40 grocery stores, he randomly chooses 20 for top-shelf placement and 20 for eye-level placement. After a period of 30 days, he compares average sales from the two placements.
  • What’s the smart way to teach economics? A university is offering two sections of a microeconomics course during the fall semester:
    (1) meeting once a week with taped lessons provided on a CD or on the Internet and
    (2) having three sessions a week using standard lectures by the same professor. Students are randomly placed into one of the two sections at the time of registration. Using results from a standardized final exam, the researcher compares mean differences between the learning obtained in the two types of classes.
  • Are males and females different? It is known that males and females often differ in their reactions to certain drugs. As a part of the development of a new antiseizure medication, a standard dose is given to 20 males and 20 females. Periodic measurements are made to determine the time it takes until a desired level of drug is present in the blood for each subject. The researcher wants to determine whether there is a gender difference in the average speed at which the drug is assimilated into the blood system.

    Design Considerations for a Two-Sample t-Test

    The characteristics of the t-tests in the above examples are the following:


    A Two-Sample t-Test Compares Means

    In an experiment designed to use the two-sample t-test, you want to compare means from a quantitative variable such as height, weight, amount spent, or grade. In other words, it should make sense to calculate the mean of the observations. This measurement is called your “response” or “outcome” variable. Note: The outcome measure should not be a categorical (nominal/discrete) variable such as hair color, gender, or occupational level, even if the data have been numerically coded.


    You Are Comparing Independent Samples

    The two groups contain subjects (or objects) that are not paired or matched in any way. These subjects typically are obtained in one of two ways:

  • Subjects (or items) are selected for an experiment in which all come from the same population and are randomly split into two groups (e.g., placebo vs. drug or two different marketing campaigns). Each group is exposed to identical conditions except for a “treatment,” which may be a medical treatment, a marketing design factor, exposure to a stimulus, and so on.
  • Subjects are randomly selected from two separate populations (e.g., male vs. female) as in the medical example above.

    The t-Test Assumes Normality

    A standard assumption for the t-test to be valid when you have small sample sizes is that the outcome variable measurements are normally distributed. That is, when each sample is graphed as a histogram, the shape approximates a bell curve. When the distribution of the data is markedly skewed, the mean is a poor representation of central tendency, and the normality assumption of this test is violated.


    Are the Variances Equal?

    Another consideration that should be addressed before using the t-test is whether the population variances can be considered to be equal. The two-sample t-test is robust against moderate departures from the normality and equal-variance assumptions, but independence of samples must not be violated. For specifics, see the discussion of Levene's test in the interpretation of the hypothetical example below.
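The two versions of the test statistic differ only in how the standard error and degrees of freedom are computed. The standard textbook forms are sketched below; the symbols X̄1, X̄2 (sample means), s1, s2 (sample standard deviations), and n1, n2 (sample sizes) are notation assumed here, and the second form is the Welch (Satterthwaite) approximation, which is assumed to be what statistical packages report when equal variances are not assumed.

```latex
% Equal variances assumed (pooled variance)
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad
t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad
df = n_1 + n_2 - 2

% Equal variances not assumed (Welch)
t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad
df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}
               {\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
```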



    Hypotheses for a Two-Sample t-Test

    As with any version of the t-test, when performing a two-sample t-test, you may or may not have a preconceived assumption about the direction of your findings. Depending on the design of your study, you may decide to perform a one- or two-tailed test.


    Two-Tailed Tests

    In this setting, there are two populations, and we are interested in testing whether the population means (i.e., µ1 and µ2 ) are equal. The hypotheses for the comparison of the means in a two-sample t-test are as follows:

    H0 : µ1 = µ2 (the population means of the two groups are the same).

    Ha : µ1 ≠ µ2 (the population means of the two groups are different).


    One-Tailed Tests

    If your experiment is designed so that you are only interested in detecting whether one mean is larger than the other, you may choose to perform a one-tailed (sometimes called one-sided) t-test. For example, when you are only interested in detecting whether the population mean of the second group is larger than the population mean of the first group, the hypotheses become the following:

    H0 : µ1 = µ2 (the population means of the two groups are the same).

    Ha : µ2 > µ1 (the population mean of the second group is larger than the population mean of the first group).

    Since SPSS always reports a two-tailed p-value, you must modify the reported p-value to fit a one-tailed test by dividing it by 2. Thus, if the p-value reported for a two-tailed t-test is 0.06, then the p-value for this one-sided test would be 0.03 if the results are supportive of the alternative hypothesis (i.e., if X̄2 > X̄1 ). If the one-sided hypotheses above are tested and X̄2 < X̄1 , then the p-value would actually be greater than 0.50, and the null hypothesis should not be rejected.
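The conversion from a reported two-tailed p-value to a one-tailed p-value can be sketched as a small helper. The function and argument names are hypothetical; the only logic is the halving rule described above, with the complement taken when the sample result falls on the opposite side of the alternative hypothesis.

```python
def one_tailed_p(p_two_tailed, direction_matches):
    """Convert a two-tailed p-value to a one-tailed p-value.

    direction_matches is True when the observed difference is in the
    direction stated in the alternative hypothesis (a hypothetical flag
    the analyst sets after looking at the sample means).
    """
    half = p_two_tailed / 2
    # If the data point the "wrong" way, the one-tailed p-value exceeds 0.50.
    return half if direction_matches else 1 - half


print(one_tailed_p(0.06, True))    # result supports the alternative hypothesis
print(one_tailed_p(0.06, False))   # result is in the opposite direction
```

With the reported two-tailed value of 0.06, the first call gives the 0.03 quoted above, and the second call gives a value well above 0.50, so the null hypothesis would not be rejected.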



    Hypothetical Example


    A researcher wants to know whether one chemical (Brand 1) causes plants to grow faster than another brand of chemical (Brand 2). Starting with seeds, he grows plants in identical conditions and randomly assigns chemical "Brand 1" to seven plants and chemical "Brand 2" to six plants. The outcome measurement is the height of each plant after 3 weeks of growth. The data for this experiment are shown below:

    Chemical Data

    Chemical Brand 1 (cm)   Chemical Brand 2 (cm)
    51                      54
    53.3                    56.1
    55.6                    52.1
    51                      56.4
    55.5                    54
    53                      52.9
    52.1


    Since either chemical could be superior, a two-sided t-test is appropriate. The hypotheses for this test are H0 : µ1 = µ2 versus Ha : µ1 ≠ µ2 or, in words, the following:

    Null hypothesis (H0 ): The mean growth heights of the plants using the two different chemicals are the same.

    Alternative hypothesis (Ha ): The mean growth heights of the plants using the two chemicals are different.

    Sample Output: Two-Sample t-Test Output for Chemical Data

    Group Statistics

              Type   N   Mean      Std. Deviation   Std. Error Mean
    Height    1      7   53.0714   1.90938          0.72168
              2      6   54.2500   1.70968          0.69797

    Independent Samples Test

    Levene's Test for Equality of Variances
                                           F       Sig.
    Height   Equal variances assumed       0.075   0.790

    t-test for Equality of Means
                                                                                                                         95% Confidence Interval
                                                                                                                         of the Difference
                                           t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower      Upper
    Height   Equal variances assumed       -1.163   11       0.269             -1.17857          1.01329                 -3.40881   1.05166
             Equal variances not assumed   -1.174   10.963   0.265             -1.17857          1.00398                 -3.38922   1.03208


    Interpretation

    Looking at the results of the F-test (Levene’s test) for equality of variances, the p-value of 0.79 indicates that the variances are not significantly different.
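Levene's test can be reproduced from the plant heights as a one-way analysis of variance on the absolute deviations from each group mean. The sketch below assumes the mean-centered form of the test (which is understood to be what SPSS reports) and uses only the standard library; `t_pdf` and `two_tailed_p` are hypothetical helper names. Because there are only two groups, the F statistic has 1 numerator degree of freedom, so its p-value equals the two-tailed p-value of a t statistic equal to the square root of F.

```python
import math
import statistics

brand1 = [51, 53.3, 55.6, 51, 55.5, 53, 52.1]
brand2 = [54, 56.1, 52.1, 56.4, 54, 52.9]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on the t density."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


# Absolute deviations of each observation from its own group mean.
m1, m2 = statistics.mean(brand1), statistics.mean(brand2)
z1 = [abs(x - m1) for x in brand1]
z2 = [abs(x - m2) for x in brand2]

n1, n2 = len(z1), len(z2)
zbar1, zbar2 = statistics.mean(z1), statistics.mean(z2)
grand = (sum(z1) + sum(z2)) / (n1 + n2)

# One-way ANOVA on the absolute deviations (two groups, so df_between = 1).
ss_between = n1 * (zbar1 - grand) ** 2 + n2 * (zbar2 - grand) ** 2
ss_within = sum((z - zbar1) ** 2 for z in z1) + sum((z - zbar2) ** 2 for z in z2)
df_within = n1 + n2 - 2
f_stat = ss_between / (ss_within / df_within)

# With 1 numerator df, F = t^2, so the p-value comes from the t distribution.
p_value = two_tailed_p(math.sqrt(f_stat), df_within)

print(f"Levene F={f_stat:.3f} p={p_value:.3f}")
```

Under these assumptions the calculation gives an F statistic of roughly 0.075 with a p-value of about 0.79, which supports treating the two variances as equal.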

    Therefore, if you are comfortable with this information, the appropriate t-test is the one that assumes equal variances. However, if you choose to go with the conservative approach, you would use the “Equal variances not assumed” t-test. In this case, your final decision for the significance of the t-test would not be different.

    Making a decision based on the p-value. The p-value for the equal variances t-test is p = 0.269. Since this p-value is greater than 0.05, the decision would be that there is no significant difference between the means of the two groups. (Do not reject the null hypothesis.) Thus, there is not enough evidence to conclude that the mean heights are different. If you use the approach in which equal variances are not assumed, the p-value is p = 0.265, which is almost identical to the “equal variance” p-value. Thus, your decision would be the same.

    Making a decision based on the confidence interval. The 95% confidence intervals for the difference in means are given in the last two columns of Independent Samples Test Table. The interval associated with the assumption of equal variances is [–3.41 to 1.05], while the confidence interval when equal variances are not assumed is [–3.39 to 1.03]. Since these intervals include 0 (zero), we again conclude that there is no significant difference between the means using either assumption regarding the variances. Thus, you would make the same decisions discussed in the p-value section above. The confidence interval gives more information than a simple p-value. Each interval above indicates that plausible values of the mean difference lie between about –3.4 and 1.0. Depending on the nature of your experiment, the information about the range of the possible mean differences may be useful in your decision-making process.

    In summary, we do not have enough evidence to reject the null hypothesis, so we fail to reject it. No significant difference has been shown between the two chemical brands. Although the sample means differ numerically, the difference is not statistically significant. This does not prove that the population mean heights are equal; it means only that the data give no significant evidence that the mean growth heights of the plants differ between the two brands.
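Both rows of the Independent Samples Test can be recomputed from the raw plant heights. The following is a sketch using only the standard library, not SPSS's implementation: the first calculation pools the variances, and the second is the Welch approximation, assumed here to correspond to the "equal variances not assumed" row. The helpers `t_pdf` and `two_tailed_p` are hypothetical names, and p-values are approximated by Simpson's rule.

```python
import math
import statistics

brand1 = [51, 53.3, 55.6, 51, 55.5, 53, 52.1]
brand2 = [54, 56.1, 52.1, 56.4, 54, 52.9]


def t_pdf(x, df):
    """Density of Student's t distribution (df may be non-integer)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on the t density."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


n1, n2 = len(brand1), len(brand2)
m1, m2 = statistics.mean(brand1), statistics.mean(brand2)
v1, v2 = statistics.variance(brand1), statistics.variance(brand2)
diff = m1 - m2

# Equal variances assumed: pooled variance, df = n1 + n2 - 2.
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled
df_pooled = n1 + n2 - 2
p_pooled = two_tailed_p(t_pooled, df_pooled)

# Equal variances not assumed: Welch standard error and approximate df.
a, b = v1 / n1, v2 / n2
se_welch = math.sqrt(a + b)
t_welch = diff / se_welch
df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
p_welch = two_tailed_p(t_welch, df_welch)

print(f"pooled: t={t_pooled:.3f} df={df_pooled} p={p_pooled:.3f}")
print(f"Welch:  t={t_welch:.3f} df={df_welch:.3f} p={p_welch:.3f}")
```

The two p-values land close together (about 0.269 and 0.265), which is why the decision does not depend on the variance assumption in this example.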



    Paired t-Test

    The paired t-test (also called a dependent samples t-test) is appropriate for data in which the two samples are correlated or related in some way. This type of analysis is appropriate for the following three data collection scenarios:



  • Pairs consist of before and after measurements on a single group of subjects or patients.
  • Two measurements on the same subject or entity (right and left eye, for example) are paired.
  • Subjects in one group (e.g., those receiving a treatment) are paired or matched on a one-to-one basis with subjects in a second group (e.g., control subjects). (Sometimes this is done using biological twins.)


    In all cases, the data to be analyzed are the differences within pairs (e.g., the right eye measurement minus the left eye measurement). The difference scores are then analyzed as a one-sample t-test.


    Appropriate Applications for a Paired t-Test

    The following are examples of paired data that would properly be analyzed using a paired t-test.

  • Does the diet work? A developer of a new diet is interested in showing that it is effective. He randomly chooses 15 subjects to go on the diet for 1 month. He weighs each subject before and after the 1-month period to see whether there is evidence of a weight loss at the end of the month.
  • Is a new teaching method better than standard methods? An educator wants to test a new method for improving reading comprehension. Twenty students are assigned to a section that will use the new method. Each of these 20 students is matched (age, race, gender, initial reading level) with a student with similar reading ability who will spend the semester in a class using the standard teaching methods. At the end of the semester, the students in both sections will be given a common reading comprehension exam, and the reading comprehension differences between the matched pairs are compared.
  • Do new eye drops work better than standard drops? A pharmaceutical company wants to test a new formulation of eye drops with its standard drops for reducing redness. Fifty subjects who have similar problems with eye redness in each eye are randomly selected for the study. For each subject, an eye is randomly selected to be treated with the new drops, and the other eye is treated with the standard drops. At the end of the treatment schedule, the redness in each eye is measured using a quantitative scale.

    Design Considerations for a Paired t-Test


    Pairing Observations May Increase the Ability to Detect Differences

    A paired t-test is recommended when variability between groups may be sufficiently large to mask any mean differences that might exist between the groups. Pairing is a method for obtaining a more direct measurement on the difference being examined. For example, in the diet example above, one method of assessing the performance of the diet would be to select 30 subjects and randomly assign 15 to go on the diet and 15 to eat regularly for the next month (i.e., the control group). At the end of the month, the weights of the subjects on the diet could be compared with those in the control group to determine whether there is evidence of a difference in average weights. Clearly, this is not a desirable design since the variability of weights of subjects within the two groups will likely mask any differences that might be produced by one month on the diet. A better design would be to select 15 subjects (or even better, 30 subjects) and measure the weights of these subjects before and after the month on the diet. The 15 differences between the before and after weights for the subjects provide much more focused measurements of the effect of the diet than would independent samples.
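The effect of pairing can be seen in a small numerical sketch. The before and after weights below are hypothetical values invented purely for illustration (they are not the diet data analyzed later); the same twelve numbers are analyzed once as paired differences and once as if they were two independent groups.

```python
import math
import statistics

# Hypothetical weights (pounds) for six subjects; illustration only.
before = [200, 180, 220, 160, 190, 210]
after = [196, 177, 215, 158, 187, 205]
n = len(before)

# Paired analysis: one-sample t statistic on the within-subject differences.
diffs = [b - a for b, a in zip(before, after)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Independent-groups analysis of the same numbers (pooled variance).
v1, v2 = statistics.variance(before), statistics.variance(after)
pooled_var = ((n - 1) * v1 + (n - 1) * v2) / (2 * n - 2)
se_independent = math.sqrt(pooled_var * (2 / n))
t_independent = (statistics.mean(before) - statistics.mean(after)) / se_independent

print(f"paired t = {t_paired:.2f}, independent-groups t = {t_independent:.2f}")
```

In this made-up data every subject loses a few pounds, but subjects differ from one another by tens of pounds. The paired statistic is large because the subject-to-subject variability cancels out of the differences, while the independent-groups statistic is small because that variability swamps the mean difference.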


    Paired t-Test Analysis Is Performed on the Difference Scores

    The data to be analyzed in a paired t-test are the differences between pairs (e.g., the before minus after weight for each subject in a diet study or differences between matched pairs in the study of teaching methods). The difference scores are then analyzed using a one-sample t-test.


    The Paired t-Test Assumes Normality of the Differences

    The basic assumption for the paired t-test to be valid when you have small sample sizes is that the difference scores are normally distributed and that the observed differences represent a random sample from the population of differences. Also, using difference scores can be misleading in cases where there are ceiling or floor limits to the individual values, when there is a substantial “regression to the mean” for observations, or when initial values are not representative of the sample (particularly in longitudinal data).



    Hypotheses for a Paired t-Test

    The hypotheses to be tested in a paired t-test are similar to those used in a two-sample t-test. In the case of paired data, µ1 and µ2 refer to the population means of the before and after measurements on a single group of subjects or to the first and second members of each pair in the case of matched subjects. The null hypothesis may be stated as H0 : µ1 = µ2 . However, in the case of paired data, it is common practice to make use of the fact that the difference between the two population means (i.e., µ1 – µ2 ) is equal to the population mean of the difference scores, denoted µd . In this case, the hypotheses are written as follows:

    H0 : µd = 0 (the population mean of the differences is zero).

    Ha : µd ≠ 0 (the population mean of the differences is not zero).
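Because the paired analysis is a one-sample t-test on the difference scores, its test statistic takes the standard form below. The symbols are notation assumed here: d̄ is the sample mean of the differences, s_d their sample standard deviation, and n the number of pairs.

```latex
t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}, \qquad df = n - 1
```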



    Hypothetical Example


    The data for this example include two variables reporting before and after weights for 15 randomly selected subjects who participated in a test of a new diet for a 1-month period. In this case, we want to determine whether there is evidence that the diet works. That is, if we calculate differences as di = “before” weight minus “after” weight, then we should test the following hypotheses:

    H0 : µd = 0 (the mean of the differences is zero; i.e., the diet is ineffective).

    Ha : µd > 0 (the mean of the differences is positive; i.e., the diet is effective).

    Sample Output: Paired t-Test

    Paired Samples Statistics

                      Mean     N    Std. Deviation   Std. Error Mean
    Pair 1   Before   193.40   15   22.232           5.740
             After    189.87   15   21.250           5.487

    Paired Samples Test

                                Paired Differences
                                                                           95% Confidence Interval
                                                                           of the Difference
                                Mean    Std. Deviation   Std. Error Mean   Lower   Upper   t       df   Sig. (2-tailed)
    Pair 1   Before - After     3.533   5.330            1.376             0.582   6.485   2.567   14   0.022



    Interpretation

    In this output, the sample mean of the difference scores is 3.533, with a standard deviation of the differences given by 5.330. The calculated t-statistic (with 14 df) is given by 2.567, which has a p-value of 0.022. When interpreting these results, notice that the mean of the “before minus after” differences is positive, which is supportive of the alternative hypothesis that µd > 0. Since this experiment from its inception was only interested in detecting a weight loss, it can be viewed as a one-sided test. Thus, for a one-sided hypothesis test, the reported p-value should be one half of the p-value given in the computer output (i.e., p = 0.011). That is, at the α = 0.05 level, we reject H0 and conclude that the weight loss diet is effective.

    It should be noted that the 95% confidence interval reported in the output is two-sided. In this case, the fact that the interval [0.58, 6.49] contains only positive values suggests that µd > 0 (i.e., that the diet is effective).
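The paired results can be approximately reconstructed from the summary statistics alone, since the individual weights are not listed here. The sketch below starts from the rounded mean difference (3.533), standard deviation (5.330), and n = 15 in the output, so small rounding differences from SPSS are expected. The helper names `t_pdf`, `two_tailed_p`, and `t_critical` are hypothetical; the critical value is found by bisection rather than from a table.

```python
import math

# Rounded summary statistics taken from the Paired Samples Test output.
mean_diff, sd_diff, n = 3.533, 5.330, 15


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on the t density."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


def t_critical(df, alpha=0.05):
    """Find t* whose two-tailed area equals alpha, by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid   # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


se = sd_diff / math.sqrt(n)
t_stat = mean_diff / se
df = n - 1
p_two = two_tailed_p(t_stat, df)
# One-sided test of Ha: mu_d > 0; halve only if the mean difference is positive.
p_one = p_two / 2 if t_stat > 0 else 1 - p_two / 2

t_star = t_critical(df)
ci = (mean_diff - t_star * se, mean_diff + t_star * se)

print(f"t={t_stat:.3f} df={df} two-tailed p={p_two:.3f} one-tailed p={p_one:.3f}")
print(f"95% CI for mean difference: [{ci[0]:.2f}, {ci[1]:.2f}]")
```

The reconstructed interval lies entirely above zero, matching the interpretation above, and the one-tailed p-value is about half the reported two-tailed value.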


    Welcome to HoneyBee - The Learning Platform

    By Binit Patel
