One-Way Analysis of Variance (ANOVA)

The one-way ANOVA (also called a one-factor ANOVA or completely randomized design) is a staple of almost every research discipline. It is widely used and, as you will see, relatively easy to perform and interpret. This model is a direct extension of the two-sample (independent group) t-test (see Comparing One or Two Means Using the t-Test). It is used to determine whether there are differences among the means of three or more groups.

Appropriate Applications for a one-way ANOVA

Length of hospital stay. Anorexic patients in a psychiatric ward are randomly grouped into a control treatment group (standard treatment) and two experimental treatment groups to determine whether either of the two experimental treatments reduces the average length of stay.

Which is the best brand? The shear strengths of bolts manufactured by four different companies are compared to establish whether there are differences in the average shear strength.

What is the best package color? A marketing study attempts to find out whether average sales are different for three choices of packaging color.

Design Considerations for a one-way ANOVA

The One-Way ANOVA Assumptions

1. Independent samples. The groups contain observed subjects (or objects) that are split into groups but are not paired or matched in any way. The groups are typically obtained in one of two ways:

a. Random split. Subjects (or items) all come from the same population and are randomly split into groups. Each group is exposed to identical conditions, except for a “treatment” that may be a medical treatment, a marketing design factor, exposure to a stimulus, and so on.
b. Random selection. Subjects are randomly selected from separate populations (e.g., people by race, stores by region, or machines by manufacturer).

2. Normality. A standard assumption for the one-way ANOVA to be valid is that the measurement variable is normally distributed within each group. That is, when graphed as a histogram, the shape approximates a bell curve.

3. Equal variances. Another assumption is that the within-group variances are the same for each of the groups.

Hypotheses for a One-Way ANOVA

H0: µ1 = µ2 = … = µk (the population means of all groups are the same).

Ha: µi ≠ µj for some i ≠ j (the population means of at least two groups are different).
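To make the test of H0 concrete, here is a minimal sketch of how the F statistic behind a one-way ANOVA can be computed by hand: the between-group mean square divided by the within-group mean square. The function name one_way_f and the three small groups are illustrative assumptions, not part of the example in this chapter; turning F into a p-value additionally requires the F distribution from a table or a statistics package.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up data for three groups (hypothetical, NOT from the trail study).
groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
f_stat, df_b, df_w = one_way_f(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # prints: F(2, 6) = 27.00
```

A large F means the group means differ by more than the within-group scatter would suggest, which is the evidence used to reject H0.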

Hypothetical Example


Research Scenario and Test Selection

The Municipal Forest Service (MFS) had the idea that the monies spent to maintain five disparate mountain hiking trails in the Santa Lucia Mountains, California, were significantly different. Random samples of such expenditures were taken from the records for the past 50 years. Six years were selected at random and the expenditures for trail maintenance recorded. The MFS wished to use the findings of their analysis to help them allocate resources for future trail maintenance.

The dependent variable is the amount of money spent on each of the five hiking trails. The independent variable, “hiking trails,” consists of five separate trails. Since the data were measured at the scale level, we may calculate means and standard deviations for the amount of money spent. The mean expenditures for each trail were based on the random samples of size six (n = 6). We assume that the distributions of trail expenditures are approximately normally distributed with equal variances. Which statistical test would you use to develop evidence in support of the MFS’s belief that the amount of money spent on trail maintenance was significantly different for the five trails?

We have scale data for our dependent variable (“expenditures in dollars”); therefore, we are able to look for differences between means. We can eliminate t tests from consideration since we have more than two means. At this point, the ANOVA, designed to compare three or more means, seems like a worthy candidate. ANOVA requires randomly selected sample data, and this requirement was met. The distributions of expenditures appear to approximate the normal curve, and their variances are approximately equal. Based on this information, we select the ANOVA. If the ANOVA results in significance, the Scheffe post hoc analysis will be used to identify which pairs of means contributed to the significant F value.

Research Question and Hypothesis

The researcher’s idea, and reason for conducting this investigation, is the belief that there is a significant difference in maintenance expenditures for the five hiking trails. Recall that the alternative hypothesis (HA) is simply a restatement of the researcher’s idea. We write the following expression for the alternative hypothesis:

HA1: One or more of the five hiking trails have unequal maintenance expenditures.

The null hypothesis (H0) states the opposite and is written as

H01: µ1 = µ2 = µ3 = µ4 = µ5.

The expression H01 states that there are no differences between the means of the populations. The MFS researcher would prefer to reject the null hypothesis, which would then provide statistical evidence for the idea that maintenance expenditures for the five trails are indeed significantly different.

If there is evidence of overall significance, leading to the rejection of the null hypothesis (H01), the researcher would then wish to identify which of the five groups are different and which are equal. The following null and alternative hypotheses will facilitate that task.

If the null hypothesis (H01) is rejected, then the following null hypotheses should be tested:

H02: µ1 = µ2, H03: µ1 = µ3, H04: µ1 = µ4, H05: µ1 = µ5, H06: µ2 = µ3, H07: µ2 = µ4, H08: µ2 = µ5, H09: µ3 = µ4, H010: µ3 = µ5, H011: µ4 = µ5.

The alternative hypotheses for these new null hypotheses follow:

HA2: µ1 ≠ µ2, HA3: µ1 ≠ µ3, HA4: µ1 ≠ µ4, HA5: µ1 ≠ µ5, HA6: µ2 ≠ µ3, HA7: µ2 ≠ µ4, HA8: µ2 ≠ µ5, HA9: µ3 ≠ µ4, HA10: µ3 ≠ µ5, HA11: µ4 ≠ µ5
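The ten pairs above are simply every way of choosing two trails out of five, k(k − 1)/2 = 10 for k = 5. As a small illustrative sketch (the variable names are assumptions made here, and the labels follow the numbering used in this section), the pairwise hypotheses can be generated rather than typed out:

```python
from itertools import combinations

k = 5  # number of trails in the example
pairs = list(combinations(range(1, k + 1), 2))

# Label the pairwise hypotheses 2, 3, ... so they continue after the overall H01/HA1.
for number, (i, j) in enumerate(pairs, start=2):
    print(f"H0{number}: µ{i} = µ{j}    HA{number}: µ{i} ≠ µ{j}")

print(len(pairs), "pairwise comparisons")
```

Generating the list this way guards against skipped or duplicated pairs when the number of groups grows.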

Sample Output

Interpretation:

The most important thing to note in the ANOVA table is that SPSS found a significant difference between at least one of the 10 pairs of means. As in the prior tests for differences between means, significance is indicated by the small value (.000) shown in the Sig. column, which SPSS displays when p is less than .001. In other words, we now have statistical evidence in support of the researcher’s idea that the expenditures for these trails were not equal.

The Scheffe test is another test of significance, but this time the test compares each possible pair of means one at a time. Examination of the Scheffe post hoc analysis reveals that three mean comparisons are significantly different. First, there was an average difference of $10,167 per year in maintenance expenditures between the Eaton Canyon and Grizzly Flat trails. Second, there was an average difference of $12,167 between the Eaton Canyon and Rattlesnake trails. Finally, there was an average difference of $14,500 between the Eaton Canyon and Bailey Canyon trails. All these differences are statistically significant at the .05 level.
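For readers who want to see the logic behind a Scheffe pairwise comparison, the following is a rough sketch under stated assumptions, not a reproduction of the SPSS procedure. For each pair, the squared mean difference is divided by the within-group mean square times (1/ni + 1/nj), and the result is declared significant when it exceeds (k − 1) times the critical F value. The function name scheffe_pairs, the made-up data, and the tabled critical value passed in as f_crit are all illustrative; none of the numbers come from the trail study.

```python
from itertools import combinations
from statistics import mean


def scheffe_pairs(groups, f_crit):
    """Flag each pair of groups as significant or not under the Scheffe criterion.

    groups: dict mapping a group name to its list of observations.
    f_crit: critical F for (k - 1, N - k) degrees of freedom at the chosen alpha,
            looked up from an F table or a statistics package.
    """
    k = len(groups)
    n_total = sum(len(v) for v in groups.values())
    # Mean square within: the error term from the one-way ANOVA.
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    ms_within = ss_within / (n_total - k)

    results = {}
    for a, b in combinations(groups, 2):
        diff = mean(groups[a]) - mean(groups[b])
        # F statistic for the contrast between the two group means.
        f_pair = diff ** 2 / (ms_within * (1 / len(groups[a]) + 1 / len(groups[b])))
        # Scheffe criterion: compare against (k - 1) times the critical F.
        results[(a, b)] = (diff, f_pair > (k - 1) * f_crit)
    return results


# Made-up data (hypothetical, NOT the trail expenditures). With 3 groups of 3,
# df = (2, 6), and the tabled critical F at alpha = .05 is about 5.14.
data = {"A": [4, 5, 6], "B": [5, 6, 7], "C": [10, 11, 12]}
results = scheffe_pairs(data, f_crit=5.14)
for (a, b), (diff, significant) in results.items():
    print(f"{a} vs {b}: mean difference = {diff}, significant = {significant}")
```

Because the threshold is scaled by (k − 1), the Scheffe criterion is conservative, so a pair needs a fairly large difference to be flagged.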

Summarizing our findings, we may state that the maintenance expenditures for these five mountain trails are significantly different. There is now statistical support for the researcher’s hypothesis as well as additional details regarding which of the trail pairs are significantly different.


Inferential Statistical Analysis (Chapter - 5: Analysis of Variance (ANOVA))