The two-way ANOVA (Analysis of Variance) is a widely used analytical technique in statistics. It is designed to analyze the effects of two independent categorical variables on a continuous dependent variable. In contrast, the one-way ANOVA evaluates the effect of only one independent variable. Including a second factor allows researchers to assess each variable's effect as well as whether the two variables interact, giving a more complete picture of the data.
Definition: Two-way ANOVA
ANOVA stands for Analysis of Variance, a statistical method used to determine whether the means of two or more groups are significantly different from each other. A two-way ANOVA applies this comparison to groups defined by two categorical independent variables at once.
In the example used throughout this article, a two-way ANOVA is used to determine whether gender and age group significantly affect the average salary of employees.
When is a two-way ANOVA used?
A two-way ANOVA is appropriate when you have collected data on a continuous (quantitative) dependent variable at different levels of two categorical independent variables. The dependent variable must be a numerical measure of a characteristic or behavior, so that it can be averaged within each group to calculate a group mean.
Salary is a quantitative variable because it represents an amount of income. Salaries can be added up and divided by the number of employees to find the average salary per person.
A categorical variable represents a set of categories or groups. It can take on one of a limited number of values, called levels, which are often represented by labels or names. Male and female are levels of the categorical variable gender type. Age groups 1, 2, and 3 are levels of the categorical variable age group.
The function of the two-way ANOVA
The two-way ANOVA uses the F test to determine the statistical significance of the differences between groups. The F test is a group-wise comparison test: for each factor (and for the interaction, if it is included), it compares the variance among the group means to the residual variance within groups. A large ratio indicates that the group means differ more than would be expected by chance.
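Written as a formula, the comparison made by the F test looks roughly as follows. The symbols are notation introduced here for illustration (SS for a sum of squares, df for degrees of freedom, MS for a mean square); they are not taken from the article.

```latex
\[
F = \frac{MS_{\text{factor}}}{MS_{\text{residual}}},
\qquad
MS_{\text{factor}} = \frac{SS_{\text{factor}}}{df_{\text{factor}}},
\qquad
MS_{\text{residual}} = \frac{SS_{\text{residual}}}{df_{\text{residual}}}
\]
```

One such ratio is computed for each term in the model, so a two-way ANOVA with interaction yields three F values.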
In a two-way ANOVA with interaction, three null hypotheses can be tested:
- There is no significant difference between the means of the groups formed by the levels of factor 1.
- There is no significant difference between the means of the groups formed by the levels of factor 2.
- There is no interaction between factor 1 and factor 2, meaning the effect of one factor does not depend on the level of the other.
In contrast, a two-way ANOVA without interaction (an additive model) tests only whether each factor has a main effect on the dependent variable and assumes that the factors do not interact. In our average salary example, we can use a two-way ANOVA with interaction to test three pairs of hypotheses:
|Null hypothesis (H0)||Alternate hypothesis (Ha)|
|There is no difference in average salary for any gender type||There is a difference in average salary by gender type|
|There is no difference in average salary at any age bracket||There is a difference in average salary by age bracket|
|The effect of one independent variable on average salary does not depend on the level of the other independent variable (a.k.a. no interaction effect)||There is an interaction effect between age group and gender type on average salary|
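The three null hypotheses in the table can also be stated against a model equation. The following is a sketch in common textbook notation, which is an assumption here and does not appear in the article: i indexes gender type, j indexes age group, k indexes the individual employee, and y is salary.

```latex
\[
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
\]
\begin{align*}
H_0^{\text{gender}} &: \alpha_i = 0 \quad \text{for all } i \\
H_0^{\text{age}} &: \beta_j = 0 \quad \text{for all } j \\
H_0^{\text{interaction}} &: (\alpha\beta)_{ij} = 0 \quad \text{for all } i, j
\end{align*}
```

In this notation, the model without interaction simply drops the $(\alpha\beta)_{ij}$ term.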
Two-way ANOVA assumptions
A two-way ANOVA makes several assumptions about the data and the statistical model that must be met for the results to be reliable and valid. These are:
- Homogeneity of variance: The variance of the dependent variable should be roughly equal across all groups. If your data do not meet this assumption, consider a non-parametric alternative such as the Kruskal-Wallis test, which compares groups on one factor at a time.
- Independence of observations: The observations should be independent of each other. This means that the values of the dependent variable in one group should not be related to the values in any other group.
- Normally-distributed dependent variable: The data within each group should follow a normal distribution. This can be checked using normal probability plots or other tests of normality.
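The homogeneity and normality assumptions can be checked with functions from base R. The sketch below is only an illustration: it runs on simulated stand-in data, and the column names salary, gender, and age.group, the sample size, and all values are assumptions rather than the article's data set. Bartlett's test and the Shapiro-Wilk test are one possible choice of checks, not necessarily the ones the author used.

```r
# Simulated stand-in data (hypothetical values, not the article's data set)
set.seed(1)
worker.data <- expand.grid(gender = factor(c("female", "male")),
                           age.group = factor(1:3),
                           replicate = 1:20)
worker.data$salary <- 40000 + 4000 * as.integer(worker.data$age.group) +
  rnorm(nrow(worker.data), sd = 5000)

# Fit the model first, because normality is judged on its residuals
fit <- aov(salary ~ gender * age.group, data = worker.data)

# Homogeneity of variance across the 2 x 3 = 6 gender-by-age cells
homogeneity <- bartlett.test(salary ~ interaction(gender, age.group),
                             data = worker.data)

# Normality of the model residuals
normality <- shapiro.test(residuals(fit))

print(homogeneity$p.value)
print(normality$p.value)
```

A small p-value from either test suggests that the corresponding assumption may be violated. Bartlett's test is itself sensitive to non-normal data, so a visual check such as plot(fit, which = 2) for a normal Q-Q plot is a useful complement.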
Conducting a two-way ANOVA
The dataset from our income experiment includes observations of:
- Income (average salary per person)
- Gender type (male, female)
- Age group (1 = 18-30, 2 = 31-50, 3 = 51 and above)
- Industry (1, 2, 3, 4)
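The article does not include the data set itself, so the following is a minimal sketch that simulates a stand-in worker.data data frame with the four variables listed above. The column names (salary, gender, age.group, block), the number of employees per cell, and every simulated value are assumptions made purely for illustration.

```r
# Build a balanced design: 2 genders x 3 age groups x 4 industries x 5 employees
set.seed(1)
worker.data <- expand.grid(gender = factor(c("female", "male")),
                           age.group = factor(1:3),  # 1 = 18-30, 2 = 31-50, 3 = 51+
                           block = factor(1:4),      # industry, used as a block
                           replicate = 1:5)

# Hypothetical salaries: a base amount, an age-group effect, and random noise
worker.data$salary <- 40000 + 4000 * as.integer(worker.data$age.group) +
  rnorm(nrow(worker.data), sd = 5000)

# Confirm that the grouping variables are factors, not numbers
str(worker.data)
```

Storing age group and industry as factors matters: if they were left as the numbers 1 to 4, aov() would treat them as continuous predictors rather than as categories.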
Two-way ANOVA in R
The two-way ANOVA will test whether the independent variables (gender type and age group) affect the dependent variable (average salary). But there are some other possible sources of variation in the data that we want to take into account.
After loading the data into the R environment, we will create each of the three models using the aov() function, and then compare them using the aictab() function from the AICcmodavg package.
Two-way ANOVA R code
two.way <- aov(salary ~ gender + age.group, data = worker.data)
In the second model, to test whether the interaction of gender and age group influences salary, use * instead of + between the two factors. This fits both main effects and their interaction.
Two-way ANOVA with interaction R code
interaction <- aov(salary ~ gender * age.group, data = worker.data)
Because our workers were randomized within industries, we add industry (the block variable) as a blocking factor in the third model. We can then compare our two-way ANOVAs with and without the blocking variable to see whether the industry matters.
Two-way ANOVA with blocking R code
blocking <- aov(salary ~ gender * age.group + block, data = worker.data)
We can use the Akaike information criterion (AIC) to identify the best-fit model, that is, the model that explains the most variation using the fewest parameters. The aictab() function performs the model comparison.
AIC R Sample code
library(AICcmodavg)
model.set <- list(two.way, interaction, blocking)
model.names <- c("two.way", "interaction", "blocking")
aictab(model.set, modnames = model.names)
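If the AICcmodavg package is not available, a similar comparison can be sketched with the AIC() function from base R. This is a swapped-in alternative, not the article's method: by default aictab() reports the small-sample corrected AICc, whereas AIC() reports the plain AIC. The data below are simulated, and all column names and values are assumptions for illustration.

```r
# Simulated stand-in data (hypothetical values, not the article's data set)
set.seed(1)
worker.data <- expand.grid(gender = factor(c("female", "male")),
                           age.group = factor(1:3),
                           block = factor(1:4),
                           replicate = 1:5)
worker.data$salary <- 40000 + 4000 * as.integer(worker.data$age.group) +
  rnorm(nrow(worker.data), sd = 5000)

# The three candidate models described in the text
two.way     <- aov(salary ~ gender + age.group, data = worker.data)
interaction <- aov(salary ~ gender * age.group, data = worker.data)
blocking    <- aov(salary ~ gender * age.group + block, data = worker.data)

# One row per model: df counts the estimated parameters, lower AIC is better
aic.table <- AIC(two.way, interaction, blocking)
aic.table <- aic.table[order(aic.table$AIC), ]
print(aic.table)
```

The model in the first row of the sorted table has the lowest AIC. Differences of less than about 2 AIC units are usually read as weak evidence for preferring one model over another.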
Two-Way ANOVA – Result interpretation
The output looks like this:
|Df||Sum Sq||Mean Sq||F value||Pr(>F)|
|Signif. codes:||0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1|
The model can be interpreted using the following columns:
- Df displays the degrees of freedom for each variable, which is equal to the number of levels in the variable minus 1.
- Sum Sq represents the sum of squares, which is the variation between the group means created by the levels of the independent variable and the overall mean.
- Mean Sq refers to the mean sum of squares, which is calculated by dividing the sum of squares by the degrees of freedom.
- F value is the test statistic from the F test, which is the mean square of the variable divided by the mean square of the residuals.
- Pr(>F) is the p-value of the F statistic, which gives the probability of obtaining an F value at least as large as the one observed if the null hypothesis of no difference were true.
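The relationships between these columns can be verified by recomputing Mean Sq, F value, and Pr(>F) from Df and Sum Sq alone. The sketch below does this on simulated stand-in data; the column names and all values are assumptions for illustration, not the article's data set.

```r
# Simulated stand-in data (hypothetical values, not the article's data set)
set.seed(1)
worker.data <- expand.grid(gender = factor(c("female", "male")),
                           age.group = factor(1:3),
                           replicate = 1:20)
worker.data$salary <- 40000 + 4000 * as.integer(worker.data$age.group) +
  rnorm(nrow(worker.data), sd = 5000)

two.way <- aov(salary ~ gender + age.group, data = worker.data)

# The ANOVA table: row 1 = gender, row 2 = age.group, row 3 = Residuals
tab <- summary(two.way)[[1]]
print(tab)

# Mean Sq = Sum Sq / Df, for every row
mean.sq <- tab[["Sum Sq"]] / tab[["Df"]]

# F value = mean square of each factor / mean square of the residuals
f.value <- mean.sq[1:2] / mean.sq[3]

# Pr(>F) = upper-tail probability of the F distribution
p.value <- pf(f.value, df1 = tab[["Df"]][1:2], df2 = tab[["Df"]][3],
              lower.tail = FALSE)

print(cbind(f.value, p.value))
```

The hand-computed values should match the F value and Pr(>F) columns printed by summary().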
ANOVA only shows which factors are significant, not which levels of a factor differ from each other, so a post-hoc test is needed to find out. We use Tukey's Honestly Significant Difference (Tukey HSD) test, available in R as TukeyHSD(), as shown below:
Tukey R code
TukeyHSD(two.way)
The output looks like this:
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = salary ~ gender + age.group, data = worker.data)
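To show what the full Tukey output contains, here is a minimal sketch that runs TukeyHSD() on simulated stand-in data. The column names and all values are assumptions for illustration, so the numbers it prints are not the article's results.

```r
# Simulated stand-in data (hypothetical values, not the article's data set)
set.seed(1)
worker.data <- expand.grid(gender = factor(c("female", "male")),
                           age.group = factor(1:3),
                           replicate = 1:20)
worker.data$salary <- 40000 + 4000 * as.integer(worker.data$age.group) +
  rnorm(nrow(worker.data), sd = 5000)

two.way <- aov(salary ~ gender + age.group, data = worker.data)

# Pairwise comparisons of the levels within each factor
tukey.two.way <- TukeyHSD(two.way, conf.level = 0.95)
print(tukey.two.way)

# One matrix per factor; each row is a pair of levels such as "2-1"
print(tukey.two.way$age.group)
```

Each row reports the difference in means between two levels (diff), the lower and upper bounds of the confidence interval (lwr, upr), and the adjusted p-value (p adj). A pair of levels differs significantly when its interval excludes zero.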
Two-way ANOVA – Result presentation
The following shows an example of a potential discussion of the results.
A one-way ANOVA is used to test for differences between two or more groups on a single independent variable, whereas a two-way ANOVA is used to test for differences between groups on two independent variables, and their interaction effect on a single dependent variable.
ANOVA is typically used when you want to determine if there is a significant difference between the means of two or more groups.
The assumptions of ANOVA include normality, homogeneity of variances, and independence of observations.
The results of a two-way ANOVA test are typically reported as F-statistics, with a corresponding p-value that indicates the statistical significance of the differences between the means of the groups being compared.