# Factor Analysis

Suppose you have collected data on a large number of variables for your subjects, and you want to summarize that information in a smaller set of features or factors. You might use this summarized information to create a more manageable collection of variables on which to do further analysis. You might also be interested in obtaining information on underlying latent variables (or factors) that cannot be directly measured.

For example, Gardner (2006) identifies several types of intelligence, two of which are Logical-Mathematical Intelligence (Number/Reasoning Smart) and Linguistic Intelligence (Word Smart). Each of these types of intelligence encompasses a variety of attributes. Someone who is Number/Reasoning Smart is good at performing calculations, using symbolic thought, reasoning logically, and so on. Certainly there is no single variable or measurement that can quantify a person’s “Number/Reasoning Smartness.” Quantifying it would require collecting data on a large number of variables or questions that measure various aspects of this type of intelligence. You might consider a study in which you collect many variables measuring a variety of aspects of “intelligence,” either to arrive at your own classification of intelligences or to see whether you can obtain measurements on the underlying intelligence factors identified by Gardner. SPSS provides a variety of tools (under the general heading of “Dimension Reduction”) that allow us to perform these types of analyses. The tool we will discuss in this chapter is Factor Analysis.

Factor Analysis can be divided into two approaches: Exploratory Factor Analysis and Confirmatory Factor Analysis.

Exploratory factor analysis, as the name suggests, involves techniques for examining a data set in order to identify factors or latent variables and to see which variables contribute the most information.

Confirmatory factor analysis (which we will not discuss) involves a set of techniques for testing hypotheses in order to, again as the name suggests, confirm a theory about the factor structure.

### Appropriate Applications of Factor Analysis

### Design Considerations for Factor Analysis

#### Appropriate Sample Size

Several authors have studied the question of what sample size is needed for you to feel comfortable that your results generalize well. Some authors, e.g., Tabachnick and Fidell (2013), recommend 300 observations but would be comfortable with 150 in some situations. Other authors suggest that the overall sample size matters less than the ratio of the number of observations to the number of variables. Sample size recommendations for obtaining stable estimates range from at least 5 observations per variable and at least 100 overall, to 10–20 observations per observed variable.
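These rules of thumb are easy to check for a planned study. The following is a minimal sketch, not an SPSS feature: the function name `sample_size_checks` and the rule labels are made up for illustration, and the figures of 200 observations on 6 variables are simply example inputs.

```python
def sample_size_checks(n_obs, n_vars):
    """Compare a design against the rules of thumb quoted above.

    The function name and dictionary keys are illustrative only.
    """
    ratio = n_obs / n_vars  # observations per observed variable
    return {
        "ratio": ratio,
        "at_least_300": n_obs >= 300,
        "at_least_150": n_obs >= 150,
        "5_per_var_and_100_overall": ratio >= 5 and n_obs >= 100,
        "10_per_var": ratio >= 10,
        "20_per_var": ratio >= 20,
    }


# Example inputs: 200 observations on 6 observed variables.
checks = sample_size_checks(n_obs=200, n_vars=6)
for rule, value in checks.items():
    print(rule, value)
```

A design can satisfy the ratio-based rules while falling short of the stricter overall-size recommendation, which is why it helps to look at all of them together.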

#### Sufficient Correlation Structure Within the Data Set

Recall that factor analysis is a collection of techniques for examining the correlations among the variables in your data set in order to identify factors or underlying latent variables. Consequently, if correlations among all of your observed variables are small, then there is little hope that factor analysis will provide information on underlying latent factors. SPSS provides two tools for assessing the sufficiency of the correlation structure: Bartlett’s test of sphericity and the Kaiser-Meyer-Olkin measure of sampling adequacy.
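To make these two diagnostics concrete, here is a rough sketch of how they can be computed from a correlation matrix using their standard textbook formulas. It is not SPSS's own implementation. The function names and the simulated data are assumptions made for illustration; only NumPy is required. A p-value for Bartlett's statistic would come from the upper tail of a chi-square distribution with the returned degrees of freedom.

```python
import numpy as np


def bartlett_sphericity(R, n_obs):
    """Bartlett's test of sphericity: returns (chi-square statistic, df).

    Tests the null hypothesis that the correlation matrix R is an identity matrix.
    """
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)  # log of the determinant of R
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi2, df


def kmo_overall(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)  # partial correlations between pairs
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)        # squared correlations
    q2 = np.sum(partial[off_diag] ** 2)  # squared partial correlations
    return r2 / (r2 + q2)


# Simulated data (illustrative only): two latent factors, three variables each.
rng = np.random.default_rng(0)
n_obs = 200
factors = rng.normal(size=(n_obs, 2))
pattern = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
X = factors @ pattern.T + 0.6 * rng.normal(size=(n_obs, 6))

R = np.corrcoef(X, rowvar=False)
chi2, df = bartlett_sphericity(R, n_obs)
kmo = kmo_overall(R)
print("Bartlett chi-square:", round(chi2, 2), "df:", df)
print("KMO:", round(kmo, 3))
```

A large chi-square statistic relative to its degrees of freedom, together with a KMO value well above 0.5, suggests the correlation structure is strong enough for factor analysis.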

# Hypothetical Example


### Research Scenario and Hypothesis

The data set INTEL.SAV contains a synthetically constructed set of intelligence data collected on 200 (hypothetical) subjects. Six variables were constructed to measure either Word Smartness or Numerical/Reasoning Smartness. These variables are all measured on a scale from 0 to 10 and are considered to be scores on the following measurements or tests: Vocabulary, Writing, Grammar, Computation, Inference, and Reasoning.

We will use the techniques of factor analysis to determine whether the two components of intelligence, Word Smartness and Numerical/Reasoning Smartness, can be extracted from the data.

# Sample Output

### Interpretation:

Based on these preliminary observations, we would expect an exploratory factor analysis procedure to detect two underlying factors. The KMO and Bartlett’s test results in the output indicate that it is reasonable to run a factor analysis on the data.
The KMO value should be greater than 0.5 before proceeding with factor analysis and interpreting the results. Here it exceeds that threshold, which shows that the sample is adequate for factor analysis.

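Bartlett's Test of Sphericity tests the null hypothesis that the correlation matrix is an identity matrix, that is, that the variables are uncorrelated. A significant result indicates that the variables are correlated strongly enough for factor analysis. In this case the p-value is reported as 0.000 (that is, p < 0.001), which is less than 0.05. The test is therefore significant, and we can conclude that correlations (collinearity) exist among the variables.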

It can be seen that the first principal component has a variance (eigenvalue) of 2.68, which is 44.672% of the total variance of 6 (one unit for each of the six standardized variables). The second principal component has an eigenvalue of 2.07, which is 34.497% of the total variance. Together, the first two principal components represent (or explain) 79.169% of the total variance. That means the two extracted factors retain about 79% of the information in the existing data, and we lose about 21% of the information in exchange for reducing the number of dimensions.
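The percentages follow directly from dividing each eigenvalue by the total variance. The short sketch below reproduces the arithmetic using the rounded eigenvalues quoted in the text, so its results agree with the SPSS percentages only to about two decimal places. The variable names are illustrative.

```python
# Rounded eigenvalues of the two retained components, as quoted in the text.
eigenvalues = [2.68, 2.07]

# With six standardized variables, each contributes variance 1, so the total is 6.
total_variance = 6

percent_explained = [100 * ev / total_variance for ev in eigenvalues]
cumulative_percent = sum(percent_explained)

for i, pct in enumerate(percent_explained, start=1):
    print(f"Component {i}: {pct:.2f}% of total variance")
print(f"Cumulative: {cumulative_percent:.2f}%")
print(f"Not retained: {100 - cumulative_percent:.2f}%")
```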

If we use a cut-off value of 0.5 for the factor loadings (the cut-off can be raised as required, but it should not be set below 0.5), the resulting rotated factor matrix is quite easily interpreted. Factor 1 appears to measure Word Smartness and Factor 2 appears to measure Numerical/Reasoning Smartness: Factor 1 has heavy loadings on the variables Vocabulary, Writing, and Grammar, while Factor 2 has large loadings on the variables Computation, Inference, and Reasoning. These factors make sense (and in fact that’s the way we generated the data).
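Applying a cut-off to a rotated factor matrix amounts to keeping, for each factor, the variables whose absolute loading reaches the threshold. The sketch below illustrates the idea. The loading values are hypothetical placeholders chosen to mimic the pattern described above, not the actual SPSS output, and the function name `group_by_factor` is made up for illustration.

```python
# Hypothetical rotated loadings (Factor 1, Factor 2); NOT the actual SPSS output.
HYPOTHETICAL_LOADINGS = {
    "Vocabulary": (0.90, 0.05),
    "Writing": (0.88, 0.02),
    "Grammar": (0.85, 0.10),
    "Computation": (0.04, 0.84),
    "Inference": (0.08, 0.80),
    "Reasoning": (0.03, 0.82),
}


def group_by_factor(loadings, cutoff=0.5):
    """List, for each factor, the variables whose |loading| meets the cut-off."""
    groups = {}
    for variable, row in loadings.items():
        for index, loading in enumerate(row, start=1):
            if abs(loading) >= cutoff:
                groups.setdefault(f"Factor {index}", []).append(variable)
    return groups


groups = group_by_factor(HYPOTHETICAL_LOADINGS, cutoff=0.5)
for factor, variables in groups.items():
    print(factor, "->", ", ".join(variables))
```

With a clean structure like this one, each variable lands on exactly one factor. In less tidy data a variable may exceed the cut-off on more than one factor, or on none, and the interpretation becomes a judgment call.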

So, finally, we have extracted two new factors from six variables. In this example the factors were named "Word Smartness" and "Numerical/Reasoning Smartness". **Note:** A factor can be given any name that represents the variables whose loadings on it exceed the chosen cut-off value.