Web Topic 7.1: Measuring Sex Differences

[Referenced on textbook p. 194]

Let’s say we’re investigating possible sex differences in verbal fluency (speed at producing words corresponding to a certain category, such as fruit). We administer a quantitative test of verbal fluency to a large number of men and women who, ideally, are drawn at random from the entire population of men and women, or from some defined subpopulation, such as college students. The distribution of results for each sex will approximately follow a bell-shaped curve (the normal distribution), but the means for the two distributions may differ (see Figure 1).

Figure 1  Difference in means. These distributions show the values for a trait in which females tend to score higher than males. In this example, the difference between the means (the vertical lines) is about 80% of one standard deviation, giving an effect size, d, of about 0.8.

How meaningful is this difference in the means? Traditional statistical tests measure the statistical significance of the difference, that is, how unlikely it would be to observe a difference this large through the chance of sampling alone if there were no real difference in the entire population. But even tiny differences may be statistically significant if the number of subjects is very large. If the two distributions overlap greatly, such tiny differences are not very meaningful in any practical sense.
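The dependence of significance on sample size can be illustrated with a short Python sketch. It assumes two equally sized groups, a standard deviation of 1 in both groups, and a large-sample z approximation for the test; the observed difference of 0.05 standard deviations and the group sizes are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def two_sided_p(d, n_per_group):
    """Approximate two-sided p-value for an observed standardized mean
    difference d between two groups of n_per_group subjects each
    (large-sample z approximation, SD assumed to be 1 in both groups)."""
    standard_error = math.sqrt(2.0 / n_per_group)
    z = abs(d) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(z))

tiny_d = 0.05  # hypothetical: a difference of 5% of one standard deviation
p_small_sample = two_sided_p(tiny_d, 100)
p_large_sample = two_sided_p(tiny_d, 100_000)
print(f"n = 100 per group:     p = {p_small_sample:.3f}")
print(f"n = 100,000 per group: p = {p_large_sample:.2e}")
```

The same observed difference is far from significant with 100 subjects per group and overwhelmingly significant with 100,000 per group, although its practical size has not changed.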

We therefore need a measure that relates the size of the difference in the means to the degree of dispersion within the two groups. A common measure of dispersion is the standard deviation. (Technically, this is the square root of the variance, which is the mean squared deviation of the individual values from the sample mean.) In a normally distributed population, about 68% of the individual values will fall within one standard deviation of the mean, and about 95% will fall within two standard deviations of the mean. The difference between the means of the two samples, divided by the pooled standard deviation, is called the effect size (d). Thus if d = 0.35, the difference between the means of the male and female samples is 35% of the pooled standard deviation.
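As a minimal sketch of this calculation in Python, the function below divides the difference in means by a pooled standard deviation. The scores are invented for illustration, and the pooling formula (sample variances with an n - 1 divisor, weighted by degrees of freedom) is one common convention, assumed here rather than taken from the text.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Effect size d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the n - 1 divisor (sample variance).
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical verbal fluency scores (words produced in one minute).
female_scores = [14, 16, 18, 20, 22]
male_scores = [12, 14, 16, 18, 20]

d = cohens_d(female_scores, male_scores)
print(f"d = {d:.2f}")
```

With these invented scores the means differ by 2 words and the pooled standard deviation is the square root of 10, so d is a little over 0.6.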

An effect size of 0.1 or less is considered a trivial or nonexistent difference. An effect size of 0.2 is considered small. An effect size of 0.5 is considered moderate—it corresponds, for example, to a physical difference that is quite evident to the naked eye, such as the difference in height between 14-year-old and 18-year-old girls. An effect size of 0.8 or greater is considered large.
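To connect these benchmarks to the amount of overlap between the two curves, the following Python sketch computes the shared area under two normal distributions whose means differ by d. It assumes both groups are normally distributed with the same standard deviation; the function name is illustrative.

```python
from statistics import NormalDist

def overlap_fraction(d):
    """Shared area under two unit-variance normal curves whose means differ by d."""
    return NormalDist(0.0, 1.0).overlap(NormalDist(d, 1.0))

for label, d in [("small", 0.2), ("moderate", 0.5), ("large", 0.8)]:
    print(f"{label:9s} d = {d:.1f}  overlap = {overlap_fraction(d):.0%}")
```

Under these assumptions, even a large effect size leaves well over half of the area under the two curves in common.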

To combine the results of all published studies on sex differences in verbal fluency, researchers would assign a weight to the d value for each study that reflects the statistical properties of that study (such as the number of subjects), and then calculate the mean for the weighted d values of all the studies. This process is called meta-analysis.
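A minimal Python sketch of the weighting step follows. It assumes the simplest scheme, in which each study's weight is its number of subjects; published meta-analyses commonly use inverse-variance weights instead. The three studies and their values are hypothetical.

```python
# Hypothetical studies: (effect size d, total number of subjects).
studies = [(0.30, 50), (0.40, 150), (0.20, 300)]

def weighted_mean_d(studies):
    """Mean of the d values, each weighted by its study's sample size."""
    total_weight = sum(n for _, n in studies)
    return sum(d * n for d, n in studies) / total_weight

pooled_d = weighted_mean_d(studies)
unweighted_d = sum(d for d, _ in studies) / len(studies)
print(f"weighted mean d = {pooled_d:.2f}, unweighted mean d = {unweighted_d:.2f}")
```

Because the largest hypothetical study reports the smallest effect, the weighted mean falls below the simple average of the three d values.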

If you inspect the curves in the figure, you will see that the ratio of females to males becomes increasingly unbalanced toward either end of the combined distribution: most of the extreme values at one end are female and most of the extreme values at the other end are male. Thus even when the d value is only moderate, processes that select individuals with extreme characteristics may turn up highly unbalanced numbers of males and females. In real life, such selection processes could include the hiring of people with outstanding abilities in a particular field, the imposition of the death penalty on extremely aggressive individuals, and the like.
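The growth of this imbalance toward the tails can be checked numerically. The Python sketch below assumes equally sized male and female groups, both normally distributed with a standard deviation of 1, and a hypothetical moderate effect size of 0.5 favoring females.

```python
from statistics import NormalDist

D = 0.5  # hypothetical moderate effect size, females scoring higher
female = NormalDist(mu=D, sigma=1.0)
male = NormalDist(mu=0.0, sigma=1.0)

def female_to_male_ratio_above(cutoff):
    """Ratio of females to males scoring above cutoff, assuming equal group sizes."""
    return (1.0 - female.cdf(cutoff)) / (1.0 - male.cdf(cutoff))

for cutoff in (0.0, 1.0, 2.0, 3.0):
    print(f"cutoff {cutoff:.0f} SD above the male mean: "
          f"{female_to_male_ratio_above(cutoff):.2f} females per male")
```

The ratio rises steadily as the cutoff moves further into the upper tail, and by symmetry the lower tail is increasingly dominated by males.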

A second aspect of the study of sex differences that can be aided by statistical analysis is the search for common cognitive or psychological structures underlying data from a variety of different tests. Let’s say that you have tested men and women on measures of mathematical reasoning, rapid calculation, visuospatial problem solving, number recall, verbal reasoning, verbal fluency, and word recall. Do these seven tests tap into seven independent psychological traits, or into a smaller number of underlying structures? And if the latter, what are these structures?

Such questions can be tackled by means of factor analysis, of which there are several different varieties. Briefly, your computer looks at all possible pairs of the tests (mathematical reasoning vs. rapid calculation, mathematical reasoning vs. number recall, etc.). For each pair, it measures the correlation between the scores on the two tests—that is, the extent to which an individual’s score on one test predicts their score on the other test. It thereby generates a matrix of correlations for all the pairs. From the matrix, the computer derives a number of independent (uncorrelated) factors, each of which accounts for part of the total interpair correlation. If there are indeed common underlying structures, there will be fewer factors than tests, and each factor will derive most of its loading (that is, its dependence on the interpair correlations) from some subset of the tests.
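The first step described above, building the matrix of pairwise correlations, can be sketched in Python as follows. The data are simulated under an assumed structure (four tests driven by two independent latent abilities plus noise), and the test names are shortened for display. Extracting the factors from the matrix requires an eigendecomposition or a dedicated factor-analysis routine, which this sketch does not attempt.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: 500 subjects whose scores on four tests are driven by
# two independent latent abilities plus test-specific noise.
rng = random.Random(0)
scores = {"math reasoning": [], "rapid calculation": [],
          "verbal reasoning": [], "verbal fluency": []}
for _ in range(500):
    math_ability, verbal_ability = rng.gauss(0, 1), rng.gauss(0, 1)
    scores["math reasoning"].append(math_ability + rng.gauss(0, 0.5))
    scores["rapid calculation"].append(math_ability + rng.gauss(0, 0.5))
    scores["verbal reasoning"].append(verbal_ability + rng.gauss(0, 0.5))
    scores["verbal fluency"].append(verbal_ability + rng.gauss(0, 0.5))

# Correlation for every pair of tests, stored as a dictionary keyed by test names.
tests = list(scores)
corr = {(a, b): pearson(scores[a], scores[b]) for a in tests for b in tests}

for a in tests:
    print(f"{a:18s}", "  ".join(f"{corr[(a, b)]:5.2f}" for b in tests))
```

In the printed matrix the two mathematical tests correlate strongly with each other, as do the two verbal tests, while correlations across the two groups stay near zero. A block pattern of this kind is what leads factor analysis to report fewer factors than tests.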

Taking the seven tests described, it might be that one factor derives most of its loading from “mathematical reasoning,” “rapid calculation,” and “number recall.” One might call this factor “general mathematical ability.” A second factor might be “general verbal ability,” and a third might be “general visuospatial ability.” Alternatively, one factor might derive most of its loading from “mathematical reasoning,” “verbal reasoning,” and “visuospatial problem solving.” One might call this factor “general reasoning ability.” In this case, a second factor might be “general fluency” and a third might be “general memory ability.”

The value of this approach is that it gets away from presuppositions about how mental skills are grouped, and instead allows the groupings to emerge from the data themselves. Once the factors have been identified, it is possible to calculate sex differences and effect sizes for the underlying factors rather than for the individual tests. Another benefit of factor analysis is that it facilitates the design of future test batteries so as to provide the greatest possible information about the underlying factors.