A useful approach to answering the Next Generation Science Standards’ call for teaching students to demonstrate understanding using mathematical representations is use of the Hardy-Weinberg equilibrium (H-W eq). This article is focused on the meaning of H-W eq and its application, rather than mathematical manipulation. Typical textbook problems are critiqued, and a model problem is presented.
One of the more difficult topics for introductory biology students to understand and for teachers to teach is the Hardy-Weinberg equilibrium (H-W eq) principle. One reason for this difficulty is the students’ mathematical background. More problematic than lack of manipulative skill1 is the difficulty of understanding why the principle is true and understanding how the principle applies to specific populations or, more importantly, the value of its application.
The H-W eq principle is, of course, the cornerstone of introductory population genetics and is therefore an important part of understanding evolution, as is recognized in most science standards. For example, “adaptation by natural selection” is one of the “Core Ideas” in the Next Generation Science Standards (NGSS), focused on how the distribution of traits in a population changes (NGSS Lead States, 2014). Likewise, the NGSS target the ability to “use mathematical representations to support scientific conclusions and design solutions.”
Some NGSS objectives2 address H-W eq more explicitly:
Students who demonstrate understanding can use mathematical representations to support explanations of how natural selection may lead to increases and decreases of specific traits in populations over time. (MS-LS4-6)
Given its importance and the well-recognized difficulty in teaching and learning it in accord with these standards, we here explain basic concepts of H-W eq, emphasizing distinctions that are sometimes ignored at the cost of coherent understanding.
What Is Hardy-Weinberg Equilibrium?
In general, a system is said to be in equilibrium if all competing influences are balanced. In the body, for example, we speak of homeostasis as the ability to maintain the internal equilibrium regardless of changes in the environment (e.g., temperature). A basic precept of evolution is that under certain conditions the frequencies of genotypes (and therefore of alleles) do not change (they remain at “equilibrium”), but in the absence of these conditions the frequencies do change.
Definition: A population is in Hardy-Weinberg equilibrium if the genotype frequencies and allele frequencies are the same in each generation at birth.
Consider the simplest situation of a monogenic Mendelian trait: a pair of alleles, one dominant A and the other recessive a, within a population of n individuals. The frequency of the A allele is the number of A alleles divided by the total number of alleles at this locus within the population (2× the number of individuals). For example, if n = 4000, and 2000 of the alleles of this locus in a population are a, the frequency of the A allele is 3/4 and that of the a allele (i.e., the remaining non-A alleles) is therefore 1/4. These would be the allele frequencies if there are 1000 aa and 3000 AA individuals (or 500 aa, 1000 Aa, and 2500 AA). But if these individuals randomly mate,3 the next generation has the same allele frequency but a genotype frequency of 1/16 aa, 3/8 Aa, and 9/16 AA. If a finite population is at H-W eq, however, both the genotype and the allele frequencies will be essentially the same in subsequent generations.
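The counting in this example can be sketched in a few lines of Python. This is only an illustration: the function names `allele_frequencies` and `random_mating_genotypes` are hypothetical, and exact fractions are used so the results match the values quoted above.

```python
from fractions import Fraction

def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q): frequencies of A and a, counted directly from genotype counts."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)   # each individual carries two alleles
    p = Fraction(2 * n_AA + n_Aa, total_alleles)
    return p, 1 - p

def random_mating_genotypes(p, q):
    """Genotype frequencies (AA, Aa, aa) expected at birth after random mating."""
    return p * p, 2 * p * q, q * q

# The two populations from the text: same allele frequencies, different genotypes.
p1, q1 = allele_frequencies(3000, 0, 1000)
p2, q2 = allele_frequencies(2500, 1000, 500)
print(p1, q1, p2, q2)

# Either population, mated at random, gives the same next-generation genotypes.
print(random_mating_genotypes(p1, q1))
```

Both starting populations give p = 3/4 and q = 1/4, and random mating then yields genotype frequencies of 9/16, 3/8, and 1/16 for AA, Aa, and aa.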
Requirements for Hardy-Weinberg Equilibrium
Below, we will demonstrate that a population is in H-W eq if the following conditions hold (with respect to a particular gene):
1. There is no migration (“gene flow”) in or out of the population.
2. Natural selection is not occurring.
3. Mutation is not occurring.
4. Each member of the population is equally likely to breed.
5. The population is infinitely large.
6. Fully random mating: each pair from the population is equally likely to breed. (This is not the case when females of a species often prefer males with certain traits. Examples originally identified by Darwin include peacock feather displays, antlers in deer, and the manes of lions.)4
Observation 1. As long as a population satisfies biological conditions 1–5, the allele frequencies (p and q) are the same in each generation.
Why is this so? Conditions 1 and 2 guarantee that there is no change in the allele frequencies between the birth and maturity of the next generation; there are no unaccounted forces that would change the allele frequencies (i.e., one phenotype is not more fit than the other). Conditions 3 and 4 guarantee that at birth, the pool of alleles in the next generation is the same as in the current generation; mating just reshuffles the alleles; the allele frequencies remain the same.
The population needs to be infinite to guarantee that the frequencies remain exactly p and q. The probabilities p and q represent averages over many trials, so they hold only approximately in any particular trial on a finite population. (A change in allele frequencies can be caused by “genetic drift” or a “bottleneck.”) Of course, no population is truly infinite; therefore, condition 5 can never be strictly met. If a population is large enough, however, it is considered “effectively infinite.”5
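The role of population size can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not a method from this article: each generation is formed by drawing 2N alleles at random from the previous generation's pool (a Wright-Fisher-style resampling), with all other conditions taken to hold. The function names and the seed are hypothetical choices.

```python
import random

def next_generation_frequency(p, n_individuals, rng):
    """Draw 2N alleles at random from a pool with A-frequency p; return the new frequency."""
    n_alleles = 2 * n_individuals
    count_A = sum(1 for _ in range(n_alleles) if rng.random() < p)
    return count_A / n_alleles

def simulate_drift(p0, n_individuals, generations, seed=0):
    """Track the frequency of A across generations when only chance sampling acts."""
    rng = random.Random(seed)          # fixed seed so a run can be repeated
    history = [p0]
    for _ in range(generations):
        history.append(next_generation_frequency(history[-1], n_individuals, rng))
    return history

small = simulate_drift(0.75, n_individuals=20, generations=50)
large = simulate_drift(0.75, n_individuals=5000, generations=50)
print("N = 20:  ", [round(x, 2) for x in small[::10]])
print("N = 5000:", [round(x, 2) for x in large[::10]])
```

In runs of this kind the small population typically wanders far from 0.75, while the large one stays close to it, which is the sense in which a large population is "effectively infinite."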
Likewise, the other assumptions are rarely, if ever, true of a given population (e.g., the mutation rate is rarely zero). Thus, H-W eq is largely a theoretical state, like a frictionless plane, an absolute vacuum, or travel at the speed of light. As with those concepts in physics, it nevertheless plays a fundamental conceptual role in biology and is a valuable tool for understanding evolution. The H-W principle has many applications in the modern practice of evolutionary biology, where its value often lies in identifying when H-W eq does not exist and then determining which factor (or combination of factors) most likely explains the observed change in allele/phenotype frequencies over time (i.e., what the drivers of evolution are in this population). Also, for evolution-neutral mutations, the population is often close enough to equilibrium to provide a tool for comparing their frequencies against the frequencies of linked genes of interest to determine how close the latter are to H-W eq (Chen, 2010).
History & Derivation of the Hardy-Weinberg Principle
Building on the work of other biologists and mathematicians, in 1908 Wilhelm Weinberg (1862–1937), a German obstetrician-gynecologist, and G. H. Hardy (1877–1947), a leading mathematician of his day, independently demonstrated the conditions required for genotype equilibrium (Figure 1). In a famous lecture earlier that same year, R. C. Punnett (Figure 2) had combined Mendelian genetics with natural selection (Edwards, 2008). After the talk, Udny Yule (Figure 3), one of the founders of modern statistics, asked whether a dominant–recessive allele pair would not eventually achieve a 3:1 ratio (Yule, 1908). (He was apparently assuming an initial frequency of 1/2 for each allele.) In 1902, Yule had shown that genotype frequencies would remain constant under random mating in the special case of a simple Mendelian trait with only two alleles of equal frequency (p = q = 1/2), although he failed to recognize that this fact holds for all initial allele frequencies (Edwards, 2008). Punnett's (1908) response, though not entirely apt, was a suggestion that a dominant allele should eventually drive the recessive out (which is not the case). Punnett later asked his friend Hardy about this question, prompting the analysis we now describe.
The first of two contributions of Hardy and Weinberg was to remove the restriction that p = q = 1/2. Let A and a represent the two possible alleles of a simple Mendelian trait; let p and q represent the frequencies of A and a, respectively, in the parent generation. Hardy and Weinberg argued that if every pair of individuals is equally likely to mate (condition 6), then the frequencies of the three possible genotypes at birth can be determined by thinking of a Punnett square but labeling the rows and columns with allele frequencies instead of alleles (Figure 4). This Punnett square demonstrates the crucial H-W insight: Under fully random mating, the frequency of AA homozygotes in the next generation is p², that of heterozygotes is 2pq, and that of aa homozygotes is q².
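Written as a worked equation, the frequency-labeled Punnett square is the expansion of a squared binomial. The genotype labels under each term are added here for illustration, and the only assumption is that A and a are the sole alleles, so that p + q = 1:

```latex
\[
(p + q)^2 \;=\; \underbrace{p^2}_{AA} \;+\; \underbrace{2pq}_{Aa} \;+\; \underbrace{q^2}_{aa} \;=\; 1
\]
% Example with p = 3/4 and q = 1/4:
\[
\tfrac{9}{16} + \tfrac{6}{16} + \tfrac{1}{16} = 1
\]
```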
Now we can deduce the Hardy-Weinberg Equilibrium Principle: Consider a population satisfying biological conditions 1–6. If, in a certain generation, the allele frequencies are p and q and the genotype frequencies are p², 2pq, q², then both the genotype and allele frequencies remain the same for as many generations as conditions 1–6 continue to hold.
Here is why this principle holds. When a population satisfies conditions 1–5, Observation 1 ensures that allele frequencies will remain unchanged in every succeeding generation that satisfies those conditions. Applying condition 6 and the crucial H-W insight, in each generation after the first,6 the genotype frequencies are p², 2pq, and q². This is the genius of the H-W principle: after one generation of fully random mating, both the genotype and allele frequencies are fixed until one of conditions 1–6 is violated.
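As a check on this argument, the allele frequency can be recomputed from the H-W genotype frequencies by counting alleles: every AA individual carries two copies of A and every Aa individual carries one. Here p′ and q′ denote the allele frequencies in the following generation; these symbols are introduced only for this check.

```latex
\[
p' \;=\; \frac{2\,p^2 + 2pq}{2} \;=\; p^2 + pq \;=\; p\,(p + q) \;=\; p,
\qquad
q' \;=\; 1 - p' \;=\; q .
\]
```

Because the allele frequencies come back unchanged, random mating in the following generation again produces genotype frequencies of p², 2pq, and q².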
In any population in which all three genotypes can be identified (incomplete/codominance [e.g., red, pink, and white flowers] in 1908 or alleles identified by DNA analysis in 2008), regardless of whether the population is at H-W eq or not, Mendelian genetics allows us to determine the allele frequencies from the genotype frequencies. Namely, in any population, the frequency of the dominant allele of a Mendelian pair is the sum of 2× the number of dominant homozygotes (AA) + the number of heterozygotes (Aa) divided by the total number of these alleles.
To summarize, (1) allele frequencies can always be computed from the genotype frequencies in the same generation (if all genotypes can be identified), but not vice versa; and (2) if the population is in H-W equilibrium, genotype frequencies in the current or the next generation can be computed from the current allele frequencies.
We have not yet mentioned the “H-W equations.” In fact, Hardy and Weinberg never mentioned them! These two equations are widely used in biology teaching, but all too often they are used as a mathematics exercise that does not promote understanding.
Interestingly, using only these formulae, we can determine whether H-W eq exists in a single generation of a population by determining whether the genotype distribution matches that predicted from the allele distribution, but this requires that both the allele frequencies and the genotype frequencies are known. Hardy (1908) provided an ingenious way to determine whether H-W eq exists in a single generation, given only the genotype frequencies. (For a short account of Hardy's proof in modern language accessible to advanced students, along with several other proofs of the H-W principle, see Baldwin, 2014.)
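One way to see how genotype frequencies alone can settle the question is the following sketch. Writing D, H, and R for the frequencies of AA, Aa, and aa (letters chosen here, not taken from the article), the genotypes are in H-W proportions exactly when H² = 4DR, which is how Hardy's condition is commonly restated. The function name and tolerance are hypothetical, and the exact comparison suits idealized frequencies only; sampled data would call for a statistical test.

```python
def in_hw_proportions(f_AA, f_Aa, f_aa, tol=1e-9):
    """Check the condition H^2 = 4*D*R on genotype frequencies that sum to 1."""
    if abs(f_AA + f_Aa + f_aa - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    # If D = p^2, H = 2pq, R = q^2, then H^2 = 4 p^2 q^2 = 4*D*R.
    return abs(f_Aa ** 2 - 4.0 * f_AA * f_aa) <= tol

print(in_hw_proportions(9/16, 3/8, 1/16))   # True: H-W proportions for p = 3/4
print(in_hw_proportions(0.75, 0.0, 0.25))   # False: same alleles, no heterozygotes
```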
Five Example Problems
The following textbook problems are built on the assumption that, if a population is in H-W eq (which is often a dubious assumption), then it is possible to calculate the allele frequencies from the frequency of the homozygous recessives (which can be found by observation). That is, under the assumption of H-W eq, if a fraction b of the population is homozygous recessive, then q² = b, so q = √b. And because p = 1 − q, then p = 1 − √b.
Problem 1
The data below demonstrate the frequency of tasters and non-tasters in an isolated population at H-W eq. The allele for non-tasters is recessive. How many of the tasters in the population are heterozygous for tasting?
| Tasters | Non-tasters |
| 8235 | 4325 |
Solution to Problem 1
An acceptable answer would be any number in the range of 6030–6156, depending on how the students rounded the variables in the H-W equation.
Comment on Problem 1
This is a standard H-W eq problem. The frequency of non-tasters (homozygous recessive individuals, q²) is 4325/12,560. Assuming that the population is in H-W eq, the frequency of the recessive allele (q) is computed as the square root of the frequency of homozygous recessive individuals (q²): q = √(4325/12,560); therefore, q = 0.58 and p = 1 − q = 0.42. Then the frequency of the heterozygotes (2pq) = 2(0.58)(0.42) = 0.4872, or about 0.49. This yields the number of heterozygotes as 0.4872 × 12,560 ≈ 6119.
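The spread of acceptable answers comes from where q is rounded. The sketch below, with a hypothetical helper `heterozygote_count`, repeats the calculation with q left unrounded and with q rounded to three and two decimal places, still assuming H-W eq as the problem states.

```python
from math import sqrt

def heterozygote_count(n_recessive, n_total, digits=None):
    """Expected number of heterozygotes under H-W eq; optionally round q first."""
    q = sqrt(n_recessive / n_total)    # q = sqrt(frequency of homozygous recessives)
    if digits is not None:
        q = round(q, digits)           # mimic a student rounding q before continuing
    p = 1 - q
    return 2 * p * q * n_total

for digits in (None, 3, 2):
    print(digits, round(heterozygote_count(4325, 12560, digits)))
```

Each of these results falls inside the 6030–6156 range; with q taken as 0.58, as in the comment above, the count is about 6119.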
Problem 2 (Trout, 2012)
“The ability to taste PTC is due to a single dominant allele T. You sampled 215 individuals and determined that 150 could detect the bitter taste of PTC and 65 could not. Calculate the following frequencies. a. The frequency of the recessive allele. b. The frequency of the dominant allele. c. The frequency of the heterozygous individuals.”
Solution to Problem 2 (from the Teacher's Guide)
a. The frequency of the recessive allele.
q² = 65/215 = 0.30; q = √0.30 = 0.55
b. The frequency of the dominant allele.
p = 1 − q; p = 1 − 0.55; p = 0.45
c. The frequency (x) of the heterozygous individuals is given by
x = 2pq = 2(0.45 × 0.55) = 0.495
Comments on Problems 1 and 2
Both problems are focused on making calculations that students can do without understanding what H-W eq is. In Problem 1, H-W eq is explicitly assumed so that the problem is technically correct. But it doesn't say what chemical was being tasted (presumably PTC), so it doesn't ask students whether H-W eq conditions could be met for this trait. And students can solve Problem 2 only by assuming H-W eq, which is not justified.
The ability to taste PTC (phenylthiocarbamide), a bitter substance that cannot be tasted by some individuals, is frequently used in H-W eq problems, likely because it is assumed to be selected neither for nor against, given that PTC does not occur in nature. Thus, the student is expected to deduce (or more likely assume) that the H-W conditions apply. Teachers and textbooks, however, rarely make this reasoning explicit, leaving students with the misperception that understanding PTC tasting is just a game or a puzzle that likely seems unimportant to them because it doesn't relate to their daily lives.
In fact, recent research has shown that the ability to taste PTC is strongly correlated with the ability to taste other bitter substances that do occur naturally, many of which are toxins. Humans have ~30 genes that code for bitter taste receptors, allowing people to taste a wide variety of bitter substances (Genetic Science Learning Center, 2014). Thus, it seems likely that the ability to taste bitter substances (such as PTC) is positively selected for.
Problem 3 (K-State Parasitology Laboratory, 2000)
Sickle-cell anemia is an interesting genetic disease. Normal homozygous individuals (SS) have normal blood cells that are easily infected with the malarial parasite. Thus, many of these individuals become very ill from the parasite and many die. Individuals homozygous for the sickle-cell trait (ss) have red blood cells that readily collapse when deoxygenated. Although malaria cannot grow in these red blood cells, individuals often die because of the genetic defect. However, individuals with the heterozygous condition (Ss) have some sickling of red blood cells, but generally not enough to cause mortality. In addition, malaria cannot survive well within these “partially defective” red blood cells. Thus, heterozygotes tend to survive better than either of the homozygous conditions. If 9% of an African population is born with a severe form of sickle-cell anemia (ss), what percentage of the population will be more resistant to malaria because they are heterozygous (Ss) for the sickle-cell gene?
Solution to Problem 3 (from the Teacher's Guide)
9% = 0.09 = ss = q². To find q, simply take the square root of 0.09 to get 0.3. Since p = 1 − 0.3, then p must equal 0.7. 2pq = 2(0.7 × 0.3) = 0.42 or 42% of the population are heterozygotes (carriers).
Comment on Problem 3
The solution above, which assumes H-W eq and that natural selection is not occurring with regard to this gene, contradicts the statement of the problem, which notes selective pressures for one and against another of two blood-cell phenotypes.
Problem 4
A more sophisticated version of this problem (Trout, 2012) states that sickle-cell disease affects ~9% of the African population and then asks the students to use the H-W equations to calculate the predicted genotype frequencies. The students are then asked, “Based on this analysis, is the African population in Hardy-Weinberg equilibrium? Justify your answer.”
Solution to Problem 4 (from the Teacher's Guide)
“No. Because the members of the population that contract sickle-cell because they are homozygous recessive will likely die before reproducing, the frequency of alleles in the population is not stable. There is natural selection taking place.”
Comments on Problem 4
Although this problem instructs students to use the H-W equations, again the known effects of natural selection at this locus mean that H-W eq is impossible. The problem therefore asks for what is, in fact, a meaningless calculation. Then it asks students to answer a question that demonstrates that the computation was meaningless but does not ask them to recognize that it was meaningless! The H-W eq-based frequencies are irrelevant. In fact, the decision of whether the population is in H-W eq is not “based on this analysis.” (Note: The response also states “contract sickle-cell.” One does not “contract” an inborn, genomic error such as sickle-cell.)
A second issue that arises in problems about sickle-cell anemia is that two opposing selective pressures are at work – a positive selection for heterozygosity and a negative selection against affected homozygotes. (Such a situation can produce balanced polymorphism equilibrium but not H-W eq, because the calculation to keep the genotype constant requires further parameters.) This makes sickle-cell anemia a poor choice for the context of most introductory-level H-W problems.
A third objection to this entire analysis of sickle-cell anemia is that H-W analysis requires addressing the idea of a single “generation.” This is a very difficult concept to apply to human populations without careful data collection, because people living at any one time can represent three or four generations.
A more conceptual shortcoming of all these problems is that there is no readily apparent value to the calculation. This is the “So what? Who cares?” question. How might such a calculation be used to answer a research question or be applied to a case that is at least interesting to the students? When students see the utility of such calculations or find the case interesting, they are more likely to engage in this learning. Our final example is a homework problem that addresses some of these issues.
Problem 5
People who are homozygous for (have two copies of) a certain 32-base-pair deletion mutation in a gene known as CCR5 are known to be largely resistant to HIV infection. (CCR5 is the main co-receptor molecule that allows the virus to attach to certain white blood cells and enter them, establishing an infection; Jones et al., 2011.) In a study of 1318 random Caucasians of childbearing age in the United States, 1102 individuals were found to be homozygotes free of this deletion (Glass et al., 2006). Assume that the U.S. Caucasian population is at H-W equilibrium at this locus.
Susan is a Caucasian American woman at increased risk of HIV infection because she has multiple sex partners.
What is the probability that Susan has little reason to worry about HIV infection (i.e., that she is homozygous for the deletion)?
What is the predicted frequency of U.S. Caucasians of this age who are “carriers” of the protective deletion? How many heterozygotes would be expected in this sample?
Before HIV appeared, would you have expected the population to have been at H-W equilibrium at this locus? Why or why not? State your assumptions. In the absence of effective HIV treatments, what would you expect to happen to the allele frequencies over time? How would you expect the allele frequencies to change over time once effective HIV treatment was in use? Why? How would your answer change if the HIV treatment was effective only for people past child-bearing age?
Solution to Problem 5
Let p be the frequency of the allele without the deletion: p² = 1102/1318 = 0.836. Since we assumed H-W eq, p = √0.836 = 0.914. Because there are only two alleles, q, the frequency of the allele with the deletion, satisfies q = 1 − p = 1 − 0.914 = 0.086. q² = 0.007, so Susan has a 0.7% chance of being at lower risk.
Again assuming H-W eq, 2pq = 2(0.914)(0.086) = 0.157.
0.157 × 1318 ≈ 207
Before HIV appeared, the population was likely at H-W eq at this locus. This assumes that the deletion was selection-neutral. With the appearance of HIV and in the absence of effective treatment, we would expect that the frequency of people without the homozygous deletion (p² and 2pq) should decrease, shifting the population out of H-W eq. The advent of an effective treatment should move the allele frequency distribution back toward equilibrium. A treatment that has an effect only after childbearing age would have effects on the allele distribution similar to no treatment because, for a mutation to have a natural selection effect, it must affect reproductive success.
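The numerical parts of this solution can be checked with a short script. It is a sketch under the problem's stated assumption of H-W eq at this locus, and the variable names are illustrative. Note that it starts from the frequency of the homozygotes without the deletion (p²), not from the homozygous recessives as in the earlier problems.

```python
from math import sqrt

n_sampled = 1318        # individuals genotyped (from the problem statement)
n_no_deletion = 1102    # homozygotes lacking the deletion (from the problem statement)

p = sqrt(n_no_deletion / n_sampled)   # frequency of the allele without the deletion
q = 1 - p                             # frequency of the deletion allele

print(f"p = {p:.3f}, q = {q:.3f}")
print(f"P(homozygous for the deletion) = q^2 = {q * q:.4f}")
print(f"carrier frequency 2pq = {2 * p * q:.3f}")
print(f"expected heterozygotes in the sample = {2 * p * q * n_sampled:.1f}")
```

Carrying p and q at full precision gives roughly 206 expected heterozygotes; the figure of 207 results from rounding p and q to three decimal places before multiplying.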
Summary & Implications
Many textbook and Internet H-W eq problems have substantial shortcomings. They may
fail to focus on understanding,
be unclear about which specific gene is involved in the problem (e.g., “tasters”),
be unclear about the characteristics of the population being studied (especially size),
assume that H-W eq exists in the population without saying so explicitly,
assume that students have certain biological knowledge about the gene involved in the problem,
make assumptions that are contradicted in fact or are likely impossible,
ask for judgments about populations that are constituted by members of multiple generations,
ask for calculations that are meaningless in the given context, or
ask for solutions that have no apparent value or are not related to genuine research questions (failing the “So what?” test).
Put simply, a population is in H-W eq if the genotype frequencies are the same in each generation. This equilibrium requires a set of conditions that ensure that there are no unaccounted forces that would change the allele frequencies. For such a population, then, the genotype frequencies in the current or the next generation can be computed from the current allele frequencies. Focusing on this level of understanding for students – as well as avoiding confusing misstatements and flawed problems – is the primary key to effective teaching about Hardy-Weinberg equilibrium. Our next article will provide more practical suggestions for how to implement these ideas in introductory biology instruction.