Phylogenetics has a central role in the biological sciences. We suggest a hands-on exercise to demonstrate the task of character coding and its importance in phylogenetic systematics. This exercise is appropriate for undergraduate students in life sciences and related courses. The teacher must provide a single group of masks in which color patterns, textures, and shapes provide the characters to fill the data matrix. (The masks could be replaced by a set of other complex objects.) In this case, because there is no actual phylogeny, students will not be concerned with recovering the correct topology. Character coding is the aim of the exercise. After the character matrix is completed, a phylogenetic tree is drawn and the students interpret the evolution of a single character, starting from the common ancestor, based on the topological pattern of the tree and on the data matrix. Next, the students name and provide a full diagnosis for the group of masks as revealed by the topological pattern. The comparison between group results is also educational: there will be some common patterns between trees, but others will differ, as in biological systematics.
In 1857, two years before the first edition of On the Origin of Species was published, Charles Darwin wrote, in a letter to T. H. Huxley: “The time will come I believe, though I shall not live to see it, when we shall have fairly true genealogical trees of each great kingdom in nature” (https://www.darwinproject.ac.uk/letter/entry-2143). More than 150 years later, Darwin's dream is still far from our daily routine. Nevertheless, we have come very far since the days when a phylogenetic tree was built on the basis of a few morphological attributes.
Since the early 1980s, the availability of molecular sequences and even entire genomes has changed the scenario for phylogenetic reconstruction. The large amount of sequence data has helped to resolve the major radiations of life in unprecedented detail. Examples include the three domains of life (Woese, 1977), the superphyla of invertebrates Ecdysozoa and Lophotrochozoa (Aguinaldo et al., 1997), and the great divisions of mammals (O'Leary et al., 2013) and birds (Jarvis et al., 2014). In the life sciences, phylogenetic trees today play a pivotal role (de Queiroz & Gauthier, 1992; Meisel, 2010), enabling us to understand the distribution of characteristics and phenomena in biological diversity (Eldredge, 2010).
In contrast to its central role, however, phylogenetics is one area in which students tend to believe that they have a complete understanding before they actually do (Sandvik, 2008; Kampourakis & Minelli, 2014; Yates & Marek, 2014). Core concepts in tree-thinking involve an abstract-oriented mind (Halverson, 2011) that is not common among students in the life sciences (Young et al., 2013). Additionally, students require a strong background in both cellular and molecular biology to fully comprehend phylogenetics (White et al., 2013). A limited grasp of the phylogenetic perspective is not restricted to undergraduate students; it is also pervasive among professionals in the field. An intuitive understanding of phylogenetics is usually acquired only after many years of practice (Meisel, 2010; Halverson, 2011; Rigato & Minelli, 2013).
Low-cost, hands-on educational tools are fundamental for grasping abstract concepts such as phylogenetics. A large number of phylogenetic educational resources are currently available (e.g., Tuimala, 2006; Lents et al., 2010; McCabe, 2014), many of which emerged as a result of initial teaching exercises focused on ingenious systems of artificial organisms such as the “Caminalcules” (Sokal, 1983a, b, c, d). These artificial organisms were developed according to evolutionary principles; they have a fossil record and a correct phylogeny to be pursued during the exercise. In exercises that rely on actual evolutionary processes, the student will tend to be concerned about finding the correct phylogeny.
Our exercise uses real Chinese masks (Figure 1) to teach students the core of systematic practice: the coding of characters, which are the fundamental units of phenotypic evolution (Schwenk, 2001). By using objects that are not chronologically derived, our approach allows students to be fully open-minded about interpreting the characters in their own terms. This situation mirrors real life, in which characters and actual phylogenies are unknown when the researcher starts coding characters in biological groups.
Although other phylogenetic exercises not based on actual organisms are now available, including some interesting ones that use twigs (Flinn, 2015), pipe cleaners (Halverson, 2010), or chocolate bars (Burks & Boles, 2007), the main focus of those is building the phylogenetic tree, rather than the character-coding process. Furthermore, the complex nature of the masks’ shapes, textures, and paintings prevents a single solution, such that many different topologies will turn up in class. The fact that all these topologies may be correct is another interesting aspect that can stimulate actual phylogenetic debates among the students. Student groups may defend their own choice of characters and topologies much as actual systematists do in scientific symposia. All this is possible only with a set of nonrelated objects. This exercise lets students use their intuition and judgment to make the character-coding decisions that are the major challenges that systematists face (Wiens, 2001). Other groups of complex, three-dimensional objects could also be used, such as stuffed objects, dragons, or mythological creatures. Other nonbiologically related objects could be used as well, such as the contents of pencil cases.
Basic evolutionary biology concepts, such as those listed below, must be introduced before the exercise (see also Figure 2).
The evolutionary process generates nested hierarchical relationships between reproductively isolated lineages, and these relationships are best portrayed in a phylogenetic tree (Gregory, 2008). The phenotypic and genetic characters that two lineages share are directly dependent on the time since they were connected – that is, since their exclusive common ancestor (Wiley, 1981).
All living and fossil organisms are connected by a single phylogenetic tree (Hennig, 1966) that would be drawn from the origin of life (when life was a single species), having differentiated into the millions of living species (Sandvik, 2008). However, small subsets of this universal phylogeny are also informative, and the relationships between a subset of lineages tend to mirror those depicted in the universal tree.
The selection of taxa for a phylogenetic analysis depends on the question to be answered. The focus group (aka, ingroup) is selected on the basis of its relevance to a specific phylogenetic issue that will be addressed using the final phylogenetic tree. Additionally, one or more taxa that are closely related to the ingroup (aka, outgroup) must be included in the dataset. The topological connection between ingroup and outgroup represents a special node in the tree that is termed the root.
The outgroup allows the phylogenetic tree to be rooted – and time, in a rooted tree, flows from the root to the tips. Suppose that a researcher is using a phylogenetic tree to find out whether flying and gliding capabilities are evolutionarily related among mammals. In this case, the mammals represent the ingroup, which would include bats, flying squirrels, and flying lemurs as well as other mammalian species that do not fly or glide. The dataset must also contain a nonmammalian species as outgroup, preferably one as closely related to mammals as possible. Thus, birds, turtles, crocodiles, and other nonmammalian amniotes would be the best choice for outgroups.
Species are the basic unit of evolution. Members of the same species evolve together, joined by gene flow that results from their ability to reproduce and homogenize their characters (O'Hara, 1994). Evolutionary novelties may reach, in future generations, descendants of all members of this lineage. Such novelties typically do not pass between lineages. If speciation occurs in the original lineage, both descendant species will inherit the novelty. In order to pass from the ancestral to its descendants, the novelty must be on the ancestral genome, which may also be reflected in the phenotype. This exercise deals with species concepts that are related to the reproductive cohesion property of species.
Speciation is the process whereby two descendant species are derived from a single ancestral species. It usually requires the genetic isolation of ancestral species lineages. This process is central because, by rupturing gene flow, it prevents the homogenization process of reproduction and enables actual genetic differentiation to take place between lineages (Wiley, 1981). At first, populations on both sides of the barrier are very similar, but they will eventually differentiate and become different species because the respective evolutionary novelties that arise in the two populations will necessarily be different.
Phylogenetic trees are the visual representation of the evolutionary relationships between lineages (Halverson, 2011). Nodes in the trees represent hypothetical exclusive common ancestors before speciation events occurred, and the tips of the trees represent extant species (Gregory, 2008). The root is a node that marks the common ancestor for all species depicted in the tree.
Phylogenetic trees can be recovered by searching for homologous characters that document evolutionary relationships (Hennig, 1966). This hands-on exercise focuses on this concept. As the speciation process is coupled with eventual morphological and genetic variations, the phylogenetic pattern of speciation may be recovered from the analysis of morphological and genomic variation. This analysis is the coding of characters, which is the aim of this exercise. Comparing homologous parts between organisms may seem intuitive, but the common origin of a homologous set of characters must be made clear to students. To clarify this concept, the teacher can present an example, such as a comparison of the internal wing structure of chickens and bats, which shows that distinct bones and muscles make up the structure of each wing type and, hence, that the wings are not homologous. In this case the wing of the bat consists of elongated fingers, whereas in chickens the entire arm structures the wing. Ducks and chickens, however, have the same basic wing structure (arms), and their wings are homologous. The idea of homology is tightly linked to that of the origin of characters. Birds and bats have different structures for their wings because these organs did not evolve from a common winged ancestor; rather, the wings evolved independently in different lineages. By contrast, the exclusive common ancestor of chickens and ducks was a winged bird, so their wings have the same structure. One important step in the process lies in the distinction between character and character states. A character can be defined as a part of the organism that has an identity and that plays a role in a biological process (for a detailed review, see Wagner, 2001 [and other chapters in the same book]). The example of human eye color illustrates this concept well. In this case, eye color is the character, and the character states represent the range of variation in actual populations (i.e., black, brown, hazel, blue, green, and gray); this character therefore has at least six states among humans. The exercise uses a set of Chinese masks to teach character coding to students.
We suggest that the eight Basic Concepts above be given to students as handouts, which can also include step-by-step instructions on informative characters for parsimony and on how to build a most parsimonious tree.
The class can be divided into groups of four students each, and a timeline for the exercise is provided (Figure 3). Each student group will need only a printed handout that presents a set of Chinese masks (Figure 1). The masks depict taxonomic units with very distinct character distributions, such as face color and beard length. If laminated, color handouts can be reused indefinitely; but for one-time use, black-and-white copies will suffice.
At the same time, the actual masks must be available for manual inspection by the students. This is an essential part of the exercise. The reason is that only by handling the actual masks will the students be able to code tactile features and other details that are not clear from the printed handouts, such as texture and weight. In this sense, the students tend to appreciate the examination of the actual masks, just as a researcher would go to several museums to observe the small details of the specimens deposited there.
In the Classroom
Students must examine all the masks at once; however, each student may analyze a particular group of characters to code (e.g., shape patterns around the eyes, color of cheek, apparent chin, ornaments). All students from the group must review each new character. Once all groups have finished the exercise, the teacher will collect and analyze the results and discuss the conclusions with the students.
Again, it's important that the actual masks be available at the teacher's table throughout the exercise, so that students have an experience similar to that of a researcher who goes to a natural history museum to examine holotype specimens. The teacher should emphasize the purpose and importance of such depository institutions for systematics (Diamond & Evans, 2007), a topic that can be further discussed and explored.
Step 1: The Character Matrix
First, the students are told to construct a character matrix with six masks (Figure 1) and 10 characters. In the matrix, each row and column should represent a different character and a mask, respectively (Figure 4). Characters must be coded after observing all of the masks (or set of objects). All characters must show variability. Looking at the set of masks in Figure 1, an obvious character would be beard color, a character with two character states: red (0) and black (1). In this case, mask 1 has no beard and, thus, this character must not be coded for this mask (see Figure 4). For all characters, each character state must be described and numbered (0, 1 for two-state characters; 0, 1, 2 for three-state characters; and so on).
Multiple characters showing exactly the same pattern across all objects must be considered a single character, and one of them must be eliminated. The nonredundancy of characters is extremely important in systematic studies. For example, if all masks with mustaches have an apparent mouth, only one of these characters (i.e., presence of mustache or apparent mouth) should be chosen. In Figure 4, six characters are listed as examples, but given the complexity of the masks, many others are available.
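For teachers who want to check the students' matrices quickly, the nonredundancy rule can be sketched in a few lines of Python. This is only an illustration: the mask names, the characters, and all codings below are hypothetical and do not reproduce Figure 4, and the helper names `pattern` and `redundant_groups` are our own. A character that cannot be coded for a mask (such as beard color on a beardless mask) is marked with "?".

```python
# Hypothetical character matrix: each entry is one character coded across six masks.
# "?" marks a character that is not applicable to a mask (e.g., no beard).
MASKS = ["mask1", "mask2", "mask3", "mask4", "mask5", "mask6"]
MATRIX = {
    "beard color (0 red, 1 black)": ["?", 0, 0, 1, 1, 1],
    "mustache (0 absent, 1 present)": [0, 1, 1, 0, 1, 0],
    "apparent mouth (0 absent, 1 present)": [0, 1, 1, 0, 1, 0],
    "face color (0 white, 1 red, 2 blue)": [0, 0, 1, 1, 2, 2],
}


def pattern(states):
    """Relabel states by order of first appearance, so that two characters
    that split the masks in the same way get the same pattern even if
    their state numbers differ."""
    relabel = {}
    out = []
    for s in states:
        if s == "?":
            out.append("?")
        else:
            relabel.setdefault(s, len(relabel))
            out.append(relabel[s])
    return tuple(out)


def redundant_groups(matrix):
    """Return lists of character names that show exactly the same pattern."""
    groups = {}
    for name, states in matrix.items():
        groups.setdefault(pattern(states), []).append(name)
    return [names for names in groups.values() if len(names) > 1]


for group in redundant_groups(MATRIX):
    print("Keep only one of:", group)
```

In this made-up matrix, mustache and apparent mouth are flagged as redundant, matching the example in the text. Because the patterns are relabeled, two characters coded with inverted state numbers would also be flagged.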
It is important to include some characters that will be informative to parsimony. Parsimony is a more intuitive method of phylogenetic reconstruction and is preferable when introducing phylogenetic analysis and concepts to students. A parsimony-informative character has a minimum of two character states, each represented by at least two masks. With such characters, the alternative phylogenetic trees will require different numbers of steps, which allows the trees to be compared using the parsimony criterion. A minimum of four parsimony-informative characters must be in the character matrix for each student group. When the students are coding characters, it is important to be open to new ideas. The students should be making their own decisions. As long as the character is sound and has a straightforward description, new characters are welcome.
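The definition of a parsimony-informative character given above can be written as a short test. The sketch below is a minimal illustration in Python; the function names `is_parsimony_informative` and `restrict` and the example codings are hypothetical, and "?" is assumed to mark a mask for which the character cannot be coded.

```python
from collections import Counter


def is_parsimony_informative(states):
    """True if at least two character states are each found in at least two masks.
    Entries coded "?" (not applicable) are ignored."""
    counts = Counter(s for s in states if s != "?")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def restrict(states, keep):
    """Keep only the codings for the masks at the given positions."""
    return [states[i] for i in keep]


# Hypothetical codings for six masks.
beard_color = ["?", 0, 0, 1, 1, 1]
extra_eyes = [1, 0, 0, 0, 0, 0]
cheek_color = [0, 0, 1, 1, 1, 1]

print(is_parsimony_informative(beard_color))
print(is_parsimony_informative(extra_eyes))

# A character can lose its informativeness when masks are removed.
four_masks = [0, 2, 3, 4]
print(is_parsimony_informative(restrict(cheek_color, four_masks)))
```

The last call illustrates why a character that is informative for six masks may stop being informative once only four masks are retained.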
It is also important to let the students determine any special issues regarding the masks. For example, if a mask is cracked or the paint is faded, students may ask their teacher for direction. However, the decision should be theirs alone as they play the role of mask specialists in this activity. They must interpret the conditions and character states and take responsibility for their decisions. In practice, a data matrix with more characters is better, but 10 characters will provide a sufficient character matrix for the exercise.
Step 2: The Phylogenetic Tree
Once the 10 characters are coherently described, the teacher must select four masks to be used for the remainder of the activity so that the results are comparable among all student groups. With four masks (or objects), only three unrooted phylogenetic trees are possible and the students may derive the phylogenetic tree themselves by hand (Figure 5A). Using five masks, 15 trees are possible because the fifth mask may be connected to five different locations in each of the three trees generated using four masks. Using six masks, 105 trees are possible. In this particular case, the exclusion of two random masks will reveal that some of the characters initially chosen by the students may become noninformative characters for parsimony analyses, while other characters will remain informative.
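The tree counts quoted above follow from the same argument used for the fifth mask: each new mask can be attached to any branch of the existing unrooted tree, and an unrooted bifurcating tree with k − 1 tips has 2k − 5 branches. A minimal Python sketch of this count is given below; the function name `count_unrooted_trees` is our own.

```python
def count_unrooted_trees(n_taxa):
    """Number of distinct unrooted, fully bifurcating trees for n_taxa tips."""
    if n_taxa < 3:
        raise ValueError("at least three taxa are needed for an unrooted tree")
    count = 1  # three taxa can be joined in only one way
    for k in range(4, n_taxa + 1):
        # The k-th taxon can attach to any of the 2k - 5 branches
        # of a tree that already holds k - 1 taxa.
        count *= 2 * k - 5
    return count


for n in (4, 5, 6, 10):
    print(n, "masks:", count_unrooted_trees(n), "possible trees")
```

The rapid growth of this number is the practical reason for restricting the hand-made analysis to four masks.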
For this initial phylogenetic analysis, the students should ideally derive the most parsimonious tree. They should map the character states on the three alternative phylogenetic trees and count the number of (mutation) steps for all characters for each tree. Students should select the tree that requires the minimum number of changes, considering all characters. In this case, all characters that are noninformative to parsimony will present the same number of changes in all trees.
An example will clearly illustrate this process. If a single mask has a second pair of eyes and all others have only one pair, the character number of eyes is noninformative for parsimony because all trees will require a single change on the branch leading to the mask with two pairs of eyes. Beard color is a parsimony-informative character because the three possible trees require different numbers of steps to reflect a change in character state. Examining this character enables the selection of a tree according to the parsimonious criterion of least change. In this example, tree 1 requires only a single change (mutation step) because masks sharing the same character state are on the same side of the unrooted tree (Figure 5B). By contrast, trees 2 and 3 require a minimum of two changes to explain the same character. Therefore, for this character, tree 1 is the most parsimonious tree. After analyzing all characters, the total number of required changes must be counted for each possible tree. Finally, the tree that requires the minimum number of changes is the one that best explains the character matrix.
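The counting procedure described above can be sketched in Python for the four-mask case. The sketch uses Fitch's counting rule for unordered characters, which is one standard way to obtain the minimum number of changes and is our choice rather than a method prescribed by the exercise. The mask labels A to D and all codings are hypothetical, and every character is assumed to be coded for all four masks.

```python
def fitch_join(left, right):
    """Join two state sets; a change is counted when they share no state."""
    common = left & right
    return (common, 0) if common else (left | right, 1)


def quartet_steps(split, states):
    """Minimum number of changes for one character on the unrooted tree
    that groups split[0] on one side and split[1] on the other."""
    (a, b), (c, d) = split
    ab, s1 = fitch_join({states[a]}, {states[b]})
    cd, s2 = fitch_join({states[c]}, {states[d]})
    _, s3 = fitch_join(ab, cd)
    return s1 + s2 + s3


def three_trees(masks):
    """The three possible unrooted trees for four masks."""
    a, b, c, d = masks
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def total_steps(split, matrix):
    """Sum of the changes required by all characters on one tree."""
    return sum(quartet_steps(split, states) for states in matrix.values())


# Hypothetical codings for four masks labeled A to D.
MATRIX = {
    "beard color": {"A": 0, "B": 0, "C": 1, "D": 1},
    "number of eye pairs": {"A": 1, "B": 0, "C": 0, "D": 0},
    "face color": {"A": 0, "B": 1, "C": 0, "D": 1},
    "mustache": {"A": 1, "B": 1, "C": 0, "D": 0},
}

trees = three_trees(["A", "B", "C", "D"])
for tree in trees:
    print(tree, "requires", total_steps(tree, MATRIX), "changes")
best = min(trees, key=lambda t: total_steps(t, MATRIX))
print("Most parsimonious tree:", best)
```

With these made-up codings, beard color requires one change on the first tree and two on the others, whereas the number of eye pairs requires one change on every tree, as described in the text.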
Step 3: Rooting the Tree
Once each student group has chosen the most parsimonious tree, the teacher must reveal which of the four masks (or objects) represents the outgroup. The outgroup will root the tree, thus enabling the students to polarize the character changes along the tree and indicating which character states were altered since the shared common ancestor with the outgroup. Furthermore, this procedure adds a time-frame to the branching patterns of the unrooted tree.
Once all student groups have a rooted tree, the analysis can begin. At this point, the tree can be interpreted historically – that is, the branching pattern indicates the relative timing of the speciation events. It is also critical that the students are familiarized with the appropriate terminology. Terms such as higher species, primitive taxon, main and side branches, and basal lineages represent common misconceptions that, among many others, can be corrected as the discussion progresses (for a review of common phylogenetic misconceptions, see Gregory, 2008).
To examine whether the terminology is correct, the teacher can ask each student to select a single character, map the state changes on the tree (Figure 4), and explain the evolution. To do this, the student must first determine the character state of the common ancestor by parsimony, by selecting the state that demands the lowest number of changes on the tree. After this, the student has to explain all the alternative character states guided by the tree topology to determine the ancestors that went through character state changes.
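The inference of the ancestral state by parsimony can be illustrated with a short Python sketch. It applies Fitch's rule to a rooted tree written as nested pairs; the rule is our choice of method, and the tree, mask labels, and codings are hypothetical. The function returns the state or states at the root that require the fewest changes, together with that number of changes.

```python
def fitch(tree, states):
    """Return (most parsimonious state set at this node, number of changes).
    A tree is either a mask name or a pair (left subtree, right subtree)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch(left, states)
    rset, rsteps = fitch(right, states)
    common = lset & rset
    if common:
        # Both descendants can share a state: no change is needed here.
        return common, lsteps + rsteps
    # No shared state: one change is required below this node.
    return lset | rset, lsteps + rsteps + 1


# Hypothetical rooted tree: the outgroup mask joins the ingroup at the root.
TREE = ("outgroup", ("A", ("B", "C")))
BEARD_COLOR = {"outgroup": 0, "A": 0, "B": 1, "C": 1}  # 0 red, 1 black

root_states, changes = fitch(TREE, BEARD_COLOR)
print("Ancestral state(s) at the root:", root_states)
print("Changes required:", changes)
```

In this example the common ancestor is inferred to have had a red beard (state 0), with a single change to black on the branch leading to masks B and C. When the returned set holds more than one state, the ancestral condition is ambiguous under parsimony, which is itself a useful point for class discussion.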
Step 4: Phylogenetic Systematics
The teacher can then point to the outgroup mask (or object), allowing the students to root their tree. Once a rooted tree has been reconstructed, the students can designate taxonomic names to the tree branches in a phylogenetic systematics approach. This is the final step of the exercise. It will be an even more interesting exercise if the students are encouraged to propose a formal description and a name for the hypothetical taxon based on the shared characteristics of the masks (or objects) in the tree branch. The point would be asking the students to imagine the ancestral species with those diagnostic characteristics before the speciation events that originated the mask diversity.
As in all species with sexual reproduction, members of the ancestral species would be constantly mixing and homogenizing their genomes through breeding. Thus, before speciation, members of the ancestral species shared the entire genome (Mello & Russo, 2011). From a phylogenetic systematics perspective, names indicate monophyletic groups, a diversity that was once a single species sharing the genome through homogenization by reproduction. It follows that this descendant group of species (tree branch) shares more than what we know it shares, more than its diagnostic characteristics. Indeed, these descendants share many genomic features that have not been discovered yet. If they are a monophyletic group, these descendant species share extra introns, exons that have been lost, and gene duplications that will remain undisclosed until more detailed analyses are available for all diversity. These shared features would have been in the ancestor genome before the first speciation events. The more recent the common ancestor, the more similar the genomes of its descendants are expected to be.
After the exercise with the masks is completed, it would be interesting to get students to go through follow-up exercises using these guidelines. One possible follow-up would be the building of an actual phylogenetic tree using a real set of biological organisms. Working with real organisms may spark the students’ curiosity in the biological sciences. Another possibility would be a follow-up using a hypothetical sequence alignment for the masks that would allow the students to compare phylogenetic trees derived from morphological and molecular sets of characters.
Final Remarks & Directions for Discussion
This exercise provides an opportunity for class discussion on broader topics than phylogenetics, at the teacher's discretion. Some examples of these topics are given below.
Models and actual data. By comparing students’ experience of acquiring knowledge and practical skills by analyzing the artificial masks (the exercise) and a set of real organisms (the follow-up), the teacher will have an opportunity to consider the importance of having both experiences.
Controversies and agreement in science. Students may notice that their trees differ among groups, but they also may share some equivalent nodes. This is representative of systematic (and scientific) practice; professionals will often agree on some aspects of their analysis but differ in opinion on others.
New knowledge requires old knowledge. This exercise is a good opportunity to demonstrate that previous (phylogenetic) knowledge is necessary to acquire more (phylogenetic) knowledge. At first, it may seem contradictory to the students, but it is important that they realize that new scientific knowledge is built upon old knowledge. To reconstruct a phylogenetic tree for a group of organisms (ingroup), it is crucial to have the background knowledge that informs an accurate selection of the outgroup.
Not a single solution. Published phylogenies are only estimates of the true unknown phylogeny for most organisms (but see Lambret-Frotté et al., 2012). This exercise introduces the idea that many trees may be correct. The systematic eye for characters will vary at the groups’ discretion, and thus the character matrix is expected to vary. This exercise demonstrates the important educational and scientific issue that questions will not always have a single correct response, as students may think.
Building biological knowledge. In essence, biological knowledge is the association of a taxon name and a biological pattern, such as “Birds have feathers.” Thus, to associate a pattern (presence of feathers) with a particular group (birds), biological knowledge requires monophyletic groups. All birds have feathers because they represent a monophyletic group of a set of descendant species that have a common ancestor that had feathers. It also follows that new shared features will be much more likely to be discovered by analyzing members of a monophyletic group. On the other hand, nonmonophyletic groups that were never a single breeding species share only the features that originally defined the group; no future analysis will find more shared characteristics, because they were never homogenizing through reproduction. Therefore, the phylogenetic perspective of diversity is a pivotal component to build biological knowledge.
We thank all the students who, over the years, went through this practice, built our expertise, and signaled the success of the exercise as an educational tool. This study was funded by the Rio de Janeiro State Research Foundation (FAPERJ) and the National Research Council (CNPq) in Brazil.