The central dogma of molecular biology is key to understanding the relationship between genotype and phenotype, although it remains a challenging concept to teach and learn. We describe an activity sequence that engages high school students directly in modeling the major processes of protein synthesis using the major components of translation. Students use a simple system of codes to generate paper chains, allowing them to learn why codons are three nucleotides in length, the purpose of start and stop codons, the importance of the promoter region, and how to use the genetic code. Furthermore, students actively derive solutions to the problems that cells face during translation, make connections between genotype and phenotype, and begin to recognize the results of mutations. This introductory activity can be used as an interactive means to support students as they learn the details of translation and molecular genetics.

Introduction

Research has shown that students often have difficulty understanding molecular genetics. Connecting genes to their protein products, and protein products to phenotype, has been shown to be particularly challenging for them (Rotbain et al., 2008; Reinagel & Bray Speth, 2016). Such difficulty arises in part because genetics concepts extend across multiple organizational levels (Marbach-Ad, 2001) and require students to understand that physical structures can “contain” information (Duncan & Reiser, 2007). In order to understand the concepts associated with the central dogma of molecular biology (DNA → RNA → protein) and eventually genetics, students first need to understand the relationship between DNA, mRNA, and proteins, and subsequently that between protein function and disease. It is critical that they understand why the genetic code uses three consecutive nucleotides for each codon, why start and stop codons are required, the purpose of promoter regions, and how genetic mutations affect phenotype, can cause disease, and form the basis for variation (Speth et al., 2014). However, these concepts remain difficult for students to grasp.

To date, there have been a variety of suggestions for how to effectively support students' learning of molecular genetics. Many of these interventions have utilized student-centered teaching strategies, whereby learners take a more active role in the learning process. For example, some activities have involved students using computer animations to manipulate various molecular components and processes (e.g., Marbach-Ad et al., 2008; Rotbain et al., 2008) while others have engaged students in physically modeling the processes under study (e.g., Takemura & Kurabayashi, 2014; Marshall, 2017). Takemura and Kurabayashi (2014) involved students in a role-playing activity with physical props to teach transcription and translation, while Marshall (2017) engaged undergraduate genetics students in a paper-modeling activity to simulate molecular processes. These authors have contended that students should interact with the molecular entities as much as possible to best learn the complex material. “Clearly interactivity, a factor known to facilitate learning, can help overcome the difficulties of perception and comprehension” (Marbach-Ad et al., 2008, p. 287; Rotbain et al., 2008).

The activity sequence described here contributes to the growing body of interactive instructional activities that help teach the central dogma. It provides an inquiry-based, hands-on, tangible, dry-lab platform through which students can consider and learn concepts of molecular biology such as codons, promoters, and the genetic code. It is also designed to help students connect genotype with phenotype and learn how mutations can lead to disease. This four-part activity sequence, in its entirety, helps meet two high school NGSS life science standards (NGSS Lead States, 2013):

  • HS-LS3-1. Ask questions to clarify relationships about the role of DNA and chromosomes in coding the instructions for characteristic traits passed from parents to offspring.

  • HS-LS3-2. Make and defend a claim based on evidence that inheritable genetic variations may result from (1) new genetic combinations through meiosis, (2) viable errors occurring during replication, and/or (3) mutations caused by environmental factors.

When used as a whole, this activity sequence ties together several important topics foundational to molecular biology: embedded genes, translation, mutations, and genetics and disease. The sequence allows students to think through core concepts and investigate connections using hands-on tasks and assignments that guide construction of student understanding. While the sequence was designed to be implemented in its entirety, the various parts can be used separately if desired.

Activity Sequence Overview

The sequence called “Introduction to Molecular Biology Activity” is available for download in both Word and PDF formats (see below). These activity handouts guide students through the four different parts of the sequence. (For convenience, several of the important components of these handouts are also included as figures within this article.)

There are four main parts to this activity sequence that lead the students through concepts related to the genetic code (part 1), use of the genetic code (part 2), the effects of genetic mutations on the code's resulting protein product (part 3), and how this relates to human disease (part 4). The activity is flexible to fit within various course schedules, and portions of it can be assigned as homework. We generally have our students complete parts 1 and 2 in class within small groups of three or four students, with parts 3 and 4 done as homework. The homework assignments are followed by extensive discussion in class the next day. The understandings students gain from this activity are then interwoven throughout the “molecular biology” unit and can be used to reorient students to these concepts throughout the rest of the course.

Materials

General Workflow

This activity sequence can be used at the beginning of a course's molecular biology unit to introduce key concepts, and then as a foundation from which to draw as students further explore the central dogma of molecular biology and molecular genetics.

Teacher preparation prior to the activity sequence

  1. Teachers should cut strips of colored paper using 10 standard colors of construction paper (black, blue, brown, green, orange, pink, purple, red, white, and yellow). By cutting the colored strips of paper into long and short strips, 20 categories of paper strips are made (10 colors of long strips and 10 colors of short strips). The strips should be kept organized by category, and each student or group of students should be given around five of each length and color. This can be somewhat time consuming for the instructor. Thus, if time or resources are limiting, the teacher can premake a few paper chains that can be displayed to the class during the activity (see notes/potential modifications in the description for part 2 of the activity below) and have the students simply write down the chain sequences as they “translate” them. Teachers should also premake the short chain translated in part 3 of the activity and at least one “mutated” chain based on the various mutations identified in part 3 of the activity sequence. We suggest making one of the chains that would result from an insertion or deletion to demonstrate how dramatically the chain will change when the reading frame is altered. These premade chains can be stored and used in subsequent years to save repeated preparation time.

  2. Teachers should print the “Introduction to Molecular Biology Activity” handout packet, with a copy for each student or group of students.

  3. Teachers should familiarize themselves with the discussion slides and the key concepts that they address, adapting the slides as appropriate to the teacher's style and course needs.

Implementation

Part 1: Thinking about Codes

Objectives

  1. Students will devise a simple code using four shapes to convey complex information.

  2. Students will be able to explain why three consecutive shapes are required to adequately code for all 26 letters of the English alphabet, and why this results in codon degeneracy.

  3. Students will understand why other aspects beyond the letters must be considered, such as where to start, punctuation, etc.

Classroom workflow and teaching instructions

Students should be given the entire “Introduction to Molecular Biology Activity” handout packet at the beginning of class and instructed to work on part 1 (page 1) without progressing to part 2 until further instructions are given. In part 1, students are asked to develop a code that can be used to distribute secret messages. They must consider how to code for letters of the English alphabet using only four distinct shapes (triangle, star, square, and circle; Figure 1) and consider what other aspects beyond just the English letters might require coding in order to effectively communicate the secret message. Students are not given any additional information and are instructed to work in their groups to design their code. At this time, the teacher should walk around and give general guidance but allow the students to struggle with how to create a valid code.

Figure 1.

(A) The four shapes that students are to use to generate a code for standard English. This code should include each of the 26 different letters of the alphabet, punctuation, and everything required for a complete message. (B) A common strategy that students first identify is to use different lengths of “codons” for each letter. By allowing students to try this strategy and then debunk it as a viable strategy, students are better able to understand why codons must be consistent in length, ultimately understanding why codons of three nucleotides in length are required to code for all requirements of the English language. (C) Valid codes should include consistent lengths of unique sequences of three shapes encoding each letter.

Note: In our experience, a large portion of students will start to generate codes resembling those shown in Figure 1B, in which the four single shapes represent letters A–D, then a series of two shapes in a row represent letters E–T, followed by using three shapes in a row to represent U–Z.

After about 5–10 minutes, the teacher should lead a general discussion, asking the class about the codes they have devised. Having walked around the classroom, he or she will have an idea about each group's strategy. If a classroom atmosphere has been established in which it is fine to take a risk and be wrong, one of the groups that devised a strategy similar to that shown in Figure 1B should be asked to present their strategy, followed by discussion of the strengths and weaknesses. Another option is for the teacher to discuss how many groups came up with this type of strategy (putting the strategy on the board) and then discuss strengths and weaknesses with the class. After some brief discussion, the teacher should write a sequence of 10–20 shapes on the board and ask students to decode it using that strategy. This allows students to realize that there is no way of knowing where the “codon” for one letter stops and the next begins. For example, do three consecutive triangles represent three A's in a row? An A and an E? Or a V? Being forced to actively think through these issues and possible “solutions,” students begin to realize that the codon lengths need to be consistent.
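The decoding ambiguity discussed above can be made concrete with a short sketch. It assumes a variable-length code of the kind described for Figure 1B, with single shapes for A–D, pairs for E–T, and triples for U–Z. The one-letter shape labels and the order in which codewords are assigned to letters are illustrative assumptions, so the specific letters may differ from those in the figure.

```python
from itertools import product
from string import ascii_uppercase

# Hypothetical one-letter labels: T = triangle, S = star, Q = square, C = circle.
SHAPES = "TSQC"

def variable_length_code():
    """Assign 1-shape, then 2-shape, then 3-shape codewords to A..Z in order."""
    words = []
    for length in (1, 2, 3):
        words += ["".join(p) for p in product(SHAPES, repeat=length)]
    # zip stops after 26 codewords: 4 singles + 16 pairs + 6 triples.
    return dict(zip(words, ascii_uppercase))

def decodings(message, code):
    """Return every way to split `message` into valid codewords."""
    if not message:
        return [""]
    results = []
    for size in (1, 2, 3):
        head, tail = message[:size], message[size:]
        if len(head) == size and head in code:
            results += [code[head] + rest for rest in decodings(tail, code)]
    return results

code = variable_length_code()
print(sorted(decodings("TTT", code)))  # ['AAA', 'AE', 'EA', 'U']
```

Under these assumptions, three consecutive triangles have four equally valid readings, which is the problem students run into when they try to decode the sequence on the board.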

Note: Sometimes students devise a way around the issue of different-length codons by having one of the shapes specifically separate codons (for example, a triangle indicates the end of each codon, with the other shapes used in various codon lengths to represent different letters). This is a clever way of accomplishing the task, but students should be asked to think about the limitations of this strategy: it further limits the code (only three of the four shapes remain available for coding letters), and varying codon length adds a level of complexity that can make decoding slower and more mentally taxing.

The students then continue working on their codes for an additional 5–10 minutes and, having realized that each codon needs to be a consistent length, eventually discover that the shortest consistent-length code covering all 26 letters of the alphabet uses three consecutive shapes for each letter (Figure 1C). Single shapes give only 4 codons and pairs give only 4^2 = 16, whereas groups of three give 4^3 = 64, so three is the minimum codon length that can code for the letters of the English alphabet (Figure 2). By actively figuring this out, with some conceptual struggles along the way, students are primed to understand why the genetic code requires codons that are a consistent length of three nucleotides in order to code for the 20 amino acids. This also facilitates initial understanding of codon degeneracy, since we end up with 64 distinct codons by using groups of three. In order to achieve the minimum of 20 amino acid options (plus a stop codon), codons must be three nucleotides in length, which immediately results in a jump from 16 codon options (if codons were only two nucleotides in length) to 64 codon options. By nature of this requirement, we are left with extra codon options, generating the possibility of “codon degeneracy.” If desired, the teacher can briefly lead a class discussion about this topic before moving on to part 2. However, a broader discussion should wait until after students have completed part 2, which transitions from coding for language to coding for instructions used to build a different sequence, more similar to how the genetic code is used to code for amino acid sequences.
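The counting argument can be checked with a few lines of code. This is a minimal sketch; the function name `min_codon_length` is our own, and the count of 21 meanings simply adds one stop signal to the 20 amino acids, as in the text.

```python
def min_codon_length(symbols, meanings):
    """Smallest fixed codon length n such that symbols**n >= meanings."""
    n = 1
    while symbols ** n < meanings:
        n += 1
    return n

for label, needed in [("26 letters", 26), ("20 amino acids + stop", 21)]:
    n = min_codon_length(4, needed)
    total = 4 ** n
    print(f"{label}: length {n}, {total} codons, {total - needed} spare")
```

In both cases the answer is a length of three with dozens of codons left over, and those spare codons are what make degeneracy possible.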

Figure 2.

Explanation for why the genetic code requires sequences of three nucleotides to create unique codons for all 20 amino acids.

Part 2: Using a Code to Create a Paper Chain

Objectives

  1. Students will relate their understanding of three nucleotide codons to molecular biology.

  2. Students will use the genetic code to build chains with specific sequences based on an embedded code and, in doing so, become comfortable using the genetic code. (The code used for the paper chains is directly analogous to the genetic code; each type of paper represents a single amino acid and is encoded by the same set of codons for the equivalent amino acid.)

  3. Students will be able to explain why promoter regions and start/stop codons are necessary.

Classroom workflow and teaching instructions

In part 2, students learn how to identify where the start of a chain is located within a long embedded code, and how to use the modified “genetic code” to determine each chain's sequence. Students should be told to remove the decoder options on pages 9 and 10 of the handout packet (also shown in Figure 3B, C) to use them as they decode the sequence of their chains from the code of shapes provided on page 3 of the activity (Figure 3D). Initially, the teacher may want to briefly explain how to use the two decoders (Figure 3B, C) and point out the start and stop codons, thus connecting this to part 1, in which students have learned that any code needs to have something indicating a start and a stop point.

Figure 3.

Chain identifiers (A) are used to help students find the general location for each of the eight coded chains within the full “shape sequence” (D). This is analogous to the promoter regions that aid in RNA polymerase identification of transcription start sites. (B, C) The decoders for determining which color and size of paper is to be used for making the paper chains. These are the exact same decoders as used to decode the genetic code. The only difference is that the four shapes are substituted for A, C, G, and U, and each amino acid has been replaced with a paper chain color and size determinant. (D) Sequence of shapes that contain embedded codes for the paper chains. This is analogous to the genome, in which genes are embedded within continuous nucleotide sequences (chromosomes). The gene transcription site must be identified (identifiers; promoter regions), the translation start site is required (start codon), and then the sequence of codons must be decoded such that the paper chain (peptide chain) is created in the correct sequence from the start to the stop codon.

Embedded within the page of shapes on page 3 of the activity (Figure 3D) are eight different chain sequences, each with a different chain “identifier.” We assign one or two of the chains for each group to make. The students use the chain identifiers (Figure 3A) to find the general region near the chain's start site and then move sequentially to the right until they find the first “start codon” (depicted by the sequence ). They then use the decoders to determine the sequence of the paper chain, using consecutive groups of 3 shapes (codons) until they reach the stop codon. This usually takes students ~20 minutes (total) and may require some guidance for students to realize they must use the chain identifiers to find the correct general location, followed by locating the first subsequent start codon to identify the correct embedded code for their paper chain. The teacher should walk among the student groups to provide help as needed while the students work on constructing their chains (either physically building them or simply writing out the sequences), but best results are obtained when students are required to think about the issue and struggle with the correct solution on their own. With appropriate guidance from the teacher or from fellow students, all students seem to figure out what to do fairly quickly. Again, teachers can determine how much instruction and guidance to give the students ahead of time, based on the class level, but generally some level of thinking and struggle adds to student comprehension and retention.
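The procedure students follow (find the identifier, move right to the first start codon, then read groups of three until a stop codon) can be sketched in code. The sketch below works with nucleotide letters rather than shapes and uses the standard genetic code. The identifier `UAUAA`, the example sequence, and the function name are hypothetical and are not taken from the handout. Including the start codon's amino acid in the chain follows biological translation and is also an assumption about the activity.

```python
from itertools import product

# Standard genetic code in compact form: bases ordered U, C, A, G; "*" marks stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate_chain(sequence, identifier):
    """Find `identifier`, then the first AUG after it, then read triplets to a stop."""
    site = sequence.find(identifier)
    if site == -1:
        raise ValueError("identifier not found")
    start = sequence.find("AUG", site + len(identifier))
    if start == -1:
        raise ValueError("no start codon after identifier")
    chain = []
    for i in range(start, len(sequence) - 2, 3):
        residue = CODON_TABLE[sequence[i:i + 3]]
        if residue == "*":  # stop codon: the chain is complete
            break
        chain.append(residue)
    return "".join(chain)

# Hypothetical embedded sequence: identifier UAUAA, then AUG GCU UGG AAA UAA.
print(translate_chain("CCGUAUAAGGCAUGGCUUGGAAAUAACGU", "UAUAA"))  # MAWK
```

In the classroom version, each one-letter amino acid in the output would correspond to one strip of paper.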

In the questions associated with this section (see handout packet, p. 4), students review why the codons need to be at least three consecutive shapes, and why start and stop codons are required. After creating the chains, students are asked to reflect on the purpose of the chain identifiers. They also consider what would happen if they started making their chain using the incorrect start codon, which primes students for understanding the concept of a “reading frame.” These questions also lead into the thought questions for part 3, where the effect of mutations is addressed. At this point, the first 23 discussion slides can be used to guide a teacher-led class discussion on how this activity relates to molecular biology. These slides help students realize that they have basically just learned (1) how to use the genetic code to determine protein sequences from nucleotide genomic sequences, (2) why biology requires identifier sequences (promoter regions) in the genome for transcription and translation in order to find the correct gene sequence embedded within the human genome, and (3) why codons consist of three consecutive nucleotides. Students are also now primed to learn how nucleotide sequences within genes relate to protein sequence, and ultimately to protein fold and function. This is the start to understanding the relationship between genome and proteome, and between genotype and phenotype. The discussion will likely take 20+ minutes to adequately complete, and the teacher may want to give an additional 5 minutes for students to complete their answers to the questions associated with part 2, once the discussion is finished.

Notes/potential modifications: The code was originally designed so that the sequences of letters, when decoded, spelled out short inspirational messages. However, we found that using paper chains instead helps students better understand the relationship to molecular biology, for two reasons: (1) Exactly 20 different strips of paper can be made (10 colors with long vs. short strips for each color), which correlates perfectly with the number of amino acids, therefore allowing a perfect match to the genetic code, including amino acid codon degeneracy; and (2) by actually building a chain by sequentially linking the strips of paper together based on the code, students get a better feel for how protein chains are made by linking amino acids together. This also provides a threaded analogy for peptide bonds and protein folding whereby the teacher can show how different chain sequences can physically fold into different functional shapes based on sequence. While these advantages have continued to be important, it can be tedious to cut all those strips of paper for student groups, especially if it is a large class, and the process of students physically building the chains can take valuable class time. There is still value in the students actually making chains if time and resources permit. Alternatively, students can just write out the sequence for the chain and the eight different premade chains can be shown to the class during discussion. Those chains can be used throughout the unit to represent protein chains for discussions on folding and the relationship between sequence, structure, and function.
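The match between strip types and amino acids can be illustrated as follows. The pairing of particular strips with particular amino acids is arbitrary and for illustration only; the actual assignments are those in the handout's decoders.

```python
from itertools import product

COLORS = ["black", "blue", "brown", "green", "orange",
          "pink", "purple", "red", "white", "yellow"]
LENGTHS = ["long", "short"]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids

# 10 colors x 2 lengths = 20 distinct strip types.
strips = [f"{length} {color}" for color, length in product(COLORS, LENGTHS)]

# Arbitrary illustrative pairing; NOT the mapping used in the handout.
strip_for = dict(zip(AMINO_ACIDS, strips))

print(len(strips), "strip types for", len(AMINO_ACIDS), "amino acids")
```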

Overview of Parts 3 & 4

Parts 3 and 4 can be assigned as homework or can be included as further in-class activities if time permits. Depending on the level of the class, this should take students another 30–40 minutes to complete (about 15–25 minutes for part 3 and about 10–15 minutes for part 4). Students should be instructed to simply write down the sequence for chain 8, which is depicted at the top of part 3 (Figure 4 and page 4 of the handout packet), and write down the sequences of the chains that would result from the given “mutations” in the questions for part 3. Students should also be instructed to look up the various genetic conditions included in part 4 and provide the information required. After parts 3 and 4 are completed, either after the full in-class activity or at the start of the next class when students have completed the homework, slides 19–32 in the discussion slides can be used to connect genotype to phenotype and to discuss the possible changes to protein sequence that can be caused by various genetic mutations. The discussion will likely take an additional 20+ minutes to adequately complete.

Figure 4.

A series of questions designed to walk the students through determining the various consequences of alterations in the coding sequence. This allows students to actively learn about various mutation types and their consequences, including silent mutations, missense mutations, nonsense mutations, and insertion or deletion mutations.

Part 3: Mutations

Objectives

  1. Students will assess the effects of various types of point mutations.

  2. Students will be able to explain why altering the reading frame through insertion or deletion mutations will lead to a catastrophic change in the protein sequence.

  3. Students will begin to understand the relationship between genetic mutations (mutations in the code) and changes in the protein chain (phenotypic changes).

Classroom workflow and teaching instructions

In this section of the activity, students work with possible “mutations” in the code to determine the various potential effects. This section also helps students understand the importance of the “reading frame,” which connects directly with questions in part 2 where students have pondered and answered why it is important to find the correct start codon. The wrong start site would generate an entirely different chain, even if the faux start was within an actual gene-coding region. Within this section, students learn about silent mutations made possible because of the genetic code degeneracy, a side product of having to use sequences of three consecutive nucleotides for each codon, which subsequently yields 64 unique codons. Students also learn about missense mutations, where one link (or amino acid) of the chain is altered without affecting the rest; and nonsense mutations, which result in a premature stop codon, thus preventing the rest of the chain from being made. Finally, the catastrophic effects of insertion and deletion mutations are shown and related to their effects on the reading frame (Figures 4 and 5).
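The four mutation types can be compared side by side in a short sketch. It uses nucleotide letters and the standard genetic code, and the example coding sequence is hypothetical rather than the handout's chain 8. The classifier is deliberately simplified: it assumes the sequence begins at the start codon and does not handle cases such as a mutated stop codon.

```python
from itertools import product

# Standard genetic code in compact form: bases ordered U, C, A, G; "*" marks stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(coding):
    """Translate in triplets from the first base, stopping at a stop codon."""
    chain = []
    for i in range(0, len(coding) - 2, 3):
        residue = CODON_TABLE[coding[i:i + 3]]
        if residue == "*":
            break
        chain.append(residue)
    return "".join(chain)

def classify(original, mutated):
    """Label a mutation by its effect on the translated chain (simplified)."""
    if len(mutated) != len(original):
        shifted = (len(mutated) - len(original)) % 3 != 0
        return "frameshift" if shifted else "in-frame indel"
    before, after = translate(original), translate(mutated)
    if after == before:
        return "silent"
    if len(after) < len(before) and before.startswith(after):
        return "nonsense"
    return "missense"

original = "AUGGCUUGGAAAUAA"  # hypothetical: AUG GCU UGG AAA UAA -> MAWK
examples = {
    "GCU -> GCC": "AUGGCCUGGAAAUAA",
    "GCU -> GAU": "AUGGAUUGGAAAUAA",
    "UGG -> UGA": "AUGGCUUGAAAAUAA",
    "delete one G": "AUGCUUGGAAAUAA",
}
for label, mutated in examples.items():
    print(f"{label}: {translate(mutated)} ({classify(original, mutated)})")
```

The single-base deletion changes every codon after the deletion site, whereas each substitution affects at most one position or truncates the chain, which mirrors the comparison students make with the premade chains.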

Figure 5.

Conceptual understanding of the effects of insertions and deletions on the reading frame.

During discussion, the physical chain 8 coded by the original sequence should be compared with a premade chain generated by the nonsense mutation (so students can visualize the effect of a premature stop codon), and with a premade chain with the insertion or deletion mutation (so students can visualize the effects of altering the reading frame). In this way, the students can see the physical effect on chain sequence, along with potential alterations in the fold and function. Discussion slides 24–30 can then be used to relate this to genomic mutations and the subsequent effects on protein sequence and protein function. All of this leads to part 4, in which students look up information on specific proteins and diseases and relate actual gene mutations to phenotypic disease.

Part 4: The Many Roles of Protein

Objectives

  1. Students will explore the causes and consequences of three different genetic conditions.

  2. Students will further explore the relationship between genetic mutations and changes in phenotype.

Classroom workflow and teaching instructions

Students can find information on the three conditions (see handout packet pages 6–7) fairly easily online. Teachers can easily adjust the conditions that students investigate as part of the homework to relate to phenotypes or diseases that have been, or will be, discussed further in their own particular course, as long as those conditions have a clear genetic component. The debriefing and discussion should revolve around tying together the relationship between genotype and phenotype, and how the genomic sequence is the key long-term code from which the protein sequence is determined. Proteins are made and degraded repeatedly and, thus, the genetic code must be stable and remain intact, long term, for repeated decoding as new proteins are made. If the genetic code is altered by mutation, the protein sequence can be affected, just as the sequences of our chains were altered in part 3 when the code had mutations. If the protein sequence is changed, it can affect fold and function, thereby causing biological issues and potential disease.

Summary

This activity sequence is designed to help students think through some of the major concepts of molecular biology's central dogma. It also connects the content of molecular biology with that of genetics by helping students understand mutations, the effects of mutations on protein sequence, and the relationship between genetic mutations, protein sequence, and disease (genotype → phenotype). As a result, this activity may help address some of the issues surrounding students' difficulties reconciling proteins and phenotypes as discussed in the literature (Marbach-Ad, 2001; Speth et al., 2014; Reinagel & Bray Speth, 2016). By actively working through these concepts using simple coded shapes, students build their own understanding and refute many of their own inherent misunderstandings. After working through these tasks, it is an easy conceptual jump for the students to think of the single-letter representations of nucleotides (A, C, G, and T) as a code similar to the shapes used in this activity. The letters represent actual chemical structures, which are as meaningful a code to the cellular machinery as the shapes are to us in deciphering the code. This connection is made within the accompanying discussion slides available for download. Students also gain an introductory understanding of the role of promoter regions and understand that genes are embedded within longer sequences that make up our entire genome.

As with all models, this model has some limitations. Transcription (DNA → RNA) has been largely ignored in order to focus more directly on translation and key aspects of turning the genetic code into protein. Teachers should be aware that this results in having an RNA “genome,” since the code (representing the genome) is directly decoded into a chain (representing protein) using a version of the genetic code, thus using uracil instead of thymine. In our experience, this has not been an issue as students tend to understand transcription fairly well and easily accept the minor alterations that were made for simplicity's sake. Having extensively used the genetic code during this activity, students tend to have no problem making the transition from uracil to thymine, or learning the intermediate step of transcription that was ignored in this activity. When used as an introduction, this activity sequence provides a central lattice on which student understanding of concepts central to molecular biology and molecular genetics can be built.

References

Duncan, R.G. & Reiser, B.J. (2007). Reasoning across ontologically distinct levels: students' understandings of molecular genetics. Journal of Research in Science Teaching, 44, 938–959.

Marbach-Ad, G. (2001). Attempting to break the code in student comprehension of genetic concepts. Journal of Biological Education, 35, 183–189.

Marbach-Ad, G., Rotbain, Y. & Stavy, R. (2008). Using computer animation and illustration activities to improve high school students' achievement in molecular genetics. Journal of Research in Science Teaching, 45, 273–292.

Marshall, P.A. (2017). A hands-on activity to demonstrate the central dogma of molecular biology via a simulated VDJ recombination activity. Journal of Microbiology & Biology Education, 18(2).

NGSS Lead States (2013). Next Generation Science Standards: For States, By States (HS-LS3-1 and HS-LS3-2). Washington, DC: National Academies Press.

Reinagel, A. & Bray Speth, E. (2016). Beyond the central dogma: model-based learning of how genes determine phenotypes. CBE–Life Sciences Education, 15, ar4.

Rotbain, Y., Marbach-Ad, G. & Stavy, R. (2008). Using a computer animation to teach high school molecular biology. Journal of Science Education and Technology, 17, 49–58.

Speth, E.B., Shaw, N., Momsen, J., Reinagel, A., Le, P., Taqieddin, R. & Long, T. (2014). Introductory biology students' conceptual models and explanations of the origin of variation. CBE–Life Sciences Education, 13, 529–539.

Takemura, M. & Kurabayashi, M. (2014). Using analogy role-play activity in an undergraduate biology classroom to show central dogma revision. Biochemistry and Molecular Biology Education, 42, 351–356.