Adapting research-driven routines to the classroom context can promote innovative and motivational learning environments. Using a case-study approach, we propose a set of bioinformatics-based activities, supported by a tutorial video, that aim to identify genes and disclose their genomic context in different species. The rationale is to strengthen teachers’ competencies to introduce bioinformatics resources and tools (e.g., NCBI, ORFfinder, BLAST, and MaGe) in their teaching practices. By doing so, teachers will ultimately enhance students’ understanding of how genomic data mining and comparative genomics are instrumental for biological research.
Nowadays, computers have a central function in scientists’ daily routine. A personal computer connected to the web is all it takes to access a myriad of bioinformatics resources capable of deconstructing genomic information into biologically meaningful data. Bioinformatics provides tools to comprehensively analyze and store large amounts of biological data that would be impossible to investigate without informatics-based approaches (Bloom, 2001; Madigan et al., 2018). Here, we present a series of bioinformatics activities that enable students, under the guidance of their teachers, to query an unknown DNA sequence, mimicking a real research scenario. Activities built around research-driven problems appear to stimulate students’ interest in scientific (STEM) careers, since research-inspired activities allow them to become familiar with scientific professions and the academic training required to pursue them (Kovarik et al., 2013).
In order to reconcile simple yet curriculum-oriented bioinformatics activities intended for high school students (15–17 years old) with high learning impact and didactic value, an inquiry-based scenario structured in four bioinformatics exercises was designed. Besides having a positive impact on students’ engagement and motivation (Campbell, 2003), the educational value of these activities extends to the multiple curricular exploration opportunities they offer. For instance, simply by selecting the query DNA sequences to be used, teachers can address a plethora of topics framed in the Next Generation Science Standards, such as gene regulation, evolution, and drug resistance (Moss, 1997; Brock, 1998; National Research Council, 2013; Taylor et al., 2014; Cooper, 2015; Newman et al., 2016).
The bioinformatics applications used in the exercises detailed below are open-access and web-based, with user-friendly interfaces that run in common web browsers of PC and Mac computers. Although the applications chosen are hosted in long-established web-based platforms that are widely used and currently indispensable in daily research routines, it is important to instruct students about the evolving dynamics of these bioinformatics applications, resulting from the addition of more data, the development of new resources, or the display of increasingly intuitive interfaces. A pilot trial of these bioinformatics activities was carried out in a classroom setting with the collaboration of 14 teachers from six schools and involving a total of 387 high school students (15–18 years old).
Specific learning objectives are detailed after each exercise. Through all these activities, students
strengthen their knowledge of concepts such as genome, chromosomes, genes (structural, operator, repressor, regulator, promoter), start and stop codons, and operons;
learn new concepts such as open reading frames, synteny, and comparative genomics; and
improve their computational skills and increase their digital literacy.
To adapt the bioinformatics activities to a classroom context properly integrated in the high school curricula, the exercises were designed in collaboration with the teachers who took part in the pilot trial. Taking into account teachers’ suggestions, we propose a class workflow comprising four parts (I–IV), as schematically represented in Figure 1 and detailed below. To further assist teachers in implementing the class workflow, a tutorial video detailing the four parts was produced (see the online version of the journal to view the supplemental video). The estimated times correspond to the average time required by teachers to implement the full set of activities described below with their students. Regardless of the suggested timeline, it is important to emphasize that each teacher may easily reschedule the class workflow according to their teaching agenda either by cutting one or more of the four parts or, alternatively, by stimulating the students’ discussion after each exercise.
Setting up the theoretical background (estimated time: 60 minutes): The teacher emphasizes the importance of identifying genes from a genomic sequence. Besides recalling basic concepts such as genome, chromosomes, genes (structural, operator, repressor, regulator, promoter), and operons, students are introduced to important notions, namely start and stop codons, open reading frames (ORFs), synteny, and comparative genomics.
Introduction to bioinformatics databases and tools (estimated time: 30 minutes): The teacher highlights the importance of bioinformatics by explaining the exercises and introducing students to the bioinformatics resources and tools they will use, namely the NCBI database, NCBI ORFfinder, NCBI BLAST, and MicroScope (MaGe). The tutorial video (Supplemental Material) should help teachers in this task and assist students throughout the exercises.
Bioinformatics exercises (estimated time: 70 minutes): Students carry out the exercises autonomously with the teacher's supervision to identify difficulties and answer questions.
Discussion of the results (estimated time: 20 minutes): The class discusses the results obtained in each exercise to draw conclusions. Ultimately, the teacher might challenge the students to explore other case studies and study different genomic regions. In addition, students’ own drive to explore the bioinformatics resources autonomously should not be overlooked, particularly given the user-friendly and intuitive interfaces of these resources. In fact, during the pilot trial, we observed that some students took the initiative to extend their in silico experiments beyond the assigned activities by pursuing their own research queries, for instance: “What is the size of the genome of a spider?”; “Are virus genomes such as HIV also available at this database?”; or “Let's search for the gene coding for insulin.”
The bioinformatics-based activities described below are structured according to four distinct exercises (see the video tutorial): 1 – getting the target DNA sequence; 2 – looking for ORFs; 3 – deciding which of the retrieved ORFs are likely to be genes; and 4 – analyzing the gene(s) identified within their expected genomic context. Bearing in mind that laboratory-based activities should meet the curricular agenda, and acknowledging that the lac operon is a common example for teaching gene regulation, the query DNA sequence chosen to exemplify these exercises corresponds to lacI and its flanking regions. Furthermore, to frame the bioinformatics-based activities in an inquiry-based approach, all exercises start with a guiding question.
1. Getting the DNA Sequence
This initial exercise aims to answer the question “How does one access a comprehensive gene bank database to obtain the specific DNA sequence to be studied?”
1.1. Access NCBI website: http://www.ncbi.nlm.nih.gov/.
1.2. Choose Genome in the menu next to the search box.
1.3. Search for “E. coli”.
1.4. At the beginning of the new page, select Reference Genome by clicking the E. coli strain K12.
1.5. Scroll down and click on the accession number corresponding to E. coli strain K12 in the Reference Sequence command to retrieve the full genome sequence.
1.6. Choose the FASTA format.
1.7. Open the selection box Change region shown and type in the coordinates 366001–368041.
1.8. Copy, paste, and save the sequence in a Word or Notepad document.
Learning objectives. Through the exploration of the comprehensive bioinformatics database NCBI, students learn
how the database is organized, its complexity, and
how to search for DNA sequences and gene sequences for different organisms.
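For classes that want to see what steps 1.1–1.8 amount to programmatically, the same region can in principle be requested through the NCBI E-utilities service. The following is a minimal sketch and not part of the original activity: the accession NC_000913.3 (the RefSeq record for E. coli K-12 MG1655) is an assumption, since the exercise selects the genome through the web interface, and the helper names are hypothetical. Only the coordinates 366001–368041 come from step 1.7.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public E-utilities endpoint for retrieving records from NCBI databases.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, start, stop):
    """Build a request URL for a FASTA-formatted slice of a nucleotide record."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # assumed accession, e.g. "NC_000913.3"
        "rettype": "fasta",   # same format chosen in step 1.6
        "retmode": "text",
        "seq_start": start,   # 1-based, inclusive (step 1.7)
        "seq_stop": stop,
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def parse_fasta(text):
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0].lstrip(">")
    sequence = "".join(lines[1:]).upper()
    return header, sequence


def fetch_region(accession, start, stop):
    """Download the region (requires Internet access) and parse it."""
    with urlopen(build_efetch_url(accession, start, stop)) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Calling `fetch_region("NC_000913.3", 366001, 368041)` would be expected to return a header and a sequence of 2,041 bases (368041 − 366001 + 1), which students can compare with the sequence they saved in step 1.8.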
2. Deconstructing the DNA Sequence
This exercise was planned to instruct students how to go from an unknown DNA sequence to the identification of hypothetical coding sequences. Students are introduced to the notion of ORFs, which frequently escapes the scientific lexicon of elementary and high school biology curricula, but which is instrumental for answering the question “How is a new DNA sequence deconstructed?”
2.1. Access NCBI ORFfinder: http://www.ncbi.nlm.nih.gov/orffinder/.
2.2. Paste the sequence previously saved as a Word or Notepad document into the text box provided.
2.3. Choose the genetic code: 11. Bacterial, Archaeal and Plant Plastid.
2.4. Choose the option “ATG” and alternative initiation codons.
2.5. Click Submit.
2.6. Analyze the obtained results (Figure 2).
Learning objectives. With this exercise, students
recognize the six different reading frames in a DNA sequence,
understand the meaning of ORF, and
recognize the importance of start and stop codons for identifying all possible ORFs.
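To make the idea of six reading frames concrete, teachers comfortable with programming might show students a short script that scans both strands for ORFs. The sketch below is an illustration only and is not ORFfinder's actual implementation. It assumes a simplified set of start codons (ATG, GTG, TTG) standing in for the “alternative initiation codons” option, uses the three standard stop codons, and applies a minimum length of 75 nucleotides chosen purely for illustration. All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumption: a simplified subset of bacterial initiation codons.
START_CODONS = {"ATG", "GTG", "TTG"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_length=75):
    """Scan all six reading frames for start-to-stop ORFs.

    Returns tuples (strand, frame, start, end, orf_sequence), where start
    and end are 0-based positions on the scanned strand and end is exclusive.
    """
    seq = seq.upper()
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first start codon since the last stop
            for i in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_length:
                        orfs.append(
                            (strand, frame + 1, start, i + 3, strand_seq[start:i + 3])
                        )
                    start = None  # look for the next ORF in this frame
    return orfs
```

Running `find_orfs` on the sequence saved in Exercise 1 gives students a list they can compare with the ORFfinder output in Figure 2. Differences are to be expected, because the start-codon set and the length threshold here are simplifications.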
3. Which ORFs Are Potential Genes?
Basic Local Alignment Search Tool (BLAST) is a powerful algorithm capable of finding similarities between a query sequence (DNA or a protein sequence) and the sequences available in databases (Altschul et al., 1990). Using this application, the students can address the following questions: “Which of the ORFs retrieved in the previous exercise are probable genes? Which ORFs are unlikely functional coding sequences?”
3.1. Select one ORF to study (example: ORF 28).
3.2. Start BLAST of the selected ORF by clicking on BLAST ORF.
3.3. Click on BLAST in the new page opened.
3.4. Identify the gene (Figure 3).
3.5. Repeat the procedure for other ORFs and analyze the results obtained.
Learning objectives. Students learn that
not all DNA sequences bracketed by a start and a stop codon (i.e., ORFs) are coding sequences,
ORFs can be located in different reading frames and oriented in either direction, and
scrutinizing gene banks by a BLAST search is an effective approach for identifying putative genes among retrieved ORFs.
Students can discuss possible scenarios to explain a BLAST search in which no similarities are found.
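The notion of “finding similarities” between a query and database sequences can be illustrated with a toy scoring function. The sketch below is not BLAST: it computes a plain Smith-Waterman local alignment score, the exhaustive calculation that BLAST approximates with faster heuristics. The scoring values (match 2, mismatch −1, gap −2) and the function names are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score between a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_hits(query, database):
    """Score the query against each named sequence and sort by score, best first."""
    scored = [(name, local_alignment_score(query, seq)) for name, seq in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With a small hand-made dictionary of sequences as the `database`, students can see that a sequence containing the query scores highly while an unrelated one scores close to zero. This offers a starting point for discussing why some ORFs return no convincing BLAST hits.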
4. Comparative Genomics
To fully exploit the potential of this activity, the fourth exercise compares the presence of the identified gene or genes, their genomic context, and their occurrence across different taxa. Using MaGe, a robust comparative genomics platform (Vallenet et al., 2006), the students further confirm the identity and putative function of the gene(s) determined during the BLAST search. Students might ask, “Is there any evolutionary relationship to explain the occurrence of the studied genes across different taxa?”
4.1. Access MicroScope website: https://www.genoscope.cns.fr/agc/microscope/home/index.php.
4.2. Choose Escherichia coli K12 and select Load into genome browser.
4.3. To identify the gene, search for “lacI” and click Move to.
4.4. Identify the lacI gene by hovering the mouse over each red bar.
4.5. Select the options menu.
4.6. In the new window opened, look for the section Viewer Comparative Map default and choose synteny.
4.7. In the section PkGDB Organism Synteny, hold down the CTRL key and choose Bacillus anthracis, another Escherichia species, Salmonella bongori, Shigella sonnei, and Vibrio cholerae.
4.8. Click Save options.
4.9. Compare the presence and the function of the gene in different taxa (Figure 4).
Learning objectives. Through this simple comparative genomics analysis, students learn
to locate their target gene(s) within the chromosome,
to identify the genomic features of the flanking regions,
to determine gene homologies with selected taxa, and
concepts such as synteny, homology, insertions, deletions, and horizontal gene transfer.
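Synteny, understood here as conserved gene neighborhood, can also be reduced to a very small computation for discussion purposes. The sketch below is not how MaGe computes its synteny maps. It assumes that each genome has already been reduced to an ordered list of gene names with homologs sharing the same name, and it simply reports which pairs of adjacent genes occur in both lists. The function names and any gene orders passed to them are hypothetical teaching inputs.

```python
def adjacent_pairs(gene_order):
    """Return the set of unordered neighboring gene pairs along a chromosome."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


def conserved_neighbors(order_a, order_b):
    """Return the neighboring gene pairs shared by two gene-order lists."""
    return adjacent_pairs(order_a) & adjacent_pairs(order_b)
```

For example, comparing the E. coli order `["lacI", "lacZ", "lacY", "lacA"]` with a made-up order `["geneX", "lacZ", "lacY", "geneY"]` returns only the lacZ–lacY pair. Students can relate this to regions in Figure 4 where the neighborhood of lacI is conserved in some taxa and absent in others.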
The pilot trial showed that Internet access was not a limitation when implementing these activities at schools. Nevertheless, teachers could easily choose to exclude one of the exercises or, alternatively, to challenge the students to carry them out as homework, and later resume the bioinformatics exercises in the classroom.
Ana Sofia Martins is supported by a fellowship from Fundação para a Ciência e Tecnologia – FCT (SFRH/BD/112038/2015). The authors are grateful to all the participant schools and school teachers for the opportunity to implement the bioinformatics exercises detailed in this work, which contributed to improving the described activity.