Adapting research-driven routines to the classroom context can promote innovative and motivational learning environments. Using a case-study approach, we propose a set of bioinformatics-based activities, supported by a tutorial video, that aim to identify genes and disclose their genomic context in different species. The rationale is to strengthen teachers’ competencies to introduce bioinformatics resources and tools (e.g., NCBI, ORFfinder, BLAST, and MaGe) in their teaching practices. By doing so, teachers will ultimately enhance students’ understanding of how genomic data mining and comparative genomics are instrumental for biological research.

Introduction

Nowadays, computers have a central function in scientists’ daily routine. A personal computer connected to the web is all that it takes to access a myriad of bioinformatics resources capable of deconstructing genomic information into biologically meaningful data. Bioinformatics provides tools to comprehensively analyze and store large amounts of biological data that would be impossible to investigate without informatics-based approaches (Bloom, 2001; Madigan et al., 2018). Here, we present a series of bioinformatics activities that enable students, under the guidance of their teachers, to query an unknown DNA sequence, mimicking a real research scenario. Activities built around research-driven problems appear to stimulate students’ interest in scientific (STEM) careers, since research-inspired activities allow them to become familiar with scientific professions and the academic training required to pursue them (Kovarik et al., 2013).

In order to reconcile simple yet curriculum-oriented bioinformatics activities intended for high school students (15–17 years old) with high learning impact and didactic value, an inquiry-based scenario structured in four bioinformatics exercises was designed. Besides having a positive impact on students’ engagement and motivation (Campbell, 2003), the educational value of these activities extends to the multiple curricular exploration opportunities they offer. For instance, simply by selecting the query DNA sequences to be used, teachers can address a plethora of topics framed in the Next Generation Science Standards, such as gene regulation, evolution, and drug resistance (Moss, 1997; Brock, 1998; National Research Council, 2013; Taylor et al., 2014; Cooper, 2015; Newman et al., 2016).

Online Resources

The bioinformatics applications used in the exercises detailed below are open-access and web-based, with user-friendly interfaces that run in common web browsers on PC and Mac computers. Although the applications chosen are hosted on long-established web-based platforms that are widely used and currently indispensable in daily research routines, it is important to tell students that these bioinformatics applications keep evolving as more data are added, new resources are developed, and increasingly intuitive interfaces are introduced. A pilot trial of these bioinformatics activities was carried out in a classroom setting with the collaboration of 14 teachers from six schools and involving a total of 387 high school students (15–18 years old).

Learning Objectives

Specific learning objectives are detailed after each exercise. Through all these activities, students

  • strengthen their knowledge of concepts such as genome, chromosomes, genes (structural, operator, repressor, regulator, promoter), start and stop codons, and operons;

  • learn new concepts such as open reading frames, synteny, and comparative genomics; and

  • improve their computational skills and increase their digital literacy.

Class Workflow

To adapt the bioinformatics activities to a classroom context properly integrated in the high school curricula, the exercises were designed in collaboration with the teachers who took part in the pilot trial. Taking into account teachers’ suggestions, we propose a class workflow comprising four parts (I–IV), as schematically represented in Figure 1 and detailed below. To further assist teachers in implementing the class workflow, a tutorial video detailing the four parts was produced (see the online version of the journal to view the supplemental video). The estimated times correspond to the average time required by teachers to implement the full set of activities described below with their students. Regardless of the suggested timeline, it is important to emphasize that each teacher may easily reschedule the class workflow according to their teaching agenda either by cutting one or more of the four parts or, alternatively, by stimulating the students’ discussion after each exercise.

  • Setting up the theoretical background (estimated time: 60 minutes): The teacher emphasizes the importance of identifying genes from a genomic sequence. Besides recalling basic concepts such as genome, chromosomes, genes (structural, operator, repressor, regulator, promoter), and operons, students are introduced to important notions, namely start and stop codons, open reading frames (ORFs), synteny, and comparative genomics.

  • Introduction to bioinformatics databases and tools (estimated time: 30 minutes): The teacher highlights the importance of bioinformatics by explaining the exercises and introducing students to the bioinformatics resources and tools they will use, namely the NCBI database, NCBI ORFfinder, NCBI BLAST, and MicroScope (MaGe). The tutorial video (Supplemental Material) should help teachers in this task and assist students throughout the exercises.

  • Bioinformatics exercises (estimated time: 70 minutes): Students carry out the exercises autonomously with the teacher's supervision to identify difficulties and answer questions.

  • Discussion of the results (estimated time: 20 minutes): The class discusses the results obtained in each exercise and attempts to draw conclusions. Ultimately, the teacher might challenge the students to explore other case studies and study different genomic regions. In addition, students’ own efforts to explore the bioinformatics resources autonomously should not be neglected, particularly given the user-friendly and intuitive interfaces of these resources. In fact, during the pilot trial, we observed that some students took the initiative to extend their in silico experiments beyond the assigned activities by pursuing their own research queries, for instance: “What is the size of the genome of a spider?”; “Are virus genomes such as HIV also available at this database?”; or “Let's search for the gene coding for insulin.”

Figure 1.

Proposed class workflow and timeline, taking into account the feedback of 14 in-service teachers who implemented the exercises in their high school classes as a pilot trial.

Bioinformatics Exercises

The bioinformatics-based activities described below are structured as four distinct exercises (see the video tutorial): 1 – getting the target DNA sequence; 2 – looking for ORFs; 3 – deciding which of the retrieved ORFs are likely to be genes; and 4 – analyzing the gene(s) identified within their expected genomic context. Bearing in mind that laboratory-based activities should meet the curricular agenda, and acknowledging that the lac operon is a common example for teaching gene regulation, the query DNA sequence chosen to exemplify these exercises corresponds to lacI and flanking regions. Furthermore, to frame the bioinformatics-based activities in an inquiry-based approach, all exercises start with a guiding question.

1. Getting the DNA Sequence

This initial exercise aims to answer the question “How does one access a comprehensive gene bank database to obtain the specific DNA sequence to be studied?”

  • 1.1. Access NCBI website: http://www.ncbi.nlm.nih.gov/.

  • 1.2. Choose Genome in the menu next to the search box.

  • 1.3. Search for E. coli.

  • 1.4. At the top of the new page, select Reference Genome by clicking the E. coli strain K12.

  • 1.5. Scroll down and click on the accession number corresponding to E. coli strain K12 under Reference Sequence to retrieve the full genome sequence.

  • 1.6. Choose the FASTA format.

  • 1.7. Open the selection box Change region shown and type the coordinates 366001–368041.

  • 1.8. Copy, paste, and save the sequence in a Word or Notepad document.
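For classes that want to check the saved file programmatically, the following minimal Python sketch reads the first record of a FASTA-formatted text and confirms its length. It is an illustration only and not part of the published activity: the function name `parse_fasta` and the shortened example record are hypothetical, and the expected length simply follows from the coordinates typed in step 1.7.

```python
def parse_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; keep only the first
            header = line[1:]
        elif header is not None:
            chunks.append(line.upper())  # sequence lines may be wrapped
    if header is None:
        raise ValueError("no FASTA header line (starting with '>') found")
    return header, "".join(chunks)


# Hypothetical, shortened record; the real region saved in step 1.8 is longer.
example = """>example_region hypothetical header
ATGAAACCAG
taacgttata
"""
header, seq = parse_fasta(example)
print(header, len(seq))

# Coordinates are inclusive, so the region from step 1.7 should contain
# 368041 - 366001 + 1 nucleotides.
expected_length = 368041 - 366001 + 1
print(expected_length)
```

Comparing `len(seq)` for the real file with `expected_length` is a quick way to detect a truncated copy and paste.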

Learning objectives. Through the exploration of the comprehensive bioinformatics database NCBI, students learn

  • how the database is organized and how complex it is, and

  • how to search for DNA sequences and gene sequences for different organisms.
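The same region can, in principle, be requested without the web interface through the NCBI E-utilities `efetch` service. The sketch below only builds the request URL; `fetch_region` would download the sequence when called with an internet connection. The accession `NC_000913.3` is an assumption (it is the RefSeq record commonly used for E. coli K-12 MG1655) and should be checked against the accession number shown in step 1.5; the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def region_url(accession, start, stop):
    """Build an efetch URL for a nucleotide sub-region in FASTA format."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,   # 1-based, inclusive
        "seq_stop": stop,     # 1-based, inclusive
    }
    return EFETCH + "?" + urlencode(params)


def fetch_region(accession, start, stop):
    """Download the region as FASTA text (requires internet access)."""
    with urlopen(region_url(accession, start, stop)) as response:
        return response.read().decode("utf-8")


# Assumed accession; verify it against the record opened in step 1.5.
url = region_url("NC_000913.3", 366001, 368041)
print(url)
```

Pasting the printed URL into a browser should return the same FASTA text that steps 1.6 and 1.7 display, provided the accession and assembly version match.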

2. Deconstructing the DNA Sequence

This exercise was planned to instruct students how to go from an unknown DNA sequence to the identification of hypothetical coding sequences. Students are introduced to the notion of ORFs, which frequently escapes the scientific lexicon of elementary and high school biology curricula, but which is instrumental for answering the question “How is a new DNA sequence deconstructed?”

  • 2.1. Access NCBI ORFfinder: http://www.ncbi.nlm.nih.gov/orffinder/.

  • 2.2. Paste the sequence previously saved as a Word or Notepad document into the text box provided.

  • 2.3. Choose the genetic code: 11. Bacterial, Archaeal and Plant Plastid.

  • 2.4. Choose the option “ATG” and alternative initiation codons.

  • 2.5. Click Submit.

  • 2.6. Analyze the obtained results (Figure 2).

Figure 2.

ORFfinder output at NCBI, disclosing all possible open reading frames (ORFs) and their direction within the query DNA sequence. In addition to the graphic view, details such as ORF coordinates, length, strand, and frame are highlighted in the table below. By selecting each ORF, it is possible to obtain the translated amino acid sequence. Immediate BLAST of each ORF may be executed with the command BLAST ORF.

Learning objectives. With this exercise, students

  • recognize the six different reading frames in a DNA sequence,

  • understand the meaning of ORF, and

  • recognize the importance of start and stop codons for identifying all possible ORFs.
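To make the six reading frames concrete, the following Python sketch scans both strands of a short sequence for ORFs and translates them. It is a simplified illustration and not ORFfinder's actual implementation: the set of alternative start codons (`GTG`, `TTG`) is an assumed subset, the `min_codons` threshold is arbitrary, and coordinates are reported relative to the strand being read.

```python
BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (standard assignments,
# which genetic code 11 shares; '*' marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}  # assumption: simplified set of start codons


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_orfs(seq, min_codons=3):
    """List ORFs as (strand, frame, start, end, protein).

    start and end are 0-based positions on the strand being read;
    end lies just past the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        # The initiator codon is read as methionine.
                        protein = "M" + translate(s[start + 3:i])
                        orfs.append((strand, frame + 1, start, i + 3, protein))
                    start = None
    return orfs


# Hypothetical toy sequence containing one short ORF.
orfs = find_orfs("CCATGAAATTTGGGTAACC")
print(orfs)
```

Running `find_orfs` on the sequence saved in exercise 1 with a larger `min_codons` value gives a list that students can compare with the ORFfinder table in Figure 2; differences are expected because the two procedures do not use identical rules.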

3. Which ORFs Are Potential Genes?

Basic Local Alignment Search Tool (BLAST) is a powerful algorithm capable of finding similarities between a query sequence (DNA or a protein sequence) and the sequences available in databases (Altschul et al., 1990). Using this application, the students can address the following questions: “Which of the ORFs retrieved in the previous exercise are probable genes? Which ORFs are unlikely functional coding sequences?”

  • 3.1. Select one ORF to study (example: ORF 28).

  • 3.2. Start BLAST of the selected ORF by clicking on BLAST ORF.

  • 3.3. Click on BLAST in the new page opened.

  • 3.4. Identify the gene (Figure 3).

  • 3.5. Repeat the procedure for other ORFs and analyze the results obtained.

Figure 3.

BLAST output at NCBI, highlighting the similarity scores between the query ORF and the 100 best BLAST hits retrieved in the database. Clicking over a line displays the alignment between the query sequence and the subject sequence, allowing identification of differences between the two sequences.

Learning objectives. Students learn that

  • not all DNA sequences bracketed by a start and a stop codon (i.e., ORFs) are coding sequences,

  • ORFs can be located in different reading frames and oriented in either direction, and

  • scrutinizing gene banks by a BLAST search is an effective approach for identifying putative genes among retrieved ORFs.

Students can discuss possible scenarios to explain a BLAST search in which no similarities are found.
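The idea of a local alignment score, which underlies the similarity values in Figure 3, can be illustrated with a few lines of Python. BLAST itself relies on a faster seed-and-extend heuristic; the sketch below swaps in the simpler Smith-Waterman dynamic programming method, and the scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical toy sequences.
identical = smith_waterman_score("ACGT", "ACGT")
unrelated = smith_waterman_score("AAAA", "TTTT")
embedded = smith_waterman_score("TTACGTTT", "GGACGTGG")
print(identical, unrelated, embedded)
```

A score of zero for the unrelated pair mirrors the "no similarities found" scenario mentioned above: nothing in the two sequences aligns well enough to score above the baseline.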

4. Comparative Genomics

To fully exploit the potential of this activity, the fourth exercise compares the presence of the identified gene or genes, their genomic context, and their occurrence across different taxa. Using MaGe, a robust comparative genomics platform (Vallenet et al., 2006), the students further confirm the identity and putative function of the gene(s) determined during the BLAST search. Students might ask, “Is there any evolutionary relationship to explain the occurrence of the studied genes across different taxa?”

  • 4.1. Access MicroScope website: https://www.genoscope.cns.fr/agc/microscope/home/index.php.

  • 4.2. Choose Escherichia coli K12 and select Load into genome browser.

  • 4.3. To identify the gene, search for “lacI” and click Move to.

  • 4.4. Identify the lacI gene by hovering the mouse over each red bar.

  • 4.5. Select the options menu.

  • 4.6. In the new window opened, look for the section Viewer Comparative Map default and choose synteny.

  • 4.7. In the section PkGDB Organism Synteny, hold down the CTRL key and choose Bacillus anthracis, another Escherichia species, Salmonella bongori, Shigella sonnei, and Vibrio cholerae.

  • 4.8. Click Save options.

  • 4.9. Compare the presence and the function of the gene in different taxa (Figure 4).

Figure 4.

Comparative genomics analysis carried out using MaGe. The genes and corresponding reading frames (+3, +2, +1, −1, −2, −3) of the query genes are shown at the top. Below is an outline of other bacteria with which the query gene(s) are being compared.

Learning objectives. Through this simple comparative genomics analysis, students learn

  • to localize their target gene(s) within the chromosome,

  • to identify the genomic features of the flanking regions,

  • to determine gene homologies with selected taxa, and

  • concepts such as synteny, homology, insertions, deletions, and horizontal gene transfer.
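The notion of synteny can be sketched as conserved gene neighbourhoods: two genes that sit next to each other in both genomes form a conserved pair. The Python example below is a deliberately simple illustration; MaGe's own synteny computation is more elaborate, and the gene orders used here are hypothetical toy data (only the lacI, lacZ, lacY, lacA order reflects the familiar lac operon layout; `geneX` and `geneQ` are invented placeholders).

```python
def adjacent_pairs(gene_order):
    """Return the set of unordered neighbouring gene pairs in a gene order."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


def conserved_neighbours(order_a, order_b):
    """Return the neighbouring gene pairs found in both gene orders."""
    return adjacent_pairs(order_a) & adjacent_pairs(order_b)


# Hypothetical toy gene orders for a reference and a comparison genome.
reference = ["geneX", "lacI", "lacZ", "lacY", "lacA"]
comparison = ["lacZ", "lacY", "geneQ", "lacI", "lacA"]

conserved = sorted(
    tuple(sorted(pair)) for pair in conserved_neighbours(reference, comparison)
)
print(conserved)
```

In this toy comparison only the lacZ and lacY neighbourhood is preserved, which is the kind of pattern students can look for by eye in the MaGe synteny view (Figure 4) and then try to explain through insertions, deletions, or horizontal gene transfer.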

Additional Remarks

The pilot trial showed that Internet access was not a limitation when implementing these activities at schools. Nevertheless, teachers could easily choose to exclude one of the exercises or, alternatively, to challenge the students to carry them out as homework, and later resume the bioinformatics exercises in the classroom.

Ana Sofia Martins is supported by a fellowship from Fundação para a Ciência e Tecnologia – FCT (SFRH/BD/112038/2015). The authors are grateful to all the participant schools and school teachers for the opportunity to implement the bioinformatics exercises detailed in this work, which contributed to improving the described activity.

References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215, 403–410.

Bloom, M. (2001). Biology in silico: the bioinformatics revolution. The American Biology Teacher, 63, 397–403.

Brock, D.L. (1998). Now you see it, now you don't! Making regulation of gene expression come alive for all students. The American Biology Teacher, 60, 288–290.

Campbell, A.M. (2003). Public access for teaching genomics, proteomics, and bioinformatics. Cell Biology Education, 2, 98–111.

Cooper, R.A. (2015). Teaching the big ideas of biology with operon models. The American Biology Teacher, 77, 30–39.

Kovarik, D.N., Patterson, D.G., Cohen, C., Sanders, E.A., Peterson, K.A., Porter, S.G. & Chowning, J.T. (2013). Bioinformatics education in high school: implications for promoting science, technology, engineering, and mathematics careers. CBE Life Sciences Education, 12, 441–459.

Madigan, M., Bender, K., Buckley, D., Sattley, W. & Stahl, D. (2018). Brock: Biology of Microorganisms, 15th Ed. San Francisco, CA: Pearson Education/Benjamin Cummings.

Moss, R. (1997). A discovery lab for studying gene regulation. The American Biology Teacher, 59, 522–526.

National Research Council (2013). Next Generation Science Standards: For States, By States. Washington, DC: National Academies Press.

Newman, L., Duffus, A.L.J. & Lee, C. (2016). Using the free program MEGA to build phylogenetic trees from molecular data. The American Biology Teacher, 78, 608–612.

Taylor, J.M., Davidson, R.M. & Strong, M. (2014). Drug-resistant tuberculosis: a genetic analysis using online bioinformatics tools. The American Biology Teacher, 76, 386–394.

Vallenet, D., Labarre, L., Rouy, Z., Barbe, V., Bocs, S., Cruveiller, S. & Médigue, C. (2006). MaGe: a microbial genome annotation system supported by synteny results. Nucleic Acids Research, 34, 53–65.
