Open-source data are information provided free online, and their use is gaining popularity in scientific research, especially for modeling species distribution. MaxEnt is open-source software that models distributions using presence-only data and environmental variables. These variables can also be found online and are generally free. Using these open-source data and tools makes species distribution modeling (SDM) more accessible. With the rapid changes our planet is undergoing, SDM helps researchers understand future habitat suitability for species. Owing to increasing interest in biogeographic research, SDM is increasingly applied to marine species, which were previously uncommon in this type of modeling. Here we provide examples of where to obtain the data and how the modeling can be performed and taught.

Introduction

Species distribution modeling (SDM) takes occurrence data, from observation or abundance datasets, and combines them with environmental variables to provide insight into species' spatial distribution (Elith & Leathwick, 2009; Senay et al., 2013). SDM has great potential for conservation planning and management (Elith et al., 2006; Fourcade et al., 2014; Marshall et al., 2014), especially for marine environments (Stirling et al., 2016). Over the years, SDM has promoted conservation by mapping regional and global patterns such as dispersal, change in habitat suitability, and potential extinctions (Sinclair et al., 2010), and by forecasting species distribution (Araújo & Guisan, 2006; Elith et al., 2006; Guisan et al., 2013). A correlative SDM uses occurrence data and environmental variables to model species habitat suitability (Jarnevich et al., 2015). In recent years, interest in modeling marine species has increased (Redfern et al., 2006; Valavanis et al., 2008), although modeling of marine mammals is more common than modeling of other marine species (Robinson et al., 2011).

Maximum entropy (MaxEnt) is a statistical procedure that yields the least biased predictions of probability distributions and distribution patterns given the available information (Harte & Newman, 2014). Open-source software that implements this procedure, also called MaxEnt, is available online. It effectively handles complex functions and has been reported to predict habitat suitability with high accuracy (Sinclair et al., 2010; Fourcade et al., 2014; Porfirio et al., 2014). It generates maps of predicted current and future habitat suitability from presence-only species occurrences and environmental variables (Fourcade, 2016). Both the presence-only data and the environmental variables can be found online through numerous websites dedicated to creating and updating these data. (For more specific details about MaxEnt, see Phillips et al., 2006, and Elith et al., 2011.)

The overall goal of an activity using this information is to have students use technology to increase their awareness and understanding of habitat suitability changes and the effects on threatened species, and to increase their critical thinking about conservation methods to prevent future extinctions. The case study provided here was designed to meet biology standards; however, it can be implemented in environmental sciences, ocean sciences, geosciences, and geography classes. It covers many of the key concepts and practices suggested by the AP Biology curriculum (Table 1) and the Next Generation Science Standards (Table 2).

Table 1.
Connection to the AP Biology curriculum.

These are the relevant essential knowledge and science practices from the AP Biology Curriculum framework that are covered by this activity. After completing this activity, students should know the topics listed here. (All information obtained from College Board, 2015).

Essential Knowledge
  • 4.B.3 Distribution of local and global ecosystems changes over time.
Science Practice
  • Science Practice 1: The student can use representations and models to communicate scientific phenomena and solve scientific problems.
  • Science Practice 4: The student can plan and implement data collection strategies appropriate to a particular scientific question.
  • Science Practice 5: The student can perform data analysis and evaluate evidence.
Table 2.
Connection to Next Generation Science Standards.

Below are the key concepts high school students will know after completing this activity. (All information obtained from National Research Council, 2012).

Discipline
  • Life Sciences: HS-LS2 Ecosystems
Disciplinary Core Ideas
  • LS2.A Interdependent Relationships in Ecosystems
  • LS2.C Ecosystem Dynamics, Functioning, and Resilience


Finding the Data

Open-source species data can be found on numerous websites, including the Global Biodiversity Information Facility (GBIF), the International Union for Conservation of Nature (IUCN), eBird, the Ocean Biogeographic Information System (OBIS), and iNaturalist, among many others. For environmental variables, WorldClim is the site most researchers use for global climate data, but most of the data on this website are not useful for marine species. Other sources, such as Bio-ORACLE, NASA's SeaWiFS, Aqua MODIS, and the General Bathymetric Chart of the Oceans (GEBCO), provide environmental variables for marine environments. When selecting environmental variables, it is important to choose those hypothesized to affect the species directly, because doing so improves the model output. (Reviewing the scientific literature and selecting relevant and sufficient variables is recommended.) For example, BioCLIM data (used to model terrestrial species) offer 12 variables, including annual mean temperature, minimum temperature of the coldest month, annual precipitation, and precipitation of the wettest month. Selecting all of the variables indiscriminately can reduce the reliability of the model output (Elith et al., 2011). In this paper, we provide a case study of the whale shark's current and future habitat suitability. Whale shark records were obtained from the IUCN website, and the environmental variables were selected from Bio-ORACLE. Only salinity, sea surface temperature (SST), and sea air temperature (SAT) were selected. Other variables available from Bio-ORACLE include ice thickness, phosphate, nitrate, pH, and sea ice concentration, among others.
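For classes that prefer to script the download step, the following Python sketch shows one possible way to request georeferenced records from GBIF's public occurrence-search service. It is an illustration only: the case study obtained its records from the IUCN website, not through this route, and the function names `build_query` and `fetch_points` are hypothetical helpers introduced here. The sketch assumes the GBIF parameters `scientificName`, `hasCoordinate`, `limit`, and `offset` and the response fields `decimalLongitude`, `decimalLatitude`, and `year`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public GBIF occurrence-search endpoint (assumed unchanged at time of use).
GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_query(scientific_name, limit=300, offset=0):
    """Return a GBIF occurrence-search URL for georeferenced records."""
    params = {
        "scientificName": scientific_name,
        "hasCoordinate": "true",  # keep only records that carry coordinates
        "limit": limit,           # page size
        "offset": offset,         # starting record, for paging
    }
    return GBIF_SEARCH + "?" + urlencode(params)


def fetch_points(scientific_name, limit=300):
    """Download one page of records as (longitude, latitude, year) tuples.

    Requires an internet connection; it is not called when this file is run.
    """
    with urlopen(build_query(scientific_name, limit)) as response:
        payload = json.load(response)
    return [
        (rec["decimalLongitude"], rec["decimalLatitude"], rec.get("year"))
        for rec in payload.get("results", [])
        if "decimalLongitude" in rec and "decimalLatitude" in rec
    ]


if __name__ == "__main__":
    # Rhincodon typus is the scientific name of the whale shark.
    print(build_query("Rhincodon typus", limit=5))
```

Separating the URL construction from the download lets students inspect the query in a browser before running any code, which can be useful when classroom networks restrict scripted access.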

Cleaning the Data

Novice users of open-source data should pay attention to the source and quality of the dataset obtained. A trustworthy source helps ensure that the final output is reliable. For example, museum records often include the location of the facility but not where the species was observed. For the whale shark data, only records from the Wildbook for Whale Sharks were used. This database is maintained by local researchers who validate the data. The quality of open-source data can be compromised by inaccurate observations (e.g., a whale shark observation documented in the Himalayas, which should be eliminated), inaccurate or missing georeferencing (i.e., a specific location that can be mapped), duplicate records, and sources that cannot be validated. For the whale shark case study, only georeferenced records from 2000 to 2015 were chosen, to match the dates of the environmental variables used.
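The cleaning rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the case study: the record layout (dictionaries with `lon`, `lat`, and `year` keys) and the function name `clean_records` are assumptions made for the example, and the sample coordinates are invented. The sketch does not detect marine records that fall on land (such as the Himalayas example), which would require an additional land or ocean mask.

```python
def clean_records(records, first_year=2000, last_year=2015):
    """Keep georeferenced, plausible, unique records within the year range."""
    seen = set()
    cleaned = []
    for rec in records:
        lon, lat, year = rec.get("lon"), rec.get("lat"), rec.get("year")
        # Drop records that are not georeferenced or have no date.
        if lon is None or lat is None or year is None:
            continue
        # Drop coordinates that cannot exist on the globe.
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            continue
        # Keep only years that match the environmental layers.
        if not (first_year <= year <= last_year):
            continue
        # Drop duplicates: same rounded location in the same year.
        key = (round(lon, 4), round(lat, 4), year)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


# Invented sample records, for illustration only.
sample = [
    {"lon": 72.9, "lat": 3.2, "year": 2010},
    {"lon": 72.9, "lat": 3.2, "year": 2010},    # duplicate
    {"lon": None, "lat": 3.2, "year": 2011},    # not georeferenced
    {"lon": -86.9, "lat": 21.5, "year": 1998},  # outside 2000 to 2015
    {"lon": 200.0, "lat": 10.0, "year": 2005},  # impossible longitude
    {"lon": -86.9, "lat": 21.5, "year": 2015},
]

if __name__ == "__main__":
    for rec in clean_records(sample):
        print(rec)
```

Students can extend the function with further checks, for example a list of trusted data sources, and discuss how each rule changes the number of usable records.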

Modeling

MaxEnt has a simple design that does not require extensive hours of training. Inputting both datasets is straightforward, and the software provides the output in a timely manner. MaxEnt and a tutorial can be downloaded from the software's website (see http://biodiversityinformatics.amnh.org/open_source/maxent/). The tutorial, with its visuals, provides explanations of the various features, such as the “Jackknife” (Figure 1) and “Response curve” (Figure 2), that address the significance of each environmental variable and the value for the area under the curve (AUC), respectively (Phillips et al., 2006). The main output MaxEnt provides is a species distribution map (Figure 3).
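To help students interpret the AUC value, the short Python sketch below computes it by direct pairwise comparison. For presence-only models, the AUC can be read as the probability that a randomly chosen presence location receives a higher suitability score than a randomly chosen background location. This is an illustration of the concept, not MaxEnt's own code; the function name `auc` and the example scores are invented for the exercise.

```python
def auc(presence_scores, background_scores):
    """Probability that a random presence point outscores a random background point.

    Ties count as half a win, which matches the rank-based definition of AUC.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


if __name__ == "__main__":
    # Invented suitability scores, for illustration only.
    presence = [0.9, 0.8, 0.4]
    background = [0.1, 0.3, 0.5, 0.2]
    print(round(auc(presence, background), 3))
```

A value near 0.5 means the model separates presence locations from background locations no better than chance, whereas values closer to 1 indicate better discrimination.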

Figure 1.

Jackknife results of whale shark case study. The variables include salinity, and mean, maximum, and minimum of sea surface temperature (SST) and of sea air temperature (SAT).

Figure 2.

Response curve for habitat suitability of the whale shark.

Figure 3.

Future habitat suitability map for whale sharks under climate change scenario A2 (year 2100). The color blue (0) represents low suitability and red (1) most optimal suitability.

Teaching Methods

Because all the information needed is available and accessible online, the exercise can be part of a formative assessment during a class period. Teachers are encouraged to follow the process detailed in the previous sections “Finding the Data” and “Modeling” to familiarize themselves with the software, find the dataset, and prepare the student handout with questions (Table 3). The teacher should create an instruction handout to guide students through this process (similar to the MaxEnt tutorial). Students can then answer questions about the area(s) of higher and lower habitat suitability, change in habitat suitability (if a similar model is performed with current environmental variables), and possible conservation actions.
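When both a current and a future model have been run, the comparison that students are asked to make can also be sketched numerically. The Python example below subtracts a current suitability grid from a future one and labels each cell. It is a hedged illustration: the tiny 2 x 2 grids are invented, the function names are hypothetical, and the 0.1 threshold is an arbitrary assumption rather than a value from the case study.

```python
def suitability_change(current, future):
    """Return future minus current suitability for every grid cell."""
    return [
        [f - c for c, f in zip(current_row, future_row)]
        for current_row, future_row in zip(current, future)
    ]


def classify(delta, threshold=0.1):
    """Label a change in suitability; the threshold is an arbitrary choice."""
    if delta > threshold:
        return "gain"
    if delta < -threshold:
        return "loss"
    return "stable"


def change_map(current, future, threshold=0.1):
    """Label every cell of the grid as gain, loss, or stable."""
    return [
        [classify(delta, threshold) for delta in row]
        for row in suitability_change(current, future)
    ]


if __name__ == "__main__":
    # Invented suitability values between 0 (low) and 1 (optimal).
    current = [[0.2, 0.8], [0.5, 0.5]]
    future = [[0.6, 0.3], [0.5, 0.55]]
    for row in change_map(current, future):
        print(row)
```

Cells labeled "loss" point to areas where habitat suitability is projected to decline, which can anchor the discussion of where conservation actions may be most needed.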

Table 3.
Questions instructors can ask students after they perform Species Distribution Modeling
Question: Standard or practice it meets (from Tables 1 and 2)
1. Name current areas of high suitability.
  • Science Practice 5
  • Life Science Core Idea LS2.C
2. Name future areas of high suitability.
  • Science Practice 5
  • Life Science Core Idea LS2.C
3. Name current areas of low suitability.
  • Science Practice 5
  • Life Science Core Idea LS2.C
4. Name future areas of low suitability.
  • Science Practice 5
  • Life Science Core Idea LS2.C
5. Which environmental variable has the most importance in the future habitat suitability model?
  • Science Practice 4
6. Looking at both current and future habitat suitability maps, name the areas that experienced a change in habitat suitability.
  • Essential Knowledge 4.B.3
7. Name the area(s) that should be preserved to allow conservation of the species.
  • Science Practices 1, 4, and 5

Conclusion

Conducting classroom activities using online resources enhances student learning, as suggested by education standards, and provides students with tools to conduct scientific research. As long as students have access to the software and dataset, it is an engaging activity that enables them to conduct and participate in scientific research. SDM involves the study of specific species; hence, using charismatic species, such as the whale shark, can generate greater interest while students explore current problems threatening the species. We suggest using species that are currently in the media to keep students engaged and motivated; for example, the whale shark appeared in a recent animated movie and in several news sources because of illegal hunting for shark fin soup. Current problems, such as conservation and climate change, give students a more in-depth understanding of the interactions between nature and society. Teachers are encouraged to modify the questions and to ask questions that connect with material and techniques covered prior to this activity. This in-class activity encourages group discussions, introduces students to SDM, exposes them to spatial data-management processes, promotes spatial thinking through distribution maps, highlights ecosystem dynamics, and requires them to communicate scientific phenomena.

References

Araújo, M. B., & Guisan, A. (2006). Five (or so) challenges for species distribution modelling. Journal of Biogeography, 33, 1677-1688.
Elith, J., & Leathwick, J. R. (2009). Species Distribution Models: Ecological Explanation and Prediction Across Space and Time. Annual Review of Ecology, Evolution, and Systematics, 40, 677-697.
Elith, J., Graham, C. H., Anderson, R. P., Dudik, M., Ferrier, S., Guisan, A., ... Zimmermann, N. E. (2006). Novel methods improve prediction of species’ distributions from occurrence data. Ecography, 29, 129-151. Retrieved from http://www.ecography.org/appendix/e4596
Elith, J., Phillips, S. J., Hastie, T., Dudik, M., Chee, Y. E., & Yates, C. J. (2011). A statistical explanation of MaxEnt for ecologists. Diversity and Distributions, 17, 43-57.
Fourcade, Y. (2016). Comparing species distributions modelled from occurrence data and from expert-based range maps: Implication for predicting range shifts with climate change. Ecological Informatics, 36, 8-14.
Fourcade, Y., Engler, J. O., Rödder, D., & Secondi, J. (2014). Mapping Species Distributions with MAXENT Using a Geographically Biased Sample of Presence Data: A Performance Assessment of Methods for Correcting Sampling Bias. PLoS ONE, 9, e97122.
Guisan, A., Tingley, R., Baumgartner, J. B., Naujokaitis-Lewis, I., Sutcliffe, P. R., Tulloch, A. I. T., ... Buckley, Y. M. (2013). Predicting species distributions for conservation decisions. Ecology Letters, 16, 1424-1435. https://doi.org/10.1111/ele.12189
Harte, J., & Newman, E. A. (2014). Maximum information entropy: A foundation for ecological theory. Trends in Ecology & Evolution, 29(7), 384-389.
Jarnevich, C. S., Stohlgren, T. J., Kumar, S., Morisette, J. T., & Holcombe, T. R. (2015). Caveats for correlative species distribution modelling. Ecological Informatics, 29, 6-15.
Marshall, C. E., Glegg, G. A., & Howell, K. L. (2014). Species distribution modelling to support marine conservation planning: The next steps. Marine Policy, 45, 330-332.
National Research Council. (2012). A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Phillips, S. J., Anderson, R. P., & Schapire, R. E. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190, 231-259.
Porfirio, L. L., Harris, R. M. B., Lefroy, E. C., Hugh, S., Gould, S. F., Lee, G., Bindoff, N. L., & Mackey, B. (2014). Improving the Use of Species Distribution Models in Conservation Planning and Management under Climate Change. PLoS ONE, 9, e113749.
Redfern, J. V., Ferguson, M. C., Becker, E. A., Hyrenbach, K. D., Good, C., Barlow, J., ... Werner, F. (2006). Techniques for cetacean-habitat modeling. Marine Ecology Progress Series, 310, 271-295. Retrieved from https://www.int-res.com/abstracts/meps/v310/p271-295/
Robinson, L. M., Elith, J., Hobday, A. J., Pearson, R. G., Kendall, B. E., Possingham, H. P., & Richardson, A. J. (2011). Pushing the limits in marine species distribution modelling: Lessons from the land present challenges and opportunities. Global Ecology and Biogeography, 20, 789-802.
Senay, S. D., Worner, S. P., & Ikeda, T. (2013). Novel Three-Step Pseudo-Absence Selection Technique for Improved Species Distribution Modelling. PLoS ONE, 8, e71218.
Sinclair, S. J., White, M. D., & Newell, G. R. (2010). How useful are species distribution models for managing biodiversity under future climates? Ecology and Society, 15, 8. Retrieved from http://www.ecologyandsociety.org/vol15/iss1/art8/
Stirling, D. A., Boulcott, P., Scott, B. E., & Wright, P. J. (2016). Using verified species distribution models to inform the conservation of a rare marine species. Diversity and Distributions, 22, 808-822.
Tyberghein, L., Verbruggen, H., Pauly, K., Troupin, C., Mineur, F., & De Clerck, O. (2012). Bio-ORACLE: A global environmental dataset for marine species distribution modelling. Global Ecology and Biogeography, 21, 272-281.
Valavanis, V. D., Pierce, G. J., Zuur, A. F., Palialexis, A., Saveliev, A., Katara, I., & Wang, J. J. (2008). Modelling of essential fish habitat based on remote sensing, spatial analysis and GIS. Hydrobiologia, 612, 5-20.