The ubiquitous marine bacterium Vibrio parahaemolyticus is a leading cause of illness associated with seafood consumption. The emergence of two genetically distinct ecotypes (ST3 and ST36) has led to an alarming increase in the size and frequency of disease outbreaks. We conducted a genomic comparison of 30 V. parahaemolyticus genomes that represent a diverse collection of 15 genetically distinct ecotypes, including newly sequenced representatives of ST3 and ST36, isolated from both clinical and environmental sources. A multistep evolutionary analysis showed that genes associated with sensing and responding to environmental stimuli have evolved under positive selection, identifying examples of convergent evolution between ST3 and ST36. A comparison of predicted proteomes indicated that ST3 and ST36 ecotypes laterally acquired tens of novel genes associated with a variety of functions including dormancy, homeostasis and membrane transport. Genes identified in this study play an apparent role in environmental fitness and may confer cross protection against stressors encountered in the human host. Together, these results show the evolution of stress response is an important genetic mechanism correlated with the recent emergence of the ST3 and ST36 ecotypes.
Vibrio parahaemolyticus is a Gram-negative bacterium adapted for life in the marine environment (Joseph et al., 1982). This bacterium is also a coincidental pathogen of humans and a leading cause of gastroenteritis associated with the consumption of raw or undercooked seafood (Daniels et al., 2000; Su and Liu, 2007). Historically, outbreaks of V. parahaemolyticus related illness were localized and attributed to a diversity of genetically distinct strains (Chowdhury et al., 2000; Matsumoto et al., 2000). Over the last two decades, the emergence of a pandemic ecotype, characterized by multilocus sequence typing (MLST) as ST3, which includes serotype O3:K6 and its many serovariants, has led to a dramatic increase in infections worldwide (Nair et al., 2007). Coincident with the global spread of ST3, larger and less localized outbreaks in North America have been attributed to additional ecotypes such as ST36 (O4:K12) (Abbott et al., 1989; Turner et al., 2013). Further, the recent transoceanic spread of ST36 to Europe in 2012 (Martinez-Urtaza et al., 2013; González-Escalona et al., 2015) has raised concern over the possible emergence of ST36 as a second pandemic ecotype.
The majority of V. parahaemolyticus strains are innocuous to humans (DePaola et al., 2000). Those strains that do cause illness often carry genes encoding hemolytic exotoxins such as the thermostable direct hemolysin (TDH) and TDH-related hemolysin (TRH) (Shirai et al., 1990), which cause fluid loss across host cell membranes (Raimondi et al., 2000; Matsuda et al., 2010). Detection of the tdh and trh genes is a test frequently used to identify potentially virulent strains (Bej et al., 1999) although some virulent strains lack both hemolysins (Raghunath, 2014). The mechanism of pathogenesis is also associated with the secretion of effector proteins by a type III secretion system (T3SS) responsible for toxicity in various in vitro and in vivo models (Burdette et al., 2008; Hiyoshi et al., 2010; Ham and Orth, 2012). Yet the overall mechanism of pathogenesis remains unclear and is thought to involve the concerted expression of multiple virulence-associated genes (Broberg et al., 2011; Zhang and Orth, 2013).
The predominance of ST3 and ST36 ecotypes among clinical isolates suggests these ecotypes benefit from enhanced fitness in the marine environment or the human host. Moreover, the ability of both ecotypes to cause large, non-localized outbreaks suggests parallel evolutionary strategies. Previous studies have linked the emergence of ST3 with the lateral acquisition of seven genomic islands (Hurley et al., 2006; Boyd et al., 2008). One island contains two copies of the tdh gene (tdh1 and tdh2) and a type III secretion system (T3SS2-α) (Makino et al., 2003). Similarly, the ST36 ecotype has acquired tdh and trh gene clusters and a homologous type III secretion system (T3SS2-β) (González-Escalona et al., 2011; Paranjpye et al., 2012; Xu et al., 2015). However, the widespread occurrence of these genomic islands among diverse collections of clinical and environmental ecotypes (Paranjpye et al., 2012; Xu et al., 2015) suggests that additional forces correlate with predominance.
Given that V. parahaemolyticus is an environmental bacterium and the human host is an alternate and novel ecological niche, we hypothesized that the evolution of dual-purpose genes, integral to both environmental fitness and human pathogenesis, would correlate with the recent emergence of the ST3 and ST36 ecotypes. To test this hypothesis, we conducted a genomic analysis of 30 V. parahaemolyticus genomes to identify the evolutionary forces correlated with clinically predominant ecotypes (ST3 and ST36). We dissected two mechanisms of adaptive evolution: the mutation of genes comprising the core genome and the lateral acquisition of genes comprising the accessory genome.
Materials and methods
We compared a total of 30 V. parahaemolyticus genomes (two closed, 28 draft) sequenced from clinical and environmental isolates (n = 21 and 9, respectively) collected from ten countries over a 27-year period (Table S1). Nineteen of the genomes were sequenced in this study (below) and the remaining eleven were downloaded from NCBI. The genomes were selected to represent a diverse collection of 15 genetically distinct ecotypes, and a diversity of tdh/trh genotypes (i.e., +/+, +/–, –/+, and –/–). The presence or absence of the virulence-associated tdh and trh genes are reported in Table S1. We consider all clinical strains as virulent; however, in lieu of animal model testing, we cannot predict the virulence or avirulence of environmental strains. The closed genome of the ST3 clinical strain V. parahaemolyticus RIMD2210633 (Makino et al., 2003) and the draft genome of the ST36 clinical strain V. parahaemolyticus 10329 (González-Escalona et al., 2011) were defined as reference genomes.
Bacterial isolates, culture conditions and isolation of genomic DNA
V. parahaemolyticus isolates (n = 19) originating from the Pacific Northwest (PNW) coast of the United States (Table S1) were selected for genome sequencing based on previous genotyping (Paranjpye et al., 2012) and multilocus sequence typing (MLST) analysis (Turner et al., 2013). In V. parahaemolyticus, MLST is a high-resolution genetic typing scheme based on seven conserved housekeeping genes (González-Escalona et al., 2008) that serve as the framework for a global database containing more than 2,200 isolates (http://pubmlst.org/vparahaemolyticus/). Multilocus sequence typing was chosen ahead of serotyping, as the O- and K-antigens are prone to rapid recombination and serotype conversion, and more than 20 O3:K6 serovariants have been identified (Chen et al., 2010, 2011). Late log-phase cultures were grown overnight (16–20 hours) in tryptic soy broth (TSB) at 30oC with shaking (150 rpm). Genomic DNA was isolated by the cetyltrimethyl ammonium bromide (CTAB) phenol/chloroform method (Atashpaz et al., 2010).
Sequencing, assembly and annotation
Draft genomes were generated for 19 V. parahaemolyticus strains by SOLiD 4 sequencing (Applied Biosystems Inc., Foster City, CA, USA). Barcoded paired-end (50 and 25 bp) libraries were constructed with a target insert size of 250 bp. Reads were error-corrected using the SOLiD Accuracy Enhancement Tool and PCR duplicated reads were removed using the fastq_nodup tool from the SEAStAR software (Iverson et al., 2012). Genomes were assembled in color-space using the CLC Assembly Cell version 4.1 (CLC Bio, Boston, MA, USA). Reads were mapped to both V. parahaemolyticus RIMD2210633 (Makino et al., 2003) and V. parahaemolyticus 10329 (González-Escalona et al., 2011) reference genomes, and the assembly recruiting the highest number of reads was retained. Unmapped reads as well as genomes mapping poorly to either reference were assembled de novo using a kmer length of 25. Draft genome assemblies were deposited in NCBI/GenBank (see Table S1 for accession numbers). NCBI’s Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP) was used to annotate the draft genomes.
Relatedness between the 30 V. parahaemolyticus genomes was established using a whole-genome alignment (WGA) approach (Sahl et al., 2011). Briefly, genomes were aligned using Mugsy (Angiuoli and Salzberg, 2011), and conserved alignment blocks (> 500 bp in length and shared by all 30 genomes) were concatenated using the Galaxy web-based platform (Giardine et al., 2005). The concatenated alignment was trimmed using the strictplus algorithm implemented in trimAL (Capella-Gutierrez et al., 2009). A phylogenetic tree was inferred with FastTree (Price et al., 2009) using the general time-reversible (GTR) model of nucleotide evolution. Local support values (500 resampled alignments) were calculated using SEQBOOT (from the PHYLIP package, version 3.6) (Felsenstein, 1989) and the FastTree global bootstrap parameter (the intree1 option). The WGA tree was rooted to a closely related species (Vibrio alginolyticus ATCC17749) (Liu et al., 2015) and annotated in FigTree.
Core genes evolving under positive selection
Evidence of recent positive selection among core genes (i.e., shared by all 30 genomes) was established using TimeZone (Chattopadhyay et al., 2013). Major workflows include a modified BLASTN comparison of all genomes against a reference, extraction of orthologous genes, multiple sequence alignment, estimates of homologous recombination and phylogenetic reconstruction. The closed and well-annotated assembly of the type strain V. parahaemolyticus RIMD2210633 was chosen as the reference genome (Makino et al., 2003). Genes were grouped as orthologous based on a minimum sequence identify (95%) and a minimum relative length (95%). Evidence of positive selection was established by the repeated occurrence of site-specific replacement mutations (i.e. hotspot mutations). Analysis of genes evolving under positive selection was limited to non-recombinant genes containing only recent (i.e., short-term) hotspot mutations. Recombinant genes were detected using MaxChi and PhylPro as implemented in TimeZone. Genes showing evidence of positive selection were analyzed using the DAVID bioinformatic resources (Huang et al., 2009) to identify instances of enrichment and infer biological function.
Lateral acquisition of adaptive genes
Clustering of orthologous proteins among all 30 V. parahaemolyticus genomes was carried out in OrthoMCL (Li et al., 2003) using a modified bidirectional all-against-all BLASTP (e–10 cutoff). Results were parsed using custom Python scripts to identify the subset of orthologs unique to a given ecotype (i.e., ST3 and ST36). Functional analysis of unique orthologs was aided by NCBI’s non-redundant (nr) database, the SEED Viewer (http://theSEED.org), the virulence factor database (VFDB) (Chen et al., 2012), and InterPro (Hunter et al., 2009).
To test whether genes located on chromosome II were more strongly influenced by positive selection (compared to chromosome I), differences in nucleotide diversity and rates of synonymous (dS) and nonsynonymous (dN) mutations between chromosomes were evaluated with a two-tailed t-test in Python using the SciPy library.
We compared a total of 30 V. parahaemolyticus genomes that reflect a global sampling of strains (15 ecotypes from ten countries) isolated from both clinical and environmental sources over a 27-year period. The focus of our analyses was the discovery of features that distinguish the ST3 and ST36 ecotypes from the larger V. parahaemolyticus population.
A phylogenetic tree (Figure 1A) was inferred using a whole genome alignment (WGA) approach (Sahl et al., 2011). The WGA tree was based on a gene-independent analysis of shared genomic data (∼3.96 Mbp) derived from 1,343 concatenated alignment blocks (each greater than 500 bp). The analysis grouped the 30 V. parahaemolyticus genomes into 15 genetically distinct clades in agreement with our previous phylogenetic analysis based on a 7-loci MLST scheme (Turner et al., 2013). The WGA tree provided higher local support values and resolved the subclade diversity of ST3 (n = 7 genomes) and ST36 (n = 8 genomes) ecotypes (Figures 1B and 1C, respectively). We observed no discernible phylogenetic patterns within ST3 despite significant temporal and geographic separation between the seven isolates. In contrast, diversity within ST36 showed evidence of a temporal shift between isolates collected prior to 2001 (EN2910, EN9701173, 97-10290, 10329 and EN9901310) and those collected since 2006 (3324, 12315 and 846).
Core genes evolving under positive selection
Whole-genome comparisons with the closed reference genome (V. parahaemolyticus RIMD2210633) identified a core set of 1,185 orthologous genes present in all 30 genomes. Core genes showing evidence of recombination (n = 81) (see Materials and methods) were removed from additional analysis. Polymorphic genes (n = 1,104) were defined as all non-recombinant core orthologs that produced non-identical alignments consisting of at least four alleles. Those polymorphic core genes located on chromosome II displayed significantly greater nucleotide diversity (π) and significantly higher rates of synonymous (dS) and nonsynonymous (dN) mutations than those located on chromosome I (p < 0.001) (Table 1). Furthermore, recombination was detected more frequently in polymorphic core genes located on chromosome II (63.3%) than on chromosome I (38.3%).
aGenes with more than four non-identical alleles.
bSignificance declared at P < 0.001.
To identify genes evolving under positive selection, we assessed the frequency of single-nucleotide polymorphisms (SNPs) resulting in repeated, site-specific amino acid replacements (i.e., hotspot mutations). Approximately one-third of polymorphic core genes (n = 344 of 1,104 genes) contained hotspot mutations (Table S2). Half encoded hypothetical proteins and half encoded putative proteins associated with a variety of functional categories (e.g., acetyltransferase activity, heterocycle biosynthesis, transcriptional regulation and nitrogen compound biosynthesis). Many of the 344 genes contained ecotype-specific hotspot mutations; 155 were found in the ST3 ecotype (n = 7 genomes), 194 were found in the ST36 ecotype (n = 8 genomes), and 105 were found in both ST3 and ST36 (n = 15 genomes) (Table S2). Functional analysis of the 105 shared hotspots clustered the proteins into 20 functional categories (e.g., transcription regulation, essential to the cell envelope, and integral to cell membrane). However, only those proteins involved in transcription regulation (n = 13 proteins) were significantly overrepresented (Table S2). These transcriptional regulators included GntR, LysR, TetR, MerR and LuxR family transcriptional regulators containing DNA-binding, helix-turn-helix domains. We also identified ten genes containing site-specific hotspot mutations shared exclusively between ST3 and ST36 ecotypes (Table 2). Of these, six resulted in identical amino acid changes. For example, analysis of TolC revealed that all ST3 and ST36 genomes (n = 15) shared a hotspot mutation resulting in an identical amino acid replacement (serine to alanine) at amino acid position 118 (Table 2).
aGenes are identified by NCBI/GenBank locus tag.
bThe amino acid position of the hotspot mutation.
cThe amino acid substitution resulting from the hotspot mutation is shown for both ecotypes.
Lateral acquisition of adaptive genes
To identify genes unique to ST3 and ST36 ecotypes but absent from all other ecotypes, we conducted an all-versus-all BLASTP comparison of the 30 predicted proteomes (Figure 2). Orthologs unique to the seven ST3 genomes (n = 42) and the eight ST36 genomes (n = 60) did not include putative toxins such as the TDH and TRH hemolysins (Honda et al., 1988; Nishibuchi et al., 1989) or the VopT and VopZ effector proteins (Kodama et al., 2007; Zhou et al., 2013). Instead, the majority of ecotype-specific orthologs (83%) were hypothetical proteins, with few defining features other than transmembrane and signal peptide domains (Tables S3 and S4, respectively). The remaining orthologs were associated with a variety of functions such as dormancy, membrane transport, capsular polysaccharide biosynthesis, and transcriptional regulation. The sole ortholog restricted to both ecotypes (locus tag VP1860 in RIMD2210633 and VP10329_14295 in 10329) was a hypothetical protein containing a domain of unknown function (DUF1706).
A large proportion of the ecotype-specific orthologs (n = 22/42 in ST3 and 17/60 in ST36) were localized to the superintegron (SI) on chromosome I. SI-localized orthologs unique to ST3 included a ReIE/ReIB toxin/antitoxin system, two acetyltransferases and numerous hypothetical proteins (Table S3). SI-localized orthologs unique to ST36 included an acetyltransferase and various hypothetical proteins (Table S4). In a different region on the same chromosome, nine orthologs unique to ST36 were localized within the capsular polysaccharide (cps) gene cluster (Table S4). Five of these ST36 cps genes encoded capsular polysaccharide proteins (glycosyl transferase, mannose-6-phosphate isomerase, nucleoside-diphosphate-sugar epimerase, and galactosyl transferase) with significant (< e–20) homology to putative virulence factors curated in the VFDB (Table 3). Comparison of ST3 and ST36 cps clusters revealed important differences in size, gene content, and synteny (Figure 3).
aGenes identified by NCBI/GenBank locus tag.
bNumber of amino acids.
cNumber of significant hits (e–20 cutoff) in the virulence factor database (VFDB).
dDescription of the most significant hit and the representative species.
This study describes evolutionary mechanisms correlated with two clinically important V. parahaemolyticus ecotypes (ST3 and ST36). These genetically distinct ecotypes are characterized by their widespread occurrence and their ability to cause large, non-localized disease outbreaks. Whole-genome comparisons across a diverse collection of 30 strains from clinical and environmental sources indicate that core genes shared between ST3 and ST36 ecotypes are converging under positive selection. In addition, the acquisition of accessory genes, novel to the ST3 and ST36 ecotypes, indicates divergent evolution. Together, these two evolutionary forces appear to be acting in concert to affect genes that play a role in sensing, responding, and adapting to extracellular stress.
Previous studies have inferred the relatedness of V. parahaemolyticus ecotypes through the phylogenetic analysis of seven housekeeping genes (González-Escalona et al., 2008). Recent studies seeking higher resolution phylogenies have turned to genome-scale analyses (Haendiges et al., 2015; Hazen et al., 2015; Loyola et al., 2015; Whistler et al., 2015). Here, we used a WGA approach to infer relatedness between major phylogenetic clades. This approach resolved the subclade diversity inherent to ST3 and ST36 ecotypes. The absence of a temporal pattern within ST3 demonstrates the strong clonality of this ecotype. Meanwhile, evidence of a temporal shift in ST36 may reflect important metabolic differences between closely related isolates. For instance, zebrafish challenged with different ST3 and ST36 strains (including strains from this study) have been shown to exhibit distinctly different survival curves (Paranjpye et al., 2013).
The diversification of bacterial ecotypes can arise from mutation and lateral gene transfer. The role played by mutation was assessed using a multistep analysis aimed at the identification of genes evolving under positive selection. A first step in this analysis was the measurement of evolutionary rates in 1,104 core genes. Significantly faster evolutionary rates (π, dS, and dN) among genes located on chromosome II (compared to chromosome I) indicate that genes located on the smaller secondary chromosome are subject to weaker purifying selection. In Bulkolderia and Vibrio species, rapidly evolving genes tend to be located on the smaller second chromosome (Cooper et al., 2010). Citing a chromosomal bias in the distribution of essential genes, it has been suggested that secondary chromosomes may act as accessory genomes for adaptation to environmental change or new ecological niches (Okada et al., 2005). In this study, the observed chromosomal bias in selection pressure supports the theory that secondary chromosomes are a reservoir for genes involved in adaptive evolution.
Genes containing site-specific replacement substitutions (i.e., hotspot mutations) accounted for nearly one-third of core genes. The occurrence of recent hotspot mutations is an indicator of genes evolving under positive selection (Chattopadhyay et al., 2009). Indeed, a single positively selected SNP leading to an amino acid replacement can promote adaptation to a new niche (Weissman et al., 2007). We detected an abundance of genes containing hotspot mutations and many were shared between the ST3 and ST36 ecotypes. Functional analysis of shared hotspot genes revealed a significant overrepresentation of helix-turn-helix DNA-binding transcriptional factors, which are known to regulate transcription in response to environmental stimuli like heavy metals, oxidative stress or antibiotics (Santos et al., 2009). The ability of an ecotype to respond appropriately to prevailing environmental conditions is essential to niche adaptation, as occupancy of a new niche would require the bacterium to overcome new growth conditions and extracellular stressors. For example, the ability of V. parahaemolyticus strains to tolerate elevated temperatures, starvation and acidity in the natural environment has been linked to cross-protection against similar stressors encountered in the human host (Koga and Takumi, 1995; Wong et al., 1998).
Since ST3 and ST36 ecotypes are both responsible for large, non-localized outbreaks, we hypothesized that these ecotypes would share adaptive SNPs in key genes associated with fitness. We identified ten site-specific hotspot mutations affecting only ST3 and ST36. The occurrence of shared hotspot mutations is a strong sign of convergent evolution (Chattopadhyay et al., 2012). Although the precise role of these genes remains unclear, four of these genes encode proteins (cytochrome c551 peroxidase, short-chain dehydrogenase, MerR family transcriptional regulator and glutathione S-transferase) predicted to mediate the detoxification of reactive oxygen species, which can accumulate as a by-product of aerobic respiration or host immune defense during phagocytosis (Cabiscol et al., 2000). Additionally, two of these exceptional hotspots occur within genes associated with membrane transport (outer membrane gene tolC and a phosphate ABC transporter). In V. cholerae, tolC is upregulated in response to environmental stress and is essential for V. cholerae bile resistance and infection (Bina and Mekalanos, 2001). Meanwhile, phosphate-binding ABC transporters are integral components of phosphate-specific transport systems connected to both homeostasis and pathogenesis (Lamarche et al., 2008).
The predominance of ST3 and ST36 ecotypes among clinical isolates has been correlated with the lateral acquisition of mobile genomic islands that carry one or more virulence-associated genes (Hurley et al., 2006; Boyd et al., 2008). We detected tens of novel genes uniquely associated with the ST3 and ST36 ecotypes. These genes were largely hypothetical and did not include putative toxins such as the TDH and TRH hemolysins. A large proportion of these genes were located within the large, hyper-variable SI. The V. parahaemolyticus SI was described previously (Boyd et al., 2008) and is theorized to play a role in adaptive evolution through the capture of exogenous DNA (Chen et al., 2011). In ST3, SI-bound orthologs included a toxin-antitoxin module (ReIE/ReIB). Originally described as playing a role in programmed cell death, chromosomal toxin-antitoxin modules are known to mediate persistence in response to extracellular stress (Gerdes et al., 2005; Maisonneuve et al., 2011). Persistence and the viable but non-culturable state are similar stress-induced states of dormancy that can increase resistance to extracelluar stressors (Wong and Wang, 2004; Ayrapetyan et al., 2015). Additional SI-bound orthologs included multiple proteins containing helix and transmembrane domains, indicating these proteins may mediate unknown interactions with the environment.
Several ecotype-specific orthologs were also detected within the cps gene cluster. In V. parahaemolyticus, genes within this cluster are involved in the synthesis of capsular polysaccharides (CPS) and play a role in the formation of biofilms (Chen et al., 2010). The cps cluster is known to be highly variable, and previous work has shown that recombination within this locus can be driven by chitin-induced transformation (Neiman et al., 2011). Five CPS proteins, unique to ST36, showed significant homology to putative virulence factors described in other Gram-negative bacteria. In clinical V. vulnificus strains, CPS is a primary toxin and has been shown to protect against phagocytosis during host invasion (Jones and Oliver, 2009). Taking advantage of ecotype-specific differences in cps gene content, a recent study announced the development of methods to specifically detect the ST36 ecotype (Whistler et al., 2015). Similarly, detection of an open reading frame (ORF8) associated with the f237 filamentous phage (Nasu et al., 2000) was previously proposed for the specific detection of the ST3 ecotype (Iida et al., 2001).
This study correlated the emergence of ST3 and ST36 ecotypes with the vertical and lateral evolution of genes associated with sensing, responding, and adapting to extracellular stimuli. The identification of adaptive mutations and adaptive genes, predicted to enhance fitness in both the natural environment and the human host, is an important step in dissecting mechanisms underlying the recent spread of these clinically important ecotypes. In particular, the evolution of genes associated with transcriptional regulation, defense against oxidative stress, membrane transport, and the formation of biofilms may play key roles in both environmental fitness and human pathogenesis. A better understanding of adaptive evolution, as a basis for emergence, could prioritize the public health response and lead to the development of new therapeutics or improved methods of detection.
Data accessibility statement
© 2016 Turner et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
This study would not have been possible without the prior isolation and typing of V. parahaemolyticus isolates. For that effort, we thank Dr. Rohinee Paranjpye, Dr. William Nilsson and Gladys Yanigeda. We thank Drs. Paul Sandip, Sujay Chattopadhyay and Evgeni Sokurenko for assistance with the interpretation of TimeZone analyses.
This study was funded by the National Oceanic and Atmospheric Administration’s (NOAA) Oceans and Human Health Initiative (OHHI) and National Marine Fisheries Service (NMFS). Postdoctoral support was provided by the National Research Council (NRC) and the Gordon and Betty Moore Foundation (EVA #537.01).
The authors declare no competing interests.
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.