Current Research Information System (CRIS)

ACCESSION NO: 1006049 SUBFILE: CRIS
PROJ NO: SC-2015-02576 AGENCY: NIFA SC.
PROJ TYPE: AFRI COMPETITIVE GRANT PROJ STATUS: TERMINATED
CONTRACT/GRANT/AGREEMENT NO: 2013-67012-23286 PROPOSAL NO: 2015-02576
START: 15 AUG 2014 TERM: 14 JUN 2017 FY: 2017
GRANT AMT: $80,714 GRANT YR: 2015
AWARD TOTAL: $80,713.79
INITIAL AWARD YEAR: 2013

INVESTIGATOR: Richards, V.

PERFORMING INSTITUTION:
CLEMSON UNIVERSITY
CLEMSON, SOUTH CAROLINA 29634

POPULATION GENOMICS OF STREPTOCOCCUS AGALACTIAE FROM INFECTED BOVINE AND FISH SOURCES

NON-TECHNICAL SUMMARY: Streptococcus agalactiae (group B Streptococcus, GBS) is a member of the commensal microbiota of the human intestinal and genitourinary tracts, but it is also a leading cause of morbidity and mortality in newborns, pregnant women, and the elderly [1]. The other major reservoir for the bacterium is cattle, in which it has long been recognized as a common cause of mastitis, a major production-limiting disease in developed and developing countries around the world. S. agalactiae has also been identified as an aetiological agent of septicaemia and meningo-encephalitis in saltwater and freshwater fish species, and is now considered an important threat to the aquaculture industry [2-4]. Although there is considerable evidence for host adaptation among strains of S. agalactiae, there is potential for foodborne and human-to-animal or animal-to-human transmission of the pathogen [5-7]. In addition to anthroponotic or zoonotic transmission, the possibility that human-pathogenic clones emerge from an animal reservoir has been raised [8], as has the suggestion that re-emergence of the pathogen in animal populations may be due to spill-over or adaptation of strains from humans [9]. There is, therefore, a clear need for a better understanding of the evolution and transmission dynamics of S. agalactiae both within and across host species, including humans, agricultural species, and aquaculture species. We take advantage of the rapidly decreasing cost of next-generation genome sequencing to examine, for the first time, the population genetic structure and dynamics of S. agalactiae on a global scale, across different hosts and disease states. This study will provide valuable insight into the population structure and diversity of S. agalactiae, uncover emergent clones, and identify associations between specific lineages and isolation source.
Comparative genomics of multiple isolates derived from diseased bovine and fish sources, compared with human-sourced isolates, should (i) identify genes that are key to host adaptation and possibly linked to disease, and (ii) allow estimation of the direction and rate of bacterial migration among populations, thereby providing insight into transmission dynamics among hosts. This project is directly related to the Foundational Program's priority area of "Animal Health and Production" and the challenge area of "Keeping American agriculture competitive," as it aims to identify genes linked to bovine mastitis and fish septicaemia caused by S. agalactiae. It will also provide data on transmission dynamics between these sources of infection and is therefore also linked to the challenge area of "Improving food safety." Collectively, the information arising from this project could ultimately lead to more effective prevention and treatment programs. Findings will be disseminated via scientific conferences and peer-reviewed publications.

OBJECTIVES:
1. Describe the Streptococcus agalactiae population structure, genetic diversity, migration patterns, and demographic history using genome-wide single nucleotide polymorphism (SNP) data from a global strain collection of human, bovine, and fish isolates, and identify associations between phylogenetic lineages and isolation source.
2. Determine the core genome, unique core genes, and dispensable genome components of distinct S. agalactiae lineages (as delineated by genome-wide SNP genotyping) that are characteristic of different host species and disease states.
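The gene categories named in Objective 2 can be made concrete with a small presence/absence sketch. It assumes homologous gene clusters have already been delineated and that each cluster is summarized as the set of genomes carrying it; the function names and the toy cluster and genome labels are hypothetical, and "unique core" is taken here, as an assumed working definition, to mean genes present in every member of a lineage and absent from all other genomes.

```python
from typing import Dict, Set

# Hypothetical input format: gene cluster ID -> set of genome IDs carrying it.
PresenceMap = Dict[str, Set[str]]


def partition_pan_genome(presence: PresenceMap, genomes: Set[str]) -> Dict[str, Set[str]]:
    """Split clusters into core (all genomes), unique (exactly one genome)
    and dispensable (more than one genome, but not all)."""
    parts: Dict[str, Set[str]] = {"core": set(), "dispensable": set(), "unique": set()}
    for gene, carriers in presence.items():
        carriers = carriers & genomes  # ignore genomes outside this sample
        if carriers == genomes:
            parts["core"].add(gene)
        elif len(carriers) == 1:
            parts["unique"].add(gene)
        elif carriers:
            parts["dispensable"].add(gene)
    return parts


def lineage_unique_core(presence: PresenceMap, lineage: Set[str], genomes: Set[str]) -> Set[str]:
    """Genes carried by every genome in `lineage` and by no genome outside it
    (an assumed definition of a lineage's unique core genes)."""
    outsiders = genomes - lineage
    return {
        gene
        for gene, carriers in presence.items()
        if lineage <= carriers and not (carriers & outsiders)
    }


# Toy example: two human (h) and two bovine (b) genomes.
genomes = {"h1", "h2", "b1", "b2"}
presence = {
    "gA": {"h1", "h2", "b1", "b2"},  # shared by all genomes
    "gB": {"b1", "b2"},  # bovine-only
    "gC": {"h1"},  # strain-specific
    "gD": {"h1", "b1"},  # patchy distribution
}
parts = partition_pan_genome(presence, genomes)
bovine_core = lineage_unique_core(presence, {"b1", "b2"}, genomes)
print(parts["core"], parts["dispensable"], parts["unique"], bovine_core)
```

In this toy case a gene such as gB is dispensable at the species level but part of the unique core of the bovine lineage; the two views answer different questions.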

APPROACH:
Isolates and SNP discovery
This study takes advantage of S. agalactiae genome sequence data already available in GenBank for multiple isolates and supplements this dataset with new genome sequences acquired as part of this project. The human GBS sequences come from 12 different states within the USA, Sweden, Belgium, Norway, eastern and western Canada, Australia, South America, and Asia, and were derived from both symptomatic infections (e.g., sepsis, meningitis, wound infections, urinary tract infections, mastitis) and asymptomatic colonization (e.g., colonized pregnant women, college students). The 37 bovine sequences currently in GenBank come largely from Denmark and the northeastern US. The additional 100 bovine sequences cover 6 different US states, Denmark, Italy, Belgium, Argentina, and eastern Canada (New Brunswick and Quebec). The 50 GBS genome sequences from fish come from collections of several different host species from both freshwater and saltwater environments.
Genome sequencing will proceed as follows: (i) Nextera™ DNA library preparation and Illumina HiSeq® 2000 sequencing will be combined to produce 100 bp paired-end reads for each isolate, and (ii) reads will be assembled de novo using Velvet v0.7.55 with the VelvetOptimizer v2.1.4 script. The MCL algorithm, as implemented in the MCLBLASTLINE pipeline, will be used to delineate homologous protein sequences among all genome sequences. Based on sequence similarity, the pipeline uses Markov clustering (MCL) to assign genes to homologous clusters. Similarity is obtained from reciprocal BLASTp searches within and between all genome pairs using an E-value cut-off of 1e-5. The MCL algorithm will be run with an inflation parameter of 1.8, a value that simulations have shown to be generally robust to false positives and false negatives. Core homologous gene clusters (homologs shared among all genomes) will be delineated, and clusters containing multiple homolog copies for any genome will be removed.
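The clustering step can be illustrated with a minimal NumPy sketch of plain Markov clustering on a graph built from reciprocal BLASTp hits. This is not the MCLBLASTLINE pipeline itself: it assumes the E values have already been parsed into a dictionary keyed by (query, subject), it uses unweighted edges rather than the pipeline's similarity scores, and the helper names and gene labels are hypothetical. It does use the E-value cut-off (1e-5) and inflation value (1.8) given above.

```python
import numpy as np


def reciprocal_edges(hits, evalue_cutoff=1e-5):
    """Keep gene pairs whose BLASTp E value passes the cut-off in both
    directions. `hits` (assumed format) maps (query, subject) -> E value."""
    edges = set()
    for (query, subject), evalue in hits.items():
        if query == subject or evalue > evalue_cutoff:
            continue
        reverse = hits.get((subject, query))
        if reverse is not None and reverse <= evalue_cutoff:
            edges.add(tuple(sorted((query, subject))))
    return edges


def mcl(adjacency, inflation=1.8, expansion=2, max_iter=100, tol=1e-6):
    """Plain Markov clustering of a symmetric adjacency matrix."""
    m = adjacency.astype(float) + np.eye(adjacency.shape[0])  # add self-loops
    m /= m.sum(axis=0)  # column-stochastic: columns are random-walk transitions
    for _ in range(max_iter):
        previous = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: flow along longer paths
        m = m ** inflation  # inflation: boost strong flows, damp weak ones
        m /= m.sum(axis=0)  # renormalise columns
        if np.allclose(m, previous, atol=tol):
            break
    clusters = []
    for row in m:  # attractor rows list cluster members; tiny entries are numerical residue
        members = frozenset(int(i) for i in np.flatnonzero(row > 1e-3))
        if members and members not in clusters:
            clusters.append(members)
    return clusters


genes = ["A1", "A2", "A3", "B1", "B2"]
hits = {(q, s): 1e-30 for q in genes[:3] for s in genes[:3] if q != s}
hits[("B1", "B2")] = hits[("B2", "B1")] = 1e-20
hits[("A1", "B2")] = 1e-10  # no reverse hit, so not reciprocal
hits[("A1", "B1")] = hits[("B1", "A1")] = 1e-3  # fails the E-value cut-off

index = {gene: i for i, gene in enumerate(genes)}
adjacency = np.zeros((len(genes), len(genes)))
for a, b in reciprocal_edges(hits):
    adjacency[index[a], index[b]] = adjacency[index[b], index[a]] = 1.0

clusters = [sorted(genes[i] for i in c) for c in mcl(adjacency, inflation=1.8)]
print(clusters)  # [['A1', 'A2', 'A3'], ['B1', 'B2']]
```

In the workflow described above, the resulting clusters would then be filtered to core clusters, discarding any cluster with more than one copy from a given genome.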
Genes within each cluster will be aligned using Probalign v1.1. Gene alignments will be tested for evidence of intragenic recombination using multiple approaches: (i) GARD, a phylogenetic method that searches for gene segments with incongruent phylogenetic topologies; (ii) PHI and NSS, compatibility methods that examine pairs of sites for homoplasy without phylogenetic reconstruction; and (iii) Max χ², a substitution distribution method that searches for significant clustering of substitutions at putative recombination breakpoints. To assess recombination over wider genomic scales, the program Profile (distributed as part of PhiPack) will be used to detect recombination within a concatenated alignment of the SNPs judged to be non-recombinant. Finally, the SNP sites will be concatenated to construct a contiguous genotype sequence for each strain.
Population genetic analysis of SNP data
Most previous population genetic studies of bacteria have concentrated on relatively small numbers of isolates from limited geographic locations. There are a few notable exceptions, but generally speaking, most studies lack sufficient sample sizes to test hypotheses of geographic partitioning. Combined with the relatively low resolution typical of MLST data sets, these studies have provided only a partial picture of population genetic differentiation for the vast majority of bacterial species. A recent study employing genome-wide SNP data for Staphylococcus aureus detected important geographic differences in an international collection of MRSA (methicillin-resistant S. aureus) isolates. The proposed study will use non-recombinant genome-wide SNP data to provide a much more detailed picture of the population genetic structure of S. agalactiae on a global scale, across multiple hosts and different disease states.
The number of populations (K) will be estimated using two separate approaches, as implemented in the programs STRUCTURE v2.3 and GENELAND v3.1.4.
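The pairwise-site idea behind the compatibility methods, together with the SNP concatenation step, can be sketched as follows. The code (i) concatenates the variable, gap-free columns of several core-gene alignments into one genotype string per strain and (ii) applies the four-gamete test to every pair of biallelic sites. It is a simplified stand-in, not PHI or NSS themselves, which use more refined incompatibility scores and permutation tests; the input format, function names, toy sequences, and the choice to skip gapped columns are all assumptions.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Assumed input: gene -> {strain -> aligned sequence}, every strain present in every gene.
Alignments = Dict[str, Dict[str, str]]


def concatenate_snps(alignments: Alignments) -> Dict[str, str]:
    """Concatenate variable, gap-free alignment columns into one genotype per strain."""
    strains = sorted(next(iter(alignments.values())))
    genotype: Dict[str, List[str]] = {s: [] for s in strains}
    for gene in sorted(alignments):  # fixed gene order keeps sites comparable
        aln = alignments[gene]
        for pos in range(len(aln[strains[0]])):
            column = [aln[s][pos] for s in strains]
            if "-" in column or len(set(column)) < 2:
                continue  # skip gapped and invariant columns
            for s, base in zip(strains, column):
                genotype[s].append(base)
    return {s: "".join(bases) for s, bases in genotype.items()}


def incompatible(site_a: List[str], site_b: List[str]) -> bool:
    """Four-gamete test: two biallelic sites showing all four allele
    combinations cannot fit one tree without homoplasy."""
    if len(set(site_a)) != 2 or len(set(site_b)) != 2:
        return False  # this sketch only scores biallelic pairs
    return len(set(zip(site_a, site_b))) == 4


def incompatible_pairs(genotypes: Dict[str, str]) -> List[Tuple[int, int]]:
    """Indices of all site pairs that fail the four-gamete test."""
    strains = sorted(genotypes)
    n_sites = len(genotypes[strains[0]])
    sites = [[genotypes[s][i] for s in strains] for i in range(n_sites)]
    return [(i, j) for i, j in combinations(range(n_sites), 2)
            if incompatible(sites[i], sites[j])]


alignments = {
    "geneA": {"s1": "ACGT", "s2": "ACGA", "s3": "ATGT", "s4": "ATGA"},
    "geneB": {"s1": "GG-C", "s2": "GGTC", "s3": "GCTC", "s4": "GCTC"},
}
genotypes = concatenate_snps(alignments)
print(genotypes)  # {'s1': 'CTG', 's2': 'CAG', 's3': 'TTC', 's4': 'TAC'}
print(incompatible_pairs(genotypes))  # [(0, 1), (1, 2)]
```

The same check can be run within a single gene alignment; an excess of incompatible pairs among nearby sites is the kind of signal the compatibility methods formalize.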
Both STRUCTURE and GENELAND use Markov chain Monte Carlo (MCMC) algorithms within Bayesian frameworks to estimate K. For STRUCTURE, K will be estimated by first evaluating genetic partitioning across a range of K values and then calculating the ad hoc statistic ΔK, which is based on the second-order rate of change of the log probability of the data, L(K), with respect to K. Two ancestry models will be used: (i) the no-admixture model, which assumes that each individual's ancestry derives from only one population, and (ii) the linkage model, which accounts for mixed population ancestry due to recombination. Once K has been determined, assignment probabilities (membership coefficients) of each individual to each population will be calculated. SNPs within the same gene will be considered linked, with map distances assumed proportional to the number of sites between SNPs. The approach used by GENELAND (referred to as landscape genetics) implements a spatial model that incorporates the spatial coordinates of sampled individuals and thereby includes geographic distance among them in its estimation of K. The program also implements a haploid data model, which assumes a multinomial distribution of genotype frequencies and linkage equilibrium. Here, I will combine these models in the estimation of K. Levels of differentiation among the delineated populations will be measured using an analysis of molecular variance (AMOVA) as implemented in ARLEQUIN version 3.1. I will use the complete genome data for all S. agalactiae strains to examine the distribution and diversity of genes implicated in virulence, as well as levels of diversity in unique core genes and core genes. Migration rates among populations will be estimated using the program MIGRATE v3.2. Within a Bayesian framework, parameter estimates (Θ and M = m/μ) from preliminary runs with uniform prior distributions will be averaged and used to set the boundaries of exponential prior distributions for a second run.
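The ΔK calculation for STRUCTURE output can be sketched from replicate log-likelihoods as ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)]. This is a minimal illustration under stated assumptions: the input dictionary (each K mapped to the ln P(D) of its replicate runs) and the numbers in the example are made up, consecutive K values are assumed, and, as a simplification, the second-order difference is taken on the mean L(K) across replicates rather than averaged run by run.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnl: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)).

    `lnl` maps each tested K to the ln P(D) values of its replicate runs
    (at least two per K). L(K) is the replicate mean; Delta K is undefined
    for the smallest and largest K tested."""
    ks = sorted(lnl)
    means = {k: mean(lnl[k]) for k in ks}
    result = {}
    for k_prev, k, k_next in zip(ks, ks[1:], ks[2:]):
        second_diff = abs(means[k_next] - 2 * means[k] + means[k_prev])
        spread = stdev(lnl[k])  # sample standard deviation across replicates
        result[k] = second_diff / spread if spread > 0 else float("inf")
    return result


# Hypothetical replicate ln P(D) values for K = 1..4.
lnl = {
    1: [-1000.0, -1002.0],
    2: [-800.0, -802.0],
    3: [-780.0, -784.0],
    4: [-778.0, -782.0],
}
dk = delta_k(lnl)
best_k = max(dk, key=dk.get)
print(best_k)  # 2
```

Here the large jump in L(K) from K = 1 to K = 2, followed by a plateau, gives ΔK of about 128 at K = 2 versus about 6 at K = 3.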
For the second run, we will employ two long chains and an adaptive heating scheme, combined over 3 replicate runs. Because analyzing all populations in the same run is computationally impractical, we will minimize the number of parameters estimated by restricting each run to a pairwise population comparison.
Historical population dynamics
In addition to delineating population boundaries, the global diversity captured by the extensive sampling scheme and high-resolution SNP genotyping will be further exploited to elucidate population demographic history. More specifically, an accurate representation of the diversity within distinct populations will allow recovery of sufficient phylogenetic signal to estimate changes in effective population size through time. While this approach has been applied successfully to eukaryotic and viral populations, it has rarely been applied to bacterial populations. Its successful application here will provide valuable insight into the demographic dynamics of S. agalactiae populations. For example, by combining epidemiological data with accurate estimates of population age, causal factors associated with the emergence of specific populations or lineages may be identified. In addition, comparing multiple demographic histories offers the potential to detect distinct demographic signatures associated with host and/or disease state. Furthermore, population trajectories might help identify populations that pose future threats to livestock, for example, virulent populations undergoing rapid expansion. Here, by comparing multiple sampled populations representing different hosts and disease states on a global scale, I will provide the most extensive comparison of bacterial demographic history to date.
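The motivation for pairwise MIGRATE runs is easy to quantify. Assuming a full migration model in which every population has its own Θ and every ordered pair of populations has its own M, the number of parameters grows quadratically with the number of populations, whereas each pairwise run estimates only four (two Θ and two M values). The sketch below, with hypothetical function names, counts both and enumerates the pairwise runs; it does not call MIGRATE.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def full_model_parameters(n_pops: int) -> int:
    """One Theta per population plus one directional M per ordered pair."""
    return n_pops + n_pops * (n_pops - 1)


def pairwise_runs(populations: Sequence[str]) -> List[Tuple[str, str]]:
    """Unordered population pairs; each run estimates 2 Theta + 2 M = 4 parameters."""
    return list(combinations(populations, 2))


hosts = ["human", "bovine", "fish"]
print(full_model_parameters(len(hosts)))  # 9
print(pairwise_runs(hosts))  # [('human', 'bovine'), ('human', 'fish'), ('bovine', 'fish')]
print(full_model_parameters(12), len(pairwise_runs([f"pop{i}" for i in range(12)])))  # 144 66
```

With twelve populations, a single full-model run would estimate 144 parameters jointly, while the pairwise design splits the problem into 66 runs of four parameters each, at the cost of ignoring gene flow from populations outside each pair.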

PROGRESS: 2014/08 TO 2017/06
Target Audience: Nothing Reported
Changes/Problems: The Project Director secured a faculty position midway through the two-year funding period. The remaining year of support was converted to a standard award to Clemson University. The funds were finally transferred to Clemson on May 21, 2015. A postdoc was hired on the project as of July 1, 2015. A No-Cost Extension was approved, with a new end date of December 31, 2016. A second postdoc was hired to complete the project, which required a second No-Cost Extension, with a new end date of June 30, 2017.
What opportunities for training and professional development has the project provided? The PI obtained extensive bioinformatic training while at Cornell University, gaining experience with Perl and bash programming, Linux administration, and population genomic data analysis. This training was passed on to the two postdocs subsequently hired to work on the project. Both the PI and the postdocs gained valuable bioinformatic experience over the course of the project, applied to an important agricultural question.
How have the results been disseminated to communities of interest? A manuscript describing our findings will be submitted to PNAS within the next 10 days.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

IMPACT: 2014/08 TO 2017/06
What was accomplished under these goals? Using an expanded data set of over 900 genomes, Bayesian population structure analysis delineated twelve major populations that closely aligned with established clonal complexes; the analysis also revealed a distinct bovine-adapted lineage and a split of clonal complex 1 into two distinct populations. Although this was the strongest pattern of differentiation in the data, geographic population structure was also detected. Admixture was more common in isolates from animal hosts than in those from human hosts. We expanded the analysis to include the distribution of antibiotic resistance genes; this distribution suggested transfer of resistance genes from human isolates to bovine isolates, but not vice versa. Bayesian migration analysis among bovine, human, and poikilothermic hosts revealed that the strongest signal of transmission was between humans and bovines, with a strong bias from humans into bovines. Pan-genome analysis suggests that the pan-genome, at 9,527 genes, is still open. Gene ontology enrichment analysis revealed that lineages specific to a single host are enriched or depleted in unique sets of genes associated with the host environment, such as enrichment for carbohydrate metabolism genes in the bovine lineage.
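The report does not state how pan-genome openness or gene ontology enrichment were assessed. As a hedged illustration only, the sketch below shows two commonly used approaches: fitting Heaps' law, n = κN^(−α), to the number of new genes contributed by each added genome (α ≤ 1 is the usual criterion for an open pan-genome), and a one-sided hypergeometric test for over-representation of a GO term among lineage-specific genes (depletion would use the lower tail instead). Function names and all numbers are hypothetical.

```python
import math
from typing import List, Tuple


def fit_heaps(new_genes: List[float], first_n: int = 2) -> Tuple[float, float]:
    """Least-squares fit of n = kappa * N**(-alpha) on the log-log scale.

    new_genes[i] is the (e.g. permutation-averaged) number of new genes
    found when genome N = first_n + i is added. Returns (kappa, alpha)."""
    xs = [math.log(first_n + i) for i in range(len(new_genes))]
    ys = [math.log(v) for v in new_genes]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return math.exp(y_bar - slope * x_bar), -slope


def enrichment_p(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k) under the hypergeometric null: `drawn` genes sampled from
    `population`, of which `annotated` carry the GO term."""
    upper = min(annotated, drawn)
    hits = sum(
        math.comb(annotated, i) * math.comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / math.comb(population, drawn)


# Synthetic curve generated with alpha = 0.6, i.e. an open pan-genome.
kappa, alpha = fit_heaps([300 * n ** -0.6 for n in range(2, 50)])
print(round(kappa, 3), round(alpha, 3), "open" if alpha <= 1 else "closed")  # 300.0 0.6 open
print(enrichment_p(3, population=10, annotated=3, drawn=4))  # 7/210, about 0.0333
```

Real analyses would fit κ and α to new-gene counts averaged over many random genome orderings and correct enrichment p-values for the number of GO terms tested.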

PUBLICATIONS (not previously reported): 2014/08 TO 2017/06
Type: Journal Articles Status: Submitted Year Published: 2017 Citation: Comparative genomics and pan-genome analyses of multi-host pathogen Streptococcus agalactiae from human and agricultural environments. Richards VP, Velsko IM, Alam TM, Zadoks RN, Manning SD, Delannoy CMJ, Pavinski Bitara PD, Town CD, Stanhope MJ. PNAS. Manuscript will be submitted within the next 10 days.

PROGRESS: 2014/08/15 TO 2015/08/14
Target Audience: Nothing Reported
Changes/Problems: The Project Director secured a faculty position midway through the two-year funding period. The remaining year of support was converted to a standard award to Clemson University. The funds were finally transferred to Clemson on May 21, 2015. A postdoc was hired on the project as of July 1, 2015. A No-Cost Extension was approved, with a new end date of December 31, 2016.
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? During the next reporting period we will further explore transmission dynamics and demographic history. Antibiotic resistance genes will be a specific focus of this work.

IMPACT: 2014/08/15 TO 2015/08/14
What was accomplished under these goals? A manuscript describing patterns of adaptation and transmission for Streptococcus agalactiae was prepared and submitted to a peer-reviewed journal and is now in revision. Its major findings reveal a dynamic pattern of adaptation in this species, with major lineages existing either as generalists able to infect a wide range of hosts or as specialists adapted to specific hosts and/or niches. In particular, we show the existence of a distinct bovine-adapted lineage and the possible ongoing adaptation of a second. Gain and loss of genes specific to environmental niches are typical signatures of specialization, with gene gain being the dominant mechanism for the bovine-adapted lineage, in contrast to the gene loss previously described for a poikilothermic lineage. A Bayesian migration analysis showed that the strongest signal of transmission was between human and bovine hosts, with transmission strongly biased from humans into bovines.

PUBLICATIONS: 2014/08/15 TO 2015/08/14
No publications reported this period.