Item No. 1 of 1
ACCESSION NO: 1012163 SUBFILE: CRIS
PROJ NO: ALAX-011-CBG2116 AGENCY: NIFA ALAX
PROJ TYPE: OTHER GRANTS PROJ STATUS: TERMINATED
CONTRACT/GRANT/AGREEMENT NO: 2017-38821-26421 PROPOSAL NO: 2016-06543
START: 01 MAY 2017 TERM: 30 APR 2020 FY: 2019
GRANT AMT: $297,478 GRANT YR: 2017 AWARD TOTAL: $297,478 INITIAL AWARD YEAR: 2017
INVESTIGATOR: SRIPATHI, V.
PERFORMING INSTITUTION:
ALABAMA A&M UNIVERSITY
4900 MERIDIAN STREET
NORMAL, ALABAMA 35762
UNDERSTANDING EPIGENOMIC VARIATION IN DIPLOID COTTON, GOSSYPIUM HERBACEUM IN RESPONSE TO ABIOTIC STRESS
NON-TECHNICAL SUMMARY: Project summary: Heat, drought and salinity stresses lead to metabolic toxicity, membrane disorganization, generation of reactive oxygen species (ROS), inhibition of photosynthesis and altered nutrient acquisition. This effort will identify underlying genetic and epigenetic mechanisms linked to abiotic stress tolerance inherent in Gossypium herbaceum. The study responds to the needs of farmers, which are accentuated by climate change. This research proposal is based on four objectives: 1. Expand genomic sequence coverage of an underexplored diploid cotton, G. herbaceum, and develop a reference A1 genome; 2. Understanding epigenomic variation in diploid cotton, G. herbaceum, in response to abiotic stress (heat, drought, and salinity stress) by utilizing Whole Genome Bisulfite Sequencing (BS-Seq) and Chromatin Immunoprecipitation-Sequencing (ChIP-Seq);
3. Identification of regulatory mechanisms associated with gene expression by utilizing comparative genomics; and 4. Train two PhD-level graduate students and four undergraduate students, while expanding AAMU's capabilities in the emerging sphere of epigenetic investigations. The expected outcomes of this project are: a) providing a draft genome (>96%) of G. herbaceum; b) providing gold-standard (108 methylated sites) stressed and un-stressed methylome patterns of G. herbaceum to the plant community; c) deciphering components of tolerance mechanisms, their regulation and control of gene expression by signaling networks and transcriptional regulators; d) sharing the data generated from this effort with the public and publishing it in high-quality journals; and e) as a broader impact, the information generated will help in developing abiotic stress resistant cultivars by applying plant breeding
strategies and in training next generation scientists in advanced STEM fields.
OBJECTIVES: Long-term Goals: The proposed project is a new application that supports the long-term goals of the participating institution, Alabama A&M University (AAMU), which has conducted active research on agriculturally important crops for the past five decades. The Center for Molecular Biology (CMB) at Alabama A&M University has a well-established sequencing facility for supporting inter-institutional and inter-departmental genomic needs. CMB has a two-decade history in applied genomic research, particularly in deciphering molecular mechanisms associated with abiotic and biotic plant-stress responses. Such a track record alone is not sufficient, as new approaches and advances are driving the field of molecular biology. Therefore, additional capacity building is needed to train undergraduate and graduate students in cutting-edge technologies. The inherent
genetic potential of some species to tolerate stress can be tapped by screening genomes. Crop production is increasingly challenging as terrestrial land plants are forced to grow under modified soil conditions due to global climate change. Understanding genomic and epigenomic variation in land plants under abiotic stresses will aid in unraveling key regulatory mechanisms associated with abiotic tolerance and adaptation. Through this project, AAMU aims to advance its long-term goals by expanding research in cotton using a diploid, underexplored parental species, Gossypium herbaceum applying novel approaches available in post-genomics era, train two graduate and four undergraduate students in conducting comprehensive research in understanding genomic and epigenomic landscapes of G. herbaceum, and to tap its inherent genetic potential linked to traits associated with abiotic stress
tolerance. A holistic approach, as proposed here, that integrates genomic, transcriptomic and epigenomic resources in cotton will provide a better understanding of these underlying molecular mechanisms. The advanced research proposed here will not only benefit the stakeholders but also train young minds who aspire to pursue careers in STEM disciplines, an aim that is integrated into the project objectives. The geographical location of AAMU also adds weight to this project, as it is located in the most important cotton-producing belt of the US, the Tennessee river delta. This project responds to requests for the development of effective and alternative strategies for genetic improvement and sustainable production of cotton to better serve its stakeholders. This is a single-institution, applied research application belonging to program code "EQ", and directly or indirectly addresses CBG
priority areas j, g & e; discipline code N2, B2; NIFA challenge areas b & c within NIFA priorities 1 & 3. Major Goal: The goal of this research effort is to identify underlying genetic and epigenetic mechanisms linked to abiotic stress tolerance that is inherent to G. herbaceum by utilizing existing resources in an effective manner, and to make the data generated in this effort publicly available, which will help cotton breeders develop abiotic stress tolerant genotypes based on genomic and epigenomic information rather than relying on genetic information alone. Objectives: 1. Expand genomic sequence coverage of an underexplored diploid cotton, G. herbaceum. 2. Understanding epigenomic variation in diploid cotton, G. herbaceum, in response to salinity stress (abiotic) by utilizing BS-Seq and ChIP-Seq. 3. Identification of regulatory mechanisms associated with gene
expression by utilizing comparative genomics. 4. Train two PhD-level graduate students and four undergraduate students, while expanding AAMU's capabilities in the emerging sphere of epigenome research.
APPROACH: Genome analysis:Our preliminary effort in sequencing, analyzing, and annotating diploid cotton, G. herbaceum (PI 630014; A1-141), using 454 FLX+ and Illumina HiSeq2000 platforms was encouraging and we assembled up to 70% of the A1-genome with minimal (20X) coverage using hybrid assembly approach with minimal funds available. Recently, the genomes of G. raimondii, G. arboreum, G. hirsutum and G. barbadense have been sequenced by diverse research groups across the world and made available in the public domain (CottonGen/Phytozome). However, we are pioneers in sequencing, analyzing, and annotating diploid cotton, G. herbaceum, and a portion of the work has been presented at the Plant and Animal Genome meeting, 2015 (Sripathi, 2015). The GC content (35%), repeat regions (42%), and protein coding genes (~48,000) identified in the assembled genome (70%)
concurred with studies in other Gossypium species. Now, with the motivation gained from prior experience in genome assembly and annotation and the support requested here, we are aiming to advance this research with higher aspirations: to attain more coverage (>100X), to assemble >95% of the genome, and to utilize a similar but more robust approach. DNA and RNA Extraction: The genomic DNA (gDNA) and total RNA will be extracted (as per the manufacturer's guidelines) from 54 (2 x 3 x 3 x 3) samples as mentioned in the experimental design. The biological replicates (R1, R2 and R3) collected from each time point will be pooled to reduce the sequencing costs, thus generating 18 libraries (2 x 3 x 3) each for ChIP-Seq, BS-Seq and RNA-Seq applications. The quantity and quality of gDNA, methylated DNA, ChIP'ed DNA, total RNA, and their libraries will be determined using suitable kits
on the Bioanalyzer 2100 (Agilent Technologies, CA, USA) and only 0.5 - 1.0 µg samples with A260/A280 ratios > 2.0 and RIN > 8.0 will be selected for sequencing. RNA-Seq analysis: Total RNA isolated from 54 samples will be used for sequencing mRNA, small RNAs and degraded RNA (degradome). For these samples, Illumina TruSeq libraries (18) will be prepared and quality libraries will be sequenced at the HudsonAlpha Institute for Biotechnology (HAIB, Huntsville, AL). Raw reads obtained will be evaluated by FastQC, trimmed for adapters, and filtered to remove low-quality reads (Phred score < 30) by using the FASTX toolkit, and the resultant high-quality reads of at least 50 bases will be retained. The quality reads thus collected will be mapped against the reference genome (G. arboreum) using TopHat with default parameters. The genome annotations available at CottonGen and Phytozome will be used to
extract features from transcriptome analysis. HTSeq will be used to generate raw read counts per gene from each sample using TopHat output and the known gene annotations. The resulting alignment information (BAM files) will be used to determine differential gene expression using the Cufflinks (v 2.0.2) suite of programs (Trapnell et al., 2012). Further, we will categorize the genes associated with biotic-stress responses based on KOG functional classes. The hypergeometric test with multiple-testing adjustment will be used for GO analysis, and genes will be categorized into their respective classes or pathway annotations based on the Kyoto Encyclopedia of Genes and Genomes (KEGG). Chromatin immunoprecipitation followed by sequencing: The ChIP assay will be performed as described (Luo and Lam, 2014; Ayyappan et al., 2015) and modified for cotton. High quality ChIP-Seq libraries will be prepared and sequenced on Illumina
HiSeq 2500. Raw reads will be collected and their quality will be assessed using FastQC to determine data statistics such as number of reads, individual nucleotide count, total number of nucleotides, and GC percentage. Raw reads will then be trimmed and filtered to remove low quality data, and mapped to the G. arboreum genome with no more than two mismatches by using Bowtie. The methylation (H3K9me2) and acetylation (H4K12ac) marked peaks will be identified using Spatial clustering for Identification of ChIP-Enriched Regions (SICER) and annotated by using HOMER from both stressed and un-stressed samples. A stringent filtering method will be adopted while identifying differentially marked peaks. A gene will be regarded as being methylated (H3K9me2-modified) or acetylated (H4K12ac-modified) only if it overlaps (based on 'known' annotated genes) with peak coordinates at least by one
base. Methylated cytosine sequencing: Methyl-MaxiSeq™ EpiQuest libraries will be prepared (Crampton et al., 2016), fragment size will be checked using the Bioanalyzer 2100, and the libraries will be sequenced on the Illumina HiSeq 2500. Sequence reads from bisulfite-treated EpiQuest libraries will be identified using standard Illumina base-calling software and then analyzed using an in-house analysis pipeline written in Python and the methylation calling tool Bismark. The methylation level of each sampled cytosine will be estimated as the number of reads reporting a C, divided by the total number of reads reporting a C or T (Crampton et al., 2016). Promoter and gene body annotations will be added using other Gossypium species genome annotations available at CottonGen and Phytozome. Further, weighted methylation estimates will be taken from overall methylation ratio in a particular location
after considering the sequencing depth of each methylation site. This method has been successfully utilized in both mammals and plants (Schmitz et al., 2013). Weighted methylation will be calculated for selected genes including genic and promoter regions as reported earlier (Schultz et al., 2012). Methylated DNA Immunoprecipitation (MeDIP-Seq): To understand methylomes of stressed and un-stressed cotton, assessing peak density and peak shape associated with DNA methylation is necessary. Peaks can be called by mapping reads to the G. arboreum genome to reveal the loci of selectively methylated DNA. A correlation between peak density and chromosomal length has been proposed in mammals (Hughes et al., 2010) and extended to plants. We will be estimating the methylation peak density (peak/Mb) based on chromosome length and total number of peaks identified per chromosome. MeDIP-Seq signatures
per chromosome, total peaks across the genome, and peaks found only in promoter, genic and non-genic regions will be identified. Peaks obtained from MeDIP-Seq will be compared against the methylation sites from BS-Seq data to corroborate the peaks identified in likely methylated regions and to compare the two methylome sequencing workflows. Further, the 20 most significant (p < 0.05) MeDIP-Seq peaks identified will be compared to CG and CHG weighted methylation determined from BS-Seq. Reverse Transcriptase-PCR (RT-PCR) and quantitative RT-PCR (qRT-PCR or qPCR) validation: Genes associated with ROS signaling pathways, R-genes, and stress-responsive genes will be identified, and primers (TaqMan) will be designed using an online tool and validated experimentally using qPCR. The RT-PCR and qPCR validations will be carried out to qualitatively detect the gene (mRNA) expression and to quantitatively
measure the amplification of cDNA by using a Tetrad thermocycler (Bio-Rad Laboratories, Hercules, CA) and a Roche LightCycler (Roche, Foster City, CA), respectively. To normalize the results, the cotton ubiquitin extension protein UBQ7 (DQ116441) gene will be used as an internal control, and relative expression levels, representing the relative fold changes as compared with the expression level of UBQ7, will be calculated as 2^(−ΔCt). The normalized Ct values (ΔCt) from qPCR analysis will be collected and analyzed by using Minitab 17, and the expression results will be presented as mean ± SE. One-way ANOVA will be performed on qPCR experiments for multiple comparisons between the sample means.
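The relative-expression calculation described above can be sketched in Python as follows. This is a minimal illustration only, assuming ΔCt is defined as Ct(target gene) minus Ct(UBQ7); the function names and the Ct values are hypothetical and are not taken from the project data.

```python
from math import sqrt
from statistics import mean, stdev


def relative_expression(ct_target, ct_reference):
    """Return 2^(-dCt), with dCt = Ct(target) - Ct(reference, e.g. UBQ7)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


def mean_and_se(values):
    """Return the mean and the standard error of the mean of replicate values."""
    n = len(values)
    se = stdev(values) / sqrt(n) if n > 1 else 0.0
    return mean(values), se


# Hypothetical (target Ct, UBQ7 Ct) pairs for three replicates of one sample.
replicates = [(24.0, 20.0), (25.0, 20.0), (23.0, 20.0)]
expression = [relative_expression(t, r) for t, r in replicates]
avg, se = mean_and_se(expression)
print(f"relative expression: {avg:.4f} +/- {se:.4f} (mean +/- SE)")
```

A target gene with the same Ct as the reference gives a relative expression of 1.0, and each additional cycle of delay halves the value.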
PROGRESS: 2017/05 TO 2020/04 Target Audience: The research results generated here will provide a better understanding of epigenomic variation between salt-stressed and unstressed cotton and germplasm improvement of Gossypium species. The resources (germplasm and data) generated from this project are publicly available and helpful for academia, industry, agricultural scientists, plant biologists, geneticists, and farmers. Also, data generated by this project help in developing resources that benefit limited resource farmers in the southern US to cater to the increasing population needs. The protocols, tools, and resources developed in this project can be applied to other crops. The project benefits the cotton researchers in specific and plant biologists in general. The project significantly improves this 1890 institution's capacity, as it develops expertise in the
emerging science of Genomics and Bioinformatics. Additionally, five full-time and part-time graduates, two research interns, two visiting scholars, and six undergraduate students received comprehensive training on the use and development of genomic and epigenomic tools while undertaking this project. Moreover, this project is encouraging underrepresented undergraduate and graduate students interested in STEM programs with an emphasis in plant biology and genetics. Students will gain expertise in multiple disciplines and pursue further studies or careers in these areas of research. Changes/Problems: Genome sequencing and analyses (Objective 1), epigenomic variation analyses in response to salt-stress using BS-Seq (Objective 2), and comparative gene expression analyses using RNA-Seq (Objective 3) in a diploid cotton species, G. herbaceum, and training next-generation scientists in advanced
areas of Genomics and Bioinformatics (Objective 4), progressed as expected. The one exception was the ChIP-Seq component of Objective 2, which was affected because one of the graduate students recruited to undertake this experiment left the project abruptly and because the COVID-19 pandemic disrupted the wet-lab aspects of our research. As a result, a portion of the allocated funds for this experiment was unutilized and returned. Also, in Objective 3, out of 32 RNA-Seq libraries prepared, one sample had poor sequencing quality and was excluded from downstream analysis. What opportunities for training and professional development has the project provided? Beyond what we proposed, this project supported (fully or partially) and trained three thesis students (G. Taylor, D. Head, and S. Etukuri), two non-thesis students (M. Hassan and M. Jakka), five part-time
students (K. Keita, M. Miller, S. Ekke, M. Yang, and S. Reddy), six undergraduates (B. White, F. Thomas, I. Crawford, M. Smith, K. Gibbons, and M. Simbule), two research interns (V. Golston and A. Mummadi) and a bioinformatician (Z. Gossett) in the advanced STEM areas, i.e., Genomics and Bioinformatics. A student visitor (A. Brown) from Tuskegee University and a visiting scientist from Uzbekistan (M. Ayubov) were also trained in these areas. The students and scientists listed above received hands-on experience in next-generation sequencing technologies followed by bioinformatics analyses (RNA-Seq and BS-Seq) while analyzing the data generated in this project. The students and professionals trained in this project currently wish to pursue their careers in the STEM fields, thus contributing to the next generation workforce development. How have the results been disseminated to communities
of interest? Results from this work provide a substantially more comprehensive genome coverage of a diploid cotton, G. herbaceum, and will help in the understanding of epigenome modifications in cotton under salt stress. The epigenomic and transcriptomic resources developed from this project were made available to the user community in many ways: 1) We initially presented posters at the Plant and Animal Genome (PAG) conference and at the Beltwide Cotton Conference organized by the National Cotton Council of America, each attended by plant breeders from both public and private entities; 2) The outreach is not limited to presentations at professional meetings. For example, during each summer, groups of high school students (5-10) come to AAMU from different states in the United States through the North Alabama Center for Education Excellence (NACEE). These students will
visit our laboratories and be provided information on the projects outlined above; 3) The results were shared with our close collaborators in the southeastern United States, both in public and private plant breeding programs. We currently have formal collaborations with USDA ARS scientists at College Station (TX), Stoneville, and Starkville (MS), and at our sister land grant universities in Alabama (members in Alabama Land Grant Alliance or ALGA include Alabama A&M University, Auburn, and Tuskegee Universities); 4) In addition, both the undergraduate and graduate students meet, greet, and introduce our research to potential students on STEM Day, an annual event at AAMU; 5) A website has been developed as an outcome of this project, which will soon be available and maintained through AAMU's Center for Molecular Biology. Short video content will also be produced, posted on our
website, and linked to other sites visited by potential students. 6) As students complete their research work, it will be expeditiously published in peer-reviewed journals; 7) The data and other information resulting from the specific objectives are shared through the NCBI website and published in regional and national journals. Our use of an integrated approach to better understand epigenomic variation in cotton under salt stress (abiotic), instead of separately studying epigenomics and transcriptomics, makes this project innovative and informative. The BS-Seq and RNA-Seq experiments were conducted from the same tissues to infer the correlation between gene regulation and gene expression. What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported
IMPACT: 2017/05 TO 2020/04 What was accomplished under these goals? In the past three years (2017, 2018, and 2019), we completed almost all the objectives as listed in the timeline. Under Objective 1 (to better understand the genome of underexplored diploid cotton, G. herbaceum, a closest progenitor A-genome species), we isolated genomic DNA (gDNA) and sequenced it at ~100X coverage on Illumina HiSeq, resulting in ~600 million paired-end and mate-pair reads, collectively. These reads were combined with our previous sequencing data for G. herbaceum. The combined data thus obtained (from both Roche/454 and Illumina platforms) was assembled using a hybrid assembly approach, which utilized multiple assemblers that accommodate short and long reads. Using ABySS, we generated a non-redundant, combined (Roche/454 + Illumina), hybrid assembly, consisting of 472,942 contigs in 1.21
Gb with an N50 contig length of 6,197 bp. After two rounds of scaffolding with these contigs using SSPACE, the assembly contained 293,653 scaffolds in 1.46 Gb with an N50 length of 61,351 bp. Gap filling of scaffolds with Sealer reduced the percentage of Ns from 16.65% to 15.87%. This scaffold assembly of G. herbaceum was used for the subsequent assembly assessment as well as gene prediction and annotations. BUSCO's genome assembly assessment detected 95.3% of BUSCO sets, which contained single-copy orthologs in at least 90% of the species, and identified that 3.2% of BUSCOs were missing. Similarly, in the gene set (proteins) assessment, 92.7% of BUSCO sets were detected, while 2.7% of BUSCOs were missing. These BUSCO scores estimate the overall completeness of the genome assembly. Similarly, an alignment coverage was defined as the percentage length of chromosomes with at least
one aligned scaffold from the A1 genome against each of the A2, AD1, and D5 genomes. The best coverage was determined to be approximately 80% for each chromosome of G. arboreum (A2), contributing an additional measure for assembly completeness. The average GC content was ~40%. In total, 41,387 genes were predicted across 11,738 scaffolds, including 14,993 genes supported by the expressed sequence tag data. These genes were annotated for protein sequence, tRNA, and rRNA. Furthermore, OrthoFiller predicted 459 additional genes, of which 209 (45%), 172 (37%), and 459 (100%) have hits against NCBI NR, Arabidopsis, and G. arboreum protein databases, respectively. OrthoMCL was used to generate 25,416 multiple orthologous clusters. Further, the scaffolds/superscaffolds were assembled into pseudochromosomes using Chromosomer. The 13 assembled pseudochromosomes were analyzed for GC and repeat content,
abiotic stress-responsive, overall stress-responsive, and fiber-related gene expression, all of which are organized and graphed in a Circos plot. Also, gene network analysis with StringDB elucidated several gene interaction networks that are associated with fiber-related genes. Under Objectives 2 and 3 (to understand the epigenomic variation of salt-sensitive and salt-tolerant genotypes of cotton), we evaluated two cotton species, G. herbaceum and G. hirsutum. First, G. herbaceum and G. hirsutum seeds were delinted, surface sterilized, germinated in vitro, and then transplanted into the greenhouse. Plants approximately 30 days old were subjected to salt stress. In total, 32 (2 x 2 x 4 x 2) samples were collected, covering two species (salt-tolerant and salt-susceptible), two treatment conditions (unstressed, 0 mM NaCl; stressed, 200 mM NaCl), four time points (0, 1, 7, and 14 days after treatment),
and two technical replicates (pooled from four biological replicates). To meet the needs of downstream applications, double the required number of plants were maintained in the greenhouse. Both genomic DNA (gDNA) and total RNA were extracted (as per the manufacturer's guidelines) from the 32 samples (2 species x 2 treatments x 4 time points x 2 technical replicates). The four biological replicates collected were pooled, and two technical replicates were used for sequencing to reduce the costs, thereby generating 32 libraries (2 x 2 x 4 x 2) each for Bisulfite-sequencing (BS-Seq) and RNA-sequencing (RNA-Seq). The quantity and quality of gDNA, methylated DNA, total RNA, and their libraries were determined using suitable kits on TapeStation 2200 (Agilent Technologies, CA, USA). The pooled libraries were sequenced on an Illumina NovaSeq to generate over 1 Tb of data, analyzed under
Objectives 2 and 3. RNA-Seq generated a total of 1.7654 billion reads, with an average of 55.17 million per library, ranging from 44 to 86 million reads among the 32 samples. In our libraries, reads sequenced per sample varied due to the difference in the sample quantity utilized, ranging from 0.5 to 1.5 µg. Almost all sequenced reads were 150 bp in length and were assessed using FastQC, before and after trimming by Cutadapt and the FASTX toolkit, separately. The Phred scores for all reads throughout all 32 samples ranged between 35 and 37, suggesting high-quality data. The average GC content of all reads varied between 48% and 55% per sample. Between 7 and 21 million reads were uniquely mapped to the reference genome (G. hirsutum) using the STAR aligner. Among the mapped reads, 12-27% of the sequences have an assigned function. The read distributions of uniquely mapped reads were
assessed for genomic features, revealing that most reads (>50%) were mapped to exons (CDS, 5'UTR, and 3'UTR) and that the others were distributed between introns (0.2 - 0.5%), TSS (1 - 8%), TES (30 - 40%), and intergenic regions (7.5 - 15%). Further, overlapping genomic features and their distributions among the samples were evaluated to identify protein-coding sequences (5 - 20 million reads, M), rRNA (2.5 - 7.5 M), pseudogenes (0.5 - 2.0 M), lncRNA (0.05 - 0.20 M), transcribed pseudogenes (0.05 - 0.10 M), other proteins with assigned function (0.05 - 0.10 M), transcripts with unassigned function (10 - 25 M), and ambiguous transcripts (1 - 2 M). To begin our BS-Seq analysis, bisulfite-treated Methyl-MaxiSeq libraries were prepared from 50 ng gDNA, using Pico Methyl-Seq Library Prep Kit as per the manufacturer's protocol (Zymo Research) and Illumina Unique Dual Indices.
Library quality control was performed on the Agilent 2200 TapeStation, and the libraries were then sequenced on an Illumina NovaSeq. The raw FASTQ files were adapter and quality trimmed using TrimGalore. FastQC was used to assess the effect of trimming and overall quality distributions of the data. Alignment to the G. hirsutum reference genome was performed using Bismark. Methylated and unmethylated read totals for each CpG site were called using MethylDackel. The methylation level of each sampled cytosine was estimated as the number of reads reporting a C, divided by the total number of reads indicating a C or T. Differentially methylated regions (DMRs) were assessed in the three methylation contexts (CpG, CHG, and CHH), identifying 15,377, 1,667, and 5,465 variable sites, respectively, between the two diverse species. The annotated DMRs belonged to 21 protein classes, and the top three protein classes were metabolic
interconversion enzyme, nucleic acid-binding protein, and protein modification enzyme. In association with our study, three processes of interest identified were: cellular response to the stimulus, response to abiotic stimulus, and response to stress. Under Objective-4, we trained more students than we proposed in the timeline to achieve the Objectives 1, 2, and 3. During the project duration, we recruited five MS-level students, of which two have graduated, one is a semester from graduation, and two abruptly left for personal reasons. Also, six undergraduate students have worked towards achieving the project goals, among which three have graduated and three plan on graduating in two years.
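The per-cytosine methylation estimate used in the BS-Seq analysis above (reads reporting a C divided by reads reporting a C or T) can be illustrated with a short Python sketch. The region-level "weighted" estimate shown here assumes one common definition, the sum of C-reporting reads over the sum of all C- or T-reporting reads across the sites of a region; the report does not spell out its exact formula, and the function names and read counts are hypothetical.

```python
def methylation_level(c_reads, t_reads):
    """Per-site level: reads reporting C / reads reporting C or T."""
    total = c_reads + t_reads
    if total == 0:
        return None  # site has no coverage, so no estimate is possible
    return c_reads / total


def weighted_methylation(sites):
    """Region-level estimate (assumed definition): sum(C) / sum(C + T).

    `sites` is a list of (c_reads, t_reads) tuples, one per cytosine, so
    deeply sequenced sites contribute more than shallow ones.
    """
    c_total = sum(c for c, _ in sites)
    depth_total = sum(c + t for c, t in sites)
    if depth_total == 0:
        return None
    return c_total / depth_total


# Hypothetical read counts for three cytosines in one promoter region.
region = [(8, 2), (1, 9), (30, 10)]
per_site = [methylation_level(c, t) for c, t in region]
print("per-site levels:", per_site)
print("weighted level:", weighted_methylation(region))
```

Weighting by depth differs from a plain average of per-site levels: the third site above has four times the coverage of the others and therefore pulls the regional estimate toward its own level.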
PUBLICATIONS (not previously reported): 2017/05 TO 2020/04
1. Type: Journal Articles Status: Published Year Published: 2018 Citation: Sripathi, V.R., Choi, Y., Gossett, Z.B., Stelly, D.M., Moss, E.M., Town, C.D., Walker, L.T., Sharma, G.C. and Chan, A.P., 2018. Identification of microRNAs and their targets in four Gossypium species using RNA sequencing. Current Plant Biology, 14, pp.30-40.
2. Type: Conference Papers and Presentations Status: Published Year Published: 2018 Citation: Keita, K., Williams, A.J., Miller, M., Nyaku, S.T., Lawrence, K., Sharma, G.C., and Sripathi, V.R. Transcriptome Analyses of Reniform Nematode Infested and Uninfested Susceptible and Tolerant Genotypes of Upland Cotton (Gossypium hirsutum). The Plant and Animal Genome XXVI Conference (PAG), San Diego, CA., January 13-17, 2018.
3. Type: Conference Papers and Presentations Status: Published Year Published: 2019 Citation: Head, D.C., Gossett, Z.B., Erpelding, J., Sharma, G.C., and Sripathi, V.R. Comparative Transcriptome Analysis of Salt-Tolerant and Sensitive Genotypes of Cotton. ARD Research Symposium 2019, Jacksonville, FL., March 30th - April 2nd, 2019.
4. Type: Conference Papers and Presentations Status: Published Year Published: 2019 Citation: Thomas, F., Head, D.C., Etukuri, S.P., Sharma, G.C., and Sripathi, V.R. Effects of Drought Stress on Morphological and Physiological Characteristics, and Transcriptome of Upland Cotton. Thirteenth Annual STEM Day, Alabama A&M University, Huntsville, AL., April 12th, 2019.
5. Type: Conference Papers and Presentations Status: Published Year Published: 2020 Citation: Etukuri, S.P., Williams, A.J., Gossett, Z.B., Anche, V.C., Sharma, G.C., and Sripathi, V.R. Differentially Expressed Genes (DEGs) in Response to Reniform Nematode in Upland Cotton, Gossypium hirsutum. The Plant and Animal Genome XXVII Conference (PAG), San Diego, CA., January 11-15, 2020.
PROGRESS: 2017/05/01 TO 2018/04/30 Target Audience: The research results will provide a better understanding of epigenomic variation between salt-stressed and un-stressed plants during growth and development, which is useful for germplasm improvement of Gossypium species. The resources (both germplasm and data) generated from this project will be made publicly available and are useful for academia, industry, agricultural scientists, plant biologists, geneticists, and farmers. Also, data generated in this project will develop resources that benefit limited resource farmers in the southern US to cater to the needs of the increasing population. The protocols, tools and resources developed in this project can be applied to other crops. The project will benefit the cotton researchers in specific and plant biologists in general. The project will have a great impact on the capacity of
this 1890 institution as it develops expertise in the emerging science of genomics and bioinformatics. Additionally, two graduate and four undergraduate students will receive comprehensive training on the use and development of genomic and epigenomic tools while undertaking this project. Moreover, this project is encouraging underrepresented undergraduate and graduate students interested in STEM programs with an emphasis in plant biology and genetics. Students will gain expertise in multiple disciplines and pursue further studies or professional careers in this area of research. Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? So far, we have trained two graduate (MS-level) students and two undergraduate juniors in the advanced STEM areas, i.e. genomics and bioinformatics. In the second year, we will be recruiting two
more undergraduate juniors to continue the project goals and objectives. How have the results been disseminated to communities of interest? Results from this work will provide a substantially more comprehensive genome coverage of a diploid cotton, G. herbaceum, and will help in the understanding of genome-wide epigenome modifications in cotton under salt stress. The epigenomic and transcriptomic resources developed from this project will be made available to the user community by: 1) initially presenting posters in the Plant and Animal Genome (PAG) meeting and in the Beltwide Cotton Conference organized by the National Cotton Council of America, each attended by plant breeders from both public and private entities; 2) the outreach will not be limited to a mere presentation at the professional meetings. For example, during each summer, groups of high school students come to AAMU from the
cities of Huntsville and Birmingham. These students will visit our laboratories and be provided information on the projects outlined above; 3) the results will also be shared with our close collaborators in the southeastern United States, both in public and private plant breeding programs; we currently have formal collaborations with USDA ARS scientists at College Station (TX), at Stoneville and Starkville (MS), and at our sister land grant universities in Alabama (members of the Alabama Land Grant Alliance, or ALGA, include Alabama A&M University, Auburn University, and Tuskegee University); 4) in addition, both the undergraduate and graduate students will meet, greet, and introduce science to potential students on STEM Day, an annual event at AAMU; 5) this project will also develop and publish web content, which will be hosted on the website for AAMU's Center for Molecular Biology. YouTube
short video content will also be produced, posted on our website, and linked to other sites visited by potential students. Additionally, as students complete their research work, it will be expeditiously published in peer-reviewed journals; 6) the data and other information resulting from the specific objectives will be shared through the NCBI website and published in regional and national journals. Our use of an integrated approach to better understand epigenomic variation in cotton under salt (abiotic) stress, instead of separately studying epigenomics and transcriptomics, makes this project innovative and informative. We will conduct BS-Seq and RNA-Seq experiments on the same tissues to infer the correlation between epigenomic and sRNA-mediated regulation of gene expression. What do you plan to do during the next reporting period to accomplish the goals? Briefly, as proposed in the
time-line, in early Summer 2018 we will recruit two undergraduate juniors to continue the project goals and work toward achieving objective-4. Later in the second year (Fall and Spring, 2018), we will complete objective-2 and a portion of objective-3. Also, the results from objective-1 will be published by the end of the second year. IMPACT: 2017/05/01 TO 2018/04/30 What was accomplished under these goals? Briefly, as proposed in the time-line, in early Summer 2017 we recruited two graduate (MS-level) students and two undergraduate juniors to undertake the project goals and achieve objective-4. Later in the first year (Fall and Spring, 2017), we completed objective-1 and a portion of objective-2. Under Objective-1: To better understand the genome of an underexplored diploid cotton, G. herbaceum, a closest progenitor A-genome species, we isolated the gDNA
and sequenced it at ~100X coverage on Illumina HiSeq, resulting in ~600 million paired-end and mate-pair reads collectively. These reads were combined with our previous sequencing data for G. herbaceum. The combined data thus obtained (from both Roche/454 and Illumina platforms) were assembled using a hybrid assembly approach, which utilized multiple assemblers that accommodate short and long reads. Using ABySS we generated a non-redundant combined (Roche/454 + Illumina) assembly, consisting of 472,942 contigs in 1.21 Gb with an N50 contig length of 6,197 bp. After two rounds of scaffolding with these contigs using SSPACE, the assembly contained 293,653 scaffolds in 1.46 Gb with an N50 scaffold length of 61,351 bp. Gap filling of scaffolds with Sealer reduced the percentage of Ns from 16.65% to 15.87%. This scaffold assembly of G. herbaceum was used for the subsequent assembly assessment as well as gene
prediction and annotation. The genome assembly assessment by BUSCO detected 95.3% of BUSCO sets, which contain single-copy orthologs in at least 90% of the species, and identified that 3.2% of BUSCOs were missing. Similarly, in the gene set (protein) assessment, 92.7% of BUSCO sets were detected, while 2.7% of BUSCOs were missing. These BUSCO scores estimate the overall completeness of the genome assembly. In addition, alignment coverage was determined as the percentage length of chromosomes with at least one aligned scaffold from the A1 genome against each of the A2, AD1, and D5 genomes. The best coverage was approximately 80% for each chromosome of G. arboreum, contributing an additional measure of assembly completeness. The average GC content was ~40%, and repeat elements were determined using RepeatMasker. In total, 41,387 genes were predicted across 11,738 scaffolds,
including 14,993 genes supported by the expressed sequence tag data. These genes were annotated for protein sequence, tRNA, and rRNA. Furthermore, OrthoFiller predicted 459 additional genes, of which 209 (45%), 172 (37%), and 459 (100%) have hits against the NCBI NR, Arabidopsis, and G. arboreum protein databases, respectively. OrthoMCL was used to generate a total of 25,416 orthologous clusters that contain multiple genes spanning multiple taxa. Under Objective-2: To understand the epigenomic variation of salt-sensitive and salt-tolerant genotypes of cotton, we evaluated two cotton species, G. herbaceum and G. hirsutum. First, G. herbaceum seeds were delinted, surface sterilized, germinated in vitro, and then transplanted into the greenhouse. Plants approximately 30 days old were subjected to salt stress. In total, 54 (2 x 3 x 3 x 3) samples were collected, by combining two tissues
(roots and leaves), three treatment conditions (un-stressed, 0 mM; stressed, 100 mM and 200 mM), three time points (0, 1-day, and 1-week after treatment), and three biological replicates (R1, R2, and R3). To provide sufficient plant material for downstream sequencing applications, 108 (54 x 2) plants were maintained. G. hirsutum is naturally susceptible to salt stress and was used as a control to physiologically compare the symptoms. Both genomic DNA (gDNA) and total RNA were extracted (as per the manufacturer's guidelines) from the 54 (2 x 3 x 3 x 3) samples mentioned above. The biological replicates (R1, R2, and R3) collected at each time point were pooled in order to reduce sequencing costs, thereby generating 18 libraries (2 x 3 x 3) each for BS-Seq and RNA-Seq. The quantity and quality of gDNA, methylated DNA, total RNA, and their libraries were determined using suitable
kits on TapeStation 2200 (Agilent Technologies, CA, USA). PUBLICATIONS: 2017/05/01 TO 2018/04/30 No publications reported this period.
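The sampling design described under Objective-2 (2 tissues x 3 treatments x 3 time points x 3 replicates = 54 samples, pooled across replicates into 18 libraries per assay) can be checked with a minimal sketch. The factor levels are taken from the report; the label strings and variable names are illustrative assumptions, not the project's actual sample identifiers.

```python
from itertools import product

# Factor levels as described in the report; label strings are illustrative only.
tissues = ["root", "leaf"]
treatments_mM = [0, 100, 200]            # un-stressed (0 mM) and two salt levels
time_points = ["0", "1-day", "1-week"]   # time after treatment
replicates = ["R1", "R2", "R3"]          # biological replicates

# Full factorial design: one sample per combination of all four factors.
samples = list(product(tissues, treatments_mM, time_points, replicates))

# Pool the three biological replicates within each
# (tissue, treatment, time point) combination to form one library.
libraries = {}
for tissue, treatment, time_point, replicate in samples:
    libraries.setdefault((tissue, treatment, time_point), []).append(replicate)

print(len(samples))    # number of collected samples
print(len(libraries))  # number of pooled libraries per assay (BS-Seq or RNA-Seq)
```

Pooling replicates in this way reduces the library count by a factor of three, at the cost of no longer being able to estimate replicate-to-replicate variation from the sequencing data.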
|