Protein coding gene

Protein coding gene prediction and functional annotation

After quality control and end trimming, the transcriptome reads were mapped against the reference genome using TopHat v2.0.6. A reference data set of 399 H. contortus protein-coding genes was manually curated from CEGMA predictions of highly conserved genes and from RNA-seq mappings. Of these, 347 were used to train Augustus and 52 were used to independently evaluate the accuracy of the predictions. Final gene prediction was performed with Augustus using H. contortus-specific parameters. RNA-seq, EST and poly(A) mappings generated by TopHat2 and PASA were supplied as evidence hints. Gene prediction accuracy was computed at the level of nucleotides, exons and complete genes on the 52 manually curated gene models, as described previously. The results are shown in Table S7 in Additional file 1.
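The accuracy evaluation above compares predicted gene models with the curated reference models at three levels. The sketch below assumes the standard sensitivity and specificity definitions (Sn = TP / (TP + FN), Sp = TP / (TP + FP)); the exact procedure "described previously" is not given here. It also assumes every model lies on a single contig and strand, with exons given as 1-based inclusive (start, end) pairs. The function names and the toy models are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumption: all models lie on one contig and strand; exons are (start, end),
# 1-based and inclusive. A real evaluation would also key on contig and strand.
Exon = Tuple[int, int]
GeneModel = List[Exon]


def coding_bases(models: List[GeneModel]) -> Set[int]:
    """Every base position covered by at least one exon."""
    bases: Set[int] = set()
    for model in models:
        for start, end in model:
            bases.update(range(start, end + 1))
    return bases


def sn_sp(predicted: set, reference: set) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TP / (TP + FP)."""
    tp = len(predicted & reference)
    sn = tp / len(reference) if reference else 0.0
    sp = tp / len(predicted) if predicted else 0.0
    return sn, sp


def evaluate(predicted: List[GeneModel],
             reference: List[GeneModel]) -> Dict[str, Tuple[float, float]]:
    # Nucleotide level: overlap of coding positions.
    nt = sn_sp(coding_bases(predicted), coding_bases(reference))
    # Exon level: an exon counts only if both boundaries match exactly.
    exon = sn_sp({e for m in predicted for e in m},
                 {e for m in reference for e in m})
    # Gene level: a gene counts only if its whole exon structure matches.
    gene = sn_sp({tuple(sorted(m)) for m in predicted},
                 {tuple(sorted(m)) for m in reference})
    return {"nucleotide": nt, "exon": exon, "gene": gene}


# Toy example: the prediction truncates the second exon by 10 bases.
reference = [[(1, 100), (201, 300)]]
predicted = [[(1, 100), (201, 290)]]
print(evaluate(predicted, reference))
```

In this toy case the truncated exon lowers nucleotide-level sensitivity but not specificity. It halves both exon-level scores and makes the gene-level match fail entirely. This is why the three levels are usually reported together.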

Functional annotation information was obtained from the InterPro databases using InterProScan v4.5. GO terms were annotated via InterPro2GO and from the curated C. elegans annotation in WormBase. For the latter, all GO terms shared by all C. elegans genes in a gene family were assigned to the H. contortus members of that family. Further functional insight was obtained by BLAST searches for similar genes in the GenBank nr database, and putative signal peptides were identified with SignalP. To investigate H. contortus metabolism, a total of 828 EC numbers, covering 2,853 proteins, were assigned using KAAS, DETECT and EFICAz. Of these, 563 EC numbers covering 1,246 proteins were assigned to a metabolic pathway; the others are non-metabolic enzymes.
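The family-based GO transfer described above takes the intersection of the GO term sets of all C. elegans members of a family. It then copies that shared set to every H. contortus member of the same family. The following is a minimal sketch of that rule. The input layout (family ID mapped to gene IDs per species, and gene ID mapped to GO terms) and all gene and family IDs are hypothetical; the GO accessions are used only as example labels.

```python
from typing import Dict, List, Set


def transfer_go_terms(
    ce_members: Dict[str, List[str]],  # hypothetical: family ID -> C. elegans gene IDs
    hc_members: Dict[str, List[str]],  # hypothetical: family ID -> H. contortus gene IDs
    ce_go: Dict[str, Set[str]],        # C. elegans gene ID -> curated GO terms
) -> Dict[str, Set[str]]:
    """Assign to each H. contortus gene the GO terms shared by ALL C. elegans
    members of its family (set intersection, not union)."""
    hc_go: Dict[str, Set[str]] = {}
    for family, hc_genes in hc_members.items():
        ce_genes = ce_members.get(family, [])
        if not ce_genes:
            continue  # no C. elegans member: nothing to transfer
        # Intersection over all C. elegans members; a gene without
        # annotation contributes an empty set and removes every term.
        shared = set.intersection(*(ce_go.get(g, set()) for g in ce_genes))
        if not shared:
            continue
        for gene in hc_genes:
            hc_go.setdefault(gene, set()).update(shared)
    return hc_go


ce_members = {"fam1": ["ce-a", "ce-b"]}
hc_members = {"fam1": ["HCON_1"], "fam2": ["HCON_2"]}
ce_go = {"ce-a": {"GO:0005515", "GO:0016020"}, "ce-b": {"GO:0005515"}}
print(transfer_go_terms(ce_members, hc_members, ce_go))
```

Using the intersection rather than the union is the conservative choice. A term that only some C. elegans paralogs carry is not propagated. A family with no C. elegans member (here `fam2`) receives no WormBase-derived terms at all.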

Similar annotation efforts were carried out on Caenorhabditis species and P. pacificus.

Gene expression and Gene Ontology analysis

The number of RNA-seq reads per gene model was counted from the TopHat mapping described above, using custom scripts built on BEDTools and a GFF file of the genome annotation. Gene expression was analyzed with the DESeq package for Bioconductor. Read counts were normalized by estimating the effective library size of each library. Negative binomial tests were then performed between pairs of sample triplicates, using dispersion estimates from the default approach, to obtain a P value for differential expression of each gene. These P values were adjusted for false discovery rate using the Benjamini-Hochberg procedure for multiple testing, and only genes with adjusted P values < 1e-5 were retained. GO terms enriched in the set of differentially expressed genes in each comparison were identified using the weight01 algorithm of the topGO package for Bioconductor. Only GO terms with P < 0.01 were considered for more detailed analysis.
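Two steps of the DESeq workflow above can be illustrated outside R. The first is library-size normalization, which DESeq performs with median-of-ratios size factors. The second is the Benjamini-Hochberg adjustment. The sketch below is a plain-Python illustration of those two steps only. It does not reproduce DESeq's dispersion estimation or negative binomial test, and the function names and count matrix are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence


def size_factors(counts: Sequence[Sequence[int]]) -> List[float]:
    """Median-of-ratios size factors; counts[g][s] = reads for gene g in sample s."""
    n_samples = len(counts[0])
    # Only genes with a nonzero count in every sample have a finite
    # geometric mean (raises StatisticsError if no such gene exists).
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        # Per-gene log ratio of this sample's count to the gene's geometric mean.
        log_ratios = [math.log(row[s]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Sample 2 was sequenced twice as deeply as sample 1.
print(size_factors([[10, 20], [20, 40], [30, 60]]))
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])
significant = [i for i, q in enumerate(adj) if q < 1e-5]  # cutoff used above
print(adj, significant)
```

Dividing each sample's counts by its size factor puts libraries of different depth on a common scale before testing. The cutoff of 1e-5 is applied to the adjusted values, not to the raw P values.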
