TY - JOUR
T1 - Annotation of genomics data using bidirectional hidden Markov models unveils variations in Pol II transcription cycle
AU - Zacher, Benedikt
AU - Lidschreiber, Michael
AU - Cramer, Patrick
AU - Gagneur, Julien
AU - Tresch, Achim
N1 - Publisher Copyright: © 2014 The Authors. Published under the terms of the CC BY 4.0 license.
PY - 2014/12
Y1 - 2014/12
N2 - DNA replication, transcription and repair involve the recruitment of protein complexes that change their composition as they progress along the genome in a directed or strand-specific manner. Chromatin immunoprecipitation in conjunction with hidden Markov models (HMMs) has been instrumental in understanding these processes, as they segment the genome into discrete states that can be related to DNA-associated protein complexes. However, current HMM-based approaches are not able to assign forward or reverse direction to states or properly integrate strand-specific (e.g., RNA expression) with non-strand-specific (e.g., ChIP) data, which is indispensable to accurately characterize directed processes. To overcome these limitations, we introduce bidirectional HMMs which infer directed genomic states from occupancy profiles de novo. Application to RNA polymerase II-associated factors in yeast and chromatin modifications in human T cells recovers the majority of transcribed loci, reveals gene-specific variations in the yeast transcription cycle and indicates the existence of directed chromatin state patterns at transcribed, but not at repressed, regions in the human genome. In yeast, we identify 32 new transcribed loci, a regulated initiation-elongation transition, the absence of elongation factors Ctk1 and Paf1 from a class of genes, a distinct transcription mechanism for highly expressed genes and novel DNA sequence motifs associated with transcription termination. We anticipate bidirectional HMMs to significantly improve the analyses of genome-associated directed processes. Synopsis: Bidirectional hidden Markov models improve the annotation of DNA-associated processes from genomics data, reveal variations in the yeast Pol II transcription cycle and identify directed chromatin state patterns at transcribed regions in the human genome. Genomic feature annotations derived from bidirectional hidden Markov models are up to twice as accurate as those from standard hidden Markov models. Variations in the yeast Pol II transcription cycle fall into clusters of co-regulated genes, whose functional categories range from housekeeping and cell cycle to stress response. New insights into transcriptional regulation are obtained, indicating a regulated initiation-elongation transition and a distinct transcription mechanism for highly expressed genes. An implementation of bidirectional hidden Markov models is freely available at the Bioconductor website: http://www.bioconductor.org/packages/devel/bioc/html/STAN.html.
KW - RNA transcription cycle
KW - bidirectional hidden Markov model
KW - chromatin marks
KW - genome annotation
UR - http://www.scopus.com/inward/record.url?scp=84921675383&partnerID=8YFLogxK
U2 - 10.15252/msb.20145654
DO - 10.15252/msb.20145654
M3 - Article
C2 - 25527639
AN - SCOPUS:84921675383
SN - 1744-4292
VL - 10
JO - Molecular Systems Biology
JF - Molecular Systems Biology
IS - 12
M1 - 768
ER -