TY - JOUR
T1 - GenoGAM
T2 - Genome-wide generalized additive models for ChIP-Seq analysis
AU - Stricker, Georg
AU - Engelhardt, Alexander
AU - Schulz, Daniel
AU - Schmid, Matthias
AU - Tresch, Achim
AU - Gagneur, Julien
N1 - Publisher Copyright:
© The Author 2017. Published by Oxford University Press. All rights reserved.
PY - 2017/8/1
Y1 - 2017/8/1
N2 - Motivation: Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) is a widely used approach to study protein-DNA interactions. Often, the quantities of interest are the differential occupancies relative to controls, between genetic backgrounds, treatments, or combinations thereof. Current methods for differential occupancy of ChIP-Seq data, however, rely on binning or sliding-window techniques, for which the choice of window and bin sizes is subjective. Results: Here, we present GenoGAM (Genome-wide Generalized Additive Model), which brings the well-established and flexible generalized additive models framework to genomic applications using a data parallelism strategy. We model ChIP-Seq read count frequencies as products of smooth functions along chromosomes. Smoothing parameters are objectively estimated from the data by cross-validation, eliminating the ad hoc binning and windowing needed by current approaches. GenoGAM provides base-level and region-level significance testing for full factorial designs. Application to a ChIP-Seq dataset in yeast showed increased sensitivity over existing differential occupancy methods while controlling the type I error rate. By analyzing a set of DNA methylation data and illustrating an extension to a peak caller, we further demonstrate the potential of GenoGAM as a generic statistical modeling tool for genome-wide assays.
AB - Motivation: Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) is a widely used approach to study protein-DNA interactions. Often, the quantities of interest are the differential occupancies relative to controls, between genetic backgrounds, treatments, or combinations thereof. Current methods for differential occupancy of ChIP-Seq data, however, rely on binning or sliding-window techniques, for which the choice of window and bin sizes is subjective. Results: Here, we present GenoGAM (Genome-wide Generalized Additive Model), which brings the well-established and flexible generalized additive models framework to genomic applications using a data parallelism strategy. We model ChIP-Seq read count frequencies as products of smooth functions along chromosomes. Smoothing parameters are objectively estimated from the data by cross-validation, eliminating the ad hoc binning and windowing needed by current approaches. GenoGAM provides base-level and region-level significance testing for full factorial designs. Application to a ChIP-Seq dataset in yeast showed increased sensitivity over existing differential occupancy methods while controlling the type I error rate. By analyzing a set of DNA methylation data and illustrating an extension to a peak caller, we further demonstrate the potential of GenoGAM as a generic statistical modeling tool for genome-wide assays.
UR - http://www.scopus.com/inward/record.url?scp=85026385523&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btx150
DO - 10.1093/bioinformatics/btx150
M3 - Article
C2 - 28369277
AN - SCOPUS:85026385523
SN - 1367-4803
VL - 33
SP - 2258
EP - 2265
JO - Bioinformatics
JF - Bioinformatics
IS - 15
ER -