TY - JOUR
T1 - Modeling positional effects of regulatory sequences with spline transformations increases prediction accuracy of deep neural networks
AU - Avsec, Ziga
AU - Barekatain, Mohammadamin
AU - Cheng, Jun
AU - Gagneur, Julien
N1 - Publisher Copyright: © 2017 The Author. Published by Oxford University Press. All rights reserved.
PY - 2018/4/15
Y1 - 2018/4/15
AB - Motivation: Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Results: Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox.
UR - http://www.scopus.com/inward/record.url?scp=85046801446&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btx727
DO - 10.1093/bioinformatics/btx727
M3 - Article
C2 - 29155928
AN - SCOPUS:85046801446
SN - 1367-4803
VL - 34
SP - 1261
EP - 1269
JO - Bioinformatics
JF - Bioinformatics
IS - 8
ER -