TY - JOUR
T1 - Predicting mean ribosome load for 5’UTR of any length using deep learning
AU - Karollus, Alexander
AU - Avsec, Žiga
AU - Gagneur, Julien
N1 - Publisher Copyright:
Copyright: © 2021 Karollus et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2021/5
Y1 - 2021/5
N2 - The 5’ untranslated region plays a key role in regulating mRNA translation and consequently protein abundance. Therefore, accurate modeling of 5’UTR regulatory sequences shall provide insights into translational control mechanisms and help interpret genetic variants. Recently, a model was trained on a massively parallel reporter assay to predict mean ribosome load (MRL)—a proxy for translation rate—directly from 5’UTR sequence with a high degree of accuracy. However, this model is restricted to sequence lengths investigated in the reporter assay and therefore cannot be applied to the majority of human sequences without a substantial loss of information. Here, we introduced frame pooling, a novel neural network operation that enabled the development of an MRL prediction model for 5’UTRs of any length. Our model shows state-of-the-art performance on fixed length randomized sequences, while offering better generalization performance on longer sequences and on a variety of translation-related genome-wide datasets. Variant interpretation is demonstrated on a 5’UTR variant of the gene HBB associated with beta-thalassemia. Frame pooling could find applications in other bioinformatics predictive tasks. Moreover, our model, released open source, could help pinpoint pathogenic genetic variants.
AB - The 5’ untranslated region plays a key role in regulating mRNA translation and consequently protein abundance. Therefore, accurate modeling of 5’UTR regulatory sequences shall provide insights into translational control mechanisms and help interpret genetic variants. Recently, a model was trained on a massively parallel reporter assay to predict mean ribosome load (MRL)—a proxy for translation rate—directly from 5’UTR sequence with a high degree of accuracy. However, this model is restricted to sequence lengths investigated in the reporter assay and therefore cannot be applied to the majority of human sequences without a substantial loss of information. Here, we introduced frame pooling, a novel neural network operation that enabled the development of an MRL prediction model for 5’UTRs of any length. Our model shows state-of-the-art performance on fixed length randomized sequences, while offering better generalization performance on longer sequences and on a variety of translation-related genome-wide datasets. Variant interpretation is demonstrated on a 5’UTR variant of the gene HBB associated with beta-thalassemia. Frame pooling could find applications in other bioinformatics predictive tasks. Moreover, our model, released open source, could help pinpoint pathogenic genetic variants.
UR - http://www.scopus.com/inward/record.url?scp=85106280685&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1008982
DO - 10.1371/journal.pcbi.1008982
M3 - Article
C2 - 33970899
AN - SCOPUS:85106280685
SN - 1553-734X
VL - 17
JO - PLoS Computational Biology
JF - PLoS Computational Biology
IS - 5
M1 - e1008982
ER -