Predicting mean ribosome load for 5’UTR of any length using deep learning

Alexander Karollus, Žiga Avsec, Julien Gagneur

Research output: Contribution to journal, Article, peer-review

15 Scopus citations

Abstract

The 5’ untranslated region plays a key role in regulating mRNA translation and consequently protein abundance. Therefore, accurate modeling of 5’UTR regulatory sequences should provide insights into translational control mechanisms and help interpret genetic variants. Recently, a model was trained on a massively parallel reporter assay to predict mean ribosome load (MRL), a proxy for translation rate, directly from 5’UTR sequence with a high degree of accuracy. However, this model is restricted to the sequence lengths investigated in the reporter assay and therefore cannot be applied to the majority of human sequences without a substantial loss of information. Here, we introduce frame pooling, a novel neural network operation that enables the development of an MRL prediction model for 5’UTRs of any length. Our model shows state-of-the-art performance on fixed-length randomized sequences, while offering better generalization performance on longer sequences and on a variety of translation-related genome-wide datasets. Variant interpretation is demonstrated on a 5’UTR variant of the gene HBB associated with beta-thalassemia. Frame pooling could find applications in other bioinformatics predictive tasks. Moreover, our model, released as open source, could help pinpoint pathogenic genetic variants.
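The abstract names frame pooling but does not define it. As a rough illustration only, the sketch below assumes it means grouping per-position feature vectors by reading frame, counted from the 3’ end of the 5’UTR, and then taking the maximum and the mean within each frame, so that inputs of any length yield an output of the same size. The function name `frame_pool`, the zero fill for empty frames, and the toy inputs are hypothetical and are not taken from the authors' released model.

```python
from typing import List


def frame_pool(features: List[List[float]]) -> List[float]:
    """Pool a (positions x channels) feature map into a fixed-size vector.

    Assumption: frames are counted from the 3' end, so the last position
    is in frame 0, the one before it in frame 1, and so on (mod 3).
    """
    if not features:
        raise ValueError("need at least one position")
    n_pos = len(features)
    n_ch = len(features[0])
    pooled: List[float] = []
    for frame in range(3):
        # Collect the positions that fall into this reading frame.
        rows = [features[i] for i in range(n_pos) if (n_pos - 1 - i) % 3 == frame]
        for ch in range(n_ch):
            values = [row[ch] for row in rows]
            if values:
                pooled.append(max(values))                # global max per frame
                pooled.append(sum(values) / len(values))  # global mean per frame
            else:
                # Assumption: frames with no positions contribute zeros.
                pooled.extend([0.0, 0.0])
    return pooled


# Toy single-channel inputs of different lengths (made-up values).
short = [[1.0], [2.0], [3.0], [4.0]]
long = [[float(i)] for i in range(10)]
print(len(frame_pool(short)), len(frame_pool(long)))
```

Under these assumptions both inputs pool to a vector of 3 frames x 2 statistics x 1 channel, which is what would let a fixed-size downstream layer accept 5’UTRs of differing length.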

Original language: English
Article number: e1008982
Journal: PLoS Computational Biology
Volume: 17
Issue number: 5
State: Published - May 2021
