TY - JOUR
T1 - Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction
AU - Weissenow, Konstantin
AU - Heinzinger, Michael
AU - Rost, Burkhard
N1 - Publisher Copyright:
© 2022 Elsevier Ltd
PY - 2022/8/4
AB - Advanced protein structure prediction requires evolutionary information, in the form of evolutionary couplings derived from multiple sequence alignments (MSAs), that is not always available. Artificial intelligence (AI)-based predictions inputting only single sequences are faster but so inaccurate as to render speed irrelevant. Here, we describe a competitive prediction of inter-residue distances (2D structure) that exclusively inputs embeddings from a pre-trained protein language model (pLM), namely ProtT5, obtained from single sequences, into a convolutional neural network (CNN) with relatively few layers. The major advance came from inputting the ProtT5 attention heads. Our new method, EMBER2, which never requires any MSAs, performed similarly to other methods that fully rely on co-evolution. Although clearly not reaching AlphaFold2, our leaner solution came somewhat close at substantially lower cost. By generating protein-specific rather than family-averaged predictions, EMBER2 might better capture some features of particular protein structures. Results from protein engineering and deep mutational scanning (DMS) experiments provided at least a proof of principle for this speculation.
KW - deep learning
KW - machine learning
KW - multiple sequence alignments
KW - protein language model
KW - protein structure prediction
UR - http://www.scopus.com/inward/record.url?scp=85135461799&partnerID=8YFLogxK
DO - 10.1016/j.str.2022.05.001
M3 - Article
C2 - 35609601
AN - SCOPUS:85135461799
SN - 0969-2126
VL - 30
SP - 1169-1177.e4
JO - Structure
JF - Structure
IS - 8
ER -
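
For readers who want to reproduce the kind of input featurization the abstract describes, the sketch below extracts per-residue ProtT5 embeddings for a single sequence and assembles an L x L pairwise tensor for a small 2D CNN. This is a minimal illustration under stated assumptions, not the authors' EMBER2 code: it uses the publicly released Rostlab/prot_t5_xl_half_uniref50-enc checkpoint on Hugging Face, the layer sizes and the 42 distance bins are placeholders, and it omits the ProtT5 attention heads that the paper identifies as the major advance.

    # Minimal sketch (assumptions noted above): ProtT5 embeddings for one
    # sequence, then an L x L pairwise input for a shallow CNN head.
    import re
    import torch
    from transformers import T5EncoderModel, T5Tokenizer

    checkpoint = "Rostlab/prot_t5_xl_half_uniref50-enc"
    tokenizer = T5Tokenizer.from_pretrained(checkpoint, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(checkpoint)
    model.eval()

    sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"       # toy example
    # ProtT5 expects space-separated residues; map rare residues to X.
    spaced = " ".join(re.sub(r"[UZOB]", "X", sequence))
    inputs = tokenizer(spaced, return_tensors="pt")

    with torch.no_grad():
        # (1, L+1, 1024); drop the trailing </s> token embedding.
        residue_emb = model(**inputs).last_hidden_state[0, :-1]

    # Pairwise features by outer concatenation: (1, 2048, L, L).
    L = residue_emb.shape[0]
    pair = torch.cat(
        [residue_emb.unsqueeze(1).expand(L, L, -1),
         residue_emb.unsqueeze(0).expand(L, L, -1)],
        dim=-1,
    ).permute(2, 0, 1).unsqueeze(0)

    # Deliberately small CNN head mapping pair features to distance bins
    # (42 bins assumed for illustration, not taken from the paper).
    cnn = torch.nn.Sequential(
        torch.nn.Conv2d(2048, 64, kernel_size=1),
        torch.nn.ReLU(),
        torch.nn.Conv2d(64, 42, kernel_size=3, padding=1),
    )
    with torch.no_grad():
        dist_logits = cnn(pair)                          # (1, 42, L, L)

Because the whole pipeline runs on a single sequence, with no MSA lookup, inference cost is dominated by one forward pass through the pLM encoder, which is the speed argument the abstract makes.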