Learned Embeddings from Deep Learning to Visualize and Predict Protein Sets

Christian Dallago, Konstantin Schütze, Michael Heinzinger, Tobias Olenyi, Maria Littmann, Amy X. Lu, Kevin K. Yang, Seonwoo Min, Sungroh Yoon, James T. Morton, Burkhard Rost

Publication: Contribution to journal › Article › peer review

55 citations (Scopus)

Abstract

Models from machine learning (ML) or artificial intelligence (AI) increasingly assist in guiding experimental design and decision making in molecular biology and medicine. Recently, Language Models (LMs) have been adapted from Natural Language Processing (NLP) to encode the implicit language written in protein sequences. Protein LMs show enormous potential in generating descriptive representations (embeddings) for proteins from just their sequences, in a fraction of the time compared with previous approaches, yet with comparable or improved predictive ability. Researchers have trained a variety of protein LMs that are likely to illuminate different angles of the protein language. By leveraging the bio_embeddings pipeline and modules, simple and reproducible workflows can be laid out to generate protein embeddings and rich visualizations. Embeddings can then be leveraged as input features through machine learning libraries to develop methods predicting particular aspects of protein function and structure. Beyond the workflows included here, embeddings have been leveraged as proxies for traditional homology-based inference and even to align similar protein sequences. A wealth of possibilities remains for researchers to harness through the tools provided in the following protocols.
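The workflow the abstract describes — per-residue embeddings from a protein LM, pooled into a fixed-length per-protein vector and then used for embedding-based annotation transfer — can be sketched as follows. This is a minimal illustration, not the protocols themselves: the `embed` function below is a deterministic placeholder standing in for a real protein LM, and the sequences and labels are invented for demonstration.

```python
# Hypothetical sketch of embedding-based annotation transfer.
# embed() is a PLACEHOLDER for a real protein LM; in bio_embeddings an
# embedder object would produce one vector per residue instead.
import math
import random

EMBED_DIM = 8  # real protein LMs use far larger dimensions (e.g. 1024)

def embed(sequence: str) -> list[list[float]]:
    """Placeholder protein LM: one pseudo-random vector per residue."""
    rng = random.Random(sequence)  # deterministic per sequence
    return [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in sequence]

def reduce_per_protein(per_residue: list[list[float]]) -> list[float]:
    """Mean-pool per-residue vectors into one fixed-length protein embedding."""
    n = len(per_residue)
    return [sum(vec[i] for vec in per_residue) / n for i in range(EMBED_DIM)]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Annotation transfer: label a query protein with the annotation of its
# nearest neighbour in embedding space (a proxy for homology-based inference).
annotated = {"MKTAYIAKQR": "kinase", "GSHMLEDPAR": "protease"}  # toy data
query = "MKTAYIAKQA"

per_protein = {seq: reduce_per_protein(embed(seq)) for seq in annotated}
q = reduce_per_protein(embed(query))
best = max(annotated, key=lambda seq: cosine(q, per_protein[seq]))
print(f"transferred annotation: {annotated[best]}")
```

With the actual bio_embeddings package, an embedder from `bio_embeddings.embed` would replace the placeholder `embed`, and its pooled output would feed standard machine learning libraries as input features, exactly as the abstract outlines.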

Original language: English
Article number: e113
Journal: Current Protocols
Volume: 1
Issue number: 5
DOIs
Publication status: Published - May 2021
