Efficiently predicting high resolution mass spectra with graph neural networks

Michael Murphy, Stefanie Jegelka, Ernest Fraenkel, Tobias Kind, David Healey, Thomas Butler

Research output: Contribution to journalConference articlepeer-review

6 Scopus citations

Abstract

Identifying a small molecule from its mass spectrum is the primary open problem in computational metabolomics. This is typically cast as information retrieval: an unknown spectrum is matched against spectra predicted computationally from a large database of chemical structures. However, current approaches to spectrum prediction model the output space in ways that force a tradeoff between capturing high resolution mass information and tractable learning. We resolve this tradeoff by casting spectrum prediction as a mapping from an input molecular graph to a probability distribution over chemical formulas. We further discover that a large corpus of mass spectra can be closely approximated using a fixed vocabulary constituting only 2% of all observed formulas. This enables efficient spectrum prediction using an architecture similar to graph classification - GRAFF-MS - achieving significantly lower prediction error and greater retrieval accuracy than previous approaches.

Original languageEnglish
Pages (from-to)25549-25562
Number of pages14
JournalProceedings of Machine Learning Research
Volume202
StatePublished - 2023
Externally publishedYes
Event40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: 23 Jul 202329 Jul 2023

Fingerprint

Dive into the research topics of 'Efficiently predicting high resolution mass spectra with graph neural networks'. Together they form a unique fingerprint.

Cite this