TY - JOUR
T1 - Efficiently predicting high resolution mass spectra with graph neural networks
AU - Murphy, Michael
AU - Jegelka, Stefanie
AU - Fraenkel, Ernest
AU - Kind, Tobias
AU - Healey, David
AU - Butler, Thomas
N1 - Publisher Copyright: © 2023 Proceedings of Machine Learning Research. All rights reserved.
PY - 2023
Y1 - 2023
AB - Identifying a small molecule from its mass spectrum is the primary open problem in computational metabolomics. This is typically cast as information retrieval: an unknown spectrum is matched against spectra predicted computationally from a large database of chemical structures. However, current approaches to spectrum prediction model the output space in ways that force a tradeoff between capturing high resolution mass information and tractable learning. We resolve this tradeoff by casting spectrum prediction as a mapping from an input molecular graph to a probability distribution over chemical formulas. We further discover that a large corpus of mass spectra can be closely approximated using a fixed vocabulary constituting only 2% of all observed formulas. This enables efficient spectrum prediction with GRAFF-MS, an architecture similar to those used for graph classification, which achieves significantly lower prediction error and greater retrieval accuracy than previous approaches.
UR - http://www.scopus.com/inward/record.url?scp=85164474151&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85164474151
SN - 2640-3498
VL - 202
SP - 25549
EP - 25562
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 40th International Conference on Machine Learning, ICML 2023
Y2 - 23 July 2023 through 29 July 2023
ER -