TY - JOUR
T1 - Feature Selection Pipelines with Classification for Non-Targeted Metabolomics Combining the Neural Network and Genetic Algorithm
AU - Lisitsyna, Anna
AU - Moritz, Franco
AU - Liu, Youzhong
AU - Al Sadat, Loubna
AU - Hauner, Hans
AU - Claussnitzer, Melina
AU - Schmitt-Kopplin, Philippe
AU - Forcisi, Sara
N1 - Publisher Copyright: © 2022 American Chemical Society. All rights reserved.
PY - 2022/4/12
Y1 - 2022/4/12
AB - Non-targeted metabolomics via high-resolution mass spectrometry methods, such as direct infusion Fourier transform ion cyclotron resonance mass spectrometry (DI-FT-ICR MS), produces data sets with thousands of features. By contrast, the number of samples is generally substantially lower. This disparity presents challenges when analyzing non-targeted metabolomics data sets and often requires custom methods to uncover information not always accessible via classical statistical techniques. In this work, we present a pipeline that combines a convolutional neural network with traditional statistical approaches and an adaptation of a genetic algorithm. The developed method was applied to a lifestyle intervention cohort data set, where subjects at risk of type 2 diabetes underwent an oral glucose tolerance test. Feature selection is the final result of the pipeline, achieved through classification of the data set via a neural network, with a precision-recall score of over 0.9 on the test set. The features most relevant for the described classification were then chosen via a genetic algorithm. The output of the developed pipeline encompasses approximately 200 features with high predictive scores, providing a fingerprint of the metabolic changes in the prediabetic class in the data set. Our framework presents a new approach that allows complex modeling based on convolutional neural networks to be applied to the analysis of high-resolution mass spectrometric data.
UR - http://www.scopus.com/inward/record.url?scp=85127892157&partnerID=8YFLogxK
U2 - 10.1021/acs.analchem.1c03237
DO - 10.1021/acs.analchem.1c03237
M3 - Article
AN - SCOPUS:85127892157
SN - 0003-2700
VL - 94
SP - 5474
EP - 5482
JO - Analytical Chemistry
JF - Analytical Chemistry
IS - 14
ER -