
Feature Selection Pipelines with Classification for Non-Targeted Metabolomics Combining the Neural Network and Genetic Algorithm

  • Anna Lisitsyna
  • Franco Moritz
  • Youzhong Liu
  • Loubna Al Sadat
  • Hans Hauner
  • Melina Claussnitzer
  • Philippe Schmitt-Kopplin
  • Sara Forcisi
  • Helmholtz Zentrum München German Research Center for Environmental Health
  • German Centre for Diabetes Research (DZD)
  • Janssen Pharmaceutica, Headquarters
  • Technical University of Munich
  • The Broad Institute of MIT and Harvard
  • Harvard Medical School

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Non-targeted metabolomics via high-resolution mass spectrometry methods, such as direct infusion Fourier transform-ion cyclotron resonance mass spectrometry (DI-FT-ICR MS), produces data sets with thousands of features. By contrast, the number of samples is generally much lower. This disparity presents challenges when analyzing non-targeted metabolomics data sets and often requires custom methods to uncover information not always accessible via classical statistical techniques. In this work, we present a pipeline that combines a convolutional neural network with traditional statistical approaches and an adaptation of a genetic algorithm. The developed method was applied to a lifestyle intervention cohort data set, where subjects at risk of type 2 diabetes underwent an oral glucose tolerance test. Feature selection is the final result of the pipeline, achieved through classification of the data set via a neural network, with a precision-recall score of over 0.9 on the test set. The features most relevant for the described classification were then chosen via a genetic algorithm. The output of the developed pipeline encompasses approximately 200 features with high predictive scores, providing a fingerprint of the metabolic changes in the prediabetic class in the data set. Our framework presents a new approach that allows complex modeling based on convolutional neural networks to be applied to the analysis of high-resolution mass spectrometric data.
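The general idea of choosing features with a genetic algorithm, scored by how well a classifier performs on the chosen subset, can be illustrated with a small sketch. This is not the authors' pipeline: a nearest-centroid classifier stands in for the convolutional neural network, a textbook genetic algorithm (truncation selection, one-point crossover, bit-flip mutation, elitism) stands in for their adaptation, and the data are synthetic. All function names, parameters, and the size penalty in the fitness function are assumptions made for illustration.

```python
import random

def make_data(n_samples=40, n_features=30, informative=(0, 1, 2), seed=0):
    """Synthetic two-class data; only the `informative` columns differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Test accuracy of a nearest-centroid classifier restricted to masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_test, y_test):
        dists = {
            label: sum((x[j] - c) ** 2 for j, c in zip(cols, cen))
            for label, cen in centroids.items()
        }
        correct += min(dists, key=dists.get) == t
    return correct / len(y_test)

def fitness(mask, data, penalty=0.01):
    """Accuracy minus a small cost per selected feature (assumed trade-off)."""
    return centroid_accuracy(*data, mask) - penalty * sum(mask)

def select_features(data, n_features, pop_size=30, generations=40,
                    mutation_rate=0.05, seed=1):
    """Evolve 0/1 feature masks; returns the best mask and per-generation best fitness."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(m, data), reverse=True)
        history.append(fitness(ranked[0], data))
        parents = ranked[: pop_size // 2]          # truncation selection
        children = [list(m) for m in ranked[:2]]   # elitism: keep the two best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]             # bit-flip mutation
            children.append(child)
        population = children
    best = max(population, key=lambda m: fitness(m, data))
    return best, history

X, y = make_data()
data = (X[:28], y[:28], X[28:], y[28:])  # simple train/test split
best_mask, history = select_features(data, n_features=30)
selected = [j for j, bit in enumerate(best_mask) if bit]
print("selected feature indices:", selected)
```

Because the two best masks are copied unchanged into each new generation, the best fitness never decreases from one generation to the next. In a real setting the fitness would come from the trained classifier's score on held-out data rather than from this toy stand-in.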

Original language: English
Pages (from-to): 5474-5482
Number of pages: 9
Journal: Analytical Chemistry
Volume: 94
Issue number: 14
State: Published - 12 Apr 2022

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being
