TY - JOUR
T1 - Machine Learning-Driven Data Valuation for Optimizing High-Throughput Screening Pipelines
AU - Hesse, Joshua
AU - Boldini, Davide
AU - Sieber, Stephan A.
N1 - Publisher Copyright:
© 2024 The Authors.
PY - 2024/11/11
Y1 - 2024/11/11
N2 - In the rapidly evolving field of drug discovery, high-throughput screening (HTS) is essential for identifying bioactive compounds. This study introduces a novel application of data valuation, a concept for evaluating the importance of data points based on their impact, to enhance drug discovery pipelines. Our approach improves active learning for compound library screening, robustly identifies true and false positives in HTS data, and identifies important inactive samples in an imbalanced HTS training, all while accounting for computational efficiency. We demonstrate that importance-based methods enable more effective batch screening, reducing the need for extensive HTS. Machine learning models accurately differentiate true biological activity from assay artifacts, streamlining the drug discovery process. Additionally, importance undersampling aids in HTS data set balancing, improving machine learning performance without omitting crucial inactive samples. These advancements could significantly enhance the efficiency and accuracy of drug development.
UR - http://www.scopus.com/inward/record.url?scp=85207246059&partnerID=8YFLogxK
U2 - 10.1021/acs.jcim.4c01547
DO - 10.1021/acs.jcim.4c01547
M3 - Article
AN - SCOPUS:85207246059
SN - 1549-9596
VL - 64
SP - 8142
EP - 8152
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
IS - 21
ER -