TY - JOUR
T1 - Deep Learning Total Energies and Orbital Energies of Large Organic Molecules Using Hybridization of Molecular Fingerprints
AU - Rahaman, Obaidur
AU - Gagliardi, Alessio
N1 - Publisher Copyright: © 2020 American Chemical Society.
PY - 2020/12/28
Y1 - 2020/12/28
N2 - The ability to predict material properties without the need for resource-consuming experimental efforts can immensely accelerate material and drug discovery. Although ab initio methods can be reliable and accurate in making such predictions, they are computationally too expensive on a large scale. The recent advancements in artificial intelligence and machine learning, as well as the availability of large quantum-mechanics-derived datasets, enable us to train models on these datasets as a benchmark and to make fast predictions on much larger datasets. The success of these machine learning models depends heavily on the machine-readable fingerprints of the molecules that capture their chemical properties as well as topological information. In this work, we propose a common deep learning-based framework to combine different types of molecular fingerprints to enhance prediction accuracy. A graph neural network (GNN), many-body tensor representation (MBTR), and a set of simple molecular descriptors (MD) were used to predict the total energies, highest occupied molecular orbital (HOMO) energies, and lowest unoccupied molecular orbital (LUMO) energies of a dataset containing ∼62k large organic molecules with complex aromatic rings and remarkably diverse functional groups. The results demonstrate that a combination of the best-performing molecular fingerprints can produce better results than the individual ones. The simple and flexible deep learning framework developed in this work can be easily adapted to incorporate other types of molecular fingerprints.
AB - The ability to predict material properties without the need for resource-consuming experimental efforts can immensely accelerate material and drug discovery. Although ab initio methods can be reliable and accurate in making such predictions, they are computationally too expensive on a large scale. The recent advancements in artificial intelligence and machine learning, as well as the availability of large quantum-mechanics-derived datasets, enable us to train models on these datasets as a benchmark and to make fast predictions on much larger datasets. The success of these machine learning models depends heavily on the machine-readable fingerprints of the molecules that capture their chemical properties as well as topological information. In this work, we propose a common deep learning-based framework to combine different types of molecular fingerprints to enhance prediction accuracy. A graph neural network (GNN), many-body tensor representation (MBTR), and a set of simple molecular descriptors (MD) were used to predict the total energies, highest occupied molecular orbital (HOMO) energies, and lowest unoccupied molecular orbital (LUMO) energies of a dataset containing ∼62k large organic molecules with complex aromatic rings and remarkably diverse functional groups. The results demonstrate that a combination of the best-performing molecular fingerprints can produce better results than the individual ones. The simple and flexible deep learning framework developed in this work can be easily adapted to incorporate other types of molecular fingerprints.
UR - http://www.scopus.com/inward/record.url?scp=85095814954&partnerID=8YFLogxK
U2 - 10.1021/acs.jcim.0c00687
DO - 10.1021/acs.jcim.0c00687
M3 - Article
C2 - 33118351
AN - SCOPUS:85095814954
SN - 1549-9596
VL - 60
SP - 5971
EP - 5983
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
IS - 12
ER -