Abstract
Artificial intelligence (AI) applied to materials discovery has led to the design of novel materials, the development of complex structure-property relationships and the discovery of new reaction pathways. However, developing these AI-based models requires digitizing the properties of molecules and materials so that they can be analysed mathematically for predictive modelling and pattern recognition. The literature offers a wealth of examples in which molecular descriptors are used either as fingerprints or as a means of correlating key molecular properties with a desired target. There are, however, a host of limitations associated with calculating and implementing molecular descriptors in research, including failed calculations, assumptions about 3D conformations, issues with spatial distribution and structure, and the specialist programming knowledge needed to use many of the packages that calculate them. Consequently, we hypothesized that images can serve as molecular representations for deep learning architectures. Indeed, deep learning has shown vast success across a variety of image processing applications, but such methods have until now been untested for modelling chemicals and materials.
In this work, the structures of various organic molecules were systematically sketched as images and used as inputs to develop a deep learning model capable of predicting their physicochemical properties, namely their aqueous solubility. Beyond obtaining high predictive accuracy, we demonstrated that this novel type of molecular representation outperforms descriptor-based models. The model was further tested on additional databases to explore its capabilities. Moreover, we found that this method enables users to test new structures without prior specialized knowledge of cheminformatics, requiring only the ability to draw their own customized molecules. We argue that images used as molecular representations in AI-based materials discovery are not only excellent for applications in materials science but will also open this thriving field to scientists with little computing experience.
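As a rough illustration of what "a molecule sketched as an image" could look like numerically, the sketch below rasterizes a 2D skeleton (atom coordinates plus bonds) into a small grayscale pixel grid of the kind a convolutional network takes as input. Everything here is an assumption made for illustration: the abstract does not state the image size, drawing style, or network architecture, and the function name `rasterize_skeleton`, the 32 × 32 grid and the hexagonal example are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rasterize_skeleton(atoms: List[Point],
                       bonds: List[Tuple[int, int]],
                       size: int = 32) -> List[List[float]]:
    """Draw bonds as line segments on a size x size grid.

    Pixel values: 0.0 = background, 1.0 = ink. Hypothetical helper,
    not the drawing procedure used in the work described above.
    """
    xs = [x for x, _ in atoms]
    ys = [y for _, y in atoms]
    min_x, min_y = min(xs), min(ys)
    # Use one scale for both axes so the sketch is not distorted.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    margin = 2
    scale = (size - 1 - 2 * margin) / span

    def to_pixel(p: Point) -> Point:
        return (margin + (p[0] - min_x) * scale,
                margin + (p[1] - min_y) * scale)

    image = [[0.0] * size for _ in range(size)]
    for i, j in bonds:
        (x0, y0), (x1, y1) = to_pixel(atoms[i]), to_pixel(atoms[j])
        # Sample the segment densely enough to leave no gaps.
        steps = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
        for k in range(steps + 1):
            t = k / steps
            col = round(x0 + t * (x1 - x0))
            row = round(y0 + t * (y1 - y0))
            image[row][col] = 1.0
    return image


# Hypothetical example: a six-membered ring drawn as a regular hexagon.
ring_atoms = [(math.cos(math.radians(60 * k)), math.sin(math.radians(60 * k)))
              for k in range(6)]
ring_bonds = [(k, (k + 1) % 6) for k in range(6)]
image = rasterize_skeleton(ring_atoms, ring_bonds)
```

A grid like `image` is the only input an image-based model would need, which is the practical contrast with descriptor pipelines: no 3D conformer generation or descriptor calculation is involved, only a drawing.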
Original language | English |
---|---|
Publication status | Published - 24 May 2022 |
Event | 2022 MRS Spring Meeting, Honolulu, United States. Duration: 8 May 2022 → 25 May 2022. https://www.mrs.org/meetings-events/spring-meetings-exhibits/2022-mrs-spring-meeting/symposium-sessions |
Conference
Conference | 2022 MRS Spring Meeting |
---|---|
Country/Territory | United States |
City | Honolulu |
Period | 8/05/22 → 25/05/22 |
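Internet address | https://www.mrs.org/meetings-events/spring-meetings-exhibits/2022-mrs-spring-meeting/symposium-sessions |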
Keywords
- Deep Learning
- Machine Learning
- Artificial Intelligence
- Molecular Representation
- Cheminformatics
- Materials Design
- Pharmaceuticals
- Crystal engineering
- Solubility