Machine Learning Approaches to Reaction Design and Optimisation

  • Sam Espley

Student thesis: Doctoral Thesis (PhD)

Abstract

The modelling of chemical reactions has been used within the pharmaceutical industry for decades; applications include predicting regioselectivity and stereoselectivity, selecting solvents, and understanding catalytic pathways. Chemical modelling has provided pharmaceutical companies with alternatives to expensive and less sustainable reactions performed in the laboratory. As costs continue to rise, companies strive for faster, high-quality results. Machine learning (ML) is seen as the future of computational modelling, and ML approaches have been applied extensively across the pharmaceutical pipeline in an attempt to circumvent wasteful and expensive reactions and calculations.

In this thesis, the application of ML to predict reaction barriers for key pharmaceutical reactions is explored with two distinct ML methods. First, the applicability of ML is tested for predicting reaction barriers of density functional theory (DFT) accuracy for Diels-Alder reactions by correcting low-level semi-empirical quantum mechanical (SQM) calculations; the results show errors substantially below the chemical accuracy threshold of 1 kcal mol⁻¹. To lower the data requirements (thousands of data points for ML predictions), and thus the computational cost, a transfer learning (TL) technique was investigated for cycloaddition reactions. The results show that TL can provide accurate barrier predictions in low-data regimes (tens of new data points), thereby avoiding the need for extensive high-level calculations. Building on this, the use of ML for reactivity analysis of multiple reaction types was investigated. Two cycloaddition and two Michael addition datasets were used to explore the predictive power of ML with regard to distortion and interaction energies; the results show errors close to or below the chemical accuracy threshold. The DFT-derived distortion and interaction energies were then successfully predicted for key literature cycloaddition reactions beyond the scope of the training data, which supports the use of ML in further reactivity analysis problems. Finally, ML was used to predict energies for electrophilic aromatic substitution (EAS) reactions. This reaction class is of high importance within the pharmaceutical industry and is widely used in early-stage drug discovery. DFT-accurate Wheland intermediate barriers were predicted from extended tight binding (xTB) calculations to show the power of ML within a pharmaceutical company’s cheminformatics workflow.
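
The idea of correcting a low-level barrier towards DFT accuracy can be illustrated with a minimal sketch. It is not the thesis's model: it assumes a simple linear ridge regression, hypothetical descriptor vectors, and synthetic placeholder numbers in place of real SQM and DFT barriers. The function names `fit_delta_model` and `predict_dft_barrier` are invented for this illustration. The model learns the difference (DFT minus SQM) and adds it back to the cheap SQM value at prediction time.

```python
import numpy as np


def _with_bias(features):
    """Append a constant column so the model can learn an offset."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_delta_model(features, sqm_barriers, dft_barriers, ridge=1e-3):
    """Fit ridge-regression weights to the correction (DFT - SQM).

    Assumption: a linear model is used purely for illustration.
    """
    X = _with_bias(features)
    delta = dft_barriers - sqm_barriers
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ delta)


def predict_dft_barrier(weights, features, sqm_barriers):
    """Predicted DFT-level barrier = SQM barrier + learned correction."""
    return sqm_barriers + _with_bias(features) @ weights


# Synthetic placeholder data (not taken from the thesis).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))          # hypothetical descriptors
sqm = rng.uniform(10.0, 40.0, size=200)       # stand-in SQM barriers, kcal/mol
true_w = np.array([1.5, -0.7, 0.3])
dft = sqm + features @ true_w + 2.0           # stand-in "DFT" barriers

# Train on the first 150 reactions, evaluate on the remaining 50.
weights = fit_delta_model(features[:150], sqm[:150], dft[:150])
pred = predict_dft_barrier(weights, features[150:], sqm[150:])
mae = float(np.mean(np.abs(pred - dft[150:])))
print(f"Held-out MAE: {mae:.4f} kcal/mol")
```

Because the synthetic correction is exactly linear, the held-out error here is close to zero; real SQM-to-DFT corrections are not linear in general, so this figure says nothing about the accuracy reported in the thesis.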
Date of Award: 22 Jan 2025
Original language: English
Awarding Institution
  • University of Bath
Supervisor: Matt Grayson (Supervisor) & Tina Düren (Supervisor)
