Abstract
With the proliferation of screening tools for chemical testing, it is now possible to create vast databases of chemicals easily. However, the rigorous statistical methodology needed to analyse these databases is in its infancy, and further development to facilitate chemical discovery is imperative.
In this thesis, conditional Gaussian process models are developed in both regression and classification settings to predict herbicidal efficacy from glasshouse experiments.
The Tanimoto metric is employed within the covariance of the Gaussian processes to account for distances and capture correlated effects within the chemical space.
Using molecular fingerprints, a representation of a compound within the chemical space, it is shown that by accounting for correlation amongst herbicidal compounds, predictive performance can be improved over the uncorrelated model, where the effects between compounds are assumed to be independent.
Moreover, several optimisation techniques on discrete spaces are presented to facilitate chemical discovery. These methods assist in searching interesting regions of the chemical space and support the identification of key molecular features contributing to high efficacy.
Furthermore, a simulation study is conducted to confirm the suitability of both the Tanimoto metric and the method of scoring rules for evaluating model performance in this novel application.
We conclude that the spatially correlated model has the ability to improve predictions, and also has the potential to be applied to other drug discovery settings and beyond.
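As an illustration of the Tanimoto-based covariance described in the abstract, the following is a minimal sketch and not the thesis's implementation. It assumes binary molecular fingerprints stored as rows of a NumPy array. The function names, the toy fingerprints, and the exponential form of the covariance (with hypothetical `variance` and `length_scale` parameters) are assumptions made for this example; the abstract does not state how the Tanimoto metric enters the covariance.

```python
import numpy as np


def tanimoto_similarity(X, Y):
    """Pairwise Tanimoto similarity between rows of binary arrays X and Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    inner = X @ Y.T                          # number of shared "on" bits
    x_bits = (X * X).sum(axis=1)[:, None]    # "on" bits in each row of X
    y_bits = (Y * Y).sum(axis=1)[None, :]    # "on" bits in each row of Y
    denom = x_bits + y_bits - inner          # size of the union of "on" bits
    # Assumption: two all-zero fingerprints are treated as identical.
    safe_denom = np.where(denom > 0, denom, 1.0)
    return np.where(denom > 0, inner / safe_denom, 1.0)


def exponential_tanimoto_covariance(X, Y, variance=1.0, length_scale=1.0):
    """Hypothetical covariance: exponential decay in Tanimoto distance."""
    distance = 1.0 - tanimoto_similarity(X, Y)
    return variance * np.exp(-distance / length_scale)


# Toy fingerprints (made up for illustration, not taken from the thesis).
fingerprints = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
])

S = tanimoto_similarity(fingerprints, fingerprints)
K = exponential_tanimoto_covariance(fingerprints, fingerprints)
```

Here `S[0, 1]` equals 2/3 because the first two compounds share two "on" bits out of three in their union, while the third compound shares none with either. Under this assumed form, compounds with similar fingerprints receive strongly correlated effects, whereas an uncorrelated model would correspond to a diagonal covariance.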
Date of Award | 22 Feb 2022 |
---|---|
Original language | English |
Awarding Institution | |
Supervisor | Vangelis Evangelou (Supervisor), Jonathan Bartlett (Supervisor) & Agisilaos Chantzis (Supervisor) |
Keywords
- Statistical analysis
- Gaussian Processes
- Machine Learning
- Chemical analysis