Dataset for "Machine learning reaction barriers in low data regimes: a horizontal and diagonal transfer learning approach"

  • Sam Espley (Creator)
  • Elliot Farrar (Creator)
  • Matt Grayson (Supervisor)
  • Simone Tomasi (Supervisor)
  • David Buttar (Supervisor)

Dataset

Description

Machine learning (ML) has previously been used to predict density functional theory (DFT) free energy reaction barriers for a variety of reactions from semi-empirical quantum mechanical (SQM) inputs. These models can require expensive dataset curation and can struggle to generalise beyond the dataset's immediate chemical space. One approach that can drastically lower the number of required training points is transfer learning (TL). We demonstrate that various TL techniques can provide highly accurate results with a fraction of the training points required for standard ML, thus lowering the overall computational cost of barrier predictions. This dataset includes all the structural data, in the form of Gaussian16 (Revision A.03 and C.01) output files, for the Diels-Alder and [3+2] cycloaddition reactions used in this ML/TL analysis. The data archive also includes exemplar code for performing some of the standard ML described in the manuscript.
Date made available: 31 May 2023
Publisher: University of Bath
