A Large Scale Molecular Hessian Database for Optimizing Reactive Machine Learning Interatomic Potentials
Sci Data. 2025 Dec 4. doi: 10.1038/s41597-025-06350-5. Online ahead of print.
ABSTRACT
Transition-state (TS) characterization underpins reaction modeling but conventional DFT is costly. Machine-learning interatomic potentials (MLIPs) promise quantum-level accuracy at lower cost, yet, lacking large-scale Hessian data, most are pretrained only on energies and forces, limiting TS optimization. We present HORM, the largest quantum-chemistry Hessian dataset for reactive systems: 1.84 million matrices at the ωB97x/6-31G(d) level. To exploit second-order information efficiently, we propose Hessian-informed training with stochastic row sampling, which controls the computational overhead of incorporating Hessians. Across diverse MLIP architectures and force-learning schemes, HORM yields up to 63% lower Hessian mean absolute error and up to 200× improvement in TS-search efficiency versus counterparts trained without Hessians. HORM thus fills critical data and methodological gaps, enabling more accurate, robust reactive MLIPs and scalable exploration of reaction networks.
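The idea of training on a random subset of Hessian rows can be illustrated with a minimal sketch. This is not the authors' implementation: the toy one-dimensional chain energy, the parameters `a` and `b`, and the function names `gradient`, `hessian_row`, `reference_hessian` and `sampled_hessian_mae` are all hypothetical, and Hessian rows are obtained here by central finite differences of the gradient, whereas an actual MLIP would more likely differentiate its predicted forces automatically. The point is only that evaluating a few sampled rows costs a few extra gradient evaluations instead of a full Hessian.

```python
import random


def gradient(x, a, b):
    """Gradient of the toy energy E = sum(a/4 * x_i^4) + b/2 * sum((x_{i+1} - x_i)^2)."""
    g = [a * xi ** 3 for xi in x]
    for i in range(len(x) - 1):
        d = b * (x[i + 1] - x[i])
        g[i] -= d
        g[i + 1] += d
    return g


def hessian_row(x, i, a, b, h=1e-4):
    """Row i of the Hessian via central finite differences of the gradient.

    Because the Hessian is symmetric, perturbing coordinate i and
    differencing the full gradient yields row i (equivalently column i).
    """
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    gp, gm = gradient(xp, a, b), gradient(xm, a, b)
    return [(p - m) / (2 * h) for p, m in zip(gp, gm)]


def reference_hessian(x, a, b):
    """Analytic Hessian of the toy energy, standing in for reference data."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        H[i][i] = 3 * a * x[i] ** 2
    for i in range(n - 1):
        H[i][i] += b
        H[i + 1][i + 1] += b
        H[i][i + 1] -= b
        H[i + 1][i] -= b
    return H


def sampled_hessian_mae(x, ref_hessian, a, b, n_rows, rng):
    """Mean absolute Hessian error over n_rows rows drawn without replacement."""
    rows = rng.sample(range(len(x)), n_rows)
    total = 0.0
    for i in rows:
        model_row = hessian_row(x, i, a, b)
        total += sum(abs(m - r) for m, r in zip(model_row, ref_hessian[i]))
    return total / (n_rows * len(x))


rng = random.Random(0)                      # fixed seed for reproducibility
x = [0.1 * i for i in range(6)]             # toy geometry with 6 coordinates
ref = reference_hessian(x, a=1.0, b=2.0)    # "reference" Hessian

# A model with the reference parameters gives a near-zero sampled loss;
# a model with a wrong coupling constant b gives a clearly nonzero one.
loss_exact = sampled_hessian_mae(x, ref, 1.0, 2.0, n_rows=2, rng=rng)
loss_wrong = sampled_hessian_mae(x, ref, 1.0, 1.5, n_rows=2, rng=rng)
print(loss_exact, loss_wrong)
```

In a training loop, a term like `sampled_hessian_mae` would be added to the usual energy and force losses, with `n_rows` setting the trade-off between second-order supervision and per-step cost.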
PMID:41345402 | DOI:10.1038/s41597-025-06350-5