Abstract (EN):
Many challenges persist in developing accurate computational models for predicting solvation free energy (ΔG_sol). Despite recent developments in Machine Learning (ML) methodologies that outperformed traditional quantum mechanical models, several issues remain concerning explanatory insights for broad chemical predictions with an acceptable speed-accuracy trade-off. To overcome this, we present a novel supervised ML model to predict the ΔG_sol for an array of solvent-solute pairs. Using two different ensemble regressor algorithms, we made fast and accurate property predictions from open-source chemical features, encoding complex electronic, structural, and surface area descriptors for every solvent and solute. By integrating molecular properties and chemical interaction features, we have analyzed individual descriptor importance and optimized our model through explanatory information from feature groups. On aqueous and organic solvent databases, ML models revealed the predictive relevance of solutes with increasing polar surface area and decreasing polarizability, yielding better results than state-of-the-art benchmark Neural Network methods (without complex quantum mechanical or molecular dynamics simulations). Both algorithms successfully outperformed previous ΔG_sol prediction methods, with a maximum absolute error of 0.22 ± 0.02 kcal mol⁻¹, further validated in an external benchmark database and with solvent hold-out tests. These explanatory and statistical insights allow a thoughtful application of this method for predicting other thermodynamic properties, stressing the relevance of ML modeling for further complex computational chemistry problems.
Language:
English
Type (Professor's evaluation):
Scientific
No. of pages:
13