Assembling a large data set for melting point prediction: Text-mining to the rescue! – NextMove Software
"As part of a project initiated by Tony Williams and the Royal Society of Chemistry, I have been working with Igor Tetko to text-mine melting and decomposition point data from the US patent literature, so that he could then build a melting point prediction model. This model showed an improvement over previous models, which is likely due to the overwhelmingly larger size of the data set compared with the smaller, curated data sets those models used. The results of this work have now been published in the Journal of Cheminformatics here: The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from Patents ..."