The use of databases containing thousands of molecular descriptors, including 3-D descriptors, for predicting physical properties is discussed. It is shown that using 3-D descriptors for property prediction via quantitative structure-property relationships (QSPRs) considerably limits their applicability, because the 3-D structure files must be obtained from the same reliable source for all predictive and target compounds. A modified targeted QSPR (TQSPR) algorithm is presented, which includes a new technique for selecting training sets from the homologous series of the target compound (if such compounds are available in the database). The method is employed to predict seven properties for five homologous series. It is shown that most properties can be predicted to within experimental error using training sets of 10 compounds and a maximum of two (non-3-D) descriptors. Excluding the 3-D descriptors considerably enhances the applicability of the TQSPRs, and using a small number of descriptors reduces the probability of "chance correlations".
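As a rough illustration of what a model with "a maximum of two descriptors" fitted on a small training set can look like, the sketch below searches all one- and two-descriptor linear models by ordinary least squares and keeps the one with the lowest residual variance. It is not the TQSPR algorithm of the paper: the function names `fit_small_qspr` and `predict`, the exhaustive search, and the residual-variance criterion are all assumptions made for illustration, and the training-set selection step is not shown.

```python
from itertools import combinations

import numpy as np


def fit_small_qspr(X, y, max_descriptors=2):
    """Return (columns, coefficients) of the best linear model that uses
    at most `max_descriptors` columns of X (hypothetical helper).

    X: (n_compounds, n_descriptors) descriptor matrix of the training set.
    y: (n_compounds,) measured property values.
    Requires n_compounds > max_descriptors + 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = X.shape
    best = None
    for k in range(1, max_descriptors + 1):
        for cols in combinations(range(m), k):
            # Design matrix: intercept column plus the chosen descriptors.
            A = np.column_stack([np.ones(n), X[:, list(cols)]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ coef
            # Residual variance, corrected for the number of fitted parameters.
            var = float(resid @ resid) / (n - A.shape[1])
            if best is None or var < best[0]:
                best = (var, cols, coef)
    return best[1], best[2]


def predict(x_target, cols, coef):
    """Predict the property of a target compound from its descriptor vector."""
    x = np.asarray(x_target, dtype=float)
    return float(coef[0] + x[list(cols)] @ coef[1:])
```

With a training set of 10 members of a homologous series, `X` would have 10 rows, and `predict` would be called with the descriptor vector of the target compound. Capping the model at two descriptors keeps the number of fitted parameters small relative to the training set, which is the motivation the abstract gives for reducing chance correlations.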
ASJC Scopus subject areas
- Chemistry (all)
- Chemical Engineering (all)
- Industrial and Manufacturing Engineering