SAR modeling of unbalanced data sets |
| |
Authors: | H. S. Rosenkranz; A. R. Cunningham |
| |
Affiliation: | Department of Environmental and Occupational Health, Graduate School of Public Health, University of Pittsburgh, 111 Parran Hall, 130 DeSoto Street, Pittsburgh, PA 15261, USA |
| |
Abstract: | The increased acceptance of SAR approaches to hazard identification has led us to investigate methods to improve the predictive performance of SAR models. In the present study we demonstrate that, although on theoretical grounds the ratio of active to inactive chemicals in the learning set should be unity, SAR models can "tolerate" unbalanced ratios ranging from 3:1 (i.e., 75% actives) to 1:2 (i.e., 33% actives) and still perform adequately. On the other hand, SAR models derived from learning sets with ratios in excess of 4:1 (80% actives) do not perform satisfactorily, even when corrected for the initial ratio. |
| |
Keywords: | Unbalanced data; SAR; CASE/MULTICASE; Optimum models |
|
|