Classification ensembles for unbalanced class sizes in predictive toxicology
Authors: J J Chen, C A Tsai, J F Young, R L Kodell
Institution: 1. Division of Biometry and Risk Assessment, National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 72079, USA (jchen@nctr.fda.gov); 3. Institute of Statistical Science, Academia Sinica, Taipei, 11529, Taiwan; 4. Division of Biometry and Risk Assessment, National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 72079, USA
Abstract: This paper investigates how the ratio of positive to negative samples affects sensitivity, specificity, and concordance. When the class sizes in the training samples are unequal, the derived classification rule favors the majority class, which results in low sensitivity when predicting the minority class. We propose an ensemble classification approach to adjust for differential class sizes in a binary classifier system. An ensemble classifier consists of a set of base classifiers; its prediction rule is based on a summary measure of the individual classifications made by the base classifiers. Two re-sampling methods, augmentation and abatement, are proposed to generate different bootstrap samples of equal class size from which the base classifiers are built. The augmentation method balances the two class sizes by bootstrapping additional samples from the minority class, whereas the abatement method balances them by sampling only a subset of the majority class. The proposed procedure is applied to two data sets, one to predict estrogen receptor binding activity and one to predict animal liver carcinogenicity, using SAR (structure-activity relationship) models as base classifiers. The abatement method appears to perform well in balancing sensitivity and specificity.
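The two re-sampling schemes described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: augmentation is taken to keep every sample and bootstrap extra minority samples up to the majority size; abatement is taken to keep every minority sample and draw a majority subset without replacement; the "summary measure" is taken to be a simple majority vote; concordance is taken to mean overall accuracy; and a toy nearest-centroid rule stands in for the SAR base models. All function names (`balanced_sample`, `fit_centroid`, `fit_ensemble`, `rates`) are hypothetical.

```python
import random


def balanced_sample(labels, method, rng):
    """Return row indices for one training sample with equal class sizes."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if method == "augmentation":
        # Assumed: keep all samples, bootstrap extra minority samples (with replacement).
        extra = rng.choices(minority, k=len(majority) - len(minority))
        return majority + minority + extra
    if method == "abatement":
        # Assumed: keep all minority samples, subsample the majority (without replacement).
        return minority + rng.sample(majority, len(minority))
    raise ValueError(f"unknown method: {method}")


def fit_centroid(X, y):
    """Toy base classifier (stand-in for a SAR model): nearest class centroid."""
    dim = len(X[0])
    centroids = {}
    for cls in (0, 1):
        rows = [x for x, label in zip(X, y) if label == cls]
        centroids[cls] = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    def predict(x):
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in centroids.items()}
        return min(dist, key=dist.get)

    return predict


def fit_ensemble(X, y, method="abatement", n_base=25, seed=0):
    """Build n_base base classifiers, each on its own balanced sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_base):
        idx = balanced_sample(y, method, rng)
        models.append(fit_centroid([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x):
        # Assumed summary measure: majority vote (odd n_base avoids ties).
        return 1 if 2 * sum(m(x) for m in models) > len(models) else 0

    return predict


def rates(y_true, y_pred):
    """Return (sensitivity, specificity, concordance); both classes must be present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, tn / n_neg, (tp + tn) / len(y_true)
```

In practice the three rates would be estimated on held-out data, for example by cross validation as the keywords suggest, rather than on the samples used to build the base classifiers.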
Keywords: Bagging, Cross validation, Ensemble classification, Imbalanced data, Sensitivity, Specificity