Induction of decision trees using genetic programming for modelling ecotoxicity data: adaptive discretization of real-valued endpoints |
| |
Authors: | X Z Wang; F V Buontempo; A Young; D Osborn |
| |
Institution: | 1. Institute of Particle Science and Engineering, School of Process, Environmental and Materials Engineering, LS2 9JT, UK, x.z.wang@leeds.ac.uk; 3. Institute of Particle Science and Engineering, School of Process, Environmental and Materials Engineering, LS2 9JT, UK; 4. AstraZeneca UK Ltd, Brixham Environmental Lab., Freshwater Quarry, Brixham, Devon, TQ5 8BA, UK; 5. Centre of Ecology and Hydrology, Monks Wood, Huntingdon, PE28 2LS, UK |
| |
Abstract: | Recent literature has demonstrated the applicability of genetic programming to the induction of decision trees for modelling toxicity endpoints. Other decision tree induction techniques are based on recursive partitioning, which uses a greedy search to choose the best splitting attribute and value at each node and therefore necessarily misses regions of the search space; the genetic programming based approach can overcome this problem. However, the method still requires the often continuous-valued toxicity endpoints to be discretized before tree induction. This work introduces a novel extension of the method, YAdapt, which models the original continuous endpoint by adaptively finding suitable ranges to describe it during tree induction. This removes the need for prior discretization and allows the ordinal nature of the endpoint to be taken into account in the models built. |
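
The contrast drawn in the abstract, between fixing classes before induction and finding ranges during it, can be illustrated with a small sketch. This is not the YAdapt algorithm, whose details the abstract does not give. The function names, the toy endpoint values, the leaf assignment and the bin edge at 2.0 are all assumptions made for illustration. The sketch only shows how a fixed bin edge can separate neighbouring endpoint values, whereas ranges derived per leaf follow the data that actually reach each leaf.

```python
from collections import defaultdict


def fixed_bins(y, edges):
    """Pre-discretize: map each continuous value to the index of a fixed bin.

    `edges` is a hypothetical list of cut points chosen before induction.
    """
    # The bin index is the number of edges at or below the value.
    return [sum(v >= e for e in edges) for v in y]


def adaptive_leaf_ranges(leaf_ids, y):
    """Derive one (low, high) range per leaf from the values reaching it.

    `leaf_ids` stands in for the leaf each sample lands in for some
    candidate tree; how that tree is evolved is outside this sketch.
    """
    by_leaf = defaultdict(list)
    for leaf, v in zip(leaf_ids, y):
        by_leaf[leaf].append(v)
    return {leaf: (min(vs), max(vs)) for leaf, vs in by_leaf.items()}


def total_range_width(ranges):
    """Sum of range widths: smaller means tighter leaf descriptions."""
    return sum(high - low for low, high in ranges.values())


# Toy endpoint values (illustrative only, not data from the paper).
y = [0.2, 0.4, 1.9, 2.1, 3.8, 4.0]
leaf_ids = ["a", "a", "b", "b", "c", "c"]

# A fixed edge at 2.0 puts 1.9 and 2.1 in different classes.
print(fixed_bins(y, edges=[2.0]))

# Ranges found from the leaves keep 1.9 and 2.1 together.
ranges = adaptive_leaf_ranges(leaf_ids, y)
print(ranges)
print(total_range_width(ranges))
```

In a genetic programming setting, a quantity such as `total_range_width` could plausibly feed into a fitness function that rewards trees whose leaves describe narrow, well-separated ranges; whether YAdapt does so is not stated in the abstract.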
| |
Keywords: | Toxicity; QSAR; Inductive learning; Genetic and evolutionary programming; Decision tree; Discretization |
|
|