Abstract
Despite their growing popularity among neural network practitioners, ensemble methods have not been widely adopted in structure-activity and structure-property correlation. Neural networks are inherently unstable, in that small changes in the training set and/or training parameters can lead to large changes in their generalization performance. Recent research has shown that by capitalizing on the diversity of the individual models, ensemble techniques can minimize uncertainty and produce more stable and accurate predictors. In this work, we present a critical assessment of the most common ensemble technique known as bootstrap aggregation, or bagging, as applied to QSAR and QSPR. Although aggregation does offer definitive advantages, we demonstrate that bagging may not be the best possible choice and that simpler techniques such as retraining with the full sample can often produce superior results. These findings are rationalized using Krogh and Vedelsby's decomposition of the generalization error into a term that measures the average generalization performance of the individual networks and a term that measures the diversity among them. For networks that are designed to resist over-fitting, the benefits of aggregation are clear but not overwhelming.
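As a point of reference, the bagging procedure assessed here follows the standard recipe: draw bootstrap resamples of the training set, fit one model per resample, and average the resulting predictions. The sketch below is a minimal, generic illustration of that recipe, not the authors' implementation; the `train_fn` helper and the NumPy-array interface are assumptions introduced only for the example.

```python
import numpy as np

def bag_predict(X_train, y_train, X_test, train_fn, n_models=10, seed=None):
    """Bootstrap aggregation (bagging), sketched under stated assumptions.

    `train_fn(X, y)` is a hypothetical helper that fits one base model
    (e.g. a small neural network) and returns a callable predictor.
    X_train, y_train, X_test are assumed to be NumPy arrays.
    """
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)            # resample n rows with replacement
        model = train_fn(X_train[idx], y_train[idx])
        preds.append(model(X_test))
    return np.mean(preds, axis=0)                   # aggregate by averaging predictions
```

The simpler alternative discussed in the abstract, retraining with the full sample, corresponds to calling `train_fn(X_train, y_train)` once and using that single predictor instead of the bootstrap average.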
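For readers unfamiliar with the decomposition cited above, Krogh and Vedelsby's result can be stated compactly; the notation below (weights $w_\alpha$, individual predictors $f_\alpha$) is introduced here only for illustration.

```latex
% Krogh-Vedelsby ambiguity decomposition (notation introduced for illustration).
% Ensemble prediction: weighted average of K individual networks, weights summing to one.
\[
  \bar{f}(x) = \sum_{\alpha=1}^{K} w_\alpha f_\alpha(x),
  \qquad \sum_{\alpha} w_\alpha = 1 .
\]
% The ensemble generalization error E splits into the weighted average error of the
% individual networks, \bar{E}, minus the weighted average ambiguity (diversity), \bar{A}:
\[
  E = \bar{E} - \bar{A},
  \qquad \bar{E} = \sum_{\alpha} w_\alpha E_\alpha,
  \qquad \bar{A} = \sum_{\alpha} w_\alpha \int p(x)\,\bigl(f_\alpha(x) - \bar{f}(x)\bigr)^2\,dx .
\]
```

Because $\bar{A} \ge 0$, the ensemble error never exceeds the average error of the individual networks; how much aggregation gains over simply retraining on the full sample depends on how large the diversity term is, which is the trade-off examined in this work.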