This note mainly answers one question: why do ensemble methods perform better than any individual method?
Core takeaway excerpted here: how the bootstrap, boosting, and bagging methods relate to one another.
The voting approach used in text classification (Voting, also called a combined classifier) is a typical ensemble machine learning method. It builds a strong classifier by combining multiple weak classifiers, and comes in two flavors, Bagging and Boosting, whose main difference is how they sample the training data. Bagging samples uniformly, while Boosting samples according to the error rate, so Boosting usually achieves higher classification accuracy than Bagging. Voting classifiers reach high accuracy but take longer to train. AdaBoost, an improved variant of the Boosting idea, performs very well in spam filtering and text classification.
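To make that sampling difference concrete, here is a minimal from-scratch sketch (not code from the paper; the synthetic dataset, stump depth, and number of rounds are illustrative assumptions). Bagging draws uniform bootstrap samples, while AdaBoost reweights each example by its error so that misclassified points count more in the next round:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
y_pm = 2 * y - 1  # AdaBoost math is cleaner with labels in {-1, +1}
rng = np.random.default_rng(0)
T = 10  # number of rounds for both ensembles

# Bagging: every round draws a uniform bootstrap sample.
bag_preds = []
for _ in range(T):
    idx = rng.integers(0, len(X), size=len(X))   # uniform, with replacement
    stump = DecisionTreeClassifier(max_depth=1).fit(X[idx], y_pm[idx])
    bag_preds.append(stump.predict(X))
bag_vote = np.where(np.mean(bag_preds, axis=0) >= 0, 1, -1)  # plain majority vote

# AdaBoost: sample weights follow the weighted error rate.
w = np.full(len(X), 1.0 / len(X))                # start from uniform weights
ada_score = np.zeros(len(X))
for _ in range(T):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y_pm, sample_weight=w)
    pred = stump.predict(X)
    err = np.sum(w * (pred != y_pm)) / np.sum(w) # weighted error rate
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
    w *= np.exp(-alpha * y_pm * pred)            # up-weight misclassified points
    w /= w.sum()
    ada_score += alpha * pred                    # weighted vote of the stumps
ada_vote = np.where(ada_score >= 0, 1, -1)

print("bagging  train accuracy:", np.mean(bag_vote == y_pm))
print("adaboost train accuracy:", np.mean(ada_vote == y_pm))
```

The only structural difference between the two loops is where the emphasis on hard examples comes from: Bagging gets its diversity from random resampling alone, whereas AdaBoost's `w` update concentrates weight exactly on the points the previous stump got wrong.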
| Title | Ensemble Methods in Machine Learning |
| --- | --- |
| Author | Thomas G. Dietterich |
| Year | 2000 |
| Keywords | Ensemble; 3000+ citations; ensemble learning; boosting; bootstrap; bagging (bootstrap aggregating); gradient boosting; AdaBoost; sampling methods |
| Abstract | Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, Bagging, and boosting. This paper reviews these methods and explains why ensembles can often perform better than any single classifier. Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that AdaBoost does not overfit rapidly. |
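As a quick companion to the abstract, the sketch below compares the three ensemble families it names (error-correcting output coding, Bagging, and boosting) on one held-out split. This is a minimal sketch using scikit-learn's stock implementations rather than anything from the paper, and the dataset and hyperparameters are arbitrary assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier
from sklearn.tree import DecisionTreeClassifier

X_tr, X_te, y_tr, y_te = train_test_split(*load_iris(return_X_y=True), random_state=0)

models = {
    "ECOC":     OutputCodeClassifier(DecisionTreeClassifier(), code_size=2, random_state=0),
    "Bagging":  BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
}
for name, model in models.items():
    # Fit each ensemble on the same training split and report held-out accuracy.
    print(f"{name:8s} test accuracy: {model.fit(X_tr, y_tr).score(X_te, y_te):.3f}")
```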