Bagging, also known as bootstrap aggregating, is an ensemble learning method used to reduce variance within a noisy dataset. It improves the performance and accuracy of machine learning algorithms by combining multiple individual models, often called base models, into a single, more reliable prediction model. Bagging addresses the bias-variance trade-off by reducing the variance of the prediction model, which helps prevent overfitting. It can be applied to both regression and classification problems and is most commonly used with decision tree algorithms.
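As a minimal sketch of how this looks in practice, scikit-learn exposes bagging through BaggingClassifier and BaggingRegressor; the snippet below fits bagged decision trees for both tasks. The synthetic datasets and parameter values are illustrative assumptions, not recommendations from the text.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import BaggingClassifier, BaggingRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: 50 decision trees, each fit on its own bootstrap sample.
X_clf, y_clf = make_classification(n_samples=500, random_state=0)
clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
clf.fit(X_clf, y_clf)

# Regression: the same idea, with predictions averaged across the trees.
X_reg, y_reg = make_regression(n_samples=500, random_state=0)
reg = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, random_state=0)
reg.fit(X_reg, y_reg)
```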
In bagging, multiple subsets (bootstrap samples) are created from the original training set by selecting observations with replacement, so each subset contains the same number of tuples as the original dataset. A base model is trained on each subset; when the base models are decision trees, a random subset of features can also be chosen at each node, and the feature offering the best split among that subset is used to split the node. The models are trained in parallel, independently of one another, and the final prediction is obtained by combining the predictions of all the models, typically by majority voting for classification or averaging for regression, as in the sketch below.
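To make the procedure concrete, here is a small from-scratch sketch in Python, using NumPy and a scikit-learn decision tree as an assumed base model; the function names are hypothetical. Bootstrap samples are drawn with replacement, one tree is fit per sample, and class predictions are combined by majority vote.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # assumed base model

def fit_bagged_trees(X, y, n_estimators=25, random_state=0):
    """Fit one decision tree per bootstrap sample of (X, y)."""
    rng = np.random.default_rng(random_state)
    n_samples = X.shape[0]
    models = []
    for _ in range(n_estimators):
        # Bootstrap sample: same size as the original set, drawn with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def predict_bagged_trees(models, X):
    """Combine the base models' predictions by majority vote.

    Assumes integer class labels (0, 1, 2, ...) so np.bincount can count votes.
    """
    all_preds = np.stack([m.predict(X) for m in models])  # shape: (n_models, n_samples)
    return np.apply_along_axis(lambda votes: np.bincount(votes).argmax(), 0, all_preds)
```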
Bagging and boosting are the two main types of ensemble learning methods. The main difference between them is the way the base models are trained: in bagging, weak learners are trained in parallel, whereas in boosting they are trained sequentially, with each new learner correcting the errors of its predecessors. Bagging is typically applied to base learners that exhibit high variance and low bias, whereas boosting is used when the learners exhibit low variance and high bias.
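This contrast can be illustrated with scikit-learn, where a bagging ensemble can fit its trees independently across CPU cores, while a boosting ensemble such as AdaBoost fits its learners one after another. The estimators and settings below are illustrative assumptions rather than prescriptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Bagging: deep (low-bias, high-variance) trees, trained independently,
# so the work can be spread across all available CPU cores.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            n_jobs=-1, random_state=0).fit(X, y)

# Boosting: AdaBoost's default shallow stumps (high-bias, low-variance),
# trained sequentially, each reweighting the examples earlier ones got wrong.
boosting = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)
```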
Bagging acts as an effective regularization technique: it reduces the variance arising from the training data and improves the accuracy of the model by training multiple copies of the same base model on different bootstrap samples of the original dataset. Bagging leads to improvements for unstable procedures, including artificial neural networks, classification and regression trees, and subset selection in linear regression.
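As a quick, illustrative check of this claim (the dataset is synthetic and the scores are not results from the text), one could compare the cross-validated accuracy of a single unpruned decision tree, an unstable learner, with that of a bagged ensemble of the same trees:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy classification data: unpruned trees overfit it easily.
X, y = make_classification(n_samples=1000, n_informative=5, flip_y=0.1,
                           random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                 random_state=0)

# Averaging over bootstrap-trained trees typically reduces the variance of
# the unstable single tree and raises the cross-validated accuracy.
print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged_trees, X, y, cv=5).mean())
```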