DECISION TREE
Definition of Decision Trees
A decision tree is a supervised learning algorithm used for classification and regression. It builds a tree-like model by recursively splitting the dataset into smaller subsets; a prediction is made by following the tests from the root down to a leaf. Each internal node represents a test on a feature, each branch represents an outcome of that test, and each leaf node represents a class label (classification) or a numerical value (regression).
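To make the node and leaf terminology concrete, here is a minimal sketch of a hand-built tree stored as nested dictionaries, with a function that walks it. The feature names (`humidity`, `wind_speed`), thresholds, and labels are made up for illustration, and the dictionary layout is just one possible representation.

```python
# Hypothetical hand-built tree: internal nodes test one feature against a
# threshold; leaves store the predicted label.
tree = {
    "feature": "humidity", "threshold": 70,
    "left": {"label": "play"},                   # taken when humidity <= 70
    "right": {                                   # taken when humidity > 70
        "feature": "wind_speed", "threshold": 20,
        "left": {"label": "play"},
        "right": {"label": "stay home"},
    },
}

def predict(node, sample):
    """Follow the tests from the root until a leaf is reached."""
    while "label" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]

print(predict(tree, {"humidity": 85, "wind_speed": 30}))  # stay home
print(predict(tree, {"humidity": 60, "wind_speed": 30}))  # play
```

A regression tree would look the same, except that each leaf would hold a number (for example, the mean target of the training samples that reached it) instead of a class label.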
The process of building a decision tree typically involves the following steps:
- Feature Selection: Use a criterion (e.g., information gain, Gini index) to select the feature that best separates the data.
- Data Splitting: Split the dataset into subsets based on the selected feature.
- Recursive Subtree Construction: Repeat the above steps for each subset until a stopping condition is met (e.g., maximum depth or minimum number of samples per leaf).
- Tree Generation: Combine all subtrees into a complete decision tree.
- Pruning: Simplify the tree by removing branches that contribute little to predictive accuracy, which reduces overfitting.
- Prediction: Use the resulting decision tree to make predictions on new data.
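The steps above can be sketched in a few dozen lines of plain Python. This is an illustrative CART-style sketch, not a reference implementation: it assumes numeric features, binary threshold splits, and Gini impurity as the criterion, and it omits pruning. The function names, the toy data, and the `max_depth` / `min_samples` defaults are all assumptions chosen for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Feature selection: return (weighted impurity, feature index, threshold), or None."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def build(X, y, depth=0, max_depth=3, min_samples=2):
    """Data splitting plus recursive subtree construction."""
    split = best_split(X, y) if len(set(y)) > 1 else None
    if split is None or depth >= max_depth or len(y) < min_samples:
        return {"label": Counter(y).most_common(1)[0][0]}  # leaf: majority class
    _, j, t = split
    left = [(r, lab) for r, lab in zip(X, y) if r[j] <= t]
    right = [(r, lab) for r, lab in zip(X, y) if r[j] > t]
    return {
        "feature": j, "threshold": t,
        "left": build([r for r, _ in left], [lab for _, lab in left],
                      depth + 1, max_depth, min_samples),
        "right": build([r for r, _ in right], [lab for _, lab in right],
                       depth + 1, max_depth, min_samples),
    }

def predict(node, row):
    """Prediction: walk from the root to a leaf."""
    while "label" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]

# Toy data, made up for illustration: [height_cm, weight_kg] -> class
X = [[150, 50], [160, 55], [170, 80], [180, 85], [165, 52], [175, 90]]
y = ["A", "A", "B", "B", "A", "B"]
tree = build(X, y)
print(tree)
print(predict(tree, [158, 51]), predict(tree, [178, 88]))
```

On this toy data a single split on the first feature separates the two classes, so the recursion stops after one level. Swapping `gini` for an entropy-based score would give an information-gain variant of the same procedure.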
Advantages and Disadvantages of Decision Trees
Advantages
- Easy to Understand and Interpret: The structure of a decision tree is intuitive and easy to visualize and explain.
- Handles Nonlinear Relationships: Capable of modeling nonlinear interactions between features.
- No Need for Feature Scaling: Standardization or normalization of features is not required.
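- Suitable for Large Datasets: Training is relatively fast and scales to large datasets; for a roughly balanced tree, the cost of typical implementations grows on the order of n_features × n_samples × log(n_samples).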
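The point about feature scaling can be checked directly. The sketch below assumes threshold-based splits scored with Gini impurity and uses made-up `income` values. It finds the best split on a raw feature and on a min-max scaled copy, then compares which samples end up on each side.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_partition(values, labels):
    """Return (threshold, left-side mask) of the lowest-impurity split on one feature."""
    best = None
    for t in sorted(set(values))[:-1]:  # the largest value would leave the right side empty
        mask = [v <= t for v in values]
        left = [lab for lab, m in zip(labels, mask) if m]
        right = [lab for lab, m in zip(labels, mask) if not m]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, t, mask)
    return best[1], best[2]

income = [20_000, 35_000, 50_000, 80_000, 120_000]  # made-up values
label = ["no", "no", "yes", "yes", "yes"]

lo, hi = min(income), max(income)
scaled = [(v - lo) / (hi - lo) for v in income]     # min-max scaling to [0, 1]

t_raw, mask_raw = best_partition(income, label)
t_scaled, mask_scaled = best_partition(scaled, label)
print(t_raw, t_scaled)            # the thresholds differ ...
print(mask_raw == mask_scaled)    # ... but the samples are partitioned identically
```

Because a threshold test depends only on the ordering of the values, any strictly increasing transformation of a feature (scaling, shifting, taking logs of positive values) moves the threshold but leaves the resulting partition unchanged.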
Disadvantages
- Overfitting: Decision trees are prone to overfitting, especially when the tree is deep.
- Instability: Small changes in the data can lead to significant changes in the tree structure.
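- Bias Toward Multivalued Features: Split criteria such as information gain tend to favor features with many distinct values, even when those features carry little real signal. C4.5's gain ratio was introduced to reduce this bias.
- Limited in Capturing Complex Patterns: Each split tests a single feature, so a single tree approximates smooth or diagonal decision boundaries only with many axis-aligned steps, and its predictions are piecewise constant.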
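The bias toward multivalued features is easy to reproduce. The sketch below uses made-up data: `weather` is a two-valued feature that is informative about the label, while `row_id` is unique per sample and therefore useless for new data. It compares ID3-style information gain with the C4.5-style gain ratio; the helper names are assumptions for this example.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(values):
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def info_gain(feature, labels):
    """Entropy reduction from a multiway split on a categorical feature."""
    groups = defaultdict(list)
    for v, lab in zip(feature, labels):
        groups[v].append(lab)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(feature, labels):
    """Divide the gain by the entropy of the split itself.
    Not guarded against single-valued features, whose split entropy is zero."""
    return info_gain(feature, labels) / entropy(feature)

labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
weather = ["sun", "sun", "sun", "sun", "sun", "rain", "rain", "rain"]
row_id = list(range(8))  # one distinct value per sample

for name, feat in [("weather", weather), ("row_id", row_id)]:
    print(name, round(info_gain(feat, labels), 3), round(gain_ratio(feat, labels), 3))
```

Splitting on `row_id` puts every sample in its own pure group, so it achieves the maximum possible information gain and would be chosen over `weather`. The gain ratio penalizes the eight-way split and ranks `weather` higher on this data.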
Study Links
Development History
- Basic Tree Structures: ID3, C4.5, CART
- Advanced Tree Structures: Random Forest, AdaBoost, GBDT
- More Advanced Tree Structures: XGBoost, LightGBM