LightGbmBinaryClassifier Class
Gradient Boosted Decision Trees
- Inheritance:
  - nimbusml.internal.core.ensemble._lightgbmbinaryclassifier.LightGbmBinaryClassifier
  - nimbusml.base_predictor.BasePredictor
  - sklearn.base.ClassifierMixin
Constructor
LightGbmBinaryClassifier(number_of_iterations=100, learning_rate=None, number_of_leaves=None, minimum_example_count_per_leaf=None, booster=None, normalize='Auto', caching='Auto', unbalanced_sets=False, weight_of_positive_examples=1.0, sigmoid=0.5, evaluation_metric='Logloss', maximum_bin_count_per_feature=255, verbose=False, silent=True, number_of_threads=None, early_stopping_round=0, batch_size=1048576, use_categorical_split=None, handle_missing_value=True, minimum_example_count_per_group=100, maximum_categorical_split_point_count=32, categorical_smoothing=10.0, l2_categorical_regularization=10.0, random_state=None, parallel_trainer=None, feature=None, group_id=None, label=None, weight=None, **params)
Parameters
Name | Description |
---|---|
feature | see Columns. |
group_id | see Columns. |
label | see Columns. |
weight | see Columns. |
number_of_iterations | Number of boosting iterations. |
learning_rate | Determines the size of the step taken in the direction of the gradient in each step of the learning process. This determines how fast or slow the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution. |
number_of_leaves | The maximum number of leaves (terminal nodes) that can be created in any tree. Higher values potentially increase the size of the tree and improve precision, but risk overfitting and longer training times. |
minimum_example_count_per_leaf | Minimum number of training instances required to form a leaf. That is, the minimal number of documents allowed in a leaf of a regression tree, out of the sub-sampled data. A 'split' partitions the examples at a tree node according to a feature threshold. |
booster | Which booster to use. Available options are Dart, Gbdt, and Goss. |
normalize | Specifies the type of automatic feature normalization: 'Auto' (the default, lets the algorithm decide), 'No', 'Yes', or 'Warn'. |
caching | Whether the trainer should cache the input training data. |
unbalanced_sets | Use for binary classification when the training data is not balanced. |
weight_of_positive_examples | Controls the balance of positive and negative weights; useful for unbalanced classes. A typical value to consider: sum(negative cases) / sum(positive cases). See the sketch after this table. |
sigmoid | Parameter for the sigmoid function. |
evaluation_metric | Evaluation metric. |
maximum_bin_count_per_feature | Maximum number of bucket bins for features. |
verbose | Verbose output. |
silent | Whether to suppress running messages. |
number_of_threads | Number of parallel threads used to run LightGBM. |
early_stopping_round | Rounds of early stopping; 0 disables it. |
batch_size | Number of entries in a batch when loading data. |
use_categorical_split | Whether to enable categorical splits. |
handle_missing_value | Whether to enable special handling of missing values. |
minimum_example_count_per_group | Minimum number of instances per categorical group. |
maximum_categorical_split_point_count | Maximum number of categorical thresholds. |
categorical_smoothing | Laplace smoothing term for categorical feature splits; avoids bias toward small categories. |
l2_categorical_regularization | L2 regularization for categorical splits. |
random_state | Sets the random seed for LightGBM to use. |
parallel_trainer | Parallel LightGBM learning algorithm. |
params | Additional arguments sent to the compute engine. |
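The weight_of_positive_examples ratio suggested above can be computed directly from the label counts. A minimal sketch (the pandas usage and the 'case' label name are illustrative assumptions, mirroring the example below):
import pandas as pd
from nimbusml.ensemble import LightGbmBinaryClassifier
df = pd.DataFrame({'case': [0, 0, 0, 1, 1]})  # toy labels; 'case' is illustrative
n_neg = int((df['case'] == 0).sum())  # number of negative cases
n_pos = int((df['case'] == 1).sum())  # number of positive cases
# sum(negative cases) / sum(positive cases) = 3 / 2 = 1.5 here
clf = LightGbmBinaryClassifier(
    weight_of_positive_examples=n_neg / n_pos,
    unbalanced_sets=True)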
Examples
###############################################################################
# LightGbmBinaryClassifier
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import LightGbmBinaryClassifier
from nimbusml.ensemble.booster import Goss
from nimbusml.feature_extraction.categorical import OneHotVectorizer
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
OneHotVectorizer(columns={'edu': 'education'}),
LightGbmBinaryClassifier(feature=['induced', 'edu'], label='case',
booster=Goss(top_rate=0.9))
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data, 'case').test(data, output_scores=True)
# print predictions
print(predictions.head())
# PredictedLabel Probability Score
# 0 1 0.612220 0.913309
# 1 1 0.612220 0.913309
# 2 0 0.334486 -1.375929
# 3 0 0.334486 -1.375929
# 4 0 0.421264 -0.635176
# print evaluation metrics
print(metrics)
# AUC Accuracy Positive precision Positive recall ...
# 0 0.626433 0.677419 0.588235 0.120482 ...
Remarks
LightGBM is an open-source implementation of gradient boosted decision trees. It is available in nimbusml as a binary classification trainer, a multi-class trainer, a regression trainer, and a ranking trainer, as sketched below.
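A quick sketch of the imports for the four corresponding trainers:
from nimbusml.ensemble import (
    LightGbmBinaryClassifier,  # binary classification
    LightGbmClassifier,        # multi-class classification
    LightGbmRegressor,         # regression
    LightGbmRanker)            # ranking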
Reference
[GitHub: LightGBM](https://github.com/microsoft/LightGBM)
Methods
Name | Description |
---|---|
decision_function | Returns score values. |
get_params | Get the parameters for this operator. |
predict_proba | Returns probabilities. |
decision_function
Returns score values
decision_function(X, **params)
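A minimal usage sketch (the toy data and all variable names here are illustrative assumptions):
import pandas as pd
from nimbusml.ensemble import LightGbmBinaryClassifier
X = pd.DataFrame({'x1': [0.1, 0.4, 0.5, 0.9], 'x2': [1.0, 0.2, 0.7, 0.3]})
y = pd.Series([0, 0, 1, 1], name='label')
# small leaf threshold so the toy data can actually split
clf = LightGbmBinaryClassifier(minimum_example_count_per_leaf=1).fit(X, y)
scores = clf.decision_function(X)  # raw score (margin) per row, before the sigmoid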
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description |
---|---|
deep | Default value: False |
predict_proba
Returns probabilities
predict_proba(X, **params)
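Continuing the sketch under decision_function, predict_proba returns the class probabilities for each row (treating the exact output shape as an assumption; in the example output above, the positive-class probability appears in the Probability column):
probs = clf.predict_proba(X)  # one probability per class for each row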