OneVsRestClassifier Class
One-vs-All macro (OVA)
Note
This algorithm can be treated as a wrapper for all the binary
classifiers in nimbusml. Some binary classifiers already have
multi-class implementations, so users can choose either approach
depending on the context. The OVA version of a binary classifier,
such as a wrapped LightGbmBinaryClassifier ,
can behave differently from LightGbmClassifier ,
which builds a multi-class classifier directly.
- Inheritance
-
nimbusml.internal.core.multiclass._onevsrestclassifier.OneVsRestClassifier → OneVsRestClassifier
nimbusml.base_predictor.BasePredictor → OneVsRestClassifier
sklearn.base.ClassifierMixin → OneVsRestClassifier
Constructor
OneVsRestClassifier(classifier, output_for_sub_graph=0, use_probabilities=True, normalize='Auto', caching='Auto', feature=None, label=None, weight=None, **params)
Parameters
Name | Description |
---|---|
feature | see Columns. |
label | see Columns. |
weight | see Columns. |
classifier | The subgraph for the binary trainer used to construct the OVA learner. This should be a TrainBinary node. |
output_for_sub_graph | The training subgraph output. |
use_probabilities | Use probabilities in the OVA combiner. |
normalize | If 'Auto', normalization is applied automatically when the trainer requires it; other accepted values are 'No', 'Yes', and 'Warn'. |
caching | Whether the trainer should cache input training data. |
params | Additional arguments sent to the compute engine. |
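To make the `use_probabilities` flag concrete, here is an illustrative plain-Python sketch (not the nimbusml implementation) of how an OVA combiner can merge the K positive-class outputs of the binary learners: with probabilities, the outputs are normalized to sum to 1.0; with raw scores, they are left in whatever range the base learner produces, and the prediction is the argmax either way.

```python
def combine_ova(per_class_outputs, use_probabilities=True):
    """Merge one positive-class output per binary learner into a prediction.

    Illustrative sketch only; names and normalization are assumptions,
    not the nimbusml internals.
    """
    if use_probabilities:
        # Normalize so the K class probabilities sum to 1.0.
        total = sum(per_class_outputs)
        scores = [p / total for p in per_class_outputs]
    else:
        # Raw scores are left as-is; their range depends on the base learner.
        scores = list(per_class_outputs)
    # Predicted label is the class with the highest combined score.
    predicted = max(range(len(scores)), key=lambda k: scores[k])
    return predicted, scores

label, scores = combine_ova([0.2, 0.9, 0.4])
print(label)        # 1
print(sum(scores))  # 1.0
```

With `use_probabilities=False`, `combine_ova([0.2, 0.9, 0.4], use_probabilities=False)` still predicts class 1 but returns the raw outputs unchanged.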
Examples
###############################################################################
# OneVsRestClassifier
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import FastTreesBinaryClassifier
from nimbusml.feature_extraction.categorical import OneHotVectorizer
from nimbusml.multiclass import OneVsRestClassifier
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
    OneHotVectorizer(columns={'edu': 'education'}),
    OneVsRestClassifier(
        # using a binary classifier + OVR for multiclass dataset
        FastTreesBinaryClassifier(),
        # True = class probabilities will sum to 1.0
        # False = raw scores, unknown range
        use_probabilities=True,
        feature=['age', 'edu'],
        label='induced')
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)
# print predictions
print(predictions.head())
# PredictedLabel Score.0 Score.1 Score.2
# 0 2 0.084504 0.302600 0.612897
# 1 0 0.620235 0.379226 0.000538
# 2 2 0.077734 0.061426 0.860840
# 3 0 0.657593 0.012318 0.330088
# 4 0 0.743427 0.090343 0.166231
# print evaluation metrics
print(metrics)
# Accuracy(micro-avg) Accuracy(macro-avg) Log-loss Log-loss reduction ...
# 0 0.641129 0.515541 0.736198 22.99994 ...
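The relationship between the Score columns and PredictedLabel in the output above can be checked directly: the predicted label is the class whose Score column is largest. A small sketch using the first printed row:

```python
# Sketch: PredictedLabel is the index of the largest Score column.
# The values below are copied from the first row of the predictions above.
row = {'Score.0': 0.084504, 'Score.1': 0.302600, 'Score.2': 0.612897}
predicted = max(row, key=row.get)  # column with the highest score
print(predicted)  # 'Score.2' -> class 2, matching PredictedLabel above
```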
Remarks
OneVsRestClassifier (https://en.wikipedia.org/wiki/Multiclass_classification) converts any binary classifier into a multi-class classifier. A multi-class classification problem with K classes is decomposed into K binary classification problems, one per class, where the label is 1 if the sample belongs to that class and 0 otherwise. OneVsRestClassifier predicts the label whose base learner produces the highest score.
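The decomposition described above can be sketched in plain Python (nimbusml performs this internally; the helper name here is hypothetical):

```python
def ovr_labels(y, num_classes):
    """Turn multi-class labels into K binary 0/1 label vectors,
    one per class: 1 if the sample belongs to the class, else 0."""
    return [[1 if label == k else 0 for label in y]
            for k in range(num_classes)]

y = [0, 2, 1, 2, 0]
binary_problems = ovr_labels(y, num_classes=3)
print(binary_problems[0])  # [1, 0, 0, 0, 1] -- "is class 0?"
print(binary_problems[2])  # [0, 1, 0, 1, 0] -- "is class 2?"
```

One binary classifier is then trained per vector, and prediction takes the class whose classifier scores highest.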
Methods
Name | Description |
---|---|
decision_function | Returns score values. |
get_params | Get the parameters for this operator. |
predict_proba | Returns probabilities. |
decision_function
Returns score values
decision_function(X, **params)
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description |
---|---|
deep | Default value: False |
predict_proba
Returns probabilities
predict_proba(X, **params)