NaiveBayesClassifier Class
Machine Learning Naive Bayes Classifier
- Inheritance
  - nimbusml.internal.core.naive_bayes._naivebayesclassifier.NaiveBayesClassifier
  - nimbusml.base_predictor.BasePredictor
  - sklearn.base.ClassifierMixin
Constructor
NaiveBayesClassifier(normalize='Auto', caching='Auto', feature=None, label=None, **params)
Parameters
Name | Description
---|---
feature | See Columns.
label | See Columns.
normalize | Specifies the type of automatic normalization used. Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures the distances between data points are proportional and enables various optimization methods such as gradient descent to converge much faster. If normalization is performed, a MaxMin normalizer is used.
caching | Whether the trainer should cache the input training data.
params | Additional arguments sent to the compute engine.
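The feature scaling described under normalize can be illustrated with a short sketch. The function below is a generic min-max rescaling, shown only to clarify the idea; it is not nimbusml's internal normalizer.

```python
# Illustrative min-max rescaling of one feature column.
# Generic sketch of feature scaling, not nimbusml's internal code.

def min_max_scale(values):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = [26, 42, 39, 34, 35]  # the 'age' column from the example below
print(min_max_scale(ages))   # all values now lie in [0, 1]
```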
Examples
###############################################################################
# NaiveBayesClassifier
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.feature_extraction.categorical import OneHotVectorizer
from nimbusml.naive_bayes import NaiveBayesClassifier
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
    OneHotVectorizer(columns={'edu': 'education'}),
    NaiveBayesClassifier(feature=['age', 'edu'], label='induced')
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)
# print predictions
print(predictions.head())
# PredictedLabel Score.0 Score.1 Score.2
# 0 2 -5.297264 -5.873055 -4.847996
# 1 2 -5.297264 -5.873055 -4.847996
# 2 2 -5.297264 -5.873055 -4.847996
# 3 2 -5.297264 -5.873055 -4.847996
# 4 0 -1.785266 -3.172440 -3.691075
# print evaluation metrics
print(metrics)
# Accuracy(micro-avg) Accuracy(macro-avg) Log-loss Log-loss reduction ...
# 0 0.584677 0.378063 34.538776 -3512.460882 ...
Remarks
Naive Bayes is a probabilistic classifier that can be used for multiclass problems. Using Bayes' theorem, the conditional probability of a sample belonging to a class is computed from the sample counts of each feature-value combination. However, the Naive Bayes classifier is feasible only when the number of features, and the number of values each feature can take, are relatively small. It also assumes that the features are conditionally independent given the class.
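The counting scheme described above can be sketched in a few lines. This is an illustrative, count-based Naive Bayes for categorical features with simple add-one smoothing; nimbusml's actual trainer runs inside ML.NET and is not implemented this way.

```python
# Minimal count-based Naive Bayes sketch for categorical features.
# Illustration of the Remarks above, not nimbusml's implementation.
from collections import Counter, defaultdict
import math

def train(samples, labels):
    """Count class priors and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            feat_counts[(y, i)][v] += 1
    return class_counts, feat_counts

def predict(x, class_counts, feat_counts):
    """Pick the class maximizing log P(class) + sum_i log P(x_i | class)."""
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n in class_counts.items():
        score = math.log(n / total)
        for i, v in enumerate(x):
            # add-one (Laplace) smoothing so unseen values don't zero out;
            # a full implementation would use n + |distinct values| in the
            # denominator, n + 1 keeps the sketch short
            score += math.log((feat_counts[(c, i)][v] + 1) / (n + 1))
        if score > best_score:
            best, best_score = c, score
    return best

X = [['0-5yrs', 'yes'], ['0-5yrs', 'no'], ['6-11yrs', 'no'], ['6-11yrs', 'yes']]
y = [1, 1, 0, 0]
cc, fc = train(X, y)
print(predict(['6-11yrs', 'no'], cc, fc))  # prints 0
```

Because both training and prediction reduce to counting feature-value occurrences per class, the method degrades when features take many distinct values: the counts become sparse and the smoothed estimates uninformative, which is why the Remarks restrict it to small feature vocabularies.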
Methods
Name | Description
---|---
decision_function | Returns score values.
get_params | Get the parameters for this operator.
decision_function
Returns score values
decision_function(X, **params)
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description
---|---
deep | Default value: False