NaiveBayesClassifier Class

Machine Learning Naive Bayes Classifier

Inheritance
nimbusml.internal.core.naive_bayes._naivebayesclassifier.NaiveBayesClassifier → NaiveBayesClassifier
nimbusml.base_predictor.BasePredictor → NaiveBayesClassifier
sklearn.base.ClassifierMixin → NaiveBayesClassifier

Constructor

NaiveBayesClassifier(normalize='Auto', caching='Auto', feature=None, label=None, **params)

Parameters

Name Description
feature

see Columns.

label

see Columns.

normalize

Specifies the type of automatic normalization used:

  • "Auto": if normalization is needed, it is performed automatically. This is the default choice.

  • "No": no normalization is performed.

  • "Yes": normalization is performed.

  • "Warn": if normalization is needed, a warning message is displayed, but normalization is not performed.

Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures that the distances between data points are proportional and enables various optimization methods, such as gradient descent, to converge much faster. If normalization is performed, a MaxMin normalizer is used. It maps values into an interval [a, b] where -1 <= a <= 0, 0 <= b <= 1, and b - a = 1. This normalizer preserves sparsity by mapping zero to zero.

caching

Whether the trainer should cache the input training data.

params

Additional arguments sent to the compute engine.
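The interval constraint described under `normalize` (width 1, zero mapped to zero) can be satisfied by dividing each value by the range max - min. The sketch below illustrates that scheme with plain Python; it is an illustration of the stated property, not nimbusml's internal MaxMin implementation:

```python
def maxmin_normalize(values):
    # Divide by the range (max - min) so the mapped interval [a, b]
    # has width 1 with -1 <= a <= 0 and 0 <= b <= 1, and zero stays zero,
    # which preserves sparsity.
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Degenerate case: all values equal; map everything to zero.
        return [0.0 for _ in values]
    return [v / span for v in values]

scaled = maxmin_normalize([-2.0, 0.0, 3.0])
# range is 3 - (-2) = 5, so the values become [-0.4, 0.0, 0.6]
```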

Examples


   ###############################################################################
   # NaiveBayesClassifier
   from nimbusml import Pipeline, FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.feature_extraction.categorical import OneHotVectorizer
   from nimbusml.naive_bayes import NaiveBayesClassifier

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()

   data = FileDataStream.read_csv(path)
   print(data.head())
   #    age  case education  induced  parity ... row_num  spontaneous  ...
   # 0   26     1    0-5yrs        1       6 ...       1            2  ...
   # 1   42     1    0-5yrs        1       1 ...       2            0  ...
   # 2   39     1    0-5yrs        2       6 ...       3            0  ...
   # 3   34     1    0-5yrs        2       4 ...       4            0  ...
   # 4   35     1   6-11yrs        1       3 ...       5            1  ...


   # define the training pipeline
   pipeline = Pipeline([
       OneHotVectorizer(columns={'edu': 'education'}),
       NaiveBayesClassifier(feature=['age', 'edu'], label='induced')
   ])

   # train, predict, and evaluate
   metrics, predictions = pipeline.fit(data).test(data, output_scores=True)

   # print predictions
   print(predictions.head())
   #   PredictedLabel   Score.0   Score.1   Score.2
   # 0               2 -5.297264 -5.873055 -4.847996
   # 1               2 -5.297264 -5.873055 -4.847996
   # 2               2 -5.297264 -5.873055 -4.847996
   # 3               2 -5.297264 -5.873055 -4.847996
   # 4               0 -1.785266 -3.172440 -3.691075

   # print evaluation metrics
   print(metrics)
   #   Accuracy(micro-avg)  Accuracy(macro-avg)   Log-loss  Log-loss reduction ...
   # 0             0.584677             0.378063  34.538776       -3512.460882 ...
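The Score.0, Score.1, and Score.2 columns in the predictions above are per-class log scores, and PredictedLabel is the class with the largest score. A minimal check using the scores copied from row 0 of the sample output:

```python
# Per-class log scores from row 0 of the predictions shown above.
scores = {0: -5.297264, 1: -5.873055, 2: -4.847996}

# The predicted label is the argmax over the class scores.
predicted = max(scores, key=scores.get)
# predicted == 2, matching PredictedLabel for row 0
```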

Remarks

Naive Bayes is a probabilistic classifier that can be used for multiclass problems. Using Bayes' theorem, the conditional probability of a sample belonging to a class can be calculated from the sample counts for each feature-value combination. However, the Naive Bayes classifier is feasible only if the number of features, and the number of values each feature can take, are relatively small. It also assumes that the features are strictly independent.
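The count-based calculation described above can be sketched on a toy categorical dataset. This is a generic illustration of the technique (with add-one smoothing, and hypothetical feature values), not nimbusml's trainer:

```python
import math
from collections import Counter, defaultdict

# Toy data: (feature values) -> label. Feature values are hypothetical.
samples = [
    (('sunny', 'hot'), 'no'),
    (('sunny', 'mild'), 'no'),
    (('rain', 'mild'), 'yes'),
    (('rain', 'hot'), 'yes'),
    (('rain', 'mild'), 'yes'),
]

# Count labels and, per (feature position, label), the feature values seen.
label_counts = Counter(lbl for _, lbl in samples)
feat_counts = defaultdict(Counter)
for feats, lbl in samples:
    for i, v in enumerate(feats):
        feat_counts[(i, lbl)][v] += 1

def log_posterior(feats, lbl, alpha=1.0):
    # log P(label) + sum_i log P(feature_i | label), with add-one smoothing.
    total = sum(label_counts.values())
    lp = math.log(label_counts[lbl] / total)
    for i, v in enumerate(feats):
        counts = feat_counts[(i, lbl)]
        vocab = len({s[0][i] for s in samples})  # distinct values at position i
        lp += math.log((counts[v] + alpha) / (sum(counts.values()) + alpha * vocab))
    return lp

scores = {lbl: log_posterior(('rain', 'mild'), lbl) for lbl in label_counts}
best = max(scores, key=scores.get)  # 'yes' wins for this toy data
```

This also makes the feasibility caveat concrete: the counting tables grow with the number of features and the number of distinct values per feature.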

Reference

Naive Bayes

Methods

decision_function

Returns the score values.

get_params

Get the parameters for this operator.

decision_function

Returns the score values.

decision_function(X, **params)

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

Name Description
deep
Default value: False