LogisticRegressionClassifier Class
Machine Learning Logistic Regression
- Inheritance:
  - nimbusml.internal.core.linear_model._logisticregressionclassifier.LogisticRegressionClassifier
  - nimbusml.base_predictor.BasePredictor
  - sklearn.base.ClassifierMixin
Constructor
LogisticRegressionClassifier(normalize='Auto', caching='Auto', show_training_statistics=False, l2_regularization=1.0, l1_regularization=1.0, optimization_tolerance=1e-07, history_size=20, enforce_non_negativity=False, initial_weights_diameter=0.0, maximum_number_of_iterations=2147483647, stochastic_gradient_descent_initilaization_tolerance=0.0, quiet=False, use_threads=True, number_of_threads=None, dense_optimizer=False, feature=None, label=None, weight=None, **params)
Parameters
Name | Description |
---|---|
feature | See Columns. |
label | See Columns. |
weight | See Columns. |
normalize | Specifies whether to normalize the feature vectors; one of 'Auto', 'No', 'Yes', or 'Warn'. |
caching | Whether the trainer should cache input training data. |
show_training_statistics | Show statistics of training examples. |
l2_regularization | L2 regularization weight. |
l1_regularization | L1 regularization weight. |
optimization_tolerance | Tolerance parameter for optimization convergence. Lower values are slower but more accurate. |
history_size | Memory size for L-BFGS. Lower values are faster but less accurate. The technique used for optimization here is L-BFGS, which uses only a limited amount of memory to compute the next step direction. This parameter indicates the number of past positions and gradients to store for the computation of the next step. Must be greater than or equal to 1. |
enforce_non_negativity | Enforce non-negative weights. This flag, however, does not put any constraint on the bias term; that is, the bias term can still be a negative number. |
initial_weights_diameter | Sets the initial weights diameter that specifies the range from which values are drawn for the initial weights. These weights are initialized randomly from within this range. For example, if the diameter is d, the weights are drawn uniformly between -d/2 and d/2. |
maximum_number_of_iterations | Maximum iterations. |
stochastic_gradient_descent_initilaization_tolerance | Run SGD to initialize LR weights, converging to this tolerance. |
quiet | If set to True, produce no output during training. |
number_of_threads | Number of threads. |
use_threads | Whether or not to use threads. Default is True. |
dense_optimizer | If True, forces densification of the internal optimization vectors; if False, the optimizer may use sparse or dense internal states as it finds appropriate. |
params | Additional arguments sent to the compute engine. |
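As a complement to the full pipeline example below, the snippet here only sketches how the column roles and a few run-time options from the table above are passed to the constructor; the column names match the infert example that follows, and the remaining values are arbitrary.
# Illustrative only: column roles plus a few run-time options; values are arbitrary
from nimbusml.linear_model import LogisticRegressionClassifier
clf = LogisticRegressionClassifier(
    feature=['parity', 'edu'],       # columns used as features (see Columns)
    label='induced',                 # column used as the label (see Columns)
    number_of_threads=2,             # cap the number of training threads
    show_training_statistics=True,   # report training-set statistics after fitting
    quiet=False)                     # keep training output visible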
Examples
###############################################################################
# LogisticRegressionClassifier
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.feature_extraction.categorical import OneHotVectorizer
from nimbusml.linear_model import LogisticRegressionClassifier
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
OneHotVectorizer(columns={'edu': 'education'}),
LogisticRegressionClassifier(feature=['parity', 'edu'], label='induced')
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)
# print predictions
print(predictions.head())
# PredictedLabel Score.0 Score.1 Score.2
# 0 2 0.171122 0.250151 0.578727
# 1 0 0.678313 0.220665 0.101022
# 2 2 0.171122 0.250151 0.578727
# 3 0 0.360849 0.289190 0.349961
# 4 0 0.556921 0.260420 0.182658
# print evaluation metrics
print(metrics)
# Accuracy(micro-avg) Accuracy(macro-avg) Log-loss Log-loss reduction ...
# 0 0.592742 0.389403 0.857392 10.324157 ...
Remarks
Logistic Regression is a classification method used to predict the value of a categorical dependent variable from its relationship to one or more independent variables assumed to have a logistic distribution. If the dependent variable has only two possible values (success/failure), then the logistic regression is binary. If the dependent variable has more than two possible values (blood type given diagnostic test results), then the logistic regression is multinomial.
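For the multiclass case the model learns one linear score per class and turns those scores into class probabilities with a softmax; the standalone NumPy sketch below (not part of nimbusml) shows how per-class scores like the Score.0/Score.1/Score.2 columns in the example above are normalized into probabilities.
# Standalone sketch (NumPy only): turning per-class linear scores into probabilities
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

scores = np.array([0.2, -0.5, 1.1])     # hypothetical per-class scores w_k . x + b_k
print(softmax(scores))                  # three probabilities that sum to 1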
The optimization technique used for LogisticRegressionClassifier is the limited-memory Broyden-Fletcher-Goldfarb-Shanno method (L-BFGS). Both the L-BFGS and regular BFGS algorithms use quasi-Newtonian methods to estimate the computationally intensive Hessian matrix in the equation used by Newton's method to calculate steps. But the L-BFGS approximation uses only a limited amount of memory to compute the next step direction, so it is especially suited for problems with a large number of variables. The history_size parameter specifies the number of past positions and gradients to store for use in the computation of the next step.
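As a rough illustration of that trade-off, the two otherwise-identical learners below differ only in how much L-BFGS history they keep; the values are arbitrary and only meant as a sketch, not tuned recommendations.
# Illustrative only: varying the L-BFGS memory size (history_size)
from nimbusml.linear_model import LogisticRegressionClassifier
small_memory = LogisticRegressionClassifier(
    history_size=5,    # few stored positions/gradients: cheaper steps, rougher search direction
    feature=['parity', 'edu'], label='induced')
large_memory = LogisticRegressionClassifier(
    history_size=50,   # more stored positions/gradients: costlier steps, better search direction
    feature=['parity', 'edu'], label='induced')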
This learner can use elastic net regularization: a linear combination of L1 (lasso) and L2 (ridge) regularization. Regularization is a method that can render an ill-posed problem more tractable by imposing constraints that provide information to supplement the data and that prevent overfitting by penalizing models with extreme coefficient values. This can improve the generalization of the model learned by selecting the optimal complexity in the bias-variance tradeoff. Regularization works by adding the penalty associated with coefficient values to the error of the hypothesis: an accurate model with extreme coefficient values is penalized more, while a less accurate model with more conservative values is penalized less. L1 and L2 regularization have different effects and uses that are complementary in certain respects.
- l1_regularization: can be applied to sparse models, when working with high-dimensional data. It pulls small weights associated with features that are relatively unimportant towards 0.
- l2_regularization: is preferable for data that is not sparse. It pulls large weights towards zero.
Adding the ridge penalty to the regularization overcomes some of lasso's limitations. It can improve predictive accuracy, for example, when the number of predictors is greater than the sample size. If x = l1_regularization and y = l2_regularization, then ax + by = c defines the linear span of the regularization terms. The default values of x and y are both 1. Aggressive regularization can harm predictive capacity by excluding important variables from the model, so choosing optimal values for the regularization parameters is important for the performance of the logistic regression model.
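As a sketch of how those two penalties can be combined on this learner (the weights below are arbitrary and would normally be chosen by hyperparameter search):
# Illustrative only: mixing L1 (lasso) and L2 (ridge) penalties
from nimbusml.linear_model import LogisticRegressionClassifier
sparse_leaning = LogisticRegressionClassifier(
    l1_regularization=2.0,   # strong L1 pushes unimportant weights toward exactly zero
    l2_regularization=0.1,   # light L2 keeps the remaining weights from growing too large
    feature=['parity', 'edu'], label='induced')
ridge_leaning = LogisticRegressionClassifier(
    l1_regularization=0.0,   # no L1, so no sparsity pressure
    l2_regularization=1.0,   # pure L2 shrinks all weights smoothly toward zero
    feature=['parity', 'edu'], label='induced')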
Reference
Wikipedia: Logistic regression
Scalable Training of L1-Regularized Log-Linear Models
Test Run - L1 and L2 Regularization for Machine Learning
Methods
Name | Description |
---|---|
get_params | Get the parameters for this operator. |
predict_proba | Returns probabilities. |
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description |
---|---|
deep | Default value: False |
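A minimal usage sketch for get_params; it assumes only a constructed (not necessarily fitted) classifier.
# Illustrative only: inspect the operator's parameters as a dictionary
from nimbusml.linear_model import LogisticRegressionClassifier
clf = LogisticRegressionClassifier(l2_regularization=0.5)
print(clf.get_params())   # includes l2_regularization and the other constructor arguments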
predict_proba
Returns probabilities
predict_proba(X, **params)
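A hedged usage sketch for predict_proba, reusing the infert FileDataStream loaded in the example above; the feature columns chosen here ('age', 'parity') are arbitrary numeric columns from that dataset.
# Illustrative sketch: per-class probabilities from a fitted classifier
from nimbusml.linear_model import LogisticRegressionClassifier
clf = LogisticRegressionClassifier(feature=['age', 'parity'], label='induced')
clf.fit(data)                      # data is the FileDataStream from the example above
probs = clf.predict_proba(data)    # one probability column per class
print(probs[:5])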