FactorizationMachineBinaryClassifier Class

Train a field-aware factorization machine for binary classification.

Inheritance
nimbusml.internal.core.decomposition._factorizationmachinebinaryclassifier.FactorizationMachineBinaryClassifier
nimbusml.base_predictor.BasePredictor
sklearn.base.ClassifierMixin
FactorizationMachineBinaryClassifier

Constructor

FactorizationMachineBinaryClassifier(learning_rate=0.1, number_of_iterations=5, latent_dimension=20, lambda_linear=0.0001, lambda_latent=0.0001, normalize=True, caching='Auto', extra_feature_columns=None, shuffle=True, verbose=True, radius=0.5, feature=None, label=None, weight=None, **params)

Parameters

Name Description
feature

see Columns.

label

see Columns.

weight

see Columns.

learning_rate

Determines the size of the step taken in the direction of the gradient in each step of the learning process. This determines how fast or slow the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution.
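
The effect of the step size can be seen on a toy problem. The sketch below is only an illustration under simple assumptions: it runs plain gradient descent on f(x) = x**2, which is not the optimizer this trainer uses, and the helper name `minimize_quadratic` is hypothetical.

```python
def minimize_quadratic(learning_rate, steps=20, start=10.0):
    """Run plain gradient descent on f(x) = x**2 and return the final x."""
    x = start
    for _ in range(steps):
        gradient = 2.0 * x             # derivative of x**2
        x -= learning_rate * gradient  # step against the gradient
    return x


small = minimize_quadratic(0.01)    # small steps: still far from the optimum at 0
moderate = minimize_quadratic(0.1)  # moderate steps: close to 0 after 20 steps
large = minimize_quadratic(1.1)     # oversized steps: overshoots 0 and diverges

print(small, moderate, large)
```

With the same number of steps, the moderate rate ends closest to the minimum, the small rate has not yet converged, and the large rate moves away from the minimum on every step.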

number_of_iterations

Number of training iterations.

latent_dimension

Latent space dimension.

lambda_linear

Regularization coefficient of linear weights.

lambda_latent

Regularization coefficient of latent weights.

normalize

Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length.
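
As a rough illustration of this kind of normalization, the sketch below scales all fields by one shared factor so that their concatenation has norm 1. It assumes "unit-length" means the Euclidean (L2) norm; the helper `normalize_fields` is hypothetical and is not part of nimbusml.

```python
import math


def normalize_fields(fields):
    """Scale every field so the concatenation of all fields has L2 norm 1."""
    # Assumption: unit length is measured with the Euclidean norm.
    norm = math.sqrt(sum(v * v for field in fields for v in field))
    if norm == 0.0:
        return [list(field) for field in fields]  # nothing to scale
    return [[v / norm for v in field] for field in fields]


fields = [[3.0, 0.0], [0.0, 4.0]]  # two fields with two features each
normalized = normalize_fields(fields)
print(normalized)
```

Because a single factor is shared across fields, the relative magnitudes of the fields are preserved.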

caching

Whether the trainer should cache the input training data.

extra_feature_columns

Extra columns to use for feature vectors. The i-th specified string denotes the column containing features from the (i+1)-th field. Note that the first field is specified by "feat" instead of "exfeat".

shuffle

Whether to shuffle for each training iteration.

verbose

Whether to report training progress.

radius

Radius of initial latent factors.

params

Additional arguments sent to compute engine.

Examples


   ###############################################################################
   # FactorizationMachineBinaryClassifier
   import numpy
   from nimbusml import Pipeline, FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.decomposition import FactorizationMachineBinaryClassifier
   from nimbusml.feature_extraction.categorical import OneHotVectorizer

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()
   data = FileDataStream.read_csv(path, sep=',',
                                  numeric_dtype=numpy.float32,
                                  names={0: 'row_num', 5: 'case'})
   print(data.head())
   #    age  case education  induced  parity  pooled.stratum  row_num  ...
   # 0  26.0   1.0    0-5yrs      1.0     6.0             3.0      1.0  ...
   # 1  42.0   1.0    0-5yrs      1.0     1.0             1.0      2.0  ...
   # 2  39.0   1.0    0-5yrs      2.0     6.0             4.0      3.0  ...
   # 3  34.0   1.0    0-5yrs      2.0     4.0             2.0      4.0  ...
   # 4  35.0   1.0   6-11yrs      1.0     3.0            32.0      5.0  ...
   # define the training pipeline
   pipeline = Pipeline([
       OneHotVectorizer(columns={'edu': 'education'}),
       FactorizationMachineBinaryClassifier(feature=['induced', 'edu', 'parity'],
                                            label='case')
   ])

   # train, predict, and evaluate
   metrics, predictions = pipeline.fit(data).test(data, output_scores=True)

   # print predictions
   print(predictions.head())
   #   PredictedLabel  Probability     Score
   # 0             0.0     0.370519 -0.529990
   # 1             0.0     0.420984 -0.318737
   # 2             0.0     0.364432 -0.556180
   # 3             0.0     0.380421 -0.487761
   # 4             0.0     0.365351 -0.552214
   # print evaluation metrics
   print(metrics)
   #        AUC  Accuracy  Positive precision  Positive recall  ...
   # 0  0.609639  0.665323                   0                0  ...

Remarks

Field-aware factorization machines (FFMs) use, in addition to the input variables, factorized parameters to model the interaction between pairs of variables. The algorithm is particularly useful for high-dimensional datasets that can be very sparse (e.g. click prediction for advertising systems). An advantage of FFMs over SVMs is that the training data does not need to be stored in memory, and the coefficients can be optimized directly.
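
To make the pairwise interaction concrete, here is a minimal sketch of the standard field-aware scoring rule: a bias, a linear term, and, for each pair of features, the dot product of the latent vector each feature keeps for the other feature's field. The function `ffm_score` and its argument layout are hypothetical and are not taken from the nimbusml implementation, which may differ in detail.

```python
def ffm_score(x, field_of, bias, linear, latent):
    """Field-aware factorization machine score for one example.

    x        : feature values
    field_of : field_of[i] is the field index of feature i
    linear   : linear[i] is the linear weight of feature i
    latent   : latent[i][f] is the latent vector feature i uses against field f
    """
    score = bias + sum(w * v for w, v in zip(linear, x))
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] == 0.0 or x[j] == 0.0:
                continue  # zero-valued features contribute nothing
            v_i = latent[i][field_of[j]]  # vector of i for the field of j
            v_j = latent[j][field_of[i]]  # vector of j for the field of i
            dot = sum(a * b for a, b in zip(v_i, v_j))
            score += dot * x[i] * x[j]
    return score


# Toy model: two features, two fields, latent dimension 2 (illustrative values).
x = [1.0, 2.0]
field_of = [0, 1]
linear = [0.1, -0.2]
latent = [
    [[1.0, 0.0], [0.5, 0.5]],  # feature 0: one vector per field
    [[0.2, 0.4], [0.0, 1.0]],  # feature 1: one vector per field
]
score = ffm_score(x, field_of, bias=0.5, linear=linear, latent=latent)
print(score)  # approximately 0.8
```

The pair loop skips zero-valued features, which is why the model stays cheap to evaluate on sparse inputs.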

Reference

Field Aware Factorization Machines
Field-aware Factorization Machines for CTR Prediction
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

Methods

decision_function

Returns score values

get_params

Get the parameters for this operator.

predict_proba

Returns probabilities

decision_function

Returns score values

decision_function(X, **params)

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

Name Description
deep
Default value: False

predict_proba

Returns probabilities

predict_proba(X, **params)
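
In the sample output in the Examples section, each Probability value agrees with the logistic sigmoid of the corresponding Score. The sketch below checks this on the five printed rows. It is an observation about that output, not a statement of how the library computes probabilities, and it uses only the standard library.

```python
import math


def sigmoid(score):
    """Logistic function mapping a raw score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


# (Score, Probability) pairs copied from the example output.
sample_rows = [
    (-0.529990, 0.370519),
    (-0.318737, 0.420984),
    (-0.556180, 0.364432),
    (-0.487761, 0.380421),
    (-0.552214, 0.365351),
]

# Largest absolute difference between sigmoid(Score) and the printed Probability.
max_gap = max(abs(sigmoid(s) - p) for s, p in sample_rows)
print(max_gap)
```

A negative score therefore corresponds to a probability below 0.5, which matches the PredictedLabel of 0.0 in all five rows.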