FactorizationMachineBinaryClassifier Class
Train a field-aware factorization machine for binary classification.
Inheritance
- nimbusml.internal.core.decomposition._factorizationmachinebinaryclassifier.FactorizationMachineBinaryClassifier
- nimbusml.base_predictor.BasePredictor
- sklearn.base.ClassifierMixin
Constructor
FactorizationMachineBinaryClassifier(learning_rate=0.1, number_of_iterations=5, latent_dimension=20, lambda_linear=0.0001, lambda_latent=0.0001, normalize=True, caching='Auto', extra_feature_columns=None, shuffle=True, verbose=True, radius=0.5, feature=None, label=None, weight=None, **params)
Parameters
Name | Description |
---|---|
feature | See Columns. |
label | See Columns. |
weight | See Columns. |
learning_rate | Size of the step taken in the direction of the gradient at each step of the learning process. It determines how quickly the learner converges on the optimal solution: if the step size is too large, you might overshoot the optimal solution; if it is too small, training takes longer to converge. |
number_of_iterations | Number of training iterations. |
latent_dimension | Dimension of the latent space (the length of each latent factor vector). |
lambda_linear | Regularization coefficient of the linear weights. |
lambda_latent | Regularization coefficient of the latent weights. |
normalize | Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length. |
caching | Whether the trainer should cache the input training data. |
extra_feature_columns | Extra columns to use for feature vectors. The i-th specified string denotes the column containing the features of the (i+1)-th field. Note that the first field is specified by `feature` ("feat" in ML.NET), not by this parameter ("exfeat"). |
shuffle | Whether to shuffle the training data for each training iteration. |
verbose | Whether to report training progress. |
radius | Radius of the initial latent factors. |
params | Additional arguments sent to the compute engine. |
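The `normalize` option can be pictured with a small NumPy sketch. It assumes each field's features arrive as a separate dense vector (field 0 from `feature`, field i+1 from the i-th entry of `extra_feature_columns`, as described above) and divides every field by the L2 norm of their concatenation. The helper `normalize_fields` is hypothetical and is not part of nimbusml; how the library handles all-zero rows is also an assumption here.

```python
import numpy as np


def normalize_fields(field_vectors):
    """Scale every field's vector by the L2 norm of their concatenation.

    Hypothetical helper illustrating normalize=True; not a nimbusml API.
    """
    concatenated = np.concatenate(field_vectors)
    norm = np.linalg.norm(concatenated)
    if norm == 0.0:
        # An all-zero example cannot be scaled to unit length; leave it unchanged (assumption).
        return [np.asarray(v, dtype=float) for v in field_vectors]
    return [np.asarray(v, dtype=float) / norm for v in field_vectors]


# Two fields: 'feature' (field 0) and one extra feature column (field 1).
fields = [np.array([3.0, 0.0]), np.array([4.0])]
scaled = normalize_fields(fields)
# The concatenation [3, 0, 4] has norm 5, so each field is divided by 5.
print(scaled)
print(np.linalg.norm(np.concatenate(scaled)))
```

Scaling all fields by one shared norm, rather than normalizing each field separately, is what makes the concatenated vector (not each field) unit-length.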
Examples
```python
###############################################################################
# FactorizationMachineBinaryClassifier
import numpy
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.decomposition import FactorizationMachineBinaryClassifier
from nimbusml.feature_extraction.categorical import OneHotVectorizer

# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path, sep=',',
                               numeric_dtype=numpy.float32,
                               names={0: 'row_num', 5: 'case'})
print(data.head())
#     age  case education  induced  parity  pooled.stratum  row_num  ...
# 0  26.0   1.0    0-5yrs      1.0     6.0             3.0      1.0  ...
# 1  42.0   1.0    0-5yrs      1.0     1.0             1.0      2.0  ...
# 2  39.0   1.0    0-5yrs      2.0     6.0             4.0      3.0  ...
# 3  34.0   1.0    0-5yrs      2.0     4.0             2.0      4.0  ...
# 4  35.0   1.0   6-11yrs      1.0     3.0            32.0      5.0  ...

# define the training pipeline
pipeline = Pipeline([
    OneHotVectorizer(columns={'edu': 'education'}),
    FactorizationMachineBinaryClassifier(feature=['induced', 'edu', 'parity'],
                                         label='case')
])

# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)

# print predictions
print(predictions.head())
#    PredictedLabel  Probability     Score
# 0             0.0     0.370519 -0.529990
# 1             0.0     0.420984 -0.318737
# 2             0.0     0.364432 -0.556180
# 3             0.0     0.380421 -0.487761
# 4             0.0     0.365351 -0.552214

# print evaluation metrics
print(metrics)
#         AUC  Accuracy  Positive precision  Positive recall  ...
# 0  0.609639  0.665323                   0                0  ...
```
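The printed predictions are consistent with `Probability` being the logistic (sigmoid) function of `Score`: for row 0, 1 / (1 + e^0.529990) ≈ 0.370519. The sketch below checks that relationship on the five rows shown above. The helper names are hypothetical, and the 0.5 probability threshold (equivalently, a score threshold of 0) for `PredictedLabel` is an assumption rather than documented nimbusml behavior.

```python
import math


def sigmoid(score):
    """Logistic function mapping a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def predicted_label(score, threshold=0.5):
    # Assumed decision rule: positive class when the probability reaches the threshold.
    return 1.0 if sigmoid(score) >= threshold else 0.0


# (Score, Probability) pairs copied from the predictions printed in the example.
rows = [(-0.529990, 0.370519), (-0.318737, 0.420984), (-0.556180, 0.364432),
        (-0.487761, 0.380421), (-0.552214, 0.365351)]

for score, probability in rows:
    print(f"{score:+.6f} -> {sigmoid(score):.6f} (reported {probability:.6f}), "
          f"label {predicted_label(score)}")
```

All five scores are negative, so every probability is below 0.5 and every predicted label is 0.0, matching the output above.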
Remarks
Field-aware factorization machines (FFMs) use, in addition to the input variables, factorized parameters to model the interaction between pairs of variables. The algorithm is particularly useful for high-dimensional datasets that can be very sparse (for example, click prediction for advertising systems). An advantage of FFMs over SVMs is that the training data does not need to be stored in memory, and the coefficients can be optimized directly.
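To make the pairwise interactions concrete, here is a NumPy sketch of the field-aware scoring function in the form given by the FFM CTR-prediction paper listed under Reference, plus a linear term: for every pair of features j < j', the latent vector of j for the field of j' is dotted with the latent vector of j' for the field of j, then multiplied by x_j x_j'. Each feature therefore keeps one `latent_dimension`-sized vector per field. The array layout and the name `ffm_score` are assumptions for illustration; nimbusml's internal implementation may store weights differently or include other terms (such as a bias).

```python
import numpy as np


def ffm_score(x, field_of, linear_w, latent_v):
    """Field-aware factorization machine score for one example (illustrative sketch).

    x        : (n_features,) feature values
    field_of : (n_features,) field index of each feature
    linear_w : (n_features,) linear weights
    latent_v : (n_features, n_fields, k) latent vectors, one per (feature, field)
    """
    score = float(np.dot(linear_w, x))
    n = len(x)
    for j in range(n):
        for jp in range(j + 1, n):
            # Each feature contributes the latent vector tied to the *other* feature's field.
            interaction = np.dot(latent_v[j, field_of[jp]], latent_v[jp, field_of[j]])
            score += float(interaction) * x[j] * x[jp]
    return score


# Toy example: 3 features in 2 fields, latent dimension k = 2.
x = np.array([1.0, 2.0, 1.0])
field_of = np.array([0, 0, 1])
linear_w = np.array([0.1, -0.2, 0.3])
latent_v = np.arange(3 * 2 * 2, dtype=float).reshape(3, 2, 2) / 10.0
print(ffm_score(x, field_of, linear_w, latent_v))
```

When only `feature` is set, as in the example above, every feature belongs to field 0, each feature has a single latent vector, and the score reduces to that of an ordinary factorization machine.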
Reference
- Field Aware Factorization Machines
- Field-aware Factorization Machines for CTR Prediction
- Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
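The reference list includes AdaGrad (Adaptive Subgradient Methods). Assuming the trainer uses an AdaGrad-style per-coordinate step, the sketch below shows how `learning_rate` and an L2 coefficient such as `lambda_linear` (or `lambda_latent` for the latent vectors) could enter one update of a weight vector under logistic loss. The function names, the `eps` term, and the exact placement of the regularizer are assumptions, not nimbusml's documented update rule.

```python
import numpy as np


def adagrad_step(w, g_sq_sum, grad, learning_rate=0.1, l2=0.0001, eps=1e-8):
    """One AdaGrad update with an L2 penalty (illustrative sketch, not nimbusml's code)."""
    grad = grad + l2 * w                 # add the gradient of (l2 / 2) * ||w||^2
    g_sq_sum = g_sq_sum + grad ** 2      # accumulate squared gradients per coordinate
    w = w - learning_rate * grad / (np.sqrt(g_sq_sum) + eps)
    return w, g_sq_sum


def logistic_loss_grad(w, x, y):
    """Gradient of log(1 + exp(-y * w.x)) for a label y in {-1, +1}."""
    margin = y * np.dot(w, x)
    return -y * x / (1.0 + np.exp(margin))


# Fit a linear score on two toy examples for 5 passes (mirroring number_of_iterations=5).
X = np.array([[1.0, 0.0], [0.0, 1.0]])
Y = np.array([1.0, -1.0])
w = np.zeros(2)
g_sq_sum = np.zeros(2)
for _ in range(5):
    for x, y in zip(X, Y):
        w, g_sq_sum = adagrad_step(w, g_sq_sum, logistic_loss_grad(w, x, y))
print(w)  # weight for the positive example grows positive, the other negative
```

Dividing by the root of the accumulated squared gradients gives each coordinate its own effective step size, which suits sparse data where some features appear rarely.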
Methods
Name | Description |
---|---|
decision_function | Returns score values. |
get_params | Get the parameters for this operator. |
predict_proba | Returns probabilities. |
decision_function
Returns score values
decision_function(X, **params)
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description |
---|---|
deep | Default value: False |
predict_proba
Returns probabilities
predict_proba(X, **params)