PcaTransformer Class

Pca Transformer

Inheritance
nimbusml.internal.core.decomposition._pcatransformer.PcaTransformer
nimbusml.base_transform.BaseTransform
sklearn.base.TransformerMixin
PcaTransformer

Constructor

PcaTransformer(rank=20, oversampling=20, center=True, random_state=0, weight=None, columns=None, **params)

Parameters

Name Description
weight

The PCA transform can take into account a weight for each row. To use weights, the input must contain a weight column, whose name is specified using this parameter. See Columns for syntax.

columns

See Columns. If multiple non-Vector Type columns are specified as input, PCA selects n features (in total) from the selected columns, and the output columns are named after the first selected column followed by the slot number. PCA can also be applied to a set of Vector Type columns. In this case, PCA is applied to each column separately, and the transform generates n principal components for each column.

rank

The number of components in the PCA. The default value is 20.

oversampling

Oversampling parameter for randomized PCA training.

center

If enabled, the data is centered to have zero mean.

random_state

The seed for random number generation.

params

Additional arguments sent to compute engine.

Examples


   ###############################################################################
   # PcaTransformer
   import numpy
   from nimbusml import FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.decomposition import PcaTransformer

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()

   data = FileDataStream.read_csv(path, sep=',', numeric_dtype=numpy.float32)

   # transform data
   feature_columns = ['age', 'parity', 'induced', 'spontaneous']

   pipe = PcaTransformer(rank=3, columns={'features': feature_columns})

   print(pipe.fit_transform(data).head())
   #     age  case education  features.0  features.1  features.2  induced  ...
   # 0  26.0   1.0    0-5yrs   -5.675901   -3.964389   -1.031570      1.0  ...
   # 1  42.0   1.0    0-5yrs   10.364552    0.875251    0.773911      1.0  ...
   # 2  39.0   1.0    0-5yrs    7.336117   -4.073389    1.128798      2.0  ...
   # 3  34.0   1.0    0-5yrs    2.340584   -2.130528    1.248973      2.0  ...
   # 4  35.0   1.0   6-11yrs    3.343876   -1.088401   -0.100063      1.0  ...

Remarks

Principal Component Analysis (PCA) is a dimensionality-reduction transform that computes the projection of the feature vector onto a low-rank subspace. It is trained using the technique described in the paper "Combining Structured and Unstructured Randomness in Large Scale PCA" by Nikos Karampatziakis and Paul Mineiro, and in the paper "Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions" by N. Halko et al.
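
To make the role of rank, oversampling, center and random_state more concrete, here is a minimal NumPy sketch of a randomized PCA in the spirit of the Gaussian range finder from Halko et al. It is an illustration only and not the nimbusml implementation: the function names randomized_pca and project are hypothetical, the parameter names are simply borrowed from the constructor above, and the structured-randomness variant of Karampatziakis and Mineiro is not reproduced.

```python
import numpy as np


def randomized_pca(X, rank=20, oversampling=20, center=True, random_state=0):
    """Approximate the top `rank` principal directions of X.

    Hypothetical helper for illustration; not part of nimbusml.
    Returns (components, mean), where components has shape (rank, n_features).
    """
    X = np.asarray(X, dtype=np.float64)
    n_samples, n_features = X.shape

    # Optional centering so that every feature has zero mean.
    mean = X.mean(axis=0) if center else np.zeros(n_features)
    Xc = X - mean

    # Sketch width: requested rank plus extra "oversampling" directions,
    # capped by the dimensions of the data.
    k = min(rank + oversampling, n_features, n_samples)

    # Random Gaussian test matrix, seeded for reproducibility.
    rng = np.random.default_rng(random_state)
    omega = rng.standard_normal((n_features, k))

    # Orthonormal basis Q that approximately spans the range of Xc.
    Q, _ = np.linalg.qr(Xc @ omega)

    # SVD of the small projected matrix gives the principal directions.
    B = Q.T @ Xc
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:rank], mean


def project(X, components, mean):
    """Project rows of X onto the low-rank subspace."""
    return (np.asarray(X, dtype=np.float64) - mean) @ components.T
```

In this sketch a larger oversampling value widens the random sketch, which generally improves the approximation at the cost of more computation. The extra directions are discarded at the end, so the output always has at most rank columns.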

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

Name Description
deep
Default value: False
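
As a rough illustration of what a get_params call typically returns, the sketch below uses a hypothetical stand-in class, ToyTransformer, that mirrors the constructor signature shown above. It assumes the scikit-learn-style convention of returning a dictionary that maps constructor argument names to their current values. The actual nimbusml behaviour, including the effect of deep, may differ.

```python
import inspect


class ToyTransformer:
    """Hypothetical stand-in that mirrors the PcaTransformer constructor.

    This is NOT the nimbusml class; it only illustrates the convention.
    """

    def __init__(self, rank=20, oversampling=20, center=True,
                 random_state=0, weight=None, columns=None):
        self.rank = rank
        self.oversampling = oversampling
        self.center = center
        self.random_state = random_state
        self.weight = weight
        self.columns = columns

    def get_params(self, deep=False):
        # `deep` is accepted for signature compatibility but unused here,
        # because this toy class has no nested operators.
        names = inspect.signature(self.__init__).parameters
        return {name: getattr(self, name) for name in names}


params = ToyTransformer(rank=3, columns=['age', 'parity']).get_params()
print(params['rank'], params['columns'])
```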