PcaTransformer Class
Pca Transformer
Inheritance
- nimbusml.internal.core.decomposition._pcatransformer.PcaTransformer -> PcaTransformer
- nimbusml.base_transform.BaseTransform -> PcaTransformer
- sklearn.base.TransformerMixin -> PcaTransformer
Constructor
PcaTransformer(rank=20, oversampling=20, center=True, random_state=0, weight=None, columns=None, **params)
Parameters
Name | Description |
---|---|
weight | The PCA transform can take into account a weight for each row. To use weights, the input must contain a weight column, whose name is specified using this parameter. See Columns for syntax. |
columns | See Columns. If users specify multiple non-Vector Type columns as input, PCA produces n features (in total) from the selected columns. The output columns are named after the first selected column followed by the slot number. Users can also apply PCA to a set of Vector Type columns. In this case, PCA is applied to each column separately, and the transform generates n principal components for each column. |
rank | The number of components in the PCA. The default value is 20. |
oversampling | Oversampling parameter for randomized PCA training. The default value is 20. |
center | If enabled, data is centered to have zero mean. The default value is True. |
random_state | The seed for random number generation. The default value is 0. |
params | Additional arguments sent to the compute engine. |
Examples
###############################################################################
# PcaTransformer
import numpy
from nimbusml import FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.decomposition import PcaTransformer
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path, sep=',', numeric_dtype=numpy.float32)
# transform data
feature_columns = ['age', 'parity', 'induced', 'spontaneous']
pipe = PcaTransformer(rank=3, columns={'features': feature_columns})
print(pipe.fit_transform(data).head())
# age case education features.0 features.1 features.2 induced ...
# 0 26.0 1.0 0-5yrs -5.675901 -3.964389 -1.031570 1.0 ...
# 1 42.0 1.0 0-5yrs 10.364552 0.875251 0.773911 1.0 ...
# 2 39.0 1.0 0-5yrs 7.336117 -4.073389 1.128798 2.0 ...
# 3 34.0 1.0 0-5yrs 2.340584 -2.130528 1.248973 2.0 ...
# 4 35.0 1.0 6-11yrs 3.343876 -1.088401 -0.100063 1.0 ...
Remarks
Principal Component Analysis (PCA) is a dimensionality-reduction transform that computes the projection of the feature vector onto a low-rank subspace. It is trained using the technique described in the paper Combining Structured and Unstructured Randomness in Large Scale PCA by Nikos Karampatziakis and Paul Mineiro, and the paper Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions by N. Halko et al.
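To make the roles of rank, oversampling, center and random_state more concrete, here is a minimal NumPy sketch of the basic randomized range finder from Halko et al. It is not the nimbusml implementation: the function names `randomized_pca` and `project` are hypothetical, the argument names only mirror the constructor for illustration, and the structured-randomness refinements of Karampatziakis and Mineiro are not included.

```python
import numpy as np


def randomized_pca(X, rank=2, oversampling=2, center=True, random_state=0):
    """Approximate the top `rank` principal directions of X (rows are samples).

    Illustrative sketch only; assumes a dense, in-memory matrix.
    """
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=np.float64)

    # Optionally shift every feature to zero mean.
    mean = X.mean(axis=0) if center else np.zeros(X.shape[1])
    Xc = X - mean

    # Draw a few more random directions than requested (oversampling)
    # so that the sampled subspace captures the leading components well.
    k = min(rank + oversampling, min(Xc.shape))
    omega = rng.standard_normal((Xc.shape[1], k))

    # Orthonormal basis Q for the sampled range of Xc.
    Q, _ = np.linalg.qr(Xc @ omega)

    # Exact SVD of the small (k x n_features) matrix Q^T Xc.
    B = Q.T @ Xc
    _, _, Vt = np.linalg.svd(B, full_matrices=False)

    components = Vt[:rank]  # shape: (rank, n_features)
    return mean, components


def project(X, mean, components):
    """Project rows of X onto the low-rank subspace spanned by `components`."""
    return (np.asarray(X, dtype=np.float64) - mean) @ components.T


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    # Synthetic data that lies (up to a constant shift) in a 2-D subspace.
    X = demo_rng.standard_normal((50, 2)) @ demo_rng.standard_normal((2, 6)) + 3.0
    mean, components = randomized_pca(X, rank=2, oversampling=2)
    print(project(X, mean, components).shape)
```

In this sketch a larger oversampling value trades extra computation for a more accurate subspace, and fixing random_state makes the random projection, and therefore the result, reproducible.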
Methods
get_params | Get the parameters for this operator. |
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description |
---|---|
deep | Default value: False |