ColumnConcatenator Class

Combines several columns into a single vector-valued column.

Inheritance
nimbusml.internal.core.preprocessing.schema._columnconcatenator.ColumnConcatenator → ColumnConcatenator
nimbusml.base_transform.BaseTransform → ColumnConcatenator
sklearn.base.TransformerMixin → ColumnConcatenator

Constructor

ColumnConcatenator(columns=None, **params)

Parameters

Name Description
columns

A dictionary of key-value pairs, where the key is the output column name and the value is a list of input column names.

  • Only one key-value pair is allowed.

  • Input column type: numeric or string.

  • Output column type: Vector Type.

The << operator can be used to set this value (see Column Operator).

For example:

  • ColumnConcatenator(columns={'features': ['age', 'parity', 'induced']})

  • ColumnConcatenator() << {'features': ['age', 'parity', 'induced']}

For more details see Columns.

params

Additional arguments sent to the compute engine.

Examples


   ###############################################################################
   # ColumnConcatenator
   import numpy
   from nimbusml import FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.preprocessing.schema import ColumnConcatenator

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()

   data = FileDataStream.read_csv(path, sep=',', numeric_dtype=numpy.float32)
   print(data.head())
   #    age  case education  induced  parity  pooled.stratum  row_num  ...
   # 0  26.0   1.0    0-5yrs      1.0     6.0             3.0      1.0  ...
   # 1  42.0   1.0    0-5yrs      1.0     1.0             1.0      2.0  ...
   # 2  39.0   1.0    0-5yrs      2.0     6.0             4.0      3.0  ...
   # 3  34.0   1.0    0-5yrs      2.0     4.0             2.0      4.0  ...
   # 4  35.0   1.0   6-11yrs      1.0     3.0            32.0      5.0  ...

   # transform usage
   xf = ColumnConcatenator(columns={'features': ['age', 'parity', 'induced']})

   # fit and transform
   features = xf.fit_transform(data)

   # print features
   print(features.head())
   # 'features' is a vector-type column. When a dataset with a vector-type
   # column is the final output, the vector column is expanded into one
   # column per slot.
   #    age  case education  features.age  features.induced  features.parity  ...
   # 0  26.0   1.0    0-5yrs          26.0               1.0              6.0  ...
   # 1  42.0   1.0    0-5yrs          42.0               1.0              1.0  ...
   # 2  39.0   1.0    0-5yrs          39.0               2.0              6.0  ...
   # 3  34.0   1.0    0-5yrs          34.0               2.0              4.0  ...
   # 4  35.0   1.0   6-11yrs          35.0               1.0              3.0  ...
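
The slot expansion described in the comments of the example can be pictured with a small plain-Python sketch. This is an illustration only, not nimbusml code: the helper name expand_vector_column and the row-as-dictionary representation are assumptions made for the example, and the column order nimbusml prints may differ from the order produced here.

```python
# Conceptual sketch only: plain Python, not the nimbusml implementation.
# expand_vector_column is a hypothetical helper introduced for illustration.
def expand_vector_column(row, vector_name, slot_names):
    """Replace one vector-valued entry with one scalar entry per slot."""
    # Copy every column except the vector column itself.
    flat = {key: value for key, value in row.items() if key != vector_name}
    # Each slot becomes its own column named "<vector_name>.<slot_name>".
    for slot, value in zip(slot_names, row[vector_name]):
        flat[f"{vector_name}.{slot}"] = value
    return flat


row = {"age": 26.0, "features": [26.0, 6.0, 1.0]}
flat = expand_vector_column(row, "features", ["age", "parity", "induced"])
print(flat)
```

Each slot keeps the name of the input column it came from, which is why the printed output above shows columns such as features.age and features.parity.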

Remarks

ColumnConcatenator creates a single vector-valued column from multiple columns. Apply it to data before training a model. Concatenation can significantly speed up data processing when the number of columns is in the hundreds or thousands.
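
To make the idea of a vector-valued column concrete, here is a minimal plain-Python sketch of the concatenation itself. It is not how nimbusml implements the transform: the function concat_columns and the list-of-dictionaries data layout are assumptions chosen for illustration. The sketch takes the same {output: [inputs]} dictionary shape as the columns parameter and enforces the one-pair rule stated above.

```python
# Conceptual sketch only: plain Python, not the nimbusml implementation.
# concat_columns is a hypothetical helper introduced for illustration.
def concat_columns(rows, columns):
    """Add one vector-valued column to every row.

    rows    -- list of dicts mapping column name to value
    columns -- {output_name: [input_name, ...]}, exactly one pair
    """
    if len(columns) != 1:
        raise ValueError("only one key-value pair is allowed")
    (output_name, input_names), = columns.items()
    result = []
    for row in rows:
        new_row = dict(row)  # the original columns are kept
        new_row[output_name] = [row[name] for name in input_names]
        result.append(new_row)
    return result


rows = [
    {"age": 26.0, "parity": 6.0, "induced": 1.0},
    {"age": 42.0, "parity": 1.0, "induced": 1.0},
]
out = concat_columns(rows, {"features": ["age", "parity", "induced"]})
print(out[0]["features"])  # [26.0, 6.0, 1.0]
```

A downstream step can then read the single features entry instead of looking up each input column separately.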

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

Name Description
deep
Default value: False