CountSelector Class

Selects the features for which the count of non-default values is greater than or equal to a threshold.

Inheritance
nimbusml.internal.core.feature_selection._countselector.CountSelector -> CountSelector
nimbusml.base_transform.BaseTransform -> CountSelector
sklearn.base.TransformerMixin -> CountSelector

Constructor

CountSelector(count=1, columns=None, **params)

Parameters

Name Description
columns

See Columns.

count

The threshold for count-based feature selection. A feature is selected if and only if at least count examples have a non-default value in the feature. The default value is 1.

params

Additional arguments sent to the compute engine.

Examples


   ###############################################################################
   # CountSelector
   from nimbusml import FileDataStream, Pipeline
   from nimbusml.datasets import get_dataset
   from nimbusml.feature_extraction.categorical import OneHotHashVectorizer
   from nimbusml.feature_selection import CountSelector

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()
   data = FileDataStream.read_csv(path, sep=',')
   print(data.head())
   #   age  case education  induced  parity  ... row_num  spontaneous  ...
   # 0   26     1    0-5yrs        1       6 ...       1            2  ...
   # 1   42     1    0-5yrs        1       1 ...       2            0  ...
   # 2   39     1    0-5yrs        2       6 ...       3            0  ...
   # 3   34     1    0-5yrs        2       4 ...       4            0  ...
   # 4   35     1   6-11yrs        1       3 ...       5            1  ...


   pip = Pipeline([
       OneHotHashVectorizer(columns={'edu': 'education'}, number_of_bits=2),
       CountSelector(count=5, columns=['edu'])
   ])
   features_selection = pip.fit_transform(data)
   print(features_selection.head())
   #   age  case  edu.0  edu.1 education  induced  parity  pooled.stratum  ...
   # 0   26     1    0.0    1.0    0-5yrs        1       6               3  ...
   # 1   42     1    0.0    1.0    0-5yrs        1       1               1  ...
   # 2   39     1    0.0    1.0    0-5yrs        2       6               4  ...
   # 3   34     1    0.0    1.0    0-5yrs        2       4               2  ...
   # 4   35     1    1.0    0.0   6-11yrs        1       3              32  ...

Remarks

In count mode, the feature selection transform selects a feature if at least the specified count of examples have a non-default value in that feature. Count-based selection is particularly useful when applied together with a categorical hash transform (see also OneHotHashVectorizer): it can remove the features generated by the hash transform that have no data in the examples.
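The selection rule described above can be sketched in plain Python. This is an illustration of the rule only, not nimbusml's implementation: the function name `count_select`, the row-list data layout, and the choice of `0.0` as the default value are all assumptions made for this sketch.

```python
def count_select(rows, count=1, default=0.0):
    """Return the indices of columns that have at least `count`
    values different from `default` (hypothetical helper)."""
    if not rows:
        return []
    n_cols = len(rows[0])
    non_default = [0] * n_cols
    # Count, per column, how many examples hold a non-default value.
    for row in rows:
        for j, value in enumerate(row):
            if value != default:
                non_default[j] += 1
    # Keep a column only if its count reaches the threshold.
    return [j for j, n in enumerate(non_default) if n >= count]


# Four hashed slots: slot 2 never receives a value, slot 3 receives one.
rows = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]
kept_at_1 = count_select(rows, count=1)  # drops only the empty slot
kept_at_2 = count_select(rows, count=2)  # also drops the slot seen once
```

With `count=1` only the slot that never received data is removed, which corresponds to the hash-transform case described above; raising the threshold also removes sparsely populated slots.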

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

Name Description
deep
Default value: False