CountSelector Class
Selects the features for which the count of non-default values is greater than or equal to a threshold.
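The selection rule can be illustrated with a small pure-Python sketch. It assumes that the default value of a numeric slot is 0 and that the data is a list of equal-length rows. The helper name `count_select_indices` is hypothetical and is not part of nimbusml; it only mirrors the rule stated above.

```python
def count_select_indices(rows, count=1, default=0):
    """Return indices of columns with at least `count` non-default values.

    Hypothetical helper illustrating the CountSelector rule; not nimbusml code.
    """
    if not rows:
        return []
    n_cols = len(rows[0])
    # Count, per column, how many rows hold something other than the default.
    counts = [0] * n_cols
    for row in rows:
        for j, value in enumerate(row):
            if value != default:
                counts[j] += 1
    # Keep a column if and only if its count reaches the threshold (>=, not >).
    return [j for j, c in enumerate(counts) if c >= count]


rows = [
    [0.0, 1.0, 0.0, 3.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 5.0],
]
print(count_select_indices(rows, count=1))  # [1, 2, 3]
print(count_select_indices(rows, count=2))  # [1, 3]
```

With the default `count=1`, only columns that never hold a non-default value (column 0 here) are dropped; raising the threshold also drops rarely populated columns.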
Inheritance
- nimbusml.internal.core.feature_selection._countselector.CountSelector
- nimbusml.base_transform.BaseTransform
- sklearn.base.TransformerMixin
Constructor
CountSelector(count=1, columns=None, **params)
Parameters
Name | Description |
---|---|
columns | See Columns. Default value: None |
count | The threshold for count-based feature selection. A feature is selected if and only if at least `count` examples have a non-default value in the feature. Default value: 1 |
params | Additional arguments sent to the compute engine. |
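Because CountSelector derives from sklearn.base.TransformerMixin, it follows the fit/transform pattern. The sketch below assumes, as that base class suggests, that the set of kept slots is learned during fit and then reused unchanged by transform on new data. The class `CountSelectorSketch` is hypothetical, works on plain lists, and treats 0 as the default value; it is not the nimbusml implementation.

```python
class CountSelectorSketch:
    """Hypothetical fit/transform sketch of count-based selection (not nimbusml)."""

    def __init__(self, count=1):
        self.count = count
        self.selected_ = None

    def fit(self, rows):
        # Learn which columns meet the threshold on the training rows.
        n_cols = len(rows[0]) if rows else 0
        counts = [sum(1 for row in rows if row[j] != 0) for j in range(n_cols)]
        self.selected_ = [j for j, c in enumerate(counts) if c >= self.count]
        return self

    def transform(self, rows):
        # Apply the columns chosen during fit; new data does not change them.
        if self.selected_ is None:
            raise RuntimeError("call fit before transform")
        return [[row[j] for j in self.selected_] for row in rows]

    def fit_transform(self, rows):
        # Same shape as sklearn's TransformerMixin.fit_transform: fit, then transform.
        return self.fit(rows).transform(rows)


train = [[1, 0, 0], [2, 0, 4], [3, 0, 0]]
test = [[0, 7, 0], [5, 8, 9]]
selector = CountSelectorSketch(count=2)
print(selector.fit_transform(train))  # [[1], [2], [3]]
print(selector.transform(test))       # [[0], [5]]
```

Column 1 is populated in `test`, but it is still dropped because the decision was made on `train`, where it never held a non-default value.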
Examples
###############################################################################
# CountSelector applied after OneHotHashVectorizer
from nimbusml import FileDataStream, Pipeline
from nimbusml.datasets import get_dataset
from nimbusml.feature_extraction.categorical import OneHotHashVectorizer
from nimbusml.feature_selection import CountSelector
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path, sep=',')
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
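pip = Pipeline([
    # hash 'education' into 2**2 = 4 one-hot slots in a new column 'edu'
    OneHotHashVectorizer(columns={'edu': 'education'}, number_of_bits=2),
    # keep only the 'edu' slots with at least 5 non-default values
    CountSelector(count=5, columns=['edu'])
])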
features_selection = pip.fit_transform(data)
print(features_selection.head())
# age case edu.0 edu.1 education induced parity pooled.stratum ...
# 0 26 1 0.0 1.0 0-5yrs 1 6 3 ...
# 1 42 1 0.0 1.0 0-5yrs 1 1 1 ...
# 2 39 1 0.0 1.0 0-5yrs 2 6 4 ...
# 3 34 1 0.0 1.0 0-5yrs 2 4 2 ...
# 4 35 1 1.0 0.0 6-11yrs 1 3 32 ...
Remarks
In count mode, the feature selection transform keeps a feature only if at least `count` examples have a non-default value in that feature. This makes count-based selection particularly useful together with a categorical hash transform (see also OneHotHashVectorizer): hashing can produce slots that receive no data in the examples, and CountSelector removes those slots. Setting `count` above 1 also removes slots that are populated by only a few examples.
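The combination described above can be sketched in pure Python: one-hot hash a categorical column into 2**number_of_bits slots, then drop the slots that fall below the count threshold. This is an illustration under stated assumptions, not nimbusml code: `zlib.crc32` is a stand-in hash chosen for convenience (it is not the hash OneHotHashVectorizer uses), and the helper names `one_hot_hash` and `drop_sparse_slots` are hypothetical.

```python
import zlib


def one_hot_hash(values, number_of_bits):
    """One-hot encode strings into 2**number_of_bits hashed slots.

    zlib.crc32 is a stand-in hash for illustration only; it is not the
    hash function nimbusml's OneHotHashVectorizer uses.
    """
    n_slots = 2 ** number_of_bits
    rows = []
    for value in values:
        row = [0.0] * n_slots
        row[zlib.crc32(value.encode("utf-8")) % n_slots] = 1.0
        rows.append(row)
    return rows


def drop_sparse_slots(rows, count=1):
    # Keep only the slots with at least `count` non-zero entries.
    n_slots = len(rows[0]) if rows else 0
    keep = [j for j in range(n_slots)
            if sum(1 for r in rows if r[j] != 0) >= count]
    return keep, [[r[j] for j in keep] for r in rows]


# Category strings in the style of the infert 'education' column.
education = ["0-5yrs", "0-5yrs", "6-11yrs", "12+ yrs", "6-11yrs"]
hashed = one_hot_hash(education, number_of_bits=4)
kept, reduced = drop_sparse_slots(hashed, count=1)
print(len(hashed[0]), "slots before,", len(kept), "slots after")
```

Three distinct categories can populate at most three of the 16 slots, so with `count=1` at least 13 empty slots are removed while every row keeps its single hot slot.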
Methods
Name | Description |
---|---|
get_params | Get the parameters for this operator. |
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description |
---|---|
deep | Default value: False |