NGramExtractor Class

Description

Produces a bag of counts of n-grams (sequences of consecutive values of length 1 to n) in a given vector of keys. It does so by building a dictionary of n-grams and using each n-gram's id in the dictionary as its index in the bag.
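The dictionary-and-bag idea can be illustrated with a minimal pure-Python sketch. The helper names `ngrams_up_to` and `bag_of_counts` are hypothetical and not part of nimbusml, and the id assignment order (first seen, first numbered) is an assumption made for illustration only.

```python
from collections import Counter

def ngrams_up_to(keys, n):
    """Yield every run of consecutive keys of length 1..n as a tuple."""
    for length in range(1, n + 1):
        for start in range(len(keys) - length + 1):
            yield tuple(keys[start:start + length])

def bag_of_counts(keys, n, dictionary):
    """Count n-grams, indexing each count by the n-gram's dictionary id.

    `dictionary` maps n-gram -> id and grows as unseen n-grams appear
    (an assumption of this sketch, not a statement about the library).
    """
    counts = Counter()
    for gram in ngrams_up_to(keys, n):
        # setdefault hands the next free id to an n-gram seen for the first time
        idx = dictionary.setdefault(gram, len(dictionary))
        counts[idx] += 1
    return counts

dictionary = {}
bag = bag_of_counts([3, 7, 3, 7], n=2, dictionary=dictionary)
print(dictionary)
print(dict(bag))
```

In this sketch the bag is sparse: only ids of n-grams that actually occur in the input carry a count.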

Inheritance
nimbusml.internal.core.feature_extraction.text._ngramextractor.NGramExtractor → NGramExtractor
nimbusml.base_transform.BaseTransform → NGramExtractor
sklearn.base.TransformerMixin → NGramExtractor

Constructor

NGramExtractor(ngram_length=2, all_lengths=True, skip_length=0, max_num_terms=[10000000], weighting='Tf', columns=None, **params)

Parameters

Name Description
columns

See Columns.

ngram_length

Maximum n-gram length.

all_lengths

Whether to store n-grams of all lengths up to ngram_length, or only n-grams of length ngram_length.

skip_length

Maximum number of tokens to skip when constructing an n-gram.

max_num_terms

Maximum number of n-grams to store in the dictionary.

weighting

The weighting criterion applied to the n-gram counts. The default is 'Tf' (term frequency).

params

Additional arguments sent to the compute engine.
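How ngram_length, all_lengths, and skip_length interact can be sketched in plain Python. The helper `extract_ngrams` below is hypothetical and not part of nimbusml. It assumes that skip_length bounds the total number of tokens skipped inside a single n-gram, which is one reading of the description above; max_num_terms and weighting are not modeled.

```python
from itertools import combinations

def extract_ngrams(tokens, ngram_length=2, all_lengths=True, skip_length=0):
    """Return n-grams as tuples (illustrative sketch, not the library's code)."""
    # all_lengths=True keeps lengths 1..ngram_length; otherwise only ngram_length
    lengths = range(1, ngram_length + 1) if all_lengths else [ngram_length]
    grams = []
    for length in lengths:
        for start in range(len(tokens)):
            # The last token may sit at most (length - 1 + skip_length)
            # positions after the first one, so at most skip_length
            # tokens are skipped in total.
            stop = min(len(tokens), start + length + skip_length)
            for rest in combinations(range(start + 1, stop), length - 1):
                grams.append(tuple(tokens[i] for i in (start, *rest)))
    return grams

tokens = ["a", "b", "c"]
default_grams = extract_ngrams(tokens)
bigrams_only = extract_ngrams(tokens, all_lengths=False)
skip_bigrams = extract_ngrams(tokens, all_lengths=False, skip_length=1)
print(default_grams)
print(bigrams_only)
print(skip_bigrams)
```

With skip_length=1 the sketch adds the pair ("a", "c"), which skips over "b", to the two contiguous bigrams.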

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

Name Description
deep

Default value: False