text Package

Reference

Packages

extractor
stopwords

Classes

LightLda	The LDA transform implements LightLDA, a state-of-the-art implementation of Latent Dirichlet Allocation.
NGramExtractor	Produces a bag of counts of n-grams (sequences of consecutive values of length 1-n) in a given vector of keys. It does so by building a dictionary of n-grams and using the id in the dictionary as the index in the bag.
NGramFeaturizer	Text transforms that can be performed on data before training a model.
Sentiment	Scores natural language text and assesses the probability that the sentiment is positive.
WordEmbedding	The Word Embeddings transform is a text featurizer that converts vectors of text tokens into sentence vectors using a pre-trained model. Note: WordEmbedding requires a column containing a text vector, e.g. <'This', 'is', 'good'>, so users need to create an input column by either concatenating columns with TX type, or using the output_tokens_column_name of NGramFeaturizer() to convert a column with sentences like "This is good" into <'This', 'is', 'good'>. For example, after NGramFeaturizer runs, features named ngram.__ are generated. A new column named ngram_TransformedText is also created with the text vector, similar to running .split(' '). However, because this column has variable length, it cannot be properly converted to a pandas dataframe, so any pipeline or transform that outputs this text vector column will throw errors. When ngram_TransformedText is used as the input to WordEmbedding, the ngram_TransformedText column is overwritten by the output from WordEmbedding, which is named ngram_TransformedText.__
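
The bag-of-counts idea described for NGramExtractor can be illustrated with a minimal pure-Python sketch. This is not the nimbusml implementation and does not call its API; the function name ngram_bag and its max_n parameter are hypothetical, and ids are assumed to be assigned in order of first appearance. It only shows how a dictionary of n-grams can supply the index into the bag.

```python
from collections import Counter


def ngram_bag(keys, max_n=2):
    """Return (dictionary, bag) for all n-grams of length 1..max_n in `keys`.

    Hypothetical helper for illustration only; not part of nimbusml.
    """
    dictionary = {}     # n-gram tuple -> id, assigned on first appearance
    counts = Counter()  # id -> number of occurrences
    for n in range(1, max_n + 1):
        for start in range(len(keys) - n + 1):
            gram = tuple(keys[start:start + n])
            gram_id = dictionary.setdefault(gram, len(dictionary))
            counts[gram_id] += 1
    # The bag is a dense vector indexed by dictionary id.
    bag = [counts[i] for i in range(len(dictionary))]
    return dictionary, bag


# Tokenize a sentence the same way .split(' ') would.
tokens = "this is good this is".split(" ")
dictionary, bag = ngram_bag(tokens, max_n=2)
print(dictionary)
print(bag)
```

Here the unigrams receive ids first, followed by the bigrams, so position i of the bag holds the count of the n-gram whose dictionary id is i.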