text Package
Packages
extractor | |
stopwords |
Classes
LightLda |
The LDA transform implements LightLDA, a state-of-the-art implementation of Latent Dirichlet Allocation. |
NGramExtractor |
Produces a bag of counts of n-grams (sequences of consecutive values of length 1-n) in a given vector of keys. It does so by building a dictionary of n-grams and using the id in the dictionary as the index in the bag. |
NGramFeaturizer |
Text transforms that can be performed on data before training a model. |
Sentiment |
Scores natural language text and assesses the probability that the sentiment is positive. |
WordEmbedding |
Word Embeddings transform is a text featurizer that converts vectors of text tokens into sentence vectors using a pre-trained model. Note: WordEmbedding requires a column containing a text vector, e.g. <'This', 'is', 'good'>, so users need to create an input column by either concatenating columns with TX type, or using the output_tokens_column_name for NGramFeaturizer() to convert a column with sentences like "This is good" into <'This', 'is', 'good'>. In the following example, after the NGramFeaturizer, features named ngram.__ are generated. A new column named ngram_TransformedText is also created with the text vector, similar to running .split(' '). Because this column has variable length, it cannot be properly converted to a pandas dataframe, so any pipeline or transform that outputs this text vector column will throw errors. However, when ngram_TransformedText is used as the input to WordEmbedding, the ngram_TransformedText column is overwritten by the output from WordEmbedding, which is named ngram_TransformedText.__ |
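
The WordEmbedding note above contrasts a variable-length token column with the fixed-length output that replaces it. The following is a minimal plain-Python sketch of that idea and does not use the nimbusml API. The names `TOY_EMBEDDINGS`, `tokenize`, and `sentence_vector` are hypothetical, the embedding values are made up, and averaging the token vectors is an assumption chosen for illustration. The source does not say how WordEmbedding combines token vectors.

```python
# Toy 3-dimensional embeddings. Hypothetical values, not taken from any
# pre-trained model shipped with nimbusml.
TOY_EMBEDDINGS = {
    "this": [1.0, 0.0, 0.0],
    "is": [0.0, 1.0, 0.0],
    "good": [0.0, 0.0, 1.0],
}
DIM = 3


def tokenize(sentence):
    """Split on single spaces, mirroring the .split(' ') analogy in the text."""
    return sentence.split(' ')


def sentence_vector(tokens):
    """Average the vectors of known tokens (mean pooling is an assumption)."""
    vectors = [
        TOY_EMBEDDINGS[t.lower()] for t in tokens if t.lower() in TOY_EMBEDDINGS
    ]
    if not vectors:
        # No known token: fall back to a zero vector of the same length.
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


sentences = ["This is good", "This is", "good"]

# Token rows have different lengths, so they do not fit a rectangular table.
token_rows = [tokenize(s) for s in sentences]
token_lengths = [len(row) for row in token_rows]

# After pooling, every row has exactly DIM entries.
sentence_rows = [sentence_vector(row) for row in token_rows]
vector_lengths = [len(vec) for vec in sentence_rows]

print(token_lengths)
print(vector_lengths)
```

In this sketch the token rows are ragged while the pooled rows all share one length, which is the property that lets the overwritten column be represented as ordinary numeric features.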