WordEmbedding Class
The WordEmbedding transform is a text featurizer that converts vectors of text tokens into sentence vectors using a pre-trained embedding model.
Note
WordEmbedding requires a column containing a text vector, e.g.
<'This', 'is', 'good'>. Users can create such an input column by:
- concatenating columns with TX type, or
- using the output_tokens_column_name of NGramFeaturizer() to convert a column with sentences like "This is good" into <'This', 'is', 'good'>.
In the following example, after the NGramFeaturizer, features named ngram.* are generated. A new column named ngram_TransformedText is also created with the text vector, similar to running .split(' '). Due to the variable length of this column, it cannot be properly converted to a pandas dataframe, so any pipeline/transform that outputs this text vector column will throw errors. However, when ngram_TransformedText is used as the input to WordEmbedding, the ngram_TransformedText column is overwritten by the output from WordEmbedding. The output from WordEmbedding is named ngram_TransformedText.*
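The text vector described above is essentially a whitespace tokenization of the sentence. A minimal plain-Python sketch of the equivalent preprocessing (illustrative only, not part of nimbusml):

```python
# Illustrative sketch: the text vector that output_tokens_column_name
# produces is similar to splitting the sentence on spaces.
sentence = "This is good"
tokens = sentence.split(' ')
print(tokens)  # ['This', 'is', 'good']
```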
- Inheritance
-
nimbusml.internal.core.feature_extraction.text._wordembedding.WordEmbedding
nimbusml.base_transform.BaseTransform
sklearn.base.TransformerMixin
Constructor
WordEmbedding(model_kind='SentimentSpecificWordEmbedding', custom_lookup_table=None, columns=None, **params)
Parameters
Name | Description |
---|---|
columns
|
A dictionary of key-value pairs, where the key is the output column name and the value is the input column name. The << operator can also be used to set this value (see Column Operator). For more details see Columns. |
model_kind
|
Pre-trained model used to create the vocabulary. Available options are: 'GloVe50D', 'GloVe100D', 'GloVe200D', 'GloVe300D', 'GloVeTwitter25D', 'GloVeTwitter50D', 'GloVeTwitter100D', 'GloVeTwitter200D', 'FastTextWikipedia300D', 'SentimentSpecificWordEmbedding'. |
custom_lookup_table
|
Filename for custom word embedding model. |
params
|
Additional arguments sent to compute engine. |
Examples
###############################################################################
# WordEmbedding: pre-trained DNN model for text.
from nimbusml import FileDataStream, Pipeline
from nimbusml.datasets import get_dataset
from nimbusml.feature_extraction.text import NGramFeaturizer, WordEmbedding
from nimbusml.feature_extraction.text.extractor import Ngram
# data input (as a FileDataStream)
path = get_dataset('wiki_detox_train').as_filepath()
data = FileDataStream.read_csv(path, sep='\t')
print(data.head())
# Sentiment SentimentText
# 0 1 ==RUDE== Dude, you are rude upload that carl p...
# 1 1 == OK! == IM GOING TO VANDALIZE WILD ONES WIK...
# 2 1 Stop trolling, zapatancas, calling me a liar m...
# 3 1 ==You're cool== You seem like a really cool g...
# 4 1 ::::: Why are you threatening me? I'm not bein...
# transform usage
pipeline = Pipeline([
NGramFeaturizer(word_feature_extractor=Ngram(), output_tokens_column_name='ngram_TransformedText',
columns={'ngram': ['SentimentText']}),
WordEmbedding(columns='ngram_TransformedText')
])
# fit and transform
features = pipeline.fit_transform(data)
# print features
print(features.head())
# Sentiment ... ngram.douchiest ngram.award.
# 0 1 ... 0.0 0.0
# 1 1 ... 0.0 0.0
# 2 1 ... 0.0 0.0
# 3 1 ... 0.0 0.0
# 4 1 ... 0.0 0.0
Remarks
The WordEmbedding transform wraps different pre-trained embedding models, such as Sentiment Specific Word Embedding (SSWE). Users can specify which embedding to use; the available options are various versions of the GloVe models, FastText, and SSWE.
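Conceptually, a word-embedding featurizer looks up a fixed vector for each token and pools those vectors into one fixed-length sentence vector. The toy sketch below uses a hypothetical 3-dimensional lookup table and mean pooling purely for illustration; the actual pre-trained models have much higher dimensionality and the transform's pooling strategy may differ:

```python
# Toy sketch (hypothetical made-up vectors, not a real pre-trained model):
# map each token to a fixed vector, then mean-pool into a sentence vector.
lookup = {
    'This': [0.1, 0.2, 0.3],
    'is':   [0.0, 0.1, 0.0],
    'good': [0.5, 0.4, 0.6],
}

def sentence_vector(tokens):
    # Collect the per-token vectors (skipping out-of-vocabulary tokens)
    # and average them component-wise.
    vectors = [lookup[t] for t in tokens if t in lookup]
    dim = len(next(iter(lookup.values())))
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

print(sentence_vector(['This', 'is', 'good']))
```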
Methods
get_params |
Get the parameters for this operator. |
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description |
---|---|
deep
|
If True, return the parameters of this operator and of its contained subobjects. Default value: False
|