WordEmbedding Class

The WordEmbedding transform is a text featurizer that converts vectors of text tokens into sentence vectors using a pre-trained model.

Note

As WordEmbedding requires a column with a text vector, e.g. <'This', 'is', 'good'>, users need to create an input column by:

  • concatenating columns with TX type,

  • or using the output_tokens_column_name for NGramFeaturizer() to convert a column with sentences like "This is good" into <'This', 'is', 'good'>.

In the following example, after the NGramFeaturizer, features named ngram.__ are generated. A new column named ngram_TransformedText is also created with the text vector, similar to running .split(' ').

However, due to the variable length of this column, it cannot be properly converted to a pandas dataframe, so any pipeline or transform that outputs this text vector column will throw errors. When ngram_TransformedText is instead used as the input to WordEmbedding, the ngram_TransformedText column is overwritten by the output from WordEmbedding. The output from WordEmbedding is named ngram_TransformedText.__
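
To make the shape of the expected input concrete, the following sketch uses plain Python only (nimbusml is not involved) to mimic the kind of token vectors the note describes. The sample sentences are illustrative; it only shows why such a column has variable length.

```python
# Plain-Python illustration only; nimbusml is not used here.
sentences = ["This is good", "Stop trolling"]

# Each sentence becomes a vector of text tokens, similar to .split(' ').
token_vectors = [s.split(' ') for s in sentences]

# The vectors differ in length, which is why such a column does not map
# cleanly onto fixed-width pandas dataframe columns.
lengths = [len(tokens) for tokens in token_vectors]

print(token_vectors)
print(lengths)
```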

Inheritance
nimbusml.internal.core.feature_extraction.text._wordembedding.WordEmbedding
WordEmbedding
nimbusml.base_transform.BaseTransform
WordEmbedding
sklearn.base.TransformerMixin
WordEmbedding

Constructor

WordEmbedding(model_kind='SentimentSpecificWordEmbedding', custom_lookup_table=None, columns=None, **params)

Parameters

Name Description
columns

A dictionary of key-value pairs, where the key is the output column name and the value is the input column name.

  • Only one key-value pair is allowed.

  • Input column type:

    Vector Type.

  • Output column type:

    Vector Type.

  • If the output column name is the same as the input column name, then simply specify columns as a string.

The << operator can be used to set this value (see Column Operator).

For example:

  • WordEmbedding(columns={'out1': 'input1'})

  • WordEmbedding() << {'out1': 'input1'}

For more details see Columns.

model_kind

Pre-trained model used to create the vocabulary. Available options are: 'GloVe50D', 'GloVe100D', 'GloVe200D', 'GloVe300D', 'GloVeTwitter25D', 'GloVeTwitter50D', 'GloVeTwitter100D', 'GloVeTwitter200D', 'FastTextWikipedia300D', 'SentimentSpecificWordEmbedding'.

custom_lookup_table

Filename for custom word embedding model.

params

Additional arguments sent to compute engine.

Examples


   ###############################################################################
   # WordEmbedding: pre-trained DNN model
   # for text.
   from nimbusml import FileDataStream, Pipeline
   from nimbusml.datasets import get_dataset
   from nimbusml.feature_extraction.text import NGramFeaturizer, WordEmbedding
   from nimbusml.feature_extraction.text.extractor import Ngram

   # data input (as a FileDataStream)
   path = get_dataset('wiki_detox_train').as_filepath()
   data = FileDataStream.read_csv(path, sep='\t')
   print(data.head())
   #   Sentiment                                      SentimentText
   # 0          1  ==RUDE== Dude, you are rude upload that carl p...
   # 1          1  == OK! ==  IM GOING TO VANDALIZE WILD ONES WIK...
   # 2          1  Stop trolling, zapatancas, calling me a liar m...
   # 3          1  ==You're cool==  You seem like a really cool g...
   # 4          1  ::::: Why are you threatening me? I'm not bein...

   # transform usage
   pipeline = Pipeline([
       NGramFeaturizer(word_feature_extractor=Ngram(), output_tokens_column_name='ngram_TransformedText',
                       columns={'ngram': ['SentimentText']}),

       WordEmbedding(columns='ngram_TransformedText')
   ])

   # fit and transform
   features = pipeline.fit_transform(data)

   # print features
   print(features.head())
   #   Sentiment  ...       ngram.douchiest  ngram.award.
   # 0          1 ...                   0.0           0.0
   # 1          1 ...                   0.0           0.0
   # 2          1 ...                   0.0           0.0
   # 3          1 ...                   0.0           0.0
   # 4          1 ...                   0.0           0.0

Remarks

WordEmbedding wraps different embedding models, such as Sentiment Specific Word Embedding (SSWE). Users can specify which embedding to use through the model_kind parameter. The available options are various versions of GloVe models, FastText, and SSWE.
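
This page does not spell out how per-token vectors are combined into a sentence vector. As a rough illustration of the general idea, the sketch below pools token vectors with element-wise min, mean, and max and concatenates the results. The pooling scheme, the toy_lookup table, its values, and the sentence_vector helper are all assumptions made for illustration; the actual behavior of WordEmbedding may differ.

```python
# Toy lookup table with made-up 2-dimensional vectors (hypothetical values,
# not taken from GloVe, FastText, or SSWE).
toy_lookup = {
    "this": [1.0, 0.0],
    "is": [0.0, 2.0],
    "good": [2.0, 4.0],
}


def sentence_vector(tokens, lookup):
    """Pool per-token vectors into one fixed-length vector (min, mean, max).

    Hypothetical helper: an assumed pooling scheme, not the nimbusml API.
    """
    # Tokens missing from the lookup table are skipped in this sketch.
    vectors = [lookup[t.lower()] for t in tokens if t.lower() in lookup]
    if not vectors:
        dim = len(next(iter(lookup.values())))
        return [0.0] * (3 * dim)
    columns = list(zip(*vectors))  # one tuple per embedding dimension
    mins = [min(c) for c in columns]
    means = [sum(c) / len(c) for c in columns]
    maxs = [max(c) for c in columns]
    return mins + means + maxs


vec = sentence_vector(["This", "is", "good"], toy_lookup)
print(vec)
```

Under this assumed scheme the output length depends only on the embedding dimension, not on the number of tokens, which is what lets a variable-length token vector become a fixed-width set of feature columns.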

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

Name Description
deep
Default value: False