GRUCell
Versioned name: GRUCell-3
Category: Sequence processing
Short description: GRUCell represents a single GRU Cell that computes the output using the formula described in the paper.
Detailed description: GRUCell computes the output Ht for the current time step based on the following formula:
Formula:
  *  - matrix multiplication
 (.) - Hadamard product(element-wise)
 [,] - concatenation
  f, g - are activation functions.
   zt = f(Xt*(Wz^T) + Ht-1*(Rz^T) + Wbz + Rbz)
   rt = f(Xt*(Wr^T) + Ht-1*(Rr^T) + Wbr + Rbr)
   ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*(Rh^T) + Rbh + Wbh) # default, when linear_before_reset = 0
   ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*(Rh^T) + Rbh)) + Wbh) # when linear_before_reset != 0
   Ht = (1 - zt) (.) ht + zt (.) Ht-1
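The formulas above can be traced with a minimal pure-Python sketch for a single batch row. It is illustrative only and is not the reference implementation: the names `matvec_t` and `gru_cell_step`, and the per-gate tuple layout of the arguments, are assumptions made for readability, and clipping is omitted.

```python
import math


def matvec_t(x, w):
    """Compute x * w^T for a vector x of length n and a matrix w of shape [m, n]."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in w]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_cell_step(x, h_prev, W, R, wb, rb,
                  linear_before_reset=False, f=sigmoid, g=math.tanh):
    """One GRU step for a single batch row (hypothetical helper).

    W  = (Wz, Wr, Wh)    input weights per gate, each [hidden_size, input_size]
    R  = (Rz, Rr, Rh)    recurrence weights per gate, each [hidden_size, hidden_size]
    wb = (Wbz, Wbr, Wbh) input biases per gate, each [hidden_size]
    rb = (Rbz, Rbr, Rbh) recurrence biases per gate, each [hidden_size]
    """
    Wz, Wr, Wh = W
    Rz, Rr, Rh = R
    Wbz, Wbr, Wbh = wb
    Rbz, Rbr, Rbh = rb
    n = len(h_prev)

    xz, xr, xh = matvec_t(x, Wz), matvec_t(x, Wr), matvec_t(x, Wh)
    hz, hr = matvec_t(h_prev, Rz), matvec_t(h_prev, Rr)

    # Update gate zt and reset gate rt.
    z = [f(xz[i] + hz[i] + Wbz[i] + Rbz[i]) for i in range(n)]
    r = [f(xr[i] + hr[i] + Wbr[i] + Rbr[i]) for i in range(n)]

    if linear_before_reset:
        # ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*(Rh^T) + Rbh)) + Wbh)
        hh = matvec_t(h_prev, Rh)
        cand = [g(xh[i] + r[i] * (hh[i] + Rbh[i]) + Wbh[i]) for i in range(n)]
    else:
        # ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*(Rh^T) + Rbh + Wbh)
        rh = matvec_t([r[i] * h_prev[i] for i in range(n)], Rh)
        cand = [g(xh[i] + rh[i] + Rbh[i] + Wbh[i]) for i in range(n)]

    # Ht = (1 - zt) (.) ht + zt (.) Ht-1
    return [(1.0 - z[i]) * cand[i] + z[i] * h_prev[i] for i in range(n)]
```

The two branches differ only in where the reset gate is applied: before the recurrent matrix multiplication in the default case, and after it (including Rbh) when linear_before_reset is set.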
Attributes
- hidden_size - Description: hidden_size specifies the size of the hidden state.
- Range of values: a positive integer
- Type: int
- Required: yes 
 
- activations - Description: activation functions for gates 
- Range of values: any combination of relu, sigmoid, tanh 
- Type: a list of strings 
- Default value: sigmoid for f, tanh for g 
- Required: no 
 
- activations_alpha, activations_beta - Description: activations_alpha and activations_beta are attributes (parameters) of the activation functions
- Range of values: a list of floating-point numbers
- Type: float[]
- Default value: None 
- Required: no 
 
- clip - Description: clip specifies the bound C; tensor values are clipped to the range [-C, C] before the activations are applied
- Range of values: a positive floating-point number
- Type: float
- Default value: infinity, which means that clipping is not applied
- Required: no 
 
- linear_before_reset - Description: linear_before_reset selects the modified GRUCell formula described in the ONNX documentation, in which the linear transformation Ht-1*(Rh^T) + Rbh is applied before the multiplication by the reset gate output.
- Range of values: true or false
- Type: boolean
- Default value: false 
- Required: no 
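As an illustration of how the clip and activations attributes interact, the following sketch clips each pre-activation value to [-C, C] and then applies the selected function. The helper names (`ACTIVATIONS`, `clip_value`, `apply_gate`) are hypothetical, and the activations_alpha and activations_beta attributes are not modeled here.

```python
import math

# The three activation names listed in the attribute description.
ACTIVATIONS = {
    "sigmoid": lambda v: 1.0 / (1.0 + math.exp(-v)),
    "tanh": math.tanh,
    "relu": lambda v: max(0.0, v),
}


def clip_value(v, c=math.inf):
    """Clamp v to [-c, c]; the default of infinity leaves v unchanged."""
    return max(-c, min(c, v))


def apply_gate(pre_activation, activation="sigmoid", clip=math.inf):
    """Clip each pre-activation value, then apply the named activation."""
    fn = ACTIVATIONS[activation]
    return [fn(clip_value(v, clip)) for v in pre_activation]
```

For example, `apply_gate(values, "relu", clip=1.0)` limits every value to [-1, 1] before relu is applied, so no output exceeds 1.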
 
Inputs
- 1: X - 2D tensor of type T and shape [batch_size, input_size], input data. Required.
- 2: initial_hidden_state - 2D tensor of type T and shape [batch_size, hidden_size]. Required.
- 3: W - 2D tensor of type T and shape [3 * hidden_size, input_size], the weights for matrix multiplication, gate order: zrh. Required.
- 4: R - 2D tensor of type T and shape [3 * hidden_size, hidden_size], the recurrence weights for matrix multiplication, gate order: zrh. Required.
- 5: B - 1D tensor of type T. If linear_before_reset is true, the shape is [4 * hidden_size]: the biases for the z and r gates are summed (weights and recurrence weights), while the two biases for the h gate are stored separately. Otherwise, the shape is [3 * hidden_size]: the sum of biases (weights and recurrence weights) for each gate. Optional.
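The shape rules above can be summarized in a small sketch that derives the expected input and output shapes and splits a packed tensor into per-gate chunks. The helpers `expected_shapes` and `split_gates` are hypothetical names. The order of the four bias chunks when linear_before_reset is set is assumed here to be z, r, Wbh, Rbh; the text above does not state it.

```python
def expected_shapes(batch_size, input_size, hidden_size, linear_before_reset=False):
    """Return the shapes described in the Inputs and Outputs sections."""
    # B holds 4 chunks when the h-gate biases are stored separately, else 3.
    bias_parts = 4 if linear_before_reset else 3
    return {
        "X": [batch_size, input_size],
        "initial_hidden_state": [batch_size, hidden_size],
        "W": [3 * hidden_size, input_size],
        "R": [3 * hidden_size, hidden_size],
        "B": [bias_parts * hidden_size],
        "Ho": [batch_size, hidden_size],
    }


def split_gates(rows, hidden_size, parts=3):
    """Split the leading dimension into `parts` chunks of hidden_size entries.

    For W and R the chunks follow gate order z, r, h. For B with parts=4,
    the chunk order (z, r, Wbh, Rbh) is an assumption of this sketch.
    """
    if len(rows) != parts * hidden_size:
        raise ValueError("leading dimension must equal parts * hidden_size")
    return [rows[i * hidden_size:(i + 1) * hidden_size] for i in range(parts)]
```

With batch_size 1, input_size 16, hidden_size 128, and linear_before_reset enabled, `expected_shapes` gives [384, 16] for W, [384, 128] for R, and [512] for B.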
Outputs
- 1: Ho - 2D tensor of type T and shape [batch_size, hidden_size], the last output value of the hidden state.
Types
- T: any supported floating-point type. 
Example
<layer ... type="GRUCell" ...>
    <data hidden_size="128" linear_before_reset="1"/>
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>16</dim>
        </port>
        <port id="1">
            <dim>1</dim>
            <dim>128</dim>
        </port>
        <port id="2">
            <dim>384</dim>
            <dim>16</dim>
        </port>
        <port id="3">
            <dim>384</dim>
            <dim>128</dim>
        </port>
        <port id="4">
            <dim>512</dim>
        </port>
    </input>
    <output>
        <port id="5">
            <dim>1</dim>
            <dim>128</dim>
        </port>
    </output>
</layer>