Decoder input in an attention model

Background / what I want to achieve

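I am currently studying deep learning (sequence-to-sequence models) with TensorFlow.
https://www.tensorflow.org/beta/tutorials/text/image_captioning
I am working through the image captioning tutorial in the TensorFlow documentation (URL above),
and I have a question about the call method of the Decoder model.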

The problem / error messages

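The tutorial follows the paper "Show, Attend and Tell" and builds a caption generator from a CNN encoder and an RNN decoder with an attention mechanism.
My understanding was that the inputs to the decoder RNN at time t are x(t) and h(t-1), the RNN's output (hidden state) at the previous time step, and that once attention is introduced, h(t-1) is expressed as an attention-weighted average. Does an implementation like the one below produce the same output?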

Below is the Decoder class as listed on the tutorial page.

python
class RNN_Decoder(tf.keras.Model):
  def __init__(self, embedding_dim, units, vocab_size):
    super(RNN_Decoder, self).__init__()
    self.units = units

    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(self.units,
                                   return_sequences=True,
                                   return_state=True,
                                   recurrent_initializer='glorot_uniform')
    self.fc1 = tf.keras.layers.Dense(self.units)
    self.fc2 = tf.keras.layers.Dense(vocab_size)

    self.attention = BahdanauAttention(self.units)

  def call(self, x, features, hidden):
    # defining attention as a separate model
    context_vector, attention_weights = self.attention(features, hidden)

    # x shape after passing through embedding == (batch_size, 1, embedding_dim)
    x = self.embedding(x)

    # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
    x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

    # passing the concatenated vector to the GRU
    output, state = self.gru(x)

    # shape == (batch_size, max_length, hidden_size)
    x = self.fc1(output)

    # x shape == (batch_size * max_length, hidden_size)
    x = tf.reshape(x, (-1, x.shape[2]))

    # output shape == (batch_size * max_length, vocab)
    x = self.fc2(x)

    return x, state, attention_weights

  def reset_state(self, batch_size):
    return tf.zeros((batch_size, self.units))

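In the call method above, I follow everything up to computing the attention and the embedding (the line x = self.embedding(x)). What I do not understand is the next step, where the attention output and x are concatenated and fed into self.gru.
context_vector is the attention result computed from encoder_output (the features produced by the CNN) and the decoder_hidden from step (t-1). Also, when self.gru(x) is called, that is, when initial_state is None, the GRU appears to start from a zero vector as its initial state.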
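To make my reading of context_vector concrete, here is a minimal sketch of additive (Bahdanau-style) attention in plain Python. It is not the tutorial's BahdanauAttention class: the function name additive_attention, the weights W1, W2, v, and the toy inputs are all hypothetical values I made up for illustration, and there is no batch dimension. It only shows that the context vector is a softmax-weighted average of the encoder features, with the weights depending on the previous decoder hidden state.

```python
import math


def additive_attention(features, hidden, W1, W2, v):
    """Toy additive attention (illustrative, not the tutorial's class).

    features: list of L feature vectors, each of length D (encoder output)
    hidden:   previous decoder hidden state, length H
    W1 (A x D), W2 (A x H), v (length A): hypothetical attention weights
    """
    def matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    h_proj = matvec(W2, hidden)

    # score_i = v . tanh(W1 f_i + W2 h)
    scores = []
    for f in features:
        f_proj = matvec(W1, f)
        e = [math.tanh(a + b) for a, b in zip(f_proj, h_proj)]
        scores.append(sum(vi * ei for vi, ei in zip(v, e)))

    # softmax over the L feature locations
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]

    # context vector = attention-weighted average of the features
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return context, weights


# Made-up toy inputs, only to exercise the function.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [0.5, -0.5]
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]

context, weights = additive_attention(features, hidden, W1, W2, v)
print(weights, context)
```

Note that in this sketch the context vector has the dimensionality of the features, not of the decoder hidden state.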
My own understanding was that this should be self.gru(x, initial_state=context_vector). Does the code as written above still produce the same output?
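As a rough check of whether the initial state matters, here is a small sketch of a single GRU step in plain Python. This is a simplified textbook GRU without biases, not Keras's exact implementation, and the weight matrices, x_t, and h_prev are hypothetical values chosen by hand. Under these assumptions, starting from a zero state and starting from a non-zero state give different outputs for the same input, so I would not expect the two calls to be interchangeable in general.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def gru_step(x, h_prev, p):
    """One step of a simplified GRU (no biases; illustrative only)."""
    # update gate and reset gate
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h_prev))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h_prev))]
    # candidate state uses the reset-gated previous state
    rh = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    # new state mixes the previous state and the candidate
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, cand)]


# Hypothetical hand-picked weights: input_dim = 3, units = 2.
W = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
params = {"Wz": W, "Wr": W, "Wh": W, "Uz": U, "Ur": U, "Uh": U}

x_t = [0.2, -0.1, 0.4]   # stands in for concat(context_vector, embedding)
h_zero = [0.0, 0.0]      # what the GRU uses when no initial_state is given
h_prev = [0.7, -0.3]     # some non-zero state passed as initial_state

out_zero_state = gru_step(x_t, h_zero, params)
out_with_state = gru_step(x_t, h_prev, params)

differs = any(abs(a - b) > 1e-9 for a, b in zip(out_zero_state, out_with_state))
print(out_zero_state, out_with_state, differs)
```

One more caveat on my proposal: passing context_vector as initial_state would also require its size to match the GRU's units, which the sketch above does not address.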
If anyone here is familiar with encoder-decoder models, I would really appreciate your guidance.