TensorFlow model.predict() error

Background / what I want to achieve

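I want to give the model an image as input and have it output text in the writing style it was trained on.
The model, which has already finished training, takes two inputs and produces one output.
Input details: input_1: (4096,) and input_2: (199,),
which are the image feature vector and the word vector, respectively.
As far as I could check, the shapes of the data look fine.
Most likely the problem is in how I am passing the two inputs to the model.
Could someone tell me how to solve this?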

Problem / error message

The error is raised at the model.predict call in the source code below.

Error when checking input: expected input_1 to have shape (4096,) but got array with shape (1,)
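One possible reading of this message, as a minimal sketch: Keras treats the first axis of every input array as the batch axis, so a 1-D array of length 4096 may be read as 4096 samples of one value each rather than one sample of 4096 values. The helper `per_sample_shape` below is hypothetical (it is not a Keras function) and only imitates that interpretation with NumPy; the zero array is a placeholder for the real feature vector.

```python
import numpy as np


def per_sample_shape(batch: np.ndarray) -> tuple:
    """Shape of one sample when axis 0 is read as the batch axis.

    Hypothetical helper for illustration only, not part of Keras.
    """
    if batch.ndim == 1:
        # Assumption: a 1-D array is handled as N samples of a single value.
        batch = np.expand_dims(batch, 1)
    return batch.shape[1:]


# Placeholder standing in for the real image feature vector.
image_feature = np.zeros(4096, dtype=np.float32)

print(per_sample_shape(image_feature))                 # (1,)
print(per_sample_shape(image_feature[np.newaxis, :]))  # (4096,)
```

If this reading is right, the array itself has the expected length, but it is missing a leading batch axis of size 1.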

Relevant source code

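# Assumed imports (not shown in the original excerpt)
from numpy import argmax
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Method of the captioning class (the class definition is omitted)
def Inference(self, test_image):
    # Extract the image feature vector, shape (4096,)
    image_feature = self.GetImageFeature(test_image, self.feature_extractor)
    text = "startseq"
    for i in range(self.max_length):
        # Encode the text generated so far and pad it to max_length
        seq = self.tokenizer.texts_to_sequences([text])[0]
        seq = pad_sequences([seq], maxlen=self.max_length)
        yhat = self.model.predict([image_feature, seq[0]])  # the error is raised here
        yhat = argmax(yhat)
        word = self.IDToWord(yhat)  # word token ID -> word (Japanese)
        if word is None:
            break
        text += " " + word
        if word == "endseq":
            break
    return text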

What I tried

1.
print(image_feature.shape, seq[0].shape) prints (4096,) (199,).
This matches model.summary(), so the shapes of the data appear to be correct.
2.
model.predict({'input_1': image_feature, 'input_2': seq[0]})
This raised exactly the same error.
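A possible workaround, sketched under the assumption that the model expects each input with a leading batch axis, i.e. (1, 4096) and (1, 199) for a single image. The helper `add_batch_axis` is hypothetical, and the zero arrays are placeholders with the shapes reported above. I have not confirmed that this resolves the error with the actual model.

```python
import numpy as np


def add_batch_axis(x) -> np.ndarray:
    """Return x with a leading batch axis of size 1 if x is 1-D (hypothetical helper)."""
    x = np.asarray(x)
    return x[np.newaxis, ...] if x.ndim == 1 else x


# Placeholders with the shapes reported in the question.
image_feature = np.zeros(4096, dtype=np.float32)
# pad_sequences([seq], maxlen=199) returns a 2-D array of shape (1, 199).
seq = np.zeros((1, 199), dtype=np.int32)

image_batch = add_batch_axis(image_feature)  # shape (1, 4096)
seq_batch = add_batch_axis(seq[0])           # shape (1, 199), same as passing seq itself

print(image_batch.shape, seq_batch.shape)

# With the trained model, the two call styles would then look like:
# yhat = self.model.predict([image_batch, seq_batch])
# yhat = self.model.predict({'input_1': image_batch, 'input_2': seq_batch})
```

Since `pad_sequences` already returns a batched array, passing `seq` instead of `seq[0]` should have the same effect for the second input; only `image_feature` would need the extra axis.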