The same Keras code raises either IndexError or AttributeError

I am trying to write a deep autoencoder that learns from text.
However, I get the following output:

C:\Users\yudai\Desktop\keras_AE.py:62: UserWarning: Update your `Model` call to the Keras 2 API: `Model(inputs=Tensor("in..., outputs=Tensor("de...)`
  autoencoder = Model(input=input_word, output=decoded)
Traceback (most recent call last):
  File "C:\Users\yudai\Desktop\keras_AE.py", line 70, in <module>
    shuffle=False)
  File "C:\Users\yudai\Anaconda3\envs\pyMLgpu\lib\site-packages\keras\engine\training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "C:\Users\yudai\Anaconda3\envs\pyMLgpu\lib\site-packages\keras\engine\training_arrays.py", line 139, in fit_loop
    if issparse(ins[i]) and not K.is_sparse(feed[i]):
IndexError: list index out of range

If anyone knows the cause, I would be very grateful for your guidance.
I have also asked this question on Stack Overflow.
This is a multi-post. Sorry about that.

Addendum:
Following https://github.com/keras-team/keras/issues/7602, I changed

autoencoder = Model(input=input_word, output=decoded)

to

autoencoder = Model(inputs=input_word, output=decoded)

However, the same error occurs.

On a different Windows 10 PC with

python 3.6.5
tensorflow 1.8.0
keras 2.1.5

the same code produces a different error:

C:\Users\hoge\Desktop\keras_AE.py:62: UserWarning: Update your `Model` call to the Keras 2 API: `Model(inputs=Tensor("in..., outputs=Tensor("de...)`
  autoencoder = Model(input=input_word, output=decoded)
Traceback (most recent call last):
  File "C:\Users\hoge\Desktop\keras_AE.py", line 70, in <module>
    shuffle=False)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\keras\engine\training.py", line 1630, in fit
    batch_size=batch_size)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\keras\engine\training.py", line 1487, in _standardize_user_data
    in zip(y, sample_weights, class_weights, self._feed_sample_weight_modes)]
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\keras\engine\training.py", line 1486, in <listcomp>
    for (ref, sw, cw, mode)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\keras\engine\training.py", line 540, in _standardize_weights
    return np.ones((y.shape[0],), dtype=K.floatx())
AttributeError: 'NoneType' object has no attribute 'shape'

python
# -*- coding: utf-8 -*-
from keras.layers import Input, Dense
from keras.layers.core import Activation
from keras.models import Model
from keras.utils.data_utils import get_file
import numpy as np
import codecs

# Load the data
with codecs.open(r'C:\Users\yudai\Desktop\poem.txt', 'r', 'utf-8') as f:
    for text in f:
        text = text.strip()
# Corpus length
print('corpus length:', len(text))
# Sorted list of the distinct characters in text
chars = sorted(list(set(text)))
# Print the number of distinct characters
print('total chars:', len(chars))
# Map characters to IDs
char_indices = dict((c, i) for i, c in enumerate(chars))
# Map IDs back to characters
indices_char = dict((i, c) for i, c in enumerate(chars))
# Read the text in windows of 17 characters
maxlen = 17
# Stride between consecutive windows
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
# Print the number of sequences to train on
print('Sequences:', len)

# Vectorize
print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool)
y = np.zeros((len(sentences), len(chars)), dtype=np.bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

# Build the model
print('Build model...')
# Encoder dimension
encoding_dim = 128
# Input tensor
input_word = Input(shape=(maxlen, len(chars)))
# Holds the encoded representation of the input
encoded = Dense(128, activation='relu')(input_word)
encoded = Dense(64, activation='relu')(encoded)
encoded = Dense(32, activation='relu')(encoded)
# Latent variables (effectively a principal component analysis)
latent = Dense(8, activation='relu')(encoded)
# Reconstruct the encoded data
decoded = Dense(32, activation='relu')(latent)
decoded = Dense(64, activation='relu')(decoded)
decoded = Dense(128, activation='relu')(encoded)

output = Dense(100, activation='relu')

autoencoder = Model(input=input_word, output=decoded)
# Optimize with Adam; the loss function is categorical_crossentropy
autoencoder.compile(optimizer='Adam', loss='categorical_crossentropy')

# Train the autoencoder
autoencoder.fit(x,
                epochs=1000,
                batch_size=256,
                shuffle=False)
# Monitor training progress
def on_epoch_end(epochs):
    print()
    print('Epoch: %d' % epochs)

# Save the model architecture
model_json = autoencoder.to_json()
with open('keras_AE.json', 'w') as json_file:
    json_file.write(model_json)
# Save the trained weights
autoencoder.save_weights('AE.h5')

decoded_word = autoencoder.predict(word_test)

X_embedded = model.predict(X_train)
autoencoder.fit(X_embedded,X_embedded,epochs=10,
            batch_size=256, validation_split=.1)

C:\Users\yudai\Desktop\poem.txt contains 29,000 haiku collected from the web and morphologically analyzed with MeCab.
Example:
朝霧の中に九段のともし哉
あたたかな雨が降るなり枯葎
菜の花やはつと明るき町はづれ
秋風や伊予へ流るる汐の音
長閑さや障子の穴に海見えて
若鮎の二手になりて上りけり
行く秋をすつくと鹿の立ちにけり
我声の風になりけり茸狩
毎年よ彼岸の入りに寒いのは
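
One detail of the loading loop in the script above is worth checking against this data: `for text in f: text = text.strip()` rebinds `text` on every pass, so after the loop only the last line of the file remains. The sketch below illustrates this with `io.StringIO` and three of the example lines as a stand-in for poem.txt; the stand-in and the `poems` list are assumptions for illustration, not part of the original script.

```python
import io

# Stand-in for poem.txt: three of the example lines, one haiku per line.
sample = "朝霧の中に九段のともし哉\nあたたかな雨が降るなり枯葎\n毎年よ彼岸の入りに寒いのは\n"

# Same pattern as the script: the variable is overwritten on each pass.
with io.StringIO(sample) as f:
    for text in f:
        text = text.strip()
print(text)  # only the last line is left

# Keeping every non-empty line instead:
with io.StringIO(sample) as f:
    poems = [line.strip() for line in f if line.strip()]
print(len(poems))
```

If the whole corpus is meant to be used, the character set and the windows would need to be built from all of `poems`, not from a single line.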

# Environment
Windows 10

python 3.7.0
tensorflow-gpu 1.9.0
keras 2.2.4

tiitoi

2018/10/23 08:37

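Since this is a problem with the input data fed to the model, I think it is hard to answer without the contents of poem.txt (a few lines are enough if the file is large).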

yep

2018/10/23 09:12

Sorry. I have added it.

yep

2018/10/23 10:28 (edited)

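To check whether the input data is the problem, I tried several variants of the poem.txt shown above: kanji with spaces, kanji without spaces, hiragana with spaces, and hiragana without spaces, each with and without line breaks. All of them produce the same error.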


1 answer

Best answer

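First, an autoencoder is trained with the same data as both its input and its target.
The input is therefore also the ground truth. Because you call fit(x) with only the input, the target is None and you get the error 'NoneType' object has no attribute 'shape'.

Call it as follows instead.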

autoencoder.fit(x, x,
                epochs=1000,
                batch_size=256,
                shuffle=False)
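
Why the missing target surfaces as an AttributeError can be reproduced without Keras. The sketch below is a stand-in, not Keras's code: `standardize_weights_sketch` is a hypothetical name, and its body only mirrors the `np.ones((y.shape[0],), ...)` line quoted in the second traceback. When `fit(x)` is called without a target, `y` is `None`, and reading `.shape` from it fails.

```python
import numpy as np


def standardize_weights_sketch(y):
    """Hypothetical stand-in mirroring the line shown in the traceback.

    It builds one sample weight per target row, so it needs `y.shape`.
    """
    return np.ones((y.shape[0],), dtype="float32")


x = np.zeros((1, 17, 12), dtype=bool)  # same shape as the input in this thread

# With a target (for an autoencoder, the input itself) the call succeeds.
weights = standardize_weights_sketch(x)
print(weights.shape)

# Without a target, y is None and the attribute lookup fails.
try:
    standardize_weights_sketch(None)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```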

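Next, since you want the model to output the same data as its input, the output layer should not be

decoded = Dense(128, activation='relu')(encoded)

Its last dimension must match that of the input (len(chars), which is 12 for my test data), as follows.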

decoded = Dense(12, activation='relu')(encoded)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_7 (InputLayer)         (None, 17, 12)            0         
_________________________________________________________________
dense_48 (Dense)             (None, 17, 128)           1664      
_________________________________________________________________
dense_49 (Dense)             (None, 17, 64)            8256      
_________________________________________________________________
dense_50 (Dense)             (None, 17, 32)            2080      
_________________________________________________________________
dense_54 (Dense)             (None, 17, 12)            396       
=================================================================
Total params: 12,396
Trainable params: 12,396
Non-trainable params: 0
_________________________________________________________________
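
The summary can be checked by hand. A `Dense` layer applied to a 3-D input acts on the last axis only, so each layer has `in_dim * units + units` parameters and the 17 time steps pass through unchanged. The sketch below uses plain NumPy as a stand-in for the layers; `dense_param_count` is a hypothetical helper and the zero-valued weights are placeholders.

```python
import numpy as np


def dense_param_count(in_dim, units):
    """Weights (in_dim * units) plus one bias per unit."""
    return in_dim * units + units


# Layer widths from the summary: 12 -> 128 -> 64 -> 32 -> 12
widths = [12, 128, 64, 32, 12]
counts = [dense_param_count(a, b) for a, b in zip(widths, widths[1:])]
print(counts)       # [1664, 8256, 2080, 396]
print(sum(counts))  # 12396

# A dense layer on a (batch, 17, 12) input multiplies along the last axis.
batch = np.zeros((1, 17, 12))
kernel = np.zeros((12, 128))  # placeholder weights
bias = np.zeros(128)
out = batch @ kernel + bias
print(out.shape)    # (1, 17, 128)
```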

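With this change the error goes away and the code at least runs.
However, the training itself does not seem to be working well. Natural language processing is outside my field, so I am sorry to say that I cannot advise on why the model fails to learn or on whether the approach itself is sound.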

Addendum

test.txt
朝霧 の 中 に 九段 の ともし 哉
あたたか な 雨 が 降る なり 枯葎
菜の花 や は つと 明るき 町 は づれ
秋風 や 伊予 へ 流る る 汐 の 音
長閑 さ や 障子 の 穴 に 海 見え て
若鮎 の 二 手 に なりて 上り けり
行く 秋 を す つく と 鹿 の 立ち に けり
我 声 の 風 に なり けり 茸狩
毎年 よ 彼岸の入り に 寒い の は

python
import numpy as np
from keras.layers import Dense, Input
from keras.models import Model

# Load the data
with open(r'test.txt', encoding='utf-8') as f:
    poems = f.read().splitlines()
text = poems[0]  # first poem only
print(text)

# Corpus length
print('corpus length:', len(text))

# Sorted list of the distinct characters in text
chars = sorted(list(set(text)))

# Print the number of distinct characters
print('total chars:', len(chars))

# Map characters to IDs
char_indices = dict((c, i) for i, c in enumerate(chars))

# Map IDs back to characters
indices_char = dict((i, c) for i, c in enumerate(chars))

# Read the text in windows of 17 characters
maxlen = 17
# Stride between consecutive windows
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
# Print the windows and the character that follows each one
print('Sequences:', sentences)
print('next_chars:', next_chars)

# Vectorize (one-hot encode each character of each window)
print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

# Build the model
print('Build model...')
# Encoder dimension
encoding_dim = 128
# Input tensor
input_word = Input(shape=(maxlen, len(chars)))
# Holds the encoded representation of the input
encoded = Dense(128, activation='relu')(input_word)
encoded = Dense(64, activation='relu')(encoded)
encoded = Dense(32, activation='relu')(encoded)
# Latent variables (effectively a principal component analysis)
latent = Dense(8, activation='relu')(encoded)
# Reconstruct the encoded data
decoded = Dense(32, activation='relu')(latent)
decoded = Dense(64, activation='relu')(decoded)
# NOTE: this layer is fed from `encoded`, so the latent and decoder layers
# above are not part of the model (this matches the summary shown earlier).
# The unit count must equal the input's last dimension: len(chars), 12 here.
decoded = Dense(len(chars), activation='relu')(encoded)
# Keras 2 keyword names; `input=`/`output=` trigger the UserWarning
autoencoder = Model(inputs=input_word, outputs=decoded)
# Optimize with Adam; the loss function is categorical_crossentropy
autoencoder.compile(optimizer='Adam', loss='categorical_crossentropy')
autoencoder.summary()

print(x.shape)
# Train the autoencoder: x is both the input and the target
autoencoder.fit(x, x,
                epochs=1000,
                batch_size=256,
                shuffle=False)

# Save the model architecture
model_json = autoencoder.to_json()
with open('keras_AE.json', 'w') as json_file:
    json_file.write(model_json)
# Save the trained weights
autoencoder.save_weights('AE.h5')
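
The `12` in the `Dense(12, ...)` line suggested earlier in this answer equals `len(chars)` only for the first line of test.txt. A minimal sketch of the vectorization step, assuming the same sliding-window scheme as the script, shows where that number comes from; `vectorize` is a hypothetical helper, not part of Keras.

```python
import numpy as np


def vectorize(text, maxlen=17, step=3):
    """One-hot encode sliding windows of `text`.

    Returns (x, chars); x has shape (num_windows, maxlen, len(chars)).
    """
    chars = sorted(set(text))
    char_indices = {c: i for i, c in enumerate(chars)}
    windows = [text[i:i + maxlen] for i in range(0, len(text) - maxlen, step)]
    x = np.zeros((len(windows), maxlen, len(chars)), dtype=bool)
    for i, window in enumerate(windows):
        for t, char in enumerate(window):
            x[i, t, char_indices[char]] = True
    return x, chars


# First line of test.txt from this thread (19 characters including spaces).
text = "朝霧 の 中 に 九段 の ともし 哉"
x, chars = vectorize(text)
print(len(text), len(chars), x.shape)  # 19 12 (1, 17, 12)

# Width to use for the last Dense layer instead of a fixed number.
output_units = x.shape[-1]
```

Deriving the unit count from `x.shape[-1]` keeps the output width in step with whatever text is loaded, which avoids a shape mismatch between the output layer and the target when the data has a different number of distinct characters.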

Posted 2018/10/23 11:27

Edited 2018/10/23 12:33

tiitoi

Total score 21960

yep

2018/10/23 12:00

Sorry. I now get the following output:
UserWarning: Update your `Model` call to the Keras 2 API: `Model(inputs=Tensor("in..., outputs=Tensor("de...)`
  autoencoder = Model(inputs=input_word, output=decoded)
Traceback (most recent call last):
  File "C:\Users\yudai\Desktop\keras_AE.py", line 71, in <module>
    shuffle=False)
  File "C:\Users\yudai\Anaconda3\envs\pyMLgpu\lib\site-packages\keras\engine\training.py", line 952, in fit
    batch_size=batch_size)
  File "C:\Users\yudai\Anaconda3\envs\pyMLgpu\lib\site-packages\keras\engine\training.py", line 789, in _standardize_user_data
    exception_prefix='target')
  File "C:\Users\yudai\Anaconda3\envs\pyMLgpu\lib\site-packages\keras\engine\training_utils.py", line 138, in standardize_input_data
    str(data_shape))
ValueError: Error when checking target: expected dense_7 to have shape (17, 12) but got array with shape (17, 10)

yep

2018/10/23 12:01

tiitoi, could you tell me a little more about the environment and data you used?

tiitoi

2018/10/23 12:05

For now, I have posted the code that I ran.

yep

2018/10/23 12:28

I am sorry to keep running into errors. tiitoi, are you on macOS? I will try on a Mac next. On Ubuntu 18.04 and Windows 10 I get:
Using TensorFlow backend.
朝霧の中に九段のともし哉
corpus length: 19
total chars: 12
Sequences: ['朝霧の中に九段のともし']
next_chars: [' ']
Vectorization...
Traceback (most recent call last):
  File "/home/yudai/Desktop/keras_AE.py", line 46, in <module>
    x[i, t, char_indices[char]] = 1
NameError: name 'char_indices' is not defined

tiitoi

2018/10/23 12:34

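I am on Ubuntu 16.04. I had renamed some variables along the way and leftovers remained, which left the code in a state where it would not run. My apologies. I have reverted it, so how does it work now?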

yep

2018/10/23 12:43

Wow! Amazing! I am so impressed!

yep

2018/10/23 13:42

Thank you so much for your patient help through all the repeated errors. I also learned a lot from you about how to deal with errors.
