Text-generation program using Keras does not behave as expected

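Background / what I want to achieve

I am writing a program that generates classical Chinese poetry word by word, using the Keras sample code as a reference, but when I run it, it does not behave the way the sample code does.

How can I make it behave like the sample code? A detailed explanation would be appreciated.

For reference, "quantangshi_data.txt", the file the source code reads, contains about 100,000 characters of classical Chinese poetry in total, in the following form: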

暧暧去尘昏灞岸飞飞轻盖指河梁云峰衣结千重叶雪岫花开几树妆深悲黄鹤孤舟远独叹青山别路长聊将分袂沾巾泪还用持添离席觞神皋福地...
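
The script further down treats this text as a stream of "words" by cutting it into chunks of 2, 2, and 3 characters. A minimal sketch of that split, assuming the corpus consists of seven-character lines as the excerpt suggests (the helper name split_223 and the variable sample_text are illustrative and do not appear in the script):

```python
from itertools import cycle


def split_223(line_text, pattern=(2, 2, 3)):
    """Cut text into consecutive chunks whose lengths cycle through pattern."""
    chunks = []
    pos = 0
    for size in cycle(pattern):
        if pos >= len(line_text):
            break
        chunks.append(line_text[pos:pos + size])
        pos += size
    return chunks


# First 14 characters of the excerpt above (two seven-character lines).
sample_text = '暧暧去尘昏灞岸飞飞轻盖指河梁'
print(split_223(sample_text))
```

Unlike the loop in the script, this sketch never appends an empty chunk at the end, so it has no need to pop the last element; the two may differ when the text length is not a multiple of seven.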

Problem / error message

Running it produces the following output (truncated partway because it is very long).

Using TensorFlow backend.
データ単語数: 15000
データセット総数: 14997
ベクトルに変換...
モデルの作成...
Epoch 1/3
2019-01-06 15:51:36.297132: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-01-06 15:51:37.137953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.176
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 1.64GiB
2019-01-06 15:51:37.143904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-06 15:51:38.316119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-06 15:51:38.319944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2019-01-06 15:51:38.321841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2019-01-06 15:51:38.323895: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1394 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
  128/14997 [..............................] - ETA: 8:27 - loss: 9.6157
----- 生成時までに完了したEpoch数: 0
----- diversity: 0.2
----- 最初の句または単語:"照雪下玉关虏箭"
照雪下玉关虏箭白帝浪淘暖风徐籍中谁人会电跃天上桥北莫谩烟波侧空房薄暮
----- diversity: 0.5
----- 最初の句または単語:"照雪下玉关虏箭"
照雪下玉关虏箭白纻霜气今如此景福两不见和烟娇波发薄昔为绣被掩袖童稚
  256/14997 [..............................] - ETA: 4:36 - loss: 9.6155
----- 生成時までに完了したEpoch数: 1
----- diversity: 0.2
----- 最初の句または単語:"不少钱能骑骏马"
不少钱能骑骏马衔石遗星辰腰间子卿回朱轮衔红巾迁客襄王潮水白虬行人将开
----- diversity: 0.5
----- 最初の句または単語:"不少钱能骑骏马"
不少钱能骑骏马一身忆昔作天地此时箫竽衫袖一自角弓古人怨胡天燕雁协奏
C:\Users\ユーザー名\AppData\Local\conda\conda\envs\tensorflow_gpu\lib\site-packages\keras\callbacks.py:122: UserWarning: Method on_batch_end() is slow compared to the batch update (0.272442). Check your callbacks.
  % delta_t_median)
  384/14997 [..............................] - ETA: 3:15 - loss: 9.6158
----- 生成時までに完了したEpoch数: 2
----- diversity: 0.2
----- 最初の句または単語:"青蛾尚未衰莫道"
青蛾尚未衰莫道忆蛾眉十二年坐北堂学画眉不意少知音无所忧多下泪女儿六月寒似中万涂侵
----- diversity: 0.5
----- 最初の句または単語:"青蛾尚未衰莫道"
青蛾尚未衰莫道王母制北胡挝鼓须臾留腾兮在上为人子饮一杯生红纬柳叶月明刘白
C:\Users\ユーザー名\AppData\Local\conda\conda\envs\tensorflow_gpu\lib\site-packages\keras\callbacks.py:122: UserWarning: Method on_batch_end() is slow compared to the batch update (0.223871). Check your callbacks.
  % delta_t_median)
  512/14997 [>.............................] - ETA: 2:35 - loss: 9.6152
----- 生成時までに完了したEpoch数: 3
----- diversity: 0.2
----- 最初の句または単語:"钟鼓馔玉不足贵"
钟鼓馔玉不足贵游人漾春色白日服药走红鸣珂心里是时银汉低圣德远幼妹梅香
----- diversity: 0.5
----- 最初の句または単語:"钟鼓馔玉不足贵"
钟鼓馔玉不足贵春光少在人间颜色秋壁花骢独能久共笑滩急决汉红英长日冲风
C:\Users\ユーザー名\AppData\Local\conda\conda\envs\tensorflow_gpu\lib\site-packages\keras\callbacks.py:122: UserWarning: Method on_batch_end() is slow compared to the batch update (0.224871). Check your callbacks.
  % delta_t_median)

/// (similar output repeats many times) ///

----- 生成時までに完了したEpoch数: 117
----- diversity: 0.2
----- 最初の句または単語:"牧马胡雏小日暮"
牧马胡雏小日暮千里千里当时杨柳当时年年将军年年少年年年年年长安
----- diversity: 0.5
----- 最初の句または単語:"牧马胡雏小日暮"
牧马胡雏小日暮千里千里芙蓉柳带杨柳生前千里凤凰少妇盛银罂谁数天子
C:\Users\ユーザー名\AppData\Local\conda\conda\envs\tensorflow_gpu\lib\site-packages\keras\callbacks.py:122: UserWarning: Method on_batch_end() is slow compared to the batch update (0.236926). Check your callbacks.
  % delta_t_median)
14997/14997 [==============================] - 41s 3ms/step - loss: 8.7363
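
One number in the log above can be checked by hand. A softmax that spreads probability evenly over N classes has a categorical cross-entropy of ln N, and ln 15000 is about 9.6158, which is close to the loss reported for the first batches. A minimal check with the standard library, assuming the class count equals the word count printed at the top of the log (the variable names are illustrative):

```python
import math

# Taken from the "データ単語数: 15000" line of the log.
num_classes = 15000

# Cross-entropy of a uniform prediction over num_classes classes is ln(num_classes).
uniform_loss = math.log(num_classes)
print(round(uniform_loss, 4))
```

This is consistent with the script, where len(words) is the number of tokens (15000) rather than the number of distinct words, so the one-hot width and the output layer are as wide as the corpus is long.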

Relevant source code (run on Anaconda)

The number of epochs is set to 3 just to check that the program runs.

Python3
# coding: utf-8
from keras.callbacks import LambdaCallback
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.optimizers import RMSprop
import numpy as np
import random
import sys
import io
import gc
import os


# Remove memmap files left over from a previous run
x_exists = os.path.exists('./x')
y_exists = os.path.exists('./y')
if x_exists and y_exists:
    os.remove('./x')
    os.remove('./y')


# Load the data
Path = './quantangshi_data.txt'
with io.open(Path, 'r', encoding='utf-8') as f:
    text = f.read()
    chars = 35000
    rawtext = text[0:chars]


# Split the data into chunks of 2, 2, and 3 characters
text = []
start = 0
end = 0
count = 1
while end <= len(rawtext):
    if count % 3 != 0:
        end += 2
        text.append(rawtext[start:end])
        start += 2
        count += 1
    else:
        end += 3
        text.append(rawtext[start:end])
        start += 3
        count += 1
text.pop()
print('データ単語数:', len(text))    # "Number of words in data"


del rawtext, Path
gc.collect()


words = text
word_indices = dict((w, i) for i, w in enumerate(words))    # word -> index dictionary
indices_word = dict((i, w) for i, w in enumerate(words))    # index -> word (reverse lookup) dictionary


# Build the dataset
maxlen = 3
step = 1
sentences = []
next_words = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_words.append(text[i + maxlen])
print('データセット総数:', len(sentences))    # "Total number of samples"


# Convert to one-hot representation
print('ベクトルに変換...')    # "Converting to vectors..."
mode = 'w+'
x = np.memmap('x', dtype=bool, mode=mode, shape=(len(sentences), maxlen, len(words)))
y = np.memmap('y', dtype=bool, mode=mode, shape=(len(sentences), len(words)))
for i, sentence in enumerate(sentences):
    for t, word in enumerate(sentence):
        x[i, t, word_indices[word]] = 1
    y[i, word_indices[next_words[i]]] = 1


# Build the model
print('モデルの作成...')    # "Building the model..."
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(words))))
model.add(Dense(len(words), activation='softmax'))

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)


# Given the array of word probabilities, pick the word to output and return its index
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)


# Print generated text at the end of each epoch
def on_epoch_end(epoch, _):
    print()
    # "Epochs completed at generation time"
    print('----- 生成時までに完了したEpoch数: %d' % epoch)

    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += ''.join(sentence)
        # "Seed phrase or word"
        print('----- 最初の句または単語:"' + ''.join(sentence) + '"')
        sys.stdout.write(generated)

        for i in range(12):
            x_pred = np.zeros((1, maxlen, len(words)))
            for t, word in enumerate(sentence):
                x_pred[0, t, word_indices[word]] = 1.

            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_word = indices_word[next_index]

            generated += next_word
            sentence = sentence[1:]
            sentence.append(next_word)

            sys.stdout.write(next_word)
            sys.stdout.flush()
        print()


print_callback = LambdaCallback(on_batch_end=on_epoch_end)


# Fit the model
model.fit(x, y,
          batch_size=128,
          epochs=3,
          callbacks=[print_callback])
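
The sample() function in the script divides the log-probabilities by the diversity value and renormalises them before drawing a word. The sketch below reproduces only that rescaling step with the standard library, on a made-up three-word distribution, to show how the two diversity values used in the script reshape the probabilities (rescale and toy are illustrative names, and the numbers in toy are invented for the example):

```python
import math


def rescale(preds, temperature=1.0):
    """Return the temperature-adjusted distribution that sample() would draw from."""
    scaled = [math.exp(math.log(p) / temperature) for p in preds]
    total = sum(scaled)
    return [s / total for s in scaled]


# Hypothetical probabilities for three candidate words.
toy = [0.5, 0.3, 0.2]

for temperature in (0.2, 0.5, 1.0):
    adjusted = rescale(toy, temperature)
    print(temperature, [round(p, 3) for p in adjusted])
```

Lower values push almost all of the probability onto the most likely word, which may be why generated text at diversity 0.2 tends to repeat a small set of words once the model has learned anything at all.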


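1 answer

Self-resolved

Changing the line that creates print_callback as follows made the program behave as expected.
It turns out I had mistyped the keyword argument: on_batch_end instead of on_epoch_end, so the generation function ran after every batch rather than after every epoch.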

python3
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)
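
To see why this one keyword matters, the sketch below counts how often each hook would fire for the sizes reported in the question (14997 samples, batch size 128, 3 epochs). It is a simplified stand-in for a training loop, not Keras itself: CountingHooks and simulate_fit are hypothetical names, and no model is trained.

```python
import math


class CountingHooks:
    """Hypothetical stand-in for a callback; counts how often each hook fires."""

    def __init__(self):
        self.batch_calls = 0
        self.epoch_calls = 0

    def on_batch_end(self, batch, logs=None):
        self.batch_calls += 1

    def on_epoch_end(self, epoch, logs=None):
        self.epoch_calls += 1


def simulate_fit(num_samples, batch_size, epochs, hooks):
    """Simplified training loop: no model, only the order of hook calls."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    for epoch in range(epochs):
        for batch in range(batches_per_epoch):
            hooks.on_batch_end(batch)   # fires once per batch
        hooks.on_epoch_end(epoch)       # fires once per epoch
    return batches_per_epoch


hooks = CountingHooks()
batches_per_epoch = simulate_fit(14997, 128, 3, hooks)
print(batches_per_epoch, hooks.batch_calls, hooks.epoch_calls)
```

With 118 batches per epoch, a function registered under on_batch_end runs 354 times over three epochs instead of 3, and its first argument is the batch index. That would explain why the counter labelled as an epoch count in the question's log reaches 117 within a single epoch, since batch indices start at 0.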

Posted 2019/01/13 13:38

taka104n0

