Answer edit history

2018/10/23 12:33

Posted

Score 21960

answer CHANGED

@@ -80,7 +80,7 @@
 print('total chars:', len(chars))
 # Map each character to an ID
-char_to_id = dict((c, i) for i, c in enumerate(chars))
+char_indices = dict((c, i) for i, c in enumerate(chars))
 # Map each ID back to its character
 indices_char = dict((i, c) for i, c in enumerate(chars))
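
The rename in this revision matters because the vectorization loop in the answer looks characters up through `char_indices`, so a dictionary named `char_to_id` would leave that name undefined. As a minimal sketch (the sample string is the first line of `test.txt` from the answer; the `ids` and `restored` names are only for illustration), the two tables can be checked with a round trip:

```python
# Sample text: the first line of test.txt from the answer.
text = '朝霧 の 中 に 九段 の ともし 哉'
chars = sorted(set(text))

# Forward and inverse lookup tables, named as in the revised answer.
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}

# Encode the text as a list of IDs, then decode it back to a string.
ids = [char_indices[c] for c in text]
restored = ''.join(indices_char[i] for i in ids)
print(len(chars), restored == text)
```

If the decoded string matches the input, the two dictionaries are consistent inverses of each other.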

2018/10/23 12:32

Posted

tiitoi

Score 21960

answer CHANGED

@@ -42,4 +42,103 @@
 ```
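 With this change the error goes away and the code at least runs.
-However, the training itself does not seem to be going well. Natural language processing is outside my field, so I'm sorry, but I can't advise on why the model fails to learn or on whether the approach you are attempting is sound in the first place.
+However, the training itself does not seem to be going well. Natural language processing is outside my field, so I'm sorry, but I can't advise on why the model fails to learn or on whether the approach you are attempting is sound in the first place.
+## Addendum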
+```test.txt
+朝霧 の 中 に 九段 の ともし 哉
+あたたか な 雨 が 降る なり 枯葎
+菜の花 や は つと 明るき 町 は づれ
+秋風 や 伊予 へ 流る る 汐 の 音
+長閑 さ や 障子 の 穴 に 海 見え て
+若鮎 の 二 手 に なりて 上り けり
+行く 秋 を す つく と 鹿 の 立ち に けり
+我 声 の 風 に なり けり 茸狩
+毎年 よ 彼岸の入り に 寒い の は
+```
+```python
+import numpy as np
+import codecs
+from keras.layers import Activation, Dense, Input
+from keras.models import Model
+# Load the data
+with open(r'test.txt', encoding='utf-8') as f:
+    poems = f.read().splitlines()
+text = poems[0]  # first poem
+print(text)
+# Corpus length
+print('corpus length:', len(text))
+# Sort the unique characters in text so they can be counted
+chars = sorted(list(set(text)))
+# Show the number of distinct characters
+print('total chars:', len(chars))
+# Map each character to an ID
+char_to_id = dict((c, i) for i, c in enumerate(chars))
+# Map each ID back to its character
+indices_char = dict((i, c) for i, c in enumerate(chars))
+# Read the text in windows of 17 characters
+maxlen = 17
+# Step between the start positions of successive windows
+step = 3
+sentences = []
+next_chars = []
+for i in range(0, len(text) - maxlen, step):
+    sentences.append(text[i: i + maxlen])
+    next_chars.append(text[i + maxlen])
+# Show the training sequences and the character that follows each one
+print('Sequences:', sentences)
+print('next_chars:', next_chars)
+# Vectorize
+print('Vectorization...')
+x = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool)
+y = np.zeros((len(sentences), len(chars)), dtype=np.bool)
+for i, sentence in enumerate(sentences):
+    for t, char in enumerate(sentence):
+        x[i, t, char_indices[char]] = 1
+    y[i, char_indices[next_chars[i]]] = 1
+# Build the model
+print('Build model...')
+# Encoder dimension
+encoding_dim = 128
+# Input tensor
+input_word = Input(shape=(maxlen, len(chars)))
+# Holds the encoded form of the input
+encoded = Dense(128, activation='relu')(input_word)
+encoded = Dense(64, activation='relu')(encoded)
+encoded = Dense(32, activation='relu')(encoded)
+# Latent variable (in effect, principal component analysis)
+latent = Dense(8, activation='relu')(encoded)
+# Reconstruct the encoded data
+decoded = Dense(32, activation='relu')(latent)
+decoded = Dense(64, activation='relu')(decoded)
+decoded = Dense(12, activation='relu')(encoded)
+autoencoder = Model(input=input_word, output=decoded)
+# Optimize with Adam; the loss function is categorical_crossentropy
+autoencoder.compile(optimizer='Adam', loss='categorical_crossentropy')
+autoencoder.summary()
+print(x.shape)
+# Train the autoencoder
+autoencoder.fit(x, x,
+                epochs=1000,
+                batch_size=256,
+                shuffle=False)
+# Save the model architecture
+model_json = autoencoder.to_json()
+with open('keras_AE.json', 'w') as json_file:
+    json_file.write(model_json)
+# Save the trained model's weights
+autoencoder.save_weights('AE.h5')
+```
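
The windowing and one-hot vectorization steps of the listing can be tried on their own, without Keras. The following is a minimal sketch, assuming the first line of `test.txt` as input and the lookup table named `char_indices`. It uses the builtin `bool` as the dtype instead of `np.bool`, since that alias was deprecated in NumPy 1.20 and is unavailable in some later releases.

```python
import numpy as np

# Sample text: the first line of test.txt from the answer.
text = '朝霧 の 中 に 九段 の ともし 哉'
chars = sorted(set(text))
char_indices = {c: i for i, c in enumerate(chars)}

maxlen = 17  # window length, as in the listing
step = 3     # distance between the start positions of successive windows

# Cut the text into windows and record the character that follows each one.
starts = range(0, len(text) - maxlen, step)
sentences = [text[i:i + maxlen] for i in starts]
next_chars = [text[i + maxlen] for i in starts]

# One-hot encode: x[i, t, k] is True when position t of window i is character k.
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = True
    y[i, char_indices[next_chars[i]]] = True

print(x.shape, y.shape)
```

Because a single poem is only slightly longer than `maxlen`, this input yields just one window, so the network sees one training sample. Joining several lines of `test.txt` into `text` would produce more windows.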