Running a training program with tensorflow 0.12.1 fails with "TypeError: cannot use a bytes pattern on a string-like object"

Background / what I want to achieve

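Using Python 3.5.4, tensorflow 0.12.1, and janome 0.3.10, I tried to train a model as described in the following articles:
https://oimeg.blogspot.jp/2016/11/tensorflow_80.html
https://oimeg.blogspot.com/2016/11/tensorflow_15.html
When I ran it, the error below occurred.
I'm a beginner and haven't worked with this much, so I don't know how to fix it.
I need to get this training running within a fixed period and am quite pressed for time, so any help would be greatly appreciated.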

Problem / error message

C:\Users\○○>cd Documents

C:\Users\○○\Documents>Python chatbot.py
Preparing LINE talk data in line_talk_data
Tokenizing data in line_talk_data\line_talk_train.out
Traceback (most recent call last):
  File "chatbot.py", line 311, in <module>
    tf.app.run()
  File "C:\Users\○○\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\platform\app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "chatbot.py", line 308, in main
    train()
  File "chatbot.py", line 154, in train
    FLAGS.data_dir, FLAGS.en_vocab_size, FLAGS.fr_vocab_size)
  File "C:\Users\○○\Documents\data_utils.py", line 228, in prepare_line_talk_data
    data_to_token_ids(train_path + ".out", out_train_ids_path, out_vocab_path, tokenizer)
  File "C:\Users\○○\Documents\data_utils.py", line 193, in data_to_token_ids
    normalize_digits)
  File "C:\Users\○○\Documents\data_utils.py", line 159, in sentence_to_token_ids
    words = basic_tokenizer(sentence)
  File "C:\Users\○○\Documents\data_utils.py", line 61, in basic_tokenizer
    words.extend(_WORD_SPLIT.split(space_separated_fragment))
TypeError: cannot use a bytes pattern on a string-like object

Source code used

Reference
https://oimeg.blogspot.com/2016/11/tensorflow_37.html
https://github.com/tensorflow/tensorflow/blob/0.12.1/tensorflow/models/rnn/translate/seq2seq_model.py

Additional information (framework/tool versions, etc.)

Language: Python 3.5.4
Other tools:

janome (0.3.10)
numpy (1.14.0)
pip (9.0.1)
protobuf (3.14.0)
setuptools (28.8.0)
six (1.15.0)
tensorflow (0.12.1)
wheel (0.36.2)

jbpb0

2021/01/24 06:12

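The author of https://oimeg.blogspot.com/2016/11/tensorflow_37.html uses a Mac, so the training data files extracted from the LINE chat history were probably encoded in UTF-8. Also, the code you are trying to run seems to open and read the training data files in binary mode, so on Windows you may need to create those files in UTF-8 as well. Check the character encoding of the four training data files you created (*.in, *.out), and if they are not UTF-8, try converting them to UTF-8.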
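A minimal sketch of the check suggested above, assuming each file is either already UTF-8 or Shift_JIS (cp932, the usual Windows Japanese encoding). The helpers `is_utf8` and `convert_to_utf8` are hypothetical names, not part of the tutorial code. Both work on raw bytes, so they do not depend on how an editor labels the file.

```python
from pathlib import Path


def is_utf8(path):
    """Return True if the file's raw bytes decode cleanly as UTF-8."""
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def convert_to_utf8(path, source_encoding="cp932"):
    """Re-encode a file in place from source_encoding to UTF-8 without a BOM.

    cp932 (Windows Shift_JIS) is an assumed source encoding; pass another if needed.
    """
    p = Path(path)
    text = p.read_bytes().decode(source_encoding)
    p.write_bytes(text.encode("utf-8"))


# Usage (file names taken from the traceback; adjust to your layout):
# for name in ("line_talk_train.in", "line_talk_train.out",
#              "line_talk_dev.in", "line_talk_dev.out"):
#     path = "line_talk_data/" + name
#     if not is_utf8(path):
#         convert_to_utf8(path)
```

A file containing only half-width ASCII characters is byte-for-byte identical in UTF-8 and Shift_JIS, so an editor may label it either way; `is_utf8` returns True for such a file.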

sho12

2021/01/24 06:24 (edited)

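For now, I opened the four files in TeraPad and used "save with specified character encoding / line endings". The encoding was UTF-8N, so I saved them as UTF-8. The line endings, by the way, were CR+LF.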

jbpb0

2021/01/24 06:30

Line endings on Linux and recent Macs are LF, so you might as well match that too: https://qiita.com/Dace_K/items/76a1873ed4ab327254b5

sho12

2021/01/24 06:34

I saved them with LF.

jbpb0

2021/01/24 06:39

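> The encoding was UTF-8N, so I saved them as UTF-8.
I overlooked this part earlier. What Python calls UTF-8 is what TeraPad calls UTF-8N (UTF-8 without a BOM). Please change them back to UTF-8N, but keep the line endings as LF to match the Mac. That said, if the files were UTF-8N all along, the encoding does not seem to be the cause of the error.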
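To bring the files to the form recommended above (UTF-8 without a BOM, which TeraPad calls UTF-8N, plus LF line endings), a sketch like the following could rewrite them in place. `normalize_utf8n_lf` is a hypothetical helper, and it assumes the content is already valid UTF-8. One reason the BOM matters when the files are read in binary mode: its three bytes `\xef\xbb\xbf` would stick to the first word of the first line and likely turn it into a separate vocabulary entry.

```python
import codecs
from pathlib import Path


def normalize_utf8n_lf(path):
    """Rewrite a file in place as BOM-less UTF-8 ("UTF-8N") with LF line endings.

    Works on raw bytes and assumes the content is already UTF-8.
    Returns the number of CR+LF pairs that were converted.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]  # drop the 3-byte UTF-8 BOM
    crlf_count = data.count(b"\r\n")
    data = data.replace(b"\r\n", b"\n")  # CR+LF -> LF
    p.write_bytes(data)
    return crlf_count
```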

sho12

2021/01/24 06:45

Same error (with UTF-8N).

sho12

2021/01/24 07:02 (edited)

Come to think of it, for the word segmentation (wakati-gaki) I first tried to follow https://oimeg.blogspot.com/2016/11/tensorflow_80.html, but I probably did something wrong and got an error, so I based it on https://gist.github.com/erigithub/1b960ebbedd0a63ea2700d083739fe4f instead and ran:

    from janome.tokenizer import Tokenizer
    book = open(r"C:/Users/○○/Documents/入力部分.txt", "rt", encoding="utf-8")
    text = book.read()
    tok = Tokenizer()
    with open(r"C:/Users/○○/Documents/line_talk.in.txt", "w") as fp:
        for token in tok.tokenize(text, wakati=True):
            fp.write(str(token))
            fp.write("\n")

Then, as was pointed out before, I manually rearranged the output from one word per line into one sentence per line. Could that be related?

sho12

2021/01/24 06:59 (edited)

For reference, this is what happens when it fails (reproduced with an .in file containing a copy of the same text as the input text):

    >>> from janome.tokenizer import Tokenizer
    >>> with open('C:/Users/○○/Documents/take.in', mode = 'w') as fw:
    ...     t = Tokenizer()
    ...     for line in inputs:
    ...         tokens = t.tokenize(line)
    ...         line = ' '.join([token.surface for token in tokens]).encode('utf-8') + '\n'
    ...         fw.write(line)
    ...
    Traceback (most recent call last):
      File "<stdin>", line 3, in <module>
    NameError: name 'inputs' is not defined

jbpb0

2021/01/24 07:03 (edited)

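> I manually rearranged it from vertical into one horizontal line
By "one horizontal line" you mean that each sentence of the LINE chat is on one line, and different sentences are on separate lines, right? That aside:
> with open(r"C:/Users/1718081/Documents/line_talk.in.txt", "w") as fp:
So you did add ".txt" yourself after all! That was the cause of https://teratail.com/questions/317894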

sho12

2021/01/24 07:19 (edited)

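Sorry. Back then I simply added .txt, segmented a text file, and wrote the output to another text file.
> By "one horizontal line" you mean that each sentence of the LINE chat is on one line, and different sentences are on separate lines, right?
Yes. I put the words of one sentence on one line, separated by spaces, and other sentences go on other lines. Example: 今日は晴だよ ○○ ○○○ ○○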

sho12

2021/01/24 07:09 (edited)

By the way, for the failed segmentation reproduced above, I did make it an .in file before running it.

sho12

2021/01/24 08:01 (edited)

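I was told that the segmentation method in https://oimeg.blogspot.com/2016/11/tensorflow_80.html fails because inputs isn't set up, but I don't understand how to do that. So even now that I know how to get rid of the .txt extension, I still can't segment text with that method.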

jbpb0

2021/01/24 08:02

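To isolate the cause, replace the four training files with files that contain no Japanese, run it again, and check whether the same error appears. Use only half-width letters or digits in the files. The sentences don't need to mean anything; only the format inside the file has to be right. As in segmented text, put a few half-width spaces inside each line. You don't need many lines. Use LF line endings.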
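The isolation test described above could be prepared with a sketch like this, which writes four ASCII-only files with space-separated dummy words and LF line endings. The file names follow the traceback (`line_talk_train.in/.out`, `line_talk_dev.in/.out`); the function name `make_dummy_corpus`, the random word generator, and the default 27/3 line counts (taken from the test the asker later ran, with identical .in and .out contents) are assumptions. It overwrites files in `data_dir`, so point it at a copy or back up the real data first.

```python
import os
import random
import string


def make_dummy_corpus(data_dir, n_train=27, n_dev=3, seed=0):
    """Write line_talk_{train,dev}.{in,out} with random ASCII words (identical .in/.out)."""
    rng = random.Random(seed)

    def dummy_sentence():
        # 1-4 lowercase "words" of 2-8 letters, separated by half-width spaces
        return " ".join(
            "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(2, 8)))
            for _ in range(rng.randint(1, 4))
        )

    os.makedirs(data_dir, exist_ok=True)
    for split, n_lines in (("train", n_train), ("dev", n_dev)):
        lines = [dummy_sentence() for _ in range(n_lines)]
        for ext in ("in", "out"):
            path = os.path.join(data_dir, "line_talk_{}.{}".format(split, ext))
            # newline="\n" keeps LF line endings even on Windows
            with open(path, "w", encoding="ascii", newline="\n") as f:
                f.write("\n".join(lines) + "\n")
```

For example, `make_dummy_corpus("line_talk_data_test")` creates the four files in a separate folder, which can then be copied over the real ones for the test run.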

jbpb0

2021/01/24 08:10 (edited)

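> NameError: name 'inputs' is not defined
I think you also need to open() the input file from before segmentation and use it as inputs, in the same way as "fw" here:
> with open('C:/Users/○○/Documents/take.in', mode = 'w') as fw:
Open the input file with mode='r'. That said, if you already have files in the correct format, any other method is fine too.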
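A sketch of the fix described above: open the pre-segmentation file for reading as `inputs`, next to `fw`, and write one sentence per line with words separated by half-width spaces. `wakati_file` and its `tokenize` parameter are hypothetical; passing the tokenizer in keeps the sketch runnable without janome. Two deliberate differences from the snippet in the question: in Python 3, `.encode('utf-8') + '\n'` would itself raise a TypeError (bytes and str cannot be concatenated), so the line is written as str to a file opened with `encoding="utf-8"`; and blank lines are kept rather than skipped, on the assumption that the .in and .out files must stay aligned line by line.

```python
def wakati_file(src_path, dst_path, tokenize):
    """Segment each line of src_path and write it to dst_path.

    tokenize: callable that takes a str and returns a list of word strings.
    Output: UTF-8 without a BOM, LF line endings, one sentence per line.
    """
    # "utf-8-sig" reads UTF-8 input with or without a BOM
    with open(src_path, mode="r", encoding="utf-8-sig") as inputs, \
            open(dst_path, mode="w", encoding="utf-8", newline="\n") as fw:
        for line in inputs:
            words = tokenize(line.strip())
            # keep empty lines so line N still pairs with line N of the other file
            fw.write(" ".join(words) + "\n")


# Usage with janome, as in the question (assumes janome is installed):
# from janome.tokenizer import Tokenizer
# t = Tokenizer()
# wakati_file("input.txt", "line_talk_data/line_talk_train.in",
#             lambda s: [token.surface for token in t.tokenize(s)])
```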

sho12

2021/01/24 08:22 (edited)

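I'm not sure if it's related, but even after saving only the two dev files with LF, when I reopened them to check, they were CR+LF again. Also, when saving as UTF-8 there is a choice with or without BOM; which is better?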

jbpb0

2021/01/24 08:24

Without BOM.

jbpb0

2021/01/24 08:42

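I wrote code for analyzing the error in my answer. Change only the placeholder "XXX", run it otherwise as-is, and paste what is displayed. It will probably raise an error, but please paste everything that was printed, not only the error. The training data files may or may not contain Japanese; either is fine.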

sho12

2021/01/24 08:54 (edited)

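I ran the program from the answer (saved as tt.py) with four dummy files of arbitrary half-width alphanumeric text (27 sentences in train and 3 in dev; the .in and .out files have identical contents). I converted them to UTF-8, but all four files showed up as SJIS. ↓ See my reply to the answer.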

jbpb0

2021/01/24 09:04

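Looking at the output you pasted, the file is not being read in binary mode. The "b" in mode="rb" of gfile.GFile(data_path, mode="rb") is being ignored for some reason. As a result, str lines reach code that expects binary (bytes) data, and that mismatch produces
> cannot use a bytes pattern on a string-like object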
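The mismatch can be reproduced without TensorFlow. This sketch assumes that the tokenizer in data_utils.py compiles its split pattern from a bytes literal, as the traceback suggests (`_WORD_SPLIT.split(space_separated_fragment)` inside `basic_tokenizer`); the exact pattern below is an assumption modeled on the TensorFlow translate tutorial code. A bytes pattern only accepts bytes input, so a str line raises the same TypeError as in the question.

```python
import re

# Assumed to mirror data_utils.py: the pattern is compiled from a bytes literal
_WORD_SPLIT = re.compile(b"([.,!?\"':;)(])")


def basic_tokenizer(sentence):
    """Split on whitespace, then split off punctuation; expects bytes."""
    words = []
    for space_separated_fragment in sentence.strip().split():
        words.extend(_WORD_SPLIT.split(space_separated_fragment))
    return [w for w in words if w]


print(basic_tokenizer(b"hello , world!\n"))  # [b'hello', b',', b'world', b'!']

try:
    basic_tokenizer("hello , world!\n")  # str line, as returned by a text-mode read
except TypeError as e:
    print("TypeError:", e)
```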

jbpb0

2021/01/24 09:10

Add import tensorflow as tf at the top of the test code, change with gfile.GFile(data_path, mode="rb") as data_file: to with tf.gfile.GFile(data_path, mode="rb") as data_file:, run it, and paste the result. If that still gives the error, change that line to with open(data_path, mode="rb") as data_file: and try again.

sho12

2021/01/24 09:23 (edited)

The first one gave the error, so I ran both ↓ The output of the second one didn't fit in a single comment, so I posted it in three parts.

jbpb0

2021/01/24 09:25

With open(..., the file is read in binary mode and processing continues without an error.
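The difference between the two modes of Python's built-in `open()` can be seen in isolation: with "rb" each line comes back as `bytes` (shown as `b'...'`, as in the second log), while text mode returns `str`. The sample line is copied from the dummy data in the log; the temporary file exists only for this demonstration.

```python
import os
import tempfile

# Write one line of the dummy data from the log to a temporary file
path = os.path.join(tempfile.mkdtemp(), "sample.in")
with open(path, "wb") as f:
    f.write(b"SAiaebie isdbawe idbkseur\n")

with open(path, mode="r", encoding="utf-8") as data_file:  # text mode
    text_line = data_file.readline()
with open(path, mode="rb") as data_file:  # binary mode
    bytes_line = data_file.readline()

print(type(text_line).__name__, repr(text_line))    # str 'SAiaebie isdbawe idbkseur\n'
print(type(bytes_line).__name__, repr(bytes_line))  # bytes b'SAiaebie isdbawe idbkseur\n'
print(bytes_line.strip().split())  # [b'SAiaebie', b'isdbawe', b'idbkseur']
```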

jbpb0

2021/01/24 09:32

Replace all five occurrences of gfile.GFile( in "data_utils.py" with open(. That should keep the binary-mode setting in effect, and the error in the question should no longer occur.

sho12

2021/01/24 09:48 (edited)

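I ran it, still using the alphanumeric files. It worked like the reference article up to a point, but then an error occurred. ↓ See my reply to the answer. Also, chatbot.py contains two occurrences of "gfile.GFile(". Are those okay?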

jbpb0

2021/01/24 10:05

In "chatbot.py" the mode is not binary, so I thought you could leave them as they are, but if you're concerned, changing them to open() should also be fine. If you change them, replace tf.gfile.GFile( with open(.

jbpb0

2021/01/24 10:17

The error in the question
> TypeError: cannot use a bytes pattern on a string-like object
no longer occurs, so please post anything further as a separate question.

sho12

2021/01/24 10:24

I posted a new question: https://teratail.com/questions/318178?modal=q-comp Thank you in advance. I'd also like to ask one small thing that relates to the next question. I modified chatbot.py, restored train and dev to the original data, and ran it. ↓ A slightly different error from before occurred. Should I revert chatbot.py?
C:\Users\○○\Documents>Python chatbot.py Preparing LINE talk data in line_talk_data Creating vocabulary line_talk_data\vocab40000.out from data line_talk_data\line_talk_train.out Creating vocabulary line_talk_data\vocab40000.in from data line_talk_data\line_talk_train.in Tokenizing data in line_talk_data\line_talk_train.out Tokenizing data in line_talk_data\line_talk_train.in Tokenizing data in line_talk_data\line_talk_dev.out Tokenizing data in line_talk_data\line_talk_dev.in Creating 3 layers of 256 units. WARNING:tensorflow:From C:\Users\○○\Documents\seq2seq_model.py:186 in __init__.: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02. Instructions for updating: Please use tf.global_variables instead. Created model with fresh parameters. WARNING:tensorflow:From chatbot.py:146 in create_model.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02. Instructions for updating: Use `tf.global_variables_initializer` instead. Reading development and training data (limit: 0). Traceback (most recent call last): File "chatbot.py", line 311, in <module> tf.app.run() File "C:\Users\○○\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\platform\app.py", line 43, in run sys.exit(main(sys.argv[:1] + flags_passthrough)) File "chatbot.py", line 308, in main train() File "chatbot.py", line 164, in train dev_set = read_data(in_dev, out_dev) File "chatbot.py", line 105, in read_data with tf.open(source_path, mode="r") as source_file: AttributeError: module 'tensorflow' has no attribute 'open'

jbpb0

2021/01/24 10:28 (edited)

> with tf.open(source_path, mode="r") as source_file:
It's not tf.open(. I wrote that you should change tf.gfile.GFile( to open(. Please read carefully.
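A plain search-and-replace of `gfile.GFile(` also rewrites the inside of `tf.gfile.GFile(`, leaving `tf.open(`, which matches the AttributeError above. A hedged sketch of a safer replacement, assuming the only calls to change are `gfile.GFile(` and `tf.gfile.GFile(` in a UTF-8 source file: the regex consumes an optional `tf.` prefix, other `gfile` calls such as `gfile.Exists` are left alone, and a `.bak` copy is kept. `replace_gfile_with_open` and `patch_file` are hypothetical helper names.

```python
import re
import shutil

# Matches "gfile.GFile(" and "tf.gfile.GFile(", so the "tf." prefix is removed too
_GFILE_CALL = re.compile(r"\b(?:tf\.)?gfile\.GFile\(")


def replace_gfile_with_open(source):
    """Return (new_source, count) with every (tf.)gfile.GFile( turned into open(."""
    return _GFILE_CALL.subn("open(", source)


def patch_file(path, encoding="utf-8"):
    """Patch a .py file in place, keeping a .bak copy; returns the number of replacements."""
    shutil.copyfile(path, path + ".bak")
    # newline="" on both sides preserves the file's existing line endings
    with open(path, encoding=encoding, newline="") as f:
        source = f.read()
    new_source, count = replace_gfile_with_open(source)
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(new_source)
    return count
```

For example, `patch_file("data_utils.py")` would be expected to report 5 for the file discussed above, and `patch_file("chatbot.py")` 2, if the counts mentioned in the thread are accurate.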


1 answer

Best answer

cannot use a bytes pattern on a string-like object

In

    gfile.GFile(data_path, mode="rb")

the "b" in mode="rb" somehow has no effect, so the file is not read in binary mode, and this seems to conflict with the code that processes the data as binary.

In "data_utils.py", try replacing every

    gfile.GFile()

with

    open()

Posted 2021/01/24 08:39

Edited 2021/01/24 10:15

jbpb0

Total score: 7653

sho12

2021/01/24 08:53

C:\○○>cd Documents C:\Users\○○\Documents>Python tt.py tokenizing line 1 sentence: SAiaebie isdbawe idbkseur sentence.strip().split(): ['SAiaebie', 'isdbawe', 'idbkseur'] space_separated_fragment: SAiaebie Traceback (most recent call last): File "tt.py", line 15, in <module> words.extend(_WORD_SPLIT.split(space_separated_fragment)) TypeError: cannot use a bytes pattern on a string-like object

sho12

2021/01/24 09:20

(1) With the line changed to with tf.gfile.GFile(data_path, mode="rb") as data_file:
C:\Users\○○>cd Documents C:\Users\○○\Documents>Python tt.py tokenizing line 1 sentence: SAiaebie isdbawe idbkseur sentence.strip().split(): ['SAiaebie', 'isdbawe', 'idbkseur'] space_separated_fragment: SAiaebie Traceback (most recent call last): File "tt.py", line 16, in <module> words.extend(_WORD_SPLIT.split(space_separated_fragment)) TypeError: cannot use a bytes pattern on a string-like object

sho12

2021/01/24 09:21

(2) With the line changed to with open(data_path, mode="rb") as data_file:
C:\Users\○○>cd Documents C:\Users\○○\Documents>Python tt.py tokenizing line 1 sentence: b'SAiaebie isdbawe idbkseur\n' sentence.strip().split(): [b'SAiaebie', b'isdbawe', b'idbkseur'] space_separated_fragment: b'SAiaebie' WORD_SPLIT.split: [b'SAiaebie'] words: [b'SAiaebie'] space_separated_fragment: b'isdbawe' WORD_SPLIT.split: [b'isdbawe'] words: [b'SAiaebie', b'isdbawe'] space_separated_fragment: b'idbkseur' WORD_SPLIT.split: [b'idbkseur'] words: [b'SAiaebie', b'isdbawe', b'idbkseur'] tokenizing line 2 sentence: b'\n' sentence.strip().split(): [] tokenizing line 3 sentence: b'\n' sentence.strip().split(): [] tokenizing line 4 sentence: b'aebuewgfuavueaw\n' sentence.strip().split(): [b'aebuewgfuavueaw'] space_separated_fragment: b'aebuewgfuavueaw' WORD_SPLIT.split: [b'aebuewgfuavueaw'] words: [b'aebuewgfuavueaw'] tokenizing line 5 sentence: b'\n' sentence.strip().split(): [] tokenizing line 6 sentence: b'\n' sentence.strip().split(): [] tokenizing line 7 sentence: b'bueawaeieif\n' sentence.strip().split(): [b'bueawaeieif'] space_separated_fragment: b'bueawaeieif' WORD_SPLIT.split: [b'bueawaeieif'] words: [b'bueawaeieif'] tokenizing line 8 sentence: b'\n' sentence.strip().split(): [] tokenizing line 9 sentence: b'\n' sentence.strip().split(): [] tokenizing line 10 sentence: b'buefvaiwefvue\n' sentence.strip().split(): [b'buefvaiwefvue'] space_separated_fragment: b'buefvaiwefvue' WORD_SPLIT.split: [b'buefvaiwefvue'] words: [b'buefvaiwefvue'] tokenizing line 11 sentence: b'\n' sentence.strip().split(): [] tokenizing line 12 sentence: b'\n' sentence.strip().split(): [] tokenizing line 13 sentence: b'avefuav ueabfiaew beauffbewfviewf\n' sentence.strip().split(): [b'avefuav', b'ueabfiaew', b'beauffbewfviewf'] space_separated_fragment: b'avefuav' WORD_SPLIT.split: [b'avefuav']

sho12

2021/01/24 09:22

words: [b'avefuav'] space_separated_fragment: b'ueabfiaew' WORD_SPLIT.split: [b'ueabfiaew'] words: [b'avefuav', b'ueabfiaew'] space_separated_fragment: b'beauffbewfviewf' WORD_SPLIT.split: [b'beauffbewfviewf'] words: [b'avefuav', b'ueabfiaew', b'beauffbewfviewf'] tokenizing line 14 sentence: b'\n' sentence.strip().split(): [] tokenizing line 15 sentence: b'\n' sentence.strip().split(): [] tokenizing line 16 sentence: b'wqb bfeajvfaew bfejv\n' sentence.strip().split(): [b'wqb', b'bfeajvfaew', b'bfejv'] space_separated_fragment: b'wqb' WORD_SPLIT.split: [b'wqb'] words: [b'wqb'] space_separated_fragment: b'bfeajvfaew' WORD_SPLIT.split: [b'bfeajvfaew'] words: [b'wqb', b'bfeajvfaew'] space_separated_fragment: b'bfejv' WORD_SPLIT.split: [b'bfejv'] words: [b'wqb', b'bfeajvfaew', b'bfejv'] tokenizing line 17 sentence: b'\n' sentence.strip().split(): [] tokenizing line 18 sentence: b'\n' sentence.strip().split(): [] tokenizing line 19 sentence: b'fbeueafue ifeb537rqgfu asbae\n' sentence.strip().split(): [b'fbeueafue', b'ifeb537rqgfu', b'asbae'] space_separated_fragment: b'fbeueafue' WORD_SPLIT.split: [b'fbeueafue'] words: [b'fbeueafue'] space_separated_fragment: b'ifeb537rqgfu' WORD_SPLIT.split: [b'ifeb537rqgfu'] words: [b'fbeueafue', b'ifeb537rqgfu'] space_separated_fragment: b'asbae' WORD_SPLIT.split: [b'asbae'] words: [b'fbeueafue', b'ifeb537rqgfu', b'asbae'] tokenizing line 20 sentence: b'\n' sentence.strip().split(): [] tokenizing line 21 sentence: b'\n' sentence.strip().split(): [] tokenizing line 22 sentence: b'dbcbddj bafie wubfeajfew baufwa\n' sentence.strip().split(): [b'dbcbddj', b'bafie', b'wubfeajfew', b'baufwa'] space_separated_fragment: b'dbcbddj' WORD_SPLIT.split: [b'dbcbddj'] words: [b'dbcbddj'] space_separated_fragment: b'bafie' WORD_SPLIT.split: [b'bafie'] words: [b'dbcbddj', b'bafie'] space_separated_fragment: b'wubfeajfew' WORD_SPLIT.split: [b'wubfeajfew'] words: [b'dbcbddj', b'bafie', b'wubfeajfew'] space_separated_fragment: b'baufwa' WORD_SPLIT.split: 
[b'baufwa'] words: [b'dbcbddj', b'bafie', b'wubfeajfew', b'baufwa'] tokenizing line 23 sentence: b'\n' sentence.strip().split(): [] tokenizing line 24 sentence: b'\n' sentence.strip().split(): [] tokenizing line 25 sentence: b'qeufaewuf3245 wajda wde ie\n' sentence.strip().split(): [b'qeufaewuf3245', b'wajda', b'wde', b'ie'] space_separated_fragment: b'qeufaewuf3245' WORD_SPLIT.split: [b'qeufaewuf3245'] words: [b'qeufaewuf3245'] space_separated_fragment: b'wajda' WORD_SPLIT.split: [b'wajda'] words: [b'qeufaewuf3245', b'wajda'] space_separated_fragment: b'wde' WORD_SPLIT.split: [b'wde'] words: [b'qeufaewuf3245', b'wajda', b'wde'] space_separated_fragment: b'ie' WORD_SPLIT.split: [b'ie'] words: [b'qeufaewuf3245', b'wajda', b'wde', b'ie'] tokenizing line 26 sentence: b'\n' sentence.strip().split(): [] tokenizing line 27 sentence: b'\n' sentence.strip().split(): [] tokenizing line 28 sentence: b'wduwvfuw afwuefaeaef eawfewif\n' sentence.strip().split(): [b'wduwvfuw', b'afwuefaeaef', b'eawfewif'] space_separated_fragment: b'wduwvfuw' WORD_SPLIT.split: [b'wduwvfuw'] words: [b'wduwvfuw'] space_separated_fragment: b'afwuefaeaef' WORD_SPLIT.split: [b'afwuefaeaef'] words: [b'wduwvfuw', b'afwuefaeaef'] space_separated_fragment: b'eawfewif' WORD_SPLIT.split: [b'eawfewif'] words: [b'wduwvfuw', b'afwuefaeaef', b'eawfewif'] tokenizing line 29 sentence: b'\n' sentence.strip().split(): [] tokenizing line 30 sentence: b'\n' sentence.strip().split(): [] tokenizing line 31 sentence: b'awefe weaf\n' sentence.strip().split(): [b'awefe', b'weaf'] space_separated_fragment: b'awefe' WORD_SPLIT.split: [b'awefe'] words: [b'awefe'] space_separated_fragment: b'weaf' WORD_SPLIT.split: [b'weaf'] words: [b'awefe', b'weaf'] tokenizing line 32 sentence: b'\n' sentence.strip().split(): [] tokenizing line 33 sentence: b'\n' sentence.strip().split(): [] tokenizing line 34 sentence: b'awfeawe aewgr\n' sentence.strip().split(): [b'awfeawe', b'aewgr'] space_separated_fragment: b'awfeawe' 
WORD_SPLIT.split: [b'awfeawe'] words: [b'awfeawe'] space_separated_fragment: b'aewgr' WORD_SPLIT.split: [b'aewgr'] words: [b'awfeawe', b'aewgr'] tokenizing line 35 sentence: b'\n' sentence.strip().split(): [] tokenizing line 36 sentence: b'\n' sentence.strip().split(): [] tokenizing line 37 sentence: b'awfagrwr aewef\n' sentence.strip().split(): [b'awfagrwr', b'aewef'] space_separated_fragment: b'awfagrwr' WORD_SPLIT.split: [b'awfagrwr'] words: [b'awfagrwr'] space_separated_fragment: b'aewef' WORD_SPLIT.split: [b'aewef'] words: [b'awfagrwr', b'aewef']

sho12

2021/01/24 09:22

tokenizing line 38 sentence: b'\n' sentence.strip().split(): [] tokenizing line 39 sentence: b'\n' sentence.strip().split(): [] tokenizing line 40 sentence: b'aewfefu garg\n' sentence.strip().split(): [b'aewfefu', b'garg'] space_separated_fragment: b'aewfefu' WORD_SPLIT.split: [b'aewfefu'] words: [b'aewfefu'] space_separated_fragment: b'garg' WORD_SPLIT.split: [b'garg'] words: [b'aewfefu', b'garg'] tokenizing line 41 sentence: b'\n' sentence.strip().split(): [] tokenizing line 42 sentence: b'\n' sentence.strip().split(): [] tokenizing line 43 sentence: b'aewfea agwgr\n' sentence.strip().split(): [b'aewfea', b'agwgr'] space_separated_fragment: b'aewfea' WORD_SPLIT.split: [b'aewfea'] words: [b'aewfea'] space_separated_fragment: b'agwgr' WORD_SPLIT.split: [b'agwgr'] words: [b'aewfea', b'agwgr'] tokenizing line 44 sentence: b'\n' sentence.strip().split(): [] tokenizing line 45 sentence: b'\n' sentence.strip().split(): [] tokenizing line 46 sentence: b'aeafgwhgr ewgawg\n' sentence.strip().split(): [b'aeafgwhgr', b'ewgawg'] space_separated_fragment: b'aeafgwhgr' WORD_SPLIT.split: [b'aeafgwhgr'] words: [b'aeafgwhgr'] space_separated_fragment: b'ewgawg' WORD_SPLIT.split: [b'ewgawg'] words: [b'aeafgwhgr', b'ewgawg'] tokenizing line 47 sentence: b'\n' sentence.strip().split(): [] tokenizing line 48 sentence: b'\n' sentence.strip().split(): [] tokenizing line 49 sentence: b'aguewgiaewg wfw wufew\n' sentence.strip().split(): [b'aguewgiaewg', b'wfw', b'wufew'] space_separated_fragment: b'aguewgiaewg' WORD_SPLIT.split: [b'aguewgiaewg'] words: [b'aguewgiaewg'] space_separated_fragment: b'wfw' WORD_SPLIT.split: [b'wfw'] words: [b'aguewgiaewg', b'wfw'] space_separated_fragment: b'wufew' WORD_SPLIT.split: [b'wufew'] words: [b'aguewgiaewg', b'wfw', b'wufew'] tokenizing line 50 sentence: b'\n' sentence.strip().split(): [] tokenizing line 51 sentence: b'\n' sentence.strip().split(): [] tokenizing line 52 sentence: b'qaewj wqfe vwquwf\n' sentence.strip().split(): [b'qaewj', b'wqfe', 
b'vwquwf'] space_separated_fragment: b'qaewj' WORD_SPLIT.split: [b'qaewj'] words: [b'qaewj'] space_separated_fragment: b'wqfe' WORD_SPLIT.split: [b'wqfe'] words: [b'qaewj', b'wqfe'] space_separated_fragment: b'vwquwf' WORD_SPLIT.split: [b'vwquwf'] words: [b'qaewj', b'wqfe', b'vwquwf'] tokenizing line 53 sentence: b'\n' sentence.strip().split(): [] tokenizing line 54 sentence: b'\n' sentence.strip().split(): [] tokenizing line 55 sentence: b'qwduqwfg fw uewfv\n' sentence.strip().split(): [b'qwduqwfg', b'fw', b'uewfv'] space_separated_fragment: b'qwduqwfg' WORD_SPLIT.split: [b'qwduqwfg'] words: [b'qwduqwfg'] space_separated_fragment: b'fw' WORD_SPLIT.split: [b'fw'] words: [b'qwduqwfg', b'fw'] space_separated_fragment: b'uewfv' WORD_SPLIT.split: [b'uewfv'] words: [b'qwduqwfg', b'fw', b'uewfv'] tokenizing line 56 sentence: b'\n' sentence.strip().split(): [] tokenizing line 57 sentence: b'\n' sentence.strip().split(): [] tokenizing line 58 sentence: b'fawefeif efgeu eiev \n' sentence.strip().split(): [b'fawefeif', b'efgeu', b'eiev'] space_separated_fragment: b'fawefeif' WORD_SPLIT.split: [b'fawefeif'] words: [b'fawefeif'] space_separated_fragment: b'efgeu' WORD_SPLIT.split: [b'efgeu'] words: [b'fawefeif', b'efgeu'] space_separated_fragment: b'eiev' WORD_SPLIT.split: [b'eiev'] words: [b'fawefeif', b'efgeu', b'eiev'] tokenizing line 59 sentence: b'\n' sentence.strip().split(): [] tokenizing line 60 sentence: b'\n' sentence.strip().split(): [] tokenizing line 61 sentence: b'aewu awefe aewfe\n' sentence.strip().split(): [b'aewu', b'awefe', b'aewfe'] space_separated_fragment: b'aewu' WORD_SPLIT.split: [b'aewu'] words: [b'aewu'] space_separated_fragment: b'awefe' WORD_SPLIT.split: [b'awefe'] words: [b'aewu', b'awefe'] space_separated_fragment: b'aewfe' WORD_SPLIT.split: [b'aewfe'] words: [b'aewu', b'awefe', b'aewfe'] tokenizing line 62 sentence: b'\n' sentence.strip().split(): [] tokenizing line 63 sentence: b'\n' sentence.strip().split(): [] tokenizing line 64 sentence: 
b'awbefuai ieagu ieiwf\n' sentence.strip().split(): [b'awbefuai', b'ieagu', b'ieiwf'] space_separated_fragment: b'awbefuai' WORD_SPLIT.split: [b'awbefuai'] words: [b'awbefuai'] space_separated_fragment: b'ieagu' WORD_SPLIT.split: [b'ieagu'] words: [b'awbefuai', b'ieagu'] space_separated_fragment: b'ieiwf' WORD_SPLIT.split: [b'ieiwf'] words: [b'awbefuai', b'ieagu', b'ieiwf'] tokenizing line 65 sentence: b'\n' sentence.strip().split(): [] tokenizing line 66 sentence: b'\n' sentence.strip().split(): [] tokenizing line 67 sentence: b'ewfuwefa bfefgyw fei\n' sentence.strip().split(): [b'ewfuwefa', b'bfefgyw', b'fei'] space_separated_fragment: b'ewfuwefa' WORD_SPLIT.split: [b'ewfuwefa'] words: [b'ewfuwefa'] space_separated_fragment: b'bfefgyw' WORD_SPLIT.split: [b'bfefgyw'] words: [b'ewfuwefa', b'bfefgyw'] space_separated_fragment: b'fei' WORD_SPLIT.split: [b'fei'] words: [b'ewfuwefa', b'bfefgyw', b'fei'] tokenizing line 68 sentence: b'\n' sentence.strip().split(): [] tokenizing line 69 sentence: b'\n' sentence.strip().split(): [] tokenizing line 70 sentence: b'faua bufbf ebfuewf\n' sentence.strip().split(): [b'faua', b'bufbf', b'ebfuewf'] space_separated_fragment: b'faua' WORD_SPLIT.split: [b'faua'] words: [b'faua'] space_separated_fragment: b'bufbf' WORD_SPLIT.split: [b'bufbf'] words: [b'faua', b'bufbf'] space_separated_fragment: b'ebfuewf' WORD_SPLIT.split: [b'ebfuewf'] words: [b'faua', b'bufbf', b'ebfuewf'] tokenizing line 71 sentence: b'\n' sentence.strip().split(): [] tokenizing line 72 sentence: b'\n' sentence.strip().split(): [] tokenizing line 73 sentence: b'fuef ubaf biafbie\n' sentence.strip().split(): [b'fuef', b'ubaf', b'biafbie'] space_separated_fragment: b'fuef' WORD_SPLIT.split: [b'fuef'] words: [b'fuef'] space_separated_fragment: b'ubaf' WORD_SPLIT.split: [b'ubaf'] words: [b'fuef', b'ubaf'] space_separated_fragment: b'biafbie' WORD_SPLIT.split: [b'biafbie'] words: [b'fuef', b'ubaf', b'biafbie'] tokenizing line 74 sentence: b'\n' sentence.strip().split(): 
[] tokenizing line 75 sentence: b'\n' sentence.strip().split(): [] tokenizing line 76 sentence: b'fewa 37f s' sentence.strip().split(): [b'fewa', b'37f', b's'] space_separated_fragment: b'fewa' WORD_SPLIT.split: [b'fewa'] words: [b'fewa'] space_separated_fragment: b'37f' WORD_SPLIT.split: [b'37f'] words: [b'fewa', b'37f'] space_separated_fragment: b's' WORD_SPLIT.split: [b's'] words: [b'fewa', b'37f', b's']

sho12

2021/01/24 09:47

C:\Users\○○\Documents>Python chatbot.py Preparing LINE talk data in line_talk_data Tokenizing data in line_talk_data\line_talk_train.out Tokenizing data in line_talk_data\line_talk_train.in Tokenizing data in line_talk_data\line_talk_dev.out Tokenizing data in line_talk_data\line_talk_dev.in Creating 3 layers of 256 units. WARNING:tensorflow:From C:\Users\○○\Documents\seq2seq_model.py:186 in __init__.: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02. Instructions for updating: Please use tf.global_variables instead. Created model with fresh parameters. WARNING:tensorflow:From chatbot.py:146 in create_model.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02. Instructions for updating: Use `tf.global_variables_initializer` instead. Reading development and training data (limit: 0). Traceback (most recent call last): File "chatbot.py", line 311, in <module> tf.app.run() File "C:\Users\○○\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\platform\app.py", line 43, in run sys.exit(main(sys.argv[:1] + flags_passthrough)) File "chatbot.py", line 308, in main train() File "chatbot.py", line 166, in train train_bucket_sizes = [len(train_set[b]) for b in xrange(len(_buckets))] NameError: name 'xrange' is not defined