I want to read a text file one line at a time

I would like to read data like the sample below one line at a time. To do that, I changed line 9 of the code from

```python
poems = f.read().splitlines()
```

to

```python
poems = f.readlines().splitlines()
```

but then I get:

Traceback (most recent call last):

File "/home/yudai/Desktop/keras_AE.py", line 9, in <module>
poems = f.readlines().splitlines()
AttributeError: 'list' object has no attribute 'splitlines'
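The error happens because `readlines()` already returns a `list` of strings, each still ending in `\n`, and `list` has no `splitlines` method. Below is a minimal sketch of three equivalent ways to get the lines without their newline characters. In it, `io.StringIO` stands in for the real `poem.txt`, and the sample lines come from the data shown in the question.

```python
import io

SAMPLE = "朝霧の中に九段のともし哉\nあたたかな雨が降るなり枯葎\n菜の花やはつと明るき町はづれ\n"

def lines_via_read(f):
    # read() returns one str; splitlines() splits it and drops the line endings
    return f.read().splitlines()

def lines_via_readlines(f):
    # readlines() returns a list of str, each still ending in "\n",
    # so strip each element instead of calling splitlines() on the list
    return [line.rstrip("\n") for line in f.readlines()]

def lines_via_iteration(f):
    # iterating over a file object yields one line at a time
    return [line.rstrip("\n") for line in f]

a = lines_via_read(io.StringIO(SAMPLE))
b = lines_via_readlines(io.StringIO(SAMPLE))
c = lines_via_iteration(io.StringIO(SAMPLE))
print(a == b == c)
print(a[0])
```

The third form is the one that never holds the whole file in memory, which matters only for large files.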

朝霧の中に九段のともし哉
あたたかな雨が降るなり枯葎
菜の花やはつと明るき町はづれ
秋風や伊予へ流るる汐の音
長閑さや障子の穴に海見えて

```python
# coding:utf-8
import numpy as np
import codecs
from keras.layers import Activation, Dense, Input
from keras.models import Model

# Load the data
with open(r'/home/yudai/Desktop/poem.txt', encoding='utf-8') as f:
    poems = f.read().splitlines()  # <- line 9
text = poems[0]  # first line (first poem) only
print(text)
# Length of the corpus
print('corpus length:', len(text))
# Sort the distinct characters of text so they can be counted and indexed
chars = sorted(list(set(text)))
# Show the number of distinct characters
print('total chars:', len(chars))
# Character -> ID
char_indices = dict((c, i) for i, c in enumerate(chars))
# ID -> character
indices_char = dict((i, c) for i, c in enumerate(chars))
# Length of each input sequence, in characters
maxlen = 1
# Stride between the start positions of consecutive sequences
step = 1
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
# Show the training sequences and the character that follows each one
print('Sequences:', sentences)
print('next_chars:', next_chars)
# One-hot vectorization
print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)  # np.bool was removed in NumPy 1.24
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
# Build the model
print('Build model...')
# Encoder dimension
encoding_dim = 128
# Input layer
input_word = Input(shape=(maxlen, len(chars)))
# Encoder
encoded = Dense(128, activation='relu')(input_word)
encoded = Dense(64, activation='relu')(encoded)
encoded = Dense(32, activation='relu')(encoded)
# Latent variables (roughly analogous to principal component analysis)
latent = Dense(8, activation='relu')(encoded)
# Decoder: reconstruct the input from the latent variables
decoded = Dense(32, activation='relu')(latent)
decoded = Dense(64, activation='relu')(decoded)
# Output must have len(chars) units and softmax to match x and categorical_crossentropy
decoded = Dense(len(chars), activation='softmax')(decoded)
ae = Model(inputs=input_word, outputs=decoded)
# Optimize with Adam; the loss function is categorical_crossentropy
ae.compile(optimizer='Adam', loss='categorical_crossentropy')
ae.summary()

print(x.shape)
# Train the autoencoder (input and target are both x)
ae.fit(x, x,
       epochs=500,
       batch_size=256,
       shuffle=False)

# Save the model architecture
model_json = ae.to_json()
with open('keras_AE.json', 'w') as json_file:
    json_file.write(model_json)
# Save the trained weights
ae.save_weights('AE.h5')
```
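The script only uses `poems[0]`. If the goal is to process every line, one possibility is to build the character vocabulary over all lines and create the `maxlen`/`step` windows inside each line, so that no window spans two poems. This is an assumption about the intent, not something stated in the question. Below is a pure-Python sketch; `build_windows` is a hypothetical helper name, and `io.StringIO` stands in for the file.

```python
import io

def build_windows(lines, maxlen=1, step=1):
    """Hypothetical helper: per-line sliding windows over characters."""
    # vocabulary built from every line, not just the first one
    chars = sorted(set("".join(lines)))
    char_indices = {c: i for i, c in enumerate(chars)}
    sentences, next_chars = [], []
    for text in lines:
        # windows are created inside each line, so they never span two poems
        for i in range(0, len(text) - maxlen, step):
            sentences.append(text[i:i + maxlen])
            next_chars.append(text[i + maxlen])
    return chars, char_indices, sentences, next_chars

f = io.StringIO("朝霧の中に九段のともし哉\n秋風や伊予へ流るる汐の音\n")
poems = f.read().splitlines()
chars, char_indices, sentences, next_chars = build_windows(poems, maxlen=1, step=1)
print(len(chars), len(sentences))
```

The resulting `sentences`, `next_chars` and `char_indices` can be fed to the same one-hot vectorization loop as in the script.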

y_waiwai

2018/10/24 12:50

So, what is your question?

yep

2018/10/24 13:45

How can I read the file one line at a time?

3 answers

Best answer

If you want to read one line at a time, how about using readline (not readlines)?

That said, here is a version you can copy and paste:

```python
with open(r'/home/yudai/poem.txt', encoding='utf-8') as f:
    poems = f.readline()      # read the first line
    while poems:              # readline() returns '' at end of file
        print(poems)
        poems = f.readline()  # read the next line
```
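For reference, the `readline()` loop above behaves the same as iterating over the file object directly. That is the usual Python idiom, because a file object yields one line per iteration. Below is a minimal sketch. `io.StringIO` replaces the real file so that it runs anywhere. `end=""` is added because each line already ends in a newline.

```python
import io

lines_seen = []
with io.StringIO("朝霧の中に九段のともし哉\nあたたかな雨が降るなり枯葎\n") as f:
    for line in f:  # one line per iteration, including its trailing "\n"
        print(line, end="")  # avoid printing an extra blank line
        lines_seen.append(line.rstrip("\n"))
```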

Posted 2018/10/24 13:48

Edited 2018/10/24 14:58

y_waiwai

yep

2018/10/24 14:05

It only read one character.

y_waiwai

2018/10/24 14:07

Then you are using it wrong. Try googling "python readline"; you will find how to use it.

yep

2018/10/24 14:12

I tried `with open(r'/home/yudai/poem.txt', encoding='utf-8') as f: poems = f.readline() while poems: print (poems)` and the same line was printed over and over, forever.

y_waiwai

2018/10/24 14:16

The readline function reads just one line from the file. To read the next line, you have to call readline again each time: `poems = f.readline() while poems: 　print (poems) 　poems = f.readline()`
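The loop in the 14:12 comment never ends because `poems` is assigned only once, before the loop, so the `while` condition never changes. Reading again at the end of each iteration fixes that. On Python 3.8+, an assignment expression (`:=`) can do the read and the test in the loop header. Below is a sketch of both forms, with `io.StringIO` standing in for the file.

```python
import io

DATA = "朝霧の中に九段のともし哉\n菜の花やはつと明るき町はづれ\n"

# Pattern from the answer: read once before the loop, then again at the end of each pass
f = io.StringIO(DATA)
first = []
line = f.readline()
while line:  # readline() returns "" at end of file, which ends the loop
    first.append(line.rstrip("\n"))
    line = f.readline()  # without this call the loop never terminates

# Same loop using an assignment expression (Python 3.8+)
f = io.StringIO(DATA)
second = []
while (line := f.readline()):
    second.append(line.rstrip("\n"))

print(first == second)
```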

yep

2018/10/24 14:51

Now I get: `File "/home/yudai/Desktop/keras_AE.py", line 11 　print(poems) ^ SyntaxError: invalid character in identifier`

y_waiwai

2018/10/24 14:55

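Don't just copy and paste; type it in by hand. The snippet contains full-width spaces, so copying and pasting it produces an error. Make sure you understand the code I showed before using it.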
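The `SyntaxError` in the 14:51 comment comes from full-width (ideographic) spaces, U+3000, that were pasted into the indentation. Python indentation must use ordinary spaces or tabs. Below is a small sketch for locating and replacing such characters in pasted code. `find_fullwidth_spaces` is a hypothetical helper. Replacing each U+3000 with four ASCII spaces is an assumption, so you may still need to adjust the indentation by hand.

```python
FULLWIDTH_SPACE = "\u3000"  # ideographic space; looks like an ordinary space

def find_fullwidth_spaces(source):
    """Hypothetical helper: return (line_number, column) for every U+3000."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line):
            if ch == FULLWIDTH_SPACE:
                hits.append((lineno, col))
    return hits

pasted = "line = f.readline()\nwhile line:\n\u3000print(line)\n\u3000line = f.readline()\n"
print(find_fullwidth_spaces(pasted))

# assumption: one U+3000 becomes four ASCII spaces of indentation
cleaned = pasted.replace(FULLWIDTH_SPACE, "    ")
compile(cleaned, "<pasted>", "exec")  # raises SyntaxError if anything is still wrong
```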

yep

2018/10/25 08:29 (edited)

It does read one line at a time now. I typed it in by hand. Thank you very much.

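I haven't properly checked whether this matches the processing you need, but the following gives you, for each line, a list of the words separated by spaces.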

```python
with open('poem.txt', encoding='utf-8') as f:
    poems = f.readlines()
    print(poems)  # ['朝霧 の 中 に 九段 の ともし 哉\n', 'あたたか な 雨 が 降る なり 枯葎\n', '菜の花 や は つと 明るき 町 は づれ\n', '秋風 や 伊予 へ 流る る 汐 の 音\n', '長閑 さ や 障子 の 穴 に 海 見え て\n']
    for p in poems:
        s = p.rstrip()  # strip the trailing newline
        s = s.split(' ')
        print(s)  # ['朝霧', 'の', '中', 'に', '九段', 'の', 'ともし', '哉']
```
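Below is a variant of the same idea. It assumes `poem.txt` contains space-separated text like that shown in the `print` comment above. `str.split()` with no argument splits on any run of whitespace and ignores the trailing newline, so the separate `rstrip()` is not needed. A list comprehension then collects all lines at once. `io.StringIO` again stands in for the file.

```python
import io

data = "朝霧 の 中 に 九段 の ともし 哉\nあたたか な 雨 が 降る なり 枯葎\n"

with io.StringIO(data) as f:
    # split() without arguments drops the "\n" and collapses repeated spaces
    words_per_line = [line.split() for line in f]

print(words_per_line[0])
```

With `split(' ')`, two consecutive spaces would produce an empty string in the list; `split()` avoids that.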