Python: read a text file and get n lines starting from the line that begins with a specified string

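What I want to achieve

I am trying to write a Python program that reads a text file and gets n lines starting from the line that begins with a specified string.
Ideally, it would get the n lines that start at the specified string and store each line as one element of a list.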

Example

Suppose sample.txt has the following contents.

#catalog of fruits
apple is red
banana is sweet
orange is sour
strawberry
grape
lemon
#
#
#catalog of sports
soccer
baseball
basketball
volleyball
tennis
golf
#
#

In Python, I want to store the 6 lines that follow the "#catalog of fruits" line in a list variable named fruits.

with open(sample.txt) as f:
    lines = f.readlines()
    
    (some processing)

    print(type(fruits))
    print(fruits)

[Output]
<class 'list'>
["apple is red", "banana is sweet", "orange is sour", "strawberry", "grape", "lemon"]

Current code

with open(sample.txt) as f:
    lines = f.readlines()

    lines_strip = [line.strip() for line in lines]

    str1 = [line for line in lines_strip if line.startswith("apple")]
    str2 = [line for line in lines_strip if line.startswith("banana")]
    str3 = [line for line in lines_strip if line.startswith("orange")]
    str4 = [line for line in lines_strip if line.startswith("strawberry")]
    str5 = [line for line in lines_strip if line.startswith("grape")]
    str6 = [line for line in lines_strip if line.startswith("lemon")]
    
    fruits = []
    fruits.append(str1)
    fruits.append(str2)
    fruits.append(str3)
    fruits.append(str4)
    fruits.append(str5)
    fruits.append(str6)

    print(type(fruits))
    print(fruits)

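This works, but it becomes tedious as the number of lines grows, so I would like to know how to put
"n lines starting from the line that begins with a specified string" into a list, as in the example above.
If anyone knows how, I would appreciate your guidance.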

Environment

Python 3.8.5
Windows 10 64-bit
Text editor: Visual Studio Code


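4 answers

This differs from the code you showed, but if you want the "6 lines starting from '#catalog of fruits'", use a loop such as for or while.

The key is to separate the step that finds the header line from the step that takes the 6 lines.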

python
fruits = []

with open('sample.txt') as f:
    for line in f:
        line = line.rstrip('\n')
        # Step 1: find the header line.
        if line == '#catalog of fruits':
            # Step 2: read the next 6 lines from the same file object.
            for _ in range(6):
                fruits_line = f.readline().rstrip('\n')
                fruits.append(fruits_line)

print(fruits)

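When this code finds the "#catalog..." line, it loops 6 times from there, reading each following line and appending it to the list.

If you later want to extract other sections as well, or change the number of lines dynamically, you may need to rethink the structure.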
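
One possible restructuring, as a minimal sketch: make the marker string and the line count parameters. The function name `take_after` and its parameters are hypothetical and not part of the answer above. It accepts any iterable of lines, so an open file works as well as the in-memory list used here for the demonstration.

```python
from itertools import islice


def take_after(lines, marker, n):
    """Return up to n lines that follow the first line starting with marker.

    `lines` is any iterable of strings (an open file, a list, ...).
    """
    it = iter(lines)
    for line in it:
        # Step 1: find the header line.
        if line.startswith(marker):
            # Step 2: take the next n lines from the same iterator.
            return [s.rstrip('\n') for s in islice(it, n)]
    return []  # marker not found


# In-memory stand-in for the first lines of sample.txt.
sample = [
    '#catalog of fruits\n', 'apple is red\n', 'banana is sweet\n',
    'orange is sour\n', 'strawberry\n', 'grape\n', 'lemon\n', '#\n', '#\n',
    '#catalog of sports\n', 'soccer\n', 'baseball\n', 'basketball\n',
]
fruits = take_after(sample, '#catalog of fruits', 6)
sports = take_after(sample, '#catalog of sports', 2)
print(fruits)
print(sports)
```

With a real file, the call would be `take_after(f, '#catalog of fruits', 6)` inside a `with open('sample.txt') as f:` block.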

Posted 2020/09/27 08:11

TakaiY

Total score 13790

d415uke

2020/09/27 11:04

Personally, I found this the most intuitive and readable code among the answers. The explanation was also very easy to follow! Thank you for the answer!


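This code should be close to what you asked for.

To search a list, use a list comprehension of the form [i for i ...]. The result is a list.
To extract a sublist from a list, write list_name[start_index:one_past_the_end_index].

Both are techniques used often in Python, so I hope they are a useful reference.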

Python
with open('sample.txt') as f:  # the original code was missing the quotes
    lines = f.read().splitlines()  # readlines() in the original keeps the trailing newlines

    n = 6
    match_str = '#catalog of fruits'
    print(lines)
    match_indexes = [i for i, x in enumerate(lines) if x == match_str]
    if len(match_indexes) > 0:
        match_index = match_indexes[0]  # multiple matches are not handled
        fruits = lines[match_index+1:match_index+n+1]  # no check for fewer than n lines remaining

        print(type(fruits))
        print(fruits)
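
The two techniques can also be tried on a small in-memory list, without a file. This is only an illustrative sketch: the variable names are made up, and it uses `startswith` (prefix match, as in the question title) where the answer's code compares for equality.

```python
lines = ['#catalog of fruits', 'apple is red', 'banana is sweet',
         'orange is sour', 'strawberry', 'grape', 'lemon', '#', '#']

# List comprehension: indexes of every line that starts with the prefix.
hits = [i for i, x in enumerate(lines) if x.startswith('#catalog')]

# Slice: n lines beginning at the line after the first hit.
n = 3
first_three = lines[hits[0] + 1:hits[0] + 1 + n] if hits else []

# A slice that runs past the end of the list does not raise an error;
# it simply returns the elements that exist.
tail = lines[7:7 + 10]

print(hits)
print(first_three)
print(tail)
```

Because an over-long slice is silently shortened, a caller that needs exactly n lines would have to check `len()` of the result itself.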

Posted 2020/09/27 08:07

Edited 2020/09/27 08:30

toast-uz

Total score 3266

d415uke

2020/09/27 11:02

I see, so a list comprehension is another way to do it! Thank you for the answer!


Best answer

How about something like this?
I added handling that skips blank lines.

python
def pickup(genre):
    target = f'#catalog of {genre}'
    with open("sample.txt") as f:
        # Skip lines until the header of the requested catalog.
        # (A for loop ends at EOF, so a missing header cannot loop forever.)
        for line in f:
            if line.strip() == target:
                break
        # Yield entries until the next line that starts with '#'.
        for line in f:
            data = line.strip()
            if data == '':
                continue  # skip blank lines
            if data[0] == '#':
                break
            yield data


def main():
    fruits = list(pickup('fruits'))
    print(type(fruits))
    print(fruits)

if __name__ == '__main__':
    main()

Posted 2020/09/27 07:51

Edited 2020/09/27 14:00

shiracamus

Total score 5406

d415uke

2020/09/27 09:53

Thank you for the answer! The code taught me much more than what I asked about. May I ask one question? What would happen if sample.txt looked like the following (with one blank line after every two lines)? #catalog of fruits apple is red banana is sweet orange is sour strawberry grape lemon # # #catalog of sports soccer baseball basketball volleyball tennis golf # #
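
A minimal sketch of how blank-line skipping would behave on such input. The function name `pickup_from`, the use of `io.StringIO` in place of a real file, and the exact positions of the blank lines are all assumptions made for illustration, since the layout of the file in the comment above cannot be recovered exactly.

```python
import io


def pickup_from(f, genre):
    """Yield the non-blank lines between '#catalog of <genre>' and the next '#' line."""
    target = f'#catalog of {genre}'
    # Skip everything up to and including the header line.
    for line in f:
        if line.strip() == target:
            break
    for line in f:
        data = line.strip()
        if not data:
            continue  # blank line: skip it
        if data.startswith('#'):
            break  # the next '#' line ends the section
        yield data


# Hypothetical layout: one blank line after every two lines.
text = (
    '#catalog of fruits\napple is red\n\n'
    'banana is sweet\norange is sour\n\n'
    'strawberry\ngrape\n\n'
    'lemon\n#\n\n'
    '#\n#catalog of sports\n\n'
    'soccer\nbaseball\n'
)
fruits = list(pickup_from(io.StringIO(text), 'fruits'))
sports = list(pickup_from(io.StringIO(text), 'sports'))
print(fruits)
print(sports)
```

Under these assumptions the blank lines are ignored wherever they fall, so the fruits section still yields the same six entries as the file without blank lines.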