Python: read a text file and get n lines starting from the line that begins with a specified string

Question

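## What I want to achieve
I am trying to write a Python program that reads a text file and gets n lines starting from the line that begins with a specified string.
Ideally, the n lines that follow the specified string would be stored in a list, one line per element.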

## Example
Suppose sample.txt has the following contents.
```
#catalog of fruits
apple is red
banana is sweet
orange is sour
strawberry
grape
lemon
#
#
#catalog of sports
soccer
baseball
basketball
volleyball
tennis
golf
#
#
```
In Python, I want to store the 6 lines that follow "#catalog of fruits" in a list variable named fruits.
```
with open(sample.txt) as f:
    lines = f.readlines()
    
    # some processing that builds fruits

    print(type(fruits))
    print(fruits)

[Output]
<class 'list'>
["apple is red", "banana is sweet", "orange is sour", "strawberry", "grape", "lemon"]
```

## What I have now
```
with open(sample.txt) as f:
    lines = f.readlines()

    lines_strip = [line.strip() for line in lines]

    str1 = [line for line in lines_strip if line.startswith("apple")]
    str2 = [line for line in lines_strip if line.startswith("banana")]
    str3 = [line for line in lines_strip if line.startswith("orange")]
    str4 = [line for line in lines_strip if line.startswith("strawberry")]
    str5 = [line for line in lines_strip if line.startswith("grape")]
    str6 = [line for line in lines_strip if line.startswith("lemon")]
    
    fruits = []
    fruits.append(str1)
    fruits.append(str2)
    fruits.append(str3)
    fruits.append(str4)
    fruits.append(str5)
    fruits.append(str6)

    print(type(fruits))
    print(fruits)
    
```
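This works, but it becomes tedious as the number of lines grows, so I would like to know how to store
"n lines starting from the line that begins with a specified string" in a list, as in the example above.
If anyone knows how, I would appreciate your advice.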

## Environment
Python 3.8.5
Windows 10 64-bit
Text editor: Visual Studio Code

Accepted Answer

How about something like this?
I also added handling that skips blank lines.

```python
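def pickup(genre):
    target = f'#catalog of {genre}'
    with open("sample.txt") as f:
        # Skip ahead to the header line; the loop simply ends at EOF if it is missing.
        for line in f:
            if line.strip() == target:
                break
        # Yield entries until the next '#' line (or EOF), skipping blank lines.
        for line in f:
            data = line.strip()
            if data == '':
                continue
            if data[0] == '#':
                break
            yield data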


def main():
    fruits = list(pickup('fruits'))
    print(type(fruits))
    print(fruits)

if __name__ == '__main__':
    main()
```
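For comparison, the same "skip to the header, then read until the next `#` line" idea can also be written with `itertools.dropwhile` and `itertools.takewhile`. This is only a sketch: the function name `pickup_lines` and the inline `SAMPLE` text are assumptions made so that the example runs without sample.txt.

```python
from itertools import dropwhile, takewhile

# Inline copy of the question's sample.txt (an assumption for this sketch)
SAMPLE = """#catalog of fruits
apple is red
banana is sweet
orange is sour
strawberry
grape
lemon
#
#
#catalog of sports
soccer
baseball
basketball
volleyball
tennis
golf
#
#
"""

def pickup_lines(lines, genre):
    """Return the entries listed under '#catalog of <genre>'."""
    target = f'#catalog of {genre}'
    stripped = (line.strip() for line in lines)
    # Drop everything that comes before the header line
    after = dropwhile(lambda s: s != target, stripped)
    next(after, None)  # discard the header itself (does nothing if it was never found)
    # Keep lines until the next line that starts with '#'
    body = takewhile(lambda s: not s.startswith('#'), after)
    return [s for s in body if s]  # skip blank lines

print(pickup_lines(SAMPLE.splitlines(), 'fruits'))
print(pickup_lines(SAMPLE.splitlines(), 'sports'))
```

With a real file, passing the open file object (`pickup_lines(f, 'fruits')`) should work the same way, since a file object iterates over its lines.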

Answer

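This differs from the code you posted, but if you want the 6 lines that follow "#catalog of fruits", use a loop such as for or while.

The key is to separate the step that finds the header line from the step that reads the 6 lines.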

```python
fruits = []

with open('sample.txt') as f:
    for line in f:
        if line.rstrip('\n') == '#catalog of fruits':
            # Header found: read the next 6 lines from the same file object.
            for _ in range(6):
                fruits_line = f.readline().rstrip('\n')
                fruits.append(fruits_line)
            break  # stop once the block has been read

print(fruits)
```
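When this code finds the "#catalog..." line, it runs a loop 6 times from that point, reading each following line and appending it to the list.

If you later want to extract other sections as well, or change the number of lines dynamically, you may need to rethink the structure.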
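One possible restructuring, as a rough sketch: pass in a mapping from header line to the number of lines wanted, and collect every section in a single pass. The function name `collect_sections`, the `wanted` mapping, and the shortened inline sample are all assumptions for illustration, not part of the original code.

```python
import io

def collect_sections(stream, wanted):
    """Collect lines after each header.

    wanted maps a header line to the number of lines to take after it.
    Only the first occurrence of each header is used.
    """
    it = iter(stream)  # a file object is already its own iterator
    result = {}
    for line in it:
        header = line.rstrip('\n')
        if header in wanted and header not in result:
            taken = []
            for _ in range(wanted[header]):
                nxt = next(it, None)
                if nxt is None:  # input ended before n lines were read
                    break
                taken.append(nxt.rstrip('\n'))
            result[header] = taken
    return result

# Shortened stand-in for sample.txt (assumption for this sketch)
sample = io.StringIO(
    "#catalog of fruits\napple is red\nbanana is sweet\n#\n"
    "#catalog of sports\nsoccer\nbaseball\ntennis\n#\n"
)
sections = collect_sections(sample, {'#catalog of fruits': 2, '#catalog of sports': 3})
print(sections['#catalog of fruits'])
print(sections['#catalog of sports'])
```

Note that a header which falls inside the n lines taken for an earlier section is consumed as data here, so the counts need to fit the file layout.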

Answer

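Here is code that I think is close to what you asked for.

- To search a list, use a list comprehension of the form `[i for i in ...]`. The result is a list.
- To extract a sublist, write `list_name[start_index:stop_index]`, where `stop_index` is one past the last index you want.

These two techniques are used often in Python, so I hope they are a useful reference.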

```Python
with open('sample.txt') as f:  # the original code is missing the quotes
    lines = f.read().splitlines()  # with the original code the newlines are kept

    n = 6
    match_str = '#catalog of fruits'
    print(lines)
    match_indexes = [i for i, x in enumerate(lines) if x == match_str]
    if len(match_indexes) > 0:
        match_index = match_indexes[0]  # multiple matches are not considered
        fruits = lines[match_index+1:match_index+n+1]  # running past the end of the list is not handled (the slice is just shortened)

        print(type(fruits))
        print(fruits)
```
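As a small supplement on the two edge cases mentioned in the comments: a Python slice that runs past the end of a list does not raise an error, it simply returns fewer elements, and `list.index` raises `ValueError` when there is no match. The sketch below wraps both in a helper; the name `take_after` and the short sample list are assumptions for illustration.

```python
def take_after(lines, match_str, n):
    """Return up to n elements that follow the first element equal to match_str."""
    try:
        start = lines.index(match_str) + 1  # first match only
    except ValueError:
        return []  # header not found
    return lines[start:start + n]  # a slice past the end is silently shortened

lines = ['#catalog of fruits', 'apple is red', 'banana is sweet', '#']

print(take_after(lines, '#catalog of fruits', 2))   # ['apple is red', 'banana is sweet']
print(take_after(lines, '#catalog of fruits', 10))  # ['apple is red', 'banana is sweet', '#']
print(take_after(lines, '#catalog of sports', 2))   # []
```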

Answer

> This works, but it becomes tedious as the number of lines grows,

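In what way does it become tedious?
If the concern is that the file grows too large to load into memory,
you could simply read it one line at a time and, once a line matches the condition, take the following n lines.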
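A minimal sketch of that line-by-line approach, using `itertools.islice` to take the next n lines from the same iterator. The function name `take_after_prefix` and the inline sample are assumptions. It matches with `startswith`, as in the question title, and never loads the whole file.

```python
import io
from itertools import islice

def take_after_prefix(stream, prefix, n):
    """Return up to n lines that follow the first line starting with prefix."""
    it = iter(stream)  # works for a file object or any iterable of lines
    for line in it:
        if line.startswith(prefix):
            # islice continues from where the for loop stopped
            return [l.rstrip('\n') for l in islice(it, n)]
    return []  # no line started with prefix

# Stand-in for open('sample.txt') so the sketch runs on its own
sample = io.StringIO(
    "#catalog of fruits\napple is red\nbanana is sweet\norange is sour\n#\n"
)
print(take_after_prefix(sample, '#catalog of fruits', 2))
```

With the real file this would be called as `with open('sample.txt') as f: fruits = take_after_prefix(f, '#catalog of fruits', 6)`.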
