Answer edit history

Correction

2020/12/02 10:25

Posted

nto

Score 1438

answer CHANGED Viewed

@@ -34,8 +34,8 @@
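 In this source, the tr that carries the [機械種別] (machine type) cell has a rowspan set, so
 the code reads the rowspan and, when it is greater than 1,
-fetches as many `.find_next_siblings()` as the rowspan and
+fetches the sibling elements with `.find_next_siblings()` and
-slices out the data for the rows that have no [機械種別] in the first column (columns × 2 rows).
+slices out the data for the rows that have no [機械種別] in the first column ((rowspan-1) rows × 2 columns).
 The genre variable (the machine type name) kept from the first row is placed at the front of each extracted row,
 and each row is appended to the result list, which is converted to a df at the end.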
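
The slice size in the corrected sentence can be checked on plain lists, without any HTML. The following is a minimal sketch, assuming the cells that follow the first row arrive as a flat list of texts with two cells (maker, count) per row. The function name `expand_rowspan` and the sample values are hypothetical and do not come from the answer.

```python
def expand_rowspan(genre, first_row, following_cells, span):
    """Return one [genre, maker, count] row per table row covered by the rowspan."""
    maker, num = first_row
    rows = [[genre, maker, num.replace('台', '')]]
    # The remaining (span - 1) rows contribute 2 cells each (maker, count),
    # so only the first (span - 1) * 2 cells belong to this genre.
    after = following_cells[:(span - 1) * 2]
    for i in range(0, len(after), 2):
        maker, num = after[i:i + 2]
        rows.append([genre, maker, num.replace('台', '')])
    return rows


# Hypothetical data: a genre spanning 3 rows, followed by cells of the next genre.
cells = ['Maker B', '1台', 'Maker C', '5台', 'Other genre', 'Maker D']
rows = expand_rowspan('Lathe', ('Maker A', '3台'), cells, span=3)
print(rows)
```

With `span=3` the slice takes four cells, which is why the cells of the next genre are left untouched.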

Addendum

2020/12/02 10:25

Posted

nto

Score 1438

answer CHANGED Viewed

@@ -23,4 +23,49 @@
 df = pd.DataFrame(result)
 df.to_csv('result.csv', encoding='cp932', header=header, index=False)
+```
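+### Addendum
+For now, I think the following gets the data in the shape you want.
+It is a little complicated and hard to explain.
+The for loop uses `enumerate` to tell which iteration it is on.
+On iteration 0 (the first pass) the header row is skipped;
+on every other iteration the td elements are extracted.
+In this source, the tr that carries the [機械種別] (machine type) cell has a rowspan set, so
+the code reads the rowspan and, when it is greater than 1,
+fetches as many `.find_next_siblings()` as the rowspan and
+slices out the data for the rows that have no [機械種別] in the first column (columns × 2 rows).
+The genre variable (the machine type name) kept from the first row is placed at the front of each extracted row,
+and each row is appended to the result list, which is converted to a df at the end.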
+```python
+from bs4 import BeautifulSoup
+import requests
+import pandas as pd
+url = 'https://ja.nc-net.or.jp/search/equipment/?cl[]=1'
+res = requests.get(url)
+soup = BeautifulSoup(res.content, 'html.parser')
+tables = soup.find('table', id='equip1_list')
+#print(tables.prettify())
+header = [th.text for th in tables.find_all('th')]
+result = []
+for e, tr in enumerate(tables.find('tbody').find_all('tr')):
+	if e != 0:  # skip the header row
+		genre, maker, num = [td.text for td in tr.find_all('td')]
+		span = int(tr.find_all('td')[0].get('rowspan', 1))  # treat a missing rowspan as 1
+		result.append([genre, maker, num.replace('台', '')])
+		if span > 1:
+			after = [td.text for td in tr.find_next_siblings()[0: (span-1)*2]]  # (span-1) rows x 2 cells
+			for i in range(0, len(after), 2):
+				maker, num = after[i: i+2]
+				result.append([genre, maker, num.replace('台', '')])
+df = pd.DataFrame(result)
+df.to_csv('result.csv', encoding='cp932', header=header, index=False)
 ```
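
For comparison, here is a hedged sketch of the same carry-the-genre-forward idea for a table whose continuation rows are ordinary `<tr>` elements with two cells each. This is an assumption about a well-formed table, not the layout the answer's code handles. It uses only the standard library `html.parser`; the names `RowCollector`, `expand`, and `SAMPLE` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical sample: continuation rows are real <tr>s with two cells.
SAMPLE = """
<table><tbody>
<tr><td rowspan="2">Lathe</td><td>Maker A</td><td>3台</td></tr>
<tr><td>Maker B</td><td>1台</td></tr>
<tr><td rowspan="1">Grinder</td><td>Maker C</td><td>2台</td></tr>
</tbody></table>
"""


class RowCollector(HTMLParser):
    """Collect td texts per tr, together with the first cell's rowspan."""

    def __init__(self):
        super().__init__()
        self.rows = []      # list of (cell_texts, rowspan_of_first_cell)
        self._cells = None  # cells of the row currently being parsed
        self._span = 1
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._cells, self._span = [], 1
        elif tag == 'td' and self._cells is not None:
            self._in_td = True
            if not self._cells:  # first cell of the row: read its rowspan
                self._span = int(dict(attrs).get('rowspan') or 1)
            self._cells.append('')

    def handle_data(self, data):
        if self._in_td:
            self._cells[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_td = False
        elif tag == 'tr' and self._cells is not None:
            self.rows.append((self._cells, self._span))
            self._cells = None


def expand(parsed_rows):
    """Prefix every row with the genre of the rowspan group it belongs to."""
    result, genre, remaining = [], None, 0
    for cells, span in parsed_rows:
        if remaining == 0:   # first row of a group: 3 cells
            genre, maker, num = cells
            remaining = span - 1
        else:                # continuation row: 2 cells
            maker, num = cells
            remaining -= 1
        result.append([genre, maker, num.replace('台', '')])
    return result


collector = RowCollector()
collector.feed(SAMPLE)
rows = expand(collector.rows)
print(rows)
```

The counter `remaining` plays the role of the `(span-1)` slice in the answer: it tracks how many following rows still belong to the current genre.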