'NoneType' object has no attribute 'find

Background / what I want to achieve

Thank you for taking the time to read this.
When I scrape information from a horse racing website, I get the error message shown below.

Source code

py
import re
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup
from tqdm import tqdm


def scrape_race_info(race_id_list):
    race_infos = {}
    for race_id in tqdm(race_id_list):
        url = 'https://db.netkeiba.com/race/' + race_id

        html = requests.get(url)
        html.encoding = 'EUC-JP'
        soup = BeautifulSoup(html.text, 'html.parser')

        # soup.find() returns None when the page has no div.data_intro;
        # the chained .find_all() then raises the AttributeError.
        paragraphs = soup.find('div', attrs={'class': 'data_intro'}).find_all('p')
        texts = paragraphs[0].text + paragraphs[1].text

        info = re.findall(r'\w+', texts)

        info_dict = {}
        for text in info:
            if text in ['芝', 'ダート']:  # turf, dirt
                info_dict['race_type'] = text
            if '障' in text:  # steeplechase
                info_dict['race_type'] = '障害'
            if 'm' in text:
                digits = re.findall(r'\d+', text)
                if digits:  # skip tokens that contain 'm' but no number
                    info_dict['course_len'] = digits[0]
            if text in ['良', '稍重', '重', '不良']:  # track condition
                info_dict['ground_state'] = text
            if text in ['曇', '晴', '雨', '小雨', '小雪', '雪']:  # weather
                info_dict['weather'] = text
            if '年' in text:  # date (contains the character for "year")
                info_dict['date'] = text

        # Store the result once per race and wait one second between requests.
        race_infos[race_id] = info_dict
        time.sleep(1)

    df_race_infos = pd.DataFrame(race_infos).T
    # astype() returns a new Series, so assign it back.
    df_race_infos['course_len'] = df_race_infos['course_len'].astype(int)
    df_race_infos.to_pickle('/content/drive/MyDrive/Colab Notebooks/keiba/df_race_infos.pickle')
    return race_infos

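Error message

What I tried

I ran the same code outside the function to check whether it works. The find_all('p') part also worked, and the scraping succeeded.
Once I wrap the code in a function, the error occurs.

Thank you in advance for your help.

Addendum: the target page as seen in the browser's developer tools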

Deleted user

2022/07/04 21:16

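I have never used beautifulsoup, so this is a guess. (I may be wrong, which is why I am posting it as a comment.)

I think "soup.find('div', attrs={'class': 'data_intro'})" returns None (no matching result), so the code ends up doing something like None.find_all("p"), which produces "'NoneType' object has no attribute 'find_all'". Specifically, the page at url = 'https://db.netkeiba.com/race/' + race_id contains no "div" or "data_intro" (the page is formatted differently from the others, for example because it does not exist). The lookup therefore returns None, and soup operations cannot be applied to None because it is not a soup object.

Fix: restructure the part of the for loop body from the failing line onward as follows.

try:
    # the original processing goes here: everything from texts = soup.find('div' ... through break
except Exception as err:
    print("[%s] failed to load, so processing was skipped." % (url))
    print("Error detail:", err)

Reference: the Python tutorial on handling exceptions https://docs.python.org/ja/3/tutorial/errors.html#handling-exceptions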
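The structure suggested in this comment can be sketched as a small runnable example. To keep it offline, the pages are a made-up dictionary (including the invented URL ending in `invalid-id`), and `parse_page` is a hypothetical stand-in for the BeautifulSoup lookup. It uses `re.search` from the standard library, which, like `soup.find`, returns `None` when nothing matches, so it fails in the same way.

```python
import re

# Made-up page contents keyed by URL; stands in for requests.get(url).text.
FAKE_PAGES = {
    'https://db.netkeiba.com/race/202101010101': '<div class="data_intro">ok</div>',
    'https://db.netkeiba.com/race/invalid-id': '<html>no race here</html>',
}


def parse_page(html_text):
    """Hypothetical parser that fails the same way a chained soup.find() does."""
    match = re.search(r'<div class="data_intro">(.*?)</div>', html_text)
    # match is None when the div is missing, so .group raises AttributeError.
    return match.group(1)


def scrape_all(urls):
    """Parse every URL, skipping (and reporting) the ones that fail."""
    results, skipped = {}, []
    for url in urls:
        try:
            results[url] = parse_page(FAKE_PAGES[url])
        except Exception as err:
            # Report the failing URL and move on to the next one.
            print('[%s] failed to load, so processing was skipped.' % url)
            print('Error detail:', err)
            skipped.append(url)
    return results, skipped


results, skipped = scrape_all(list(FAKE_PAGES))
```

Printing the URL inside the handler is the useful part of this pattern: it shows exactly which address was requested when the lookup came back empty.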

cocosan

2022/07/04 22:18

Thank you for your advice despite being busy. I tried it right away, and every iteration printed the message "[https://db.netkeiba.com/race/r] failed to load, so processing was skipped."


2 answers

Self-resolved

I should have called the function like this:

py
scrape_race_info(race_id_list)

Instead, I had put stray quotes around the argument:

py
scrape_race_info('race_id_list')

That was the cause of the error.
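To illustrate why the quoted name fails: a Python string is itself iterable, so the `for` loop walks over its characters one by one and the first URL it builds ends in `/race/r`. The sketch below reproduces this without any network access. `build_race_urls` is a hypothetical helper that is not part of the original code, and the type check is just one possible safeguard.

```python
BASE_URL = 'https://db.netkeiba.com/race/'


def build_race_urls(race_id_list):
    """Return one URL per race ID (hypothetical helper for illustration)."""
    # A bare string is iterable character by character, so reject it early
    # instead of silently requesting one URL per character.
    if isinstance(race_id_list, str):
        raise TypeError('expected a list of race IDs, got a single string')
    return [BASE_URL + race_id for race_id in race_id_list]


# Iterating over the string literal yields single characters.
urls_from_string = [BASE_URL + ch for ch in 'race_id_list']
print(urls_from_string[:2])

# Passing the actual list yields the intended URL.
print(build_race_urls(['202101010101']))
```

With a guard like this, the mistaken call raises a `TypeError` at once rather than sending a request for every character of the string.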

I apologize that the cause lay in something I had not included in the question.
RiaFeed and fourteenlength, thank you for your advice despite being busy.

Posted 2022/07/04 22:27

cocosan

Total score 27

The likely cause is that soup.find fails for some reason and returns None,
so how about splitting soup.find and find_all into separate steps?
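A minimal sketch of that split, assuming the race details sit in `<p>` tags inside `div.data_intro` as in the question. `extract_intro_text` and the inline HTML strings are made up for illustration, so the example runs without touching the site. Unlike the question's code, it joins up to two paragraphs with a slice rather than indexing `[0]` and `[1]`.

```python
from bs4 import BeautifulSoup


def extract_intro_text(html_text):
    """Return the text of the first two <p> tags in div.data_intro, or None."""
    soup = BeautifulSoup(html_text, 'html.parser')

    # Step 1: look up the container on its own so a miss can be detected.
    intro = soup.find('div', attrs={'class': 'data_intro'})
    if intro is None:
        return None  # the page has no such div (for example, an invalid race ID)

    # Step 2: only now search inside the container.
    paragraphs = intro.find_all('p')
    return ''.join(p.text for p in paragraphs[:2])


# Made-up HTML standing in for a valid race page and an error page.
valid_page = '<div class="data_intro"><p>ダ1200m</p><p>2021年1月1日</p></div>'
error_page = '<html><body><p>not found</p></body></html>'

print(extract_intro_text(valid_page))
print(extract_intro_text(error_page))
```

Returning `None` from the helper lets the caller decide whether to skip the race, log the URL, or stop, instead of failing with an AttributeError in the middle of a chained call.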

Posted 2022/07/04 21:30

RiaFeed

Total score 2703

cocosan

2022/07/04 21:54

Thank you for the quick advice. When I run the code without wrapping it in a function, as below, the scraping works, but once I wrap it in a function I get the same error message.

race_id = '202101010101'
url = 'https://db.netkeiba.com/race/' + race_id
html = requests.get(url)
html.encoding = 'EUC-JP'
soup = BeautifulSoup(html.text, 'html.parser')
text_div = soup.find('div', attrs={'class': 'data_intro'})
text_p1 = text_div.find_all('p')[0].text
text_p2 = text_div.find_all('p')[1].text
text_p12 = text_p1 + text_p2
print(text_p12)