Please show me how to scrape this site (BeautifulSoup, Python)

http://race.netkeiba.com/?pid=payback_list&id=p2018050301
I want to scrape the win-bet popularity rank (単勝人気) of the winning horse for every race on this site, but I can't get it to work.

The tag for race 1 has a class, so I was able to get its value. The tag for race 2 has no equivalent class, so I can't get it.
The code I used to get the winner's popularity rank for race 1 is shown below.

Please show me how to get the values, with code.

python
import urllib.request
from bs4 import BeautifulSoup

# URL to fetch
url = "http://race.netkeiba.com/?pid=payback_list&id=p2018050301"

html = urllib.request.urlopen(url)
soup = BeautifulSoup(html, "html.parser")
print(soup)

# Returns only the first <td class="cellcolor_1"> on the page
market = soup.find('td', class_='cellcolor_1')
print(market.text)


2 answers

Best answer

market = soup.find('td', class_='cellcolor_1')

With this class, your code is targeting the 人気 (popularity) column, though...

◇ Scraping basics
Start from an ancestor tag of the element you want and work your way down.
The idea is the same as writing a selector for HTML.

Python
# -*- coding: utf-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup


def main() -> None:
    # URL to fetch
    URL = "http://race.netkeiba.com/?pid=payback_list&id=p2018050301"
    html = urlopen(URL)
    soup = BeautifulSoup(html, "html.parser")
    print(soup)

    # Walk down from each race table to its cells
    for table in soup.find_all('table', class_='race_table_01'):
        for val in table.find_all('td', class_='txt_r'):
            print(val.text)
            break  # only the first matching cell per table


if __name__ == '__main__':
    main()
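
The top-down idea above can be tried without hitting the site. This is only a minimal sketch: SAMPLE_HTML is hypothetical, simplified markup (the real page has more rows and columns), and the two helper function names are made up for illustration. It shows the same lookup written once as tag traversal and once as a CSS selector.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page has more rows and columns.
SAMPLE_HTML = """
<table class="race_table_01">
  <tr><th>Rank</th><th>Horse</th><th>Odds</th></tr>
  <tr><td>1</td><td>Horse A</td><td class="txt_r">3.4</td></tr>
  <tr><td>2</td><td>Horse B</td><td class="txt_r">5.1</td></tr>
</table>
<table class="race_table_01">
  <tr><th>Rank</th><th>Horse</th><th>Odds</th></tr>
  <tr><td>1</td><td>Horse C</td><td class="txt_r">12.0</td></tr>
  <tr><td>2</td><td>Horse D</td><td class="txt_r">2.2</td></tr>
</table>
"""


def first_odds_by_traversal(html: str) -> list:
    """Walk down from each table to its first td.txt_r cell."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for table in soup.find_all("table", class_="race_table_01"):
        # find() searches only inside this table and returns the first match
        cell = table.find("td", class_="txt_r")
        if cell is not None:
            results.append(cell.get_text(strip=True))
    return results


def first_odds_by_selector(html: str) -> list:
    """Same lookup expressed with CSS selectors."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for table in soup.select("table.race_table_01"):
        cell = table.select_one("td.txt_r")
        if cell is not None:
            results.append(cell.get_text(strip=True))
    return results


print(first_odds_by_traversal(SAMPLE_HTML))
print(first_odds_by_selector(SAMPLE_HTML))
```

Both functions return one value per table, because each search is scoped to the table it starts from rather than to the whole document.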

◇ References
Search results for [netkeiba]

Added 2018/06/10

Python
# -*- coding: utf-8 -*-
from itertools import filterfalse
from urllib.request import urlopen
from bs4 import BeautifulSoup


def main() -> None:
    # URL to fetch
    URL = "http://race.netkeiba.com/?pid=payback_list&id=p2018050301"
    html = urlopen(URL)
    soup = BeautifulSoup(html, "html.parser")
    print(soup)
    print('#' * 60)

    for tr in soup.select("table.race_table_01 > tr.bml1"):
        # Keep only rows whose first cell (finishing position) is "1"
        for rank in filterfalse(lambda x: x.renderContents().decode() != "1", tr.select("td:nth-of-type(1)")):
            # The 12th cell of that row is the popularity rank
            for popular in tr.select("td:nth-of-type(12)"):
                print(popular.renderContents().decode())
                break
        # The filterfalse call is equivalent to the following lines
        #for rank in tr.select("td:nth-of-type(1)"):
        #    if rank.renderContents().decode() != "1":
        #        continue
        #    for popular in tr.select("td:nth-of-type(12)"):
        #        print(popular.renderContents().decode())
        #        break


if __name__ == '__main__':
    main()
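
For cells that have no class, selecting by position is one option, as in the code above. The following is a sketch under assumptions, not the site's actual layout: SAMPLE_HTML is a hypothetical 4-column table (on the real page the answer above uses the 12th column), and winner_popularity and its popularity_column parameter are names invented for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical 4-column markup. The popularity cell has no class,
# so it is addressed by its position within the row.
SAMPLE_HTML = """
<table class="race_table_01">
  <tr><th>Rank</th><th>Horse</th><th>Odds</th><th>Popularity</th></tr>
  <tr class="bml1"><td>1</td><td>Horse A</td><td class="txt_r">3.4</td><td>2</td></tr>
  <tr class="bml1"><td>2</td><td>Horse B</td><td class="txt_r">5.1</td><td>3</td></tr>
</table>
<table class="race_table_01">
  <tr><th>Rank</th><th>Horse</th><th>Odds</th><th>Popularity</th></tr>
  <tr class="bml1"><td>1</td><td>Horse C</td><td class="txt_r">12.0</td><td>5</td></tr>
  <tr class="bml1"><td>2</td><td>Horse D</td><td class="txt_r">2.2</td><td>1</td></tr>
</table>
"""


def winner_popularity(html: str, popularity_column: int) -> list:
    """Return the popularity cell of every row whose first cell is "1"."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for tr in soup.select("table.race_table_01 > tr.bml1"):
        rank = tr.select_one("td:nth-of-type(1)")
        # Inverted condition: skip every row that is not the winner
        if rank is None or rank.get_text(strip=True) != "1":
            continue
        popular = tr.select_one(f"td:nth-of-type({popularity_column})")
        if popular is not None:
            results.append(popular.get_text(strip=True))
    return results


# In this sample the popularity rank is the 4th column.
print(winner_popularity(SAMPLE_HTML, popularity_column=4))
```

The "table > tr" child selector relies on html.parser leaving the markup as written; a parser that inserts a tbody element would need a different selector.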

◇ References
1. itertools.filterfalse
2. Python scraping: telling elements apart within classes of the same name
3. :nth-of-type()
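
On the first reference: itertools.filterfalse keeps the items for which the predicate returns false, so the "!= 1" predicate in the code above ends up keeping the winners. A small sketch with plain strings in place of table cells (the list and the function name is_not_winner are made up for illustration):

```python
from itertools import filterfalse

# Plain strings standing in for the text of the first cell of each row
ranks = ["1", "2", "3", "1"]


def is_not_winner(rank: str) -> bool:
    return rank != "1"


# filterfalse drops every item for which the predicate is true
kept_by_filterfalse = list(filterfalse(is_not_winner, ranks))

# The same filtering written as a loop with an inverted condition
kept_by_loop = []
for rank in ranks:
    if is_not_winner(rank):
        continue
    kept_by_loop.append(rank)

# And as a list comprehension with the positive condition
kept_by_comprehension = [rank for rank in ranks if rank == "1"]

print(kept_by_filterfalse, kept_by_loop, kept_by_comprehension)
```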

Posted 2018/06/10 12:29

Edited 2018/06/10 14:32

umyu

Total score 5846

MitMc

2018/06/10 12:49

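Thank you for the quick answer. My question was wrong, and I apologize. What I want is not the winning horse's odds but the value next to it: which popularity rank came in (the rightmost column of the finishing-order table). Unlike 'txt_r' in the code above, that cell has no class. What would the code look like in that case?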

umyu

2018/06/10 12:56 (edited)

> MitMc: If the question was wrong, other answerers may misunderstand it when they read it. The question text can be edited, so please correct it.

MitMc

2018/06/10 13:31

I have corrected the question. Thank you.

MitMc

2018/06/11 15:23

Thank you very much for the answer. I was able to extract the values I wanted. By the way, if I rewrite # if rank.renderContents().decode() != "1": # continue # for popular in tr.select("td:nth-of-type(12)"): as if rank.renderContents().decode() = "1": # for popular in tr.select("td:nth-of-type(12)"): I get an error. Why is that? I would expect both to mean the same thing.

umyu

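2018/06/11 17:37 (edited)

> MitMc: No. With a single = it is written as an assignment; the comparison operator is ==. Also, putting the body under an if deepens the indentation and makes the code harder to read, so it is better to invert the condition where you can. And when you get an error, please include the error message whenever possible.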
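
To make the two points in this comment concrete, here is a small sketch that needs no network access. The helper is_valid_syntax, the two source strings, and the sample rows are all made up for illustration. It checks that Python rejects a single = inside an if condition as a syntax error, and that the nested form and the inverted-condition form select the same rows.

```python
def is_valid_syntax(source: str) -> bool:
    """Return True if Python can compile the given source text."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# A single "=" is assignment and is not allowed as an if condition
ASSIGNMENT_IN_IF = 'rank = "1"\nif rank = "1":\n    pass\n'
# "==" is the comparison operator
COMPARISON_IN_IF = 'rank = "1"\nif rank == "1":\n    pass\n'

# (finishing position, popularity rank) pairs standing in for table rows
rows = [("1", "2"), ("2", "3"), ("1", "5")]


def winners_nested(pairs):
    """Positive condition: the work is nested one level deeper."""
    found = []
    for rank, popularity in pairs:
        if rank == "1":
            found.append(popularity)
    return found


def winners_inverted(pairs):
    """Inverted condition: skip early and keep the main path flat."""
    found = []
    for rank, popularity in pairs:
        if rank != "1":
            continue
        found.append(popularity)
    return found


print(is_valid_syntax(ASSIGNMENT_IN_IF), is_valid_syntax(COMPARISON_IN_IF))
print(winners_nested(rows), winners_inverted(rows))
```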

MitMc

2018/06/12 03:34

Sorry for the careless mistake. I will include the error message from next time. This really helps. Thank you.
