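Scraping with Selenium

Background / what I want to achieve

I am scraping with Selenium from Python and want to get the URLs of the image parts of the page,
but I get an excessively long output back.
Could you tell me what I am doing wrong?

Problem / error message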

Traceback (most recent call last):
  File "c:/Users/81801/Desktop/python/スクレイピング/競馬情報取得2R1.py", line 15, in <module>
    search_bar = driver.find_element_by_xpath('//dd/ul/li/a')[1]
TypeError: 'WebElement' object is not subscriptable

Relevant source code

python3
# selenium

# Import libraries
from selenium import webdriver
from time import sleep
from selenium.webdriver.chrome.options import Options

# Open the URL
options = Options()
options.add_argument('--headless')
driver = webdriver.Chrome(r"C:\Users\81801\Desktop\python\chromedriver_win32\chromedriver",options=options)
driver.get('https://race.netkeiba.com/top/race_list.html?kaisai_date=20200802&kaisai_id=2020040204&current_group=1020200801#racelist_top_a')

# Search
search_bar = driver.find_element_by_xpath('//dd/ul/li/a')[1]
print(search_bar.get_attribute('href'))

2 answers

Best answer

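If you want the URLs, I assume the next step is to visit each of them and collect the race results.
If so, study the site a little more closely.
You should find a smarter approach than going out of your way to fetch URLs that follow a fixed pattern.

For example:

https://race.netkeiba.com/race/result.html?race_id=202004030207 is
Niigata, 3rd meeting, day 2, race 7 (7R)

https://race.netkeiba.com/race/result.html?race_id=202010020204 is
Kokura, 2nd meeting, day 2, race 4 (4R)

Look at a few more and the pattern becomes clear: the race_id is made up of the year, venue, meeting number, day, and race number (R).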
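
As a minimal sketch of that layout, the helper below assembles a race_id from its five parts and builds the result URL. The names `build_race_id` and `result_url` are hypothetical, and the two-digit zero padding of each field is inferred from the two example URLs above, not from any netkeiba documentation.

```python
RESULT_URL = "https://race.netkeiba.com/race/result.html?race_id="


def build_race_id(year, venue_code, meeting, day, race):
    """Join year, venue code, meeting number, day and race number into a race_id.

    Assumption: every field after the year is zero-padded to two digits,
    as in the two example URLs.
    """
    return "{}{}{:02d}{:02d}{:02d}".format(
        year, venue_code, int(meeting), int(day), int(race)
    )


def result_url(year, venue_code, meeting, day, race):
    """Build the result page URL for a single race."""
    return RESULT_URL + build_race_id(year, venue_code, meeting, day, race)


# Niigata (venue code "04"), 3rd meeting, day 2, race 7
print(result_url(2020, "04", 3, 2, 7))
```

Accepting the meeting, day, and race number as either integers or digit strings means values cut out of page text can be passed in without converting them first.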

python
driver.get('https://race.netkeiba.com/top/race_list.html?kaisai_date=20200802&kaisai_id=2020040204&current_group=1020200801#racelist_top_a')
# Venue name as shown on the page -> venue code used in race_id
place_pair = {'新潟': '04', '小倉': '10', '札幌': '01'}
# Read every title before navigating: the elements go stale after the next driver.get()
titles = [e.text for e in driver.find_elements_by_class_name('RaceList_DataTitle')]
for title in titles:
    basyo = place_pair.get(title.split(' ')[1])  # venue code, e.g. "04"
    kaisai = title.split('回')[0]  # meeting number, e.g. "2"
    day = title.split(' ')[2].split('日')[0]  # day number, e.g. "4"
    for r in ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']:
        driver.get('https://race.netkeiba.com/race/result.html?race_id=2020' + basyo + '0' + kaisai + '0' + day + r)
        # Process each race page here
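
As for the TypeError in the question itself: `find_element_by_xpath` (singular) returns a single `WebElement`, so it cannot be indexed with `[1]`, whereas the plural `find_elements` call returns a list. Below is a minimal sketch, assuming `driver` is the already-opened Chrome driver from the question. The helper name `nth_link_href` is hypothetical, and the string `"xpath"` is the locator value that `By.XPATH` stands for.

```python
def nth_link_href(driver, xpath, index):
    """Return the href of the index-th element matching xpath, or None if absent."""
    # find_elements (plural) returns a list, so it can be indexed;
    # find_element (singular) returns one WebElement and cannot.
    links = driver.find_elements("xpath", xpath)
    if not 0 <= index < len(links):
        return None
    return links[index].get_attribute("href")


# Usage with the driver from the question (not run here):
# print(nth_link_href(driver, '//dd/ul/li/a', 1))
```

Returning None for a missing index is only one possible choice; raising an error may be preferable if the link is always expected to exist.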