Extracting a URL from a dynamic site with Python

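I want to import a CSV file from the site below on a regular schedule. Hard-coding the URL does not work because it changes at certain intervals, and I have not been able to get the URL of the CSV for a given month and year using Beautiful Soup.
Could you tell me how to get the XPath from a dynamic(?) site like the one below?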

https://www.cmegroup.com/ja/trading/interest-rates/countdown-to-fomc.html

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent so the request looks like it comes from a normal browser
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
}

url = "https://www.cmegroup.com/trading/interest-rates/countdown-to-fomc.html"
req = requests.get(url, headers=header)
soup = BeautifulSoup(req.content, "lxml")

# Text expected in the title attribute of the download link
target = "Download Federal Reserve meeting history for 14 Dec 2022"

# Print every link, then print the title again when it matches the target
for i_a in soup.find_all("a"):
    title = i_a.get("title")
    href = i_a.get("href")
    print(title, href)

    # get() returns None when the attribute is missing, so guard before using "in"
    if title and target in title:
        print(title)

quickquip

2022/08/09 00:41

Please make the code readable **at the very least**. Edit the question with reference to https://teratail.com/help#about-markdown and https://teratail.com/help/question-tips#questionTips35.


1 answer

Best answer

If you mean a page whose HTML is modified by JavaScript, you need something like a browser to execute that JavaScript.
The approach most often asked about on this site is to control a browser with the Selenium library.
https://www.selenium.dev/ja/documentation/webdriver/getting_started/
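
As a rough sketch of the Selenium approach, something like the following might work. It assumes Selenium 4.6 or later (which fetches a matching driver itself) and a locally installed Chrome recent enough to accept `--headless=new`. The function names `fetch_links`, `find_href_by_title`, and `main` are hypothetical, and the title text is taken from the question's code, not confirmed against the live page. I have not checked whether the download links sit inside an iframe; if they do, you would also need `driver.switch_to.frame(...)` before looking for them.

```python
from typing import Iterable, List, Optional, Tuple

Link = Tuple[Optional[str], Optional[str]]  # (title, href)


def fetch_links(url: str, timeout: float = 20.0) -> List[Link]:
    """Load url in headless Chrome and return (title, href) for every <a>."""
    # Imported lazily so find_href_by_title can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumption: a recent Chrome
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until JavaScript has produced at least one <a> with a title
        WebDriverWait(driver, timeout).until(
            lambda d: d.find_elements(By.CSS_SELECTOR, "a[title]")
        )
        return [
            (a.get_attribute("title"), a.get_attribute("href"))
            for a in driver.find_elements(By.TAG_NAME, "a")
        ]
    finally:
        driver.quit()


def find_href_by_title(links: Iterable[Link], target: str) -> Optional[str]:
    """Return the href of the first link whose title contains target."""
    for title, href in links:
        # title is None or empty for links without a title attribute
        if title and target in title:
            return href
    return None


def main() -> None:
    url = "https://www.cmegroup.com/trading/interest-rates/countdown-to-fomc.html"
    # Title text copied from the question; adjust the date as needed
    target = "Download Federal Reserve meeting history for 14 Dec 2022"
    print(find_href_by_title(fetch_links(url), target))
```

Calling `main()` opens the page, waits up to 20 seconds for titled links to appear, and prints the matching href, or `None` if no title matches. The matching is kept in a separate function so it can be checked without launching a browser.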

Posted 2022/08/08 23:35