A colon in a BeautifulSoup select() selector causes an error

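I'm learning Python using Google Trends data, but I get the following error and can't get any further:
Error: ':approx_traffic' pseudo-class is not implemented at this time

From what I've found, the colon (:) seems to be interpreted as the start of a pseudo-class. Is there a way to work around this?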

import requests
from bs4 import BeautifulSoup
import time
rss = 'https://trends.google.co.jp/trends/trendingsearches/daily/rss?geo=JP'
result = requests.get(rss)
soup = BeautifulSoup(result.text, 'xml')
search_scores = soup.select('ht:approx_traffic')

Below is part of a sample from the Google Trends feed:

<item>
<title>日向坂46</title>
<ht:approx_traffic>20,000+</ht:approx_traffic>
<description/>
<link>https://trends.google.co.jp/trends/trendingsearches/daily?geo=JP#%E6%97%A5%E5%90%91%E5%9D%8246</link>
<pubDate>Mon, 07 Mar 2022 16:00:00 +0900</pubDate>
<ht:picture>https://t3.gstatic.com/images?q=tbn:ANd9GcQvt_DHXpxYd_2AmgVGlbbx7x1KZ6FxUjheob4XvSzG7bCjsgiTNOcDPcryPOfekXNSWCc7ZSFK</ht:picture>
<ht:picture_source>Yahoo!ニュース</ht:picture_source>
<ht:news_item>
<ht:news_item_title>日向坂46、新メンバーオーディション開催「人生は1回しかないです ...</ht:news_item_title>
<ht:news_item_snippet>アイドルグループ「日向坂46」が新たなメンバーを募集する、「日向坂46 新メンバーオーディション」の開催を決定したと発表した。新メンバーオーディションを開催する&nbsp;...</ht:news_item_snippet>
<ht:news_item_url>https://news.yahoo.co.jp/articles/c17d8146644ad3191fa09caa5b90fb0679e1e384</ht:news_item_url>
<ht:news_item_source>Yahoo!ニュース</ht:news_item_source>
</ht:news_item>
<ht:news_item>
<ht:news_item_title>日向坂46、新メンバーオーディション開催 佐々木久美「待ってい ...</ht:news_item_title>
<ht:news_item_snippet>アイドルグループ・日向坂46が、約4年ぶりに新たなメンバーを募集する「日向坂46 新メンバーオーディション」を開催することが決定し、7日から応募受付を開始した。</ht:news_item_snippet>
<ht:news_item_url>https://news.mynavi.jp/article/20220307-2287009/</ht:news_item_url>
<ht:news_item_source>マイナビニュース</ht:news_item_source>
</ht:news_item>
</item>


2 answers

With the "xml" parser, the namespace prefix appears to be dropped from the tag name, so how about selecting by the local name alone?

Python
soup.select('approx_traffic')
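A minimal sketch of this suggestion, run against a trimmed-down inline feed rather than the live RSS. The `SAMPLE` string and the namespace URI `https://example.com/ht` are placeholders made up for illustration, and the "xml" parser needs lxml to be installed. As far as I can tell, the "xml" parser keeps the prefix on the tag separately from its name, which would explain why the bare local name matches:

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the feed; the namespace URI is a placeholder.
SAMPLE = """<rss version="2.0" xmlns:ht="https://example.com/ht">
<channel>
<item><title>first</title><ht:approx_traffic>20,000+</ht:approx_traffic></item>
<item><title>second</title><ht:approx_traffic>10,000+</ht:approx_traffic></item>
</channel>
</rss>"""

soup = BeautifulSoup(SAMPLE, "xml")  # the "xml" parser requires lxml

# Select by local name only: no colon, so no pseudo-class error.
tags = soup.select("approx_traffic")
traffic = [tag.text for tag in tags]
print(traffic)

# The prefix is stored on the tag separately from the name.
print(tags[0].name, tags[0].prefix)
```

Note that selecting by local name alone would also match an `approx_traffic` element from any other namespace, should the document contain one.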

Posted 2022/03/07 12:45

otn

Total score: 84499

usausagi

2022/03/07 12:47

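I see, thank you! That's certainly another way to approach it. Thanks for showing me that it's worth trying different parsers and switching to whichever one makes the data easiest to extract. This really helps my studying.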

otn

2022/03/07 12:50

With the lxml parser the namespace prefix stays in the tag name, so instead of select you could use something like soup.find_all("ht:approx_traffic").
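A rough sketch of that alternative, again on a made-up inline sample (`SAMPLE` and its namespace URI are placeholders, not the real feed). It also shows one way to keep using `select` with this parser: escaping the colon with `soupsieve.escape`, the selector library that Beautiful Soup uses under the hood. Treat the escape approach as an assumption about how this parser names the tags, not something confirmed in this thread:

```python
import soupsieve as sv
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the feed; the namespace URI is a placeholder.
SAMPLE = """<rss version="2.0" xmlns:ht="https://example.com/ht">
<channel>
<item><title>first</title><ht:approx_traffic>20,000+</ht:approx_traffic></item>
<item><title>second</title><ht:approx_traffic>10,000+</ht:approx_traffic></item>
</channel>
</rss>"""

# "lxml" is the HTML parser: it does no namespace handling,
# so the whole string "ht:approx_traffic" becomes the tag name.
soup = BeautifulSoup(SAMPLE, "lxml")

# find_all takes a plain tag name, so the colon needs no special treatment.
found = [tag.text for tag in soup.find_all("ht:approx_traffic")]
print(found)

# For select, escape the colon so it is not read as a pseudo-class.
selector = sv.escape("ht:approx_traffic")  # backslash-escapes the colon
selected = [tag.text for tag in soup.select(selector)]
print(selected)
```

One caveat with the HTML parser: it applies HTML rules to the feed, so elements such as `<link>` may not keep their text the way they do with the "xml" parser.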


Best answer

python
# Original call; the colon is parsed as the start of a pseudo-class:
# search_scores = soup.select('ht:approx_traffic')
search_scores = soup.select('approx_traffic', namespaces={'ht': ''})

# Print the text of every matched element
for score in search_scores:
    print(score.text)

Output:
20,000+
20,000+
20,000+
20,000+
20,000+
10,000+
10,000+
10,000+
10,000+
200,000+
100,000+
50,000+
20,000+
20,000+
20,000+
20,000+
20,000+
20,000+
20,000+
20,000+
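In the call above the selector itself carries no prefix, so the `namespaces` dictionary probably has no effect on the match. If you want the match restricted to the `ht` namespace, CSS uses `prefix|name` instead of a colon, and `namespaces` maps that prefix to the namespace URI. A minimal sketch, assuming the "xml" parser and a placeholder URI (`NS_URI` and `SAMPLE` are made up here; the real feed declares its own URI in `xmlns:ht`):

```python
from bs4 import BeautifulSoup

NS_URI = "https://example.com/ht"  # placeholder, not the real feed's URI

# Trimmed-down stand-in for the feed.
SAMPLE = f"""<rss version="2.0" xmlns:ht="{NS_URI}">
<channel>
<item><title>first</title><ht:approx_traffic>20,000+</ht:approx_traffic></item>
<item><title>second</title><ht:approx_traffic>10,000+</ht:approx_traffic></item>
</channel>
</rss>"""

soup = BeautifulSoup(SAMPLE, "xml")  # the "xml" parser requires lxml

# CSS namespace syntax is prefix|name; the dict maps the prefix to a URI.
by_prefix = soup.select("ht|approx_traffic", namespaces={"ht": NS_URI})
print([tag.text for tag in by_prefix])

# The alias used in the selector is arbitrary; only the URI has to match.
by_alias = soup.select("trend|approx_traffic", namespaces={"trend": NS_URI})
print([tag.text for tag in by_alias])
```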