When scraping, how do I select elements one at a time when several share the same class?

### What I want to achieve

When scraping, how do I select elements one at a time when several share the same class?

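### Background

I am practicing web scraping with Python using this site: https://github.com/orangain/scraping-hands-on/blob/master/exercises.md. In the first exercise, several elements share the same class, and I can only retrieve the very first one. How can I retrieve them one at a time?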

### Relevant source code

```python
import requests
from bs4 import BeautifulSoup

url = "http://qiita.com/advent-calendar/2016/crawler"
html = requests.get(url)
soup = BeautifulSoup(html.content,'html.parser')

# Thursday the 1st
for link in soup.find(class_='style-1dctyxx'):
    print(link.get('href'))
for element in soup.find(class_='style-3ki7ar'):
    print(element.text)
# Friday the 2nd
for link in soup.find(class_='style-1dctyxx'):
    print(link.get('href'))
for element in soup.find(class_='style-3ki7ar'):
    print(element.text)
```

### Output

```
http://amacbee.hatenablog.com/entry/2016/12/01/210436
scrapy-splashを使ってJavaScript利用ページを簡単スクレイピング
http://amacbee.hatenablog.com/entry/2016/12/01/210436
scrapy-splashを使ってJavaScript利用ページを簡単スクレイピング
```

The Python version is 3.11.2.

1T2R3M4

2023/03/26 10:32

The same question is posted at https://qiita.com/hiha1323/questions/c2c64ecea549a9653f4d. Please follow the guidance here: https://teratail.com/help#posted-otherservice

hiha

2023/03/26 14:08

I apologize. I will be more careful from now on.

1 answer

Best answer

`find` is a method that returns only the first of the matching elements.
If you want all matching elements returned as a list, use `find_all`.
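
The difference can be illustrated with a minimal sketch. The HTML string and the class name `item` below are made up for illustration and do not come from the page in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: three elements sharing the same class.
html = """
<ul>
  <li class="item">first</li>
  <li class="item">second</li>
  <li class="item">third</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find returns only the first matching Tag (or None if nothing matches).
first = soup.find(class_="item")
first_text = first.text

# find_all returns every matching Tag in document order.
items = soup.find_all(class_="item")
all_texts = [item.text for item in items]

# Individual elements can then be picked by index.
second_text = items[1].text

print(first_text)
print(all_texts)
print(second_text)
```

Indexing into the result of `find_all` is one way to pick out a specific element; looping over it handles all of them.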

Posted 2023/03/26 08:37

otn

Total score 86590

hiha

2023/03/27 06:47 (edited)

When I changed the `find` on line 9 (the first `for` loop) to `find_all`, every URL came out as None and the same text was printed four times. I deleted everything from the Friday the 2nd section onward.

```python
import requests
from bs4 import BeautifulSoup

url = "http://qiita.com/advent-calendar/2016/crawler"
html = requests.get(url)
soup = BeautifulSoup(html.content,'html.parser')

for link in soup.find_all(class_='style-1dctyxx'):
    print(link.get('href'))
for element in soup.find_all(class_='style-3ki7ar'):
    print(element.text)
```

otn

2023/03/27 11:02

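I took a look at the page. The elements with class 'style-1dctyxx' are `div` elements. A `div` has no `href`, so `link.get('href')` returns None. The elements with class 'style-3ki7ar' appear to be `a` elements, so if it is their `href` you want, you should be able to get it with `element.get('href')`. Either you made a typo or copy-paste mistake in the class name, or you do not really understand HTML and CSS in the first place.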
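
As a rough sketch of that point, assuming a simplified structure where each `div` wraps one `a`: the markup, the class names `cell` and `title`, and the example.com URLs below are placeholders, not the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: class names "cell" and "title" are placeholders.
html = """
<div class="cell"><a class="title" href="https://example.com/1">Day 1</a></div>
<div class="cell"><a class="title" href="https://example.com/2">Day 2</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# A div carries no href attribute, so Tag.get returns None.
div_hrefs = [div.get("href") for div in soup.find_all(class_="cell")]

# The a elements do carry href; read the URL and text from the same element.
entries = [(a.get("href"), a.text) for a in soup.find_all("a", class_="title")]

print(div_hrefs)
for href, text in entries:
    print(href, text)
```

Reading both the URL and the text from the same `a` element keeps each pair together, instead of relying on two separate loops lining up.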

hiha

2023/03/27 11:44

I was able to get the URLs properly! I skipped studying HTML and CSS and went straight to Python, so it looks like I should start from there. Thank you!
