List handling when scraping with selenium + BeautifulSoup

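Background / what I want to achieve

I want to get information such as names and addresses from Google Maps search results.

Problem / error message

I cannot get the information for the second and subsequent search results.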

DevTools listening on ws://127.0.0.1:60887/devtools/browser/870dbb21-fe99-4c91-8bd2-eeccef6b0aac
-------------------------------
東京駅
〒100-0005 東京都千代田区丸の内１丁目
None
-------------------------------
Traceback (most recent call last):
  File "scraping.py", line 35, in <module>
    link.click()
  File "C:\Python38\lib\site-packages\selenium\webdriver\remote\webelement.py", line 80, in click
    self._execute(Command.CLICK_ELEMENT)
  File "C:\Python38\lib\site-packages\selenium\webdriver\remote\webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "C:\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Python38\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=84.0.4147.105)

Relevant source code

Python
from selenium import webdriver
import time
from bs4 import BeautifulSoup
import chromedriver_binary
import re

# Keyword to enter in the search box
keys = "新宿　猫カフェ"
# Set up the Google Chrome driver
driver = webdriver.Chrome()

# Open Google Maps
url = 'https://www.google.co.jp/maps/'
driver.get(url)

time.sleep(5)

# Enter the keyword
id = driver.find_element_by_id("searchboxinput")
id.send_keys(keys)

time.sleep(1)

# Click the search button
search_button = driver.find_element_by_xpath(
    "//*[@id='searchbox-searchbutton']")
search_button.click()

time.sleep(10)

links = driver.find_elements_by_class_name("section-result-title")

# Get the information for each search result in turn
for link in links:
    link.click()

    time.sleep(10)

    page_source = driver.page_source
    soup = BeautifulSoup(page_source, 'html.parser')

    title = soup.find(
        class_="section-hero-header-title-title GLOBAL__gm2-headline-5")
    address = soup.find(text=re.compile("〒."))
    phone_number = soup.find(text=re.compile(r"^0\d{2,3}-\d{1,4}-\d{4}$"))

    print("-------------------------------")
    print(title.text.strip())
    print(address)
    print(phone_number)
    print("-------------------------------")

    driver.back()
    time.sleep(10)

How can I resolve this?

Additional information

Windows 10
Python 3.8.2
selenium 3.141.0
bs4 0.0.1
chromedriver_binary 84.0.4147.30.0

1 answer

Best answer

(For now, I will only address how to avoid the StaleElementReferenceException.)
I looked into the following error from the question:

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

Reference:
Stack Overflow: "stale-element-reference-element-is-not-attached-to-the-page-document"

The answers there say:

Whenever you face this issue, just define the web element once again above the line in which you are getting an Error. (omitted)

Since the DOM has changed e.g. through the update action, you are receiving a StaleElementReference Error.

As these answers indicate, the error apparently occurs because the DOM has changed, and the way to avoid it is to locate the web element again.
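The quoted advice can be sketched as a small helper that looks the element up immediately before using it and looks it up again if the reference has gone stale. This is only a minimal sketch, not code from the question or from the quoted answers: `click_with_relocate`, `locate`, `stale_error`, and `max_attempts` are hypothetical names, and the exception class is passed in as an argument so that the snippet runs without a browser.

```python
def click_with_relocate(locate, stale_error, max_attempts=3):
    """Click the element returned by locate(), re-locating it if it is stale.

    locate:       zero-argument callable that returns a fresh element
    stale_error:  exception class raised when a reference is stale
                  (with selenium: StaleElementReferenceException)
    max_attempts: how many times to locate and click before giving up

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        element = locate()  # always fetch a fresh reference
        try:
            element.click()
            return attempt
        except stale_error:
            if attempt == max_attempts:
                raise  # give up and surface the original error
    raise ValueError("max_attempts must be at least 1")
```

With selenium 3.141.0 as in the question, a call might look like `click_with_relocate(lambda: driver.find_elements_by_class_name("section-result-title")[i], StaleElementReferenceException)`, where `i` is the index of the result to open.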

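In the original code, returning from a cat cafe's detail page to the result list with driver.back() refreshes the link elements that were stored in the links list at the start, so those references are no longer valid. Clicking one of the invalidated elements is what raises StaleElementReferenceException.

Therefore, re-fetch the link elements every time you return to the list, and change only which element you click.

Code example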

(omitted)
links = driver.find_elements_by_class_name("section-result-title")

for i in range(len(links)):
    lnk = driver.find_elements_by_class_name("section-result-title")  # re-fetch the link elements
    lnk[i].click()  # click the i-th element
    time.sleep(10)
(omitted)
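To see the pattern in isolation, here is a minimal sketch that runs without a browser. `FakePage`, `FakeLink`, and `StaleLink` are hypothetical stand-ins for the driver, its elements, and selenium's exception (none of them is part of selenium). Every call to `back()` invalidates the links returned earlier, which models the behaviour described above for `driver.back()`. `visit_all` is likewise a hypothetical helper that applies the re-fetch-by-index loop.

```python
class StaleLink(Exception):
    """Stand-in for selenium's StaleElementReferenceException."""


class FakeLink:
    def __init__(self, page, name, generation):
        self.page = page
        self.name = name
        self.generation = generation  # page version this reference belongs to

    def click(self):
        # A reference created for an older version of the page is stale.
        if self.generation != self.page.generation:
            raise StaleLink(self.name)
        self.page.opened.append(self.name)


class FakePage:
    def __init__(self, names):
        self.names = names
        self.generation = 0
        self.opened = []

    def find_links(self):
        # Return fresh references bound to the current page version.
        return [FakeLink(self, n, self.generation) for n in self.names]

    def back(self):
        # Going back re-renders the list, invalidating old references.
        self.generation += 1


def visit_all(page):
    """Click every link, re-fetching the list before each click."""
    for i in range(len(page.find_links())):
        links = page.find_links()  # fresh references after each back()
        links[i].click()
        page.back()
    return page.opened
```

In this model, `visit_all(FakePage(["a", "b", "c"]))` returns `["a", "b", "c"]`, whereas clicking the second item of a list captured before the first `back()` raises `StaleLink`, mirroring the failure in the question.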

Posted 2020/07/28 13:59

Edited 2020/07/28 14:01