I want to write code that uses Selenium to collect the title, URL, and abstract of each paper on a PubMed results page, in order from the top.
Python
cur_url = driver.current_url

item = 1
for elem_name in driver.find_elements_by_xpath('//a[@class="docsum-title"]'):
    csvlist = []
    csvlist.append(str(item))
    csvlist.append(elem_name.text)
    elem_url = elem_name.get_attribute('href')
    csvlist.append(elem_url)
    driver.get(elem_url)
    elem_abst = driver.find_elements_by_xpath('//*[@id="enc-abstract"]/p')
    elem_abst = elem_abst[0].text
    csvlist.append(elem_abst)
    writer.writerow(csvlist)
    driver.get(cur_url)
    item = item+1
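This is what I have written so far. Inside the for loop, the script opens the first paper and goes back to the results page, but then it stops when it tries to access the second paper.
How can I make it access the second and later papers?
What happens when you run it (output, error messages, and so on)?
I am running it in Visual Studio Code, and this is what I get: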
DevTools listening on ws://127.0.0.1:54982/devtools/browser/1e4d13f5-1dc2-46f0-895b-5ae23a7eefb7
d:\python\pubmed 検索.py:19: DeprecationWarning: find_element_by_name is deprecated. Please use find_element(by=By.NAME, value=name) instead
search_box = driver.find_element_by_name('term')
d:\python\pubmed 検索.py:36: DeprecationWarning: find_elements_by_xpath is deprecated. Please use find_elements(by=By.XPATH, value=xpath) instead
for elem_name in driver.find_elements_by_xpath('//a[@class="docsum-title"]'):
d:\python\pubmed 検索.py:43: DeprecationWarning: find_elements_by_xpath is deprecated. Please use find_elements(by=By.XPATH, value=xpath) instead
elem_abst = driver.find_elements_by_xpath('//*[@id="enc-abstract"]/p')
Traceback (most recent call last):
File "d:\python\pubmed 検索.py", line 39, in <module>
csvlist.append(elem_name.text)
File "C:\Users\t_kiriya_1127\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webelement.py", line 77, in text
return self._execute(Command.GET_ELEMENT_TEXT)['value']
File "C:\Users\t_kiriya_1127\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webelement.py", line 710, in _execute
return self._parent.execute(command, params)
File "C:\Users\t_kiriya_1127\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webdriver.py", line 425, in execute
self.error_handler.check_response(response)
File "C:\Users\t_kiriya_1127\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: chrome=98.0.4758.102)
Stacktrace:
Backtrace:
Ordinal0 [0x00E069A3+2582947]
Ordinal0 [0x00D9A6D1+2139857]
Ordinal0 [0x00C93A98+1063576]
Ordinal0 [0x00C962B7+1073847]
Ordinal0 [0x00C9617E+1073534]
Ordinal0 [0x00C963F0+1074160]
Ordinal0 [0x00CBB8E0+1226976]
Ordinal0 [0x00CD854C+1344844]
Ordinal0 [0x00CB6524+1205540]
Ordinal0 [0x00CD86A4+1345188]
Ordinal0 [0x00CE834A+1409866]
Ordinal0 [0x00CD8366+1344358]
Ordinal0 [0x00CB5176+1200502]
Ordinal0 [0x00CB6066+1204326]
GetHandleVerifier [0x00FABE02+1675858]
GetHandleVerifier [0x0106036C+2414524]
GetHandleVerifier [0x00E9BB01+560977]
GetHandleVerifier [0x00E9A8D3+556323]
Ordinal0 [0x00DA020E+2163214]
Ordinal0 [0x00DA5078+2183288]
Ordinal0 [0x00DA51C0+2183616]
Ordinal0 [0x00DAEE1C+2223644]
BaseThreadInitThunk [0x7753FA29+25]
RtlGetAppContainerNamedObjectPath [0x77767A9E+286]
RtlGetAppContainerNamedObjectPath [0x77767A6E+238]
That is the output I get.
I only started studying about a week ago, so I apologize if this does not answer your question properly.
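
The traceback points at `elem_name.text` on the second pass through the loop. The `elem_name` objects were found on the results page, and once `driver.get(elem_url)` navigates away they no longer belong to the current document, even after `driver.get(cur_url)` reloads the same URL. That is what `StaleElementReferenceException` reports. One common way around it, sketched below, is to copy every title and `href` into plain strings first and only then visit the articles. The function name `collect_articles`, the `XPATH` constant, and the `limit` parameter are illustrative assumptions, not part of the original script. The XPath expressions are taken from the question as-is and have not been checked against the current PubMed markup. The sketch also uses the `find_elements(by=..., value=...)` form that the deprecation warnings recommend.

```python
# Locator strategy string; this is the same value as
# selenium.webdriver.common.by.By.XPATH, so importing By and passing
# By.XPATH works identically.
XPATH = "xpath"


def collect_articles(driver, limit=None):
    """Return [index, title, url, abstract] rows for the results page
    that is currently open in `driver` (hypothetical helper)."""
    # Step 1: read the title and href of every result link while the
    # results page is still loaded, and keep them as plain strings.
    links = driver.find_elements(by=XPATH, value='//a[@class="docsum-title"]')
    entries = [(link.text, link.get_attribute("href")) for link in links]
    if limit is not None:
        entries = entries[:limit]

    rows = []
    # Step 2: visit each article. No element from the results page is
    # touched here, so navigating away cannot make anything stale.
    for item, (title, url) in enumerate(entries, start=1):
        driver.get(url)
        paragraphs = driver.find_elements(
            by=XPATH, value='//*[@id="enc-abstract"]/p'
        )
        # Assumption: an article without an abstract gets an empty string
        # instead of raising IndexError on paragraphs[0].
        abstract = paragraphs[0].text if paragraphs else ""
        rows.append([str(item), title, url, abstract])
    return rows
```

With the `driver` and `writer` objects from the question already set up, the loop could then be replaced by `writer.writerows(collect_articles(driver))`. Because the URLs are collected up front, there is no need to return to `cur_url` after each article.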
