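Using Selenium and bs4 with Python 3 (on a Mac)
I am scraping with Selenium and bs4. To build a keyword list for social studies, I want to collect each keyword together with the explanation on the page it links to.
The blue text here is a set of links, and the explanation of each term is written on the linked page.
Problem / error message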
Traceback (most recent call last):
  File "Selelelen.py", line 1, in <module>
    import requests, bs4, sys
  File "/Users/****/opt/anaconda3/lib/python3.7/site-packages/requests/__init__.py", line 43, in <module>
    import urllib3
  File "/Users/****/opt/anaconda3/lib/python3.7/site-packages/urllib3/__init__.py", line 7, in <module>
    from .connectionpool import HTTPConnectionPool, HTTPSConnectionPool, connection_from_url
  File "/Users/****/opt/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py", line 11, in <module>
    from .exceptions import (
  File "/Users/****/opt/anaconda3/lib/python3.7/site-packages/urllib3/exceptions.py", line 2, in <module>
    from .packages.six.moves.http_client import IncompleteRead as httplib_IncompleteRead
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/Users/****/opt/anaconda3/lib/python3.7/site-packages/urllib3/packages/six.py", line 199, in load_module
    mod = mod._resolve()
  File "/Users/****/opt/anaconda3/lib/python3.7/site-packages/urllib3/packages/six.py", line 113, in _resolve
    return _import_module(self.mod)
  File "/Users/****/opt/anaconda3/lib/python3.7/site-packages/urllib3/packages/six.py", line 82, in _import_module
    __import__(name)
  File "/Users/****/opt/anaconda3/lib/python3.7/http/client.py", line 72, in <module>
    import email.message
  File "/Users/****/opt/anaconda3/lib/python3.7/email/message.py", line 10, in <module>
    import uu
  File "/Users/****/Desktop/Program/uu.py", line 1, in <module>
    import requests, sys, webbrowser, bs4
  File "/Users/****/opt/anaconda3/lib/python3.7/site-packages/bs4/__init__.py", line 31, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "/Users/****/opt/anaconda3/lib/python3.7/site-packages/bs4/builder/__init__.py", line 475, in <module>
    from . import _html5lib
  File "/Users/****/opt/anaconda3/lib/python3.7/site-packages/bs4/builder/_html5lib.py", line 20, in <module>
    import html5lib
  File "/Users/****/opt/anaconda3/lib/python3.7/site-packages/html5lib/__init__.py", line 28, in <module>
    from .serializer import serialize
  File "/Users/****/opt/anaconda3/lib/python3.7/site-packages/html5lib/serializer.py", line 11, in <module>
    from xml.sax.saxutils import escape
  File "/Users/****/opt/anaconda3/lib/python3.7/xml/sax/saxutils.py", line 6, in <module>
    import os, urllib.parse, urllib.request
  File "/Users/****/opt/anaconda3/lib/python3.7/urllib/request.py", line 1351, in <module>
    if hasattr(http.client, 'HTTPSConnection'):
AttributeError: module 'http' has no attribute 'client'
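One detail in the traceback is worth checking: the `import uu` issued from the standard library's email/message.py is resolved to /Users/****/Desktop/Program/uu.py, a file in the script's own folder, and the import chain restarts from there while `http.client` is still only half loaded. A minimal sketch for checking where a module name resolves is below. The helper names `module_origin` and `loaded_from` are made up for illustration, and the directory passed in is whatever folder you want to inspect.

```python
import importlib.util
import os


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def loaded_from(name, directory):
    """Return True if module `name` resolves to a file inside `directory`."""
    origin = module_origin(name)
    # Built-in and frozen modules report a non-path origin such as "built-in".
    if origin is None or not os.path.isabs(origin):
        return False
    directory = os.path.realpath(directory)
    return os.path.realpath(origin).startswith(directory + os.sep)


if __name__ == "__main__":
    # Run this from the folder that holds the scraping script.
    for name in ("uu", "http", "email"):
        print(name, module_origin(name), loaded_from(name, os.getcwd()))
```

If `loaded_from` reports True for a name that is supposed to come from the standard library, renaming the local file is the usual way out.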
Relevant source code
python
import requests, bs4, sys
from time import sleep

from selenium import webdriver

print('Next...')
# Selenium 3 style calls, kept as in the original. Newer Selenium 4 releases
# use a Service object and find_elements(By.CSS_SELECTOR, ...) instead.
driver = webdriver.Firefox(executable_path='/Users/****/opt/anaconda3/bin/geckodriver')
url = 'http://ssd.cswiki.jp/index.php?%E6%AD%B4%E5%8F%B2%EF%BC%A1%E3%83%A9%E3%83%B3%E3%82%AF'
driver.get(url)
sleep(10)

# Collect every href first. Navigating away while still looping over the
# elements would leave the remaining element references stale.
explain_urls = []
elems_explain_url = driver.find_elements_by_css_selector('#body p a')
for elem_explain_url in elems_explain_url:
    explain_url = elem_explain_url.get_attribute('href')
    if explain_url:
        explain_urls.append(explain_url)

# Then visit each explanation page in turn.
for explain_url in explain_urls:
    driver.get(explain_url)
    sleep(30)  # the original called time.sleep(), but only sleep was imported

driver.quit()
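Since the goal is to pair each keyword with its link, and bs4 is imported but never used in the code above, here is a rough sketch of how the pairing could be done with BeautifulSoup. It assumes the keywords are the link texts matched by the same `#body p a` selector. The function name `extract_keyword_links` and the sample HTML are hypothetical, not taken from the actual site.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_keyword_links(html, base_url):
    """Return (keyword, absolute_url) pairs for links matched by '#body p a'."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for a in soup.select("#body p a"):
        href = a.get("href")
        if href:  # skip anchors that have no href attribute
            pairs.append((a.get_text(strip=True), urljoin(base_url, href)))
    return pairs


if __name__ == "__main__":
    # Hypothetical markup that only imitates the assumed page structure.
    sample = (
        '<div id="body"><p><a href="index.php?term1">Term 1</a> '
        '<a href="index.php?term2">Term 2</a></p></div>'
    )
    for keyword, link in extract_keyword_links(sample, "http://example.com/index.php"):
        print(keyword, link)
```

With Selenium, the same function could be fed `driver.page_source` and `driver.current_url` after the page has loaded.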
What I tried
At first I got the error message
NameError: name 'explain_urls' is not defined
so I changed
explain_url = elem_explain_url.get_attribute('href')
to
explain_urls = elem_explain_url.get_attribute('href')
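A note on that change: the NameError presumably appeared because `explain_urls` had not been created before `append` was called on it. Assigning the href to `explain_urls` makes the error go away, but the name then holds a single string rather than a list. The sketch below uses made-up URLs and hypothetical function names to show the difference without needing a browser.

```python
def collect_with_append(hrefs):
    """Create the list first, then append each href to it."""
    explain_urls = []
    for href in hrefs:
        explain_urls.append(href)
    return explain_urls


def collect_with_rebinding(hrefs):
    """Rebind the name on every pass, so only the last href survives."""
    explain_urls = None
    for href in hrefs:
        explain_urls = href
    return explain_urls


if __name__ == "__main__":
    # Hypothetical hrefs standing in for get_attribute('href') results.
    hrefs = ["http://example.com/a", "http://example.com/b"]
    print(collect_with_append(hrefs))
    print(collect_with_rebinding(hrefs))
```

Looping over the rebinding result would iterate over the characters of one URL, which is why the list has to be initialized with `explain_urls = []` before the loop.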