Background / what I want to achieve
I can get as far as building `urls`, but something seems to go wrong partway through the for loop.
Problem / error message
InvalidSchema                             Traceback (most recent call last)
<ipython-input-35-c6e5e3cfb049> in <module>
     10 links = []
     11 for url in urls:
---> 12     r = requests.get(url)
     13     bs= BeautifulSoup(r.text,'html.parser')
     14     link = bs.find('a')

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/api.py in get(url, params, **kwargs)
     73
     74     kwargs.setdefault('allow_redirects', True)
---> 75     return request('get', url, params=params, **kwargs)
     76
     77

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/api.py in request(method, url, **kwargs)
     58     # cases, and look like a memory leak in others.
     59     with sessions.Session() as session:
---> 60         return session.request(method=method, url=url, **kwargs)
     61
     62

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    531         }
    532         send_kwargs.update(settings)
--> 533         resp = self.send(prep, **send_kwargs)
    534
    535         return resp

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/sessions.py in send(self, request, **kwargs)
    638
    639         # Get the appropriate adapter to use
--> 640         adapter = self.get_adapter(url=request.url)
    641
    642         # Start time (approximately) of the request

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/sessions.py in get_adapter(self, url)
    729
    730         # Nothing matches :-/
--> 731         raise InvalidSchema("No connection adapters were found for '%s'" % url)
    732
    733     def close(self):

InvalidSchema: No connection adapters were found for '<a aria-current="page" class="logo" href="https://www.oreilly.com" title="home page"><img alt="O'Reilly home" onerror="this.src='https://cdn.oreillystatic.com/images/sitewide-headers/oreilly_logo_mark_red_@2x.png'; this.onerror=null;" src="https://cdn.oreillystatic.com/images/sitewide-headers/oreilly_logo_mark_red.svg"/></a>'
Relevant source code
import requests
from bs4 import BeautifulSoup
import re

url = 'https://www.oreilly.com/'
r = requests.get(url)
bs = BeautifulSoup(r.text,'html.parser')
urls = bs.find_all('a',{'href':re.compile('^(http).+')})

links = []
for url in urls:
    r = requests.get(url)
    bs= BeautifulSoup(r.text,'html.parser')
    link = bs.find('a')
    links.append(link)
print(links)
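
Judging from the error message, the value `requests.get` rejects is a whole `<a ...>...</a>` element, not a URL. That suggests `find_all` is returning Tag objects and the loop passes each Tag straight to `requests.get`. Below is a minimal sketch of one way around it: read the `href` attribute from each tag first. The function names `extract_hrefs`, `first_link_of_each`, and `main`, as well as the `timeout` value and the skip-on-error handling, are my own assumptions for illustration and are not part of the original code.

```python
import re

import requests
from bs4 import BeautifulSoup


def extract_hrefs(html):
    """Return the href strings of all <a> tags whose href starts with 'http'."""
    soup = BeautifulSoup(html, 'html.parser')
    anchors = soup.find_all('a', {'href': re.compile(r'^http')})
    # find_all returns Tag objects; requests.get needs a URL string,
    # so take the href attribute of each tag.
    return [a['href'] for a in anchors]


def first_link_of_each(urls, timeout=10):
    """Fetch each URL and collect the first <a> tag on the page (None if absent)."""
    links = []
    for url in urls:
        try:
            r = requests.get(url, timeout=timeout)
        except requests.exceptions.RequestException as e:
            # Assumption: skip pages that cannot be fetched instead of stopping.
            print(f'skipped {url}: {e}')
            continue
        links.append(BeautifulSoup(r.text, 'html.parser').find('a'))
    return links


def main():
    # Same flow as the code in the question, with href strings instead of Tags.
    top = requests.get('https://www.oreilly.com/', timeout=10)
    urls = extract_hrefs(top.text)
    print(first_link_of_each(urls))
```

Calling `main()` runs the same flow as the question's code. Note that `links` still holds Tag objects (or `None`), because `bs.find('a')` was kept as in the original; if URL strings are wanted there too, the same `['href']` lookup would apply.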
1 answer
2019/08/06 12:24
2019/08/06 12:26