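Suppose the first page of a blog is at
http://sample.com/a/b/c/d/e/1?ima=0000&cd=member
and the second page is at
http://sample.com/a/b/c/d/e/2?ima=0000&cd=member
I want to read this blog with Beautiful Soup 4, starting from page 1.
Some articles have been deleted, so those pages return a 404.
When I get that error I want to move on to the next page, but I can't get it to work.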
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from urllib.request import urlopen
from urllib.error import URLError, HTTPError
from bs4 import BeautifulSoup
import os

url = 'http://sample.com/a/b/c/d/e/{}?ima=0000&cd=member'
try:
    for i in range(1, 1000):
        html = urlopen(url.format(str(i)))
        soup = BeautifulSoup(html)
        print(soup.find('div',{'class':'headArea'}).p.text)

except HTTPError as e:
    for i in range(i+1, 1000):
        html = urlopen(url.format(str(i)))
        soup = BeautifulSoup(html)
    print('Error code: ', e.code)
except URLError as e:
    for i in range(i+1, 1000):
        html = urlopen(url.format(str(i)))
        soup = BeautifulSoup(html)
    print('Reason: ', e.reason)
```
Error message
```
Traceback (most recent call last):
  File "sample.py", line 21, in <module>
    html = urlopen(url.format(str(i)))
  File "/Users/Mypc/.pyenv/versions/3.6.1/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/Mypc/.pyenv/versions/3.6.1/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/Users/Mypc/.pyenv/versions/3.6.1/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Users/Mypc/.pyenv/versions/3.6.1/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/Users/Mypc/.pyenv/versions/3.6.1/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Users/Mypc/.pyenv/versions/3.6.1/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "sample.py", line 27, in <module>
    html = urlopen(url.format(str(i)))
  File "/Users/Mypc/.pyenv/versions/3.6.1/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/Mypc/.pyenv/versions/3.6.1/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/Users/Mypc/.pyenv/versions/3.6.1/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Users/Mypc/.pyenv/versions/3.6.1/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/Users/Mypc/.pyenv/versions/3.6.1/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Users/Mypc/.pyenv/versions/3.6.1/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
```
1 answer
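The traceback suggests what is going wrong. The `try` wraps the whole `for` loop, so the first 404 ends that loop. The `except` handler then starts a second loop with no `try` around it, and the next 404 raised inside the handler is not caught ("During handling of the above exception, another exception occurred"). One likely fix is to put the `try`/`except` inside the loop and `continue` on error. The following is only a minimal sketch under that assumption: the names `fetch_pages`, `opener`, and `main` are hypothetical helpers introduced for illustration, and passing `'html.parser'` explicitly is my own choice, not something from the question.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_pages(url_template, page_numbers, opener=urlopen):
    """Yield (page_number, raw_html) for every page that loads.

    Pages that raise an HTTP error such as 404 are reported and skipped.
    `opener` is a hypothetical hook so urlopen can be replaced in tests.
    """
    for page in page_numbers:
        url = url_template.format(page)
        try:
            # The try block covers a single request, so a failure only
            # affects this iteration and the loop keeps going.
            with opener(url) as response:
                html = response.read()
        except HTTPError as e:
            # HTTPError is a subclass of URLError, so catch it first.
            print('Skipping page', page, 'error code:', e.code)
            continue
        except URLError as e:
            # No HTTP response at all (DNS failure, refused connection, ...).
            print('Skipping page', page, 'reason:', e.reason)
            continue
        yield page, html


def main():
    # Third-party dependency, imported here so fetch_pages works without it.
    from bs4 import BeautifulSoup

    url = 'http://sample.com/a/b/c/d/e/{}?ima=0000&cd=member'
    for page, html in fetch_pages(url, range(1, 1000)):
        soup = BeautifulSoup(html, 'html.parser')
        head = soup.find('div', {'class': 'headArea'})
        # find() returns None when the div is missing, so guard before .p.text.
        if head is not None and head.p is not None:
            print(head.p.text)
```

Calling `main()` runs the crawl against the URL from the question. Keeping the network loop in `fetch_pages` separate from the parsing makes the skip-on-404 behaviour easy to check with a fake `opener`, without touching the real site.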
2017/05/03 12:58
2017/05/03 13:04
2017/05/04 07:04