There are sites I cannot access with Python's requests.get() function.

Question

### Background / What I want to achieve

I want to download a web page with Python's `requests.get()`, but there is a site I cannot access, and no error is raised either. Specifically, from this shopping site,

https://www.asos.com/fila/fila-mini-dress-with-drawstring-waist-and-logo-front/prd/13843263

I would like to get the product price shown on the page:

£36.00

How can I do this? Also, are there modules other than requests that I could use?

### Problem / error message

The error below appears only when I set `timeout`; otherwise the call just keeps running indefinitely.

```Error Message
---------------------------------------------------------------------------
timeout                                   Traceback (most recent call last)
C:\Anaconda\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    383                 # otherwise it looks like a programming error was the cause.
--> 384                 six.raise_from(e, None)
    385         except (SocketTimeout, BaseSSLError, SocketError) as e:

C:\Anaconda\lib\site-packages\urllib3\packages\six.py in raise_from(value, from_value)

C:\Anaconda\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    379             try:
--> 380                 httplib_response = conn.getresponse()
    381             except Exception as e:

C:\Anaconda\lib\http\client.py in getresponse(self)
   1320             try:
-> 1321                 response.begin()
   1322             except ConnectionError:

C:\Anaconda\lib\http\client.py in begin(self)
    295         while True:
--> 296             version, status, reason = self._read_status()
    297             if status != CONTINUE:

C:\Anaconda\lib\http\client.py in _read_status(self)
    256     def _read_status(self):
--> 257         line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    258         if len(line) > _MAXLINE:

C:\Anaconda\lib\socket.py in readinto(self, b)
    588             try:
--> 589                 return self._sock.recv_into(b)
    590             except timeout:

C:\Anaconda\lib\site-packages\urllib3\contrib\pyopenssl.py in recv_into(self, *args, **kwargs)
    308             else:
--> 309                 return self.recv_into(*args, **kwargs)
    310

C:\Anaconda\lib\site-packages\urllib3\contrib\pyopenssl.py in recv_into(self, *args, **kwargs)
    306             if not util.wait_for_read(self.socket, self.socket.gettimeout()):
--> 307                 raise timeout('The read operation timed out')
    308             else:

timeout: The read operation timed out

During handling of the above exception, another exception occurred:

ReadTimeoutError                          Traceback (most recent call last)
C:\Anaconda\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    448                     retries=self.max_retries,
--> 449                     timeout=timeout
    450                 )

C:\Anaconda\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    637             retries = retries.increment(method, url, error=e, _pool=self,
--> 638                                         _stacktrace=sys.exc_info()[2])
    639             retries.sleep()

C:\Anaconda\lib\site-packages\urllib3\util\retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    366             if read is False or not self._is_method_retryable(method):
--> 367                 raise six.reraise(type(error), error, _stacktrace)
    368             elif read is not None:

C:\Anaconda\lib\site-packages\urllib3\packages\six.py in reraise(tp, value, tb)
    685                 raise value.with_traceback(tb)
--> 686             raise value
    687

C:\Anaconda\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    599                                                   body=body, headers=headers,
--> 600                                                   chunked=chunked)
    601

C:\Anaconda\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    385         except (SocketTimeout, BaseSSLError, SocketError) as e:
--> 386             self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
    387             raise

C:\Anaconda\lib\site-packages\urllib3\connectionpool.py in _raise_timeout(self, err, url, timeout_value)
    305         if isinstance(err, SocketTimeout):
--> 306             raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
    307

ReadTimeoutError: HTTPSConnectionPool(host='www.asos.com', port=443): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

ReadTimeout                               Traceback (most recent call last)
 in
      1 url = 'https://www.asos.com/fila/fila-mini-dress-with-drawstring-waist-and-logo-front/prd/13843263'
----> 2 res = requests.get(url, timeout = 10)

C:\Anaconda\lib\site-packages\requests\api.py in get(url, params, **kwargs)
     73
     74     kwargs.setdefault('allow_redirects', True)
---> 75     return request('get', url, params=params, **kwargs)
     76
     77

C:\Anaconda\lib\site-packages\requests\api.py in request(method, url, **kwargs)
     58     # cases, and look like a memory leak in others.
     59     with sessions.Session() as session:
---> 60         return session.request(method=method, url=url, **kwargs)
     61
     62

C:\Anaconda\lib\site-packages\requests\sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    531         }
    532         send_kwargs.update(settings)
--> 533         resp = self.send(prep, **send_kwargs)
    534
    535         return resp

C:\Anaconda\lib\site-packages\requests\sessions.py in send(self, request, **kwargs)
    644
    645         # Send the request
--> 646         r = adapter.send(request, **kwargs)
    647
    648         # Total elapsed time of the request (approximately)

C:\Anaconda\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    527                 raise SSLError(e, request=request)
    528             elif isinstance(e, ReadTimeoutError):
--> 529                 raise ReadTimeout(e, request=request)
    530             else:
    531                 raise

ReadTimeout: HTTPSConnectionPool(host='www.asos.com', port=443): Read timed out. (read timeout=10)
```

### Relevant source code

```Python 3.x
url = 'https://www.asos.com/fila/fila-mini-dress-with-drawstring-waist-and-logo-front/prd/13843263'
res = requests.get(url, timeout = 10)
```

### What I tried

Sorry, I am a beginner without much room for trial and error, so there is nothing in particular I was able to try. `urllib.request` did not work either.

### Additional information (framework/tool versions, etc.)

Python 3, using JupyterLab in Chrome. I was able to open the URL in another tab of the same browser window.

Accepted Answer

To start with, this site appears to have anti-scraping measures in place: it seems not to respond at all when fetched directly with `requests`, so I had to spoof the User-Agent.

Spoofing the User-Agent got me access, but the HTML inside is fairly complex and I am a beginner too, so I could not extract the price properly and cannot provide sample code. Sorry about that.
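Since the question also asks about modules other than requests, here is a rough sketch of the same User-Agent change using only the standard library's `urllib.request`. The User-Agent string, `build_request`, and `fetch_html` are illustrative names and values, not something taken from the site or from the answers here, and whether the site keeps accepting this header is an assumption.

```python
import urllib.request

# Assumption: any mainstream browser User-Agent string; this one is illustrative.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/67.0.3396.99 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a urllib Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download the page body as text; raises URLError/HTTPError on failure."""
    req = build_request(url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `fetch_html(url)` with the product URL from the question would perform the actual download. Keeping a `timeout` is still advisable, because without one an unresponsive server can leave the call hanging, as described in the question.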

Answer

Changing the User-Agent does make the page itself accessible.

```Code
import requests

# Browser-like User-Agent; the site does not respond to the default requests UA.
ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
headers = {'User-Agent': ua}
url = 'https://www.asos.com/fila/fila-mini-dress-with-drawstring-waist-and-logo-front/prd/13843263'
res = requests.get(url, headers=headers, timeout=10)
res.raise_for_status()  # raise an exception on 4xx/5xx responses
```

However, the downloaded HTML text for some reason differs from the text shown on the page via "right-click → Inspect", so I could not get the information I wanted with bs4.
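A likely reason for the mismatch is that "Inspect" shows the DOM after the browser has run the page's JavaScript, while requests only receives the initial HTML. One thing worth checking is whether the raw HTML carries the price as embedded JSON-LD metadata (`<script type="application/ld+json">`), a common convention on shop pages. The sketch below uses only the standard library and a made-up `SAMPLE_HTML`; whether the ASOS page embeds such a block, or loads the price through a separate request, is not confirmed here, and `JsonLdCollector` and `extract_offer_prices` are hypothetical helper names.

```python
import json
from html.parser import HTMLParser

# Hypothetical page fragment: the visible price element is empty in the raw
# HTML (it would be filled in by JavaScript), but JSON-LD metadata is present.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Product", "name": "Example dress",
 "offers": {"@type": "Offer", "price": "36.00", "priceCurrency": "GBP"}}
</script>
</head><body><div id="price"></div></body></html>
"""


class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of every JSON-LD <script> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def extract_offer_prices(html_text):
    """Return (price, currency) pairs found in JSON-LD 'offers' entries."""
    collector = JsonLdCollector()
    collector.feed(html_text)
    prices = []
    for block in collector.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            offers = item.get("offers", [])
            if isinstance(offers, dict):
                offers = [offers]
            for offer in offers:
                if isinstance(offer, dict) and "price" in offer:
                    prices.append((str(offer["price"]), offer.get("priceCurrency")))
    return prices


print(extract_offer_prices(SAMPLE_HTML))
```

With real data you would pass `res.text` instead of `SAMPLE_HTML`. If the result is empty, the price is probably fetched by JavaScript after the page loads, and a browser-automation tool such as Selenium, or the request the page itself makes (visible in the browser's Network tab), would be the next thing to look at.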
