Question edit history

Removed the part I resolved myself

2023/01/08 19:18

Posted

Score 44

title CHANGED

File without changes

body CHANGED

@@ -2,8 +2,8 @@
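 The id(resp) changes inside the for loop. In that case, does the previous resp time out automatically at the moment a new resp is created?
-Since this is a language that allows shadow variables, I recall that resp is only destroyed when the function exits. So, judging from the Keep-Alive section below, I suspect that a large number of resp objects that have not timed out will be left behind.
+~~Since this is a language that allows shadow variables, I recall that resp is only destroyed when the function exits. So, judging from the Keep-Alive section below, I suspect that a large number of resp objects that have not timed out will be left behind.~~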
-https://requests.readthedocs.io/en/latest/user/advanced/#session-objects
+https://requests.readthedocs.io/en/latest/user/advanced/#session-objects
 Should I call resp.close() after get() inside the for loop? If I do that, though, I suspect it would defeat the purpose of using streaming...
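
The struck-through assumption about when resp is destroyed can be checked without any network access. The following is only a minimal sketch: `FakeResponse` and `fetch_all` are hypothetical stand-ins invented for illustration (they are not part of requests), and the ordering shown relies on CPython's reference counting, assuming nothing else holds a reference to the old object.

```python
import weakref


class FakeResponse:
    """Hypothetical stand-in for requests.Response (illustration only)."""

    def __init__(self, name):
        self.name = name


def fetch_all(names, events):
    resp = FakeResponse("first")
    # Record the moment this object is actually finalized.
    weakref.finalize(resp, events.append, resp.name)
    for name in names:
        # Rebinding resp drops the last reference to the previous object.
        resp = FakeResponse(name)
        weakref.finalize(resp, events.append, name)
        events.append("loop:" + name)
    events.append("end of function")


events = []
fetch_all(["a", "b"], events)
print(events)
# On CPython: ['first', 'loop:a', 'a', 'loop:b', 'end of function', 'b']
```

Under these assumptions, each earlier object is finalized as soon as resp is rebound, and only the last one survives until the function returns. This says nothing about when a real streamed connection goes back to the pool. For that, the Requests documentation states that with stream=True the connection is released only once the body is fully consumed or Response.close() is called, and it suggests making the request inside a with statement.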

Fixed a typo and added timeout

2023/01/08 19:12

Posted

Score 44

title CHANGED

File without changes

body CHANGED

@@ -11,15 +11,15 @@
 ```
           ses = requests.Session()
-          resp = ses.get(self.URLS['short_ratio'], stream=True)
+          resp = ses.get(self.URLS['short_ratio'], stream=True, timeout=3)
           resp.encoding = resp.apparent_encoding
           url_chars = '[0-9a-zA-Z\/\-\_]*'
           urls_xls = []
           urls_xls += re.findall('{0}\.xls?'.format(url_chars), resp.text)
-          for p in re.findall('{0}\d\d-archives-\d\d\.html?'.format(url_chars)
+          for p in re.findall('{0}\d\d-archives-\d\d\.html?'.format(url_chars):
-              resp = ses.get('{0}{1}'.format(self.BASE_URL, p), stream=True)
+              resp = ses.get('{0}{1}'.format(self.BASE_URL, p), stream=True, timeout=3)
               print(id(ses), id(resp))
               print(resp.cookies)
               resp.encoding = resp.apparent_encoding