Could someone explain req.raw.decode_content = True?

https://qiita.com/revvve44/items/49f474fc1f05098bd670
From the article above:

python
import shutil
import requests

def download(url, file_name):
    req = requests.get(url, stream=True)
    if req.status_code == 200:
        with open(file_name + ".png", 'wb') as f:   # open the output file in binary mode for the PNG
            req.raw.decode_content = True
            shutil.copyfileobj(req.raw, f)   # copy the PNG image data into the file

I can't work out what the line req.raw.decode_content = True means, even after a lot of searching.

What does adding .raw to req do, and what changes when .decode_content is set to True?
I'd be grateful for an explanation.

meg_

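2022/06/05 10:54 (edited)

Please use the "Insert code" feature when posting code. Have you asked the author of the Qiita article? > "I've searched a lot but it doesn't click." What exactly doesn't click?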

sinsotu_S

2022/06/05 12:50

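Thank you for the comment. This is my first post, so I apologize for the formatting mistakes. I have just asked the author of the Qiita article. > What exactly doesn't click? Other articles that do the same thing (saving images from an image search) all included the line req.raw.decode_content = True as if by rote. However, the code ran fine even after I removed that line, so I asked because I couldn't work out on my own why the line is needed and what .raw and .decode_content actually do.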

quickquip

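2022/06/05 23:38 (edited)

Please add information by editing the question. Also, to be honest, this is a hard question to answer without details about what you found, along the lines of "I read this, I understood this part, but not that part," because we can't tell which part needs explaining.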

1 answer

Best answer

First, you'd check the requests.Response documentation, right?
https://requests.readthedocs.io/en/latest/api/#requests.Response.raw

File-like object representation of response (for advanced usage). Use of raw requires that stream=True be set on the request. This requirement does not apply for use internally to Requests.

is what it says, so raw is a file-like object that is set when stream=True.

Next, set stream=True and look at its type.

python
>>> import requests
>>> req = requests.get('https://teratail.com/', stream=True)
>>> print(type(req.raw))
<class 'urllib3.response.HTTPResponse'>
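
To make "file-like object" concrete: shutil.copyfileobj only needs a source with a read() method, so req.raw can be streamed to disk like any other stream. A minimal offline sketch, assuming an io.BytesIO object as a stand-in for req.raw (the byte string and variable names are made up for illustration, and no network request is made):

```python
import io
import shutil

# Stand-in for req.raw: any object with a read() method will do.
fake_raw = io.BytesIO(b"\x89PNG\r\n\x1a\n" + b"fake image bytes")

# Stand-in for the output file opened with open(..., 'wb').
out = io.BytesIO()

# copyfileobj reads from the source in chunks and writes each chunk to the destination.
shutil.copyfileobj(fake_raw, out)

copied = out.getvalue()
print(copied[:8])
```

After the copy the source stream is exhausted, which is also true of req.raw: it can be read only once.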

So we need to look at the urllib3 documentation. Go from PyPI to the documentation and search for "decode_content".

https://urllib3.readthedocs.io/en/stable/reference/urllib3.response.html?highlight=decode_content

decode_content – If True, will attempt to decode the body based on the ‘content-encoding’ header.

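is what it says, so the 'content-encoding' header is involved: when decode_content is True, urllib3 tries to decode the body according to the content-encoding header.

If you already know how the content-encoding header behaves, you understand it at this point and the investigation is over.
If not, the next step is to read MDN or a similar reference.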
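
As a rough illustration of what "decode the body based on the content-encoding header" means, here is a standard-library sketch. decode_body is a hypothetical helper written for this example, not urllib3's implementation, and it handles only gzip:

```python
import gzip


def decode_body(body: bytes, headers: dict, decode_content: bool) -> bytes:
    """Hypothetical helper: gunzip the body only when asked to and when the header says gzip."""
    encoding = headers.get("content-encoding", "").lower()
    if decode_content and encoding == "gzip":
        return gzip.decompress(body)
    # Otherwise hand back the bytes exactly as they arrived.
    return body


html = b"<html>hello</html>"
wire_bytes = gzip.compress(html)  # what a server would send with content-encoding: gzip
headers = {"content-encoding": "gzip"}

decoded = decode_body(wire_bytes, headers, decode_content=True)
untouched = decode_body(wire_bytes, headers, decode_content=False)
print(decoded == html, untouched[:2])
```

With no content-encoding header the helper returns the bytes unchanged either way, which may be why removing the line makes no visible difference for a response that is not compressed.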

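At this point I can guess the behavior.
When a response whose content-encoding header is set to, say, gzip is fetched with this code, I expect that without decode_content=True you get the gzip-compressed data exactly as it came over the network, and with decode_content=True the header is inspected and you get data that has been automatically gunzipped.
So I write some code to check.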

python
import shutil

import requests

# With decode_content = True: the body is decoded according to content-encoding.
req = requests.get('https://teratail.com/', stream=True)
if req.status_code == 200:
    req.raw.decode_content = True
    with open('decode_content_true_test', 'wb') as f:
        shutil.copyfileobj(req.raw, f)

# Without it: the bytes are written exactly as they arrived.
req = requests.get('https://teratail.com/', stream=True)
if req.status_code == 200:
    with open('decode_content_false_test', 'wb') as f:
        shutil.copyfileobj(req.raw, f)

and then, since I'm on macOS, back in the terminal:

shell
% file decode_content_false_test
decode_content_false_test: gzip compressed data, from Unix, original size modulo 2^32 220957
% file decode_content_true_test
decode_content_true_test: HTML document text, Unicode text, UTF-8 text, with very long lines (20700), with CRLF line terminators
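
If the file command is not available, the same check can be approximated in Python by looking at the first two bytes, since gzip data starts with the magic bytes 0x1f 0x8b. A minimal sketch, assuming a hypothetical helper looks_gzipped and temporary files that stand in for the two outputs above:

```python
import gzip
import os
import tempfile


def looks_gzipped(path: str) -> bool:
    """Hypothetical helper: True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"


html = b"<!DOCTYPE html><html></html>"

with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for decode_content_false_test and decode_content_true_test.
    false_path = os.path.join(tmp, "decode_content_false_test")
    true_path = os.path.join(tmp, "decode_content_true_test")
    with open(false_path, "wb") as f:
        f.write(gzip.compress(html))  # still compressed
    with open(true_path, "wb") as f:
        f.write(html)  # already decoded

    false_is_gzip = looks_gzipped(false_path)
    true_is_gzip = looks_gzipped(true_path)

print(false_is_gzip, true_is_gzip)
```

Pointing looks_gzipped at the real output files should give the same split as the file command, as long as the server actually sent a gzip-encoded response.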