[Python] I want to sort the JSON data returned by the Cloud Vision API before using it

Background / what I want to achieve

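I am using the Cloud Vision API for text detection.
The detected text boxes are stored in the JSON data in descending order of x coordinate, starting from the largest x.
I would like to reorder them so that x runs in descending order from its maximum and y runs in ascending order from 0,
so that the text is read from the top right of the image toward the bottom left.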

Relevant source code

↓↓↓ JSON data returned ↓↓↓



{
    "property": {
        "detectedLanguages": [
            {
                "languageCode": "ja"
            }
        ]
    },
    "boundingBox": {
        "vertices": [
            {
                "x": 35,
                "y": 201
            },
            {
                "x": 47,
                "y": 201
            },
            {
                "x": 47,
                "y": 206
            },
            {
                "x": 35,
                "y": 206
            }
        ]
    },
    "text": "\u308b",
    "confidence": 0.99
}

python
import base64
import json

import requests

# GOOGLE_CLOUD_VISION_API_URL and API_KEY are assumed to be defined elsewhere.


def request_cloud_vison_api(image_base64):
    api_url = GOOGLE_CLOUD_VISION_API_URL + API_KEY
    req_body = json.dumps({
        'requests': [{
            'image': {
                'content': image_base64.decode('utf-8')  # decode to str so it can be serialized to JSON
            },
            'features': [{
                'type': 'DOCUMENT_TEXT_DETECTION',  # change this to switch the kind of analysis
                'maxResults': 10,
            }]
        }]
    })

    res = requests.post(api_url, data=req_body)
    return res.json()


# Load the image and Base64-encode it
def img_to_base64(filepath):
    with open(filepath, 'rb') as img:
        img_byte = img.read()
    return base64.b64encode(img_byte)


# Image to run text recognition on
img_base64 = img_to_base64('images/konan.jpg')
result = request_cloud_vison_api(img_base64)

for rect in result["responses"][0]["textAnnotations"]:
    # Corner coordinates of the bounding box
    x1 = rect["boundingPoly"]["vertices"][0]["x"]
    y1 = rect["boundingPoly"]["vertices"][0]["y"]
    x2 = rect["boundingPoly"]["vertices"][1]["x"]
    y2 = rect["boundingPoly"]["vertices"][2]["y"]

    print(rect["boundingPoly"])
    print(rect["description"])

Terminal
// Coordinates of the four corners of each detected text box (in descending order of vertex 0's x coordinate)
// Detected text

{'vertices': [{'x': 604, 'y': 363}, {'x': 617, 'y': 363}, {'x': 617, 'y': 411}, {'x': 604, 'y': 411}]}
引っ越しまん
{'vertices': [{'x': 603, 'y': 50}, {'x': 616, 'y': 50}, {'x': 616, 'y': 65}, {'x': 603, 'y': 65}]}
から
{'vertices': [{'x': 594, 'y': 35}, {'x': 604, 'y': 35}, {'x': 606, 'y': 86}, {'x': 596, 'y': 86}]}
トラックの上に
{'vertices': [{'x': 583, 'y': 36}, {'x': 593, 'y': 36}, {'x': 592, 'y': 71}, {'x': 582, 'y': 71}]}
売ってたん
{'vertices': [{'x': 592, 'y': 361}, {'x': 600, 'y': 360}, {'x': 600, 'y': 361}, {'x': 592, 'y': 362}]}
「
{'vertices': [{'x': 592, 'y': 371}, {'x': 601, 'y': 370}, {'x': 603, 'y': 398}, {'x': 594, 'y': 399}]}
ってたよ
{'vertices': [{'x': 595, 'y': 402}, {'x': 603, 'y': 401}, {'x': 603, 'y': 402}, {'x': 595, 'y': 403}]}
!
{'vertices': [{'x': 578, 'y': 249}, {'x': 589, 'y': 249}, {'x': 589, 'y': 278}, {'x': 578, 'y': 278}]}
そのに

Supplementary information (framework/tool versions, etc.)

Python 3.6.5

1 answer

Best answer

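"I want the text to be read from the top right of the image toward the bottom left"

I think you can get this by sorting the data on the top-right corner of each rectangle: ascending y first, then descending x.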

Python
from PIL import Image, ImageDraw
import random

# Test data, shuffled into random order
dat = [ {'text':'a', 'rect':[{'x': 50,'y': 50},{'x':150,'y': 50},{'x':150,'y':150},{'x': 50,'y':150}]},
        {'text':'b', 'rect':[{'x':200,'y': 50},{'x':400,'y': 50},{'x':400,'y':150},{'x':200,'y':150}]},
        {'text':'c', 'rect':[{'x':450,'y': 50},{'x':600,'y': 50},{'x':600,'y':150},{'x':450,'y':150}]},
        {'text':'d', 'rect':[{'x': 50,'y':200},{'x':300,'y':200},{'x':300,'y':300},{'x': 50,'y':300}]},
        {'text':'e', 'rect':[{'x':350,'y':200},{'x':600,'y':200},{'x':600,'y':300},{'x':350,'y':300}]}]
random.shuffle(dat)

# Draw the test data (top-left and bottom-right corners define each rectangle)
im = Image.new('RGB', (640, 480), (255, 255, 255))
draw = ImageDraw.Draw(im)
for d in dat:
    x1, y1, x2, y2 = d['rect'][0]['x'], d['rect'][0]['y'], d['rect'][2]['x'], d['rect'][2]['y']
    t = d['text']
    print(t)
    draw.rectangle((x1, y1, x2, y2), outline=(0,0,0))
    draw.text((x1+10,y1+10), t, fill=(0,0,0))
im.save('ret.png')

# Sort on the top-right corner of each rectangle: ascending y, then descending x
print('-----')
dat.sort(key=lambda v:(v['rect'][1]['y'], -v['rect'][1]['x']))
for d in dat:
    print(d['text'])
"""
c
b
a
e
d
"""
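
To connect this back to the question, here is a minimal sketch of the same sort key applied to entries shaped like `textAnnotations` in the Cloud Vision response. It is an illustration under assumptions, not a tested solution for your image: `top_right`, `sort_annotations`, and the `row_height` parameter are names I made up. I am also assuming that vertex 1 is the top-right corner and that a coordinate equal to 0 may be missing from the JSON, so the sketch reads coordinates with a default of 0. The sample entries are copied from the terminal output in the question.

```python
def top_right(annotation):
    """Return (x, y) of vertex 1, taken here to be the top-right corner."""
    vertex = annotation["boundingPoly"]["vertices"][1]
    # Assumption: a coordinate equal to 0 may be missing from the JSON.
    return vertex.get("x", 0), vertex.get("y", 0)


def sort_annotations(annotations, row_height=1):
    """Sort by top-right corner: y ascending, then x descending.

    row_height is a hypothetical tuning knob: y values are bucketed into
    bands of this many pixels, so small vertical jitter does not split a row.
    The default of 1 gives the exact sort from the answer.
    """
    def key(annotation):
        x, y = top_right(annotation)
        return (y // row_height, -x)

    return sorted(annotations, key=key)


# Entries copied from the question's terminal output.
sample = [
    {"description": "引っ越しまん", "boundingPoly": {"vertices": [
        {"x": 604, "y": 363}, {"x": 617, "y": 363},
        {"x": 617, "y": 411}, {"x": 604, "y": 411}]}},
    {"description": "から", "boundingPoly": {"vertices": [
        {"x": 603, "y": 50}, {"x": 616, "y": 50},
        {"x": 616, "y": 65}, {"x": 603, "y": 65}]}},
    {"description": "トラックの上に", "boundingPoly": {"vertices": [
        {"x": 594, "y": 35}, {"x": 604, "y": 35},
        {"x": 606, "y": 86}, {"x": 596, "y": 86}]}},
    {"description": "売ってたん", "boundingPoly": {"vertices": [
        {"x": 583, "y": 36}, {"x": 593, "y": 36},
        {"x": 592, "y": 71}, {"x": 582, "y": 71}]}},
]

for annotation in sort_annotations(sample):
    print(annotation["description"])
```

For these four entries the printed order is トラックの上に, 売ってたん, から, 引っ越しまん. With a real response, something like `sort_annotations(result["responses"][0]["textAnnotations"][1:])` should work; I skip the first element because, as far as I know, it holds the full text of the whole image rather than a single box. Real boxes rarely share exactly the same y, so passing a larger `row_height` (for example 20) may group neighbouring boxes into one row, though boxes that straddle a band boundary can still be split.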