したいこと
ディレクトリ内のjsonファイルを走査してcsvファイルに変換して変換結果を追記するとき、見出しと一致した時そこに書き込むことをしたいです。
それか見出し(下の例のCSVファイルでいう1行目)を指定してその見出しと一致したところのみを追記することをしたいです。
言い回しが複雑ですいません。
CSV
1"fruit_binomial name","fruit_major_producers_0","fruit_major_producers_1","fruit_major_producers_2","fruit_name","fruit_nutrition_carbohydrates","fruit_nutrition_fat","fruit_nutrition_protein" 2"Malus domestica","China","United States","Turkey","Apple","13.81g","0.17g","0.26g" 3"Citrus x sinensis","Brazil","United States","India","Orange","11.75g","0.12g","0.94g" 4"Mangifera indica","India","China","Thailand","Mango","15g","0.38g","0.82g"
作業対象のjsonファイルは1つ24MB、約50万行、全部で9000個と人間の目で確認しながら作業するには量が膨大すぎます。
##作業環境
- Python 3.6
- Ubuntu 16.04
ソースコード
以下のソースコードはvinay20045さんのjson-to-csvを使用させていただきました(https://github.com/ravikuc/json-to-csv-1)。
python
1import sys 2import json 3import csv 4import io 5 6## 7# Convert to string keeping encoding in mind... 8## 9def to_string(s): 10 try: 11 return str(s) 12 except: 13 #Change the encoding type if needed 14 return s.encode('utf-8') 15 16 17## 18# This function converts an item like 19# { 20# "item_1":"value_11", 21# "item_2":"value_12", 22# "item_3":"value_13", 23# "item_4":["sub_value_14", "sub_value_15"], 24# "item_5":{ 25# "sub_item_1":"sub_item_value_11", 26# "sub_item_2":["sub_item_value_12", "sub_item_value_13"] 27# } 28# } 29# To 30# { 31# "node_item_1":"value_11", 32# "node_item_2":"value_12", 33# "node_item_3":"value_13", 34# "node_item_4_0":"sub_value_14", 35# "node_item_4_1":"sub_value_15", 36# "node_item_5_sub_item_1":"sub_item_value_11", 37# "node_item_5_sub_item_2_0":"sub_item_value_12", 38# "node_item_5_sub_item_2_0":"sub_item_value_13" 39# } 40## 41def reduce_item(key, value): 42 global reduced_item 43 44 #Reduction Condition 1 45 if type(value) is list: 46 i=0 47 for sub_item in value: 48 reduce_item(key+'_'+to_string(i), sub_item) 49 i=i+1 50 51 #Reduction Condition 2 52 elif type(value) is dict: 53 sub_keys = value.keys() 54 for sub_key in sub_keys: 55 reduce_item(key+'_'+to_string(sub_key), value[sub_key]) 56 57 #Base Condition 58 else: 59 reduced_item[to_string(key)] = to_string(value) 60 61 62if __name__ == "__main__": 63 if len(sys.argv) != 4: 64 print ("\nUsage: python json_to_csv.py <node> <json_in_file_path> <csv_out_file_path>\n") 65 else: 66 #Reading arguments 67 node = sys.argv[1] 68 json_file_path = sys.argv[2] 69 csv_file_path = sys.argv[3] 70 71 with io.open(json_file_path, 'r', encoding='utf-8-sig') as fp: 72 json_value = fp.read() 73 raw_data = json.loads(json_value) 74 75 try: 76 data_to_be_processed = raw_data[node] 77 except: 78 data_to_be_processed = raw_data 79 80 processed_data = [] 81 header = [] 82 for item in data_to_be_processed: 83 reduced_item = {} 84 reduce_item(node, item) 85 86 header += reduced_item.keys() 87 88 processed_data.append(reduced_item) 89 90 header = list(set(header)) 91 header.sort() 92 93 with open(csv_file_path, 'w+') as f: 94 writer = csv.DictWriter(f, header, quoting=csv.QUOTE_ALL) 95 writer.writeheader() 96 for row in processed_data: 97 writer.writerow(row) 98 99 print ("Just completed writing csv file with %d columns" % len(header))
###入力ファイル構成
json
1{ 2 "devices": [ 3 { 4 "name": "IntGW-01", 5 "modules": { 6 "openconfig-interfaces": { 7 "interfaces": { 8 "interface": [ 9 { 10 "name": "GigabitEthernet1", 11 "config": { 12 "name": "GigabitEthernet1", 13 "type": "ianaift:ethernetCsmacd", 14 "description": "ManagementIF", 15 "enabled": true 16 }, 17 "state": { 18 "name": "GigabitEthernet1", 19 "type": "ianaift:ethernetCsmacd", 20 "enabled": true, 21 "ifindex": 1, 22 "admin-status": "UP", 23 "oper-status": "UP", 24 "last-change": 1580105229000389000, 25 "counters": { 26 "in-octets": 17493436676, 27 "in-unicast-pkts": 232900221, 28 "in-broadcast-pkts": 0, 29 "in-multicast-pkts": 0, 30 "in-discards": 0, 31 "in-errors": 0, 32 "in-unknown-protos": 0, 33 "in-fcs-errors": 0, 34 "out-octets": 4239283962, 35 "out-unicast-pkts": 413443767, 36 "out-broadcast-pkts": 0, 37 "out-multicast-pkts": 0, 38 "out-discards": 0, 39 "out-errors": 0, 40 "last-clear": 1580105112000091000 41 } 42 }, 43 "subinterfaces": {... 44 } 45 }, 46 { 47 "name": "GigabitEthernet2", 48 "config": {... 49 }, 50 "status":{... 51 }, 52 ... 53 } 54 //以下略 55 ] 56 } 57 } 58 }, 59 { 60 "name": "IntGW-02", 61 modules:{... 62 } 63 }, 64 { 65 "name": "RR-01", 66 modules:{... 67 } 68 }, 69 { 70 "name": "TR-01", 71 modules:{... 72 } 73 }, 74 { 75 "name": "TR-02", 76 modules:{... 77 } 78 } 79 } 80 ], 81 "@log_name": "network-device-bgpnw2" 82}
コード実行画面
Just completed writing csv file with 70053 columns Just completed writing csv file with 70053 columns Just completed writing csv file with 70053 columns Just completed writing csv file with 70053 columns Just completed writing csv file with 70052 columns
###出力されたファイルの一部
csv
1devices_modules_openconfig-interfaces_interfaces_interface_1_state_counters_in-octets 219404774900314 318232333157805 4321813212 56973081695820 66858087362957 7devices_modules_openconfig-interfaces_interfaces_interface_1_state_counters_in-octets 819404939122952 918232504179497 10321814077 116973251217583 126858254359951 13devices_modules_openconfig-interfaces_interfaces_interface_1_state_counters_in-octets 1419405103421764 1518232675201303 16321815002 176973419614954 186858418776197 19devices_modules_openconfig-interfaces_interfaces_interface_1_state_counters_in-octets 2019405267515552 2118232846259947 22321815813 236973590696219 246858586735109 25devices_modules_openconfig-interfaces_interfaces_interface_1_state_counters_in-unicast-pkts 2616281056362 2715297738104 281990660 295861296692
###理想のファイル
csv
1IntGW-01_devices_modules_openconfig-interfaces_interfaces_interface_1_state_counters_in-octets,IntGW-02_devices_modules_openconfig-interfaces_interfaces_interface_1_state_counters_in-octets,RR-01-02_devices_modules_openconfig-interfaces_interfaces_interface_1_state_counters_in-octets,TR-01_devices_modules_openconfig-interfaces_interfaces_interface_1_state_counters_in-octets,TR-02_devices_modules_openconfig-interfaces_interfaces_interface_1_state_counters_in-octets 219404774900314,18232333157805,321813212,6973081695820,6858087362957 319404939122952,18232504179497,321814077,6973251217583,6858254359951 419405103421764,18232675201303,321815002,6973419614954,6858418776197 5//以下略
パスなどは省略しています。
ディレクトリ内のファイルをシェルを用いて5つ変換した結果です。返還後に列数が表示される仕様で、何故か列が少ないものが存在してしまうため、同じ要素が縦に並ばない状態です。
また、入力ファイルの配列内に5つの要素(IntGW-01〜TR-02)があり、devicesをキー指定として展開しているのですが、csvファイルに変換した時に要素が4つしかなかったりと、素人の私からするとよく分からないことが多いです。
####最後に
初めての質問となるため、至らぬ点が多いと思いますが、解決のためにご協力頂ければなと思います。
あなたの回答
tips
プレビュー