Excelから読みだしたdataframeをもとに、jsonのパラメータファイルを生成したい。

前提

Excelから読みだしたdataframeに対し、とある自作のアプリケーションに対して使用するパラメータファイルを
自動生成する仕組みとを作りたいと考えています。

実現したいこと

下記の図のように、Excelから読みだしたdataframeを変換して、label_listの項目に、labelの一覧を、
settingの項目に、それぞれのラベルごとに、各列のデータを項目ごとに区切ってlist化したものを
格納しようと考えています。

発生している問題・エラーメッセージ

上記を実現するために下記のようなスクリプトを書いたのですが、
jsonの入れ子構造が再現できず困っています。
ループ内の文字列が、図のようなフォーマットとならず、json出力すると、下記エラーが出てしまいます。

エラー内容

TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17528/3905920474.py in <module>
1 df = pd.read_excel("test.xlsx")
----> 2 write_json_param(df)

~\AppData\Local\Temp/ipykernel_17528/2183125372.py in write_json_param(tmp_df)
28 # with open(write_filename, "w") as f:
29 with open(write_filename, "w", encoding="CP932") as f:
---> 30 json.dump(write_str, f, ensure_ascii=False)

~\AppData\Local\Programs\Python\Python39\lib\json_init_.py in dump(obj, fp, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
177 # could accelerate with writelines in some versions of Python, at
178 # a debuggability cost
--> 179 for chunk in iterable:
180 fp.write(chunk)
181

~\AppData\Local\Programs\Python\Python39\lib\json\encoder.py in _iterencode(o, _current_indent_level)
429 yield from _iterencode_list(o, _current_indent_level)
430 elif isinstance(o, dict):
--> 431 yield from _iterencode_dict(o, _current_indent_level)
432 else:
433 if markers is not None:

~\AppData\Local\Programs\Python\Python39\lib\json\encoder.py in _iterencode_dict(dct, _current_indent_level)
403 else:
404 chunks = _iterencode(value, _current_indent_level)
--> 405 yield from chunks
406 if newline_indent is not None:
407 _current_indent_level -= 1

~\AppData\Local\Programs\Python\Python39\lib\json\encoder.py in _iterencode(o, _current_indent_level)
436 raise ValueError("Circular reference detected")
437 markers[markerid] = o
--> 438 o = _default(o)
439 yield from _iterencode(o, _current_indent_level)
440 if markers is not None:

~\AppData\Local\Programs\Python\Python39\lib\json\encoder.py in default(self, o)
177
178 """
--> 179 raise TypeError(f'Object of type {o.class.name} '
180 f'is not JSON serializable')
181
TypeError: Object of type set is not JSON serializable

ソースコード

python3
1def write_json_param(tmp_df):
2    # jsonのパラメータ書込み部
3    write_filename = (
4        # Path(const.MyConst.SETTING_PATH) / f"{sel_schema}_{sel_proc}.json"
5        Path("db_settings_.json")
6	)
7
8    str_loop = ""
9    for row in df.itertuples():
10        print(row[1], row[2], row[3], row[4], row[5])
11        tmp_str = (f"{row[1]}" + ":{" + 
12                    "x:" + f"[{row[2]}]" + "," +
13                    "x1:" + f"[{row[3]}]" +  "," +
14                    "y1:" + f"[{row[4]}]" +  "," +
15                    "y2:" + f"[{row[5]}]" + 
16
17                    "}")
18        str_loop = str_loop + tmp_str + ','
19
20    write_loop_str = f'"proc_setting":{str_loop}'
21    write_str = {
22        "label_list": tmp_df["label"].tolist(),
23        "setting":{
24            write_loop_str
25        }
26	    # write_loop_str
27    }
28    # with open(write_filename, "w") as f:
29    with open(write_filename, "w", encoding="CP932") as f:
30        json.dump(write_str, f, ensure_ascii=False)
31
32# main処理
33df = pd.read_excel("test.xlsx")
34write_json_param(df)
35

補足情報（FW/ツールのバージョンなど）

Python 3.9.6

meg_

2022/03/30 16:32

> ソースコードは後日載せたいと考えておりますが、先行して質問だけ出させてください。そのようにされる理由は何でしょうか？回答が付きにくくなるかと思うのですが。コードを追記される際には「うまく実現できません。」についても詳細に書いていただくと回答しやすくなるかと思います。

H.K2

2022/04/02 06:08

ご指摘ありがとうございます。コードを追記させていただきます。状況の記載も見直しいたします。

meg_

2022/04/03 04:26

> jsonの入れ子構造が再現できず困っています。どんな出力になってしまうのでしょうか？掲載されたコードの「for row in df.itertuples():」以下の部分は何をしているのでしょうか？

H.K2

2022/04/03 04:53

すみません。jupyterでいろいろ試行錯誤しているのですが、下記のようなフォーマットになってしまい、入れ子自体には何とかなったのですが、全体に、「'」が入ってしまいます。 'ほげほげ:{x:[a_col],x1:[b_col],y1:[c_col],y2:[d_col]},もげもげ:{x:[a, b],x1:[b, c],y1:[c, d],y2:[d, e]},あああ:{x:[a,b,c],x1:[d,e,f],y1:[e,f,g],y2:[g,h,i]},' jsonに出力しようとしたら、「Object of type set is not JSON serializable」のようなエラーが出てしまいます。

H.K2

2022/04/03 04:56

＞掲載されたコードの「for row in df.itertuples():」以下の部分は何をしているのでしょうか？すみません。古いコードになっていたようです。。。関数の中で、各行のデータを作るために、itertuples（）を使っているのですが、そちらをなかにいれわすれていました。現状のコードに修正します

meg_

2022/04/03 06:08 編集

質問者さんのコードを修正するのが難しかったので私なりのコードを回答いたしました。下記については元々文字列として生成しているからでは？ > 下記のようなフォーマットになってしまい、入れ子自体には何とかなったのですが、全体に、「'」が入ってしまいます。また、コードに「"proc_setting"」がありますがこちらは質問の図にはありませんが何か必要なものでしょうか？

H.K2

2022/04/03 10:59

＞下記については元々文字列として生成しているからでは？なるほど。エラーについては、ちょっと落ち着いたら、コードを見直してみます。ありがとうございます。やりたかったことが、こういう方向でできることに気づいてなかったので、勉強になりました。＞こちらは質問の図にはありませんが何か必要なものでしょうか？すみません。こちらは今回の質問とは関係ない設定なので問題ありません。（実問題では必要ですが、今回の質問の本質ではないので、大丈夫です）ありがとうござい案した。

行動規範の内容に同意します

回答1件

ベストアンサー

Pandas.DataFrameから質問者さんの希望する辞書を生成するコードとなります。その他の部分は省略しています。

Python
1import pandas as pd
2
3df = pd.DataFrame({'label':['ほげほげ','もげもげ','あああ'], 'x':['a_col','a,b','a,b,c'],'x1':['b_col','b,c','d,e,f'],'y1':['c_col','c,d','e,f,g'],'y2':['d_col','d,e','g,h,i']})
4# 	label	x	x1	y1	y2
5# 0	ほげほげ	a_col	b_col	c_col	d_col
6# 1	もげもげ	a,b	b,c	c,d	d,e
7# 2	あああ	a,b,c	d,e,f	e,f,g	g,h,i
8
9out = dict()
10for i in range(len(df)):
11    temp_dict = dict()
12    for j in df.columns[1:]:
13        temp_dict[j] = df.loc[i,j].split(',')
14    out[df.loc[i,'label']] = temp_dict
15output = {'label_list':df['label'].tolist(), 'setting':out}
16# {'label_list': ['ほげほげ', 'もげもげ', 'あああ'],
17# 'setting': {'ほげほげ': {'x': ['a_col'],
18#    'x1': ['b_col'],
19#    'y1': ['c_col'],
20#    'y2': ['d_col']},
21#   'もげもげ': {'x': ['a', 'b'],
22#    'x1': ['b', 'c'],
23#    'y1': ['c', 'd'],
24#    'y2': ['d', 'e']},
25#   'あああ': {'x': ['a', 'b', 'c'],
26#    'x1': ['d', 'e', 'f'],
27#    'y1': ['e', 'f', 'g'],
28#    'y2': ['g', 'h', 'i']}}}