Answer edit history

Fix

2020/03/16 05:01

Posted

kirara0048

Score 1399

test CHANGED Viewed

@@ -212,6 +212,32 @@
 ```python
+# Convert to a NumPy array
+arr = df.to_numpy()
+# If the 'id1', 'id2' combinations are not exhaustive and the rows are not sorted (for example,
+print(df.sample(frac=0.9, random_state=0))
+#          item1  item2  item3
+# id1 id2
+# 2   2        4      5      6
+#     1        1      2      3
+#     3        7      8      9
+# 1   3        7      8      9
+# 3   1        1      2      3
+#     3        7      8      9
+# 1   2        4      5      6   as in this sample)
 # Make the 'id1', 'id2' combinations exhaustive, then convert to a NumPy array
 arr = df.reindex(pd.MultiIndex.from_product(df.index.levels)).to_numpy()
@@ -220,7 +246,7 @@
 # Rearrange with `ndarray.reshape()` and convert to a DataFrame
-col = pd.MultiIndex.from_product((df.columns, df.index.levels[1]))
+col = pd.MultiIndex.from_product((df.index.levels[1], df.columns))
 new_df = pd.DataFrame(arr.reshape(df.index.levshape[0], -1),
@@ -238,7 +264,7 @@
-|   id1 |   item1-1 |   item1-2 |   item1-3 |   item2-1 |   item2-2 |   item2-3 |   item3-1 |   item3-2 |   item3-3 |
+|   id1 |   item1-1 |   item2-1 |   item3-1 |   item1-2 |   item2-2 |   item3-2 |   item1-3 |   item2-3 |   item3-3 |
 |------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|
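As a rough illustration of the fix in this revision, the sketch below assumes a hypothetical 3 x 3 data set from which one (`id1`, `id2`) pair has been removed and whose rows are shuffled. The data, the positional shuffle, and the f-string column naming are assumptions for illustration, not part of the original answer. Reindexing against the full product of the index levels restores the row order and fills the missing pair with NaN, so the reshaped rows line up with the `(id2, item)` column order.

```python
import pandas as pd

# Hypothetical data: every (id1, id2) pair for id1, id2 in 1..3
data = {'id1': [1, 1, 1, 2, 2, 2, 3, 3, 3],
        'id2': [1, 2, 3, 1, 2, 3, 1, 2, 3],
        'item1': [1, 4, 7] * 3,
        'item2': [2, 5, 8] * 3,
        'item3': [3, 6, 9] * 3}
full = pd.DataFrame(data).set_index(['id1', 'id2'])

# Drop the (3, 2) row (position 7) and shuffle the rest by position
df = full.iloc[[5, 0, 8, 2, 6, 3, 1, 4]]

# Rebuild the exhaustive, sorted index; the missing pair becomes NaN
full_index = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
arr = df.reindex(full_index).to_numpy()

# Each id1 block is laid out as (id2, item), so id2 is the outer column level
col = pd.MultiIndex.from_product((df.index.levels[1], df.columns))
new_df = pd.DataFrame(arr.reshape(df.index.levshape[0], -1),
                      index=df.index.levels[0], columns=col)

# Flatten the column labels to 'item-id2'
new_df.columns = [f'{item}-{id2}' for id2, item in new_df.columns]
print(new_df)
```

Because the missing pair is filled with NaN, the values in the result are floats rather than integers.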

Added a method

2020/03/16 05:01

Posted

kirara0048

Score 1399

test CHANGED Viewed

@@ -4,6 +4,16 @@
+Also, when the `id1` and `id2` columns are set as the index, you can use `df.unstack()`.
+[pandas.DataFrame.unstack — pandas 1.0.2 documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.unstack.html#pandas-dataframe-unstack)
+## Case 1
 ```python
 data = {'id1': [1, 1, 1, 2, 2, 2],
@@ -66,7 +76,7 @@
-Also,
+To convert to exactly the same format as the example in the question,
@@ -93,3 +103,145 @@
 |     1 |         1 |         2 |         3 |         4 |         5 |         6 |         7 |         8 |         9 |
 |     2 |         1 |         2 |         3 |         4 |         5 |         6 |         7 |         8 |         9 |
+## Case 2 (when the `id` columns are the index)
+```python
+data = {'id1': [1, 1, 1, 2, 2, 2],
+        'id2': [1, 2, 3, 1, 2, 3],
+        'item1': [1, 4, 7, 1, 4, 7],
+        'item2': [2, 5, 8, 2, 5, 8],
+        'item3': [3, 6, 9, 3, 6, 9]}
+df = pd.DataFrame(data).set_index(['id1', 'id2'])
+print(df)
+```
+|        |   item1 |   item2 |   item3 |
+|:-------|--------:|--------:|--------:|
+| (1, 1) |       1 |       2 |       3 |
+| (1, 2) |       4 |       5 |       6 |
+| (1, 3) |       7 |       8 |       9 |
+| (2, 1) |       1 |       2 |       3 |
+| (2, 2) |       4 |       5 |       6 |
+| (2, 3) |       7 |       8 |       9 |
+In this case,
+```python
+new_df = df.unstack()
+new_df.sort_index(axis=1, level=1, inplace=True)
+new_df.set_axis(['-'.join([c1, str(c2)]) for c1, c2 in new_df.columns],
+                axis=1, inplace=True)
+print(new_df)
+```
+|   id1 |   item1-1 |   item2-1 |   item3-1 |   item1-2 |   item2-2 |   item3-2 |   item1-3 |   item2-3 |   item3-3 |
+|------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|
+|     1 |         1 |         2 |         3 |         4 |         5 |         6 |         7 |         8 |         9 |
+|     2 |         1 |         2 |         3 |         4 |         5 |         6 |         7 |         8 |         9 |
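+## Case 3 (using NumPy)
+When the `id1` and `id2` columns are set as the index and every combination of `id1` and `id2` is present, each 3×3 block only needs to be rearranged into a 1×9 row, so the following method works.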
+```python
+data = {'id1': [1, 1, 1, 2, 2, 2],
+        'id2': [1, 2, 3, 1, 2, 3],
+        'item1': [1, 4, 7, 1, 4, 7],
+        'item2': [2, 5, 8, 2, 5, 8],
+        'item3': [3, 6, 9, 3, 6, 9]}
+df = pd.DataFrame(data).set_index(['id1', 'id2'])
+# Same as Case 2
+```
+In this case,
+```python
+# Make the 'id1', 'id2' combinations exhaustive, then convert to a NumPy array
+arr = df.reindex(pd.MultiIndex.from_product(df.index.levels)).to_numpy()
+# Rearrange with `ndarray.reshape()` and convert to a DataFrame
+col = pd.MultiIndex.from_product((df.columns, df.index.levels[1]))
+new_df = pd.DataFrame(arr.reshape(df.index.levshape[0], -1),
+                      index=df.index.levels[0], columns=col)
+new_df.set_axis(['-'.join([c1, str(c2)]) for c1, c2 in new_df.columns],
+                axis=1, inplace=True)
+print(new_df)
+```
+|   id1 |   item1-1 |   item1-2 |   item1-3 |   item2-1 |   item2-2 |   item2-3 |   item3-1 |   item3-2 |   item3-3 |
+|------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|
+|     1 |         1 |         2 |         3 |         4 |         5 |         6 |         7 |         8 |         9 |
+|     2 |         1 |         2 |         3 |         4 |         5 |         6 |         7 |         8 |         9 |
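For reference, a minimal sketch of the Case 2 route written without `inplace=True`. To my knowledge, `set_axis` stopped accepting `inplace` in pandas 2.0, so this version assigns to `columns` instead; treat that as an assumption about the pandas version in use. It also checks that a plain reshape of the sorted, exhaustive data yields the same values. The names `wide` and `reshaped` are illustrative and do not appear in the original answer.

```python
import pandas as pd

data = {'id1': [1, 1, 1, 2, 2, 2],
        'id2': [1, 2, 3, 1, 2, 3],
        'item1': [1, 4, 7, 1, 4, 7],
        'item2': [2, 5, 8, 2, 5, 8],
        'item3': [3, 6, 9, 3, 6, 9]}
df = pd.DataFrame(data).set_index(['id1', 'id2'])

# unstack() moves id2 into the columns, giving (item, id2) column pairs
wide = df.unstack()
# Sort by id2 first so the columns read item1-1, item2-1, item3-1, item1-2, ...
wide = wide.sort_index(axis=1, level=1)
# Flatten the column labels without relying on inplace=True
wide.columns = [f'{item}-{id2}' for item, id2 in wide.columns]
print(wide)

# With sorted, exhaustive rows, a plain reshape gives the same layout
reshaped = pd.DataFrame(df.to_numpy().reshape(df.index.levshape[0], -1),
                        index=df.index.levels[0], columns=wide.columns)
print((wide.to_numpy() == reshaped.to_numpy()).all())
```

The reshape shortcut only matches `unstack()` when the rows are sorted by (`id1`, `id2`) and no pair is missing; otherwise `unstack()` is the safer choice.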