pandas pivot raises an error

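Thank you in advance. I converted a table to tidy data with melt and then tried to reshape it with pivot.
The real data has more than 3 million rows, but it looks like the following. (Slightly simplified; these are the first 5 rows.)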

データ種別	DCF施設コード	施設名称	製品	年月	値
SOM CR	01103811	北海道循環器病院	市場ポテンシャル区分(9月より累計)	推定年月1209	J
SOM CR	01103811	北海道循環器病院	市場ポテンシャル区分(3Week)	推定年月1209	K
SOM CR	01103811	北海道循環器病院	製品Aシェアレンジ(9月より累計)	推定年月1209	0
SOM CR	01103811	北海道循環器病院	製品Aシェアレンジ(3Week)	推定年月1209	0
SOM CR	01103811	北海道循環器病院	主要製品切替フラグ	推定年月1209	0

I named this DataFrame df2_long. I wanted to pivot it, and because the entries in the 値 column are not numeric, I used pivot rather than pivot_table.

df2_p=pd.pivot(df2_long, index=['データ種別','DCF施設コード','施設名称'],
columns=['製品','年月'],
values="値")

Running the command above produces the following.

Problem / error message

ValueError                                Traceback (most recent call last)
<ipython-input-49-9b47980d8760> in <module>()
      1 df2_p=pd.pivot(df2_long, index=['データ種別','DCF施設コード','施設名称'],
      2                columns=['製品','年月'],
----> 3                values="値")

C:\Anaconda3\lib\site-packages\pandas\core\reshape\pivot.py in pivot(data, index, columns, values)
    447             )
    448         else:
--> 449             indexed = data._constructor_sliced(data[values].values, index=index)
    450     return indexed.unstack(columns)
    451 

C:\Anaconda3\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    290                     if len(index) != len(data):
    291                         raise ValueError(
--> 292                             f"Length of passed values is {len(data)}, "
    293                             f"index implies {len(index)}."
ValueError: Length of passed values is 3964680, index implies 3.

That is the error I get.
I assumed pivot_table could not be used because the values are not numeric, but I cannot work out what is wrong here and would appreciate any advice.
Thank you very much.
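
On the assumption that pivot_table cannot handle non-numeric values: its default aggregation is a mean, which does need numbers, but a non-numeric aggregation such as aggfunc="first" can be passed instead. The following is only a minimal sketch. The miniature df2_long is a hypothetical stand-in typed in from the sample rows above, and "first" silently keeps one value if a key combination is duplicated, so it may not be appropriate for the real data.

```python
import pandas as pd

# Hypothetical miniature of df2_long, typed in from the sample rows in the question.
df2_long = pd.DataFrame({
    "データ種別": ["SOM CR"] * 4,
    "DCF施設コード": ["01103811"] * 4,
    "施設名称": ["北海道循環器病院"] * 4,
    "製品": [
        "市場ポテンシャル区分(9月より累計)",
        "市場ポテンシャル区分(3Week)",
        "製品Aシェアレンジ(9月より累計)",
        "製品Aシェアレンジ(3Week)",
    ],
    "年月": ["推定年月1209"] * 4,
    "値": ["J", "K", "0", "0"],
})

# aggfunc="first" takes the first value in each group instead of averaging,
# so string values such as "J" and "K" survive the reshape.
df2_p = df2_long.pivot_table(
    index=["データ種別", "DCF施設コード", "施設名称"],
    columns=["製品", "年月"],
    values="値",
    aggfunc="first",
)
print(df2_p)
```

If every combination of the five key columns is unique, this should give the same wide table that pivot was meant to produce.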

melian

2023/01/25 07:45

I am not sure whether this will work, but how about splitting out the index first with set_index()?
df2_p = df2_long.set_index(['データ種別','DCF施設コード','施設名称']).pivot(columns=['製品','年月'],values="値")

onosan

2023/01/25 09:26

Thank you. I got the error below. Perhaps pivot does not support multiple index columns.

NotImplementedError                       Traceback (most recent call last)
<ipython-input-9-561bbd2959dc> in <module>()
      1 df2_p=df2_long.set_index(['データ種別','DCF施設コード','施設名称']).pivot(
      2                columns=['製品','年月'],
----> 3                values="値")
      4 
      5 

C:\Anaconda3\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
   5921         from pandas.core.reshape.pivot import pivot
   5922 
-> 5923         return pivot(self, index=index, columns=columns, values=values)
   5924 
   5925     _shared_docs[

C:\Anaconda3\lib\site-packages\pandas\core\reshape\pivot.py in pivot(data, index, columns, values)
    439         else:
    440             index = data[index]
--> 441         index = MultiIndex.from_arrays([index, data[columns]])
    442 
    443         if is_list_like(values) and not isinstance(values, tuple):

C:\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py in from_arrays(cls, arrays, sortorder, names)
    425                 raise ValueError("all arrays must be same length")
    426 
--> 427         codes, levels = factorize_from_iterables(arrays)
    428         if names is lib.no_default:
    429             names = [getattr(arr, "name", None) for arr in arrays]

C:\Anaconda3\lib\site-packages\pandas\core\arrays\categorical.py in factorize_from_iterables(iterables)
   2706         # For consistency, it should return a list of 2 lists.
   2707         return [[], []]
-> 2708     return map(list, zip(*(factorize_from_iterable(it) for it in iterables)))

C:\Anaconda3\lib\site-packages\pandas\core\arrays\categorical.py in <genexpr>(.0)
   2706         # For consistency, it should return a list of 2 lists.
   2707         return [[], []]
-> 2708     return map(list, zip(*(factorize_from_iterable(it) for it in iterables)))

C:\Anaconda3\lib\site-packages\pandas\core\arrays\categorical.py in factorize_from_iterable(values)
   2678         # but only the resulting categories, the order of which is independent
   2679         # from ordered. Set ordered to False as default. See GH #15457
-> 2680         cat = Categorical(values, ordered=False)
   2681         categories = cat.categories
   2682         codes = cat.codes

C:\Anaconda3\lib\site-packages\pandas\core\arrays\categorical.py in __init__(self, values, categories, ordered, dtype, fastpath)
    372 
    373             # we're inferring from values
--> 374             dtype = CategoricalDtype(categories, dtype.ordered)
    375 
    376         elif is_categorical_dtype(values):

C:\Anaconda3\lib\site-packages\pandas\core\dtypes\dtypes.py in __init__(self, categories, ordered)
    220 
    221     def __init__(self, categories=None, ordered: Ordered = False):
--> 222         self._finalize(categories, ordered, fastpath=False)
    223 
    224     @classmethod

C:\Anaconda3\lib\site-packages\pandas\core\dtypes\dtypes.py in _finalize(self, categories, ordered, fastpath)
    367 
    368         if categories is not None:
--> 369             categories = self.validate_categories(categories, fastpath=fastpath)
    370 
    371         self._categories = categories

C:\Anaconda3\lib\site-packages\pandas\core\dtypes\dtypes.py in validate_categories(categories, fastpath)
    540         if not fastpath:
    541 
--> 542             if categories.hasnans:
    543                 raise ValueError("Categorial categories cannot be null")
    544 

pandas\_libs\properties.pyx in pandas._libs.properties.CachedProperty.__get__()

C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in hasnans(self)
   1779         """
   1780         if self._can_hold_na:
-> 1781             return bool(self._isnan.any())
   1782         else:
   1783             return False

pandas\_libs\properties.pyx in pandas._libs.properties.CachedProperty.__get__()

C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _isnan(self)
   1759         """
   1760         if self._can_hold_na:
-> 1761             return isna(self)
   1762         else:
   1763             # shouldn't reach to this condition by checking hasnans beforehand

C:\Anaconda3\lib\site-packages\pandas\core\dtypes\missing.py in isna(obj)
    124     Name: 1, dtype: bool
    125     """
--> 126     return _isna(obj)
    127 
    128 

C:\Anaconda3\lib\site-packages\pandas\core\dtypes\missing.py in _isna_new(obj)
    136     # hack (for now) because MI registers as ndarray
    137     elif isinstance(obj, ABCMultiIndex):
--> 138         raise NotImplementedError("isna is not defined for MultiIndex")
    139     elif isinstance(obj, type):
    140         return False

NotImplementedError: isna is not defined for MultiIndex
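
One approach that avoids pivot altogether is to put all five key columns into the index with set_index() and then move 製品 and 年月 into the columns with unstack(). As far as I recall, the pandas documentation marks list-valued index/columns arguments to pivot as accepted from version 1.1.0, so an older installation may be the cause here, but that is an assumption and worth confirming with pd.__version__. A minimal sketch follows, with a hypothetical miniature of df2_long typed in from the first three sample rows of the question.

```python
import pandas as pd

print(pd.__version__)  # worth checking which pandas release is installed

# Hypothetical miniature of df2_long (first three sample rows of the question).
df2_long = pd.DataFrame({
    "データ種別": ["SOM CR"] * 3,
    "DCF施設コード": ["01103811"] * 3,
    "施設名称": ["北海道循環器病院"] * 3,
    "製品": [
        "市場ポテンシャル区分(9月より累計)",
        "市場ポテンシャル区分(3Week)",
        "製品Aシェアレンジ(9月より累計)",
    ],
    "年月": ["推定年月1209"] * 3,
    "値": ["J", "K", "0"],
})

keys = ["データ種別", "DCF施設コード", "施設名称", "製品", "年月"]

# unstack raises an error if a key combination occurs more than once,
# so count duplicated key rows before reshaping.
n_dup = int(df2_long.duplicated(subset=keys).sum())
print("duplicated key rows:", n_dup)

# Index by every key column, then move 製品 and 年月 from rows to columns.
wide = df2_long.set_index(keys)["値"].unstack(["製品", "年月"])
print(wide)
```

If n_dup is greater than zero on the real data, the duplicates would have to be resolved first (for example with drop_duplicates or an aggregation), because neither pivot nor unstack can reshape duplicated keys.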