Python 3: What type does pandas.cut return?

Background / what I want to achieve

I am developing in JupyterLab from Anaconda.
My goal is to draw a histogram.
I am trying to split source data held in a list with pandas.cut and then draw a histogram of the result with matplotlib's hist.

Problem / error message

matplotlib's hist seems to accept only a list or a Series as its argument, and the value returned by pandas.cut is neither of these.

Error message
TypeError                                 Traceback (most recent call last)
<ipython-input-16-5feae93c127a> in <module>
     18 #plt.hist(col_values2,bins=40,range=(0,4.0))
     19 burnt = pd.cut(col_values2,[1.7,4.0])
---> 20 plt.hist(burnt,bins=40,range=(0,4.0))
     21 #df = pd.DataFrame.from_dict(dict)
     22 #data = df['burnt'].tolist()

~\anaconda3\lib\site-packages\matplotlib\pyplot.py in hist(x, bins, range, density, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, data, **kwargs)
   2683         orientation='vertical', rwidth=None, log=False, color=None,
   2684         label=None, stacked=False, *, data=None, **kwargs):
-> 2685     return gca().hist(
   2686         x, bins=bins, range=range, density=density, weights=weights,
   2687         cumulative=cumulative, bottom=bottom, histtype=histtype,

~\anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, data, *args, **kwargs)
   1436     def inner(ax, *args, data=None, **kwargs):
   1437         if data is None:
-> 1438             return func(ax, *map(sanitize_sequence, args), **kwargs)
   1439 
   1440         bound = new_sig.bind(ax, *args, **kwargs)

~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py in hist(self, x, bins, range, density, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
   6654             # this will automatically overwrite bins,
   6655             # so that each histogram uses the same bins
-> 6656             m, bins = np.histogram(x[i], bins, weights=w[i], **hist_kwargs)
   6657             tops.append(m)
   6658         tops = np.array(tops, float)  # causes problems later if it's an int

<__array_function__ internals> in histogram(*args, **kwargs)

~\anaconda3\lib\site-packages\numpy\lib\histograms.py in histogram(a, bins, range, normed, weights, density)
    834 
    835             # Only include values in the right range
--> 836             keep = (tmp_a >= first_edge)
    837             keep &= (tmp_a <= last_edge)
    838             if not np.logical_and.reduce(keep):

TypeError: '>=' not supported between instances of 'pandas._libs.interval.Interval' and 'int'

Relevant source code

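Python3
import matplotlib.pyplot as plt
import pandas as pd

# col_values2: list of numbers read from the Excel file
burnt = pd.cut(col_values2, [1.7, 4.0])
plt.hist(burnt, bins=40, range=(0, 4.0))  # raises the TypeError above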

What I tried

I checked the type of the value returned by pandas.cut with type():
<class 'pandas.core.arrays.categorical.Categorical'>
As a related question, I can't tell whether this type is a kind of pandas DataFrame or a kind of numpy array.

df = pd.DataFrame.from_dict(dict)
data = df['burnt'].tolist()
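With the code above I tried to convert from a DataFrame to a list, but it did not work.
From this I suspect that the type above (pandas.core.arrays~) is not a DataFrame, but I don't know the correct answer.
I would appreciate an answer to this as well.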
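
The question about the type can be checked directly with isinstance. The following is a minimal sketch, assuming a small made-up list in place of col_values2 (the real data is not shown in the question); the variable names other than burnt are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for col_values2 from the question.
col_values2 = [0.5, 1.8, 2.4, 3.9, 4.5]

burnt = pd.cut(col_values2, [1.7, 4.0])

# Check the returned object against the candidate types.
is_dataframe = isinstance(burnt, pd.DataFrame)
is_ndarray = isinstance(burnt, np.ndarray)
is_categorical = isinstance(burnt, pd.Categorical)
print(type(burnt))
print(is_dataframe, is_ndarray, is_categorical)

# Elements are bins (Interval objects) or missing values,
# not the original numbers.
outside_bin = burnt[0]   # 0.5 lies outside (1.7, 4.0]
inside_bin = burnt[1]    # 1.8 lies inside (1.7, 4.0]
print(outside_bin, inside_bin, type(inside_bin))
```

Because each element is an Interval rather than a number, the comparison `tmp_a >= first_edge` inside numpy's histogram fails, which matches the TypeError in the traceback.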

Additional information (framework/tool versions, etc.)

The pandas version is 1.1.3.

ppaul

2021/04/04 12:07

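The return value of pandas.cut is neither a kind of DataFrame nor a kind of numpy array. It has already lost the original values, so a histogram can't be drawn from it; I have posted a different approach as an answer. When you want to look into something like this, one option is to run >>> print(burnt.__doc__), which prints the English documentation. >>> help(burnt) gives almost the same information, but it disappears from the screen once you exit.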


2 answers

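The Categorical returned by pd.cut() is a "categorical array".

A pd.Series or pd.DataFrame holds an "array of data" inside the table, and you can get it out through the .values attribute.
That array is usually an np.ndarray, but pandas also has array types of its own, and the categorical array is one of them.
As for what a categorical array is, think of it as an extension of an integer array: each element is stored as an integer code that points into a list of categories.

Note that if you pass labels=False to pd.cut(), it returns a NumPy array of integer bin indices instead of a categorical array.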
pandas.cut — pandas documentation
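
To make the "extension of an integer array" description concrete, here is a small sketch. The values and bin edges are made up for illustration, and the bins are deliberately chosen to cover every value, because values that fall outside every bin come back as missing (NaN).

```python
import numpy as np
import pandas as pd

values = [0.5, 1.8, 2.4, 3.9, 4.5]   # hypothetical data
bins = [0.0, 1.7, 4.0, 5.0]          # hypothetical edges covering every value

# Default: a Categorical whose categories are the bin intervals.
cat = pd.cut(values, bins)
print(cat.categories)   # the intervals (0.0, 1.7], (1.7, 4.0], (4.0, 5.0]
print(cat.codes)        # integer code of the bin each value fell into

# labels=False: only the bin indices, as a plain NumPy array.
idx = pd.cut(values, bins, labels=False)
print(type(idx), idx)
```

The codes from the Categorical and the array returned with labels=False carry the same information here: the position of the bin, not the original value.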

Posted 2021/04/05 03:36

kirara0048

Total score 1399

Best answer

Drawing a histogram without pandas.cut looks like the following code.

python
import matplotlib.pyplot as plt

data = [1.5, 3.8, 2.1, 4.7, 0.8, 2.9, 3.3, 1.8, 3.5, 3.2, 2.6, 4.1, 3.6]
# One bin per unit interval: edges at 0, 1, 2, 3, 4, 5
plt.hist(data, bins=range(6))
plt.xticks(range(6), range(6))
plt.yticks(range(6), range(6))
plt.show()

This displays the following.

I'm not sure I have understood what you want to do, but here is a second attempt.

python
import matplotlib.pyplot as plt
import pandas as pd
import random

N = 1000
# Random sample data in a single column named 'value2'
df = pd.DataFrame({'value2': [random.gauss(2.5, 2.5) for _ in range(N)]})
# Keep only the rows whose value lies in (1.7, 4.0]
burnt = df[(1.7 < df['value2']) & (df['value2'] <= 4.0)]
plt.hist(burnt['value2'], bins=40, range=(0, 4.0))
plt.show()

The result is below.
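
If pandas.cut is still wanted for the splitting step, one possible variant is to use its result only to decide which rows belong to the bin and then plot the original numbers. This is a sketch under assumptions: the list below is made up, the names s and groups are illustrative, and the Agg backend is selected only so the code runs without a display (drop that line in Jupyter and call plt.show() instead).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample standing in for col_values2 from the question.
col_values2 = [0.5, 1.8, 2.4, 3.9, 4.5, 1.7, 4.0]

s = pd.Series(col_values2)
# Label each value with its bin; values outside (1.7, 4.0] become NaN.
groups = pd.cut(s, [1.7, 4.0])
# Select the original numeric values that received a bin label.
burnt = s[groups.notna()]

counts, edges, _ = plt.hist(burnt, bins=40, range=(0, 4.0))
print(burnt.tolist())
```

With several bin edges, the same idea extends to selecting one group by comparing groups against a particular interval, while the histogram is always drawn from the numeric values in s.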

Posted 2021/04/03 07:55

Edited 2021/04/03 10:03

ppaul

Total score 24666

Seniorious

2021/04/03 08:11

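Thank you for your answer. I apologize that the "Background / what I want to achieve" section was not clear enough. What I wanted this program to do is take source data in list form (more precisely, numeric data from an Excel file read into Python), split it in Python by setting boundary conditions, and display one of the resulting groups of data as a histogram. Your answer is very helpful, but I would be grateful if you could also reply with a way to achieve the above. Thank you in advance.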

Seniorious

2021/04/04 10:41

Thank you for your answer. I apologize for my hard-to-follow writing. It looks like I can do what I wanted with the approach you suggested. Thank you very much.
