Struggling with a KeyError when trying to cluster with KMeans in Python


Here is the source code that raises the KeyError.

# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np
import matplotlib.lines as mlines
import codecs
from sklearn.cluster import KMeans

f1 = codecs.open('claster_panda_1_1.csv', 'w', 'utf-8')

# Load the dataset
# Input data file
cust_df = pd.read_csv("odds_test_1_1.csv" , sep=",")

# Convert the pandas DataFrame into a NumPy array
cust_array = np.array([cust_df['temp(0)'].tolist(),
                       cust_df['temp(1)'].tolist(),
                       cust_df['temp(2)'].tolist(),
                       cust_df['temp(3)'].tolist(),
                       cust_df['temp(4)'].tolist(),
                       cust_df['temp(5)'].tolist(),
                       cust_df['temp(6)'].tolist(),
                       cust_df['temp(7)'].tolist(),
                       cust_df['temp(8)'].tolist(),
                       cust_df['temp(9)'].tolist(),
                       cust_df['temp(10)'].tolist(),
                       cust_df['temp(11)'].tolist(),
                       cust_df['temp(12)'].tolist(),
                       
                       ], np.int32)

# Transpose the array
cust_array = cust_array.T

# Run the cluster analysis (number of clusters = 4)
pred = KMeans(n_clusters=4).fit_predict(cust_array)
print pred

The input data, odds_test_1_1.csv, is as follows.

temp(0)	temp(1)	temp(2)	temp(3)	temp(4)	temp(5)	temp(6)	temp(7)	temp(8)	temp(9)	temp(10)	temp(11)	temp(12)	arrived
0	0	4	8.1	13.1	12.3	9.1	9.2	6.4	6.6	6.3	6.5	6.9	1
0	6.8	3.7	9.9	16	7	5.3	4.9	5.1	5.1	5.2	4.7	4.7	1
0	3.4	30	61.4	27.8	11.5	11.7	11.9	12.8	13.4	14	14.5	15.6	0
0	3.4	25	9.1	48	38	20.4	17.7	18.3	15	14.6	14.2	14.9	0
0	6.8	9.9	19.7	12.8	13.6	14.3	14	14.2	15	14.3	14.5	15.3	1
0	3.4	25	34.5	156.1	107.4	84.6	59.3	63.2	67.7	65.7	67.2	69.2	0
0	0	37.6	92.2	198.7	137.7	125.3	99.2	90.1	93.8	92.4	93.8	90.9	0
0	0	5.5	12	27.9	28.4	26.9	25.9	27.5	27.4	27.8	27.4	28.7	0
0	0	4.3	1.4	1.1	1.3	1.5	1.7	1.8	1.8	1.8	1.8	1.8	0
0	0	50.1	61.4	118.1	57	58.7	32.5	35.1	38.3	39.5	40.3	42.9	0
0	0	25.6	88.4	88.4	47.1	76.8	70.6	70.6	76.7	81.4	78.9	80.2	0

The error message is KeyError: 'temp(0)'.

Renaming the labels did not make the error go away.

I would appreciate any guidance from more experienced users.


3 answers

Strictly speaking, the input file odds_test_1_1.csv is not a CSV (comma-separated values) file.
As in your previous question, specify the separator not as a comma , but as one or more whitespace characters, \s+.
Reference: When extracting data, I get the error "ValueError: labels ['arrival'] not contained in axis" and am struggling with it!

Python
# (omitted)
cust_df = pd.read_csv("odds_test_1_1.csv", sep=r"\s+")
# (omitted)
print pred # [0 0 0 0 0 1 3 0 0 1 2]
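
To see why the separator matters, it can help to compare the column names pandas produces in each case. The following is only a minimal sketch, assuming the file is tab- or space-delimited as it appears in the question; the inline `sample` string is a shortened, hypothetical stand-in for odds_test_1_1.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for odds_test_1_1.csv (tab-delimited, shortened).
sample = "temp(0)\ttemp(1)\tarrived\n0\t0\t1\n0\t6.8\t1\n"

# With sep=",", the whole header line is parsed as a single column name,
# so 'temp(0)' is not a valid key and cust_df['temp(0)'] raises KeyError.
wrong = pd.read_csv(io.StringIO(sample), sep=",")
print(wrong.columns.tolist())

# With sep=r"\s+", every whitespace-delimited field becomes its own column.
right = pd.read_csv(io.StringIO(sample), sep=r"\s+")
print(right.columns.tolist())
print(right["temp(0)"].tolist())
```

Printing `cust_df.columns` right after `read_csv` is a quick way to confirm which of the two situations applies to a given file.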

Posted 2017/07/29 03:28

8524ba23

Total score 38352

akakage13

2017/07/29 03:53

can110, thank you for the guidance. I look forward to your continued help.


Best answer

The part beginning with

python
cust_array = np.array([cust_df['temp(0)'].tolist(),
                       cust_df['temp(1)'].tolist(),
                       cust_df['temp(2)'].tolist(),

can simply be replaced with

python
cust_array = cust_df.values.astype(np.int32)

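I think the KeyError occurs because cust_df has no column literally named 'temp(0)': when the file is read with a separator that does not match its contents, the whole header line ends up as a single column name.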
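
Note that converting the whole DataFrame also pulls in the arrived column, and casting to an integer type truncates values such as 8.1 to 8. If only the temp(i) columns are meant to be clustered, as in the question's code, one possible sketch is shown here. It assumes the file has already been parsed correctly; the small inline frame uses the first two rows of the question's data with three feature columns instead of thirteen, and `n_features` and `feature_cols` are names introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in for cust_df: first two rows of the question's data,
# limited to three feature columns (the real file has temp(0)..temp(12)).
cust_df = pd.DataFrame({
    "temp(0)": [0, 0],
    "temp(1)": [0.0, 6.8],
    "temp(2)": [4.0, 3.7],
    "arrived": [1, 1],
})

# Build the feature column names instead of listing them by hand.
n_features = 3  # would be 13 for the file in the question
feature_cols = ["temp({})".format(i) for i in range(n_features)]

# Select only the features. Rows are already samples, so no transpose
# is needed, and float64 keeps the decimal part of the values.
cust_array = cust_df[feature_cols].values.astype(np.float64)
print(cust_array.shape)
```

The resulting array can then be passed to KMeans(n_clusters=4).fit_predict(cust_array) as in the question, provided the data has at least four rows.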

Posted 2017/07/29 02:22

MasashiKimura

Total score 1150

akakage13

2017/07/29 03:51

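MasashiKimura, thank you as always. Source code that ran to many lines was replaced by a single line, and the problem was solved. I am very impressed, and I look forward to your continued help.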


The contents of odds_test_1_1.csv are not in CSV format, so let's convert them to CSV format.

Current format:

temp(0)    temp(1)    temp(2)    temp(3)    temp(4)    temp(5)    temp(6)    temp(7)    temp(8)    temp(9)    temp(10)    temp(11)    temp(12)    arrived
0    0    4    8.1    13.1    12.3    9.1    9.2    6.4    6.6    6.3    6.5    6.9    1
0    6.8    3.7    9.9    16    7    5.3    4.9    5.1    5.1    5.2    4.7    4.7    1

CSV format:

temp(0),temp(1),temp(2),temp(3),temp(4),temp(5),temp(6),temp(7),temp(8),temp(9),temp(10),temp(11),temp(12),arrived
0,0,4,8.1,13.1,12.3,9.1,9.2,6.4,6.6,6.3,6.5,6.9,1
0,6.8,3.7,9.9,16,7,5.3,4.9,5.1,5.1,5.2,4.7,4.7,1
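
The conversion between the two formats above could be scripted rather than done by hand. This is a rough sketch using only the standard library, assuming no field itself contains whitespace; the `raw` string is a shortened, hypothetical stand-in for the file contents.

```python
import csv
import io

# Shortened stand-in for the whitespace-delimited file contents.
raw = "temp(0)    temp(1)    arrived\n0    0    1\n0    6.8    1\n"

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for line in raw.splitlines():
    fields = line.split()  # split on any run of spaces or tabs
    if fields:             # skip blank lines
        writer.writerow(fields)

csv_text = out.getvalue()
print(csv_text)
```

For a real file, `raw` would be read from odds_test_1_1.csv and `csv_text` written to a new file, so that the original data is left untouched.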

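Addendum (2017/07/29 13:00)

A best answer has already been chosen, but for the record, here are the verification results from my environment that I mentioned in the comments.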

(pandas) yukke@yukke-main:~/tmp$ cat data.csv
temp(0),temp(1),temp(2),temp(3),temp(4),temp(5),temp(6),temp(7),temp(8),temp(9),temp(10),temp(11),temp(12),arrived
0,0,4,8.1,13.1,12.3,9.1,9.2,6.4,6.6,6.3,6.5,6.9,1
0,6.8,3.7,9.9,16,7,5.3,4.9,5.1,5.1,5.2,4.7,4.7,1
0,3.4,30,61.4,27.8,11.5,11.7,11.9,12.8,13.4,14,14.5,15.6,0
0,3.4,25,9.1,48,38,20.4,17.7,18.3,15,14.6,14.2,14.9,0
0,6.8,9.9,19.7,12.8,13.6,14.3,14,14.2,15,14.3,14.5,15.3,1
0,3.4,25,34.5,156.1,107.4,84.6,59.3,63.2,67.7,65.7,67.2,69.2,0
0,0,37.6,92.2,198.7,137.7,125.3,99.2,90.1,93.8,92.4,93.8,90.9,0
0,0,5.5,12,27.9,28.4,26.9,25.9,27.5,27.4,27.8,27.4,28.7,0
0,0,4.3,1.4,1.1,1.3,1.5,1.7,1.8,1.8,1.8,1.8,1.8,0
0,0,50.1,61.4,118.1,57,58.7,32.5,35.1,38.3,39.5,40.3,42.9,0
0,0,25.6,88.4,88.4,47.1,76.8,70.6,70.6,76.7,81.4,78.9,80.2,0
(pandas) yukke@yukke-main:~/tmp$ python
Python 3.5.4rc1 (default, Jul 25 2017, 08:53:34) 
[GCC 6.4.0 20170704] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> df = pd.read_csv("data.csv")
>>> df["temp(0)"].tolist()
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> 
(pandas) yukke@yukke-main:~/tmp$

Posted 2017/07/29 02:05

Edited 2017/07/29 04:00

yukkeorg

Total score 985

akakage13

2017/07/29 02:24

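yukkeorg, thank you for the quick reply. It is probably because of the way I displayed the data, but I believe odds_test_1_1.csv is in CSV format. I have also confirmed that the source code above works with other CSV files, whose label names are, for example, weather and weight. So I would be grateful if you could advise me again on the assumption that the file is in CSV format. Thank you in advance.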