'list index out of range' error when clustering

Background / what I want to achieve

I tried clustering Kaggle's 'New York City Airbnb Open Data',
but the error below occurred and
the plot came out as a mess.

Problem / error message

IndexError                                Traceback (most recent call last)
<ipython-input-23-b90848bd8445> in <module>
      5 colors = ['blue', 'red', 'green']
      6 for i, data in data_sub.groupby('cluster'):
----> 7     ax = data.plot.scatter(x='feature1', y='feature2', color=colors[i],
      8                           label=f'cluster{i}', ax=ax)

IndexError: list index out of range

Relevant source code

Python
# Narrow the data down to the columns of interest
data_sub = data[['latitude', 'longitude', 'price', 'minimum_nights', 'number_of_reviews',
                 'reviews_per_month', 'calculated_host_listings_count', 'availability_365']]

# Initialise KMeans class
kmeans = KMeans(init='random', n_clusters=3)

# Calculate the centroids of the clusters
kmeans.fit(data_sub)

# Predict the cluster number for each row
y_pred = kmeans.predict(data_sub)

data_sub.columns = ['feature1', 'feature2', 'feature3', 'feature4', 'feature5', 'feature6',
                    'feature7', 'cluster']

ax = None
colors = ['blue', 'red', 'green']
for i, data in data_sub.groupby('cluster'):
    ax = data.plot.scatter(x='feature1', y='feature2', color=colors[i],
                          label=f'cluster{i}', ax=ax)

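I think the cause is that the number of elements does not match the index I am trying to access,
but I am not sure whether I should change x and y in that case,
and I do not really understand how to fix it.
I would be grateful for any guidance.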

LouiS0616

2019/09/30 12:50

For a start, how about increasing the number of colors?

meg_

2019/09/30 13:16

How many distinct values does 'cluster' have?

Pablito

2019/10/02 12:41

>LouiS0616 Thank you for your comment. Even after increasing the number of colors, the same error occurred.
>meg_ Thank you for your question. There are 6 clusters.


1 answer

python
data_sub.columns = ['feature1', 'feature2', 'feature3', 'feature4', 'feature5', 'feature6',
                    'feature7', 'cluster']

Renaming the columns does not accomplish much here. The first thing you need to do is add y_pred, the result of the clustering, to the data frame.

python
data_sub_result = data_sub.copy()
data_sub_result.columns = ['feature1', 'feature2', 'feature3', 'feature4', 'feature5', 'feature6', 'feature7', 'feature8']
data_sub_result['cluster'] = y_pred

After that, you should basically be able to proceed the same way as before.
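As a rough end-to-end sketch of that approach: the block below fits KMeans, stores the labels in a new cluster column, and plots one scatter per group. The random data is a hypothetical stand-in for the Airbnb columns, and the loop variable is renamed to group (my choice, not part of the original code) so that it no longer overwrites the original data frame. n_init, random_state and the Agg backend are only there so the sketch runs reproducibly without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for the eight Airbnb columns (not the Kaggle file).
rng = np.random.default_rng(0)
data_sub = pd.DataFrame(rng.normal(size=(300, 8)),
                        columns=[f"feature{k}" for k in range(1, 9)])

n_clusters = 3
kmeans = KMeans(init="random", n_clusters=n_clusters, n_init=10, random_state=0)

# Leave the feature columns alone and add the labels as a ninth column.
data_sub_result = data_sub.copy()
data_sub_result["cluster"] = kmeans.fit_predict(data_sub)

colors = ["blue", "red", "green"]
assert len(colors) >= n_clusters  # one color per possible label 0..n_clusters-1

ax = None
for i, group in data_sub_result.groupby("cluster"):
    # i is a KMeans label here, so it is always a valid index into colors.
    ax = group.plot.scatter(x="feature1", y="feature2", color=colors[i],
                            label=f"cluster{i}", ax=ax)
```

Because the grouping column now holds the KMeans labels, i can only take values from 0 to n_clusters - 1, so colors needs exactly as many entries as there are clusters.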

Posted 2019/09/30 16:03

hayataka2049

Total score: 30939

Pablito

2019/10/02 12:38

Thank you for your answer. I tried it, but because I used 'cluster', which is not among the columns, I got KeyError: 'cluster'. Rewriting it as 'feature8' gives the same error. How should I deal with this?

hayataka2049

2019/10/02 13:28

With the approach in my answer, data_sub_result should end up with a column named cluster.

Pablito

2019/10/08 13:53

My apologies. When I ran it, I got the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-22-93c9572cfadb> in <module>
      6 colors = ['blue', 'red', 'green', 'purple', 'white', 'orange']
      7 for i, data in data_sub.groupby('cluster'):
----> 8     ax = data.plot.scatter(x='feature1', y='feature2', color=colors[i],
      9                           label=f'cluster{i}', ax=ax)

IndexError: list index out of range

hayataka2049

2019/10/08 15:16

It might help to check the value of i. Also, what exactly is cluster?
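One way to check the value of i is to list the distinct group keys before plotting. The sketch below uses a tiny made-up frame (the numbers are illustrative, not taken from the dataset) and assumes, as the question's code suggests, that the column named 'cluster' is simply the renamed availability_365 column rather than the KMeans labels.

```python
import pandas as pd

# Tiny hypothetical frame standing in for the last two selected columns.
df = pd.DataFrame({
    "calculated_host_listings_count": [1, 2, 1, 6],
    "availability_365": [365, 0, 194, 129],
})

# Renaming only relabels existing columns; it does not create cluster labels.
df.columns = ["feature7", "cluster"]

colors = ["blue", "red", "green"]

# These are exactly the values that i takes in `for i, g in df.groupby("cluster")`.
keys = sorted(df["cluster"].unique().tolist())
print("group keys:", keys)

# Any key that is not a valid list index makes colors[i] raise IndexError.
bad_keys = [k for k in keys if k >= len(colors)]
print("keys with no matching color:", bad_keys)
```

If the printed keys look like availability values rather than 0, 1, 2, the grouping column is not the clustering result, and adding more colors will not help.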
