Question edit history

Clarified the question

2023/03/16 13:21

Posted

minami_min

Score 3

title CHANGED

	@@ -1,1 +1,1 @@
1	- How to do ~~time-series~~ clustering ~~with the K-shape method~~
1	+ How to visualize the results after clustering with K-shape

body CHANGED

@@ -1,44 +1,66 @@
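 ### What I want to achieve
-I want to cluster per-customer time-series sales data by waveform shape and
+After clustering the waveforms, I want to draw a line chart for each cluster, as in the image below.
-create groups based on the shape of the sales trend.
+![Image description](https://ddjkaamml8q8x.cloudfront.net/questions/2023-03-16/7c05befb-dd92-426b-9c5e-95930e01802e.png)
+The reference I am following: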
+[tslearn plot_kshape example](https://tslearn.readthedocs.io/en/stable/auto_examples/clustering/plot_kshape.html#sphx-glr-auto-examples-clustering-plot-kshape-py)
-### Details
+### What I did
 <DataFrame>
-Transaction date	A	B	C	D	E	F	G	H	I	J
+顧客番号	1	2	3	4	5	6	7	8	9	10	11	12
-Jan 2022	2,008	4,020	4,500	8,000	7,800	2,800	2,202	6,382	2,204	5,972
+A	100	200	300	400	500	600	700	800	900	1000	1100	1200
-Feb 2022	2,247	1,329	5,000	7,500	6,500	4,000	4,611	2,618	5,792	1,026
-Mar 2022	6,153	6,812	5,600	6,000	3,000	6,000	5,732	1,717	7,243	6,418
-Apr 2022	4,446	6,505	4,999	4,000	2,000	8,000	5,188	2,095	1,197	2,455
-May 2022	5,660	6,093	7,000	5,000	1,000	9,000	5,059	7,638	7,688	6,728
-Jun 2022	8,000	1,926	9,000	3,900	500	5,000	2,309	2,985	4,277	7,439
-Jul 2022	9,000	5,488	12,000	2,700	900	2,000	4,130	4,242	4,482	3,483
-Aug 2022	17,000	6,247	13,000	3,000	1,200	3,000	2,144	1,710	4,605	6,410
+B	8000	7000	5000	4000	3500	3000	2000	1000	700	400	200	100
-Sep 2022	8,000	2,629	14,000	1,500	3,000	1,000	4,705	6,188	3,368	3,970
+C	500	900	1200	1500	2000	2300	2600	3000	3400	4700	5600	8000
-Oct 2022	9,900	4,545	15,000	500	5,000	6,000	1,328	1,802	7,008	7,543
+D	13000	12000	10000	8000	6000	5000	4800	4200	3000	2800	2000	1700
-Nov 2022	14,000	1,403	17,000	1,000	6,000	8,000	3,605	1,145	6,544	1,945
+E	8000	10000	12000	13000	13700	20000	21000	24000	29000	31000	34000	70000
-Dec 2022	26,000	6,031	20,000	800	9,000	10,000	4,191	6,628	3,989	7,572
+F	58000	48000	40000	20000	18000	17000	13000	10000	4800	3000	2400	1200
-Note: I am mainly asking about the method, so feel free to change the amounts.
-Ultimately, I want to visualize the clusters that show a long-term upward trend.
-### What I tried
 ```
+# Create the clusters
 import pandas as pd
+import numpy as np
+df = pd.read_csv('K-shape.csv',encoding='shift-jis')
+df_1 = df.drop('顧客番号',axis=1)
 from tslearn.clustering import KShape
-import numpy as np
+from tslearn.preprocessing import TimeSeriesScalerMeanVariance
+# Standardize each series
+df_2 = TimeSeriesScalerMeanVariance().fit_transform(df_1)
+ks = KShape(n_clusters=2, verbose=False, random_state=0)
-import matplotlib.pyplot as plt
+y_pred = ks.fit_predict(df_2)
-%matplotlib inline
+df['Cluster'] = y_pred
 ```
+The plotting part I want to get working
 ```
+import matplotlib.pyplot as plt
+plt.figure()
-df = pd.read_excel(...)
+for i in range(2):
+    plt.subplot(3, 1, 1 + i)
-ks = KShape(n_clusters=4)
+    for z in df['Cluster' == i]:
+        plt.plot(xx.ravel(), "k-", alpha=.2)
+    plt.plot(ks.cluster_centers_[i].ravel(), "r-")
+    plt.xlim(0, df.shape[1])
+    plt.ylim(-10, 10)
+    plt.title("Cluster %d" % (i + 1))
-rs = ks.fit_predict(df)
+plt.tight_layout()
+plt.show()
 ```
 ### Error
 ```
+KeyError: False
-TypeError: float() argument must be a string or a number, not 'Timestamp'
+The above exception was the direct cause of the following exception:
 ```
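-### Additional information (framework/tool versions, etc.)
+### Additional information
-If possible, I would also appreciate an example that uses TimeSeriesKMeans.
+I suspected that `for z in df['Cluster' == i]:` was causing the error and looked into it in various ways, but there are two clusters (0, 1) and the loop also uses range(2), so it looks correct to me and I am stuck.
-Also, the difference between the two does not really click for me, so an explanation of how they differ would be even more helpful.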
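
Regarding the `KeyError: False` in the revised body: below is a minimal sketch, assuming the CSV column headers are strings as in the table above, of why `df['Cluster' == i]` fails and one possible way to select and plot each cluster's members. The toy DataFrame, the hard-coded labels, and the helper `plot_clusters` are illustrative assumptions and not part of the original post; in the question the labels come from `ks.fit_predict(df_2)`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the question's data: 4 customers x 12 months,
# with string column names as they would come from a CSV header.
months = [str(m) for m in range(1, 13)]
df = pd.DataFrame(
    [
        [100 * m for m in range(1, 13)],         # A: rising
        [1300 - 100 * m for m in range(1, 13)],  # B: falling
        [50 * m * m for m in range(1, 13)],      # C: rising
        [5000 - 400 * m for m in range(1, 13)],  # D: falling
    ],
    index=["A", "B", "C", "D"],
    columns=months,
)
# Hypothetical labels; in the question these are y_pred from KShape.
df["Cluster"] = [0, 1, 0, 1]

i = 0

# 'Cluster' == i compares a str with an int, which is always False,
# so df['Cluster' == i] is really df[False]: a lookup of a column named False.
raised = False
try:
    df["Cluster" == i]
except KeyError as err:
    raised = True
    print("KeyError:", err)

# Comparing the column's values instead yields a boolean mask over the rows.
members = df[df["Cluster"] == i]
print(members.index.tolist())


def plot_clusters(scaled, labels, centers):
    """Draw one subplot per cluster: members in grey, centroid in red.

    scaled  : array of shape (n_series, n_timestamps, 1)
    labels  : array of shape (n_series,) with the cluster id of each series
    centers : array of shape (n_clusters, n_timestamps, 1)
    """
    import matplotlib.pyplot as plt  # imported lazily; only needed here

    scaled = np.asarray(scaled)
    labels = np.asarray(labels)
    n_clusters = len(centers)
    fig, axes = plt.subplots(n_clusters, 1, squeeze=False)
    for k in range(n_clusters):
        ax = axes[k, 0]
        # Boolean mask on the first axis picks the series in cluster k.
        for series in scaled[labels == k]:
            ax.plot(series.ravel(), "k-", alpha=0.2)
        ax.plot(np.asarray(centers[k]).ravel(), "r-")
        ax.set_title("Cluster %d" % (k + 1))
    fig.tight_layout()
    return fig
```

With the question's variables, the call would presumably be `plot_clusters(df_2, y_pred, ks.cluster_centers_)` followed by `plt.show()`. The sketch iterates over the selected series directly because `xx` in the original loop is never defined, and it sizes the subplot grid from the number of cluster centers rather than a fixed 3 rows.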

Jupyter Python