Error in Python K-means clustering (please advise)

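Overview

From the original simulation data I extract the rows that meet or exceed a certain condition (this part is done).

I am then trying to cluster that data to extract the places to visit, but an error occurs at this step.

What I want to achieve

I want to derive the correct data.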

Problem / error message

python
Error message

C:\datasyori>python hoge.py
    latitude   longitude
0  35.693590  139.712202
1  35.693497  139.712096
2  35.693217  139.712261
3  35.693549  139.712430
4  35.693621  139.712501
Traceback (most recent call last):
  File "C:\Users\mable\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 769, in _validate_tuple_indexer
    self._validate_key(k, i)
  File "C:\Users\mable\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 1378, in _validate_key
    raise ValueError(f"Can only index by location with a [{self._valid_types}]")
ValueError: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\datasyori\hoge.py", line 95, in <module>
    Cn = C.iloc[Tn,0]
  File "C:\Users\mable\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 961, in __getitem__
    return self._getitem_tuple(key)
  File "C:\Users\mable\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 1458, in _getitem_tuple
    tup = self._validate_tuple_indexer(tup)
  File "C:\Users\mable\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 771, in _validate_tuple_indexer
    raise ValueError(
ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

Relevant source code

python
Source code

# Extract the destinations for the first visit

from matplotlib import pyplot as plt
from sklearn import datasets, preprocessing
from sklearn.cluster import KMeans
import numpy as np
import pandas as pd
import cartopy.crs as ccrs
import cartopy.io.shapereader as shpreader

pd.set_option('display.max_rows',600)
# Load the preprocessed CSV
yomi=pd.read_csv("simulationkai.csv")
df=pd.read_csv("simulationkai.csv",usecols=["longitude","latitude"])

# Convert to a DataFrame
print(df.head())
# Shape the data
X = df

# Clustering
cls = KMeans(n_clusters=4)

result = cls.fit(X)
X['cluster'] = result.labels_
PC= pd.DataFrame(X['cluster'])
PC
df.head()
# Add the cluster number (cluster) to the yomi DataFrame
yomi['cluster_id']=PC
yomi

# Save yomi (the original data plus the cluster number) to allclsdata.csv
yomi.to_csv("allclsdata.csv")

D = X.sort_values(by="cluster")
D = D.drop_duplicates(subset='cluster')
D
# Count the number of data points in each cluster
V = X['cluster'].value_counts()
V
# Save each cluster number and its data count to clsvalue.csv
V.to_csv("clsvalue.csv")


# Check the cluster centroids
C = pd.DataFrame(result.cluster_centers_)
C

C.iloc[0, :]


lat= X['latitude'].tolist()
lon= X['longitude'].tolist()

clat=C[0].tolist()
clon=C[1].tolist()


# For the clusters up to 1800, drop duplicate respondents within each cluster, count the people, and collect the counts in a CSV in order
from csv import writer
#pp = pd.DataFrame
#ppi =  pd.DataFrame
# Use a while loop to extract only the data of the N-th cluster from yomi
i = 0
while i <= 3:
  yomic = yomi[yomi['cluster_id']== i]
# Remove duplicate respondent ids from the N-th cluster's df
  yomics = yomic.drop_duplicates(subset=["id_questionnaire"])
# Append the row count of the processed N-th data to the CSV
  #file = [i,len(yomics)]
  #ppi = pp.append([file], ignore_index=True)
  #ppi.to_csv("pp.csv")
  list_data=[i,len(yomics)]
  with open('pp.csv', 'a', newline='') as f_object:
   writer_object = writer(f_object)
   writer_object.writerow(list_data)
   f_object.close()
  i = i + 1
#else:
  #ppi.to_csv("pp.csv")

# Sort the head counts in pp.csv in descending order and save them to pps.csv
PP = pd.read_csv("pp.csv",names=["cls","people"])
T = PP.sort_values(by=["people"],ascending=False)
T.to_csv("pps.csv")
PP.to_csv("pp.csv")

# Take the cluster numbers from the top of pps.csv in order and look up each one's coordinates in C
num = 0
while num <= 3:
  Tn = T.iloc[num,0]
  #Tno = Tn + 1
  Cn = C.iloc[Tn,0]
  Cn2 = C.iloc[Tn,1]
  list_data2=[Tn,Cn,Cn2]
  with open('point.csv', 'a', newline='') as f_object:
   writer_object = writer(f_object)
   writer_object.writerow(list_data2)
   f_object.close()
  num = num + 1

dfh = pd.read_csv("point.csv",names=["cluster_id","latitude","longitude"])
B = pd.read_csv("pps.csv",usecols=["people"])
#dfh2= pd.DataFrame(B['people'])
dfh['people']= B
dfh.to_csv("point.csv")

What I tried

I have not studied enough, so I do not understand the error message at all.

Additional information (framework/tool versions, etc.)

Python 3.10.4 (tags/v3.10.4:9d38120, Mar 23 2022, 23:13:41) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

8524ba23

2022/10/18 04:49

Does pps.csv really have a column named "#people" (with a leading "#")?

mable

2022/10/18 04:58

Oh, sorry, there was no such column. After changing it to people, I get the following:

>>> # Sort the head counts in pp.csv in descending order and save them to pps.csv
>>> PP = pd.read_csv("pp.csv",names=["cls","people"])
>>> T = PP.sort_values(by=["people"],ascending=False)
>>> T.to_csv("pps.csv")
>>> PP.to_csv("pp.csv")
>>>
>>> # Take the cluster numbers from the top of pps.csv in order and look up each one's coordinates in C
>>> num = 0
>>> while num <= 3:
...   Tn = T.iloc[num,0]
...   #Tno = Tn + 1
...   Cn = C.iloc[Tn,0]
...   Cn2 = C.iloc[Tn,1]
...   list_data2=[Tn,Cn,Cn2]
...   with open('point.csv', 'a', newline='') as f_object:
...    writer_object = writer(f_object)
...    writer_object.writerow(list_data2)
...    f_object.close()
...   num = num + 1
...
Traceback (most recent call last):
  File "C:\Users\mable\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 769, in _validate_tuple_indexer
    self._validate_key(k, i)
  File "C:\Users\mable\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 1378, in _validate_key
    raise ValueError(f"Can only index by location with a [{self._valid_types}]")
ValueError: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "C:\Users\mable\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 961, in __getitem__
    return self._getitem_tuple(key)
  File "C:\Users\mable\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 1458, in _getitem_tuple
    tup = self._validate_tuple_indexer(tup)
  File "C:\Users\mable\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 771, in _validate_tuple_indexer
    raise ValueError(
ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types
>>> dfh = pd.read_csv("point.csv",names=["cluster_id","latitude","longitude"])
>>> B = pd.read_csv("pps.csv",usecols=["people"])
>>> #dfh2= pd.DataFrame(B['people'])
>>> dfh['people']= B
>>> dfh.to_csv("point.csv")

The "B is not defined" error has gone away.

8524ba23

2022/10/18 05:08

Please put the new error message in the body of the question. Also, is the line that raises the error "dfh['people']= B"?

mable

2022/10/18 05:17

Yes, I have updated it.

8524ba23

2022/10/18 05:53 (edited)

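I cannot tell from the error message which line of the code raises the error. Is it "dfh['people']= B", or is it "Tn = T.iloc[num,0]"? Also, how are you running the code?
- Saving the posted code to a .py file such as hoge.py and running it with "python hoge.py"
- Running it statement by statement in the interactive environment (where >>> is displayed)
- Running it cell by cell in a Jupyter Notebook environment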

mable

2022/10/18 05:53

Sorry, I have not studied enough to tell which line this error comes from. I can see that the following are the errors, but I do not understand what is happening:

ValueError: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]

raise ValueError( ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

As for how I run the code: the original data is in Excel and is saved on the C drive. I start python from the Command Prompt, write the code in Notepad, copy it, and paste it into the Command Prompt.

8524ba23

2022/10/18 06:02

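> I start python from the Command Prompt, write the code in Notepad, copy it, and paste it into the Command Prompt.

I see, so you are running it in the interactive environment. Run that way, the code will probably raise several errors in a row, so save the code you wrote in Notepad to a file named "hoge.py" (any name will do) and run it from the Command Prompt with "python hoge.py". Execution will then stop at the first error.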

mable

2022/10/18 06:08

Understood, thank you! I will update the question again!

8524ba23

2022/10/18 06:29

The code that loads the data for T, C, and so on appears to be missing.

1 answer

Best answer

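In Cn = C.iloc[Tn,0], Tn must be an int (or a bool, etc.). If it is any other type, such as float or str, the error you posted is raised.
Run something like print(Tn, type(Tn)) to check that you are getting the int value you intended.
Note that if the value of Tn is an index label of C (a string, for example), you can use .loc instead of .iloc.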
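
A minimal sketch of that type check. The center coordinates and the CSV text are hypothetical, and the stray header row is only one assumed way for Tn to come back as a string; the thread does not confirm the actual cause.

```python
import io

import pandas as pd

# Hypothetical cluster centers; the numbers are illustrative only.
C = pd.DataFrame([[35.6931, 139.7121], [35.6942, 139.7133], [35.6950, 139.7140]])

# Assumed situation: the CSV already contains a header row but is read with
# names=..., so the header line is parsed as data and every value becomes a str.
csv_text = "cls,people\n2,12\n0,7\n"
T = pd.read_csv(io.StringIO(csv_text), names=["cls", "people"])

Tn = T.iloc[1, 0]
print(repr(Tn), type(Tn))  # check whether Tn is really an int

try:
    Cn = C.iloc[Tn, 0]
except ValueError as exc:
    # A str position is rejected by .iloc with the ValueError from the question.
    print("ValueError:", exc)
    # Convert to int before using the value as a position.
    Cn = C.iloc[int(Tn), 0]

print(Cn)
```

If type(Tn) turns out to be str or float, converting with int(Tn) before calling .iloc is one option, provided the value really is a cluster number.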

Python
import pandas as pd

C = pd.DataFrame({'idx':list('abc'), 'val':[11,22,33]})
C = C.set_index('idx', drop=True)
print(C)
#     val
#idx
#a     11
#b     22
#c     33

for Tn in [1]:
    print(Tn, type(Tn))
    Cn = C.iloc[Tn,0]
    print(Cn)

for Tn in ['b']:
    print(Tn, type(Tn))
    Cn = C.loc[Tn,'val']
    print(Cn)
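
When the cluster numbers are meant to stay integers, one option is to skip the intermediate CSV files and rank the clusters in memory. The sketch below is not the asker's code: yomi and C are small hypothetical stand-ins, and the column order of the centers is assumed to be latitude, then longitude.

```python
import pandas as pd

# Hypothetical stand-in for the question's data (values are illustrative).
yomi = pd.DataFrame({
    "id_questionnaire": [1, 1, 2, 3, 4, 5, 6],
    "cluster_id":       [0, 0, 0, 0, 1, 2, 2],
})

# Hypothetical cluster centers, one row per cluster label.
C = pd.DataFrame(
    [[35.6931, 139.7121], [35.6942, 139.7133], [35.6950, 139.7140]],
    columns=["latitude", "longitude"],
)

# Count distinct respondents per cluster, largest first.
people = (
    yomi.groupby("cluster_id")["id_questionnaire"]
    .nunique()
    .sort_values(ascending=False)
    .rename("people")
)

# Look up each cluster's center by its label; the labels stay integers
# because nothing is written to and read back from a CSV file.
point = C.loc[people.index].assign(people=people)
point.index.name = "cluster_id"
print(point)
```

Here .loc is used because the rows of C are selected by cluster label; with the default integer index, labels and positions happen to coincide.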