Background / what I want to achieve
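I am using VSCode on a Mac and want to use NumPy to concatenate CSV data (the contents of "test2.csv" with the "PassengerId" column of "test_data"), but it is not working. If anyone knows the cause, I would be grateful for your help.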
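For reference, here is a minimal NumPy sketch of attaching an ID column to a prediction column, using hypothetical toy values in place of the real PassengerId values and SVM output. It illustrates two properties of `np.concatenate`: with its default `axis=0` it appends along rows rather than adding a column, and it raises `ValueError` when the inputs differ in number of dimensions (for example a 2-D `(n, 1)` array and a 1-D `(n,)` array). `np.column_stack` treats 1-D inputs as columns instead. The `io.StringIO` buffer stands in for a real output file.

```python
import io

import numpy as np

# Hypothetical toy values standing in for the real PassengerId values and SVM predictions
passenger_ids = np.array([892, 893, 894])
predictions = np.array([0, 1, 0])

# np.concatenate needs inputs with the same number of dimensions:
# a 2-D (n, 1) array plus a 1-D (n,) array raises ValueError
try:
    np.concatenate([predictions.reshape(-1, 1), passenger_ids])
except ValueError as err:
    print("concatenate failed:", err)

# column_stack treats each 1-D array as a column, giving shape (3, 2)
combined = np.column_stack([passenger_ids, predictions])
print(combined.shape)  # (3, 2)

# comments="" drops the "# " prefix np.savetxt would otherwise put before the header
buf = io.StringIO()  # a real script would pass a file name such as "test2.csv"
np.savetxt(buf, combined, fmt="%d", delimiter=",",
           header="PassengerId,Survived", comments="")
print(buf.getvalue())
```

The design choice here is to keep both arrays 1-D and let `column_stack` decide the layout, rather than reshaping by hand before calling `concatenate`.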
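Note: I did the CSV preprocessing in pandas, and during the SVM evaluation some of the data seemed to get converted to NumPy arrays, so I suspect the data format might be the cause. I have included part of each dataset in the error-message section below.
If you need anything else to understand the situation, such as the data formats, please let me know.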
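One data-format detail that seems relevant: all three files are read with `index_col=0`, so the first column (PassengerId) becomes the DataFrame's index rather than a regular column. That matches the printed frames below, where `PassengerId` appears on its own line under the column headers, which is how pandas displays an index name. A minimal sketch with a hypothetical three-row excerpt (not the real test1.csv, and with most feature columns omitted) shows the consequence:

```python
import io

import pandas as pd

# Hypothetical three-row excerpt laid out like test1.csv (most feature columns omitted)
csv_text = """PassengerId,Pclass,Age
892,3,34.5
893,3,47.0
894,2,62.0
"""

# index_col=0 makes the first column, PassengerId, the row index
test_data = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(test_data.index.name)                # PassengerId
print("PassengerId" in test_data.columns)  # False

# Column lookup therefore fails with the same KeyError as in the traceback
try:
    test_data["PassengerId"]
except KeyError as err:
    print("KeyError:", err)

# The IDs are still available, via the index
passenger_ids = test_data.index.to_numpy()
print(passenger_ids)  # [892 893 894]

# Or move the index back into an ordinary column
restored = test_data.reset_index()
print(restored.columns.tolist())  # ['PassengerId', 'Pclass', 'Age']
```

If this is indeed the cause, `test_data.index` (or `test_data.reset_index()`) would be the place to read the IDs from instead of `test_data["PassengerId"]`.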
Problem / error message
/Library/Frameworks/Python.framework/Versions/3.8/bin/python3 /Users/name/python/実績フォルダ/taitanic_bunseki.py
             Pclass   Age  SibSp  Parch      Fare  Sex_female  Embarked_C  Embarked_Q
PassengerId
627               2  57.0      0      0   12.3500           0           0           1
542               3   9.0      4      2   31.2750           1           0           0
809               2  39.0      0      0   13.0000           0           0           0
604               3  44.0      0      0    8.0500           0           0           0
266               2  36.0      0      0   10.5000           0           0           0
...             ...   ...    ...    ...       ...         ...         ...         ...
38                3  21.0      0      0    8.0500           0           0           0
660               1  58.0      0      2  113.2750           0           1           0
535               3  30.0      0      0    8.6625           1           0           0
862               2  21.0      1      0   11.5000           0           0           0
586               1  18.0      0      2   79.6500           1           0           0

[418 rows x 8 columns]
             Survived
PassengerId
627                 0
542                 0
809                 0
604                 0
266                 0
...               ...
38                  0
660                 0
535                 0
862                 0
586                 1

[418 rows x 1 columns]
             Pclass   Age  SibSp  Parch      Fare  Sex_female  Embarked_C  Embarked_Q
PassengerId
892               3  34.5      0      0    7.8292           0           0           1
893               3  47.0      1      0    7.0000           1           0           0
894               2  62.0      0      0    9.6875           0           0           1
895               3  27.0      0      0    8.6625           0           0           0
896               3  22.0      1      1   12.2875           1           0           0
...             ...   ...    ...    ...       ...         ...         ...         ...
1305              3  21.0      0      0    8.0500           0           0           0
1306              1  39.0      0      0  108.9000           1           1           0
1307              3  38.5      0      0    7.2500           0           0           0
1308              3  21.0      0      0    8.0500           0           0           0
1309              3  21.0      1      1   22.3583           0           1           0

[418 rows x 8 columns]
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
テストデータ:             Pclass   Age  SibSp  Parch      Fare  Sex_female  Embarked_C  Embarked_Q
PassengerId
892               3  34.5      0      0    7.8292           0           0           1
893               3  47.0      1      0    7.0000           1           0           0
894               2  62.0      0      0    9.6875           0           0           1
895               3  27.0      0      0    8.6625           0           0           0
896               3  22.0      1      1   12.2875           1           0           0
...             ...   ...    ...    ...       ...         ...         ...         ...
1305              3  21.0      0      0    8.0500           0           0           0
1306              1  39.0      0      0  108.9000           1           1           0
1307              3  38.5      0      0    7.2500           0           0           0
1308              3  21.0      0      0    8.0500           0           0           0
1309              3  21.0      1      1   22.3583           0           1           0

[418 rows x 8 columns],予測ラベル:[0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
正解率= 0.5909090909090909
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'PassengerId'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/name/python/実績フォルダ/taitanic_bunseki.py", line 32, in <module>
    np.concatenate([test_data1,test_data["PassengerId"]])
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'PassengerId'
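The `DataConversionWarning` in the output is separate from the `KeyError`: `train_label` is read as a one-column DataFrame, so scikit-learn receives a column vector of shape `(n_samples, 1)` where it expects a 1-D array. A minimal sketch with hypothetical toy data, assuming the label column is named `Survived` as in the printed output, showing one way to pass a 1-D target instead:

```python
import warnings

import pandas as pd
from sklearn import svm
from sklearn.exceptions import DataConversionWarning

# Hypothetical toy data; the real script reads train1.csv and train_label1.csv
train_data = pd.DataFrame({"Pclass": [1, 3, 2, 3], "Age": [38.0, 22.0, 30.0, 26.0]})
train_label = pd.DataFrame({"Survived": [1, 0, 1, 0]})

print(train_label.shape)  # (4, 1): a column vector, which is what triggers the warning

# Selecting the single column gives a 1-D Series of shape (4,)
y = train_label["Survived"]
print(y.shape)  # (4,)
# The NumPy route named in the warning text would be train_label.to_numpy().ravel()

clf = svm.SVC(C=1, gamma=10)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clf.fit(train_data, y)

# Count only DataConversionWarning; any other warnings are ignored here
conversion_warnings = [w for w in caught if issubclass(w.category, DataConversionWarning)]
print(len(conversion_warnings))  # 0
```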
Relevant source code
Python
from sklearn import svm
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

# Prepare the training data and labels
train_data=pd.read_csv("train1.csv",index_col=0)
print(train_data)
train_label=pd.read_csv("train_label1.csv",index_col=0)
print(train_label)

# Prepare the test data
test_data = pd.read_csv("test1.csv",index_col=0)
print(test_data)

# Specify the algorithm
clf = svm.SVC(C=1, gamma=10)

# Train
clf.fit(train_data,train_label)

# Test
test_label = clf.predict(test_data)

# Display the test results
print("テストデータ:{0},予測ラベル:{1}".format(test_data,test_label))
print("正解率= {}".format(accuracy_score(train_label, test_label)))

# Concatenate the CSV data
np.savetxt("test2.csv",test_label,fmt="%.0f",header="Survived",comments="")
test_data1 = pd.read_csv("test2.csv")
np.concatenate([test_data1,test_data["PassengerId"]])
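Below is a hedged sketch of how the end of the script might be restructured, with small made-up DataFrames in place of train1.csv, train_label1.csv and test1.csv. It assumes the desired output is a two-column file of PassengerId and Survived (the usual Kaggle Titanic submission layout) and that building it with pandas, rather than `np.concatenate`, is acceptable. It also changes the accuracy check: `accuracy_score(train_label, test_label)` compares the training labels (PassengerId 627, 542, ...) with predictions for the test passengers (PassengerId 892 to 1309), and only runs because both happen to have 418 rows, so the 0.59 figure is probably not a meaningful accuracy. Since test1.csv has no Survived column, the sketch holds out part of the training data with `train_test_split` instead.

```python
import io

import pandas as pd
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data shaped like the question's files after read_csv(..., index_col=0):
# PassengerId is the index, not a column
train_data = pd.DataFrame(
    {"Pclass": [1, 3, 2, 3, 1, 3, 2, 3],
     "Fare": [71.3, 7.9, 13.0, 8.1, 53.1, 7.8, 26.0, 8.5]},
    index=pd.Index(range(1, 9), name="PassengerId"),
)
train_label = pd.Series([1, 0, 1, 0, 1, 0, 1, 0], index=train_data.index, name="Survived")
test_data = pd.DataFrame(
    {"Pclass": [3, 1, 2], "Fare": [7.8, 60.0, 12.0]},
    index=pd.Index([892, 893, 894], name="PassengerId"),
)

# Score against labels that belong to the predicted rows:
# hold out part of the training data, since the test set has no Survived column
X_tr, X_val, y_tr, y_val = train_test_split(
    train_data, train_label, test_size=0.25, random_state=0, stratify=train_label
)
clf = svm.SVC(C=1, gamma=10)
clf.fit(X_tr, y_tr)
print("validation accuracy =", accuracy_score(y_val, clf.predict(X_val)))

# Refit on all training rows, then predict the test set
clf.fit(train_data, train_label)
test_label = clf.predict(test_data)

# Pair each prediction with its PassengerId taken from the index; no np.concatenate needed
submission = pd.DataFrame({"PassengerId": test_data.index, "Survived": test_label})
buf = io.StringIO()
submission.to_csv(buf, index=False)  # a real script would pass a path such as "test2.csv"
print(buf.getvalue())
```

With the real files, `train_data`, `train_label` and `test_data` would come from `pd.read_csv(..., index_col=0)` as in the original script, with `train_label["Survived"]` used as the target; the toy values here only demonstrate the shapes and column names.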
2020/03/30 09:30