Concatenating CSV data

Question

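### Background / what I want to achieve

I am using VSCode on a Mac and want to concatenate CSV data with numpy (the contents of "test2.csv" and the "PassengerId" column of "test_data"), but I cannot get it to work. If anyone knows how to do this, I would appreciate your advice.

Note: I preprocessed the CSV data with pandas and evaluated it with an SVM. Because the data is converted to numpy arrays along the way, I suspect the data format may be the cause, so I have included part of each dataset in the "Problem / error message" section below.
If you need anything else to understand the situation, such as details on the data format, please let me know.

### Problem / error message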

```
/Library/Frameworks/Python.framework/Versions/3.8/bin/python3 /Users/name/python/実績フォルダ/taitanic_bunseki.py
             Pclass   Age  SibSp  Parch      Fare  Sex_female  Embarked_C  Embarked_Q
PassengerId                                                                          
627               2  57.0      0      0   12.3500           0           0           1
542               3   9.0      4      2   31.2750           1           0           0
809               2  39.0      0      0   13.0000           0           0           0
604               3  44.0      0      0    8.0500           0           0           0
266               2  36.0      0      0   10.5000           0           0           0
...             ...   ...    ...    ...       ...         ...         ...         ...
38                3  21.0      0      0    8.0500           0           0           0
660               1  58.0      0      2  113.2750           0           1           0
535               3  30.0      0      0    8.6625           1           0           0
862               2  21.0      1      0   11.5000           0           0           0
586               1  18.0      0      2   79.6500           1           0           0

[418 rows x 8 columns]
             Survived
PassengerId          
627                 0
542                 0
809                 0
604                 0
266                 0
...               ...
38                  0
660                 0
535                 0
862                 0
586                 1

[418 rows x 1 columns]
             Pclass   Age  SibSp  Parch      Fare  Sex_female  Embarked_C  Embarked_Q
PassengerId                                                                          
892               3  34.5      0      0    7.8292           0           0           1
893               3  47.0      1      0    7.0000           1           0           0
894               2  62.0      0      0    9.6875           0           0           1
895               3  27.0      0      0    8.6625           0           0           0
896               3  22.0      1      1   12.2875           1           0           0
...             ...   ...    ...    ...       ...         ...         ...         ...
1305              3  21.0      0      0    8.0500           0           0           0
1306              1  39.0      0      0  108.9000           1           1           0
1307              3  38.5      0      0    7.2500           0           0           0
1308              3  21.0      0      0    8.0500           0           0           0
1309              3  21.0      1      1   22.3583           0           1           0

[418 rows x 8 columns]
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
テストデータ：             Pclass   Age  SibSp  Parch      Fare  Sex_female  Embarked_C  Embarked_Q
PassengerId                                                                          
892               3  34.5      0      0    7.8292           0           0           1
893               3  47.0      1      0    7.0000           1           0           0
894               2  62.0      0      0    9.6875           0           0           1
895               3  27.0      0      0    8.6625           0           0           0
896               3  22.0      1      1   12.2875           1           0           0
...             ...   ...    ...    ...       ...         ...         ...         ...
1305              3  21.0      0      0    8.0500           0           0           0
1306              1  39.0      0      0  108.9000           1           1           0
1307              3  38.5      0      0    7.2500           0           0           0
1308              3  21.0      0      0    8.0500           0           0           0
1309              3  21.0      1      1   22.3583           0           1           0

[418 rows x 8 columns],予測ラベル：[0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0]
正解率= 0.5909090909090909
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'PassengerId'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/name/python/実績フォルダ/taitanic_bunseki.py", line 32, in <module>
    np.concatenate([test_data1,test_data["PassengerId"]])
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'PassengerId'
```

### Relevant source code

```Python
from sklearn import svm
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

# Prepare the training data and labels
train_data=pd.read_csv("train1.csv",index_col=0)
print(train_data)
train_label=pd.read_csv("train_label1.csv",index_col=0)
print(train_label)

# Prepare the test data
test_data = pd.read_csv("test1.csv",index_col=0)
print(test_data)

# Choose the algorithm
clf = svm.SVC(C=1, gamma=10)

# Train
clf.fit(train_data,train_label)

# Test
test_label = clf.predict(test_data)

# Show the test results
print("テストデータ：{0},予測ラベル：{1}".format(test_data,test_label))
print("正解率= {}".format(accuracy_score(train_label, test_label)))

# Concatenate the CSV data
np.savetxt("test2.csv",test_label,fmt="%.0f",header="Survived",comments="")
test_data1 = pd.read_csv("test2.csv")
np.concatenate([test_data1,test_data["PassengerId"]])
```

Accepted Answer

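I don't quite see why you need to go out of your way to join these as numpy arrays.

If you load the test data like this:
```Python
# [Prepare the test data]
test_data = pd.read_csv("test1.csv",index_col=0)
```
and obtain the predictions like this:
```Python
# [Test]
test_label = clf.predict(test_data)
```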
then you can simply do the following:
```Python
# Store the predictions in the test-data DataFrame
test_data['Survived'] = test_label
# From that DataFrame, take only the index ('PassengerId') and the result ('Survived') as a Series
res = test_data['Survived']
print(res)
```
That should be enough.
If you want the result (res) as an array rather than a Series, use:
```Python
res = test_data['Survived'].reset_index().values
print(res)
```
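
For reference, the `KeyError: 'PassengerId'` in the traceback comes from `index_col=0`: that option makes the first CSV column (`PassengerId`) the index, so it is no longer available as `test_data["PassengerId"]`. A minimal sketch, assuming a tiny made-up CSV in place of `test1.csv` and hard-coded labels in place of `clf.predict`:

```Python
import io
import pandas as pd

# Hypothetical stand-in for test1.csv (made-up rows, reduced set of columns)
csv_text = """PassengerId,Pclass,Age
892,3,34.5
893,3,47.0
894,2,62.0
"""

test_data = pd.read_csv(io.StringIO(csv_text), index_col=0)

# index_col=0 moved PassengerId out of the columns and into the index,
# which is why test_data["PassengerId"] raises KeyError
print("PassengerId" in test_data.columns)
print(test_data.index.name)

# Stand-in for clf.predict(test_data); these labels are made up
test_label = [0, 1, 0]
test_data["Survived"] = test_label

# reset_index() turns the index back into an ordinary column
result = test_data["Survived"].reset_index()
print(result.columns.tolist())
print(result.values)
```

The IDs themselves are always reachable through `test_data.index`, so no separate concatenation step is needed.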

---
**Supplement**

```Python
from sklearn import svm
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

# Prepare the training data and labels
train_data=pd.read_csv("train1.csv",index_col=0)
print(train_data)
train_label=pd.read_csv("train_label1.csv",index_col=0)
print(train_label)

# Prepare the test data
test_data = pd.read_csv("test1.csv",index_col=0)
print(test_data)

# Choose the algorithm
clf = svm.SVC(C=1, gamma=10)

# Train
clf.fit(train_data,train_label)

# Test
test_label = clf.predict(test_data)

# Show the test results
print("テストデータ：{0},予測ラベル：{1}".format(test_data,test_label))
print("正解率= {}".format(accuracy_score(train_label, test_label)))

# Attach the predictions to the test data
test_data['Survived'] = test_label

# Option 1: if you only want to write the result to a CSV file, this is enough
test_data['Survived'].to_csv('out.csv')
# Option 2: if you want an array that combines the index and the predictions, do this
data = test_data['Survived'].reset_index().values
print(data)
```
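
As a side note, the `DataConversionWarning` in the log arises because `train_label` is a one-column DataFrame of shape `(n_samples, 1)`, while `fit` expects a 1-D label array. A minimal sketch of the flattening that the warning suggests, assuming a few made-up rows in place of `train_label1.csv`:

```Python
import io
import pandas as pd

# Hypothetical stand-in for train_label1.csv (made-up rows)
csv_text = """PassengerId,Survived
627,0
542,0
586,1
"""

train_label = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(train_label.shape)  # 2-D: one row per sample, one column

# Flatten the single column into a 1-D array, as the warning recommends
y = train_label.values.ravel()
print(y.shape)

# clf.fit(train_data, y) would then receive labels of shape (n_samples,)
```

Passing `y` (or the Series `train_label['Survived']`) to `clf.fit` should silence the warning. Note also that `accuracy_score(train_label, test_label)` compares training labels with predictions for different passengers (the PassengerId values in the two files differ), so the printed 0.59 is not a meaningful accuracy.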

Answer

First, please learn Markdown (MD).

Background / what I want to achieve

Problem / error message

Relevant source code
