I can't figure out the cause of an error in my machine learning code

I'm a beginner at machine learning. Using code from a reference book as a model, I tried to write my own code that reads CSV data, trains on 80% of it, predicts the remaining 20%, and prints the accuracy, but it raises an error. I'm not so much asking for a full review as hoping someone can tell me which part is causing the error. For reference, the book is "pythonによるAI,機械学習,深層学習アプリのつくり方", and I'm using Jupyter Notebook as my editor.

Here is the CSV file (study.csv):

,,
,subject,result
,Japanese,55
,math,47
,society,82
,science,69
,English,93
,home economics,74
,imformation,69

python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

test_data = pd.read_csv("study.csv", encoding="Shift_JIS")

y = test_data.loc[:,"subject"]
x = test_data.loc[:, "result"]

x_train, x_test, y_train, y_test = train_test_split(test_data, test_size = 0.2, train_size = 0.8, shuffle = True)

clf = SVC()
clf.fit(x_train, y_train)

y_pred = clf.predict(x_test)
print("Accuracy =", accuracy_score(y_test, y_pred))

Here is the error:

KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'subject'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-16-9c5b1e68579f> in <module>
      6 test_data = pd.read_csv("study.csv", encoding="Shift_JIS")
      7 
----> 8 y = test_data.loc[:,"subject"]
      9 x = test_data.loc[:, "result"]
     10 

~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
    871                     # AttributeError for IntervalTree get_value
    872                     pass
--> 873             return self._getitem_tuple(key)
    874         else:
    875             # we by definition only have the 0th axis

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
   1042     def _getitem_tuple(self, tup: Tuple):
   1043         try:
-> 1044             return self._getitem_lowerdim(tup)
   1045         except IndexingError:
   1046             pass

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_lowerdim(self, tup)
    784                 # We don't need to check for tuples here because those are
    785                 #  caught by the _is_nested_tuple_indexer check above.
--> 786                 section = self._getitem_axis(key, axis=i)
    787 
    788                 # We should never have a scalar section here, because

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1108         # fall thru to straight lookup
   1109         self._validate_key(key, axis)
-> 1110         return self._get_label(key, axis=axis)
   1111 
   1112     def _get_slice_axis(self, slice_obj: slice, axis: int):

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _get_label(self, label, axis)
   1057     def _get_label(self, label, axis: int):
   1058         # GH#5667 this will fail if the label is not present in the axis.
-> 1059         return self.obj.xs(label, axis=axis)
   1060 
   1061     def _handle_lowerdim_multi_index_axis0(self, tup: Tuple):

~\anaconda3\lib\site-packages\pandas\core\generic.py in xs(self, key, axis, level, drop_level)
   3483 
   3484         if axis == 1:
-> 3485             return self[key]
   3486 
   3487         index = self.index

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2900             if self.columns.nlevels > 1:
   2901                 return self._getitem_multilevel(key)
-> 2902             indexer = self.columns.get_loc(key)
   2903             if is_integer(indexer):
   2904                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
-> 2897                 raise KeyError(key) from err
   2898 
   2899         if tolerance is not None:

KeyError: 'subject'

jbpb0

2021/04/02 00:03

Add print(test_data) directly below test_data = pd.read_csv(..., run it, look at what it prints, and think about what you see.
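The inspection suggested in this comment can be sketched as follows. This is a minimal illustration, not the asker's exact session: it assumes an inline copy of the first lines of study.csv (the hypothetical `CSV_TEXT`) read through `io.StringIO`, so no `encoding` argument is needed.

```python
import io

import pandas as pd

# Assumption: inline copy of the first lines of study.csv, so the sketch
# runs without the file on disk.
CSV_TEXT = """,,
,subject,result
,Japanese,55
,math,47
"""

# With the default header=0, the first line (",,") is used as the header row.
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df)                # the intended header shows up as the first data row
print(list(df.columns))  # placeholder names generated for the empty header cells

# The lookup in the question fails because no column is named "subject".
has_subject = "subject" in df.columns
print(has_subject)
```

Because the first line of the file contains only commas, pandas has no real names to use, and "subject" and "result" end up as ordinary cell values instead of column labels.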


1 answer

Best answer

KeyError: 'subject'

The error above goes away if you change the CSV loading line as follows.

Python
test_data = pd.read_csv("study.csv", encoding="Shift_JIS", header=1)
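A small check of what `header=1` changes, under the assumption that the file looks like the CSV shown in the question (an inline copy, the hypothetical `CSV_TEXT`, stands in for study.csv here). The `dropna` step is an addition for illustration and not part of the answer's fix.

```python
import io

import pandas as pd

# Assumption: inline copy of part of study.csv so the sketch is self-contained.
CSV_TEXT = """,,
,subject,result
,Japanese,55
,math,47
,society,82
"""

# header=1 tells pandas to take the second line (0-based index 1) as the
# column names and to ignore the line above it.
df = pd.read_csv(io.StringIO(CSV_TEXT), header=1)

# Every line starts with a comma, which produces an all-empty first column.
# Dropping it is optional; it is done here only to tidy the frame.
df = df.dropna(axis=1, how="all")

# The lookups from the question now succeed.
y = df.loc[:, "subject"]
x = df.loc[:, "result"]
print(list(df.columns))
```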

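Since this error has nothing to do with machine learning itself, I'm answering just this part for now.
I haven't considered whether this change matches what you actually want to do, and I haven't looked at the processing after the CSV is loaded.
This answer only says that the current error will be resolved.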
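Because the answer stops at the CSV load, the following is only a sketch of how the remaining steps might be arranged, not a confirmed fix from the thread. Two later lines in the question's code would likely fail next: `train_test_split(test_data, ...)` receives a single array and therefore returns two pieces rather than four, and a 1-D Series passed as features to `SVC.fit` is rejected because a 2-D input is expected. The inline `CSV_TEXT` and `random_state=0` are assumptions added for reproducibility.

```python
import io

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: inline copy of study.csv from the question.
CSV_TEXT = """,,
,subject,result
,Japanese,55
,math,47
,society,82
,science,69
,English,93
,home economics,74
,imformation,69
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), header=1)

x = df[["result"]]  # double brackets keep the features 2-D (a DataFrame)
y = df["subject"]   # labels stay 1-D

# Pass features and labels separately so that four pieces come back.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, shuffle=True, random_state=0
)

clf = SVC()
clf.fit(x_train, y_train)

y_pred = clf.predict(x_test)
acc = accuracy_score(y_test, y_pred)
print("Accuracy =", acc)
```

The code runs, but the score is not meaningful with this data: each subject appears exactly once, so the held-out labels are never seen during training and cannot be predicted. A dataset with several rows per class would be needed for the accuracy to say anything.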

Posted 2021/04/01 14:09

takutakuya

Total score 979
