About a KeyError (Python, machine learning, Kaggle Titanic competition)

Background / What I want to achieve

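While building a survival prediction model for the Titanic (a Kaggle competition, written in Python),
I got the error message below while implementing the data cleanup for sex (Sex) and port of embarkation (Embarked).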

I am following the site below (viewable by paid members only):
https://aiacademy.jp/texts/show/?id=67&course=5176

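The dataset preprocessing stage has two steps:
(1) Replace missing data with substitute values
(2) Convert string categorical data to numbers

The error occurs in step (2), where the two columns that hold strings, Sex and Embarked, are converted with dummy variables.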
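As a rough illustration of the two steps above, here is a minimal sketch on a toy DataFrame. The `toy` frame and its values are hypothetical stand-ins for a few rows of `train.csv`; only the column names and the pandas calls match the code in the question.

```python
import pandas as pd

# Hypothetical toy data standing in for a few rows of train.csv
toy = pd.DataFrame({
    "Age": [22.0, None, 26.0],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
})

# (1) Replace missing values with a substitute (here, the median age)
toy["Age"] = toy["Age"].fillna(toy["Age"].median())

# (2) Convert string categories into indicator (dummy) columns
sex_dum = pd.get_dummies(toy["Sex"])        # columns: female, male
emb_dum = pd.get_dummies(toy["Embarked"])   # columns: C, Q, S
encoded = pd.concat((toy, sex_dum, emb_dum), axis=1)

# Drop the original string columns plus one redundant dummy per category
encoded = encoded.drop(["Sex", "female", "Embarked", "S"], axis=1)
print(list(encoded.columns))
```

One dummy per category is dropped because it carries no extra information: a passenger who is not `male` must be `female` in this data.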

Problem / error message

KeyError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Sex'

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
<ipython-input-27-10c83f6d9054> in <module>
----> 1 sex_dum = pd.get_dummies(df["Sex"])
2 df = pd.concat((df,sex_dum),axis=1)
3 df = df.drop("Sex",axis=1)
4 df = df.drop("female",axis=1)
5

/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]

/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Sex'

Error message
KeyError

### Relevant source code
import pandas as pd
df = pd.read_csv("/kaggle/input/titanic/train.csv")
df.head()

df.isnull().sum()

df["Age"].fillna(df["Age"].median(),inplace=True)

df =df.drop("Cabin",axis=1)

import matplotlib.pyplot as plt
import seaborn as sns

sns.countplot(x = df["Pclass"],hue = df["Survived"])
plt.show()

import numpy as np
edge = np.arange(0,100,10)

plt.hist((df[df["Survived"]==0]["Age"],df[df["Survived"]==1]["Age"]),histtype="barstacked",bins=edge,label=[0,1])
plt.legend(title="Survived")
plt.show()

df["Familysize"] =df['SibSp']+df['Parch']+1

pd.crosstab(df["Familysize"],df["Survived"],normalize='index').plot(kind="bar",stacked=True)
plt.show()

sex_dum = pd.get_dummies(df["Sex"])
df = pd.concat((df,sex_dum),axis=1)
df = df.drop("Sex",axis=1)
df = df.drop("female",axis=1)


emb_dum = pd.get_dummies(df["Embarked"])
df = pd.concat((df,emb_dum),axis=1)
df = df.drop(["Embarked","S"],axis=1)


df = df.drop(["Name","Ticket","PassengerId","Parch","SibSp"],axis=1)
# Drop the columns that are not used this time

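### What I tried
I checked two or three times for stray spaces and typos, but in the end I could not identify the cause.
I am a beginner with almost no debugging knowledge.
If anyone knows what is wrong, I would appreciate your help.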

meg_

2020/05/05 04:53

What does df.columns return?

scienceman

2020/05/05 06:04

It returned Index(['Survived', 'Pclass', 'Age', 'Fare', 'Familysize', 'male', 'C', 'Q'], dtype='object').

meg_

2020/05/05 06:15

df.columns does not contain "Sex", which is why KeyError: 'Sex' is raised. The column should have been in the original data, so are you dropping it somewhere in your code?
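The df.columns output above already contains 'male', 'C' and 'Q', which suggests the encoding code had already run once on this df. One possible cause, not confirmed in this thread, is that the notebook cell was executed a second time on the already-modified frame. The sketch below reproduces that situation on a hypothetical toy frame; `encode_sex` is an illustrative helper wrapping the same steps as the failing cell.

```python
import pandas as pd

# Hypothetical toy frame standing in for the Titanic data
df = pd.DataFrame({"Sex": ["male", "female"], "Survived": [0, 1]})

def encode_sex(frame):
    """Same steps as the failing cell: dummy-encode "Sex", then drop it."""
    sex_dum = pd.get_dummies(frame["Sex"])
    frame = pd.concat((frame, sex_dum), axis=1)
    return frame.drop(["Sex", "female"], axis=1)

df = encode_sex(df)          # first run succeeds; "Sex" is now gone
print(list(df.columns))

second_run_error = None
try:
    df = encode_sex(df)      # second run on the already-modified frame
except KeyError as err:
    second_run_error = err
    print("KeyError:", err)
```

If this is what happened, re-running the notebook from the `pd.read_csv` cell onward restores the "Sex" column before the encoding cell runs.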


1 answer

Best answer

I ran the code from the question almost as-is, and it executed without any problem.

Python
import pandas as pd
df = pd.read_csv("train.csv")
df.head()

df.isnull().sum()

df["Age"].fillna(df["Age"].median(),inplace=True)

df =df.drop("Cabin",axis=1)

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

sns.countplot(x = df["Pclass"],hue = df["Survived"])

import numpy as np
edge = np.arange(0,100,10)

plt.hist((df[df["Survived"]==0]["Age"],df[df["Survived"]==1]["Age"]),histtype="barstacked",bins=edge,label=[0,1])
plt.legend(title="Survived")

df["Familysize"] =df['SibSp']+df['Parch']+1

pd.crosstab(df["Familysize"],df["Survived"],normalize='index').plot(kind="bar",stacked=True)

sex_dum = pd.get_dummies(df["Sex"])
df = pd.concat((df,sex_dum),axis=1)
df = df.drop("Sex",axis=1)
df = df.drop("female",axis=1)

emb_dum = pd.get_dummies(df["Embarked"])
df = pd.concat((df,emb_dum),axis=1)
df = df.drop(["Embarked","S"],axis=1)

df = df.drop(["Name","Ticket","PassengerId","Parch","SibSp"],axis=1)
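
Since the code runs cleanly from top to bottom, the error may depend on the order in which cells were executed rather than on the code itself. As one possible safeguard, the sketch below makes the encoding step safe to re-run. `encode_categoricals` and the `raw` toy frame are hypothetical names introduced for illustration; they are not part of the original code.

```python
import pandas as pd

def encode_categoricals(frame):
    """Dummy-encode "Sex" and "Embarked" only if they are still present."""
    out = frame.copy()
    for col, redundant in (("Sex", "female"), ("Embarked", "S")):
        if col not in out.columns:
            continue  # already encoded on an earlier run; nothing to do
        dummies = pd.get_dummies(out[col])
        out = pd.concat((out, dummies), axis=1)
        # errors="ignore" tolerates data that lacks the redundant category
        out = out.drop([col, redundant], axis=1, errors="ignore")
    return out

# Hypothetical toy data standing in for train.csv
raw = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
})

once = encode_categoricals(raw)
twice = encode_categoricals(once)   # a second call leaves the frame unchanged
print(list(twice.columns))
```

Working on a copy also keeps the loaded frame intact, so the original columns stay available for inspection.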