Background
I used a random forest to build a model that predicts 'churn' from a sample dataset.
              test_score  train_score
RandomForest    0.019183     0.857964
As shown above, the score on the training data and the score on the test data came out completely different.
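One thing worth keeping in mind when reading these two numbers: `score()` on a scikit-learn regressor returns the coefficient of determination (R²), not accuracy. The sketch below checks that on a small synthetic 0/1 target. `X_demo`, `y_demo`, and `reg` are hypothetical names for illustration and are not the churn data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Hypothetical synthetic data: 5 features and a noisy binary (0/1) target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 5))
y_demo = (X_demo[:, 0] + rng.normal(scale=2.0, size=300) > 0).astype(int)

reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_demo, y_demo)

# Regressor.score() and r2_score() compute the same quantity.
score_from_model = reg.score(X_demo, y_demo)
score_from_metric = r2_score(y_demo, reg.predict(X_demo))
print(score_from_model, score_from_metric)
```

If the target is a 0/1 label, R² may not be the most natural metric, so part of the gap could come from the choice of metric as well as from overfitting.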
Missing values were filled with the mean.
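For reference, mean imputation of this kind could be sketched as follows, assuming scikit-learn's `SimpleImputer`. The `demo` frame is a hypothetical stand-in with made-up values (only the column names are borrowed from the real data), and this is not necessarily how the imputation was done in the original run. The means are learned from the training split only, so no information from the held-out rows leaks into the fill values.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df2: two feature columns and a 0/1 target.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "rev_Mean": rng.normal(60, 20, size=200),
    "mou_Mean": rng.normal(500, 150, size=200),
    "churn": rng.integers(0, 2, size=200),
})
demo.loc[::25, "rev_Mean"] = np.nan  # inject a few missing values for the demo

X_demo = demo.drop("churn", axis=1)
y_demo = demo["churn"]
X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# Learn the column means on the training split, then apply them to both splits.
imputer = SimpleImputer(strategy="mean")
X_tr_filled = pd.DataFrame(imputer.fit_transform(X_tr), columns=X_tr.columns, index=X_tr.index)
X_va_filled = pd.DataFrame(imputer.transform(X_va), columns=X_va.columns, index=X_va.index)

print(X_tr_filled.isna().sum().sum(), X_va_filled.isna().sum().sum())
```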
What I want to achieve
How can I bring these two scores closer together?
Relevant source code
python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# df2 is the DataFrame described under "Additional information"
X = df2.drop('churn', axis=1)  # explanatory variables
y = df2['churn']  # target variable

# Split into training data and test data
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)

# Random forest settings
models = {
    'RandomForest': RandomForestRegressor(random_state=0),
}

# Build the model (score() on a regressor returns R^2)
scores = {}
for model_name, model in models.items():
    model.fit(X_train, y_train)
    scores[(model_name, 'train_score')] = model.score(X_train, y_train)
    scores[(model_name, 'test_score')] = model.score(X_valid, y_valid)

# Show the results
pd.Series(scores).unstack()
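One direction that might narrow the gap, offered as a sketch under assumptions rather than a confirmed fix: since `churn` appears to be a 0/1 label, a `RandomForestClassifier` with limited tree depth and a minimum leaf size could be compared against the default settings. The data below is synthetic (`make_classification`), and the hyperparameter values (`max_depth=5`, `min_samples_leaf=20`) are arbitrary illustrations, not tuned for the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data (assumption: noisy binary target).
X_syn, y_syn = make_classification(
    n_samples=2000, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=0, stratify=y_syn
)

candidates = {
    # Fully grown trees tend to memorize the training rows.
    "default": RandomForestClassifier(random_state=0),
    # Shallower trees with larger leaves fit the training data less tightly.
    "constrained": RandomForestClassifier(
        n_estimators=100, max_depth=5, min_samples_leaf=20, random_state=0
    ),
}

gaps = {}
for name, clf in candidates.items():
    clf.fit(X_tr, y_tr)
    train_auc = roc_auc_score(y_tr, clf.predict_proba(X_tr)[:, 1])
    valid_auc = roc_auc_score(y_va, clf.predict_proba(X_va)[:, 1])
    gaps[name] = train_auc - valid_auc  # train/validation gap
    print(f"{name}: train AUC={train_auc:.3f}, valid AUC={valid_auc:.3f}")
```

A smaller gap does not by itself mean a better model; it is the validation score that matters, so values like these would normally be chosen with cross-validation on the real data.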
Additional information (framework/tool versions, etc.)
Information about the data:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 50 columns):
# Column Non-Null Count Dtype
0 rev_Mean 99643 non-null float64
1 mou_Mean 99643 non-null float64
2 totmrc_Mean 99643 non-null float64
3 da_Mean 99643 non-null float64
4 ovrmou_Mean 99643 non-null float64
5 ovrrev_Mean 99643 non-null float64
6 vceovr_Mean 99643 non-null float64
7 datovr_Mean 99643 non-null float64
8 roam_Mean 99643 non-null float64
9 change_mou 99109 non-null float64
10 change_rev 99109 non-null float64
11 drop_vce_Mean 100000 non-null float64
12 drop_dat_Mean 100000 non-null float64
13 blck_vce_Mean 100000 non-null float64
14 blck_dat_Mean 100000 non-null float64
15 unan_vce_Mean 100000 non-null float64
16 unan_dat_Mean 100000 non-null float64
17 plcd_vce_Mean 100000 non-null float64
18 plcd_dat_Mean 100000 non-null float64
19 recv_vce_Mean 100000 non-null float64
20 recv_sms_Mean 100000 non-null float64
21 comp_vce_Mean 100000 non-null float64
22 comp_dat_Mean 100000 non-null float64
23 custcare_Mean 100000 non-null float64
24 ccrndmou_Mean 100000 non-null float64
25 cc_mou_Mean 100000 non-null float64
26 inonemin_Mean 100000 non-null float64
27 threeway_Mean 100000 non-null float64
28 mou_cvce_Mean 100000 non-null float64
29 mou_cdat_Mean 100000 non-null float64
30 mou_rvce_Mean 100000 non-null float64
31 owylis_vce_Mean 100000 non-null float64
32 mouowylisv_Mean 100000 non-null float64
33 iwylis_vce_Mean 100000 non-null float64
34 mouiwylisv_Mean 100000 non-null float64
35 peak_vce_Mean 100000 non-null float64
36 peak_dat_Mean 100000 non-null float64
37 mou_peav_Mean 100000 non-null float64
38 mou_pead_Mean 100000 non-null float64
39 opk_vce_Mean 100000 non-null float64
40 opk_dat_Mean 100000 non-null float64
41 mou_opkv_Mean 100000 non-null float64
42 mou_opkd_Mean 100000 non-null float64
43 drop_blk_Mean 100000 non-null float64
44 attempt_Mean 100000 non-null float64
45 complete_Mean 100000 non-null float64
46 callfwdv_Mean 100000 non-null float64
47 callwait_Mean 100000 non-null float64
48 churn 100000 non-null int64
49 months 100000 non-null int64
dtypes: float64(48), int64(2)
memory usage: 38.1 MB
None