Running a grid search for a random forest in Python 2.7 throws errors and I'm stuck!

I'm studying horse-racing data with Python 2.7.
When I run a grid search for a random forest, I get errors and I'm stuck!

# -*- coding: utf-8 -*-

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import LeaveOneOut
from sklearn.ensemble import RandomForestClassifier

# Training data
jockey = pd.read_csv("pre_jockey3.csv", sep=",")

# Extract the feature data and the label data
jockey_except_arrival = jockey.drop("arrival", axis=1)

features = jockey_except_arrival.values
targets = jockey['arrival'].values

model = RandomForestClassifier()

model.fit(features, targets)

# Explanatory variables and target variable
X = jockey_except_arrival.values
y = jockey['arrival'].values

# Split into training and validation data
from sklearn.model_selection import train_test_split
(X_train, X_test, y_train, y_test) = train_test_split(X, y, test_size = 0.3, random_state = 666)

# Random forest regression is done with scikit-learn's RandomForestRegressor.

# Import the required library
from sklearn.ensemble import RandomForestRegressor
# Build the model with default parameters
forest = RandomForestRegressor()
forest.fit(X_train, y_train)
# Let's see what the mean squared error (MSE) and R^2 look like.

# Compute predictions
y_train_pred = forest.predict(X_train)
y_test_pred = forest.predict(X_test)
# Compute the MSE
from sklearn.metrics import mean_squared_error
print('MSE train : %.3f, test : %.3f' % (mean_squared_error(y_train, y_train_pred), mean_squared_error(y_test, y_test_pred)) )
# Compute R^2
from sklearn.metrics import r2_score
print('r2 train : %.3f, test : %.3f' % (r2_score(y_train, y_train_pred), r2_score(y_test, y_test_pred)) )

#####################################################

# The errors start in the code from this point on!!!

# Import the required libraries
from sklearn.model_selection import train_test_split

from sklearn.grid_search import GridSearchCV
# Explicitly list the parameters to vary; this time, vary the number of decision trees
params = {'n_estimators'  : [3, 10, 100, 1000, 10000], 'n_jobs': [-1]}

# Now actually search with GridSearchCV, using cross-validation to validate and MSE as the evaluation metric.

# Create a model instance
mod = RandomForestRegressor()
# Hyperparameter search
cv = GridSearchCV(mod, params, cv = 10, scoring= 'mean_squared_error', n_jobs =1)
cv.fit(X_train, y_train)
cv.fit(X_train, y_train)

# The MSE and R^2 come out as follows.

# Compute predictions
y_train_pred = forest.predict(X_train)
y_test_pred = forest.predict(X_test)
# Compute the MSE
from sklearn.metrics import mean_squared_error
print('MSE train : %.3f, test : %.3f' % (mean_squared_error(y_train, y_train_pred), mean_squared_error(y_test, y_test_pred)) )
# Compute R^2
from sklearn.metrics import r2_score
print('r2 train : %.3f, test : %.3f' % (r2_score(y_train, y_train_pred), r2_score(y_test, y_test_pred)) )
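Both metrics printed by this script have simple definitions: MSE is the mean of the squared residuals, and R^2 is 1 - SS_res / SS_tot. The minimal sketch below computes them in plain Python on hypothetical toy values (not the question's data). It illustrates why a negative test R^2, like the -0.300 reported for this script, means the predictions are worse than always predicting the mean of the targets.

```python
from __future__ import print_function

# A minimal sketch of the two metrics printed by the script, written from
# their textbook definitions in plain Python (no scikit-learn needed).


def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / float(n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / float(len(y_true))
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical 0/1 targets, similar in spirit to the 'arrival' column.
y_true = [1, 0, 0, 1, 0]
# Always predicting the mean of y_true (0.4) gives R^2 = 0.
baseline = [0.4] * 5
# Predictions that are worse than that baseline give a negative R^2.
worse = [0.0, 1.0, 0.0, 0.0, 1.0]

print(mse(y_true, baseline), r2(y_true, baseline))
print(mse(y_true, worse), r2(y_true, worse))
```

sklearn.metrics.r2_score uses the same formula, so a test R^2 below zero is a sign of overfitting or of features that carry little signal, not of a bug in the metric.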

pre_jockey3.csv

weather	race_num	race_grade	horse_weight	delta_weight	arrival
2	11	3	512	4	1
2	12	5	486	2	1
4	10	5	454	-6	0
4	8	5	494	2	0
4	6	5	494	-2	1
4	5	6	474	24	0
4	4	6	440	0	0
4	3	6	402	-4	0
3	11	2	460	14	0
3	9	5	490	-2	0
3	8	5	462	2	0
2	7	5	422	-2	0
2	6	5	476	0	0
2	5	5	456	0	0
2	3	6	430	-2	1
2	2	6	498	4	0
1	12	5	474	0	0
1	11	5	490	-4	0
1	9	5	498	2	0
1	8	5	492	10	1
1	2	6	512	-8	0
1	1	6	436	0	1
2	11	3	502	10	0
2	10	5	492	2	0
2	7	5	448	-4	0
3	5	6	466	-2	0
3	4	6	480	0	0
3	2	6	484	4	0
3	11	3	540	3	0
3	10	5	482	12	0
3	9	5	548	2	0
3	8	5	456	2	0
3	7	5	518	-6	0
3	6	5	450	2	1
3	4	6	494	2	0
3	3	6	520	4	1
3	2	6	504	-10	0
1	11	2	488	6	0
1	10	5	502	0	0
1	7	5	472	-2	0
1	4	6	456	-4	1
1	1	6	458	-6	0
1	11	3	468	4	0
1	7	5	492	10	1
1	2	6	502	2	0
1	1	6	406	-14	0
1	9	5	474	-6	0
1	11	3	470	10	1
1	10	4	488	8	0
1	9	5	490	0	0
1	8	5	490	-2	0
1	6	5	474	4	1
1	5	6	432	2	0
1	11	4	404	-10	0
1	9	5	538	-2	0
1	8	5	454	-2	0
1	4	7	520	0	1
1	2	6	520	8	1
1	1	6	464	-2	0
1	12	5	448	-2	0
1	11	2	498	-2	0
1	10	5	516	-8	0
1	8	5	470	12	1
1	6	5	452	2	0
1	5	6	452	-4	0
2	11	3	470	-6	0
2	10	5	500	14	0
2	9	5	452	-6	0
2	8	5	478	4	0
2	6	5	442	2	0
2	5	6	486	-8	0
2	4	7	436	0	1
1	2	6	480	-18	0
1	12	5	464	4	0
1	10	5	502	0	0
1	8	5	476	10	0
1	7	5	448	-2	0
1	6	6	512	-10	0
1	5	6	436	0	0
1	2	6	516	2	0
1	1	6	430	4	0
1	10	5	494	-10	1
1	9	5	476	-10	0
1	7	5	498	-4	0
1	6	7	468	0	0
1	5	6	460	-2	1
1	3	6	498	2	0
2	1	6	490	-4	0
1	12	5	536	-2	0
1	11	1	508	-1	0
1	10	5	452	0	0
1	9	4	514	2	0
1	8	5	440	-4	0
1	6	5	414	4	0
1	4	6	414	-2	1
1	3	5	502	-4	0
2	12	5	440	-8	0
2	11	3	492	-2	0
2	8	5	478	-2	1

When I run the source code above, I get the following output.

C:\Users\satoru\satoru_system_2.7\jockey_record\jockey_test>randomforest_1_3.py
MSE train : 0.034, test : 0.208
r2 train : 0.792, test : -0.300
C:\Python27\lib\site-packages\sklearn\cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
C:\Python27\lib\site-packages\sklearn\grid_search.py:43: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
C:\Python27\lib\site-packages\sklearn\metrics\scorer.py:90: DeprecationWarning: Scoring method mean_squared_error was renamed to neg_mean_squared_error in version 0.18 and will be removed in 0.20.
  sample_weight=sample_weight)
C:\Python27\lib\site-packages\sklearn\metrics\scorer.py:90: DeprecationWarning: Scoring method mean_squared_error was renamed to neg_mean_squared_error in version 0.18 and will be removed in 0.20.
  sample_weight=sample_weight)

It keeps printing the same kind of error over and over, endlessly.
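These messages are DeprecationWarnings, not errors: Python prints them and keeps running. GridSearchCV fits and scores every parameter combination on every cross-validation fold (plus one final refit on the whole training set with the default refit=True). A warning raised inside the scorer is therefore presumably printed once per scoring call, and with n_estimators values up to 10000 the search can also take a long time, which would make the output look endless. The sketch below gives a rough count of the work requested by the question's call, in plain Python; it only enumerates the grid and does not reproduce what GridSearchCV does internally.

```python
from __future__ import print_function

# A rough count of how much work the question's grid search asks for.
# This only enumerates the grid; it does not call scikit-learn.
from itertools import product

# Same grid and fold count as in the question's GridSearchCV call.
params = {'n_estimators': [3, 10, 100, 1000, 10000], 'n_jobs': [-1]}
n_folds = 10

# Every combination of parameter values is one candidate model.
keys = sorted(params)
candidates = [dict(zip(keys, values))
              for values in product(*(params[k] for k in keys))]

# Each candidate is fitted and scored once per fold.
n_fits = len(candidates) * n_folds
# Total trees grown across all CV fits (the final refit is not counted).
n_trees = sum(c['n_estimators'] for c in candidates) * n_folds

print(len(candidates), n_fits, n_trees)
```

With this grid that is 5 candidates, 50 cross-validation fits and 111130 trees, so trimming the largest n_estimators values or lowering cv is the quickest way to shorten a trial run.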

Since the message says "mean_squared_error was renamed to neg_mean_squared_error", I changed the mean_squared_error part

to neg_mean_squared_error, but I got the same error.
I also couldn't import neg_mean_squared_error.

I would be grateful if someone more experienced could tell me how to fix this.



Best answer

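This is only a warning telling you that this way of writing it will stop working as of version 0.20, so I don't think you need to worry about it much. However, regarding this part of your question:

I changed it to neg_mean_squared_error, but I got the same error.
I also couldn't import neg_mean_squared_error.

sklearn.metrics.mean_squared_error should still work without any problem for now, so please keep using it unchanged.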
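To make the distinction concrete: the metric function imported from sklearn.metrics and the string passed to scoring= are different things. The string is a scorer name that scikit-learn looks up internally (sklearn.metrics.get_scorer performs the same lookup), which is why "neg_mean_squared_error" cannot be imported. A minimal sketch on hypothetical toy data, using LinearRegression only as a convenient fitted estimator:

```python
from __future__ import print_function

# A minimal sketch, on made-up toy data, of the difference between the
# metric *function* and the scorer *name* used by scoring=.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import get_scorer, mean_squared_error

# Hypothetical toy data (not the question's CSV).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 1.0, 3.0]

model = LinearRegression().fit(X, y)

# The function: imported and called directly with (y_true, y_pred).
mse = mean_squared_error(y, model.predict(X))

# The scorer: looked up by its string name and called with (estimator, X, y).
# 'neg_mean_squared_error' is only a name, so there is nothing to import.
scorer = get_scorer('neg_mean_squared_error')
neg_mse = scorer(model, X, y)

# The scorer returns the same MSE with its sign flipped.
print(mse, neg_mse)
```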

What you should fix is the value of the scoring argument in

cv = GridSearchCV(mod, params, cv = 10, scoring= 'mean_squared_error', n_jobs =1)
namely mean_squared_error; please change it to neg_mean_squared_error.

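http://scikit-learn.org/stable/modules/model_evaluation.html#the-scoring-parameter-defining-model-evaluation-rules
As you can see from the link above, when you want to use metrics.mean_squared_error, the value to specify here is neg_mean_squared_error. Scorers follow a "greater is better" convention, so error metrics such as the MSE are provided in negated form.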
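The sign convention can be seen directly with cross_val_score, which accepts the same scoring strings as GridSearchCV. A minimal sketch on hypothetical synthetic data (standing in for the question's features): every returned score is -MSE for one fold, so flipping the sign gives ordinary MSE values.

```python
# A minimal sketch of the sign convention: scikit-learn always maximizes a
# score, so the MSE scorer returns the negated MSE.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical synthetic data standing in for the question's features.
rng = np.random.RandomState(0)
X = rng.rand(60, 3)
y = X[:, 0] + 0.1 * rng.randn(60)

scores = cross_val_score(RandomForestRegressor(n_estimators=10, random_state=0),
                         X, y, cv=5, scoring='neg_mean_squared_error')

# Each entry is -MSE for one fold; flip the sign to read it as an MSE.
fold_mse = -scores
print(fold_mse.mean())
```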

There is one more point.

The line
from sklearn.grid_search import GridSearchCV
has been moved to
from sklearn.model_selection import GridSearchCV
so please change it accordingly.

http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
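Putting the two fixes together, a minimal sketch of the grid-search part might look like the following. It runs on hypothetical synthetic data and a smaller grid so that it finishes quickly. It also makes one change the answer does not mention (an assumption on my part): the question's code predicts with forest, the earlier default model, so its second MSE/R^2 printout simply repeats the first; the sketch evaluates cv.best_estimator_ instead.

```python
from __future__ import print_function

# A minimal sketch of the grid search with both fixes applied
# (model_selection import, 'neg_mean_squared_error' scoring), evaluating
# the tuned model cv.best_estimator_ instead of the earlier `forest`.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical synthetic data; in the question, X and y come from pre_jockey3.csv.
rng = np.random.RandomState(666)
X = rng.rand(100, 5)
y = (X[:, 0] > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=666)

# A smaller grid than the question's, so the example runs quickly.
params = {'n_estimators': [3, 10, 100]}

cv = GridSearchCV(RandomForestRegressor(random_state=0), params, cv=10,
                  scoring='neg_mean_squared_error', n_jobs=1)
cv.fit(X_train, y_train)

# best_score_ is the mean cross-validated -MSE of the best candidate.
print('best params:', cv.best_params_, 'CV MSE: %.3f' % -cv.best_score_)

# With the default refit=True, best_estimator_ is refitted on all of X_train.
best = cv.best_estimator_
y_train_pred = best.predict(X_train)
y_test_pred = best.predict(X_test)
print('MSE train : %.3f, test : %.3f' % (mean_squared_error(y_train, y_train_pred),
                                         mean_squared_error(y_test, y_test_pred)))
print('r2 train : %.3f, test : %.3f' % (r2_score(y_train, y_train_pred),
                                        r2_score(y_test, y_test_pred)))
```

Since refit=True is the default, cv.predict(X_test) would give the same predictions as best.predict(X_test).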

Posted 2017/06/30 00:13