I am working through a Kaggle Titanic kernel by retyping it line by line.
Kernel being copied: A Data Science Framework: To Achieve 99% Accuracy

This time I am stuck on BaggingClassifier.

Development environment

Python 3.6.5
Jupyter notebook
Windows 7

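Where I got stuck, and the error message

I suspected the problem was in how grid_param is written. The original Titanic kernel did not have the classifier__ prefix, but after reading Stack Overflow I decided to add it.
↑ Update: classifier__ has since been removed.

Even after removing it, I now get an error with ExtraTreesClassifier.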

```python
# WARNING: running this is very computationally intensive and time-consuming.
import time

from sklearn import model_selection

# vote_est, cv_split, data1, data1_x_bin and Target are defined earlier in the kernel.

grid_n_estimator = [10, 50, 100, 300]
grid_ratio = [.1, .25, .5, .75, 1.0]
grid_learn = [.01, .03, .05, .1, .25]
grid_max_depth = [2, 4, 6, 8, 10, None]
grid_min_samples = [5, 10, .03, .05, .10]
grid_criterion = ['gini', 'entropy']
grid_bool = [True, False]
grid_seed = [0]

# One list of grids per estimator in vote_est, in the same order.
# The estimator names in the comments are inferred from the parameter names.
grid_param = [
    # AdaBoostClassifier
    [{
        'n_estimators': grid_n_estimator,
        'learning_rate': grid_learn,
        'random_state': grid_seed
    }],

    # BaggingClassifier
    [{
        'n_estimators': grid_n_estimator,
        'max_samples': grid_ratio,
        'random_state': grid_seed
    }],

    # ExtraTreesClassifier
    [{
        'n_estimators': grid_n_estimator,
        'criterion': grid_criterion,
        'max_depth': grid_max_depth,
        # The original post had 'random state' (with a space) here,
        # which raised the ValueError quoted below.
        'random_state': grid_seed
    }],

    # GradientBoostingClassifier
    [{
        'learning_rate': [.05],
        'n_estimators': [300],
        'max_depth': grid_max_depth,
        'random_state': grid_seed
    }],

    # RandomForestClassifier
    [{
        'n_estimators': grid_n_estimator,
        'criterion': grid_criterion,
        'max_depth': grid_max_depth,
        'oob_score': [True],
        'random_state': grid_seed
    }],

    # GaussianProcessClassifier
    [{
        'max_iter_predict': grid_n_estimator,
        'random_state': grid_seed
    }],

    # LogisticRegressionCV
    [{
        'fit_intercept': grid_bool,
        'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
        'random_state': grid_seed
    }],

    # BernoulliNB
    [{
        'alpha': grid_ratio,
    }],

    # GaussianNB (nothing to tune)
    [{}],

    # KNeighborsClassifier
    [{
        'n_neighbors': [1, 2, 3, 4, 5, 6, 7],
        'weights': ['uniform', 'distance'],
        'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute']
    }],

    # SVC
    [{
        'C': [1, 2, 3, 4, 5],
        'gamma': grid_ratio,
        'decision_function_shape': ['ovo', 'ovr'],
        'probability': [True],
        'random_state': grid_seed
    }],

    # XGBClassifier
    [{
        'learning_rate': grid_learn,
        'max_depth': [1, 2, 4, 6, 8, 10],
        'n_estimators': grid_n_estimator,
        'seed': grid_seed
    }]
]


start_total = time.perf_counter()
for clf, param in zip(vote_est, grid_param):
    # clf is a (name, estimator) tuple; tune each estimator with its own grid.
    start = time.perf_counter()
    best_search = model_selection.GridSearchCV(estimator=clf[1], param_grid=param, cv=cv_split, scoring='roc_auc')
    best_search.fit(data1[data1_x_bin], data1[Target])
    run = time.perf_counter() - start

    best_param = best_search.best_params_
    print('The best parameter for {} is {}  with a runtime of {:.2f} seconds'.format(clf[1].__class__.__name__, best_param, run))
    # Write the best parameters back onto the estimator held in vote_est.
    clf[1].set_params(**best_param)

run_total = time.perf_counter() - start_total
print('Total optimization time was {:.2f} minutes.'.format(run_total / 60))

print('-' * 10)
```

Error message after removing classifier__

ValueError: Invalid parameter random state for estimator ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
           max_depth=2, max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False). Check the list of available parameters with `estimator.get_params().keys()`.

I made it all the way to submitting the Kaggle Titanic challenge!

All thanks to hayataka2049 (*≧∀≦)


1 answer

Best answer

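The classifier__ prefix is only needed when you use a Pipeline (and, even then, only when the classifier step is named classifier in the tuples of the Pipeline's steps...).

It isn't needed here, so remove it.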
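To make the difference concrete, here is a minimal sketch of both cases. The synthetic dataset from `make_classification`, the `StandardScaler` step, and the step names `scaler` and `classifier` are assumptions chosen for illustration; they are not part of the original kernel.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small synthetic dataset standing in for the Titanic features (assumption).
X, y = make_classification(n_samples=120, n_features=6, random_state=0)

# Case 1: a bare estimator. Grid keys are the estimator's own parameter names.
bare_search = GridSearchCV(
    estimator=ExtraTreesClassifier(n_estimators=10, random_state=0),
    param_grid={'max_depth': [2, 4]},
    cv=3,
    scoring='roc_auc',
)
bare_search.fit(X, y)

# Case 2: a Pipeline whose last step is named 'classifier'.
# Grid keys must be written as '<step name>__<parameter name>'.
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', ExtraTreesClassifier(n_estimators=10, random_state=0)),
])
pipe_search = GridSearchCV(
    estimator=pipe,
    param_grid={'classifier__max_depth': [2, 4]},
    cv=3,
    scoring='roc_auc',
)
pipe_search.fit(X, y)

print(bare_search.best_params_)   # keys look like 'max_depth'
print(pipe_search.best_params_)   # keys look like 'classifier__max_depth'
```

The code in the question passes `clf[1]`, a bare estimator, to GridSearchCV, so it corresponds to case 1 and the keys should carry no prefix.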

Posted 2018/09/10 14:27

hayataka2049


Yukiya025

2018/09/11 10:05

Thank you, hayataka2049 (*≧∀≦) I removed classifier__ right away, but now I get an error with ExtraTreesClassifier orz. I can't tell what is wrong this time (>_<). I have added the code without classifier__ and the new error message to the question.
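The ValueError in the question names the invalid key itself, `random state`, written with a space where `random_state` needs an underscore, and it suggests checking `estimator.get_params().keys()`. One way to catch this kind of typo before starting a long grid search is sketched below. The helper name `invalid_param_keys` and the two small example grids are hypothetical, introduced only for this illustration.

```python
from sklearn.ensemble import ExtraTreesClassifier


def invalid_param_keys(estimator, param_grid):
    """Return the grid keys that are not valid parameters of the estimator."""
    valid = set(estimator.get_params().keys())
    # GridSearchCV accepts a single dict or a list of dicts; handle both.
    grids = [param_grid] if isinstance(param_grid, dict) else param_grid
    return sorted({key for grid in grids for key in grid if key not in valid})


# Grid with the same typo as in the error message (space instead of underscore).
bad_grid = [{'n_estimators': [10, 50], 'random state': [0]}]
# Corrected grid.
good_grid = [{'n_estimators': [10, 50], 'random_state': [0]}]

print(invalid_param_keys(ExtraTreesClassifier(), bad_grid))   # ['random state']
print(invalid_param_keys(ExtraTreesClassifier(), good_grid))  # []
```

In the question's setup, the same check could be run over `zip(vote_est, grid_param)`, calling the helper with `clf[1]` and `param`, so that every mismatched key is reported up front instead of partway through the loop.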