Thanks to hayataka2049, all 5,000+ DataConversionWarnings are now gone! Thank you (*≧∀≦)
Previous question: [kaggle x Titanic] DataConversionWarning with a pale pink background
Development environment
- Python3.6.5
- Jupyter notebook
- Windows7
The problem
I transcribed a Kaggle Titanic kernel by hand and made it all the way to submission, but I blithely ignored every warning along the way, so now I am going back to deal with them.
Transcribed kernel: A Data Science Framework: To Achieve 99% Accuracy
This time I am stuck on a UserWarning and a ConvergenceWarning.
Relevant code
```python
#WARNING: Running is very computational intensive and time expensive.
grid_n_estimator = [10, 50, 100, 300]
grid_ratio = [.1, .25, .5, .75, 1.0]
grid_learn = [.01, .03, .05, .1, .25]
grid_max_depth = [2, 4, 6, 8, 10, None]
grid_min_samples = [5, 10, .03, .05, .10]
grid_criterion = ['gini', 'entropy']
grid_bool = [True, False]
grid_seed = [0]

grid_param = [
    [{
        'n_estimators': grid_n_estimator,
        'learning_rate': grid_learn,
        'random_state': grid_seed
    }],

    [{
        'n_estimators': grid_n_estimator,
        'max_samples': grid_ratio,
        'random_state': grid_seed
    }],

    [{
        'n_estimators': grid_n_estimator,
        'criterion': grid_criterion,
        'max_depth': grid_max_depth,
        'random_state': grid_seed
    }],

    [{
        'learning_rate': [.05],
        'n_estimators': [300],
        'max_depth': grid_max_depth,
        'random_state': grid_seed
    }],

    [{
        'n_estimators': grid_n_estimator,
        'criterion': grid_criterion,
        'max_depth': grid_max_depth,
        'oob_score': [True],
        'random_state': grid_seed
    }],

    [{
        'max_iter_predict': grid_n_estimator,
        'random_state': grid_seed
    }],

    [{
        'fit_intercept': grid_bool,
        'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
        'random_state': grid_seed
    }],

    [{
        'alpha': grid_ratio,
    }],

    [{}],

    [{
        'n_neighbors': [1, 2, 3, 4, 5, 6, 7],
        'weights': ['uniform', 'distance'],
        'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute']
    }],

    [{
        'C': [1, 2, 3, 4, 5],
        'gamma': grid_ratio,
        'decision_function_shape': ['ovo', 'ovr'],
        'probability': [True],
        'random_state': grid_seed
    }],

    [{
        'learning_rate': grid_learn,
        'max_depth': [1, 2, 4, 6, 8, 10],
        'n_estimators': grid_n_estimator,
        'seed': grid_seed
    }]
]


start_total = time.perf_counter()
for clf, param in zip(vote_est, grid_param):
    start = time.perf_counter()
    best_search = model_selection.GridSearchCV(estimator=clf[1], param_grid=param,
                                               cv=cv_split, scoring='roc_auc')
    best_search.fit(data1[data1_x_bin], data1[Target].values.ravel())
    run = time.perf_counter() - start

    best_param = best_search.best_params_
    print('The best parameter for {} is {} with a runtime of {:.2f} seconds'.format(
        clf[1].__class__.__name__, best_param, run))
    clf[1].set_params(**best_param)

run_total = time.perf_counter() - start_total
print('Total optimization time was {:.2f} minutes.'.format(run_total / 60))

print('-' * 10)
```
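As a side note: if you want the grid-search output to stay readable while investigating the warnings one at a time, Python's `warnings` module can suppress exactly these two messages without hiding anything else. A minimal sketch (the comment stands in for the search loop above):

```python
import warnings

from sklearn.exceptions import ConvergenceWarning

with warnings.catch_warnings():
    # Silence only the two messages in question, not all warnings.
    warnings.simplefilter("ignore", ConvergenceWarning)
    warnings.filterwarnings("ignore",
                            message="Some inputs do not have OOB scores")
    # ... run the GridSearchCV loop here ...
    warnings.warn("the coef_ did not converge", ConvergenceWarning)  # swallowed
```

Because `catch_warnings` is a context manager, the filters are restored as soon as the block exits, so other warnings elsewhere in the notebook still show up.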
Warning messages
The same messages come up again and again, so here is just an excerpt of what keeps scrolling past:

```
C:\Users\ayumusato\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py:453: UserWarning: Some inputs do not have OOB scores. This probably means too few trees were used to compute any reliable oob estimates.
  warn("Some inputs do not have OOB scores. "

C:\Users\ayumusato\Anaconda3\lib\site-packages\sklearn\linear_model\sag.py:326: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  "the coef_ did not converge", ConvergenceWarning)
```
What I found about UserWarning: Some inputs do not have OOB scores.
Some inputs have no OOB score.
OOB (Out-Of-Bag): the samples a tree's bootstrap sample leaves out; used to estimate a random forest's generalization error.
→ So every sample needs an OOB score? How can I deal with this warning?
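My understanding of this warning (hedged, since I'm not the kernel author): it fires because the grid includes `oob_score=[True]` together with small `n_estimators` values like 10. With very few trees, some rows happen to land in every tree's bootstrap sample, so they are never out-of-bag and have no OOB prediction. A sketch with toy data standing in for the Titanic features, showing that simply raising `n_estimators` makes the warning go away:

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data standing in for the Titanic features.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Very few trees: some rows land in every bootstrap sample, so they
# have no out-of-bag prediction and the UserWarning fires.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    RandomForestClassifier(n_estimators=5, oob_score=True,
                           random_state=0).fit(X, y)
print(any("OOB" in str(w.message) for w in caught))

# Enough trees: every row is out-of-bag at least once, the OOB
# estimate is well defined, and the warning goes away.
clf = RandomForestClassifier(n_estimators=300, oob_score=True,
                             random_state=0)
clf.fit(X, y)
print(clf.oob_score_)
```

So for the grid search the practical options seem to be: drop the smallest `n_estimators` values from the grid whenever `oob_score=[True]`, or accept the warning for those candidates (the search still runs; the OOB estimate is just unreliable for them).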
What I found about ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
max_iter was reached and coef_ did not converge???
ConvergenceWarning: a warning raised to flag convergence problems.
How do I resolve this convergence problem?
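From the traceback path (`linear_model\sag.py`) this comes from LogisticRegression with the `sag`/`saga` solvers, which the grid above tries. These gradient-based solvers converge slowly on badly scaled features and can hit the iteration cap. The usual remedies are to standardize the features and/or raise `max_iter`. A sketch with toy data (my own assumption, not the kernel's code):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data standing in for the Titanic features.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X[:, 0] *= 1000.0  # one badly scaled column slows sag down a lot

# Standardizing the features plus a higher max_iter is the usual fix for
# "The max_iter was reached which means the coef_ did not converge".
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver='sag', max_iter=1000, random_state=0),
)
model.fit(X, y)
print(model.named_steps['logisticregression'].n_iter_)
```

If `n_iter_` comes back well below `max_iter`, the solver converged and the warning is gone; if it still hits the cap, scale the inputs first rather than raising `max_iter` indefinitely.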
Thanks in advance orz
2 answers
2018/09/17 10:19