Background / what I want to achieve
Problem / error message
C:\Anaconda\lib\site-packages\sklearn\model_selection\_split.py:667: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=100.
  % (min_groups, self.n_splits)), UserWarning)
---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Anaconda\lib\site-packages\joblib\externals\loky\process_executor.py", line 418, in _process_worker
    r = call_item()
  File "C:\Anaconda\lib\site-packages\joblib\externals\loky\process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Anaconda\lib\site-packages\joblib\_parallel_backends.py", line 608, in __call__
    return self.func(*args, **kwargs)
  File "C:\Anaconda\lib\site-packages\joblib\parallel.py", line 256, in __call__
    for func, args, kwargs in self.items]
  File "C:\Anaconda\lib\site-packages\joblib\parallel.py", line 256, in <listcomp>
    for func, args, kwargs in self.items]
  File "C:\Anaconda\lib\site-packages\sklearn\model_selection\_validation.py", line 544, in _fit_and_score
    test_scores = _score(estimator, X_test, y_test, scorer)
  File "C:\Anaconda\lib\site-packages\sklearn\model_selection\_validation.py", line 591, in _score
    scores = scorer(estimator, X_test, y_test)
  File "C:\Anaconda\lib\site-packages\sklearn\metrics\_scorer.py", line 87, in __call__
    *args, **kwargs)
  File "C:\Anaconda\lib\site-packages\sklearn\metrics\_scorer.py", line 300, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported
"""

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-402-c25fe405b40d> in <module>
----> 1 gcv.fit(trainX,y)

C:\Anaconda\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    708                 return results
    709
--> 710             self._run_search(evaluate_candidates)
    711
    712         # For multi-metric evaluation, store the best_index_, best_params_ and

C:\Anaconda\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
   1149     def _run_search(self, evaluate_candidates):
   1150         """Search all candidates in param_grid"""
-> 1151         evaluate_candidates(ParameterGrid(self.param_grid))
   1152
   1153

C:\Anaconda\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params)
    687                                for parameters, (train, test)
    688                                in product(candidate_params,
--> 689                                           cv.split(X, y, groups)))
    690
    691             if len(out) < 1:

C:\Anaconda\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
   1015
   1016             with self._backend.retrieval_context():
-> 1017                 self.retrieve()
   1018             # Make sure that we get a last message telling us we are done
   1019             elapsed_time = time.time() - self._start_time

C:\Anaconda\lib\site-packages\joblib\parallel.py in retrieve(self)
    907             try:
    908                 if getattr(self._backend, 'supports_timeout', False):
--> 909                     self._output.extend(job.get(timeout=self.timeout))
    910                 else:
    911                     self._output.extend(job.get())

C:\Anaconda\lib\site-packages\joblib\_parallel_backends.py in wrap_future_result(future, timeout)
    560         AsyncResults.get from multiprocessing."""
    561         try:
--> 562             return future.result(timeout=timeout)
    563         except LokyTimeoutError:
    564             raise TimeoutError()

C:\Anaconda\lib\concurrent\futures\_base.py in result(self, timeout)
    433                 raise CancelledError()
    434             elif self._state == FINISHED:
--> 435                 return self.__get_result()
    436             else:
    437                 raise TimeoutError()

C:\Anaconda\lib\concurrent\futures\_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

ValueError: multiclass format is not supported
Relevant source code
python
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
from sklearn.tree import DecisionTreeClassifier as DT
from sklearn.model_selection import cross_validate
from sklearn.model_selection import GridSearchCV
from sklearn.tree import export_graphviz
import pydotplus
from IPython.display import Image

# Load the data
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
sample = pd.read_csv("sample_submit.csv", header=None)

# Handle missing values
train[train.isnull().any(axis=1)].head()
train1 = train.copy()
train1 = train1.drop(
    ["amenities", "description", "first_review", "host_response_rate",
     "host_since", "last_review", "latitude", "longitude", "name",
     "neighbourhood", "number_of_reviews", "thumbnail_url", "zipcode"],
    axis=1)
train2 = train1.dropna()

# Convert float to int
train3 = train2.astype({"y": "int64"})

# Create the explanatory variables and the target variable
trainX = train3.iloc[:, 0:15]
y = train3["y"]

# Convert to dummy variables
trainX = pd.get_dummies(trainX)
testX = pd.get_dummies(testX)

# Prepare the decision tree model
clf = DT()

# Prepare the grid search
parameters1 = {"max_depth": list(range(2, 11)), "min_samples_leaf": [5, 10, 20, 50, 100, 500]}

# Cross-validation
gcv = GridSearchCV(clf, parameters1, cv=100, scoring="roc_auc", n_jobs=-1, return_train_score=True)
gcv.fit(trainX, y)
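What I tried
Changed y from its original float64 to int64.
Supplementary information (framework/tool versions, etc.)
Run in JupyterLab.
Please show the result of printing the y that you pass to fit().
Thank you for your reply.
The result of print(y) was as follows.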
0 29.0
1 31.9
2 19.0
3 28.0
4 37.7
...
194 40.8
195 20.2
196 16.0
197 43.4
198 26.0
Name: mpg, Length: 199, dtype: float64
After this I added y = y.astype("int64") and ran it again,
but the same error occurred.
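A likely explanation for why the cast changes nothing: scoring="roc_auc" expects a binary target, and casting mpg values to int64 only turns a continuous target into one with many distinct integer labels, which scikit-learn treats as multiclass. The sketch below checks this with sklearn.utils.multiclass.type_of_target. The five values are copied from the first rows of the print(y) output above; the variable names are illustrative only, and the exact error wording may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# First five values from the printed y (mpg); names here are illustrative.
y_float = np.array([29.0, 31.9, 19.0, 28.0, 37.7])
y_int = y_float.astype("int64")  # truncates to 29, 31, 19, 28, 37

# How scikit-learn classifies each target before scoring.
float_kind = type_of_target(y_float)
int_kind = type_of_target(y_int)

print("float64 target:", float_kind)
print("int64 target:  ", int_kind)
```

Neither result is "binary", which is what the roc_auc scorer in the traceback is checking for.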
If this is a regression problem rather than a classification problem, shouldn't you be using DecisionTreeRegressor?
https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html#sklearn.tree.DecisionTreeRegressor
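A minimal sketch of that suggestion, assuming the goal is to predict mpg as a number. The data here is synthetic because the real train.csv is not available; cv=5 and scoring="neg_mean_squared_error" are illustrative choices rather than anything stated in the thread, and the parameter grid is trimmed to suit the small synthetic sample.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption): 199 rows, 3 numeric features,
# and a continuous target loosely resembling mpg values.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(199, 3))
y_demo = 25.0 + 5.0 * X_demo[:, 0] + rng.normal(scale=1.0, size=199)

# Grid adapted from the question, trimmed for the small sample size.
param_grid = {
    "max_depth": list(range(2, 11)),
    "min_samples_leaf": [5, 10, 20, 50],
}

# A regressor paired with a regression metric instead of roc_auc.
gcv = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
    return_train_score=True,
)
gcv.fit(X_demo, y_demo)

# best_score_ is a negated MSE, so flip the sign for readability.
print(gcv.best_params_, -gcv.best_score_)
```

With a regressor, an integer cv in GridSearchCV uses plain KFold rather than stratified splitting, so the "least populated class" warning should no longer apply. A much smaller cv than 100 is also more reasonable for 199 rows.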
Thank you for your reply.
I will review the problem carefully and try again with the above advice in mind.