I'm writing code that uses stratified sampling to split the data three ways, into train, val, and test sets.
The train_test_split call raises ValueError: Found input variables with inconsistent numbers of samples. I'd like to know how to fix this.
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Convert to NumPy arrays
y = np.array(dataset[target_col])    # target variable
X = np.array(dataset[feature_cols])  # explanatory variables (features)

# Visualize the target variable
pd.Series(y).hist(bins=25)

# Discretize the target variable and visualize it
bins = [0, 10, 20, 30]
print(bins)
binned_y = np.digitize(y, bins)
pd.Series(binned_y).hist(bins=25)

# Split features and labels into training, test, and validation data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, stratify=binned_y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.5, stratify=binned_y, random_state=0)  # 0.67 x 0.5 = 0.335

print('X_train shape:', X_train.shape, ' y_train shape:', y_train.shape, ' X_val shape:', X_val.shape, ' y_val shape:', y_val.shape, ' X_test shape:', X_test.shape, ' y_test shape:', y_test.shape)
```
The error message is as follows:

```
ValueError Traceback (most recent call last)
<ipython-input-18-d9ef67107817> in <module>()
1 # Split features and labels into training, test, and validation data
2 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, stratify=binned_y, random_state=0)
----> 3 X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.5, stratify=binned_y, random_state=0) # 0.67 x 0.5 = 0.335
4
5 print('X_train shape:',X_train.shape,' y_train shape:',y_train.shape,' X_val shape:',X_val.shape,' y_val shape:',y_val.shape,' X_test shape:',X_test.shape,' y_test shape:',y_test.shape)
3 frames
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
210 if len(uniques) > 1:
211 raise ValueError("Found input variables with inconsistent numbers of"
--> 212 " samples: %r" % [int(l) for l in lengths])
213
214
ValueError: Found input variables with inconsistent numbers of samples: [77, 116]
```
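The numbers in the traceback point to a likely cause: the second call passes `stratify=binned_y`, which still holds labels for all 116 rows, while `X_train`/`y_train` hold only the 77 rows left after the first split. Below is a minimal sketch of one fix, assuming the binned labels are passed to the first `train_test_split` as an extra array so that the second call can stratify on the training portion only. Random synthetic data stands in for `dataset`, `feature_cols`, and `target_col`. The names `rng`, `n_samples`, `X_trainval`, `y_trainval`, `binned_trainval`, `binned_test`, and `mismatch_error` are hypothetical and used only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 116 rows to match the traceback, 3 features,
# and a continuous target in [0, 30) to fit the question's bins.
rng = np.random.default_rng(0)
n_samples = 116
X = rng.normal(size=(n_samples, 3))
y = rng.uniform(0, 30, size=n_samples)

# Discretize the target as in the question so it can be used for stratification.
bins = [0, 10, 20, 30]
binned_y = np.digitize(y, bins)

# First split: pass binned_y as a third array so its train part is returned too.
X_trainval, X_test, y_trainval, y_test, binned_trainval, binned_test = train_test_split(
    X, y, binned_y, test_size=0.33, stratify=binned_y, random_state=0
)

# Reproduce the error: binned_y still has 116 labels, but X_trainval has 77 rows.
mismatch_error = None
try:
    train_test_split(X_trainval, y_trainval, test_size=0.5,
                     stratify=binned_y, random_state=0)
except ValueError as e:
    mismatch_error = str(e)
print(mismatch_error)

# Fix: stratify the second split on the labels that belong to the remaining rows.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.5, stratify=binned_trainval, random_state=0
)

print('X_train shape:', X_train.shape, ' y_train shape:', y_train.shape,
      ' X_val shape:', X_val.shape, ' y_val shape:', y_val.shape,
      ' X_test shape:', X_test.shape, ' y_test shape:', y_test.shape)
```

With 116 rows, this sketch gives 38 / 39 / 39 rows for train / val / test, because scikit-learn rounds the test share up at each split. An alternative with the same effect is to recompute `np.digitize(y_train, bins)` before the second call.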