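Following Shoeisha's book 「Kaggle データ分析」 (Kaggle Data Analysis), I want to use LightGBM, with categorical variables included, to build a house-price model on the Kaggle House Prices data, but I get an error. It doesn't seem to be a typo, so where am I going wrong?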
I'm working in Google Colaboratory.
```python
# Import packages
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import numpy as np
import random
from sklearn.preprocessing import LabelEncoder
import lightgbm as lgb
np.random.seed(1234)
random.seed(1234)
from sklearn.model_selection import KFold
folds=3
kf=KFold(n_splits=folds)
from sklearn.metrics import mean_squared_error

# Load the data files
df_train=pd.read_csv('/content/drive/MyDrive/train_house.csv',encoding='cp932')
df_test=pd.read_csv('/content/drive/MyDrive/test_house.csv',encoding='cp932')

plt.style.use('ggplot')

# Inspect the data
df_train.shape
df_train.dtypes
df_train.head()
df_train['MSZoning'].value_counts()
all_df=pd.concat([df_train,df_test],sort=False).reset_index(drop=True)
all_df['MSZoning'].value_counts()

categolies=all_df.columns[all_df.dtypes=="object"]
categolies
for cat in categolies:
    le=LabelEncoder()
    print(cat)

all_df[cat].fillna('missing',inplace=True)
le=le.fit(all_df[cat])
all_df[cat]=le.transform(all_df[cat])

all_df[cat]=all_df[cat].astype('category')
all_df[cat].dtypes

train_df_le=all_df[~all_df['SalePrice'].isnull()]
test_df_le=all_df[all_df['SalePrice'].isnull()]

lgbm_params={
    'objective':'regression',
    'random_seed':1234
}
train_x=train_df_le.drop(['SalePrice','Id'],axis=1)
train_y=train_df_le['SalePrice']

models=[]
rmses=[]
oof=np.zeros(len(train_x))

for train_index,val_index in kf.split(train_x):
    x_train =train_x.iloc[train_index]
    x_valid =train_x.iloc[val_index]
    y_train =train_y.iloc[train_index]
    y_valid =train_y.iloc[val_index]

    lgb_train=lgb.Dataset(x_train,y_train)
    lgb_eval=lgb.Dataset(x_valid,y_valid,reference=lgb_train)

    # The error occurs here #
    model_lgb=lgb.train(lgbm_params,lgb_train,valid_sets=lgb_eval,num_boost_round=100,early_stopping_rounds=20,verbose_eval=10)

    y_pred=model_lgb.predict(x_valid,num_iteration=model_lgb.best_iteration)
    tmp_rmse=np.sqrt(mean_squared_error(np.log(y_valid),np.log(y_pred)))
    print(tmp_rmse)

    models.append(model_lgb)
    rmses.append(tmp_rmse)
    oof[val_index]=y_pred
```
#Error message#
ValueError: DataFrame.dtypes for data must be int, float or bool.
Did not expect the data types in fields MSZoning, Street, Alley, LotShape, LandContour, Utilities, LotConfig, LandSlope, Neighborhood, Condition1, Condition2, BldgType, HouseStyle, RoofStyle, RoofMatl, Exterior1st, Exterior2nd, MasVnrType, ExterQual, ExterCond, Foundation, BsmtQual, BsmtCond, BsmtExposure, BsmtFinType1, BsmtFinType2, Heating, HeatingQC, CentralAir, Electrical, KitchenQual, Functional, FireplaceQu, GarageType, GarageFinish, GarageQual, GarageCond, PavedDrive, PoolQC, Fence, MiscFeature, SaleType
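The list above contains every object column except `SaleCondition`, which should be the last object column in this dataset and therefore the last value of `cat`. That is consistent with the `fillna` / `LabelEncoder` / `astype('category')` lines sitting outside the `for cat in categolies:` loop: they run once, after the loop finishes, so only the final column gets encoded and the rest stay `object`. A minimal sketch with the whole encoding step indented inside the loop, run on a hypothetical four-row stand-in for `all_df` (it also assigns the result of `fillna` back instead of calling it with `inplace=True` on a single column, which recent pandas versions warn about):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical miniature stand-in for all_df: two object columns containing NaN
all_df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "MSZoning": pd.Series(["RL", "RM", np.nan, "RL"], dtype=object),
    "SaleCondition": pd.Series(["Normal", "Abnorml", "Normal", np.nan], dtype=object),
    "SalePrice": [208500, 181500, np.nan, np.nan],
})

categolies = all_df.columns[all_df.dtypes == "object"]
for cat in categolies:
    # Every step is indented, so it runs once per object column
    all_df[cat] = all_df[cat].fillna("missing")
    le = LabelEncoder()
    all_df[cat] = le.fit_transform(all_df[cat])
    all_df[cat] = all_df[cat].astype("category")

# Columns LightGBM would still reject as non-numeric
leftover = all_df.columns[all_df.dtypes == "object"].tolist()
print(leftover)  # → []
```

Once every object column is converted, the `category` columns can be passed to `lgb.Dataset` as-is; by default it treats pandas `category` columns as categorical features.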