Error in Python statsmodels / ValueError: shapes (1,3) and (4,) not aligned: 3 (dim 1) != 4 (dim 0)

Overview

I want to run a multiple regression analysis, but predict raises the following error:
ValueError: shapes (1,3) and (4,) not aligned: 3 (dim 1) != 4 (dim 0)

It happens on both Ubuntu and Google Colaboratory.

Relevant source code

```python
# Imports
import pandas as pd
import numpy as np
import math
import statsmodels.api as sm

# Read the Excel files
df_past = pd.read_excel("/car_past.xlsx", sheet_name='car_past')
df_future = pd.read_excel("/car_future.xlsx", sheet_name='car_future')

# Explanatory variable columns
x_name = ["sokokyori", "seizo", "grade"]

# Extract only the needed columns
x = df_past[x_name]
y = df_past['price']

# Run the multiple regression
model = sm.OLS(y, sm.add_constant(x.values))
result = model.fit()
print(result.summary())

# Run the prediction
result_predict = result.predict(sm.add_constant(df_future[x_name].values))
```

car_past.xlsx
sokokyori  seizo  grade  price
42000      2010   8      560000
58000      2014   6      490000
14000      2014   7      720000
8000       2015   9      840000
96000      2009   4      290000
52000      2016   6      510000
49000      2014   8      500000
41000      2020   5      650000
26000      2020   9      780000
67000      2019   8      510000

car_future.xlsx
sokokyori  seizo  grade
63000      2015   6
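To reproduce the problem without the Excel files, the two sheets can be built directly as DataFrames. This is a minimal sketch that assumes car_past.xlsx and car_future.xlsx contain exactly the values in the tables above; it only stands in for the pd.read_excel calls and is not part of the original code.

```python
import pandas as pd

# Assumed to match the car_past.xlsx table above (10 cars with known prices)
df_past = pd.DataFrame({
    "sokokyori": [42000, 58000, 14000, 8000, 96000, 52000, 49000, 41000, 26000, 67000],
    "seizo": [2010, 2014, 2014, 2015, 2009, 2016, 2014, 2020, 2020, 2019],
    "grade": [8, 6, 7, 9, 4, 6, 8, 5, 9, 8],
    "price": [560000, 490000, 720000, 840000, 290000, 510000, 500000, 650000, 780000, 510000],
})

# Assumed to match the car_future.xlsx table above: a single car to predict
df_future = pd.DataFrame({
    "sokokyori": [63000],
    "seizo": [2015],
    "grade": [6],
})

print(df_past.shape)    # (10, 4)
print(df_future.shape)  # (1, 3)
```

Note that df_future has only one row; that detail turns out to matter for add_constant.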

Execution result

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  price   R-squared:                       0.951
Model:                            OLS   Adj. R-squared:                  0.926
Method:                 Least Squares   F-statistic:                     38.66
Date:                Sat, 01 Apr 2023   Prob (F-statistic):           0.000255
Time:                        13:16:22   Log-Likelihood:                -118.65
No. Observations:                  10   AIC:                             245.3
Df Residuals:                       6   BIC:                             246.5
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const      -1.755e+07   8.33e+06     -2.106      0.080   -3.79e+07    2.84e+06
x1            -5.3481      0.763     -7.009      0.000      -7.215      -3.481
x2          9101.1445   4133.511      2.202      0.070   -1013.194    1.92e+04
x3          5229.4149   1.14e+04      0.459      0.662   -2.26e+04    3.31e+04
==============================================================================
Omnibus:                        0.338   Durbin-Watson:                   1.695
Prob(Omnibus):                  0.845   Jarque-Bera (JB):                0.444
Skew:                          -0.135   Prob(JB):                        0.801
Kurtosis:                       2.004   Cond. No.                     3.06e+07
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.06e+07. This might indicate that there are
strong multicollinearity or other numerical problems.
/usr/local/lib/python3.9/dist-packages/scipy/stats/_stats_py.py:1736: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10
  warnings.warn("kurtosistest only valid for n>=20 ... continuing "
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-e2ae21b404ba> in <cell line: 24>()
     22 
     23 # Run the prediction
---> 24 result_predict = result.predict(sm.add_constant(df_future[x_name].values))

2 frames
/usr/local/lib/python3.9/dist-packages/statsmodels/base/model.py in predict(self, exog, transform, *args, **kwargs)
   1157             exog = np.atleast_2d(exog)  # needed in count model shape[1]
   1158 
-> 1159         predict_results = self.model.predict(self.params, exog, *args,
   1160                                              **kwargs)
   1161 

/usr/local/lib/python3.9/dist-packages/statsmodels/regression/linear_model.py in predict(self, params, exog)
    395             exog = self.exog
    396 
--> 397         return np.dot(exog, params)
    398 
    399     def get_distribution(self, params, scale, exog=None, dist_class=None):

/usr/local/lib/python3.9/dist-packages/numpy/core/overrides.py in dot(*args, **kwargs)

ValueError: shapes (1,3) and (4,) not aligned: 3 (dim 1) != 4 (dim 0)
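The last frame shows where the error comes from: statsmodels' predict computes np.dot(exog, params), and a (1, 3) matrix cannot be multiplied by a length-4 vector because the inner dimensions (3 and 4) differ. The sketch below reproduces that with plain NumPy. The parameter values are dummy numbers chosen only for illustration, not the fitted coefficients.

```python
import numpy as np

# Dummy params with the fitted model's shape: const + 3 slopes -> (4,)
params = np.array([1.0, 2.0, 3.0, 4.0])

# What predict() received: 1 row, 3 columns (no constant column)
exog_missing_const = np.array([[63000.0, 2015.0, 6.0]])

# What predict() needs: 1 row, 4 columns (constant 1.0 + 3 features)
exog_with_const = np.array([[1.0, 63000.0, 2015.0, 6.0]])

error_message = None
try:
    np.dot(exog_missing_const, params)  # inner dimensions 3 and 4 differ
except ValueError as e:
    error_message = str(e)
print(error_message)

prediction = np.dot(exog_with_const, params)  # (1, 4) @ (4,) -> (1,)
print(prediction.shape)
```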

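Searching turns up many similar questions, but none of them solved my problem.
I can't tell whether I have the data types wrong or am mishandling pandas, and even after reading the error I don't understand exactly what it means.
Thanks in advance.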

y_waiwai

2023/04/01 14:44

If you got an error, please post the error message. Copy and paste it exactly as it appeared, without omitting or translating anything.

drop8

2023/04/01 23:12

Sorry about that. I've reposted it.

jbpb0

2023/04/03 00:48 (edited)

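> ValueError: shapes (1,3) and (4,) not aligned: 3 (dim 1) != 4 (dim 0)

How about changing

result_predict = result.predict(sm.add_constant(df_future[x_name].values))

to

result_predict = result.predict(sm.add_constant(df_future[x_name].values, has_constant='add'))

?

If you run

print(sm.add_constant(np.random.rand(3, 3)))
print(sm.add_constant(np.random.rand(2, 3)))
print(sm.add_constant(np.random.rand(1, 3)))
print(sm.add_constant(np.random.rand(1, 3), has_constant='add'))

you will see that when there is only one row, the constant term is not added unless you pass has_constant='add'. With the default has_constant='skip', add_constant adds nothing if some column already looks constant, and in a single-row array every column does. So predict receives 3 columns while the fitted model has 4 parameters (const, x1, x2, x3), which is exactly the (1,3) vs (4,) mismatch in the error.

Reference: https://github.com/statsmodels/statsmodels/issues/7057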

drop8

2023/04/03 08:48

> result_predict = result.predict(sm.add_constant(df_future[x_name].values, has_constant='add'))

This approach solved it. Thank you.