Edit history

Question edit history

Addendum

2018/11/05 08:11

Posted

Score: 79

title CHANGED Viewed

File without changes

body CHANGED Viewed

@@ -246,4 +246,38 @@
 x3_I   NaN
 x4     NaN
 x9     NaN
-dtype: float64]]
+dtype: float64]]
+[Debug]
+```python
+code
+```
+For reference, here is how the output differs. With the dataset that works,
+the vector printed for debugging inside the function Mahala2 looks like this:
+vec is [[-0.40788125]
+ [ 0.00139516]
+ [ 0.00359367]
+ [ 0.00180594]
+ [ 0.00321283]]
+whereas in the failing case it looks like this:
+vec is [[x1_E    0.000891
+dtype: float64]
+ [x3_I    0.002092
+dtype: float64]
+ [x2_K    0.000432
+dtype: float64]
+ [x4    0.011538
+dtype: float64]
+ [x1_F    0.000403
+dtype: float64]
+ [x1_D    0.002242
+dtype: float64]
+ [x9    0.010253
+dtype: float64]
+ [-1.0186953967441772]
+ [x1_C    0.001979
+dtype: float64]]
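The two printouts above differ in element type rather than in magnitude: the working `vec` is a plain float column vector, while the failing one holds labelled one-element pandas `Series` in every slot except one bare float. The sketch below shows one way such a mix could be flattened into a float column before any matrix arithmetic. It assumes (the post does not confirm this) that values such as `x1_E` are one-element `Series`; the helper name `to_float_column` and the three sample values are illustrative only, not the author's code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the values in the post: one-element Series
# (labelled like the debug output) mixed with a plain Python float.
x1_E = pd.Series([0.000891], index=["x1_E"])
x4 = pd.Series([0.011538], index=["x4"])
hl = -1.0186953967441772


def to_float_column(values):
    """Return an (n, 1) float64 array from scalars and one-element Series."""
    # np.asarray turns a Series into a 1-element array and a float into a
    # 0-d array; ravel()[0] extracts the single number in both cases.
    flat = [float(np.asarray(v).ravel()[0]) for v in values]
    return np.array(flat, dtype=float).reshape(-1, 1)


vec = to_float_column([x1_E, x4, hl])
print("vec is", vec)
print(vec.dtype, vec.shape)
```

If the values really are one-element `Series`, extracting the scalar at the point where each is created (for example with `.iloc[0]`) would avoid the mix in the first place; the helper only repairs it after the fact.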

Change

2018/11/05 08:11

body CHANGED Viewed

@@ -3,9 +3,10 @@
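 The values in the argument vector appear to be populated correctly, and when debugging the "abalone.data.txt" analysis the vector values are printed in much the same way and the determinant is close too, yet strangely the inverse of the variance-covariance matrix is computed correctly in that case. Could the way the values are stored in the variable vec be different?
 [Environment] Windows 10 64-bit, Chrome
-[Code]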
+```Python
 def step_aic(model, exog, endog, **kwargs):
     """
     This select the best exogenous variables with AIC
@@ -176,7 +177,9 @@
 plt.scatter(hl, X.dot(res.params), color="k", )
 plt.grid()
+```
 [Output]
 (x1_E    0.090009

Correction

2018/11/05 06:35

body CHANGED Viewed

@@ -77,10 +77,6 @@
 d = pd.read_csv('flare.data2.txt',header=None,skiprows=1,delim_whitespace=True,names=('x1','x2','x3','x4','x5','x6','x7','x8','x9','x10','y1','y2','y3'))
 d.head()
 df2 = pd.get_dummies(d)
-#print(df2)
-#df2['x6'].unique()
-#df2.head()
-#df2.columns
 model = step_aic(smf.ols,['x4','x5','x6','x7','x8','x9','x10','x1_B','x1_C','x1_D','x1_E','x1_F','x1_H','x2_A','x2_H','x2_K','x2_R','x2_S','x2_X','x3_C','x3_I','x3_O','x3_X'],['y1'],data=df2)
@@ -145,7 +141,6 @@
 print((x1_E, x3_I, x2_K, x4, x1_F, x1_D, x9, x1_C))
 for hl in hl_l:
-    #X = sp.array([1, hl, ml, wl, prw, ppl, gw])
     X = sp.array([1,x1_E, x3_I, x2_K, x4, x1_F, x1_D, x9, hl, x1_C])
     hat_y.append(X.dot(res.params))
 plt.plot(hl_l, hat_y)
@@ -162,7 +157,6 @@
     D2.append(D2_0)
     print(D2_0)
 D2 = sp.array(D2)
-#print(D2.shape)
 interval095 = t_0025 * sp.sqrt((1/n + D2 / (n-1)) * res.scale)

Addendum

2018/11/05 06:33

body CHANGED Viewed

@@ -1,4 +1,4 @@
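-In a Python Jupyter notebook, I get an error when running multiple regression on a dataset from the UCI Machine Learning Repository ('flare.data2.txt'). When I analyzed another dataset, "abalone.data.txt", with the same procedure there was no problem at all, and I cannot work out why.
+In a Python Jupyter notebook, I am running multiple regression on a dataset from the UCI Machine Learning Repository ('flare.data2.txt'). After selecting variables by stepwise AIC with a function I wrote myself, I get an error when I try to compute the confidence and prediction intervals. When I analyzed another dataset, "abalone.data.txt", with the same procedure there was no problem at all, and I cannot work out why.
 The error occurs where the Mahalanobis distance is computed: the inverse of the variance-covariance matrix comes out as NaN (see the output after "mahara is..." below), but I do not understand why that happens.
 The values in the argument vector appear to be populated correctly, and when debugging the "abalone.data.txt" analysis the vector values are printed in much the same way and the determinant is close too, yet strangely the inverse of the variance-covariance matrix is computed correctly in that case. Could the way the values are stored in the variable vec be different?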
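For readers unfamiliar with the quantity discussed above, the squared Mahalanobis distance of a point x from a sample with mean mu and covariance matrix S is D^2 = (x - mu)^T S^-1 (x - mu). The following is a minimal NumPy sketch of that formula. It is not the `Mahala2` function from the question (its body is not shown in this part of the history); the name `mahalanobis_sq` and the random toy data are assumptions made for illustration.

```python
import numpy as np


def mahalanobis_sq(x, data):
    """Squared Mahalanobis distance of point x from the rows of data."""
    # Force plain float arrays so every element is a number.
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)  # columns are variables
    diff = x - mean
    # Solve cov @ z = diff instead of forming the inverse explicitly.
    return float(diff @ np.linalg.solve(cov, diff))


# Toy data: 50 observations of 3 variables (illustrative only).
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))
center = data.mean(axis=0)

d2_center = mahalanobis_sq(center, data)
d2_shifted = mahalanobis_sq(center + 1.0, data)
print(d2_center, d2_shifted)
```

Using `np.linalg.solve` rather than an explicit inverse is a design choice in this sketch, not something taken from the question; both give the same distance when the covariance matrix is well conditioned.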