Edit history

Question edit history

Added the output figures.

2020/01/27 10:03

Posted

ygtygtygt

Score 7

title CHANGED Viewed

File without changes

body CHANGED Viewed

@@ -4,24 +4,42 @@
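 I understand the Gaussian mixture model algorithm, but because of gaps in my own knowledge I cannot tell what the output values of this program represent.
 My own guess is that they are the responsibilities of the input data. Is that correct?
+What do y_true and y_estimated represent in this program?
 The responsibility is the probability that a data point x was observed from the normal distribution of the k-th class. Put another way, it can be seen as the share each component contributes to the mixture of normal distributions at x.
 If we assume the data splits into three classes A, B, and C, then for a single input data point
-[probability of belonging to class A, probability of belonging to class B, probability of belonging to class C] = [0.5, 0.3, 0.2]
+[probability of belonging to class A, probability of belonging to class B, probability of belonging to class C] = [0.5, 0.3, 0.2] should be the output.
 In addition, I believe responsibilities are computed per class. Does this program compute the responsibilities per class?
-This program uses 100 input data points, but eventually I would like to run it with 30.
-Incidentally, when I input 16 data points, the optimal number of Gaussians to superimpose was 1, and the output contained the same number of values as the input.
+Incidentally, when I input 25 data points, the optimal number of Gaussians to superimpose was 1, and the output contained the same number of values as the input.
 Input data: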
-[1,2,2,1,2,1,2,3,3,1,2,2,3,4,1,1]
+[1. 1. 1. 2. 1. 1. 1. 2. 1. 2. 3. 1. 3. 1. 2. 4. 1. 2. 4. 1. 1. 2. 5. 1. 1. ]
 Output data:
+<y_true>
-[0.00979, 0.19649, 0.19649, 0.00979, 0.19649, 0.00979, 0.19649, 0.96889, 0.96889, 0.00979, 0.19649, 0.19649, 0.96889, 0.99995, 0.00979, 0.00979]
+[0.41558912 0.57036429 0.52914185 0.48112792 0.60293698 0.46554157
+ 0.52676959 0.54020389 0.37723733 0.44722378 0.56863281 0.58951637
+ 0.51391066 0.51027414 0.58159079 0.45768141 0.45611032 0.43729612
+ 0.55446468 0.48241021 0.45572774 0.55418018 0.51759982 0.44739818
+ 0.45398047]
+<y_estimated>
+[0.49164472 0.49164472 0.49164472 0.50693811 0.49164472 0.49164472
+ 0.49164472 0.50693811 0.49164472 0.50693811 0.52245777 0.49164472
+ 0.52245777 0.49164472 0.50693811 0.5224578  0.49164472 0.50693811
+ 0.5224578  0.49164472 0.49164472 0.50693811 0.5224578  0.49164472
+ 0.49164472]
+<Figure 1>
+![Image description](2caaa2a8c81d421942d37dcc1976ab0a.png)
+<Figure 2>
+![Image description](3f4370f4cb893ba815d3f02b1679a34b.png)
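
For reference, the responsibility described in the question text above can be sketched as follows. This is a minimal illustration and not part of the posted program: the function names `gaussian_pdf` and `responsibilities`, and the mixture parameters (weights, means, standard deviations), are hypothetical values chosen only for the example. Each row of the result holds one probability per class and sums to 1, which is the [class A, class B, class C] form the question expects.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D normal distribution N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def responsibilities(x, weights, mus, sigmas):
    """Return an (n_samples, n_classes) array of responsibilities.

    Entry [n, k] is weight_k * N(x_n | mu_k, sigma_k) divided by the
    sum of the same quantity over all classes, so each row sums to 1.
    """
    x = np.asarray(x, dtype=float)[:, None]             # shape (n, 1)
    weighted = weights * gaussian_pdf(x, mus, sigmas)   # shape (n, k) by broadcasting
    return weighted / weighted.sum(axis=1, keepdims=True)


# Hypothetical mixture parameters for three classes A, B, C (assumed, not fitted).
weights = np.array([0.5, 0.3, 0.2])
mus = np.array([1.0, 2.5, 4.0])
sigmas = np.array([0.5, 0.5, 0.5])

x = np.array([1.0, 2.0, 3.0, 4.0])
gamma = responsibilities(x, weights, mus, sigmas)

print(gamma.shape)        # one row per data point, one column per class
print(gamma.sum(axis=1))  # each row sums to 1
```

By contrast, the `y_true` and `y_estimated` arrays printed by the posted program contain a single value per input point rather than a row of per-class probabilities.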
 ``````Enter language here
 python
 ```
@@ -44,14 +62,14 @@
 sigma = np.array([0.05, 0.05, 0.03])
-#get the values in the 3rd column of sample.csv
 list = []
 with open('data/src/sample.csv') as f:#the file name shown here is fictitious
     reader = csv.reader(f)
     for row in reader:
-        list.append(row[2])
+        list.append(row[3])#get the values in the 4th column of sample.csv
-float = float(list)#convert strings to float
+float = [float(s) for s in list]#convert strings to float
-x = np.array(list)#convert to a NumPy array; x is the input data
+x = np.array(float)#convert to a NumPy array; x is the input data
 y_true = np.random.normal(0.5, 0.05, len(x))
 for i in range(num_peak_true):
@@ -99,7 +117,8 @@
 #output data
 print(y_true)
+print(y_estimated)
 # gaussian fit
 fig, ax = plt.subplots(figsize=(6, 4))
 plt.scatter(x, y_true, label='true data')
@@ -110,7 +129,7 @@
 plt.legend()
 plt.tight_layout()
 plt.savefig("mix_gauss_fit.png")
-plt.show()
+plt.show()#Figure 1
 plt.close()
 # BIC
@@ -122,6 +141,6 @@
 plt.xticks(x_bic)
 plt.tight_layout()
 plt.savefig("bic.png")
-plt.show()
+plt.show()#Figure 2
 plt.close()
```[Link text](https://omedstu.jimdofree.com/2018/12/01/scipyによる1次元混合ガウス回帰/)

Reflected the input data and output data in the code.

2020/01/27 10:03

Posted

ygtygtygt

Score 7

title CHANGED Viewed

File without changes

body CHANGED Viewed

@@ -32,6 +32,7 @@
 import matplotlib.pyplot as plt
 from tqdm import tqdm
 import math
+import csv
 """Setting up test data"""
 def gaussian_func(x, A, mu, sigma):
@@ -42,8 +43,15 @@
 mu = np.array([0.2, 0.4, 0.8])
 sigma = np.array([0.05, 0.05, 0.03])
-num_sample = 100
+#get the values in the 3rd column of sample.csv
+list = []
+with open('data/src/sample.csv') as f:#the file name shown here is fictitious
+    reader = csv.reader(f)
+    for row in reader:
+        list.append(row[2])
+float = float(list)#convert strings to float
-x = np.linspace(0, 1, num_sample)
+x = np.array(list)#convert to a NumPy array; x is the input data
 y_true = np.random.normal(0.5, 0.05, len(x))
 for i in range(num_peak_true):
@@ -87,8 +95,10 @@
 """plot results"""
 y_estimated = plsq_global_opt[0]
 for i in range(num_peak_estimated):
-    y_estimated += gaussian_func(x, plsq_global_opt[3*i+1], plsq_global_opt[3*i+2],
+    y_estimated += gaussian_func(x, plsq_global_opt[3*i+1], plsq_global_opt[3*i+2],plsq_global_opt[3*(i+1)])
+#output data
-                                 plsq_global_opt[3*(i+1)])
+print(y_true)
 # gaussian fit
 fig, ax = plt.subplots(figsize=(6, 4))