Answer edit history

edit

2018/05/09 22:43

Posted

mkgrei

Score 8562

answer CHANGED

@@ -13,6 +13,17 @@
 The dataset used is the Wine Dataset available at UCI. This dataset has continuous features that are heterogeneous in scale due to the differing properties that they measure (e.g. alcohol content and malic acid).
 The transformed data is then used to train a naive Bayes classifier, and a clear difference in prediction accuracies is observed wherein the dataset which is scaled before PCA vastly outperforms the unscaled version.
+```
+PC 1 without scaling:
+ [  1.76e-03  -8.36e-04   1.55e-04  -5.31e-03   2.02e-02   1.02e-03
+   1.53e-03  -1.12e-04   6.31e-04   2.33e-03   1.54e-04   7.43e-04
+   1.00e+00 (13th component)]
+PC 1 with scaling:
+ [ 0.13 -0.26 -0.01 -0.23  0.16  0.39  0.42 -0.28  0.33 -0.11  0.3   0.38
+  0.28 (13th component)]
+```
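 Standardizing the data is an important preprocessing step.
 Many algorithms require the features (i.e. X) to be normalized, and PCA is intuitively the prime example.
 PCA extracts components so as to maximize the variance.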
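
The PC 1 printout in the hunk above can be approximated with a few lines of scikit-learn. This is only a minimal sketch: it fits PCA on the full Wine dataset rather than on a training split, so the exact values and signs may differ from the figures quoted above, and the variable names are my own.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Wine data: 13 continuous features measured on very different scales.
X, _ = load_wine(return_X_y=True)

# First principal component of the raw (unscaled) features.
pc1_raw = PCA(n_components=2).fit(X).components_[0]

# First principal component after standardizing every feature
# to zero mean and unit standard deviation.
X_std = StandardScaler().fit_transform(X)
pc1_std = PCA(n_components=2).fit(X_std).components_[0]

np.set_printoptions(precision=2)
print("PC 1 without scaling:\n", pc1_raw)
print("PC 1 with scaling:\n", pc1_std)
```

Without scaling, the last entry (feature #13) should be close to 1 in absolute value while the rest are near zero; with scaling, the entries should be of comparable magnitude.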

edit

2018/05/09 22:43

Posted

mkgrei

Score 8562

answer CHANGED

@@ -1,1 +1,22 @@
-http://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html
+http://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html
+---
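+The page comes with code, so running it should give you a feel for the effect.
+It follows the usual pattern for comparing the performance of preprocessing steps and the like.
+Abridged translation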
+> Feature scaling through standardization (or Z-score normalization) can be an important preprocessing step for many machine learning algorithms. Standardization involves rescaling the features such that they have the properties of a standard normal distribution with a mean of zero and a standard deviation of one.
+While many algorithms (such as SVM, K-nearest neighbors, and logistic regression) require features to be normalized, intuitively we can think of Principal Component Analysis (PCA) as being a prime example of when normalization is important. In PCA we are interested in the components that maximize the variance. If one component (e.g. human height) varies less than another (e.g. weight) because of their respective scales (meters vs. kilos), PCA might determine that the direction of maximal variance more closely corresponds with the ‘weight’ axis, if those features are not scaled. As a change in height of one meter can be considered much more important than the change in weight of one kilogram, this is clearly incorrect.
+To illustrate this, PCA is performed comparing the use of data with StandardScaler applied, to unscaled data. The results are visualized and a clear difference noted. The 1st principal component in the unscaled set can be seen. It can be seen that feature #13 dominates the direction, being a whole two orders of magnitude above the other features. This is contrasted when observing the principal component for the scaled version of the data. In the scaled version, the orders of magnitude are roughly the same across all the features.
+The dataset used is the Wine Dataset available at UCI. This dataset has continuous features that are heterogeneous in scale due to the differing properties that they measure (e.g. alcohol content and malic acid).
+The transformed data is then used to train a naive Bayes classifier, and a clear difference in prediction accuracies is observed wherein the dataset which is scaled before PCA vastly outperforms the unscaled version.
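+Standardizing the data is an important preprocessing step.
+Many algorithms require the features (i.e. X) to be normalized, and PCA is intuitively the prime example.
+PCA extracts components so as to maximize the variance.
+If one component has a larger variance than the others merely because of its scale, PCA overestimates how dominant that component is, which is clearly wrong. (Changed "smaller" to "larger" so that it matches the example.)
+To demonstrate this, results are compared with and without standardization before PCA.
+Without standardization, the 13th component (feature #13) accounts for a large share of the first principal component, whereas with standardization all components are taken in to roughly the same degree.
+Standardizing before PCA can greatly improve model performance.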
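
The accuracy comparison described above can be sketched as two pipelines that differ only in whether `StandardScaler` runs before PCA. This is a rough sketch, not the linked example's exact code: the 70/30 split, `random_state=42`, and the two retained components are assumptions chosen for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Assumed split: 30% held out for testing, fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Identical pipelines except for the standardization step before PCA.
unscaled = make_pipeline(PCA(n_components=2), GaussianNB())
scaled = make_pipeline(StandardScaler(), PCA(n_components=2), GaussianNB())

# Fit on the training split, then measure accuracy on the held-out split.
acc_unscaled = unscaled.fit(X_train, y_train).score(X_test, y_test)
acc_scaled = scaled.fit(X_train, y_train).score(X_test, y_test)

print(f"Test accuracy without scaling: {acc_unscaled:.2%}")
print(f"Test accuracy with scaling:    {acc_scaled:.2%}")
```

Putting the scaler inside the pipeline means its mean and standard deviation are learned from the training split only, so the test data does not leak into the preprocessing.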