About PCA and StandardScaler

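Does the result differ depending on whether you scale after dimensionality reduction or reduce dimensions after scaling? Which one is correct?
To scale after dimensionality reduction, I would apply StandardScaler and then PCA(n_components=2), for example,
and to reduce dimensions after scaling, I would use
PCA(whiten=True,n_components=2)
Is that equivalent to calling StandardScaler?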


2 answers

Best answer

http://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html

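It comes with code, so running it should give you a feel for this.
It is the kind of comparison commonly done when evaluating preprocessing steps.

Excerpt and rough translation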

Feature scaling through standardization (or Z-score normalization) can be an important preprocessing step for many machine learning algorithms. Standardization involves rescaling the features such that they have the properties of a standard normal distribution with a mean of zero and a standard deviation of one.
While many algorithms (such as SVM, K-nearest neighbors, and logistic regression) require features to be normalized, intuitively we can think of Principal Component Analysis (PCA) as being a prime example of when normalization is important. In PCA we are interested in the components that maximize the variance. If one component (e.g. human height) varies less than another (e.g. weight) because of their respective scales (meters vs. kilos), PCA might determine that the direction of maximal variance more closely corresponds with the ‘weight’ axis, if those features are not scaled. As a change in height of one meter can be considered much more important than the change in weight of one kilogram, this is clearly incorrect.
To illustrate this, PCA is performed comparing the use of data with StandardScaler applied, to unscaled data. The results are visualized and a clear difference noted. The 1st principal component in the unscaled set can be seen. It can be seen that feature #13 dominates the direction, being a whole two orders of magnitude above the other features. This is contrasted when observing the principal component for the scaled version of the data. In the scaled version, the orders of magnitude are roughly the same across all the features.
The dataset used is the Wine Dataset available at UCI. This dataset has continuous features that are heterogeneous in scale due to differing properties that they measure (e.g. alcohol content and malic acid).
The transformed data is then used to train a naive Bayes classifier, and a clear difference in prediction accuracies is observed wherein the dataset which is scaled before PCA vastly outperforms the unscaled version.

PC 1 without scaling:
 [  1.76e-03  -8.36e-04   1.55e-04  -5.31e-03   2.02e-02   1.02e-03
   1.53e-03  -1.12e-04   6.31e-04   2.33e-03   1.54e-04   7.43e-04
   1.00e+00 (13th component)]

PC 1 with scaling:
 [ 0.13 -0.26 -0.01 -0.23  0.16  0.39  0.42 -0.28  0.33 -0.11  0.3   0.38
  0.28 (13th component)]

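Standardizing the data is an important preprocessing step.
Many algorithms require the features (that is, X) to be normalized, and intuitively PCA is the prime example.
PCA extracts components so as to maximize variance.
If one feature has a larger variance than the others merely because of its scale, PCA overestimates how dominant that feature is, which is clearly wrong. (Changed "smaller" to "larger" to match the example.)
To show this, PCA is compared with and without standardization beforehand.
Without standardization the 13th feature accounts for most of the first principal component, whereas with standardization all features contribute to a similar degree.
Standardizing before PCA can greatly improve model performance.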
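The comparison described above can be reproduced roughly as follows. This is only a minimal sketch: it assumes scikit-learn's bundled copy of the Wine data (load_wine) and fits on the full dataset rather than on a train split, so the numbers need not match the output quoted above. The variable names are made up for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)  # 178 samples, 13 features

# First principal axis when PCA is fitted on the raw features
pc1_raw = PCA(n_components=2).fit(X).components_[0]

# First principal axis when the features are standardized first
X_std = StandardScaler().fit_transform(X)
pc1_std = PCA(n_components=2).fit(X_std).components_[0]

np.set_printoptions(precision=2, suppress=True)
print("PC 1 without scaling:", pc1_raw)
print("PC 1 with scaling:   ", pc1_std)

# Index (0-based) of the feature with the largest absolute loading on raw PC 1
dominant_raw = int(np.argmax(np.abs(pc1_raw)))
print("dominant raw feature index:", dominant_raw)
```

On the raw data almost all of the weight of PC 1 should sit on the last feature, whose values are far larger than the others, while the standardized loadings are spread across the features.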

Posted 2018/05/09 10:48

Edited 2018/05/09 22:43

mkgrei


mimamoru

2018/05/09 22:13

Thank you (#^.^#) The URL was helpful! ...I wish I were better at English, though... (o_o)

mkgrei

2018/05/09 22:40

I see, unlike Python and OpenCV, it has not been translated. I have added a rough translation.

mimamoru

2018/05/09 23:36

Thank you (#^.^#) That helped a lot! Is there some separate advantage to scaling after dimensionality reduction with PCA's whiten?

mkgrei

2018/05/10 10:45

https://arxiv.org/pdf/1512.00809.pdf Difficult, quantitative.
https://en.m.wikipedia.org/wiki/Whitening_transformation Wikipedia article on whitening.
https://github.com/scikit-learn/scikit-learn/issues/202 Discussion of the implementation.
http://ufldl.stanford.edu/tutorial/unsupervised/PCAWhitening/ Application to images.
http://takatakamanbou.hatenablog.com/entry/2015/02/15/150430 In Japanese, from the image-application angle.
https://hayataka2049.hatenablog.jp/entry/2018/03/27/024144 I am probably not the right person to explain this.

mimamoru

2018/05/12 06:03

Thank you! I will use these as references!


To scale after dimensionality reduction, I would apply StandardScaler and then PCA(n_components=2), for example,
and to reduce dimensions after scaling, I would use
PCA(whiten=True,n_components=2)

Isn't this backwards?

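The first one (dimensionality reduction → scaling) differs slightly in the details of the computation, but the result is roughly the same as whitening: each component of the final output ends up with mean 0 and variance 1.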

The second one (scaling → dimensionality reduction) is the same as running PCA on the correlation matrix.

The difference is whether you remove the scale of the projected space or ignore the scale of the original space.
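Both statements can be checked numerically. The following is a rough sketch, assuming scikit-learn's load_wine data as a stand-in dataset; the variable names are illustrative. It relies on PCA(whiten=True) normalizing with the n-1 sample variance while StandardScaler uses the population variance, so the two outputs should differ only by the constant factor sqrt(n/(n-1)).

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)
n = X.shape[0]

# Dimensionality reduction -> scaling, done by hand
Z_pca_then_scale = StandardScaler().fit_transform(
    PCA(n_components=2).fit_transform(X)
)

# The same idea in one step: whitening
Z_whiten = PCA(n_components=2, whiten=True).fit_transform(X)

# The two agree up to a constant factor (ddof=0 vs ddof=1)
ratio = np.sqrt(n / (n - 1))
print(np.allclose(Z_pca_then_scale, Z_whiten * ratio))

# Scaling -> dimensionality reduction
X_std = StandardScaler().fit_transform(X)
pca_std = PCA(n_components=2).fit(X_std)

# Principal axes taken directly from the correlation matrix of the raw data
corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
top_axes = eigvecs[:, ::-1][:, :2].T      # two largest, one axis per row

# Eigenvectors are only defined up to sign, so compare absolute values
same_axes = np.allclose(np.abs(pca_std.components_), np.abs(top_axes), atol=1e-6)
print(same_axes)
```

In other words, whiten=True rescales the output of PCA and does not touch the input features, so it is not a substitute for applying StandardScaler before PCA.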

Posted 2018/05/09 20:07

hayataka2049


mimamoru

2018/05/09 22:12

Thank you (#^.^#) Sorry, I had them backwards! So neither one is the correct one?

hayataka2049

2018/05/09 23:56

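Neither is the correct one, and if you want both you can do both. In principle, whitening and scaling (PCA on the correlation matrix) should be chosen according to your goal. In practice, qualitative judgment is often unreliable (for example, when the step is built into the preprocessing of various machine learning models), so you end up deciding by staring at the evaluation metrics.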

hayataka2049

2018/05/10 04:32

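For what it is worth, qualitatively you can say the following.
Scaling the input: helps when the input variables have very different scales (otherwise the result is pulled toward the large-scale variables).
Scaling the output: helps when a later processing step does not work well with differing scales (there are many such algorithms).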
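As one way of "staring at the evaluation metrics", the four combinations can be compared by cross-validation. This is only a sketch under assumptions of my own choosing: the Wine data from load_wine, a k-nearest-neighbors classifier as the scale-sensitive later step, and 5-fold accuracy as the metric. Your own data and model may rank the options differently.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Preprocessing variants: scale the input, the output, both, or neither
candidates = {
    "PCA only": [PCA(n_components=2)],
    "scale -> PCA": [StandardScaler(), PCA(n_components=2)],
    "PCA(whiten=True)": [PCA(n_components=2, whiten=True)],
    "scale -> PCA(whiten=True)": [StandardScaler(), PCA(n_components=2, whiten=True)],
}

scores = {}
for name, steps in candidates.items():
    # The preprocessing is refitted inside every fold, so nothing leaks
    model = make_pipeline(*steps, KNeighborsClassifier())
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

Putting the preprocessing inside the pipeline keeps the comparison fair, because the scaler and PCA are fitted only on the training part of each fold.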

mimamoru

2018/05/12 06:01

Thank you! That is very easy to understand! I will study more!
