With a supervised learning model (SVM), can training on exactly the same data produce different models?

Question

```Python
import numpy as np
import matplotlib.pyplot as plt
import mglearn
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# csv_titanic is assumed to be a pandas DataFrame of the Titanic data, loaded earlier.
# Both feature columns are selected with a single list of column names.
X = csv_titanic[["Age", "Fare"]]
y = csv_titanic["Survived"]

# Stratified split with a fixed seed, so the split itself is reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

print(y_train.value_counts())
print(y_test.value_counts())
print(y.value_counts())

svm = LinearSVC().fit(X_train, y_train)

X_array = np.array(X)

def plot_separator(model):
    """Plot the model's separating line over all samples."""
    mglearn.plots.plot_2d_separator(model, X_array)
    mglearn.discrete_scatter(X_array[:, 0], X_array[:, 1], y)
    plt.xlabel("Age")
    plt.ylabel("Fare")
    plt.legend(["abc", "alive"])
    plt.xlim([0, 80])
    plt.ylim([0, 300])
    plt.show()

# Re-running only the four statements below draws a different line each time.
svm_15 = LinearSVC(C=15).fit(X_train, y_train)
plot_separator(svm_15)

svm_100 = LinearSVC(C=100).fit(X_train, y_train)
plot_separator(svm_100)
```

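I built SVM models with the code above. When I re-run only the last four statements (the two `LinearSVC(C=...)` fits and their plots), a different model is produced on every run: a different separating line is drawn each time.

Does this mean that training with the same algorithm on exactly the same training data can still yield a different model from one run to the next?

Thanks in advance.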

Accepted Answer

The trained model most likely changes because the training procedure contains a random element.

The official documentation for [sklearn.svm.LinearSVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html) lists a parameter called `random_state`, described as follows.

> random_state : int, RandomState instance or None, optional (default=None)
> 
> The seed of the pseudo random number generator to use when shuffling the data for the dual coordinate descent (if dual=True). When dual=False the underlying implementation of LinearSVC is not random and random_state has no effect on the results. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.


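The first sentence of that description says that the data is shuffled for dual coordinate descent (the optimization algorithm used by the underlying LIBLINEAR solver), and that a pseudo-random number generator drives the shuffle.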
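The role of the seed in that shuffle can be sketched with NumPy alone. This is only an illustration, not scikit-learn's actual implementation: `visit_order` is a hypothetical helper, and the sample count of 10 is arbitrary.

```python
import numpy as np

def visit_order(n_samples, seed):
    """Return a shuffled sample order, as a hypothetical solver might use it."""
    rng = np.random.RandomState(seed)
    return rng.permutation(n_samples)

order_a = visit_order(10, seed=0)
order_b = visit_order(10, seed=0)
order_c = visit_order(10, seed=1)

print(order_a)
# The same seed reproduces the same order.
print(np.array_equal(order_a, order_b))
# A different seed generally gives a different order.
print(np.array_equal(order_a, order_c))
```

With no seed fixed, the order comes from NumPy's global random state, so it changes from one fit to the next.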

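In other words, the training samples are visited in a different order on each run, so the fitted model can come out different. The effect is most visible when the solver stops before it has fully converged (it is capped by `max_iter`), which is plausible here with unscaled features and a large `C`.
If you want the same model every time, passing `random_state=0` (any fixed integer works, not just 0) to `LinearSVC` should do it.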
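A minimal sketch of that check follows. It uses synthetic data from `make_classification` as a stand-in for the Titanic columns (an assumption for illustration), and `fit_coef` is a hypothetical helper. `dual=True` is passed explicitly because, per the quoted documentation, `random_state` only matters for the dual solver.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.svm import LinearSVC

# Synthetic two-feature data standing in for Age and Fare (illustration only).
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

def fit_coef(seed):
    """Fit LinearSVC with a fixed seed and return the learned coefficients."""
    with warnings.catch_warnings():
        # A large C may stop at max_iter; silence that warning for this demo.
        warnings.simplefilter("ignore", ConvergenceWarning)
        model = LinearSVC(C=100, dual=True, random_state=seed, max_iter=1000)
        return model.fit(X, y).coef_

# Two fits with the same seed should give the same coefficients.
same_seed_match = np.allclose(fit_coef(0), fit_coef(0))
print(same_seed_match)
```

In the question's code, the equivalent change would be `LinearSVC(C=15, random_state=0)` and `LinearSVC(C=100, random_state=0)`.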

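For the details of the SVM algorithm, I think the following lecture notes and book cover it well.
(機械学習のエッセンス, the second item, walks carefully through common machine learning methods from algorithm to implementation, so I highly recommend it. The first item, the CS229 notes, is a good choice if you are comfortable reading English.)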
- [Support Vector Machines, CS229 Lecture notes, Andrew Ng](http://cs229.stanford.edu/notes/cs229-notes3.pdf)
- [機械学習のエッセンス](https://amzn.to/3akmcNz)
