How to approach machine learning in Python

I am a beginner in Python and machine learning.
(I have only done a little image recognition.)

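I am currently thinking of using scikit-learn to train on a CSV of random number sequences (values 1 to 99) in time-series order, in order to:
1. Predict the next number sequence
2. Output the probability that a specified number sequence appears
(I am aware that a machine cannot really predict the next sequence from a meaningless series of numbers.)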

The CSV files I load are shown below.

CSV used for training (study.csv)

csv
data1,data2,data3,data4
22,62,73,97
22,61,66,99
26,58,76,78
7,35,43,60
30,35,75,86
35,61,87,89
48,50,62,74
86,87,94,98
33,34,52,95
32,35,58,61
20,50,79,84
82,15,92,96
37,50,66,78
62,74,87,89
51,53,65,95
74,75,80,86
27,89,80,90
62,78,57,89
77,79,88,89
37,54,88,92

CSV whose occurrence probability I want to obtain in testing (test.csv)

csv
data1,data2,data3,data4
35,77,60,43
60,43,35,77
81,21,90,33

Regarding "1. Predict the next number sequence": rather than using Facebook's Prophet,
I am thinking of using LinearRegression.

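The approach I tried is as follows:
given rows 1 to 4, row 5 comes next
given rows 2 to 5, row 6 comes next
...
Continuing in this way, I eventually reach the last 4 rows of study.csv: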

27,89,80,90
62,78,57,89
77,79,88,89
37,54,88,92

After training, I want the machine to output what comes next after these rows.
I tried to build the training data for this as follows.

● Explanatory variables

python
trainX = [
    [[22,62,73,110],[22,61,66,99],[26,58,76,78],[7,35,43,60]],
    [[22,61,66,99],[26,58,76,78],[7,35,43,60],[30,35,100,126]]
]

● Results for the explanatory variables?

python
trainY = [
    [30,35,100,126],
    [35,61,97,115]
]

* The idea is that row 1 of trainY is the result for row 1 of trainX, and so on.
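The windowing idea described above can be sketched as follows. This is only a minimal illustration, not the asker's code: the helper name `make_windows` and its `window` parameter are hypothetical, only the first six data rows of study.csv are used, and each 4-row window is flattened into one row of 16 values on the assumption that the estimator wants one flat feature row per sample.

```python
def make_windows(rows, window=4):
    """Pair each run of `window` consecutive rows with the row that follows it.

    `make_windows` is a hypothetical helper written for this example.
    """
    X, y = [], []
    for start in range(len(rows) - window):
        block = rows[start:start + window]
        # Flatten the window (window x 4 values) into a single feature row.
        X.append([value for row in block for value in row])
        # The target is the row that comes right after the window.
        y.append(rows[start + window])
    return X, y


# First six data rows of study.csv.
rows = [
    [22, 62, 73, 97],
    [22, 61, 66, 99],
    [26, 58, 76, 78],
    [7, 35, 43, 60],
    [30, 35, 75, 86],
    [35, 61, 87, 89],
]

train_X, train_y = make_windows(rows)
print(len(train_X), len(train_X[0]))  # number of samples, features per sample
print(train_y)
```

With the full 20-row file the same helper would yield 16 samples, which is still very little data for any model.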

[Question 1]
With the trainX and trainY above, when I run

python
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(trainX, trainY)

I get the following error:

"ValueError: Found array with dim 3. Estimator expected <= 2."

Can the trainX argument of fit only be a list of at most 2 dimensions?
Do I need to convert (reshape) the 3-dimensional data into 2 dimensions?

[Question 2]
Assuming Question 1 is resolved, is it possible to predict and obtain the next number sequence with LinearRegression?

python
model.predict(testX)

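If I use this, I believe it will not work without an objective variable called testX.
(Would the proper approach be to put the last 4 rows of study.csv into testX, call predict, and take the result?)
If there is a way to obtain the next predicted data without using LinearRegression, I would appreciate any hints on how to do it.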

[Question 3]
Also, assuming Question 1 is resolved, if I set the objective variable testX of
model.predict(testX)
to

python
testX = [
    [1,1,1,1],
    [1,1,1,2],
    # ...
    [99,99,99,98],
    [99,99,99,99]
]

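is it possible to obtain the occurrence rate of each element?
(By specifying the "CSV whose occurrence probability I want to obtain in testing (test.csv)", I want to get the probability that each sequence appears.)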

[Question 4]
Leaving accuracy aside, is this kind of prediction also possible with an "unsupervised" method?

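I am still a beginner at both Python and machine learning, and the questions themselves may be off the mark,
but even hints would help me see which direction to study in, so I would appreciate your advice.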

meg_

2019/09/18 10:32

Please post the full error output rather than an excerpt.

imonikai

2019/09/19 02:06

Here are the source and the error information. For reference, my environment is Windows 10, Jupyter Notebook, and Python 3.

---------------------------------------------------------------------------
from sklearn.linear_model import LinearRegression as LR

# Explanatory variables
trainX = \
[
[[22,62,73,97],[22,61,66,99],[26,58,76,78],[7,35,43,60]],
[[22,61,66,99],[26,58,76,78],[7,35,43,60],[30,35,75,86]]
]

# Results for the explanatory variables?
trainY = \
[
[30,35,75,86],
[35,61,87,89]
]

model = LinearRegression()
model.fit(trainX, trainY)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-e4f0a312c6d1> in <module>
     17
     18 model = LinearRegression()
---> 19 model.fit(trainX, trainY)
     20
     21

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\linear_model\base.py in fit(self, X, y, sample_weight)
    461         n_jobs_ = self.n_jobs
    462         X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],
--> 463                          y_numeric=True, multi_output=True)
    464
    465         if sample_weight is not None and np.atleast_1d(sample_weight).ndim > 1:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    717                     ensure_min_features=ensure_min_features,
    718                     warn_on_dtype=warn_on_dtype,
--> 719                     estimator=estimator)
    720     if multi_output:
    721         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    537         if not allow_nd and array.ndim >= 3:
    538             raise ValueError("Found array with dim %d. %s expected <= 2."
--> 539                              % (array.ndim, estimator_name))
    540         if force_all_finite:
    541             _assert_all_finite(array,

ValueError: Found array with dim 3. Estimator expected <= 2.


2 answers

The documentation gives the following description.

    X : array-like or sparse matrix, shape (n_samples, n_features)
        Training data
    y : array_like, shape (n_samples, n_targets)
        Target values. Will be cast to X's dtype if necessary

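So X needs to be an array or matrix of shape (n_samples, n_features), that is, two-dimensional, and y needs to be an array.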
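One way to meet that shape requirement is to flatten each 4x4 window into a single row of 16 features before calling fit. The following is a minimal sketch using the values from the asker's comment; the names `X` and `next_window` are illustrative, and feeding the last four rows of study.csv to predict is just one possible reading of the question. With only two training samples the fitted model is not meaningful; the point is the array shapes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Two training windows, shape (n_samples, 4 rows, 4 columns).
train_X = np.array([
    [[22, 62, 73, 97], [22, 61, 66, 99], [26, 58, 76, 78], [7, 35, 43, 60]],
    [[22, 61, 66, 99], [26, 58, 76, 78], [7, 35, 43, 60], [30, 35, 75, 86]],
])
# The row that follows each window, shape (n_samples, n_targets).
train_y = np.array([
    [30, 35, 75, 86],
    [35, 61, 87, 89],
])

# Flatten (n_samples, 4, 4) into (n_samples, 16) so X is two-dimensional.
X = train_X.reshape(len(train_X), -1)
print(X.shape)

model = LinearRegression()
model.fit(X, train_y)

# Last four rows of study.csv, flattened the same way into one sample.
next_window = np.array([
    [27, 89, 80, 90],
    [62, 78, 57, 89],
    [77, 79, 88, 89],
    [37, 54, 88, 92],
]).reshape(1, -1)

prediction = model.predict(next_window)
print(prediction.shape)  # one predicted row with four values
print(prediction)
```

The prediction is a row of four real numbers, not integers restricted to 1 to 99, so any rounding or clipping would be an additional step chosen by the user.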

Posted 2019/09/18 10:46

meg_

Total score 10742

With a new tool, it is always good to start with a small project. In this case, for example, try classifying iris flowers using the iris dataset.

It's a good project and is very easy to understand.

All the attributes in the dataset are numeric, so you only have to figure out how to load and handle the data.

It is a multi-class classification problem, which lets you practice a supervised learning algorithm.

It has 4 attributes and 150 rows, so it is small and fits easily into memory.
All of the numeric attributes are in the same units and on the same scale, so no special scaling or transforms are needed to get started.
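As a rough illustration of that starter project, the sketch below loads the iris dataset bundled with scikit-learn and fits a classifier. The choice of `KNeighborsClassifier`, the 25% test split, and `random_state=0` are assumptions made for this example and are not part of the answer above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the bundled iris data: 150 rows, 4 numeric attributes, 3 classes.
iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)

# Hold out a quarter of the rows for testing (split sizes are an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a simple supervised classifier and report accuracy on the held-out rows.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```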

Posted 2019/09/18 10:26

Edited 2019/09/18 10:28