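clf.fit(x, y) raises an error in a machine learning program for image recognition

I am building an image recognition program, using the book ゼロから優しく始めるpython入門 as a reference.

I created the dataset by following the sample program and it ran without errors, but the program that trains on the data raises an error.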

python
 1  # Split into training and test sets --- (*1)
 2  from sklearn.model_selection import train_test_split as split
 3  x, x_test, y, y_test = split(data, target)
 4
 5  # Train on the data --- (*2)
 6  from sklearn import svm
 7  clf = svm.LinearSVC()
 8  clf.fit(x, y)
 9
10  # Evaluate the model --- (*3)
11
12  pred = clf.predict(x_test)
13  result = list(pred == y_test).count(True) / len(y_test)
14  print("accuracy=" + str(result))

Running the program above produces the following error.

ValueError                                Traceback (most recent call last)
<ipython-input-65-79fe9f04f566> in <module>
      6 from sklearn import svm
      7 clf = svm.LinearSVC()
----> 8 clf.fit(x, y)
      9 
     10 # Evaluate the model --- (*3)

~\Anaconda3\lib\site-packages\sklearn\svm\classes.py in fit(self, X, y, sample_weight)
    227         X, y = check_X_y(X, y, accept_sparse='csr',
    228                          dtype=np.float64, order="C",
--> 229                          accept_large_sparse=False)
    230         check_classification_targets(y)
    231         self.classes_ = np.unique(y)

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    717                     ensure_min_features=ensure_min_features,
    718                     warn_on_dtype=warn_on_dtype,
--> 719                     estimator=estimator)
    720     if multi_output:
    721         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    494             try:
    495                 warnings.simplefilter('error', ComplexWarning)
--> 496                 array = np.asarray(array, dtype=dtype, order=order)
    497             except ComplexWarning:
    498                 raise ValueError("Complex data not supported\n"

~\Anaconda3\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
    536 
    537     """
--> 538     return array(a, dtype, copy=False, order=order)
    539 
    540 

ValueError: setting an array element with a sequence.

The data and labels were created as follows.

python
 1  from PIL import Image
 2  import numpy as np
 3  import glob
 4
 5  # Load images and append them to the data and labels
 6  data = []
 7  target = []
 8
 9  def glob_images(dir,label,size):
10      files = glob.glob(dir + "/*.JPG")
11      for f in files:
12          img = Image.open(f)  # open the image
13          img = img.convert("RGB")  # convert to RGB just in case
14          img.thumbnail((size,size),Image.LANCZOS)  # resize using the given resampling method
15          ary = np.array(img).reshape(-1,)  # flatten into a 1-D array
16          data.append(ary)
17          target.append(label)  # append the label
18
19  # Add data by specifying the image directory, label, and image size
20  glob_images("./tissue",label=0,size=8)
21  glob_images("./floor",label=1,size=8)

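On line 10 (the glob.glob call), I changed the pattern to match my photos, whose extension is JPG.
I also changed the last two lines to match the names I am using.

Something seems to be wrong with line 8 (clf.fit(x, y)), but I could not understand the error message.
I would appreciate any advice on how to fix it.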

Environment
Jupyter
Python 3.7.4

Windows 10

tiitoi

2019/10/30 09:45

What are the shapes of data and target?

falcon_titan

2019/10/30 11:15

I have added this to the question. Sorry for not providing enough information.

tiitoi

2019/10/31 11:30

I don't have the data, so I can't run the code or pin down the cause, but as a first step, how about converting data and target to NumPy arrays?

data = np.array(data)
target = np.array(target)

Then run print(data.shape, target.shape) and check that data is a 2-D array and target is a 1-D array.
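As a rough illustration of the check suggested above, the sketch below shows what the shapes look like when every sample has the same length. The data here is a hypothetical stand-in (four all-zero vectors of 8 * 8 * 3 = 192 values), not the images from the question.

```python
import numpy as np

# Hypothetical stand-in for the real images: 4 samples, each a flattened
# 8 x 8 RGB image (8 * 8 * 3 = 192 values).
data = [np.zeros(192, dtype=np.uint8) for _ in range(4)]
target = [0, 0, 1, 1]

data_arr = np.array(data)
target_arr = np.array(target)

# With equal-length samples, data is 2-D and target is 1-D.
print(data_arr.shape, target_arr.shape)  # (4, 192) (4,)
print(data_arr.ndim, target_arr.ndim)    # 2 1
```

If data.shape comes back with only one entry, such as (4,), the samples were not combined into a single 2-D numeric array.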

falcon_titan

2019/10/31 11:31

Understood. I will check.

falcon_titan

2019/10/31 11:37

After changing the code to

data = np.array(data)
target = np.array(target)

I get the following:

NameError Traceback (most recent call last)
<ipython-input-6-d505bff7f02b> in <module>
      4
      5 # Load images and append them to the data and labels
----> 6 data = np.array(data)
      7 target= np.array(target)
      8
NameError: name 'data' is not defined

It says that data is not defined. Is there something else I need to add?

tiitoi

2019/10/31 12:33

Where did you add it? If you put it below

glob_images("./tissue",label=0,size=8)
glob_images("./floor",label=1,size=8)

then data should at least be defined.

falcon_titan

2019/10/31 12:59

I had mistakenly overwritten the data = [] and target = [] lines. After fixing that, print(data.shape, target.shape) gives

(144,) (144,)

Evaluating data on its own gives

array([array([187, 178, 156, 184, 174, 150, 182, 168, 140, 181, 167, 137, 177, 162, 131, 181, 167, 135, 177, 161, 127, 182, 162, 124,
       187, 179, 157, 183, 171, 145, 187, 174, 145, 179, 162, 125, 180, 164, 131, 184, 171, 140, 182, 169, 134, 189, 171, 132,
       189, 180, 158, 186, 172, 147, 218, 233, 236, 209, 215, 206, 179, 161, 124, 187, 175, 146, 184, 171, 135, 191, 175, 135,
       190, 177, 155, 195, 182, 158, 209, 214, 208, 216, 229, 236, 199, 193, 177, 187, 174, 142, 187, 174, 140, 193, 176, 138,
       192, 177, 152, 200, 187, 160, 200, 187, 158, 204, 196, 175, 204, 200, 182, 190, 180, 148, 192, 178, 144, 199, 181, 143,
       199, 183, 159, 195, 182, 158, 186, 176, 155, 179, 170, 147, 180, 169, 144, 175, 166, 140, 177, 164, 133, 183, 166, 132], dtype=uint8),
       (omitted)
       array([194, 186, 177, 195, 181, 169, 169, 112, 106, 151, 51, 43, 163, 95, 80, 164, 103, 66, 163, 107, 81, 174, 134, 116,
       183, 170, 156, 185, 170, 157, 138, 37, 29, 149, 21, 14, 172, 93, 85, 160, 125, 105, 190, 166, 156, 237, 238, 239,
       171, 153, 138, 193, 186, 179, 176, 145, 133, 136, 112, 93, 92, 107, 70, 142, 142, 119, 180, 156, 148, 196, 169, 156,
       171, 151, 132, 196, 184, 177, 176, 176, 162, 46, 72, 32, 22, 52, 9, 86, 100, 75, 200, 185, 175, 183, 171, 164,
       172, 152, 132, 185, 174, 166, 205, 196, 191, 95, 107, 82, 93, 98, 77, 177, 165, 140, 194, 190, 180, 80, 113, 148], dtype=uint8),
       array([189, 164, 155, 156, 66, 61, 160, 101, 93, 198, 193, 187, 172, 155, 150, 168, 155, 146, 107, 83, 62, 78, 56, 39,
       159, 153, 153, 59, 78, 104, 104, 111, 121, 195, 182, 168, 150, 132, 119, 148, 132, 119, 139, 121, 105, 115, 96, 82,
       125, 128, 136, 32, 54, 85, 63, 71, 90, 174, 163, 147, 158, 143, 128, 131, 111, 96, 150, 137, 121, 110, 89, 71,
       159, 143, 120, 155, 132, 97, 154, 130, 96, 161, 146, 128, 160, 146, 131, 119, 97, 83, 141, 126, 111, 124, 105, 86,
       138, 111, 67, 141, 100, 33, 137, 95, 37, 151, 131, 110, 156, 143, 128, 119, 100, 86, 117, 99, 85, 130, 117, 101], dtype=uint8)], dtype=object)

so it looks like a 2-D array to me, and I think there is no problem with the arrays.
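A note on the output above: as far as NumPy is concerned, an array with dtype=object whose elements are themselves arrays is one-dimensional, which is why data.shape has a single entry. The sketch below uses two hypothetical all-zero vectors of different lengths (not the real images). It shows that converting such an array to float64, which is the conversion visible in the traceback in the question, fails with a ValueError.

```python
import numpy as np

# Hypothetical ragged data: two "images" flattened to different lengths.
rows = [np.zeros(144, dtype=np.uint8), np.zeros(120, dtype=np.uint8)]

# Build the object array explicitly; recent NumPy versions refuse to
# create one implicitly from ragged input.
ragged = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    ragged[i] = row

print(ragged.shape)  # (2,)
print(ragged.dtype)  # object

# The traceback shows np.asarray(array, dtype=np.float64) being called;
# the same conversion fails on this ragged object array.
error_message = None
try:
    np.asarray(ragged, dtype=np.float64)
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```

In other words, the array only looks two-dimensional when printed. NumPy treats it as a 1-D array of Python objects.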

tiitoi

2019/10/31 13:14 (edited)

For a 2-D array, data.shape should be (number of samples, number of features), so it is odd that it is not. One possible cause is that the arrays appended to data have different lengths.
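The arrays printed earlier in the thread appear to support this: counting the values gives 144 for the first array and 120 for the last two, and neither matches 8 * 8 * 3 = 192. One possible explanation, not confirmed in the thread, is that Pillow's Image.thumbnail keeps the aspect ratio and only guarantees that the image fits inside the requested box. Image.resize returns an image of exactly the requested size. The sketch below compares the two on hypothetical blank in-memory images. The helper names flatten_thumbnail and flatten_resize are made up for illustration, and the resampling filter is left at Pillow's default.

```python
import numpy as np
from PIL import Image

def flatten_thumbnail(img, size):
    """Shrink with thumbnail() (keeps the aspect ratio), then flatten."""
    img = img.convert("RGB")
    img.thumbnail((size, size))  # modifies img in place
    return np.array(img).reshape(-1)

def flatten_resize(img, size):
    """Force exactly size x size with resize(), then flatten."""
    img = img.convert("RGB").resize((size, size))
    return np.array(img).reshape(-1)

# Hypothetical blank images with different aspect ratios (width, height).
images = [
    Image.new("RGB", (400, 300)),
    Image.new("RGB", (400, 250)),
    Image.new("RGB", (300, 300)),
]

# Collect the distinct flattened lengths produced by each approach.
thumb_lengths = sorted({len(flatten_thumbnail(im, 8)) for im in images})
resize_lengths = sorted({len(flatten_resize(im, 8)) for im in images})

print(thumb_lengths)   # [120, 144, 192]
print(resize_lengths)  # [192]
```

With resize, every sample has the same length, so np.array(data) yields a 2-D array. The trade-off is that images that are not square get stretched. Cropping or padding to a square first would be an alternative.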

falcon_titan

2019/11/30 12:35

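I'm sorry I could not reply for so long; I have been busy for the past month. To help resolve this, I want to study machine learning and image recognition further on my own, so I am putting this question on hold. Thank you for all your help.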