Python 3.6 machine learning: color photo classification using HOG is not working

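For a machine-learning task of classifying color photos, I import HOG, extract local features, and want to cluster them, but I get an error at the clustering step and am stuck.
The color photos come from cifar10.load_data() in keras.datasets, which provides 50,000 training images and 10,000 test images. There are 10 labels.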

python
import numpy as np
from keras.datasets import cifar10

# Data preprocessing
n_samples = np.arange(10000)
(X_train, labels_train), (X_test, labels_test) = cifar10.load_data()
labels_train = labels_train.reshape(-1)
labels_test = labels_test.reshape(-1)

labels_train = labels_train[n_samples]  # keep only 10,000 samples to lighten processing
X_train = X_train[n_samples]

# Run HOG
from skimage.color import rgb2gray  # convert color to grayscale
from skimage.feature import hog

def get_descriptors(data):
    orientations = 9
    pixels_per_cell = (4, 4)  # split the photo into small cells
    cells_per_block = (3, 3)  # extract local features on a grid
    feature_vector = hog(rgb2gray(data), orientations, pixels_per_cell, cells_per_block)
    return feature_vector.reshape(-1, np.multiply(*cells_per_block) * orientations)

for data in X_train:
    data_descriptors = get_descriptors(data)


data_descriptors = np.array(data_descriptors)

# The error occurs during clustering
from sklearn.cluster import MiniBatchKMeans
np.random.seed(0)

codebook_size = 1000

descriptors = np.vstack(data_descriptors[X_test])
indices = np.random.choice(np.arange(len(descriptors)), size=500000, replace=False)
kmeans = MiniBatchKMeans(n_clusters=codebook_size, n_init=10, random_state=0)
kmeans.fit(descriptors[indices].astype(float))
del descriptors, indices

IndexError: index 59 is out of bounds for axis 0 with size 36

IndexError Traceback (most recent call last)
<ipython-input-12-46d604a03021> in <module>()
5 codebook_size = 1000
6
----> 7 descriptors = np.vstack(data_descriptors[X_test])
8 indices = np.random.choice(np.arange(len(descriptors)), size=500000, replace=False)
9 kmeans = MiniBatchKMeans(n_clusters=codebook_size, n_init=10, random_state=0)

IndexError: index 158 is out of bounds for axis 0 with size 36

MasashiKimura

2017/07/17 20:27

Unless you post the complete error message, I don't think anyone can pinpoint where this error occurs.

pikaso

2017/07/20 08:39

Sorry for the late reply. I have just added the details of where the error occurs.


1 answer

At a quick glance, there seem to be several problems.

1.

Python
for data in X_train:
    data_descriptors = get_descriptors(data)

data_descriptors = np.array(data_descriptors)

This part overwrites data_descriptors on every iteration of the loop, so in the end only the descriptors of the last image remain.

Instead, I think it should be written as follows:

Python
data_descriptors = []
for data in X_train:
    # collect the descriptors of every image instead of overwriting them
    data_descriptors.append(get_descriptors(data))

data_descriptors = np.array(data_descriptors)
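To make the difference concrete, here is a small sketch with made-up data. `fake_descriptors` is a hypothetical stand-in for `get_descriptors()` and only mimics the (36, 81) shape the original function returns for one 32x32 image; the random images are not CIFAR-10.

```python
import numpy as np

def fake_descriptors(image):
    # Hypothetical stand-in for get_descriptors(): one (36, 81) array per image.
    return np.full((36, 81), float(image.mean()))

rng = np.random.default_rng(0)
# Five made-up 32x32 RGB images (assumption: same layout as CIFAR-10 images).
images = rng.integers(0, 256, size=(5, 32, 32, 3), dtype=np.uint8)

# Version from the question: the name is rebound on every iteration.
overwritten = None
for image in images:
    overwritten = fake_descriptors(image)
overwritten = np.array(overwritten)

# Corrected version: every result is appended to a list.
collected = []
for image in images:
    collected.append(fake_descriptors(image))
collected = np.array(collected)

print(overwritten.shape)  # (36, 81): only the last image survives
print(collected.shape)    # (5, 36, 81): one entry per image
```

The leading axis of size 36 in the overwritten array matches the "axis 0 with size 36" mentioned in the error message.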

2.
The return value of the get_descriptors() function is deliberately reshaped into a multi-dimensional array:

Python
    return feature_vector.reshape(-1, np.multiply(*cells_per_block) * orientations)

If this data is ultimately going to be fed to KMeans, shouldn't it be kept as a one-dimensional feature vector, like this?

Python
    return feature_vector
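For reference, the size of that one-dimensional vector can be worked out by hand. The sketch below assumes the usual HOG layout in which blocks slide one cell at a time over the cell grid; the helper names are made up for illustration and are not part of skimage.

```python
def hog_block_grid(image_shape, pixels_per_cell, cells_per_block):
    """Number of block positions (rows, cols), assuming blocks slide one cell at a time."""
    cells_r = image_shape[0] // pixels_per_cell[0]
    cells_c = image_shape[1] // pixels_per_cell[1]
    return (cells_r - cells_per_block[0] + 1,
            cells_c - cells_per_block[1] + 1)

def hog_feature_length(image_shape, orientations, pixels_per_cell, cells_per_block):
    """Length of the flattened HOG vector: blocks x cells per block x orientations."""
    blocks_r, blocks_c = hog_block_grid(image_shape, pixels_per_cell, cells_per_block)
    per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return blocks_r * blocks_c * per_block

# Parameters from the question, applied to a 32x32 CIFAR-10 image.
blocks = hog_block_grid((32, 32), (4, 4), (3, 3))
length = hog_feature_length((32, 32), 9, (4, 4), (3, 3))

print(blocks)  # (6, 6)  -> 36 blocks
print(length)  # 2916    -> 36 blocks * 81 values per block
```

So the reshape in the question turns each 2916-element vector into a (36, 81) array, while returning feature_vector unchanged gives one row of 2916 features per image.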

3.
The last four lines

Python
descriptors = np.vstack(data_descriptors[X_test])
indices = np.random.choice(np.arange(len(descriptors)), size=500000, replace=False)
kmeans = MiniBatchKMeans(n_clusters=codebook_size, n_init=10, random_state=0)
kmeans.fit(descriptors[indices].astype(float))

are not clear to me. (In particular, what is the first line meant to do?)
If you simply want to cluster with KMeans, wouldn't the following be enough?

Python
kmeans = MiniBatchKMeans(…)
kmeans.fit(data_descriptors)
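As for why the first of those four lines fails: X_test holds pixel values, so data_descriptors[X_test] treats pixel values as row indices. The sketch below reproduces that with made-up numbers, then shows one possible reading of the intent (pooling per-block descriptors from several images before sampling). That intent is my assumption, not something stated in the question.

```python
import numpy as np

# Stand-in for what get_descriptors() returns for one image: 36 blocks x 81 values.
one_image_descriptors = np.zeros((36, 81))

# Made-up pixel values standing in for X_test (uint8 image data, not indices).
pixels = np.array([[59, 158, 12]], dtype=np.uint8)

error_message = None
try:
    one_image_descriptors[pixels]  # pixel values are interpreted as row indices
except IndexError as exc:
    error_message = str(exc)
print(error_message)

# Assumed intent: pool the block descriptors of several images into one 2-D array.
per_image = [np.zeros((36, 81)) for _ in range(4)]
pooled = np.vstack(per_image)
print(pooled.shape)  # (144, 81)

# Sampling without replacement cannot ask for more rows than exist.
rng = np.random.default_rng(0)
sample_size = min(100, len(pooled))
indices = rng.choice(len(pooled), size=sample_size, replace=False)
print(pooled[indices].shape)  # (100, 81)
```

Note that with 10,000 images and 36 blocks per image there would be 360,000 pooled rows, so size=500000 with replace=False would raise a ValueError even after the indexing is fixed.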

4.
Finally, since the data set here is cifar10, shouldn't the number of clusters (n_clusters) passed to MiniBatchKMeans be 10 rather than 1000?
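Putting points 3 and 4 together, a minimal sketch might look like the following. The random matrix is only a placeholder for the real (n_images, n_features) HOG matrix, and n_init=3 is an arbitrary choice to keep the example quick.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
# Made-up stand-in for the HOG feature matrix: 200 "images" with 20 features each.
features = rng.random((200, 20))

# One cluster per CIFAR-10 class, as suggested above.
kmeans = MiniBatchKMeans(n_clusters=10, n_init=3, random_state=0)
kmeans.fit(features)
cluster_ids = kmeans.predict(features)

print(kmeans.cluster_centers_.shape)  # one centre per cluster
print(cluster_ids.shape)              # one cluster id per input row
```

Keep in mind that the cluster ids are arbitrary: cluster 3 has no built-in relation to CIFAR-10 label 3, so comparing clusters with the labels needs a separate matching step.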

Posted 2017/07/19 23:55

magichan

Total score: 15898
