Using a trained prediction model to output results

Question

### What I want to achieve

I want to use a binary classification model I built to check whether it can classify new (unseen) data as normal or abnormal. I found that simply reading the CSV file and passing it to the model does not work, and after a lot of trial and error I still have not reached a solution. I would appreciate advice on how to resolve the error.

Both the CSV files used for training and the new-data CSV file have the format shown below: 30 rows by 119 columns. (The first column, `unnamed`, is unrelated to the analysis.)

|unnamed|idx|pelvis.x|pelvis.y|pelvis.z|left_hip.x|left_hip.y|left_hip|…|hip_angle|knee_angle|
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|0|0|-1437.3|-1.9|1591.|-1357.5|81.1|1592.4|…|-359.8|1133.0|
|0|1|-1429.4|-9.1|1590.7|-1357.5|77.4|1592.1|…|-300.3|1128.2|
|0|2|-1420.2|-68.3|1575.7|-1348.4|19.0|1578.6|…|-347.0|1102.4|
|0|…|…|…|…|…|…|…|…|…|…|
|0|29|-1464.9|-210.8|1552.9|-100.1|1566.6|-1495.7|…|-393.1|1043.5|

### Background

I built the prediction model with the code below and saved it as "model.h5".

```python
import csv
import math
import numpy as np
from glob import glob
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import os
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold

from google.colab import drive
drive.mount('/content/drive')

class DataLoader:
    def __init__(self, split: int, batch_size: int, epochs: int, roll: int):
        self.batch_size = batch_size
        self.epochs = epochs
        self.roll = roll
        self.file, self.y, self.l = list(), list(), list()
        self.category = {"normal": 0, "abnormal": 1}
        for name in self.category.keys():
            for i, file in enumerate(glob(f"/content/drive/MyDrive/data/{name}/*.csv")):
                self.file.append(file)
                self.y.append(self.category[name])
                self.l.append(len(open(file).readlines()) - 1)
                assert self.l[-1] > roll, f"Missing roll size: (roll, file length): ({roll}, {self.l[-1]}) on {file}"
        self.skf = StratifiedKFold(split, shuffle = True)

    def generator(self, idx, epochs):
        X1, X2, X3, y = list(), list(), list(), list()
        for e in range(epochs):
            np.random.shuffle(idx)
            for i in idx:
                start = np.random.randint(0, self.l[i] - self.roll - 1)
                data = pd.read_csv(self.file[i]).values[start: start + self.roll]
                data = StandardScaler().fit_transform(data.reshape(-1, 1)).reshape(data.shape)
                X1.append(np.concatenate([data[:, 8:26], data[:, 71:83]], axis = -1))
                X2.append(data[:, 107:113])
                X3.append(data[:, 117:120])
                y.append(self.y[i])
                if len(X1) == self.batch_size:
                    yield list(map(np.array, [X1, X2, X3])), np.array(y)
                    X1, X2, X3, y = list(), list(), list(), list()
            if len(X1):
                yield list(map(np.array, [X1, X2, X3])), np.array(y)

    def split(self):
        for train, test in self.skf.split(self.file, self.y):
            self.test_idx = test
            yield (
                self.generator(train, self.epochs),
                self.generator(test, self.epochs),
                math.ceil(len(train) / self.batch_size),
                math.ceil(len(test) / self.batch_size)
            )

import tensorflow as tf
from tensorflow.python import keras
from keras.models import Sequential, Model
from keras.layers import Input, Dense, Concatenate, Flatten, Dropout
from keras.layers import Conv1D, AveragePooling1D, GlobalAveragePooling1D
from keras.layers import LSTM
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ReduceLROnPlateau

def build_model(time_stamp):
    inputs1 = Input(shape = (time_stamp, 30))
    inputs2 = Input(shape = (time_stamp, 6))
    inputs3 = Input(shape = (time_stamp, 3))

    x1 = Conv1D(32, 7, activation = "swish", kernel_initializer = "he_uniform")(inputs1)
    x1 = AveragePooling1D()(x1)
    x2 = Conv1D(32, 7, activation = "swish", kernel_initializer = "he_uniform")(inputs2)
    x2 = AveragePooling1D()(x2)
    x3 = Conv1D(32, 7, activation = "swish", kernel_initializer = "he_uniform")(inputs3)
    x3 = AveragePooling1D()(x3)

    combined = Concatenate(axis = -1)([x1, x2, x3])
    x = LSTM(32, dropout = 0.2)(combined)
    x = Dense(1, activation = "sigmoid")(x)
    return Model(inputs = [inputs1, inputs2, inputs3], outputs = x)

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.metrics import balanced_accuracy_score, accuracy_score, precision_score, recall_score, f1_score

epochs = 128
batch_size = 8
time_stamp = 15

generator = DataLoader(split = 3, roll = time_stamp, batch_size = batch_size, epochs = epochs)

for train_gen, valid_gen, steps_per_epoch, validation_steps in generator.split():
    model = build_model(time_stamp)  # Be sure to rebuild the model with each fold.
    model.summary()
    model.compile(
        loss = "binary_crossentropy",
        optimizer = Adam(),
        metrics = ["acc"]
    )
    es = EarlyStopping(
        monitor = "val_loss",          # if val_loss
        patience = 10,                 # has not, within 10 epochs,
        mode = "min",                  # reached a new minimum,
        restore_best_weights = True,   # stop and restore the best weights
        verbose = 1,
    )
    model.fit(
        train_gen,
        epochs = epochs,
        steps_per_epoch = steps_per_epoch,
        class_weight = {0: 4, 1: 1},
        validation_data = valid_gen,
        validation_steps = validation_steps,
        callbacks = [es],
    )
    y_valid, y_pred = list(), list()
    test_generator = generator.generator(generator.test_idx, 1)
    for (X1, X2, X3), y in test_generator:
        y_pred.extend(round(model.predict([X1, X2, X3], batch_size = batch_size)))
        y_valid.extend(y)

model.save("/content/drive/MyDrive/model.h5")
```

### The problem and error message

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in ()
     13 X3 = data[:, 117:120]
     14
---> 15 model.predict([X1, X2, X3])

1 frames
/usr/local/lib/python3.9/dist-packages/keras/engine/training.py in tf__predict_function(iterator)
     13                 try:
     14                     do_return = True
---> 15                     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16                 except:
     17                     do_return = False

ValueError: in user code:

    File "/usr/local/lib/python3.9/dist-packages/keras/engine/training.py", line 2169, in predict_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.9/dist-packages/keras/engine/training.py", line 2155, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.9/dist-packages/keras/engine/training.py", line 2143, in run_step  **
        outputs = model.predict_step(data)
    File "/usr/local/lib/python3.9/dist-packages/keras/engine/training.py", line 2111, in predict_step
        return self(x, training=False)
    File "/usr/local/lib/python3.9/dist-packages/keras/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/usr/local/lib/python3.9/dist-packages/keras/engine/input_spec.py", line 298, in assert_input_compatibility
        raise ValueError(

    ValueError: Input 0 of layer "model_2" is incompatible with the layer: expected shape=(None, 15, 30), found shape=(None, 30)
```

### Relevant source code

```python
import pandas as pd
from tensorflow.keras.models import load_model
import numpy as np

model = load_model("/content/drive/MyDrive/model.h5")

data = pd.read_csv("/content/drive/MyDrive/data/new/ID001.csv").values

X1 = np.concatenate([data[:, 8:26], data[:, 71:83]], axis = -1)
X2 = data[:, 107:113]
X3 = data[:, 117:120]

model.predict([X1, X2, X3])
```

### What I tried

Passing the whole array as `model.predict(data)` also raises an error (`ValueError: Layer model_2 expects 3 input(s), but it received 1 input tensors.`), and passing the three arrays separately raises the error shown above.

Accepted Answer

Solved by adding a dimension with `np.newaxis`.
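A minimal sketch of what that fix might look like, using random arrays in place of the real CSV. It assumes the window length of 15 and the feature widths 30, 6 and 3 from the question's `build_model`. The model expects a leading batch axis, so each 2-D `(time, features)` array is indexed with `np.newaxis` to become `(1, time, features)`. The random data and variable names are illustrative only.

```python
import numpy as np

time_stamp = 15  # window length used when the model was built in the question

# Stand-ins for the three feature blocks sliced from one 15-row window.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(time_stamp, 30))
X2 = rng.normal(size=(time_stamp, 6))
X3 = rng.normal(size=(time_stamp, 3))

# np.newaxis inserts a batch axis of length 1 in front of each array.
batch = [x[np.newaxis, ...] for x in (X1, X2, X3)]

print([b.shape for b in batch])  # [(1, 15, 30), (1, 15, 6), (1, 15, 3)]
```

With the real data, `batch` would be what gets passed to `model.predict`. Note that the file in the question has 30 rows, so a 15-row window has to be selected before the batch axis is added.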

Answer

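## ChatGPT's answer

> Question text omitted

## Problem
You want to use an existing model to classify new data, but it seems unclear how the new data should be handled. Specifically, the open question is how to preprocess the new data and feed it to the model.

## Solution
Handling new data generally follows these steps:

1. Load the data
2. Preprocess the data
3. Run the model's prediction

# 1. Loading the data
pandas is commonly used to load new data. Specifically, read the CSV file with the `read_csv` function, as shown below.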

```python
import pandas as pd

new_data = pd.read_csv('new_data.csv')
```
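# 2. Preprocessing the data
The newly loaded data must go through the same preprocessing steps as the existing data. For example, the preprocessing used in the `generator` method of the existing `DataLoader` class can be applied to the new data as follows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def preprocess_data(data, roll):
    # data must be a 2-D NumPy array (e.g. new_data.values), not a DataFrame
    # Standardize every value with one global mean and standard deviation
    data = StandardScaler().fit_transform(data.reshape(-1, 1)).reshape(data.shape)
    X1 = np.concatenate([data[:, 8:26], data[:, 71:83]], axis=-1)
    X2 = data[:, 107:113]
    X3 = data[:, 117:120]
    # Keep only the last `roll` rows of each feature group
    return [X1[-roll:], X2[-roll:], X3[-roll:]]

def preprocess_new_data(data, roll):
    data = preprocess_data(data, roll)
    # Add a leading batch axis: (roll, features) -> (1, roll, features)
    return [np.expand_dims(x, axis=0) for x in data]
```
These functions scale the new data and split it into three feature groups. They then take the last `roll` rows and add a leading dimension of size 1 to each group, so that every array has the `(batch, time, features)` shape the model expects.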
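To make the scaling step concrete: reshaping to a single column before calling `StandardScaler().fit_transform` means every value is standardized with one shared mean and standard deviation, rather than column by column. The NumPy-only sketch below reproduces that arithmetic on a small made-up array. It assumes, as scikit-learn documents, that `StandardScaler` uses the population standard deviation. The helper name `global_standardize` and the sample values are illustrative and not part of the original code.

```python
import numpy as np

def global_standardize(window: np.ndarray) -> np.ndarray:
    """Standardize all values with a single mean and population std (ddof=0)."""
    flat = window.reshape(-1, 1)
    scaled = (flat - flat.mean()) / flat.std()
    return scaled.reshape(window.shape)

# Two columns on very different scales.
window = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
scaled = global_standardize(window)

print(scaled.shape)
print("overall mean/std:", scaled.mean(), scaled.std())
print("per-column means:", scaled.mean(axis=0))
```

The overall mean and standard deviation come out at roughly 0 and 1, while the individual columns keep their relative offsets. That differs from fitting the scaler per column.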

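# 3. Running the prediction
Once the new data has been preprocessed, you can run the prediction. Use the model's `predict` method to predict the class of the new data, as shown below.

```python
model = keras.models.load_model('model.h5')  # load the saved model (assumes `from tensorflow import keras`)
# The quoted answer is cut off after `preprocess_new_data`; the two lines below are an assumed completion.
preprocessed_data = preprocess_new_data(new_data.values, roll=15)
prediction = model.predict(preprocessed_data)
```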
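The model in the question ends in a single sigmoid unit, so `predict` returns a probability-like value per window rather than a label. Below is a hedged sketch of turning that value into a class name. It assumes a 0.5 cutoff, which the original does not state, and reuses the `{"normal": 0, "abnormal": 1}` mapping from the question's `DataLoader`. The `probabilities` array is a made-up stand-in for the output of `model.predict(...)`.

```python
import numpy as np

# Label mapping taken from the question's DataLoader.
category = {"normal": 0, "abnormal": 1}
id_to_name = {v: k for k, v in category.items()}

# Hypothetical sigmoid outputs with shape (n_windows, 1).
probabilities = np.array([[0.08], [0.93]])

threshold = 0.5  # assumed cutoff, not specified in the original
class_ids = (probabilities.ravel() >= threshold).astype(int)
labels = [id_to_name[int(i)] for i in class_ids]

print(labels)  # ['normal', 'abnormal']
```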
## Impressions
The answer sounds plausible, but ChatGPT does not seem to have grasped what the actual problem is either.
