What I want to achieve
I am a PyTorch beginner.
I want to use the model below to output a score for an image, but I get an error.
Model I want to use
https://huggingface.co/Eugeoter/waifu-scorer-v3
The problem / what I don't understand
- The error that occurs:
```
RuntimeError: Error(s) in loading state_dict for MLP:
    Missing key(s) in state_dict: "layers.7.weight", "layers.7.bias".
    Unexpected key(s) in state_dict: "layers.8.weight", "layers.8.bias", "layers.10.weight", "layers.10.bias", "layers.10.running_mean", "layers.10.running_var", "layers.10.num_batches_tracked", "layers.12.weight", "layers.12.bias", "layers.14.weight", "layers.14.bias", "layers.14.running_mean", "layers.14.running_var", "layers.14.num_batches_tracked", "layers.16.weight", "layers.16.bias", "layers.18.weight", "layers.18.bias", "layers.2.running_mean", "layers.2.running_var", "layers.2.num_batches_tracked", "layers.6.running_mean", "layers.6.running_var", "layers.6.num_batches_tracked".
    size mismatch for layers.0.weight: copying a param with shape torch.Size([2048, 768]) from checkpoint, the shape in current model is torch.Size([1024, 768]).
    size mismatch for layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]).
    size mismatch for layers.2.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([128, 1024]).
    size mismatch for layers.2.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for layers.4.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([64, 128]).
    size mismatch for layers.4.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([64]).
    size mismatch for layers.6.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([16, 64]).
    size mismatch for layers.6.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([16]).
```
I used the source code below, since it was referenced in the model's description:
https://github.com/christophschuhmann/improved-aesthetic-predictor/blob/main/simple_inference.py
Relevant source code
```python
import numpy as np
import torch
import pytorch_lightning as pl
import torch.nn as nn
import torch.nn.functional as F  # needed by F.mse_loss below (missing in the original script)
from torchvision import datasets, transforms
import clip
from PIL import Image, ImageFile


##### This script will predict the aesthetic score for this image file:

img_path = "anime.jpg"

# if you changed the MLP architecture during training, change it also here:
class MLP(pl.LightningModule):
    def __init__(self, input_size, xcol='emb', ycol='avg_rating'):
        super().__init__()
        self.input_size = input_size
        self.xcol = xcol
        self.ycol = ycol
        self.layers = nn.Sequential(
            nn.Linear(self.input_size, 1024),
            #nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(1024, 128),
            #nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, 64),
            #nn.ReLU(),
            nn.Dropout(0.1),

            nn.Linear(64, 16),
            #nn.ReLU(),

            nn.Linear(16, 1)
        )

    def forward(self, x):
        return self.layers(x)

    def training_step(self, batch, batch_idx):
        x = batch[self.xcol]
        y = batch[self.ycol].reshape(-1, 1)
        x_hat = self.layers(x)
        loss = F.mse_loss(x_hat, y)
        return loss

    def validation_step(self, batch, batch_idx):
        x = batch[self.xcol]
        y = batch[self.ycol].reshape(-1, 1)
        x_hat = self.layers(x)
        loss = F.mse_loss(x_hat, y)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer


def normalized(a, axis=-1, order=2):
    # L2-normalize the embedding along the given axis
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2 == 0] = 1
    return a / np.expand_dims(l2, axis)


device = "cuda" if torch.cuda.is_available() else "cpu"

model = MLP(768)  # CLIP embedding dim is 768 for CLIP ViT-L/14
model_path = "waifu-scorer-v3.pth"
# load the checkpoint; map_location keeps this working on CPU-only machines
s = torch.load(model_path, map_location=device)

model.load_state_dict(s)  # <- the RuntimeError above is raised here
model.to(device)
model.eval()

model2, preprocess = clip.load("ViT-L/14", device=device)  # RN50x64

pil_image = Image.open(img_path)
image = preprocess(pil_image).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model2.encode_image(image)

im_emb_arr = normalized(image_features.cpu().detach().numpy())

# .float() works on both CPU and GPU (torch.cuda.FloatTensor fails without a GPU)
prediction = model(torch.from_numpy(im_emb_arr).to(device).float())

print("Aesthetic score predicted by the model:")
print(prediction)
```
What I tried / what I looked into
- Searched teratail, Google, etc.
- Modified the source code on my own
- Asked an acquaintance
- Other

Details / results of the above
My understanding is that this model is distributed as saved parameters only (a state_dict), so to use it I have to define the model class myself.
However, I don't know what layer structure the class needs.
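As background for reading the checkpoint keys: in an `nn.Sequential`, every submodule gets a positional index, and parameter-free modules such as `nn.ReLU` or `nn.Dropout` occupy an index but contribute nothing to the `state_dict`. The `running_mean`/`running_var`/`num_batches_tracked` entries are characteristic of `nn.BatchNorm1d`. A minimal illustration:

```python
import torch.nn as nn

layers = nn.Sequential(
    nn.Linear(4, 8),     # index 0 -> keys "0.weight", "0.bias"
    nn.ReLU(),           # index 1 -> no parameters, so no keys
    nn.BatchNorm1d(8),   # index 2 -> "2.weight", "2.bias", "2.running_mean",
                         #            "2.running_var", "2.num_batches_tracked"
)
print(list(layers.state_dict().keys()))
```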
Additional notes
The model's OrderedDict is as follows (a layer structure consistent with it is sketched right after the listing):
```
layers.0.weight torch.Size([2048, 768])
layers.0.bias torch.Size([2048])
layers.2.weight torch.Size([2048])
layers.2.bias torch.Size([2048])
layers.2.running_mean torch.Size([2048])
layers.2.running_var torch.Size([2048])
layers.2.num_batches_tracked torch.Size([])
layers.4.weight torch.Size([512, 2048])
layers.4.bias torch.Size([512])
layers.6.weight torch.Size([512])
layers.6.bias torch.Size([512])
layers.6.running_mean torch.Size([512])
layers.6.running_var torch.Size([512])
layers.6.num_batches_tracked torch.Size([])
layers.8.weight torch.Size([256, 512])
layers.8.bias torch.Size([256])
layers.10.weight torch.Size([256])
layers.10.bias torch.Size([256])
layers.10.running_mean torch.Size([256])
layers.10.running_var torch.Size([256])
layers.10.num_batches_tracked torch.Size([])
layers.12.weight torch.Size([128, 256])
layers.12.bias torch.Size([128])
layers.14.weight torch.Size([128])
layers.14.bias torch.Size([128])
layers.14.running_mean torch.Size([128])
layers.14.running_var torch.Size([128])
layers.14.num_batches_tracked torch.Size([])
layers.16.weight torch.Size([32, 128])
layers.16.bias torch.Size([32])
layers.18.weight torch.Size([1, 32])
layers.18.bias torch.Size([1])
```
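Reading that listing: entries with a 2-D `weight` are `nn.Linear` layers, entries with `running_mean`/`running_var` are `nn.BatchNorm1d`, and the skipped odd indices must hold parameter-free modules. A sketch of a class consistent with those keys (the `ReLU`/`Dropout` choices at the odd indices are assumptions, and the dropout rates are guesses; any parameter-free module would load this state_dict identically):

```python
import torch.nn as nn

class MLP(nn.Module):  # a plain nn.Module is enough for inference
    def __init__(self, input_size=768):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, 2048),  # layers.0
            nn.ReLU(),                    # layers.1 (assumed)
            nn.BatchNorm1d(2048),         # layers.2
            nn.Dropout(0.3),              # layers.3 (assumed; rate is a guess)
            nn.Linear(2048, 512),         # layers.4
            nn.ReLU(),                    # layers.5 (assumed)
            nn.BatchNorm1d(512),          # layers.6
            nn.Dropout(0.3),              # layers.7 (assumed)
            nn.Linear(512, 256),          # layers.8
            nn.ReLU(),                    # layers.9 (assumed)
            nn.BatchNorm1d(256),          # layers.10
            nn.Dropout(0.2),              # layers.11 (assumed)
            nn.Linear(256, 128),          # layers.12
            nn.ReLU(),                    # layers.13 (assumed)
            nn.BatchNorm1d(128),          # layers.14
            nn.Dropout(0.1),              # layers.15 (assumed)
            nn.Linear(128, 32),           # layers.16
            nn.ReLU(),                    # layers.17 (assumed)
            nn.Linear(32, 1),             # layers.18
        )

    def forward(self, x):
        return self.layers(x)
```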
The program I used
```python
import torch

# map_location="cpu" keeps this working on machines without a GPU
state_dict = torch.load("waifu-scorer-v3.pth", map_location="cpu")

# inspect the contents of the OrderedDict
for param_tensor in state_dict:
    print(param_tensor, "\t", state_dict[param_tensor].size())
```
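With a class shaped like the sketch above, loading should go through. A quick check (assuming the same checkpoint file name as before):

```python
model = MLP(768)
state_dict = torch.load("waifu-scorer-v3.pth", map_location="cpu")
model.load_state_dict(state_dict)  # raises if a key or shape still disagrees
model.eval()  # puts BatchNorm/Dropout into inference mode before scoring
```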