What I want to achieve
I am a PyTorch beginner.
I want to use the model below to output a score for an image, but I get an error.
Model I want to use
https://huggingface.co/Eugeoter/waifu-scorer-v3
The problem / what I don't understand
- The error that occurs:
```
RuntimeError: Error(s) in loading state_dict for MLP:
    Missing key(s) in state_dict: "layers.7.weight", "layers.7.bias".
    Unexpected key(s) in state_dict: "layers.8.weight", "layers.8.bias", "layers.10.weight", "layers.10.bias", "layers.10.running_mean", "layers.10.running_var", "layers.10.num_batches_tracked", "layers.12.weight", "layers.12.bias", "layers.14.weight", "layers.14.bias", "layers.14.running_mean", "layers.14.running_var", "layers.14.num_batches_tracked", "layers.16.weight", "layers.16.bias", "layers.18.weight", "layers.18.bias", "layers.2.running_mean", "layers.2.running_var", "layers.2.num_batches_tracked", "layers.6.running_mean", "layers.6.running_var", "layers.6.num_batches_tracked".
    size mismatch for layers.0.weight: copying a param with shape torch.Size([2048, 768]) from checkpoint, the shape in current model is torch.Size([1024, 768]).
    size mismatch for layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]).
    size mismatch for layers.2.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([128, 1024]).
    size mismatch for layers.2.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for layers.4.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([64, 128]).
    size mismatch for layers.4.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([64]).
    size mismatch for layers.6.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([16, 64]).
    size mismatch for layers.6.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([16]).
```
I used the source code below, since it was referenced in the model's description:
https://github.com/christophschuhmann/improved-aesthetic-predictor/blob/main/simple_inference.py
Relevant source code
```python
import numpy as np
import torch
import pytorch_lightning as pl
import torch.nn as nn
import torch.nn.functional as F  # needed by F.mse_loss below (missing in the original script)
from torchvision import datasets, transforms
import clip
from PIL import Image, ImageFile


##### This script will predict the aesthetic score for this image file:

img_path = "anime.jpg"

# if you changed the MLP architecture during training, change it also here:
class MLP(pl.LightningModule):
    def __init__(self, input_size, xcol='emb', ycol='avg_rating'):
        super().__init__()
        self.input_size = input_size
        self.xcol = xcol
        self.ycol = ycol
        self.layers = nn.Sequential(
            nn.Linear(self.input_size, 1024),
            #nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(1024, 128),
            #nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, 64),
            #nn.ReLU(),
            nn.Dropout(0.1),

            nn.Linear(64, 16),
            #nn.ReLU(),

            nn.Linear(16, 1)
        )

    def forward(self, x):
        return self.layers(x)

    def training_step(self, batch, batch_idx):
        x = batch[self.xcol]
        y = batch[self.ycol].reshape(-1, 1)
        x_hat = self.layers(x)
        loss = F.mse_loss(x_hat, y)
        return loss

    def validation_step(self, batch, batch_idx):
        x = batch[self.xcol]
        y = batch[self.ycol].reshape(-1, 1)
        x_hat = self.layers(x)
        loss = F.mse_loss(x_hat, y)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer


def normalized(a, axis=-1, order=2):
    # L2-normalize the embedding along the given axis
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2 == 0] = 1
    return a / np.expand_dims(l2, axis)


device = "cuda" if torch.cuda.is_available() else "cpu"

model = MLP(768)  # CLIP embedding dim is 768 for CLIP ViT-L/14
model_path = "waifu-scorer-v3.pth"
# load the checkpoint; map_location keeps this working on CPU-only machines
s = torch.load(model_path, map_location=device)

model.load_state_dict(s)  # <- the RuntimeError above is raised here
model.to(device)
model.eval()

model2, preprocess = clip.load("ViT-L/14", device=device)  # RN50x64

pil_image = Image.open(img_path)
image = preprocess(pil_image).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model2.encode_image(image)

im_emb_arr = normalized(image_features.cpu().detach().numpy())

# .float() works on both CPU and GPU (torch.cuda.FloatTensor fails without a GPU)
prediction = model(torch.from_numpy(im_emb_arr).to(device).float())

print("Aesthetic score predicted by the model:")
print(prediction)
```
What I tried / what I looked into
- Searched teratail, Google, etc.
- Modified the source code on my own
- Asked an acquaintance
- Other

Details / results of the above
My understanding is that this model is distributed as saved parameters only (a state_dict), so to use it I have to define the model class myself.
However, I don't know what layer structure the class needs.
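As background for reading the checkpoint keys: in an `nn.Sequential`, every submodule gets a positional index, and parameter-free modules such as `nn.ReLU` or `nn.Dropout` occupy an index but contribute nothing to the `state_dict`. The `running_mean`/`running_var`/`num_batches_tracked` entries are characteristic of `nn.BatchNorm1d`. A minimal illustration:

```python
import torch.nn as nn

layers = nn.Sequential(
    nn.Linear(4, 8),     # index 0 -> keys "0.weight", "0.bias"
    nn.ReLU(),           # index 1 -> no parameters, so no keys
    nn.BatchNorm1d(8),   # index 2 -> "2.weight", "2.bias", "2.running_mean",
                         #            "2.running_var", "2.num_batches_tracked"
)
print(list(layers.state_dict().keys()))
```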
Additional notes
The model's OrderedDict is as follows (a layer structure consistent with it is sketched right after the listing):
```
layers.0.weight torch.Size([2048, 768])
layers.0.bias torch.Size([2048])
layers.2.weight torch.Size([2048])
layers.2.bias torch.Size([2048])
layers.2.running_mean torch.Size([2048])
layers.2.running_var torch.Size([2048])
layers.2.num_batches_tracked torch.Size([])
layers.4.weight torch.Size([512, 2048])
layers.4.bias torch.Size([512])
layers.6.weight torch.Size([512])
layers.6.bias torch.Size([512])
layers.6.running_mean torch.Size([512])
layers.6.running_var torch.Size([512])
layers.6.num_batches_tracked torch.Size([])
layers.8.weight torch.Size([256, 512])
layers.8.bias torch.Size([256])
layers.10.weight torch.Size([256])
layers.10.bias torch.Size([256])
layers.10.running_mean torch.Size([256])
layers.10.running_var torch.Size([256])
layers.10.num_batches_tracked torch.Size([])
layers.12.weight torch.Size([128, 256])
layers.12.bias torch.Size([128])
layers.14.weight torch.Size([128])
layers.14.bias torch.Size([128])
layers.14.running_mean torch.Size([128])
layers.14.running_var torch.Size([128])
layers.14.num_batches_tracked torch.Size([])
layers.16.weight torch.Size([32, 128])
layers.16.bias torch.Size([32])
layers.18.weight torch.Size([1, 32])
layers.18.bias torch.Size([1])
```
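Reading that listing: entries with a 2-D `weight` are `nn.Linear` layers, entries with `running_mean`/`running_var` are `nn.BatchNorm1d`, and the skipped odd indices must hold parameter-free modules. A sketch of a class consistent with those keys (the `ReLU`/`Dropout` choices at the odd indices are assumptions, and the dropout rates are guesses; any parameter-free module would load this state_dict identically):

```python
import torch.nn as nn

class MLP(nn.Module):  # a plain nn.Module is enough for inference
    def __init__(self, input_size=768):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, 2048),  # layers.0
            nn.ReLU(),                    # layers.1 (assumed)
            nn.BatchNorm1d(2048),         # layers.2
            nn.Dropout(0.3),              # layers.3 (assumed; rate is a guess)
            nn.Linear(2048, 512),         # layers.4
            nn.ReLU(),                    # layers.5 (assumed)
            nn.BatchNorm1d(512),          # layers.6
            nn.Dropout(0.3),              # layers.7 (assumed)
            nn.Linear(512, 256),          # layers.8
            nn.ReLU(),                    # layers.9 (assumed)
            nn.BatchNorm1d(256),          # layers.10
            nn.Dropout(0.2),              # layers.11 (assumed)
            nn.Linear(256, 128),          # layers.12
            nn.ReLU(),                    # layers.13 (assumed)
            nn.BatchNorm1d(128),          # layers.14
            nn.Dropout(0.1),              # layers.15 (assumed)
            nn.Linear(128, 32),           # layers.16
            nn.ReLU(),                    # layers.17 (assumed)
            nn.Linear(32, 1),             # layers.18
        )

    def forward(self, x):
        return self.layers(x)
```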
The program I used
```python
import torch

# map_location="cpu" keeps this working on machines without a GPU
state_dict = torch.load("waifu-scorer-v3.pth", map_location="cpu")

# inspect the contents of the OrderedDict
for param_tensor in state_dict:
    print(param_tensor, "\t", state_dict[param_tensor].size())
```
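With a class shaped like the sketch above, loading should go through. A quick check (assuming the same checkpoint file name as before):

```python
model = MLP(768)
state_dict = torch.load("waifu-scorer-v3.pth", map_location="cpu")
model.load_state_dict(state_dict)  # raises if a key or shape still disagrees
model.eval()  # puts BatchNorm/Dropout into inference mode before scoring
```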