Question edit history

Typo fix

2022/11/15 07:13

Posted by

yuncy21

Score 2

title CHANGED

File without changes

body CHANGED

@@ -10,7 +10,7 @@
 mask_vecs = outputs[0].numpy()[[i+1 for i in masked_indexs]]
 ```
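 I don't understand why the `else` on line 1 is needed.
-Am I right that `mask_vecs` on line 2 is the score for the whole question text?
+Am I right that `mask_vecs` on line 2 is the vector for the whole question text?
 Thank you in advance.
 ### What I want to achieve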
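The question asks why the `else` is needed on line 1. A minimal sketch with a hypothetical `tokens` list (standing in for the output of `tokenizer.tokenize(text)`) contrasts the two places an `if` can appear inside a list comprehension:

```python
# Hypothetical tokens standing in for tokenizer.tokenize(text)
tokens = ["the", "*", "is", "red"]

# Form 1: conditional expression "X if cond else Y" placed BEFORE the for.
# Every element produces one output, so the else branch is mandatory:
# it says what to emit when the condition is false.
replaced = ["[MASK]" if t == "*" else t for t in tokens]

# Form 2: filter clause "if cond" placed AFTER the for.
# Elements that fail the condition are simply skipped; no else is allowed here.
positions = [i for i, t in enumerate(tokens) if t == "*"]

# The same result as Form 1, written as an ordinary loop
replaced_loop = []
for t in tokens:
    if t == "*":
        replaced_loop.append("[MASK]")
    else:
        replaced_loop.append(t)

print(replaced)   # ['the', '[MASK]', 'is', 'red']
print(positions)  # [1]
```

In the question's code, `for i, t in enumerate(tokens)` never uses `i`, so `for t in tokens` behaves the same there. The `masked_indexs` line uses Form 2, which is why it has no `else`.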

Python

Content edit

2022/11/15 07:12

Posted by

yuncy21

Score 2

title CHANGED

File without changes

body CHANGED

@@ -3,16 +3,14 @@
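 I've only just started experimenting with BERT to learn it, and I'm a beginner at programming in general.
 https://www.ai-shift.co.jp/techblog/550
 While studying, I ran the code from the site above and got an AttributeError.
-I'd like to know how to fix it.
+I understand what the error means, but I don't know how to go about fixing it. I'd like to know how to fix it.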
 ```python
 tokens = ["[MASK]" if t == "*" else t for i, t in enumerate(tokens)]
-masked_indexs = [i for i, v in enumerate(ids[0]) if v == 103]
 mask_vecs = outputs[0].numpy()[[i+1 for i in masked_indexs]]
 ```
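-Also, I don't really understand what the parts inside [ ] in the code above are doing within the function.
+I don't understand why the `else` on line 1 is needed.
-I have never seen code with `if` or `for` inside [ ], so I'm confused.
+Am I right that `mask_vecs` on line 2 is the score for the whole question text?
-I would appreciate an explanation of this as well.
 Thank you in advance.
 ### What I want to achieve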
@@ -41,23 +39,24 @@
 ```Python
 def part6_slover(text, candidate, answer, q):
-    if max([len(tokenizer.tokenize(c)) for c in candidate]) == 1:
+    if max([len(tokenizer.tokenize(c)) for c in candidate]) == 1: #go through candidate (the choices) in order; if every one is a single word
-        return part5_slover(text, candidate, answer)
+        return part5_slover(text, candidate, answer) #use the part5 logic
-    tokens = tokenizer.tokenize(text)
+    tokens = tokenizer.tokenize(text) #tokenize text
-    tokens = ["[MASK]" if t == "*" else t for i, t in enumerate(tokens)]
+    tokens = ["[MASK]" if t == "*" else t for i, t in enumerate(tokens)] #go through the tokens in order and change * to [MASK]
-    tokens = ["[CLS]"] + tokens + ["[SEP]"]
+    tokens = ["[CLS]"] + tokens + ["[SEP]"] #add [CLS] and [SEP] at the start and end
-    ids = tokenizer.convert_tokens_to_ids(tokens)
+    ids = tokenizer.convert_tokens_to_ids(tokens) #convert to IDs
-    ids = torch.tensor(ids).reshape(1,-1)
+    ids = torch.tensor(ids).reshape(1,-1) #convert to a tensor
-    ids = ids.cuda()
+    ids = ids.cuda() #GPU
-    masked_indexs = [i for i, v in enumerate(ids[0]) if v == 103]
+    masked_indexs = [i for i, v in enumerate(ids[0]) if v == 103] #at which positions the [MASK] ID (103) appears
-    with torch.no_grad():
+    with torch.no_grad(): #compute the sentence vectors
         outputs, _ = model.bert(ids)
-    mask_vecs = outputs[0].numpy()[[i+1 for i in masked_indexs]]
+    mask_vecs = outputs[0].numpy()[[i+1 for i in masked_indexs]] #?
-    c_vecs = np.array([get_bert_vec(c) for c in candidate])
+    c_vecs = np.array([get_bert_vec(c) for c in candidate]) #compute sentence vectors for the choices
-    return candidate[cosine_similarity([mask_vecs[q-1]], c_vecs)[0].argsort()[-1]]
+    return candidate[cosine_similarity([mask_vecs[q-1]], c_vecs)[0].argsort()[-1]] #return the one with the highest cosine similarity
 ```
+The comments are my own interpretation of what the code means. Please let me know if I have misunderstood anything.
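To see what `outputs[0].numpy()[[...]]` selects, here is a NumPy-only sketch, assuming `outputs[0]` holds the last-layer hidden states for the single input sentence with shape `(sequence_length, hidden_size)`. The hidden-state values and token IDs below are made up; only 103 as the `[MASK]` ID comes from the question. Indexing an array with a list of integers picks one row per listed position, so the result holds one vector per `[MASK]` token rather than a single vector (or score) for the whole text.

```python
import numpy as np

# Hypothetical stand-in for outputs[0].numpy(): shape (sequence_length, hidden_size).
# hidden_size is 4 here only to keep the example small.
seq_len, hidden = 6, 4
hidden_states = np.arange(seq_len * hidden, dtype=float).reshape(seq_len, hidden)

# Hypothetical token IDs including [CLS] at the start and [SEP] at the end;
# 103 marks the [MASK] positions, as in the question.
ids = [101, 7592, 103, 2003, 103, 102]
masked_indexs = [i for i, v in enumerate(ids) if v == 103]

# A list of integers as the index selects those rows, in that order.
mask_vecs = hidden_states[masked_indexs]

print(masked_indexs)    # [2, 4]
print(mask_vecs.shape)  # (2, 4): one hidden_size vector per [MASK]

# mask_vecs[q - 1] is then the vector for the q-th blank (q counted from 1).
q = 2
print(np.array_equal(mask_vecs[q - 1], hidden_states[4]))  # True
```

In this sketch the IDs already include `[CLS]`, so the positions index the hidden states directly; whether the `+1` in the question's code is needed depends on how its positions line up with `outputs[0]`.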

Python
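The last line of `part6_slover` picks the choice whose vector is closest to the blank's vector. Below is a small sketch with hypothetical words and made-up 3-dimensional vectors; `cosine_sim_rows` is a hand-written stand-in for scikit-learn's `cosine_similarity`, which likewise takes 2-D inputs and returns a `(len(a), len(b))` matrix (hence the `[mask_vecs[q-1]]` wrapper and the `[0]` in the original).

```python
import numpy as np


def cosine_sim_rows(a, b):
    """Cosine similarity between every row of a and every row of b.

    Hand-written stand-in for sklearn's cosine_similarity; returns shape (len(a), len(b)).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Hypothetical choices and made-up vectors
candidate = ["apple", "car", "banana"]
c_vecs = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.9, 0.1, 0.0]])
mask_vec = np.array([0.2, 0.9, 0.0])  # vector for the blank being answered

sims = cosine_sim_rows([mask_vec], c_vecs)[0]  # one similarity per choice, shape (3,)
best = candidate[sims.argsort()[-1]]  # argsort is ascending, so [-1] is the most similar
print(best)  # car
```

`argsort()` sorts in ascending order, so `[-1]` is the index of the largest similarity; `np.argmax(sims)` gives the same index when there are no ties.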