Question edit history

I revised the code based on the answer, but now I get an error message that I don't understand. Please help.

2019/09/30 05:53

Posted

kawauso.love

Score 23

test CHANGED Viewed

@@ -1,18 +1,90 @@
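 Obtaining distributed representations on a PC from text reduced to a list of nouns
 I want to use gensim's word2vec and save the distributed representations as a file on my PC.
+As input, I want to use text that has been reduced to a list of nouns with MeCab.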
 ```python3
+from pymongo import MongoClient
+from bs4 import BeautifulSoup
+import MeCab
+mecab = MeCab.Tagger ('/usr/local/lib/mecab/dic/mecab-ipadic-neologd')
+def main():
+    recipes = []
+    client = MongoClient('localhost', 27017)
+    db = client.html.cookpad_html
+    collection = db.test_collection
+    htmls = list(db.find().limit(1))
+    recipes = []
+    for num, html in enumerate(htmls):
+        soup = BeautifulSoup(html["html"], 'lxml')
+        for steps in soup.find_all(attrs={"class": "step_text"}):
+            node = mecab.parseToNode(steps.get_text())
+            while node:
+                if node.feature.split(",")[0] == '名詞':
+                    recipes.append(node.feature.split(",")[6])
+                node = node.next
+                recipes = list(set(recipes))
+    print(recipes)
+if __name__ == '__main__':
+    main()
+text = 'main()'
+file = open('text_file_name.txt', 'w')
+file.write(text)
+file.close()
 from janome.tokenizer import Tokenizer
 from gensim.models import word2vec
+# Tokenize the text into words, separated by spaces
-# Tokenize the text into words, separated by spaces
+import codecs
 text_space = ""
@@ -28,15 +100,11 @@
     text_space += " "
 # Write to file
 with codecs.open('wakachigaki_file_name.txt', 'w', 'utf-8') as file:
     file.write(text_space)
 # Create the Word2vec model
@@ -58,92 +126,28 @@
 model.save('model_name.model')
 # Load the model and compute similar words
 model = word2vec.Word2Vec.load("model_name.model")
 model.most_similar(positive="単語", topn=10)
 ```
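-I want to do something like the above, but instead of reading a file, I want to use text that has been reduced to a list of nouns with MeCab.
+I got the error message "word '単語' not in vocabulary".
-The list-form text is below.
-This is what I want to use.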
-```python3
-from pymongo import MongoClient
-from bs4 import BeautifulSoup
-import MeCab
-mecab = MeCab.Tagger ('/usr/local/lib/mecab/dic/mecab-ipadic-neologd')
-def main():
-    recipes = []
-    client = MongoClient('localhost', 27017)
-    db = client.html.cookpad_html
-    collection = db.test_collection
-    htmls = list(db.find().limit(1))
-    recipes = []
-    for num, html in enumerate(htmls):
-        soup = BeautifulSoup(html["html"], 'lxml')
+https://lib-arts.hatenablog.com/entry/nlp_tutorial3
-        for steps in soup.find_all(attrs={"class": "step_text"}):
-            node = mecab.parseToNode(steps.get_text())
-            while node:
-                if node.feature.split(",")[0] == '名詞':
-                    recipes.append(node.feature.split(",")[6])
-                node = node.next
-    recipes = list(set(recipes))
-    print(recipes)
-if __name__ == '__main__':
+I used this site as a reference.
-    main()
-```
-I don't know how to change the part that reads the file.
+I don't know what to do, so please tell me.
-I'm in a hurry, so it would be a great help if someone could tell me.
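
One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: the revised script writes the literal string 'main()' to text_file_name.txt instead of the noun list, and the query word '単語' may simply never occur in the training text. The stdlib-only example below illustrates both points. The helper names `extract_nouns`, `write_corpus`, and `read_vocabulary` are hypothetical, and the hard-coded nouns are an assumption standing in for the MeCab output.

```python
import os
import tempfile


def extract_nouns():
    # Hypothetical stand-in for the MeCab step in main(): it returns the
    # noun list instead of only printing it. The nouns are made-up samples.
    return ["玉ねぎ", "鍋", "砂糖", "鍋"]


def write_corpus(nouns, path):
    # Write the nouns themselves, separated by spaces, rather than the
    # literal string 'main()'.
    with open(path, "w", encoding="utf-8") as f:
        f.write(" ".join(nouns))


def read_vocabulary(path):
    # Return the set of distinct whitespace-separated tokens in the file.
    with open(path, encoding="utf-8") as f:
        return set(f.read().split())


path = os.path.join(tempfile.mkdtemp(), "text_file_name.txt")
write_corpus(extract_nouns(), path)
vocab = read_vocabulary(path)

# A query word can only be looked up if it occurs in the corpus.
for query in ("鍋", "単語"):
    print(query, query in vocab)
```

With gensim, the analogous check would be to confirm that the query word is among the model's known words before calling `most_similar` (for example via `model.wv.key_to_index` in gensim 4.x or `model.wv.vocab` in 3.x). Note also that Word2Vec discards words that appear fewer than `min_count` times (5 by default), so a very small corpus can leave the vocabulary nearly empty.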