Question edit history

Added code

2022/12/06 12:55

@@ -14,5 +14,35 @@
 ### What I tried
 I tried both unidic and ipadic, but got the same result with each.
+```python
+import MeCab
+
+if __name__ == '__main__':
+    # Open the file and read it as one document per line (UTF-8)
+    with open("/content/dataset_kurashiki_spring_test.txt", 'r', encoding='utf-8') as file:
+        documents = [document.strip() for document in file]  # strip() removes surrounding whitespace and the newline
+    # number of documents
+    N = len(documents)
+    print(N)
+    segList = []
+    mecab = MeCab.Tagger()  # create the tagger once instead of once per document
+    # Workaround used with older mecab-python3 versions so that node.surface is not garbled
+    mecab.parse('')
+    for document in documents:
+        node = mecab.parseToNode(document)
+        while node:
+            # The first feature field is the part of speech; keep nouns (名詞) only
+            if node.feature.split(",")[0] == "名詞":
+                segList.append(node.surface)
+            node = node.next
+    print(segList)
+```
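The noun-extraction loop above can be pulled out into a small function so it can be checked without MeCab or the dataset file. This is a minimal sketch, not the asker's code: `iter_nodes`, `extract_nouns`, `FakeNode`, and `build_chain` are hypothetical helpers introduced here, and `FakeNode` only imitates the `surface`, `feature`, and `next` attributes that the loop reads from a MeCab node. The feature strings in the demo are hand-written and illustrative, not real MeCab output.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


def iter_nodes(node) -> Iterator:
    """Walk a node chain (BOS -> morphemes -> EOS) by following .next."""
    while node:
        yield node
        node = node.next


def extract_nouns(node) -> List[str]:
    """Return the surface of every node whose first feature field is 名詞."""
    return [n.surface for n in iter_nodes(node) if n.feature.split(",")[0] == "名詞"]


# Hypothetical stand-in for a MeCab node, exposing only the attributes used above.
@dataclass
class FakeNode:
    surface: str
    feature: str
    next: Optional["FakeNode"] = None


def build_chain(pairs) -> Optional[FakeNode]:
    """Link (surface, feature) pairs into a FakeNode chain and return its head."""
    head = None
    for surface, feature in reversed(pairs):
        head = FakeNode(surface, feature, head)
    return head


if __name__ == "__main__":
    # Hand-written, illustrative features (not real MeCab output)
    chain = build_chain([
        ("", "BOS/EOS,*"),
        ("倉敷", "名詞,固有名詞"),
        ("の", "助詞,連体化"),
        ("春", "名詞,一般"),
        ("", "BOS/EOS,*"),
    ])
    print(extract_nouns(chain))  # ['倉敷', '春']
```

With mecab-python3 installed, the same function would be called as `extract_nouns(MeCab.Tagger().parseToNode(document))`. Since both ipadic and UniDic put the part of speech in the first feature field, this filter should behave the same with either dictionary.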