Question edit history

Posted the full error message

2018/10/26 06:49

Posted

kanpan

Score 20

title: no changes

body: changed

@@ -5,7 +5,26 @@
 ### Problem / error message
 ```
+(base) C:\jikken>python genModel.py
+C:\Anaconda\lib\site-packages\gensim\utils.py:1212: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
+  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
+Traceback (most recent call last):
+  File "genModel.py", line 5, in <module>
+    model = word2vec.Word2Vec(data, size=200)
+  File "C:\Anaconda\lib\site-packages\gensim\models\word2vec.py", line 767, in __init__
+    fast_version=FAST_VERSION)
+  File "C:\Anaconda\lib\site-packages\gensim\models\base_any2vec.py", line 759, in __init__
+    self.build_vocab(sentences=sentences, corpus_file=corpus_file, trim_rule=trim_rule)
+  File "C:\Anaconda\lib\site-packages\gensim\models\base_any2vec.py", line 936, in build_vocab
+    sentences=sentences, corpus_file=corpus_file, progress_per=progress_per, trim_rule=trim_rule)
+  File "C:\Anaconda\lib\site-packages\gensim\models\word2vec.py", line 1571, in scan_vocab
+    total_words, corpus_count = self._scan_vocab(sentences, progress_per, trim_rule)
+  File "C:\Anaconda\lib\site-packages\gensim\models\word2vec.py", line 1540, in _scan_vocab
+    for sentence_no, sentence in enumerate(sentences):
+  File "C:\Anaconda\lib\site-packages\gensim\models\word2vec.py", line 1363, in __iter__
+    text = rest + fin.read(8192)  # avoid loading the entire file (=1 line) into RAM
 UnicodeDecodeError: 'cp932' codec can't decode byte 0xef in position 0: illegal multibyte sequence
 ```
 ### Relevant source code
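The traceback shows Python decoding `tweet.csv` with `cp932`, which is typically the default text encoding on Japanese-locale Windows, rather than with UTF-8. The byte it rejects, `0xef` at position 0, is also the first byte of the UTF-8 byte order mark (EF BB BF), so one plausible reading is that the file was saved as "UTF-8 with BOM" and then opened without an explicit `encoding=` (a leading `0xef` can also start a UTF-8 fullwidth character such as `Ａ`). A minimal diagnostic sketch using only the standard library; the helper names are hypothetical:

```python
import codecs
import locale
import os


def starts_with_utf8_bom(data):
    """True if `data` (bytes) begins with the UTF-8 BOM, EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def file_starts_with_utf8_bom(path):
    """Read only the first 3 bytes in binary mode, so nothing is decoded."""
    with open(path, "rb") as f:
        return starts_with_utf8_bom(f.read(len(codecs.BOM_UTF8)))


def can_decode(data, encoding):
    """True if `data` decodes without a UnicodeDecodeError under `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Encoding used by open() in text mode when no encoding= is given;
    # typically 'cp932' on a Japanese-locale Windows machine.
    print("default text encoding:", locale.getpreferredencoding(False))
    # "tweet.csv" is the file name from the question.
    if os.path.exists("tweet.csv"):
        print("tweet.csv starts with a UTF-8 BOM:", file_starts_with_utf8_bom("tweet.csv"))
```

If the default encoding prints as `cp932`, any library call that opens the file in text mode without `encoding=` will decode it as `cp932` regardless of how the file was saved.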

Added information

2018/10/26 06:48

Posted

kanpan

Score 20

title: no changes

body: changed

@@ -27,4 +27,5 @@
 I got this error and could not proceed.
 ### Additional information (framework/tool versions, etc.)
-tweet.csv is saved with UTF-8 encoding.
+tweet.csv is saved with UTF-8 encoding.
+Also, the CSV file contains nothing but Japanese tweets.
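Since the file is UTF-8 and holds only Japanese tweets, one way to sidestep the platform default is to open it yourself with an explicit encoding and hand `Word2Vec` an in-memory list of token lists. This sketch assumes each CSV cell is one tweet whose words are already separated by spaces; raw Japanese text has no spaces, so it would first need word segmentation (for example with a morphological analyzer such as MeCab). `sentences_from_lines` and `load_sentences` are hypothetical names:

```python
import csv


def sentences_from_lines(lines):
    """Turn CSV text lines into a list of token lists.

    Assumption (hypothetical): every non-empty CSV cell is one tweet whose
    words are already separated by whitespace.
    """
    sentences = []
    for row in csv.reader(lines):  # csv.reader accepts any iterable of str
        for cell in row:
            tokens = cell.split()  # whitespace split only; no Japanese segmentation
            if tokens:  # skip empty cells and blank lines
                sentences.append(tokens)
    return sentences


def load_sentences(path, encoding="utf-8-sig"):
    """Open `path` with an explicit encoding instead of the locale default.

    'utf-8-sig' decodes UTF-8 and drops a leading BOM if there is one.
    """
    with open(path, encoding=encoding, newline="") as f:
        return sentences_from_lines(f)
```

With gensim 3.x, as in the traceback, the result could then be passed as `word2vec.Word2Vec(load_sentences("tweet.csv"), size=200)`; gensim 4.0 renamed `size` to `vector_size`. `utf-8-sig` is chosen because it reads plain UTF-8 files unchanged and also strips a leading BOM when present, so it works whichever way the file was saved.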