Question edit history

2

Posted the full error message

2018/10/26 06:49

Posted

kanpan

Score 20

test CHANGED
File without changes
test CHANGED
@@ -12,7 +12,45 @@
 
  ```
 
+ (base) C:\jikken>python genModel.py
+
+ C:\Anaconda\lib\site-packages\gensim\utils.py:1212: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
+
+ warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
+
+ Traceback (most recent call last):
+
+ File "genModel.py", line 5, in <module>
+
+ model = word2vec.Word2Vec(data, size=200)
+
+ File "C:\Anaconda\lib\site-packages\gensim\models\word2vec.py", line 767, in __init__
+
+ fast_version=FAST_VERSION)
+
+ File "C:\Anaconda\lib\site-packages\gensim\models\base_any2vec.py", line 759, in __init__
+
+ self.build_vocab(sentences=sentences, corpus_file=corpus_file, trim_rule=trim_rule)
+
+ File "C:\Anaconda\lib\site-packages\gensim\models\base_any2vec.py", line 936, in build_vocab
+
+ sentences=sentences, corpus_file=corpus_file, progress_per=progress_per, trim_rule=trim_rule)
+
+ File "C:\Anaconda\lib\site-packages\gensim\models\word2vec.py", line 1571, in scan_vocab
+
+ total_words, corpus_count = self._scan_vocab(sentences, progress_per, trim_rule)
+
+ File "C:\Anaconda\lib\site-packages\gensim\models\word2vec.py", line 1540, in _scan_vocab
+
+ for sentence_no, sentence in enumerate(sentences):
+
+ File "C:\Anaconda\lib\site-packages\gensim\models\word2vec.py", line 1363, in __iter__
+
+ text = rest + fin.read(8192) # avoid loading the entire file (=1 line) into RAM
+
  UnicodeDecodeError: 'cp932' codec can't decode byte 0xef in position 0: illegal multibyte sequence
+
+
 
  ```
 

1

Addendum

2018/10/26 06:48

Posted

kanpan

Score 20

test CHANGED
File without changes
test CHANGED
@@ -57,3 +57,5 @@
 
 
  tweet.csv is saved with UTF-8 encoding.
+
+ Also, the CSV file contains only Japanese tweets.
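
The traceback reports that the file is being decoded as cp932 (the default text encoding on Japanese Windows) even though tweet.csv is saved as UTF-8. Byte 0xef at position 0 is consistent with a UTF-8 byte order mark (EF BB BF), although any UTF-8 character whose first byte is 0xEF would trigger the same error. Below is a minimal sketch, using only the standard library, of reading such a file with an explicit encoding. The sample file contents, the helper names `read_sentences` and `fails_with_cp932`, and the whitespace-based tokenization are all assumptions for illustration; the actual genModel.py is not shown in this history.

```python
import os
import tempfile


def read_sentences(path, encoding="utf-8-sig"):
    """Read a text file with an explicit encoding.

    "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present.
    Each non-empty line becomes a list of whitespace-separated tokens
    (an assumption: real Japanese text needs a proper tokenizer).
    """
    with open(path, encoding=encoding) as f:
        return [line.split() for line in f if line.strip()]


def fails_with_cp932(path):
    """Return True if decoding the raw bytes as cp932 raises UnicodeDecodeError."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("cp932")
    except UnicodeDecodeError:
        return True
    return False


# Hypothetical stand-in for tweet.csv: UTF-8 text written with a byte order mark.
fd, sample_path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
with open(sample_path, "w", encoding="utf-8-sig") as f:
    f.write("今日 は 晴れ\n明日 は 雨\n")

cp932_failed = fails_with_cp932(sample_path)   # decoding as cp932 is rejected
sentences = read_sentences(sample_path)        # explicit encoding succeeds
os.remove(sample_path)

print(cp932_failed)
print(sentences)
```

A list of token lists like `sentences` is the kind of input `Word2Vec` accepts as its first argument, so reading the file this way avoids relying on the platform's default encoding.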