質問編集履歴
1
tf_idfs = vectorizer.fit_transform(training_docs) に修正しました。(変更前は words(df.ix[i,"titlebeginning"]) を渡していました。)
title
CHANGED
File without changes
|
body
CHANGED
@@ -54,7 +54,8 @@
|
|
54
54
|
|
55
55
|
for i in range(0,len(df)):
|
56
56
|
vectorizer = TfidfVectorizer(use_idf=True, token_pattern=u'(?u)\b\w+\b')
|
57
|
-
tf_idfs = vectorizer.fit_transform(
|
57
|
+
tf_idfs = vectorizer.fit_transform(training_docs)
|
58
|
+
print(tf_idfs)
|
58
59
|
```
|
59
60
|
|
60
61
|
以上の処理を行い、最終的には、以下のコードにtf-idf処理を施して重要度の低い単語を除いたtraining_docsを代入したいのですが、どのようにしたら良いのでしょうか。
|
@@ -66,4 +67,51 @@
|
|
66
67
|
model.docvecs.similarity(0,1551)
|
67
68
|
```
|
68
69
|
|
70
|
+
エラー内容
|
71
|
+
```python
|
72
|
+
AttributeError Traceback (most recent call last)
|
73
|
+
<ipython-input-32-a49b5702c1c0> in <module>()
|
74
|
+
4 for i in range(0,len(df)):
|
75
|
+
5 vectorizer = TfidfVectorizer(use_idf=True, token_pattern=u'(?u)\b\w+\b')
|
76
|
+
----> 6 tf_idfs = vectorizer.fit_transform(training_docs)
|
77
|
+
7 print(tf_idfs)
|
78
|
+
|
79
|
+
~/anaconda3/envs/kenkyuu/lib/python3.6/site-packages/sklearn/feature_extraction/text.py in fit_transform(self, raw_documents, y)
|
80
|
+
1379 Tf-idf-weighted document-term matrix.
|
81
|
+
1380 """
|
82
|
+
-> 1381 X = super(TfidfVectorizer, self).fit_transform(raw_documents)
|
83
|
+
1382 self._tfidf.fit(X)
|
84
|
+
1383 # X is already a transformed view of raw_documents so
|
85
|
+
|
86
|
+
~/anaconda3/envs/kenkyuu/lib/python3.6/site-packages/sklearn/feature_extraction/text.py in fit_transform(self, raw_documents, y)
|
87
|
+
867
|
88
|
+
868 vocabulary, X = self._count_vocab(raw_documents,
|
89
|
+
--> 869 self.fixed_vocabulary_)
|
90
|
+
870
|
91
|
+
871 if self.binary:
|
92
|
+
|
93
|
+
~/anaconda3/envs/kenkyuu/lib/python3.6/site-packages/sklearn/feature_extraction/text.py in _count_vocab(self, raw_documents, fixed_vocab)
|
94
|
+
790 for doc in raw_documents:
|
95
|
+
791 feature_counter = {}
|
96
|
+
--> 792 for feature in analyze(doc):
|
97
|
+
793 try:
|
98
|
+
794 feature_idx = vocabulary[feature]
|
99
|
+
|
100
|
+
~/anaconda3/envs/kenkyuu/lib/python3.6/site-packages/sklearn/feature_extraction/text.py in <lambda>(doc)
|
101
|
+
264
|
102
|
+
265 return lambda doc: self._word_ngrams(
|
103
|
+
--> 266 tokenize(preprocess(self.decode(doc))), stop_words)
|
104
|
+
267
|
105
|
+
268 else:
|
106
|
+
|
107
|
+
~/anaconda3/envs/kenkyuu/lib/python3.6/site-packages/sklearn/feature_extraction/text.py in <lambda>(x)
|
108
|
+
230
|
109
|
+
231 if self.lowercase:
|
110
|
+
--> 232 return lambda x: strip_accents(x.lower())
|
111
|
+
233 else:
|
112
|
+
234 return strip_accents
|
113
|
+
|
114
|
+
AttributeError: 'TaggedDocument' object has no attribute 'lower'
|
115
|
+
```
|
116
|
+
|
69
117
|
[こちらが問題のファイルになります](https://www.dropbox.com/s/auixihg8n344voz/%E3%82%BF%E3%82%A4%E3%83%88%E3%83%AB%E3%81%A8%E5%86%92%E9%A0%AD%E4%B8%80%E8%A6%A7.csv?dl=0)
|