
Machine learning question: the relationship between feature names and computed results in tf-idf code

Answer edit history

2

Correction

2019/04/16 04:35

Posted

sequelanonymous

Score 123


@@ -1,6 +1,8 @@
 Using [CountVectorizer](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html#sklearn.feature_extraction.text.CountVectorizer) resolved the core of the original question.
+The number of tf entries can be obtained as follows.
 ```python
@@ -25,3 +27,21 @@
         print(word, count)
 ```
+The number of idf entries can be obtained as follows.
+```
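+from sklearn.feature_extraction.text import TfidfVectorizer
+t_vec = TfidfVectorizer(token_pattern=r"(?u)\b\w\w+\b")
+# vocabulary_ and idf_ exist only after t_vec has been fitted (fit / fit_transform) on the corpus
+len(t_vec.vocabulary_)  # number of distinct terms
+len(t_vec.idf_)  # number of idf weights, one per term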
+```
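
For readers who want to run the idf check end to end, here is a minimal sketch. The toy corpus `docs` and the variable names `n_terms` and `n_idf` are assumptions made for illustration and do not come from the original answer. It only shows that, once the vectorizer is fitted, `vocabulary_` and `idf_` have the same number of entries.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, used only for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

t_vec = TfidfVectorizer(token_pattern=r"(?u)\b\w\w+\b")
t_vec.fit(docs)  # vocabulary_ and idf_ are set only by fit / fit_transform

n_terms = len(t_vec.vocabulary_)  # number of distinct tokens kept
n_idf = len(t_vec.idf_)           # one idf weight per token
print(n_terms, n_idf)
```

Calling `len(t_vec.vocabulary_)` on an unfitted vectorizer raises an error, so the `fit` step is the part that matters here.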

1

Typo

2019/04/16 04:35

Posted

sequelanonymous

Score 123


@@ -22,6 +22,6 @@
     for word, count in zip(c_vec.get_feature_names(), one_hot_vector_count):
-        print(word.sorted, count)
+        print(word, count)
 ```
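
The loop in this diff pairs each feature name with its count. A self-contained sketch of the same idea might look like the following. The corpus `docs` and the names `counts`, `words` and `tf_first` are hypothetical, and the full code of the original answer is not shown here, so this is not necessarily how the author built `one_hot_vector_count`. Note that `get_feature_names()` was removed in scikit-learn 1.2 in favor of `get_feature_names_out()`, so the sketch checks which one is available.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, used only for illustration.
docs = ["the cat sat on the mat", "the dog sat on the log"]

c_vec = CountVectorizer()
counts = c_vec.fit_transform(docs)  # sparse matrix, shape (n_docs, n_terms)

# Newer scikit-learn exposes get_feature_names_out(); older versions only
# have get_feature_names().
if hasattr(c_vec, "get_feature_names_out"):
    words = list(c_vec.get_feature_names_out())
else:
    words = list(c_vec.get_feature_names())

# Term counts for the first document, keyed by feature name.
first_doc_counts = counts.toarray()[0]
tf_first = dict(zip(words, (int(c) for c in first_doc_counts)))

for word, count in tf_first.items():
    print(word, count)
```

Each column of `counts` corresponds to one feature name, which is why zipping `words` with a single row gives the per-word counts for that document.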