Answer edit history

Revision 2: "Correction" (answer changed)
````diff
@@ -1,5 +1,6 @@
 Using [CountVectorizer](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html#sklearn.feature_extraction.text.CountVectorizer) solved the essence of the original question.
 
+The number of tf data items is as follows.
 ```python
 from sklearn.feature_extraction.text import CountVectorizer
 import numpy as np
@@ -11,4 +12,13 @@
 
 for word, count in zip(c_vec.get_feature_names(), one_hot_vector_count):
     print(word, count)
+```
+
+The number of idf data items is as follows.
+```
+from sklearn.feature_extraction.text import TfidfVectorizer
+t_vec = TfidfVectorizer(token_pattern=r"(?u)\b\w\w+\b")
+
+len(t_vec.vocabulary_)
+len(t_vec.idf_)
 ```
````
Revision 1: "Typo" (answer changed)
````diff
@@ -10,5 +10,5 @@
 one_hot_vector_count = np.sum(a=one_hot_vector, axis=0)
 
 for word, count in zip(c_vec.get_feature_names(), one_hot_vector_count):
-    print(word
+    print(word, count)
 ```
````