I want to fix a mismatch in my dictionary output

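I extract nouns from a text file containing 10,000 tweets. I then want to remove stop words from them using SlothLib and build a dictionary.
As shown below, the output in Jupyter differs from the dictionary I saved to a new file. What is happening here, and how should I fix it? I would appreciate any advice.
Thank you in advance.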

Contents of the output file

Dictionary(0 unique tokens: [])

In Jupyter Notebook

Dictionary(2845 unique tokens: ['', '000 ギフト コード', '038', '0458', '100 ガチ恋 距離空間 さっき お誕生日おめでとう']...)

Relevant source code

```python
# import emoji
#
# words = list(filter(lambda x: x in emoji.UNICODE_EMOJI, words))
#https://qiita.com/chamao/items/7edaba62b120a660657e
from natto import MeCab
import os
import urllib.request
import codecs
from gensim import corpora
from collections import Counter
from collections import defaultdict
frequency = defaultdict(int)

f = open('test40.txt')
corpus = f.read().split("\n")

mecab = MeCab('-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd')

#if tagger.lang == 'ja':

with codecs.open("test40.txt", "r", "utf-8") as f:
    corpus = f.read().split("\n")

rm_list = ["RT","https","co","さん","フォロー","本日","応募","今日","プレゼント","お金","FGO","無料","本人","投稿","動画","ツイート","リツイート","Twitter","ローソン","Peing","http","Amazonギフト券","bot","発売中","Youtube","www","WWW","質問箱","コラボ","フォロワー","DM","いいね","ＲＴ","lawson","://","！","peing","youtube","抽選","jp","リプ","キャンペーン","チケット","期間限定","DHC","日本","amp","人間","チャンネル","配信中","YouTube","WEB","楽しみ","イラスト","くじ","@","__"]

stop_words = []
path = 'stop_words.txt'
with open(path) as g:
    stop_words = g.readlines()

docs = []
for txt in corpus:
    words = mecab.parse(txt, as_nodes=True)
    doc = []

    for w in words:
        if w.feature.split(",")[0] == "名詞":
            if not any(sw in w.surface for sw in stop_words):
                if not any(rm in w.surface for rm in rm_list):
                    if len(w.surface) >= 3:
                        doc.append(str(w.surface))

    doc = ' '.join(doc)
    docs.append(doc)
corpus = docs

str_corpus = str(corpus).split(' ')

dictionary = corpora.Dictionary([corpus])
print(dictionary)

dictionary.filter_extremes(no_below=2, no_above=0.5)
dictionary.save_as_text('test40-dic2.txt')

with open("test40-dic2-3.txt", "a", encoding="utf-8") as h:
    h.write(str(dictionary))
```
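Below is a minimal sketch of the per-tweet noun filter. It runs on its own under a few assumptions. MeCab is replaced by pre-tokenized `(surface, feature)` pairs; with natto-py these would come from each node's `surface` and `feature` attributes. The helper names `load_stop_words` and `extract_nouns` and the sample tweet are hypothetical. Stop words are matched exactly instead of as substrings, which is a deliberate change from the code above, so that a short stop word does not discard every longer noun containing it. The sketch also addresses two details of the loop above. First, `g.readlines()` keeps the trailing newline on every entry, so an entry like `"あそこ\n"` is never found inside a surface form. Second, once lines are stripped, a blank line becomes `""`, which is a substring of every string, so blank entries are dropped. Each tweet's nouns are also kept as a list rather than joined into one string.

```python
import io


def load_stop_words(stream):
    """Read one stop word per line; strip newlines and skip blank lines."""
    return {line.strip() for line in stream if line.strip()}


def extract_nouns(nodes, stop_words, rm_list, min_len=3):
    """Return the nouns of one tweet as a list of tokens.

    `nodes` is assumed to be an iterable of (surface, feature) pairs,
    e.g. (n.surface, n.feature) for natto-py nodes from parse(..., as_nodes=True).
    """
    doc = []
    for surface, feature in nodes:
        if feature.split(",")[0] != "名詞":       # keep nouns only
            continue
        if surface in stop_words:                 # exact match (assumption)
            continue
        if any(rm in surface for rm in rm_list):  # substring match, as in the question
            continue
        if len(surface) >= min_len:
            doc.append(surface)
    return doc  # a list of tokens, not ' '.join(doc)


# Hypothetical stop-word file (with a blank line) and one pre-tokenized tweet
stop_words = load_stop_words(io.StringIO("あそこ\n\nこれ\n"))
tweet = [("ガチ恋", "名詞,一般,*,*"), ("は", "助詞,係助詞,*,*"),
         ("距離空間", "名詞,一般,*,*"), ("https", "名詞,固有名詞,*,*"),
         ("あそこ", "名詞,代名詞,*,*")]
docs = [extract_nouns(tweet, stop_words, ["https", "RT"])]
print(docs)  # [['ガチ恋', '距離空間']]
```

Applied to the real corpus, `docs` would hold one such list per tweet.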

Additional information (framework/tool versions, etc.)

macOS 10.12.6, Python 3.7.3, Atom


1 answer

Self-resolved

I realized that `dictionary.filter_extremes(no_below=2, no_above=0.5)` was not placed before `print(dictionary)`.

Posted 2019/08/19 04:00

farinelli

