A question about Python.
This is code from the gensim tutorial:
```python
>>> from gensim import corpora, models, similarities
>>>
>>> documents = ["Human machine interface for lab abc computer applications",
...              "A survey of user opinion of computer system response time",
...              "The EPS user interface management system",
...              "System and human system engineering testing of EPS",
...              "Relation of user perceived response time to error measurement",
...              "The generation of random binary unordered trees",
...              "The intersection graph of paths in trees",
...              "Graph minors IV Widths of trees and well quasi ordering",
...              "Graph minors A survey"]
>>> # remove common words and tokenize
>>> stoplist = set('for a of the and to in'.split())
>>> texts = [[word for word in document.lower().split() if word not in stoplist]
...          for document in documents]
>>>
>>> # remove words that appear only once
>>> all_tokens = sum(texts, [])
>>> tokens_once = set(word for word in set(all_tokens) if all_tokens.count(word) == 1)
>>> texts = [[word for word in text if word not in tokens_once]
...          for text in texts]
>>>
>>> print(texts)
[['human', 'interface', 'computer'], ['survey', 'user', 'computer', 'system', 'response', 'time'], ['eps', 'user', 'interface', 'system'], ['system', 'human', 'system', 'eps'], ['user', 'response', 'time'], ['trees'], ['graph', 'trees'], ['graph', 'minors', 'trees'], ['graph', 'minors', 'survey']]
```
This is what I typed into the interpreter step by step, along with the result.
There is one part I don't understand:
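As a side note on the "remove words that appear only once" step: `all_tokens.count(word)` rescans the whole token list for every distinct word, so the cost grows roughly quadratically with corpus size. The sketch below swaps in `collections.Counter` from the standard library, which counts everything in one pass. This is an alternative technique, not what the gensim tutorial uses, and the two toy token lists are made up for illustration. The generator inside `Counter(...)` also uses two `for` clauses, read left to right like nested loops.

```python
from collections import Counter

# Toy tokenized documents (made-up data for illustration)
texts = [['human', 'interface', 'computer', 'lab'],
         ['user', 'computer', 'lab', 'time']]

# Count every token across all documents in a single pass
frequency = Counter(token for text in texts for token in text)

# Keep only tokens that appear more than once in the whole corpus
texts = [[token for token in text if frequency[token] > 1] for text in texts]
print(texts)  # prints [['computer', 'lab'], ['computer', 'lab']]
```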
```python
>>> stoplist = set('for a of the and to in'.split())
>>> texts = [[word for word in document.lower().split() if word not in stoplist]
...          for document in documents]
```
In particular, I can't follow the part that builds the `texts` list, where `for` and `if` are mixed together.
I would appreciate a detailed explanation.
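The nested list comprehension in that snippet can be unrolled into ordinary `for`/`if` statements. In a comprehension the `for` clauses read left to right like nested loops, any `if` filters the current item, and the expression at the very front is what gets collected. `'for a of the and to in'.split()` splits the string on whitespace into a list of words, and `set(...)` turns it into a set so that `word not in stoplist` is a fast lookup. A minimal sketch, using only the first two tutorial documents to keep it short; `texts_loop` is just an illustrative name.

```python
# Same stop-word set as the tutorial, and its first two documents
stoplist = set('for a of the and to in'.split())
documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time"]

# One-line form: a list comprehension nested inside a list comprehension
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

# Equivalent form written with ordinary statements
texts_loop = []
for document in documents:                 # outer clause: "for document in documents"
    words = []                             # one inner list per document
    for word in document.lower().split():  # inner clause: lowercase, split on whitespace
        if word not in stoplist:           # the "if" clause drops stop words
            words.append(word)             # the leading "word" is what is kept
    texts_loop.append(words)

print(texts == texts_loop)  # prints True
```

For the first document this gives `['human', 'machine', 'interface', 'lab', 'abc', 'computer', 'applications']`: everything is lowercased and only `for` is removed.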
Also, how would this look if I rewrote it as a .py file instead of typing it into the interpreter?
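A minimal sketch of the same preprocessing as a script file. The `>>>` and `...` prompts are dropped, and `print texts` (Python 2 syntax) becomes `print(texts)` for Python 3. Assumptions: the file name `preprocess.py` and the function name `preprocess` are hypothetical, and the `from gensim import ...` line is left out because none of these lines use gensim (add it back for the later tutorial steps). The `if __name__ == "__main__":` guard runs the code only when the file is executed directly, not when it is imported.

```python
# preprocess.py (hypothetical file name); run with: python preprocess.py
# The gensim import is omitted because this preprocessing step does not use it.

documents = [
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
    "The EPS user interface management system",
    "System and human system engineering testing of EPS",
    "Relation of user perceived response time to error measurement",
    "The generation of random binary unordered trees",
    "The intersection graph of paths in trees",
    "Graph minors IV Widths of trees and well quasi ordering",
    "Graph minors A survey",
]


def preprocess(documents):
    """Lowercase and tokenize, drop stop words, then drop words seen only once."""
    # remove common words and tokenize
    stoplist = set('for a of the and to in'.split())
    texts = [[word for word in document.lower().split() if word not in stoplist]
             for document in documents]

    # remove words that appear only once
    all_tokens = sum(texts, [])  # flatten the list of lists into one list
    tokens_once = set(word for word in set(all_tokens) if all_tokens.count(word) == 1)
    texts = [[word for word in text if word not in tokens_once]
             for text in texts]
    return texts


if __name__ == "__main__":
    print(preprocess(documents))
```

Running `python preprocess.py` prints the same list as the interpreter session.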
I'd be grateful for any help.
Thank you in advance.