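What I want to achieve
I want to output word-segmented text (wakati-gaki) produced by morphological analysis of reviews loaded from a CSV file. The source code below works up through the number replacement step, but an error occurs afterwards, at the point where the word-segmented result is returned. I am a Python beginner and would appreciate any guidance.
Problem / error message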
~\.conda\envs\NaturalLanguage01\python.exe ~\PycharmProjects\nlp01\report\report1.py
~\PycharmProjects\nlp01\report\report1.py:23: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  review_df['review_number_to_zero'] = review_df['review'].map(replace_number_to_zero)
Traceback (most recent call last):
  File "~\PycharmProjects\nlp01\report\report1.py", line 34, in <module>
    review_df['lsbw'] = review_df['review_number_to_zero'].map(leaving_space_between_words_column)
  File "~\.conda\envs\NaturalLanguage01\lib\site-packages\pandas\core\series.py", line 4161, in map
    new_values = super()._map_values(arg, na_action=na_action)
  File "~\.conda\envs\NaturalLanguage01\lib\site-packages\pandas\core\base.py", line 870, in _map_values
    new_values = map_f(values, mapper)
  File "pandas\_libs\lib.pyx", line 2859, in pandas._libs.lib.map_infer
  File "~\PycharmProjects\nlp01\report\report1.py", line 30, in leaving_space_between_words_column
    splitted = ' '.join([x.split('\t')[0] for x in tagger.parse(text).splitlines()[:-1] if x.split('\t')[1].split(',')[0] not in ['助詞', '助動詞', '接続詞', '動詞', '記号']])
  File "~\PycharmProjects\nlp01\report\report1.py", line 30, in <listcomp>
    splitted = ' '.join([x.split('\t')[0] for x in tagger.parse(text).splitlines()[:-1] if x.split('\t')[1].split(',')[0] not in ['助詞', '助動詞', '接続詞', '動詞', '記号']])
IndexError: list index out of range

Process finished with exit code 1
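The SettingWithCopyWarning at the top of this output is separate from the IndexError and does not stop the script. In the code in this question, review_df is created by selecting columns from shinjuku_ramen_df and a new column is then assigned to it, which is the pattern pandas warns about. A minimal sketch of one common way to avoid the warning, taking an explicit copy, is shown here. The small DataFrame and the name source_df are hypothetical stand-ins for the CSV data, not the real file contents.

```python
import re

import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from the CSV in the question.
source_df = pd.DataFrame({
    'store_id': [1, 2],
    'review': ['味玉が100円', '2回目の訪問'],
    'score': [3.5, 3.8],
})

# .copy() makes review_df an independent DataFrame, so adding a column
# to it is unambiguous and does not touch source_df.
review_df = source_df[['store_id', 'review']].copy()

# Same idea as replace_number_to_zero in the question (half-width digits only).
review_df['review_number_to_zero'] = review_df['review'].map(
    lambda text: re.sub(r'[0-9]+', '0', text)
)

print(review_df)
```

With the copy in place, the new column exists only on review_df and the original DataFrame is left unchanged.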
Relevant source code
Python
import csv
import MeCab
import re
import pandas as pd

shinjuku_ramen_df = pd.read_csv('data/shinjuku_ramen_review_high.csv',index_col=0)
with open('data/shinjuku_ramen_review_high.csv') as f:
    writer = csv.writer(f, lineterminator='\n')

# Store information
store_df = shinjuku_ramen_df[['store_id', 'store_name', 'score', 'ward', 'review_cnt']]
# Remove duplicate rows
store_df = store_df.drop_duplicates(['store_id', 'store_name', 'score', 'ward', 'review_cnt'])
# Review information
review_df = shinjuku_ramen_df[['store_id', 'review']]

# Function that replaces all numbers, which are probably not very relevant, with 0
def replace_number_to_zero(text):
    changed_text = re.sub(r'[0-9]+', "0", text) # half-width digits
    changed_text = re.sub(r'[0-9]+', "0", changed_text) # full-width digits
    return changed_text
# Replace numbers with 0
review_df['review_number_to_zero'] = review_df['review'].map(replace_number_to_zero)

# Morphological analysis
tagger = MeCab.Tagger('-Ochasen -u \"C:/laboratory/MeCab/dic/neologd/neologd.dic\"')

# Function that returns the word-segmented result
def leaving_space_between_words_column(text):
    splitted = ' '.join([x.split('\t')[0] for x in tagger.parse(text).splitlines()[:-1] if x.split('\t')[1].split(',')[0] not in ['助詞', '助動詞', '接続詞', '動詞', '記号']])
    return splitted

# Add the word-segmented column to the df
review_df['lsbw'] = review_df['review_number_to_zero'].map(leaving_space_between_words_column)
print(review_df.head())
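The traceback points at x.split('\t')[1] on line 30, which raises IndexError for any output line that contains no tab character. As a hedged sketch rather than a confirmed fix, the parsing step can be written so that such lines are skipped instead of indexed. This assumes the tagger output follows the usual ChaSen-style layout, where fields are tab-separated and the fourth field holds a hyphen-joined part of speech such as 名詞-一般. The function name extract_surfaces, the pos_index parameter, and the sample string are hypothetical. The sample is hand-written for illustration and is not real MeCab output.

```python
from typing import Iterable, List

# Parts of speech to drop, taken from the list in the question's code.
EXCLUDED_POS = ('助詞', '助動詞', '接続詞', '動詞', '記号')


def extract_surfaces(parsed: str,
                     excluded_pos: Iterable[str] = EXCLUDED_POS,
                     pos_index: int = 3) -> List[str]:
    """Return surface forms from tab-separated tagger output.

    Lines with too few fields are skipped rather than indexed,
    so a malformed line cannot raise IndexError.
    """
    excluded = set(excluded_pos)
    surfaces = []
    for line in parsed.split('\n'):
        if not line or line == 'EOS':
            continue  # blank line or end-of-sentence marker
        fields = line.split('\t')
        if len(fields) <= pos_index:
            continue  # no part-of-speech field on this line
        # Assumed layout: "名詞-一般" -> top-level part of speech "名詞"
        top_level_pos = fields[pos_index].split('-')[0]
        if top_level_pos not in excluded:
            surfaces.append(fields[0])
    return surfaces


# Hand-written sample in the assumed layout, including one line without tabs.
sample = (
    'ラーメン\tラーメン\tラーメン\t名詞-一般\t\t\n'
    'が\tガ\tが\t助詞-格助詞-一般\t\t\n'
    '壊れた行\n'
    'おいしい\tオイシイ\tおいしい\t形容詞-自立\t形容詞・イ段\t基本形\n'
    'EOS\n'
)

print(' '.join(extract_surfaces(sample)))  # prints: ラーメン おいしい
```

In the question's code this would be called as ' '.join(extract_surfaces(tagger.parse(text))). Printing the raw result of tagger.parse(text) for a failing review is a quick way to check which field actually holds the part of speech before relying on pos_index.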