Character encoding error when converting a CSV file to a JSON file in Python

I get a character encoding error when converting a CSV file to a JSON file in Python.

The code I'm using is as follows.

```python
#coding:utf-8

import csv, json

filename = 'test_code'

header = []
data = []

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line

with open(filename + '.csv', 'rU') as csvfile:
    spamreader = unicode_csv_reader(csvfile, dialect=csv.excel)
    is_first = True

    for row in spamreader:
        if is_first:
            header = row[:]
            is_first = False
            continue
        items = {}
        for i in range(0, len(row)):
            item = row[i]
            items[header[i]] = item
        data.append(items)

with open(filename+'.json', 'w') as f:
    json.dump(data, f, ensure_ascii=False, indent=2, encoding='utf8')
    f.close()
```

The CSV file I'm using is shown below (it actually runs to about 200 rows; only part of it is shown here).
Its character encoding is UTF-8.

cityid	name
id141011	札幌市
id141020	釧路市
id141038	函館市
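For comparison, here is a minimal Python 3 sketch of the same CSV-to-JSON conversion, assuming Python 3 is an option. It is not the poster's code: the function name `csv_to_json` and its `delimiter` parameter are hypothetical. The sample above displays as tab-separated while the original code uses the comma-based `csv.excel` dialect, so the delimiter is left configurable. In Python 3 the encoding is stated explicitly when each file is opened, so no manual `unicode()`/`.encode()` conversions are needed.

```python
import csv
import json


def csv_to_json(csv_path, json_path, delimiter=","):
    """Convert a UTF-8 CSV file (first row = header) into a JSON array of objects."""
    # newline="" is what the csv module documentation recommends for reader input;
    # encoding="utf-8" makes the decoding explicit instead of relying on the locale.
    with open(csv_path, encoding="utf-8", newline="") as src:
        # DictReader uses the first row as keys, replacing the manual is_first/header logic.
        rows = list(csv.DictReader(src, delimiter=delimiter))
    # ensure_ascii=False keeps characters such as 札幌市 unescaped, so the output
    # file must itself be opened with an encoding that can represent them.
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(rows, dst, ensure_ascii=False, indent=2)
    return rows
```

Usage, assuming the file really is tab-separated: `csv_to_json("test_code.csv", "test_code.json", delimiter="\t")`.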

Running the Python program as-is produces the following error.

```
Traceback (most recent call last):
  File "conv_csv_to_json.py", line 37, in <module>
    json.dump(data, f, ensure_ascii=False, indent=2, encoding='utf8')
  File "c:\Python27\lib\json\__init__.py", line 190, in dump
    fp.write(chunk)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-3: ordinal not in range(128)
```
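The traceback says that `fp.write(chunk)` received a chunk of JSON text containing non-ASCII characters and the ASCII codec was used to encode it. As a rough Python 3 analogy (an assumption for illustration, not a reproduction of Python 2 internals), the same class of error appears when the output stream is deliberately restricted to ASCII:

```python
import io
import json

data = [{"cityid": "id141011", "name": "札幌市"}]

# An in-memory text stream that can only encode ASCII, standing in for
# a file that implicitly encodes everything it receives as ASCII.
buf = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    json.dump(data, buf, ensure_ascii=False, indent=2)
    buf.flush()
    error = None
except UnicodeEncodeError as exc:
    # Raised when the chunk containing 札幌市 is encoded.
    error = exc
```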

From this error, I think the mistake is in the following part of the Python code.

    json.dump(data, f, ensure_ascii=False, indent=2, encoding='utf8')

So I have tried changing the code to

    json.dump(data, f, ensure_ascii=False, indent=2)

or

    json.dump(data, f, ensure_ascii=False, indent=2).encode('utf-8')

but neither fixes it.
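A note on the second attempt: `json.dump` writes to the file and returns `None`, so appending `.encode('utf-8')` cannot change what gets written (and here the exception is raised inside `json.dump`, before that call is reached). To encode the JSON text yourself, `json.dumps`, which returns the document as a string, is the function to use. A Python 3 sketch, with in-memory buffers standing in for the real files:

```python
import io
import json

data = [{"cityid": "id141011", "name": "札幌市"}]

# json.dump writes to the stream and returns None.
text_out = io.StringIO()
result = json.dump(data, text_out, ensure_ascii=False, indent=2)

# json.dumps returns the JSON document as a str, which can then be encoded.
encoded = json.dumps(data, ensure_ascii=False, indent=2).encode("utf-8")

# UTF-8 bytes go to a binary stream (for a real file, open it in "wb" mode).
binary_out = io.BytesIO()
binary_out.write(encoded)
```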

How can I fix this error?
If anyone knows, I would appreciate your help.

Addendum
Running the code as-is produces the following JSON file:

[
  {
    "cityid": "id141011", 
    "name":

The error occurs at the kanji, and nothing after that point is processed.


2 answers

Changing

```python
# decode UTF-8 back to Unicode, cell by cell:
yield [unicode(cell, 'utf-8') for cell in row]
```

to

```python
yield row
```

should also make it run without errors, I think.

Posted 2015/12/04 03:08

hiro-k


AudioStakes

2015/12/04 03:56

Thanks again! Changing it to yield row did not change the error message. When I changed both, to yield row and items[header[i]] = item.encode('utf-8'), the error message changed to the following:

    Traceback (most recent call last):
      File "conv_csv_to_json.py", line 33, in <module>
        items[header[i]] = item.encode('utf-8')
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

It feels like I'm almost there...

hiro-k

2015/12/04 04:04

The first fix keeps yield [unicode(cell, 'utf-8') for cell in row], which converts UTF-8 to unicode, and then converts unicode back to UTF-8 with item.encode('utf-8'). The second fix, yield row, keeps the values as UTF-8 without ever converting them to unicode. So applying both fixes makes no sense.
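The two fixes move in opposite directions between encoded bytes (Python 2's `str` here) and decoded text (Python 2's `unicode`). In Python 3 these are `bytes` and `str`; the sketch below shows both directions using the UTF-8 bytes of 札幌市 from the sample. It also hints at where the `UnicodeDecodeError` in the comment above comes from: Python 2's `str.encode` first decoded the byte string implicitly with the ASCII codec, which fails on the first non-ASCII byte.

```python
# UTF-8 bytes of "札幌市", as they would be read from the CSV file.
raw = b"\xe6\x9c\xad\xe5\xb9\x8c\xe5\xb8\x82"

# bytes -> text: the direction unicode(cell, 'utf-8') took in Python 2.
text = raw.decode("utf-8")

# text -> bytes: the direction item.encode('utf-8') took in Python 2.
back = text.encode("utf-8")

# Decoding the bytes as ASCII fails at the first non-ASCII byte; Python 2's
# str.encode performed this ASCII decode implicitly before encoding.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc
```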

AudioStakes

2015/12/04 04:09

Thank you for the explanation. I'll work through it while thinking about what each part of the code does.


Best answer

Changing

```python
items[header[i]] = item
```

to

```python
items[header[i]] = item.encode('utf-8')
```

should make it work, I think.

Posted 2015/12/04 03:04

Edited 2015/12/04 03:09

hiro-k


AudioStakes

2015/12/04 03:50

Thank you for the answer! I changed it to items[header[i]] = item.encode('utf-8'), but the error message stayed the same. The cause seems to be somewhere else.

hiro-k

2015/12/04 04:02 (edited)

I'm sorry... I forgot that you also need to change json.dump(data, f, ensure_ascii=False, indent=2, encoding='utf8') to json.dump(data, f, ensure_ascii=False, indent=2, encoding='utf-8') (utf8 → utf-8). The same applies to my second answer.
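Why the spelling matters: the codec registry treats `utf8` and `utf-8` as aliases for the same codec, but Python 2.7's `json` encoder appears to compare its `encoding` argument literally against the string `'utf-8'`. When the two differ, it decodes byte strings to `unicode` first, and those `unicode` chunks then hit the implicit ASCII encoding when written to a file opened with plain `open`. The Python 3 sketch below only illustrates the alias-versus-string distinction; it does not reproduce the Python 2 `json` internals.

```python
import codecs

# Both spellings resolve to the same codec in the registry...
same_codec = codecs.lookup("utf8").name == codecs.lookup("utf-8").name

# ...but a plain string comparison still treats them as different values.
same_string = "utf8" == "utf-8"
```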

AudioStakes

2015/12/04 04:04

Thank you! It's solved! So having utf8 instead of utf-8 was part of the cause. Sorry for posting such a simple mistake, and thank you very much.
