Python: error in Jupyter Notebook

Question

```
cls_names = []
total_images = 0
for gov in govs:
    file_list = os.listdir(base_path + gov + '/Annotations/')
    for file in file_list:
        total_images = total_images + 1
        if file =='.DS_Store':
            pass
        else:
            infile_xml = open(base_path + gov + '/Annotations/' +file)
            tree = ElementTree.parse(infile_xml)
            root = tree.getroot()
            for obj in root.iter('object'):
                cls_name = obj.find('name').text
                cls_names.append(cls_name)
print("total")
print("# of images：" + str(total_images))
print("# of labels：" + str(len(cls_names)))
```

```
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
 in ()
     13         else:
     14             infile_xml = open(base_path + gov + '/Annotations/' +file)
---> 15             tree = ElementTree.parse(infile_xml)
     16             root = tree.getroot()
     17             for obj in root.iter('object'):

/anaconda3/lib/python3.6/xml/etree/ElementTree.py in parse(source, parser)
   1194     """
   1195     tree = ElementTree()
-> 1196     tree.parse(source, parser)
   1197     return tree
   1198 

/anaconda3/lib/python3.6/xml/etree/ElementTree.py in parse(self, source, parser)
    595                 # It can be used to parse the whole source without feeding
    596                 # it with chunks.
--> 597                 self._root = parser._parse_whole(source)
    598                 return self._root
    599             while True:

/anaconda3/lib/python3.6/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 45: invalid start byte
```

### Background / what I want to achieve

I am a Python beginner. I took this code from GitHub and am running it in Jupyter Notebook as instructed, but I get an error and cannot proceed. I assume something in the code is wrong, but I have no idea what. I think the code is reading images and other data from the dataset, and the error seems to be related to 'utf-8'. Please tell me if there is any code I should add or remove.

### Problem / error message

I believe it is related to 'utf-8'.

### Relevant source code

'utf-8' codec can't decode byte 0xb0 in position 45: invalid start byte

I tried adding the following, but it did not help:

```
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import codecs
sys.stdout = codecs.getwriter('utf_8')(sys.stdout)
```

### Additional information (framework/tool versions, etc.)

This is the GitHub notebook I am following:
https://github.com/sekilab/RoadDamageDetector/blob/master/RoadDamageDatasetTutorial.ipynb

Accepted Answer

It might work if you specify the character encoding when opening the file.
```Python
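from xml.etree import ElementTree

encoding = "utf-8"
# encoding = "cp932"  # If utf-8 does not work, try this one instead.
with open("filename.xml", encoding=encoding) as f:
    tree = ElementTree.parse(f)  # parse() returns an ElementTree, not the root element
root = tree.getroot()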
```
Related character-encoding problems:
* [https://teratail.com/questions/137252](https://teratail.com/questions/137252)
* [https://teratail.com/questions/137233](https://teratail.com/questions/137233)
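
If it is not known in advance which encoding the annotation files use, the idea above can be wrapped in a small loop that tries several candidates. The following is only a sketch: `parse_with_fallback` is a hypothetical helper name, the candidate list is an assumption, and the demo file is created on the spot without an XML declaration. Note that cp932 accepts many byte sequences, so a successful decode does not prove the guess was right.

```python
import os
import tempfile
from xml.etree import ElementTree


def parse_with_fallback(path, encodings=("utf-8", "cp932")):
    """Hypothetical helper: try each candidate encoding until the file decodes."""
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as f:
                return ElementTree.parse(f)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("could not decode %s with any of %s" % (path, encodings))


# Demo: a small cp932-encoded file (made up for this example).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xml")
with open(demo_path, "w", encoding="cp932") as f:
    # \u3072\u3073 is Japanese text whose cp932 bytes are not valid utf-8.
    f.write("<annotation><note>\u3072\u3073</note>"
            "<object><name>D00</name></object></annotation>")

root = parse_with_fallback(demo_path).getroot()
demo_names = [obj.find("name").text for obj in root.iter("object")]
print(demo_names)
```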

Alternatively, how about passing the file name directly, as follows?

```Python
tree = ElementTree.parse("filename.xml")
```
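
When `ElementTree.parse` is given a path, it opens the file in binary mode itself, so the XML parser can use the encoding named in the file's XML declaration instead of the default that text-mode `open` applies. A minimal sketch of the difference, using a made-up ISO-8859-1 file that contains the same byte `0xb0` as in the error message (whether the real annotation files are ISO-8859-1 is an assumption, not something the question shows):

```python
import os
import tempfile
from xml.etree import ElementTree

# Demo file (made up): declared as ISO-8859-1 and containing the byte 0xb0.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xml")
with open(demo_path, "wb") as f:
    f.write(b'<?xml version="1.0" encoding="ISO-8859-1"?>'
            b"<annotation><object><name>D00 \xb0</name></object></annotation>")

# Text mode with utf-8 fails, as in the question.
try:
    with open(demo_path, encoding="utf-8") as f:
        ElementTree.parse(f)
    text_mode_failed = False
except UnicodeDecodeError:
    text_mode_failed = True

# Passing the path lets the parser honor the declared encoding.
tree = ElementTree.parse(demo_path)
latin1_names = [obj.find("name").text for obj in tree.getroot().iter("object")]
```

In the loop from the question, this would mean replacing the `open(...)` call with `ElementTree.parse(base_path + gov + '/Annotations/' + file)`.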

Answer

Could the XML file you are trying to read be saved in a character encoding other than utf-8?

Option 1: Re-save the target file as utf-8 and run the code again.

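Option 2: Modify the code so that it reads the target file in the file's own character encoding, referring to, for example, "[How to read sjis files that fail with ElementTree - ゲームエンジニアな日々](http://shive.hateblo.jp/entry/20091116/1258348316)".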
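
Option 1 can also be done from Python rather than in an editor. The sketch below assumes the source encoding is already known (cp932 here, purely as an example) and that the file has no conflicting encoding declaration; `resave_as_utf8` is a hypothetical helper name.

```python
import os
import tempfile
from xml.etree import ElementTree


def resave_as_utf8(src_path, dst_path, source_encoding):
    """Hypothetical helper: read src_path in source_encoding, write it as UTF-8."""
    with open(src_path, encoding=source_encoding) as f:
        tree = ElementTree.parse(f)
    # write() encodes the output as UTF-8 and adds a matching XML declaration.
    tree.write(dst_path, encoding="utf-8", xml_declaration=True)


# Demo with a made-up cp932 file.
work_dir = tempfile.mkdtemp()
src_path = os.path.join(work_dir, "sjis.xml")
dst_path = os.path.join(work_dir, "utf8.xml")
with open(src_path, "w", encoding="cp932") as f:
    f.write("<annotation><note>\u3072\u3073</note></annotation>")

resave_as_utf8(src_path, dst_path, "cp932")
resaved_note = ElementTree.parse(dst_path).getroot().find("note").text
```

Writing to a separate `dst_path` keeps the original file untouched in case the encoding guess turns out to be wrong.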

Answer

Perhaps the file specified by infile_xml is not a valid XML file? I cannot say for sure, since the source is not shown.
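
One way to check this is to report which file fails instead of letting the whole loop stop. The code in the question skips only `.DS_Store`, so any other non-XML entry in the Annotations directory would still be handed to the parser. The following is a sketch under assumptions: `collect_class_names` is a hypothetical helper, and the demo directory and its files are made up.

```python
import os
import tempfile
from xml.etree import ElementTree


def collect_class_names(annotation_dir):
    """Hypothetical helper: return (class names, names of skipped files)."""
    cls_names, skipped = [], []
    for file_name in sorted(os.listdir(annotation_dir)):
        if not file_name.endswith(".xml"):
            skipped.append(file_name)  # e.g. .DS_Store
            continue
        path = os.path.join(annotation_dir, file_name)
        try:
            tree = ElementTree.parse(path)
        except (ElementTree.ParseError, ValueError) as exc:
            # ValueError also covers UnicodeDecodeError, which is a subclass of it.
            print("skipping %s: %s" % (file_name, exc))
            skipped.append(file_name)
            continue
        for obj in tree.getroot().iter("object"):
            cls_names.append(obj.find("name").text)
    return cls_names, skipped


# Demo directory with one valid file and two files that are not XML.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "good.xml"), "wb") as f:
    f.write(b"<annotation><object><name>D00</name></object></annotation>")
with open(os.path.join(demo_dir, "broken.xml"), "wb") as f:
    f.write(b"\xb0\xb0 this is not XML")
with open(os.path.join(demo_dir, ".DS_Store"), "wb") as f:
    f.write(b"\x00\x00\x00\x01")

names, skipped = collect_class_names(demo_dir)
```

The printed file names show which entries to inspect by hand.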
