I want to get rid of the UnicodeDecodeError raised when running morphological analysis on scraped text

Question

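### Background / what I want to achieve
I'm building a system in Python that takes a URL, scrapes the page with bs4, strips the tags, saves the result as a text file, and then runs morphological analysis on it with Juman.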

### Problem / error message
An error occurs during the morphological analysis step.

```
Traceback (most recent call last):
  File "Juman_FILE.py", line 35, in <module>
    for l in open(FileName,"r",encoding=Encode):
  File "C:\Users\y2j\AppData\Local\Programs\Python\Python36-32\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 13: invalid start byte
```
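The same message can be reproduced without scraping anything. A minimal sketch, assuming the text is Japanese encoded as cp932; the 13 ASCII characters in front are hypothetical padding added only so the offset matches the traceback, since the real first bytes of the file are unknown:

```python
# Hypothetical padding: 13 ASCII bytes so the bad byte lands at position 13,
# as in the traceback. The real file's first 13 bytes are not known.
PADDING = "x" * 13

# "名" encodes to b"\x96\xbc" in cp932, so 0x96 ends up at offset 13.
data = (PADDING + "名古屋").encode("cp932")

try:
    data.decode("utf-8")
except UnicodeDecodeError as err:
    # In UTF-8, 0x96 (0b10010110) is a continuation byte and cannot start a character.
    utf8_error = err
    print(err)
else:
    utf8_error = None

# The same bytes decode without error when the matching codec is used.
print(data.decode("cp932"))
```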

### Relevant source code

```python
# Scraping program
import urllib.request
import bs4
from bs4 import NavigableString,Declaration,Comment


url = 'https://jouhou.nagoya/nagoya-miyage-matome/'
#soup = bs4.BeautifulSoup(urllib.request.urlopen(url).read(),"html.parser")
resulttxt='解析'

def getNavigableStrings(soup):
  if isinstance(soup, NavigableString):
    if type(soup) not in (Comment, Declaration) and soup.strip():
      yield soup
  elif soup.name not in ('script', 'style'):
    for c in soup.contents:
      for g in getNavigableStrings(c):
        yield g

soup = bs4.BeautifulSoup(urllib.request.urlopen(url).read(),"html.parser")

text = '\n'.join(getNavigableStrings(soup))

#text = text.decode("utf-8")

import codecs
file_object= codecs.open(resulttxt + ".txt", "wb", "cp932", "ignore")
file_object.write(text + "\n")
file_object.close()


# JUMAN program
import subprocess
import sys
import collections



# Uses a file as an output buffer

# Name of the file to analyze
FileName="解析.txt"
# Maximum number of sentences to analyze
LoadCount=10000
# No upper limit
#LoadCount=float("inf")
# Number of ranking entries to display
RankCount=10

# Encodings to use
Encode="utf-8"
JumanEnc="sjis"
# Command to run
JumanCommand=["juman"]
# Name of the buffer file
Buffer="tmp"
# Start the external process
Juman=subprocess.Popen(JumanCommand,stdin=subprocess.PIPE,stdout=open(Buffer,"w"))

# TF values
TF=collections.defaultdict(lambda:0)
# Total number of words
WordsCount=0

# Pass the text to the morphological analyzer one sentence at a time
C=0
for l in open(FileName,"r",encoding=Encode):
    l=l.strip()
    if len(l)==0:
        continue
    for i in l.split("。"):
        C+=1
        sys.stdout.write("{0}/{1}\r".format(C,LoadCount))
        sys.stdout.flush()
        # It doesn't work unless the encoding is specified
        Juman.stdin.write((i+"\n").encode(JumanEnc))
        # Leave the loop once the limit is reached
        if C>=LoadCount:
            break
    if C>=LoadCount:
        break
# Stop JUMAN (the buffer file is closed automatically)
Juman.stdin.close()
Juman.wait()
# Read the results and count
for l in open(Buffer,"r",encoding=JumanEnc):
    l=l.strip().split(" ")
    # At the end of a sentence, go back to the top of the loop
    if len(l)<2:
        continue
    # Restrict to nouns
    if l[3]!="名詞":
        continue
    Word=l[0]
    # Exclude alternative analyses
    if Word[0]=="@":
        continue
    TF[Word]+=1
    WordsCount+=1
# Convert the counts to TF values
for k in TF.keys():
    TF[k]/=WordsCount
# Sort by TF value, descending
Ranking=list(TF.items())
Ranking.sort(key=lambda x:x[1],reverse=True)
# Display
print()
print("="*40)
print("順位\t単語\tTF値")
print("="*40)
for r,(k,v) in zip(range(1,RankCount+1),Ranking):
    print("{0}\t{1}\t{2}".format(r,k,v))
print("="*40)
```
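The counting half of this code (reading Juman's output and computing TF values) can be checked on its own, without Juman installed. A minimal sketch: the input lines are hand-written examples that only imitate Juman's space-separated output, where the first field is the surface form and the fourth is the part of speech, as the code above assumes; `tf_ranking` is a hypothetical helper name. It swaps the `defaultdict` for `collections.Counter` and requires at least four fields so that a short line cannot cause an `IndexError`.

```python
from collections import Counter

# Hand-written lines imitating Juman's output format (an assumption):
# field 0 = surface form, field 3 = part of speech, "EOS" ends a sentence.
SAMPLE_OUTPUT = [
    "名古屋 なごや 名古屋 名詞 6 地名 4 * 0 * 0 NIL",
    "の の の 助詞 9 接続助詞 3 * 0 * 0 NIL",
    "土産 みやげ 土産 名詞 6 普通名詞 1 * 0 * 0 NIL",
    "EOS",
    "名古屋 なごや 名古屋 名詞 6 地名 4 * 0 * 0 NIL",
    "EOS",
]


def tf_ranking(lines, rank_count):
    """Return the top `rank_count` nouns as (word, TF) pairs, highest TF first."""
    counts = Counter()
    for line in lines:
        fields = line.strip().split(" ")
        # Skip EOS and any line too short to carry a part of speech.
        if len(fields) < 4:
            continue
        # Keep nouns only.
        if fields[3] != "名詞":
            continue
        word = fields[0]
        # Skip alternative analyses, which start with "@".
        if word.startswith("@"):
            continue
        counts[word] += 1

    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() already sorts by count; dividing by total keeps that order.
    return [(word, n / total) for word, n in counts.most_common(rank_count)]


for rank, (word, tf) in enumerate(tf_ranking(SAMPLE_OUTPUT, 10), start=1):
    print("{0}\t{1}\t{2}".format(rank, word, tf))
```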

### What I tried
I followed
http://hikm.hatenablog.com/entry/20130328/1364492471
but couldn't get it to work.


Accepted Answer

This is probably because all or part of `解析.txt` is not encoded as `utf-8`.

Here is how I reached that conclusion.

```
for l in open(FileName,"r",encoding=Encode):
```

This is the line that raises the error. Substituting the actual values for the variables gives the following code:

```
for l in open('解析.txt',"r",encoding='utf-8'):
```

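Since this raised `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 13: invalid start byte`, all or part of `解析.txt` must not be `utf-8`. In fact, the scraping part of the same program writes `解析.txt` with `codecs.open(resulttxt + ".txt", "wb", "cp932", "ignore")`, that is, in cp932 (Shift_JIS). Byte 0x96 is an ordinary lead byte in cp932, but in UTF-8 it can only appear as a continuation byte, which is why the decoder reports an "invalid start byte". The file has to be read with the same encoding it was written in, for example `open(FileName, "r", encoding="cp932")`.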
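A minimal sketch of the mismatch and of one way to resolve it. It uses a temporary file with the hypothetical name `sample.txt` instead of the real `解析.txt`, and hypothetical sample text instead of scraped content. The file is written the way the question's scraping code does it (cp932 with unencodable characters dropped; the built-in `open` with `errors="ignore"` stands in for `codecs.open`) and then read back with both encodings. Changing the writing side to `utf-8` would work too, as long as both sides agree.

```python
import os
import tempfile

SAMPLE_TEXT = "名古屋のお土産\nういろう"  # hypothetical stand-in for the scraped text


def write_like_scraper(path, text):
    # cp932 with errors="ignore", like codecs.open(..., "cp932", "ignore")
    with open(path, "w", encoding="cp932", errors="ignore") as f:
        f.write(text + "\n")


def read_lines(path, encoding):
    # Iterate line by line and strip, as the Juman part of the question does.
    with open(path, "r", encoding=encoding) as f:
        return [line.strip() for line in f]


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")  # hypothetical file name
    write_like_scraper(path, SAMPLE_TEXT)

    try:
        read_lines(path, "utf-8")  # what the question's code does
        utf8_failed = False
    except UnicodeDecodeError as err:
        utf8_failed = True
        print("utf-8:", err.reason)

    # Reading with the encoding the file was written in succeeds.
    cp932_lines = read_lines(path, "cp932")
    print("cp932:", cp932_lines)
```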