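Background / what I want to achieve
I want to scrape the capital names from this site (https://scrapethissite.com/pages/simple/) and write them to a CSV file.
Problem / error message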
'shift_jis' codec can't encode character '\xed' in position 4
Relevant source code
Python
import urllib.request
import urllib.error
from bs4 import BeautifulSoup
import csv
import numpy

l_cap_name = []

url = "https://scrapethissite.com/pages/simple/"
headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36'
           }
request = urllib.request.Request(url=url, headers=headers)

try:
    response = urllib.request.urlopen(request)
except urllib.error.HTTPError as e:
    print('HTTPError: {}'.format(e.code))
except urllib.error.URLError as e:
    print('URLError: {}'.format(e.reason))
else:
    for term in BeautifulSoup(response, 'lxml').find_all('span', class_='country-capital'):
        l_cap_name.append([term.string])

# print(l_cap_name)


with open("l_cap_name.csv", "w", encoding="Shift_jis") as f:
    writer = csv.writer(f, lineterminator="\n")  # create the writer object; end each row with a newline

    writer.writerows(l_cap_name)
Supplementary information (framework/tool versions, etc.)
Here is the output of print(l_cap_name):
[['Andorra la Vella'], ['Abu Dhabi'], ['Kabul'], ["St. John's"], ['The Valley'], ['Tirana'], ['Yerevan'], ['Luanda'], ['None'], ['Buenos Aires'], ['Pago Pago'], ['Vienna'], ['Canberra'], ['Oranjestad'], ['Mariehamn'], ['Baku'], ['Sarajevo'], ['Bridgetown'], ['Dhaka'], ['Brussels'], ['Ouagadougou'], ['Sofia'], ['Manama'], ['Bujumbura'], ['Porto-Novo'], ['Gustavia'], ['Hamilton'], ['Bandar Seri Begawan'], ['Sucre'], ['Kralendijk'], ['Brasília'], ['Nassau'], ['Thimphu'], ['None'], ['Gaborone'], ['Minsk'], ['Belmopan'], ['Ottawa'], ['West Island'], ['Kinshasa'], ['Bangui'], ['Brazzaville'], ['Bern'], ['Yamoussoukro'], ['Avarua'], ['Santiago'], ['Yaoundé'], ['Beijing'], ['Bogotá'], ['San José'], ['Havana'], ['Praia'], ['Willemstad'], ['Flying Fish Cove'], ['Nicosia'], ['Prague'], ['Berlin'], ['Djibouti'], ['Copenhagen'], ['Roseau'], ['Santo Domingo'], ['Algiers'], ['Quito'], ['Tallinn'], ['Cairo'], ['Laâyoune / El Aaiún'], ['Asmara'], ['Madrid'], ['Addis Ababa'], ['Helsinki'], ['Suva'], ['Stanley'], ['Palikir'], ['Tórshavn'], ['Paris'], ['Libreville'], ['London'], ["St. George's"], ['Tbilisi'], ['Cayenne'], ['St Peter Port'], ['Accra'], ['Gibraltar'], ['Nuuk'], ['Bathurst'], ['Conakry'], ['Basse-Terre'], ['Malabo'], ['Athens'], ['Grytviken'], ['Guatemala City'], ['Hagåtña'], ['Bissau'], ['Georgetown'], ['Hong Kong'], ['None'], ['Tegucigalpa'], ['Zagreb'], ['Port-au-Prince'], ['Budapest'], ['Jakarta'], ['Dublin'], ['None'], ['Douglas'], ['New Delhi'], ['None'], ['Baghdad'], ['Tehran'], ['Reykjavik'], ['Rome'], ['Saint Helier'], ['Kingston'], ['Amman'], ['Tokyo'], ['Nairobi'], ['Bishkek'], ['Phnom Penh'], ['Tarawa'], ['Moroni'], ['Basseterre'], ['Pyongyang'], ['Seoul'], ['Kuwait City'], ['George Town'], ['Astana'], ['Vientiane'], ['Beirut'], ['Castries'], ['Vaduz'], ['Colombo'], ['Monrovia'], ['Maseru'], ['Vilnius'], ['Luxembourg'], ['Riga'], ['Tripoli'], ['Rabat'], ['Monaco'], ['Chişinău'], ['Podgorica'], ['Marigot'], ['Antananarivo'], ['Majuro'], ['Skopje'], ['Bamako'], ['Naypyitaw'], ['Ulan Bator'], ['Macao'], ['Saipan'], ['Fort-de-France'], ['Nouakchott'], ['Plymouth'], ['Valletta'], ['Port Louis'], ['Malé'], ['Lilongwe'], ['Mexico City'], ['Kuala Lumpur'], ['Maputo'], ['Windhoek'], ['Noumea'], ['Niamey'], ['Kingston'], ['Abuja'], ['Managua'], ['Amsterdam'], ['Oslo'], ['Kathmandu'], ['Yaren'], ['Alofi'], ['Wellington'], ['Muscat'], ['Panama City'], ['Lima'], ['Papeete'], ['Port Moresby'], ['Manila'], ['Islamabad'], ['Warsaw'], ['Saint-Pierre'], ['Adamstown'], ['San Juan'], ['None'], ['Lisbon'], ['Melekeok'], ['Asunción'], ['Doha'], ['Saint-Denis'], ['Bucharest'], ['Belgrade'], ['Moscow'], ['Kigali'], ['Riyadh'], ['Honiara'], ['Victoria'], ['Khartoum'], ['Stockholm'], ['Singapore'], ['Jamestown'], ['Ljubljana'], ['Longyearbyen'], ['Bratislava'], ['Freetown'], ['San Marino'], ['Dakar'], ['Mogadishu'], ['Paramaribo'], ['Juba'], ['São Tomé'], ['San Salvador'], ['Philipsburg'], ['Damascus'], ['Mbabane'], ['Cockburn Town'], ["N'Djamena"], ['Port-aux-Français'], ['Lomé'], ['Bangkok'], ['Dushanbe'], ['None'], ['Dili'], ['Ashgabat'],
['Tunis'], ["Nuku'alofa"], ['Ankara'], ['Port of Spain'], ['Funafuti'], ['Taipei'], ['Dodoma'], ['Kiev'], ['Kampala'], ['None'], ['Washington'], ['Montevideo'], ['Tashkent'], ['Vatican City'], ['Kingstown'], ['Caracas'], ['Road Town'], ['Charlotte Amalie'], ['Hanoi'], ['Port Vila'], ['Mata-Utu'], ['Apia'], ['Pristina'], ['Sanaa'], ['Mamoudzou'], ['Pretoria'], ['Lusaka'], ['Harare']]
Thank you in advance.
###Addendum
Python
s = '\xed'
print(s)

# Output:
# í
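This showed that the character í is the cause. I could not work out how to convert the character encoding, however, so I tried writing the file as utf-8 instead of Shift_jis.
That worked, so I will treat the question as resolved for now.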
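For reference, the workaround above can be sketched as follows. This is only a minimal sketch under a few assumptions: the helper names `find_unencodable` and `write_csv` are hypothetical, the sample rows are a small hand-picked subset of the list shown earlier, and the files are written to a temporary directory rather than to `l_cap_name.csv`. It first lists which values Shift_JIS cannot represent, then writes the CSV as `utf-8-sig` (UTF-8 with a BOM, which Excel on Windows generally detects), and finally shows the lossy alternative of keeping Shift_JIS with `errors="replace"`.

```python
import csv
import os
import tempfile


def find_unencodable(rows, encoding="shift_jis"):
    """Return the cell values that cannot be encoded with `encoding`."""
    bad = []
    for row in rows:
        for cell in row:
            try:
                cell.encode(encoding)
            except UnicodeEncodeError:
                bad.append(cell)
    return bad


def write_csv(rows, path, encoding="utf-8-sig", errors="strict"):
    """Write rows to a CSV file using the given encoding and error policy."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding=encoding, errors=errors, newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)


# Hand-picked sample from the scraped list (assumption: a subset is enough to illustrate).
sample = [["Tokyo"], ["Brasília"], ["San José"]]

# Values containing characters that Shift_JIS cannot represent.
print(find_unencodable(sample))

with tempfile.TemporaryDirectory() as tmp:
    # Option 1: UTF-8 keeps every character intact.
    write_csv(sample, os.path.join(tmp, "capitals_utf8.csv"))
    # Option 2: stay on Shift_JIS, replacing unencodable characters with "?".
    write_csv(sample, os.path.join(tmp, "capitals_sjis.csv"),
              encoding="shift_jis", errors="replace")
```

If the file really has to be Shift_JIS, option 2 loses the accented characters (for example, "Brasília" becomes "Bras?lia"), so UTF-8 is probably the safer choice when the consumer of the CSV can read it.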