Beautiful Soup 4 web scraping: how to handle "list index out of range"

Background / what I want to achieve

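I am using Beautiful Soup 4 to collect nationwide data from a particular site.
One of the items I collect is the "update date" (variable up_date), but for certain shops a "list index out of range" error is raised.
I would like to know why the error occurs and, if possible, see source code that can retrieve the update date for the shops where it fails.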

Problem / error message

Exception has occurred: IndexError
list index out of range
up_date = shop.find_all('div',class_='lead')[0].get_text(strip=True)
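For reference, a minimal sketch of how this exception can arise: find_all returns an empty list when nothing matches, so indexing that list with [0] raises IndexError. The HTML below is made up for illustration and is not the real page markup, and html.parser is used instead of lxml only to keep the snippet free of extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the container has no <div class="lead"> inside it.
html = '<main><div class="box"><p>no update date here</p></div></main>'
soup = BeautifulSoup(html, 'html.parser')
container = soup.select_one('main > div')

# find_all returns an empty result list when nothing matches.
matches = container.find_all('div', class_='lead')
print(len(matches))

try:
    up_date = matches[0].get_text(strip=True)
except IndexError as exc:
    # Indexing the empty list is what raises the exception.
    error_message = str(exc)
    print(error_message)
```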

Relevant source code

python3
import random
import re
import time
from datetime import datetime, timedelta, timezone

import requests
from bs4 import BeautifulSoup

# Not shown in the original post; assumed values so that the snippet runs.
headers = {'User-Agent': 'Mozilla/5.0'}      # assumed placeholder
JST = timezone(timedelta(hours=9), 'JST')    # assumed: Japan Standard Time

if __name__ == "__main__":

    # Fetch the top page from the web using requests
    base_url = 'https://p-town.dmm.com'
    target_url = '/'
    r = requests.get(base_url + target_url, headers=headers)
    soup = BeautifulSoup(r.text, 'lxml')

    selector = 'body > div.o-layout > div > div.o-container > main > section.default-box.-shop > div > div li'
    # Loop over prefectures
    for pref_ in soup.select(selector):
        tmp_dict = {}
        string_ = pref_.text
        target_url = pref_.next_element.attrs.get('href')
        #target_url = '/shops/saga'    # for debugging
        area_name = target_url.rsplit('/', 1)[1]
        tmp_dict = {area_name:{}}
        r = requests.get(base_url + target_url, headers=headers)
        todofuken = target_url
        soup= BeautifulSoup(r.text, 'lxml')
        num = 0
        # Loop over cities
        for city_ in soup.find_all('a', class_='link', href=re.compile(r'/shops/' + area_name + r'/area/\d+')):
            target_url = city_.attrs.get('href')
            city_id = target_url.rsplit('/', 1)[1]
            tmp_dict[area_name].update({city_id:{}})
            print(city_.text + ':' + base_url + target_url)
            r = requests.get(base_url + target_url, headers=headers)
            soup = BeautifulSoup(r.text, 'lxml')
            selector = 'body > div.o-layout > div > div.o-container > main > section li'
            nextpage = True
            while nextpage:
                # Check whether there is a next page
                for pages_ in soup.select(selector):
                    if pages_.attrs.get('class')[0] == 'item':
                        if pages_.text == '>':
                            if pages_.get('href') is not None:
                                nextpage = True
                                break
                        else:
                            nextpage = False
                # Loop over registered halls
                for pages_ in soup.select(selector):
                    kisyu_list = {}
                    if pages_.attrs.get('class')[0] == 'unit':
                        # Collect hall information
                        num += 1
                        target_url = pages_.next_element.attrs.get('href')
                        hall_id = target_url.rsplit('/', 1)[1]
                        time.sleep(random.randint(1, 10))   # sleep for 1 to 10 seconds
                        r2 = requests.get(base_url + target_url, headers=headers)
                        get_date = datetime.now(JST)
                        soup2 = BeautifulSoup(r2.text, 'lxml')
                        tmp_dict[area_name][city_id].update({hall_id:{}})
                        shop_url = base_url + todofuken + '/' + hall_id
                        print(str(num) + '[' + hall_id + ']:' + shop_url)
                        r3 = requests.get(shop_url, headers=headers)
                        soup3 = BeautifulSoup(r3.text, 'lxml')
                        # Note: this overwrites the list-page selector used by the outer loops
                        selector = 'body > div.o-layout > div > div.o-container > main > div:nth-child(4)'
                        # Get machine information
                        for shop in soup3.select(selector):
                            # Get the update date (IndexError is raised here for some shops)
                            up_date = shop.find_all('div',class_='lead')[0].get_text(strip=True)
                            up_date = up_date.replace('更新日:', '')
                            up_date = "".join(up_date.split())
                            tmp_dict[area_name][city_id][hall_id].update({up_date:{}})
                            # Get the machine type
                            for type_ in soup3.find_all('h4', class_='title', id=re.compile(r'anc-machine-rate-icon-\d+')):
                                machine_type = type_.text
                                tmp_dict[area_name][city_id][hall_id][up_date].update({machine_type:{}})
                                # Get the machine ID
                                for a in shop.select('a[class="link"]'):
                                    if 'href' in a.attrs:
                                        machine_url = a.attrs['href']
                                        machine_id = machine_url.rsplit('/', 1)[1]
                                    else:
                                        machine_id = '機種ID無'   # "no machine ID"
                                    tmp_dict[area_name][city_id][hall_id][up_date][machine_type].update({machine_id:{}})
                                    # Get the number of machines ('台数')
                                    machine_num = a.parent.next_sibling.next_element.get_text(strip=True)
                                    kisyu_list['台数'] = machine_num
                                    tmp_dict[area_name][city_id][hall_id][up_date][machine_type][machine_id].update(kisyu_list)
                    # Load the next page; end the loop if there is none
                    elif pages_.attrs.get('class')[0] == 'item':
                        if pages_.text == '>':
                            if pages_.next.attrs.get('href') is not None:
                                target_url = pages_.next.attrs.get('href')
                                r = requests.get(target_url, headers=headers)
                                soup = BeautifulSoup(r.text, 'lxml')
                            else:
                                nextpage = False
                            break

What I tried

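There are many questions about this error on this site as well, and I looked through a number of them.
I understand the error to mean "the data you are trying to retrieve does not exist."
So I compared the update date information of a shop where the error occurs with that of a shop where it does not, but I could not see any particular difference.
■ Shop where the error occurs
https://p-town.dmm.com/shops/hokkaido/1048
■ Its update date

■ Shop where the error does not occur
https://p-town.dmm.com/shops/hokkaido/1038
■ Its update date

I would like to know why the error occurs and, if possible, see source code that can retrieve the update date for the shops where it fails.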
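One way to narrow this down is to count how many div.lead elements exist inside the container picked by the selector versus in the whole document. The sketch below uses two made-up pages (not the real site markup) to show one possible cause: a positional selector such as div:nth-child(4) points at a different block as soon as a page has one extra element before it. Whether this is what happens on the failing shop page is an assumption, not something confirmed here. The helper name count_lead_divs and the sample dates are hypothetical.

```python
from bs4 import BeautifulSoup

def count_lead_divs(html, container_selector):
    """Return (count inside the selected containers, count in the whole document)."""
    soup = BeautifulSoup(html, 'html.parser')
    in_container = sum(
        len(box.find_all('div', class_='lead'))
        for box in soup.select(container_selector)
    )
    in_document = len(soup.find_all('div', class_='lead'))
    return in_container, in_document

# Hypothetical pages: same content, but page_b has one extra block at the top.
page_a = ('<main><div>1</div><div>2</div><div>3</div>'
          '<div><div class="lead">更新日: 2019/06/12</div></div></main>')
page_b = ('<main><div>extra</div><div>1</div><div>2</div><div>3</div>'
          '<div><div class="lead">更新日: 2019/06/12</div></div></main>')

selector = 'main > div:nth-child(4)'
print(count_lead_divs(page_a, selector))  # the 4th child holds the date
print(count_lead_divs(page_b, selector))  # the 4th child is now a different block
```

If the first number is 0 while the second is not, the date is on the page but outside the selected container.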

Also, if the date cannot be retrieved, I am considering handling it with the if statement below, but I would appreciate hearing about any better approach.

# Get the update date
date_ = shop.find_all('div',class_='lead')
if len(date_) == 0:
    up_date = ''
else:
    up_date = date_[0].string
    up_date = up_date.replace('更新日:', '')
    up_date = "".join(up_date.split())

Additional information (framework/tool versions, etc.)


1 answer

Self-resolved

Changing the update date extraction to the following resolved the issue.
Sorry for the trouble.

python3
# Search the whole document (soup3) for the update date
for shop in soup3.find_all('div',class_='lead'):
    up_date = shop.get_text(strip=True)
    up_date = up_date.replace('更新日:', '')
    up_date = ''.join(up_date.split())

Posted 2019/06/13 04:36

nasu0922

Total score: 17