Python scraping: how to return early when a specified item is missing

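Background

I am using Python to scrape a site that works like a product review site.
I am also sending the scraped data to a Google spreadsheet with gspread.

Target URL:

"https://xxx/review_list.aspx?pid=<5- or 6-digit number>"

This URL shows the list of reviews for a single product.
I want to loop over the pid values with a for statement.

What I want to achieve

When I open a review list and it says that there are no reviews, I want to skip it and move on to the next pid.

Current source code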

Python
import gspread
import requests
from bs4 import BeautifulSoup
from oauth2client.service_account import ServiceAccountCredentials
import time


for page in range(91610, 91620):
    url = "https://xxx/review_list.aspx?pid={}".format(page)
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')
    time.sleep(1.0)

    unelements = soup.select(".review_list li")
    item_name1 = soup.select("#breadcrumb ul.cf li:nth-of-type(3)")
    item_name2 = soup.select("#breadcrumb ul.cf li:nth-of-type(4)")
    review = soup.select('.review_review_text')
    post_time = soup.select('.review_info')
    hyoka = soup.select(".product_rep")

    # Prepare for writing to the spreadsheet (authentication)
    scope = [...]

    credentials = ServiceAccountCredentials.from_json_keyfile_name('xxx.json', scope)
    gc = gspread.authorize(credentials)
    wks = gc.open('gspreadサンプル').sheet1

    def sample(unelements):
        if unelements == "レビューの投稿はありません":
            return
        else:
            for index, e in enumerate(item_name1):
                num = index+1
                wks.update_acell('A'+str(num+1), e.get_text())

            for index, e in enumerate(item_name2):
                num = index+1
                wks.update_acell('B'+str(num+1), e.get_text())

            for index, e in enumerate(review):
                num = index+1
                wks.update_acell('E'+str(num+1), e.get_text())

            for index, e in enumerate(post_time):
                num = index+1
                wks.update_acell('C'+str(num+1), e.get_text()[0:11])

            for index, e in enumerate(hyoka):
                num = index+1
                text_ = e.i["class"][0]
                text_ = text_.replace('rep_','')
                text_ = text_.replace('_','.')
                wks.update_acell('D'+str(num+1), text_)

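Current result

No errors appear in the terminal, but none of the items are written to the spreadsheet.

Incidentally, if I remove the def sample() function (including the if/else) and write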

Python
for index, e in enumerate(item_name1):
    num = index+1
    wks.update_acell('A'+str(num+1), e.get_text())

for index, e in enumerate(item_name2):
    num = index+1
    wks.update_acell('B'+str(num+1), e.get_text())

for index, e in enumerate(review):
    num = index+1
    wks.update_acell('E'+str(num+1), e.get_text())

for index, e in enumerate(post_time):
    num = index+1
    wks.update_acell('C'+str(num+1), e.get_text()[0:11])

for index, e in enumerate(hyoka):
    num = index+1
    text_ = e.i["class"][0]
    text_ = text_.replace('rep_','')
    text_ = text_.replace('_','.')
    wks.update_acell('D'+str(num+1), text_)

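When written this way, item_name2 (column B) exists even on pages with no reviews, so the product name is written even though there is no review. As a result, the rows for the other items, such as the reviews and ratings, are shifted by that amount.

Is it possible to bail out with return or something similar when the page shows the "no reviews" message?

I do not understand Python deeply yet, and I suspect I am misusing functions or if statements.

I would appreciate any guidance. Thank you in advance.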

aokikenichi

2020/08/11 09:38

Where in the response does the "no reviews" message show up? I think you can branch on that.

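2 answers

Best answer

You can handle this with an if statement.
Instead of comparing with ==, write
if page.find('li', string='レビューの投稿はありません'):
If find() returns None, the processing in the else branch runs.
If it does not return None (that is, the matching element exists), the function returns.

The code below may also be somewhat cleaner, so please use it as a reference.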
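
To illustrate the difference, here is a minimal sketch. The HTML snippets are made up for the example (the real page markup is not shown in the question), and has_no_reviews is a hypothetical helper name. It shows that select() returns a list, so comparing it to a string with == is always False, while find() returns either a tag or None. Because string= with a plain str needs an exact match, the sketch uses a regular expression to tolerate surrounding whitespace; that is an assumption about the page, not something the question confirms.

```python
import re

from bs4 import BeautifulSoup

NO_REVIEW_TEXT = "レビューの投稿はありません"

# Made-up HTML for illustration; the real page structure may differ.
EMPTY_PAGE = '<ul class="review_list"><li> レビューの投稿はありません </li></ul>'
REVIEW_PAGE = '<ul class="review_list"><li>Great product</li></ul>'


def has_no_reviews(soup):
    """Return True when an <li> contains the "no reviews" message."""
    # A compiled regex matches the text anywhere inside the <li>,
    # so leading or trailing whitespace does not break the check.
    pattern = re.compile(re.escape(NO_REVIEW_TEXT))
    return soup.find("li", string=pattern) is not None


empty_soup = BeautifulSoup(EMPTY_PAGE, "html.parser")
review_soup = BeautifulSoup(REVIEW_PAGE, "html.parser")

# select() returns a list of tags, so == against a str is never True.
list_equals_text = empty_soup.select(".review_list li") == NO_REVIEW_TEXT

print(list_equals_text)
print(has_no_reviews(empty_soup))
print(has_no_reviews(review_soup))
```

Note also that in the question's code, sample() is defined inside the loop but never called, so neither branch runs. That alone would explain why nothing reaches the spreadsheet.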

python
import time

import requests
from bs4 import BeautifulSoup


def sample(pageid):
    url = 'https://furunavi.jp/review_list.aspx?pid={}'.format(pageid)
    res = requests.get(url)
    soup = BeautifulSoup(res.content, 'html.parser')
    page_check(soup)


def page_check(page):
    elems = [".review_list li", "#breadcrumb ul.cf li:nth-of-type(3)", "#breadcrumb ul.cf li:nth-of-type(4)",
             '.review_review_text', '.review_info', ".product_rep"]
    # Skip this page when it shows the "no reviews posted" message.
    if page.find('li', string='レビューの投稿はありません'):
        return
    else:
        # Gather the information from the fetched page into a single list.
        # Write the rest of the processing you want to do here.
        elements = [page.select(e) for e in elems]


for page in range(91610, 91620):
    sample(page)

    time.sleep(1)
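
As one possible way to fill in the else branch, the sketch below builds one spreadsheet row per review, so the product name is only written when a review exists and the columns cannot drift apart. build_rows and rating_from_class are hypothetical helper names, the column order (A to E) follows the question's code, and the inputs are plain strings assumed to have been extracted beforehand (for example with get_text()).

```python
def rating_from_class(class_name):
    """Convert a CSS class such as 'rep_4_5' into the rating text '4.5'."""
    return class_name.replace("rep_", "").replace("_", ".")


def build_rows(item_name1, item_name2, post_times, rating_classes, reviews):
    """Build one row per review, in column order A to E as in the question."""
    rows = []
    # zip() stops at the shortest input, so a page with no reviews yields no rows.
    for post_time, rating_class, review in zip(post_times, rating_classes, reviews):
        rows.append([
            item_name1,
            item_name2,
            post_time[0:11],  # same slice as the question's code
            rating_from_class(rating_class),
            review,
        ])
    return rows


# Made-up sample values for illustration only.
rows = build_rows(
    "Category",
    "Product",
    ["2020/08/01 posted by someone", "2020/08/02 posted by someone"],
    ["rep_4_5", "rep_3"],
    ["Very good", "Average"],
)
empty_rows = build_rows("Category", "Product", [], [], [])

print(rows)
print(empty_rows)
```

The resulting rows could then be written in one call, for example with gspread's Worksheet.append_rows(rows), provided your installed gspread version offers it. Appending would also avoid overwriting rows from an earlier pid, which the question's code appears to do because every pid starts writing again at row 2.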