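[Python] I want to extract h2 headings from search results (a scraping question)

For a search keyword, I want to extract the titles of the results ranked 1 to 10, the URL of each page, and the h2 tags inside each page.

So far I can extract the titles and URLs of results 1 to 10 for the keyword, but I cannot extract the h2 headings of each page.

It seems that the part that connects to each URL is not working.
However, I don't know how to fix that connection.

I'm looking for a way to get this output while changing the rest of the code as little as possible.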

Python
import requests
from bs4 import BeautifulSoup

keywd = '温泉　入り方'
response = requests.get('https://www.google.com/search?q='+keywd)
soup = BeautifulSoup(response.text,'html.parser')

# Search
titles = soup.select('h3')
for title in titles:
    print(title.get_text())

# Get the URLs
page_urls = soup.select('cite')
for url in page_urls:
    print(url.get_text())

# Go into each HTML page
h2_title = page_urls
response = requests.get(h2_title)
soup = BeautifulSoup(response.text,'html.parser')

for h2 in soup:
    print(h2)

Output

温泉の入り方の手順を1から教えます！マナーや禁止事項も紹介 ...

温泉ソムリエが教える！効果的な温泉の入り方 | 楽天トラベル
温泉の正しい入り方－温泉大辞典－ BIGLOBE温泉
温泉は入り方で効果が違う！？正しい温泉の入り方 ... - NAVER まとめ
温泉の入り方の基本マナーとは？これだけは守りたい8箇条！ | 生活 ...
入浴の注意点・正しい温泉の入り方～温泉保養センター | 一般財団法人 ...
美人の湯の上手な温泉の入り方（入浴方法）についてのご案内｜霧島 ...
温泉や銭湯の正しい入り方で、美肌・健康効果を最大限に高める ...
上手な温泉の入り方 - 日本健康開発財団
温泉の入り方で効果が変わる！美肌効果をアップさせるコツとは｜女性 ...
https://blog.pokke.in/hot-spring-how-to-take/
https://travel.rakuten.co.jp/mytrip/howto/onsen-hairikata/
https://travel.biglobe.ne.jp/onsen/jiten/hairikata_08.html
https://matome.naver.jp/odai/2138389396743566501
https://jinchan2016.net/681.html/
www.kousha.or.jp/spa_notes.php
https://www.you-yu.com/onsen/how-to/
https://letronc-m.com/548
www.jph-ri.or.jp/kenko/onsen/contents/jyozu.html
https://josei-bigaku.jp/onsenbihada1122/

The error is below:
Traceback (most recent call last):
  File "/Users/rikiya/es.py", line 20, in <module>
    response = requests.get(h2_title)
  File "/Users/rikiya/.pyenv/versions/3.7.0/lib/python3.7/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/rikiya/.pyenv/versions/3.7.0/lib/python3.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/rikiya/.pyenv/versions/3.7.0/lib/python3.7/site-packages/requests/sessions.py", line 524, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/rikiya/.pyenv/versions/3.7.0/lib/python3.7/site-packages/requests/sessions.py", line 631, in send
    adapter = self.get_adapter(url=request.url)
  File "/Users/rikiya/.pyenv/versions/3.7.0/lib/python3.7/site-packages/requests/sessions.py", line 722, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)

requests.exceptions.InvalidSchema: No connection adapters were found for '[<cite>https://blog.pokke.in/hot-spring-how-to-take/</cite>, <cite>https://travel.rakuten.co.jp/mytrip/howto/onsen-hairikata/</cite>, <cite>https://travel.biglobe.ne.jp/onsen/jiten/hairikata_08.html</cite>, <cite>https://matome.naver.jp/odai/2138389396743566501</cite>, <cite>https://jinchan2016.net/681.html/</cite>, <cite>www.kousha.or.jp/spa_notes.php</cite>, <cite>https://www.you-yu.com/onsen/how-to/</cite>, <cite>https://letronc-m.com/548</cite>, <cite>www.jph-ri.or.jp/kenko/onsen/contents/jyozu.html</cite>, <cite>https://josei-bigaku.jp/onsenbihada1122/</cite>]'
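
The last line of the traceback shows the cause: `requests.get` received the whole list of `<cite>` elements, while it expects a single URL string. Some of the printed entries (for example `www.kousha.or.jp/spa_notes.php`) also have no scheme, which requests rejects with a `MissingSchema` error. Below is a minimal sketch of normalizing one entry at a time, using only the standard library. `to_absolute_url` and its `default_scheme` parameter are hypothetical names, and falling back to `http` is an assumption because the printed text does not say which scheme a site uses.

```python
from urllib.parse import urlsplit


def to_absolute_url(cite_text, default_scheme="http"):
    """Return cite_text as a URL string that starts with http:// or https://.

    Hypothetical helper: default_scheme is an assumption, since the text
    shown in the search result does not always include a scheme.
    """
    text = cite_text.strip()
    if urlsplit(text).scheme in ("http", "https"):
        return text  # already a usable URL
    return f"{default_scheme}://{text}"


# Two entries copied from the output above: one with a scheme, one without
cite_texts = [
    "https://blog.pokke.in/hot-spring-how-to-take/",
    "www.kousha.or.jp/spa_notes.php",
]
urls = [to_absolute_url(t) for t in cite_texts]
print(urls)
```

Each element of `urls` can then be passed to `requests.get` separately inside a loop.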


2 answers

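I wrote a function that extracts the title, description, keywords, and h2 elements from a given page.

I used the following as references:
[Python]Googleの検索結果をスクレイピングして、スプレッドシートに保存！ ([Python] Scrape Google search results and save them to a spreadsheet!)
python program to read a url and extract its meta keyword and meta description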

Python
import traceback

import requests
from bs4 import BeautifulSoup


def get_detail(url):
    """Return the titles, meta descriptions, meta keywords, and h2 texts of a page."""
    titles, descs, kwords, h2s = [], [], [], []

    try:  # Old sites sometimes fail to load, so ignore those errors?
        res = requests.get(url)
        print(url, res.encoding, res.apparent_encoding)

        # Decode with the detected encoding so everything is str (Unicode)
        html = res.content.decode(res.apparent_encoding)
        soup = BeautifulSoup(html, 'html.parser')

        # title
        for a in soup.find_all('title'):
            titles.append(a.get_text())

        # description, keywords
        # https://gist.github.com/jineshpaloor/6478011
        meta = soup.find_all('meta')
        for tag in meta:
            # Only tags that have both a name and a content attribute
            keys = tag.attrs.keys()
            if not ('name' in keys and 'content' in keys):
                continue
            name = tag.attrs['name'].strip().lower()
            cnt = tag.attrs['content']
            if name == 'description':
                descs.append(cnt)
            elif name == 'keywords':
                kwords.append(cnt)

        # h2
        for a in soup.find_all('h2'):
            h2s.append(a.get_text())

    except Exception:
        # Print the error and return whatever was collected so far
        traceback.print_exc()

    return titles, descs, kwords, h2s


with open('ret.txt', 'w', encoding='utf-8') as f:

    for url in ['https://www.example.com/',
                'https://google.co.jp/',
                'https://www.metatags.org/',
                'https://teratail.com/',
                'http://abehiroshi.la.coocan.jp/']:

        f.write('{}-----\n'.format(url))

        titles, descs, kwords, h2s = get_detail(url)

        f.write('titles={}\n'.format(titles))
        f.write('descs={}\n'.format(descs))
        f.write('kwords={}\n'.format(kwords))
        f.write('h2s={}\n'.format(h2s))
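
To connect this back to the question: the per-page extraction would be called once for each search result, not once with the whole list of results. Below is a minimal sketch, assuming the result URLs are the text of the `cite` elements as in the question's output. `h2_texts`, `h2_per_result`, and the `fetch` parameter are hypothetical names introduced here, and the sample HTML and `example` URLs stand in for real pages so that the example runs without network access.

```python
from bs4 import BeautifulSoup


def h2_texts(html):
    """Return the stripped text of every h2 element in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2")]


def h2_per_result(search_html, fetch):
    """Map each URL found in a cite element to the h2 texts of that page.

    fetch is any callable that takes one URL string and returns HTML,
    for example: lambda url: requests.get(url, timeout=10).text
    """
    soup = BeautifulSoup(search_html, "html.parser")
    results = {}
    for cite in soup.select("cite"):
        url = cite.get_text(strip=True)  # one URL string per request
        results[url] = h2_texts(fetch(url))
    return results


# Stand-in data (assumption: not real search results or real pages)
search_html = "<cite>https://a.example/</cite><cite>https://b.example/</cite>"
pages = {
    "https://a.example/": "<h1>A</h1><h2>First</h2><h2>Second</h2>",
    "https://b.example/": "<h2> Only </h2>",
}
print(h2_per_result(search_html, pages.__getitem__))
```

Passing the fetching step in as a callable keeps the parsing logic testable offline. With real pages, `fetch` could wrap `requests.get` and handle errors the same way `get_detail` does.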