Scraping multiple URLs in a loop

https://syllabusView?syllabusYear=2020&syllabusNo=K1-152&kougicd=0004552&request_locale=ja

K1-153&kougicd=0004553
K1-154&kougicd=0004554
K1-155&kougicd=0004555
.....
K1-164&kougicd=0004564
.....
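I want to scrape these pages while changing the numbers in the URL like this. Is that possible?
I can already take a URL from one website and scrape it; now I want to do the same thing for multiple URLs in a loop. The URLs differ only at the end. Could you give me a hint?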
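As a hint, the changing part of the URL can be generated from one counter. This is a minimal sketch that assumes the base URL from the code further down (`https://portal.kyoto-wu.ac.jp/Syllabus/syllabusView`). It also assumes that `kougicd` is always the `syllabusNo` number plus 4400, zero-padded to 7 digits. That rule holds for every example listed above but is not confirmed for the whole range. `syllabus_urls` is a hypothetical helper name.

```python
from typing import List
from urllib.parse import urlencode

# Base URL taken from the question's code
BASE_URL = "https://portal.kyoto-wu.ac.jp/Syllabus/syllabusView"


def syllabus_urls(first: int, last: int, year: int = 2020) -> List[str]:
    """Build syllabus URLs for K1-<first> ... K1-<last> (inclusive)."""
    urls = []
    for n in range(first, last + 1):
        params = {
            "syllabusYear": year,
            "syllabusNo": f"K1-{n}",
            # Assumption: kougicd = syllabus number + 4400, zero-padded to 7 digits
            "kougicd": f"{n + 4400:07d}",
            "request_locale": "ja",
        }
        # urlencode joins the parameters as key=value pairs separated by "&"
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    for url in syllabus_urls(152, 164):
        print(url)
```

Each URL in the list can then be fetched and parsed inside the `for` loop, one page per iteration.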

import requests
from bs4 import BeautifulSoup

# Fetch the web page and parse it
for i in range(3):
    url = "https://portal.kyoto-wu.ac.jp/Syllabus/syllabusView?syllabusYear=2020&syllabusNo=K1-{153 + i}&kougicd=' + f'{4553 + i}'"

html = requests.get(url)
soup = BeautifulSoup(html.content,"html.parser") #AttributeError: 'Response' object has no attribute 'contents'

for script in soup(["script", "style"]):# remove elements containing scripts or styles
    script.decompose() # .decompose() is the method that deletes an element
#print(soup)
text=soup.get_text()# text only = tags stripped
#print(text) # no tags, but blank lines remain
lines= [line.strip() for line in text.splitlines()]
text="\n".join(line for line in lines if line)
print(text)# no blank lines, no tags

I didn't get the result I expected. What is going on?


Best answer

Python
for i in range(3):
    s = f'K1-{153 + i}&kougicd=' + f'{4553 + i}'.zfill(7) # f-string syntax plus zero-padding.
    print(s)

# Output: ↓
# K1-153&kougicd=0004553
# K1-154&kougicd=0004554
# K1-155&kougicd=0004555

I think this will do it.
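Putting this string into the question's code: there, the URL literal is a plain string rather than an f-string, so `{153 + i}` is never substituted. `requests.get` also sits outside the loop, so only one request is made. A minimal sketch that fixes both problems is shown below. It assumes the base URL from the question and the `request_locale=ja` parameter from the first example URL. `build_url`, `extract_text`, the 10-second timeout and the 1-second pause between requests are my own assumptions, not part of the answer.

```python
import time
from typing import Union

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://portal.kyoto-wu.ac.jp/Syllabus/syllabusView"


def build_url(i: int) -> str:
    # K1-153, K1-154, ... with kougicd 0004553, 0004554, ...
    # ":07d" zero-pads to 7 digits, the same result as str(...).zfill(7)
    return (f"{BASE_URL}?syllabusYear=2020&syllabusNo=K1-{153 + i}"
            f"&kougicd={4553 + i:07d}&request_locale=ja")


def extract_text(html: Union[str, bytes]) -> str:
    soup = BeautifulSoup(html, "html.parser")
    # Remove <script> and <style> elements before extracting text
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Strip each line and drop the empty ones
    lines = (line.strip() for line in soup.get_text().splitlines())
    return "\n".join(line for line in lines if line)


def main() -> None:
    for i in range(3):
        url = build_url(i)  # build the URL inside the loop...
        response = requests.get(url, timeout=10)  # ...and fetch it inside the loop too
        response.raise_for_status()
        print(url)
        # Pass bytes so BeautifulSoup detects the page encoding itself
        print(extract_text(response.content))
        time.sleep(1)  # assumed courtesy pause between requests


if __name__ == "__main__":
    main()
```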

Posted 2020/07/08 10:09

calliope


Please use this:

yukihiko-shinoda/parallel-html-scraper
parallelhtmlscraper · PyPI

Example usage of the library:

python
from bs4 import BeautifulSoup

from parallelhtmlscraper.html_analyzer import HtmlAnalyzer
from parallelhtmlscraper.parallel_html_scraper import ParallelHtmlScraper

class AnalyzerForTest(HtmlAnalyzer):
    async def execute(self, soup: BeautifulSoup) -> str:
        return soup.find('title').text

host_google = 'https://www.google.co.jp'
path_and_content = [
    '/webhp?tab=rw',                                              # Google Search
    '/imghp?hl=ja&tab=wi&ogbl',                                   # Google Image Search
    '/shopping?hl=ja&source=og&tab=wf',                           # Google Shopping
    '/save',                                                      # Collections
    'https://www.google.co.jp/maps',                              # Google Maps
    'https://www.google.co.jp/drive/apps.html',                   # Google Drive
    'https://www.google.co.jp/mail/help/intl/ja/about.html?vm=r', # Gmail
]

list_response = ParallelHtmlScraper.execute(f'{host_google}', path_and_content, AnalyzerForTest())
print(list_response)

Execution result:

python
$ python test.py
['Google', 'Google 画像検索', 'Google ショッピング', 'コレクション', 'Google マップ', '\n        Google ドライブ\n    ', '\n        Gmail - Google のメール\n    ']
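If you would rather not add a dependency, you can also fetch several pages in parallel with the standard library's `concurrent.futures.ThreadPoolExecutor`. This is a swapped-in, thread-based technique, not the `parallelhtmlscraper` API. `fetch` and `fetch_all` are hypothetical names, `urllib.request` replaces `requests` only to keep the sketch stdlib-only, and the example URLs come from the library example. Each downloaded page can then be passed to BeautifulSoup as in the question.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List
from urllib.request import urlopen


def fetch(url: str) -> bytes:
    # Download one page with the standard library (no requests dependency)
    with urlopen(url, timeout=10) as response:
        return response.read()


def fetch_all(urls: List[str],
              fetcher: Callable[[str], bytes] = fetch,
              max_workers: int = 4) -> List[bytes]:
    # Run fetcher on several URLs concurrently in a thread pool;
    # executor.map returns the results in the same order as urls
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetcher, urls))


if __name__ == "__main__":
    urls = [
        "https://www.google.co.jp/maps",
        "https://www.google.co.jp/drive/apps.html",
    ]
    for url, body in zip(urls, fetch_all(urls)):
        print(url, len(body), "bytes")
```

Titles extracted this way may carry surrounding whitespace, as the Google Drive and Gmail entries above do. Calling `.strip()` on each title cleans that up.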