I want to download images with BeautifulSoup

###Background / what I want to achieve
I want to bulk-download the cover images of O'Reilly books with BeautifulSoup.
I am using Anaconda installed on Windows.

###Problem / error messages

Cover image not found.
Downloading page http://www.oreilly.com/animals.csp...
Cover image not found.
Downloading page http://www.oreilly.com/animals.csp...
Cover image not found.
Downloading page http://www.oreilly.com/animals.csp...
Cover image not found.
Downloading page http://www.oreilly.com/animals.csp...

The output above keeps repeating and never stops.

###Relevant source code

Python
#! python3
# downloadoreilly.py

import requests,os,bs4

url='http://www.oreilly.com/animals.csp' # starting URL
os.makedirs('oreilly',exist_ok=True)     # save into ./oreilly

while not url.endswith('1000'):

    # Download the page
    print('Downloading page {}...'.format(url))
    res=requests.get(url)
    res.raise_for_status()

    soup=bs4.BeautifulSoup(res.text)

    # Find the URL of the cover image
    oreilly_elem=soup.select('skiptocontent img')
    if oreilly_elem==[]:
        print('Cover image not found.')
    else:
        oreilly_url='http:'+oreilly_elem[0].get('src')
        # Download the image
        print('Downloading image {}...'.format(oreilly_url))
        res=requests.get(oreilly_url)
        res.raise_for_status()

        # Save the image into ./oreilly
        image_file=open(os.path.join('oreilly',os.path.basename(oreilly_url)),'wb')
        for chunk in res.iter_content(1000):
            image_file.write(chunk)
        image_file.close()

        # Get the URL of the Prev button
        prev_link=soup.select('a[rel="prev"]')[0]
        url='http://www.oreilly.com/animals.csp'+prev_link.get('href')

print('Done')

###Supplementary information (language/FW/tool versions, etc.)
I want to download all of the cover images listed at http://www.oreilly.com/animals.csp.

1 answer

Best answer

Judging from the O'Reilly site, there are two mistakes.

skiptocontent is an ID, not a tag, so to look it up with the select() method you need to write #skiptocontent.
There also appears to be no img tag inside id=skiptocontent.

html
<div id="skiptocontent">
    <a href="#maincontent"><span class="skiplink">skip to main content</span></a>
</div>
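
Both points can be checked offline against the snippet above. The following is a minimal sketch, assuming bs4 is installed; the variable names are only illustrative.

```python
import bs4

# The markup quoted above, copied from the O'Reilly page
html = '''
<div id="skiptocontent">
    <a href="#maincontent"><span class="skiplink">skip to main content</span></a>
</div>
'''

soup = bs4.BeautifulSoup(html, 'html.parser')

# Without '#', the selector looks for a <skiptocontent> tag, which does not exist
tag_match = soup.select('skiptocontent img')

# With '#', the selector matches the element whose id is "skiptocontent"
id_match = soup.select('#skiptocontent')

# Even with the correct ID selector, there is no <img> inside that element
id_img = soup.select('#skiptocontent img')

print(len(tag_match), len(id_match), len(id_img))  # prints: 0 1 0
```

Since select() returns an empty list in both img cases, the `oreilly_elem==[]` branch in the question is always taken.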

You need to choose the select() argument so that it picks up the actual img tags.
For example, the selector below retrieves the target img tags.

python
>>> from urllib.request import urlopen
>>> import bs4
>>> url = 'http://www.oreilly.com/animals.csp'
>>> resp = urlopen(url)
>>>
>>> soup = bs4.BeautifulSoup(resp.read(), 'html.parser')
>>> oreilly_elem = soup.select('.animal-row img') # <-- here
>>> len(oreilly_elem)
20
>>> oreilly_elem[0]
<img class="book-cvr" src="http://covers.oreilly.com/images/9780596155452/cat.gif">
<h1 class="book-title">Mobile Design and Development</h1>
</img>
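
Putting that selector into a script might look like the sketch below. It is not the asker's original program: it uses urlopen as in the session above, handles a single page only, and the helper names find_cover_urls, local_name, and download_covers are hypothetical. The one cover URL shown above ends in cat.gif, so the sketch assumes file names may collide and prefixes each with its parent path segment; that assumption rests on a single example URL.

```python
import os
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen

import bs4

PAGE_URL = 'http://www.oreilly.com/animals.csp'  # URL from the question


def find_cover_urls(html, base_url=PAGE_URL):
    """Return the absolute URL of every cover image found in html."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    urls = []
    for img in soup.select('.animal-row img'):  # selector from the answer
        src = img.get('src')
        if src:
            # urljoin keeps absolute URLs as-is and resolves relative ones
            urls.append(urljoin(base_url, src))
    return urls


def local_name(cover_url):
    """Build a file name from the last two path segments (assumed layout)."""
    parts = urlsplit(cover_url).path.strip('/').split('/')
    return '_'.join(parts[-2:])


def download_covers(page_url=PAGE_URL, out_dir='oreilly'):
    """Save every cover found on one page into out_dir; return saved paths."""
    os.makedirs(out_dir, exist_ok=True)
    with urlopen(page_url) as resp:
        cover_urls = find_cover_urls(resp.read(), page_url)
    if not cover_urls:
        print('Cover image not found.')
        return []
    saved = []
    for cover_url in cover_urls:
        print('Downloading image {}...'.format(cover_url))
        path = os.path.join(out_dir, local_name(cover_url))
        with urlopen(cover_url) as img_resp, open(path, 'wb') as image_file:
            image_file.write(img_resp.read())
        saved.append(path)
    return saved
```

Calling `download_covers()` would fetch the page once and write the files into ./oreilly, assuming network access and that the page still uses the `.animal-row` markup. Because there is no while loop that depends on finding a Prev link, a page without matching images ends the run instead of repeating the same request.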