Answer edit history

2

Update

2022/03/28 05:10

Posted

melian

Score 19840

test CHANGED
@@ -10,7 +10,7 @@
 
  r = requests.get(page_url, headers=headers)
  soup = BeautifulSoup(r.content, 'lxml')
- img_tag = soup.select_one('div[class^="cover"] img')
+ img_tag = soup.select_one('div.book-cover > img')
  img_url = img_tag['src']
  print(img_url)
 
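The two selectors in this edit are not equivalent. `div[class^="cover"]` tests whether the `class` attribute string *starts with* `cover`, so a `<div class="book-cover">` does not match it, while `div.book-cover > img` matches an `<img>` that is a direct child of a `div` carrying the `book-cover` class. A small offline check of that difference is sketched below; the HTML fragment and the image URL are hypothetical (not the real ScienceDirect markup), and it uses the built-in `html.parser` instead of `lxml` only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class name mirrors the selector above,
# the image URL is invented for this demo.
html = """
<div class="book-cover">
  <img src="https://example.com/cover.gif">
</div>
"""

# html.parser ships with Python, so lxml is not required here.
soup = BeautifulSoup(html, "html.parser")

# [class^="cover"]: the class attribute must start with "cover";
# "book-cover" does not, so nothing is found.
old_match = soup.select_one('div[class^="cover"] img')

# div.book-cover > img: an <img> that is a direct child of div.book-cover.
new_match = soup.select_one("div.book-cover > img")

# select_one returns None when nothing matches, so check before reading ['src'].
img_url = new_match["src"] if new_match is not None else None

print(old_match)  # None
print(img_url)    # https://example.com/cover.gif
```

Guarding against `None` is worth keeping in the real script as well: if the page layout changes, `img_tag['src']` would otherwise fail with a `TypeError`.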

1

Update

2022/03/28 05:05

Posted

melian

Score 19840

test CHANGED
@@ -1,8 +1,7 @@
- The cover image is placed by JavaScript, so the URL is taken from the body of that JavaScript code.
+ It seems you need to set a `User-Agent` header.
  ```python
  import requests
- import re
- import json
+ from bs4 import BeautifulSoup
 
  page_url = "https://www.sciencedirect.com/book/9780124157590/haschek-and-rousseauxs-handbook-of-toxicologic-pathology"
  headers = {
@@ -10,11 +9,12 @@
  }
 
  r = requests.get(page_url, headers=headers)
- m = re.findall(r'(?<=var reduxData = )(.+?)(?=;\n)', r.text)
+ soup = BeautifulSoup(r.content, 'lxml')
- if m:
-     cover_url = json.loads(m[0])['simpleBook']['coverImages']['large']
+ img_tag = soup.select_one('div[class^="cover"] img')
+ img_url = img_tag['src']
- print(cover_url)
+ print(img_url)
 
  #
  https://ars.els-cdn.com/content/image/3-s2.0-C20101678509-cov200h.gif
  ```
+
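For reference, the first version of this answer (replaced in edit 1) read the cover URL out of the `var reduxData = ...;` assignment embedded in the page's JavaScript. A minimal offline sketch of that extraction follows. It reuses the regex and the `simpleBook` → `coverImages` → `large` key path from the removed code, but runs against a hypothetical inline script; the rest of the payload and the URL are invented.

```python
import json
import re

# Hypothetical page source: only the reduxData key path mirrors the removed code;
# the URL and the surrounding markup are made up for this demo.
html = (
    "<script>\n"
    'var reduxData = {"simpleBook": {"coverImages": '
    '{"large": "https://example.com/cover-large.gif"}}};\n'
    "</script>\n"
)

# Same pattern as the removed code: capture the text after "var reduxData = "
# up to the first ";" that is immediately followed by a newline.
# ".+?" does not cross newlines, so the JSON literal must be on one line.
matches = re.findall(r"(?<=var reduxData = )(.+?)(?=;\n)", html)

cover_url = None
if matches:
    data = json.loads(matches[0])
    cover_url = data["simpleBook"]["coverImages"]["large"]

print(cover_url)  # https://example.com/cover-large.gif
```

This approach depends on the exact formatting of that inline script (variable name, single-line JSON, trailing `;` plus newline), whereas the BeautifulSoup version only depends on the rendered `<img>` element.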