Scrapy: extracting specific values from multiple linked pages requested inside a for loop and exporting them to CSV

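##■What I want to do
I am trying to extract information from both the book list page and the individual detail pages of this demo site (http://books.toscrape.com/) and write it all to a single CSV file like the one below. Note: the book title and URL could also be extracted from the individual page (②), but here I assume they are taken from the list page (①).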

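Below is my work-in-progress code. If I comment out the `for` loop that issues `scrapy.Request` and its `yield` line, the items from the list page are extracted, but the description from each individual detail page is not. I would like to know how to fix this unfinished code so that it behaves as intended. Thank you in advance.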

#####<CSV output image and layout>

| Book title (①) | Book detail URL (①) | Description (②) |
|---|---|---|
| A Light in the Attic | http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html | It's hard to imagine a world without A Light in the Attic. This now-classic collection of poetry and (omitted) ...more |
| Tipping the Velvet | http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html | "Erotic and absorbing...Written with starling power." (omitted) ...more |
| (continues for every list item) | (continues for every list item) | (continues for every list item) |

① Extracted from the list page (http://books.toscrape.com/)
② Extracted from the individual pages (http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html, etc.)

#####<Extraction targets on the list page (red frame)>
http://books.toscrape.com/

#####<Extraction targets on the individual page (red frame)>
http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html

##■Current state

```python
# -*- coding: utf-8 -*-
# Note: parts of this code do not currently work correctly.
import scrapy
from ..items import BooksItem

class BooksSpider(scrapy.Spider):
    name = "books"
    allowed_domains = ["books.toscrape.com"]
    start_urls = [
        'http://books.toscrape.com/',
    ]
    def parse(self, response):
        list_rows = response.xpath('//*[@id="default"]/div/div/div/div/section/div[2]/ol/li')
        for list_row in list_rows:
            # Extract the book title and the detail-page URL from the list page
            item = BooksItem()
            item['book_title'] = list_row.xpath('/article/h3/a/@title').extract_first()
            item['book_url'] = list_row.xpath('article/div[1]/a/@href').extract_first()
            # Extract the description from the individual page (currently not working)
            for book_url in response.css("article.product_pod > h3 > a ::attr(href)").extract():
                yield scrapy.Request(response.urljoin(book_url), callback=self.parse_book_page)
            yield item

    def parse_book_page(self, response):
        item = BooksItem()
        product = response.css("div.product_main")
        item['description'] = response.xpath("//div[@id='product_description']/following-sibling::p/text()").extract_first()
        yield item
```


1 answer

Best answer

Pass the information you need to the next callback through the request's `meta`.
https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.meta

This dict is shallow copied when the request is cloned using the copy() or replace() methods, and can also be accessed, in your spider, from the response.meta attribute.
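
To make the "shallow copied" wording concrete, here is a small sketch that uses a plain dict as a stand-in for `Request.meta`. No Scrapy objects are involved, and the dict contents are made up for illustration. A shallow copy produces a new outer dict, but the item stored inside it is still the same object, which is one reason to build a fresh item for every request rather than reuse one.

```python
import copy

# Stand-in for the dict passed as Request(..., meta=...); contents are illustrative only.
meta = {"item": {"book_title": "A Light in the Attic"}}

# Stand-in for the meta of a cloned request: a shallow copy of the original dict.
cloned_meta = copy.copy(meta)

# The outer dicts are independent: a key added to the copy does not appear in the original.
cloned_meta["note"] = "only on the clone"

# The nested item is shared: a change made through the copy is visible through the original.
cloned_meta["item"]["description"] = "(filled in later)"

print("note" in meta)                        # False
print(meta["item"] is cloned_meta["item"])   # True
print(meta["item"]["description"])           # (filled in later)
```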

https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Response.meta

A shortcut to the Request.meta attribute of the Response.request object (ie. self.request.meta).

Applied to the spider in the question, the two callbacks become:

```python
    def parse(self, response):
        list_rows = response.xpath('//*[@id="default"]/div/div/div/div/section/div[2]/ol/li')
        for list_row in list_rows:
            # Extract the book title and the detail-page URL from the list page.
            # Both XPaths are relative (no leading "/"), so they are evaluated against this <li>.
            item = BooksItem()
            item['book_title'] = list_row.xpath('article/h3/a/@title').extract_first()
            # The href is relative, so resolve it against the list page URL.
            item['book_url'] = response.urljoin(
                list_row.xpath('article/div[1]/a/@href').extract_first())
            # Request this row's detail page once and attach the partly filled item via meta.
            # (An inner loop over every link on the page would pair this title with other books' URLs.)
            yield scrapy.Request(item['book_url'],
                                 callback=self.parse_book_page,
                                 meta={'item': item})

    def parse_book_page(self, response):
        # Retrieve the item that was attached to the request via meta
        item = response.meta['item']
        # Add the description found on the detail page, then emit the completed item
        item['description'] = response.xpath("//div[@id='product_description']/following-sibling::p/text()").extract_first()
        yield item
```
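
For the CSV itself, the column order shown in the question can be requested through Scrapy's `FEED_EXPORT_FIELDS` setting. The fragment below is a sketch for the project's `settings.py`. The field names are the ones used in the spider, and it assumes `BooksItem` declares all three of them in `items.py`, which the question does not show.

```python
# settings.py (fragment)
# Columns to export, in order: title and URL from the list page,
# description from the individual detail page.
FEED_EXPORT_FIELDS = ['book_title', 'book_url', 'description']
```

With this in place, running `scrapy crawl books -o books.csv` from the project directory should write one row per book; the file name `books.csv` is only an example.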