Scrapy: exporting a CSV where a specific HTML element from each extracted link's target page sits in the column next to the link

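##■What I want to do
From the book list shown on this demo site (http://books.toscrape.com/), I want to get
① the URL of each individual book, i.e. the a tag's href attribute (red box in Figure 1)
② the h1 tag text on the page that link points to (red box in Figure 2)

and output them as a CSV file with the layout below. I would be very grateful if you could show me some sample code.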

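#####<Ideal layout of the CSV output>

| Book URL (Figure 1) | h1 tag text on the page the book URL links to (Figure 2) |
| --- | --- |
| http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html | A Light in the Attic |
| http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html | Tipping the Velvet |
| (continues for every book listed on the page) | (continues for every book listed on the page) |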

#####<Figure 1>

#####<Figure 2>

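##■Current state
By editing the item and scrapy.Request parts of the sample code below, I can get as far as retrieving each field individually, but I cannot output them with the rows and columns aligned, as the failed output below shows. I suspect I also don't fully understand how for loops and yield control the flow of the output.

Thank you in advance for your help.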

######<Failed output>

| Book URL (Figure 1) | h1 tag text on the page the book URL links to (Figure 2) |
| --- | --- |
| http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html | , |
| http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html | , |
| , | A Light in the Attic |
| , | Tipping the Velvet |
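One plausible explanation for the staggered rows, assuming the URL and the title were yielded as two separate items: a CSV exporter writes one row per item, so a field missing from an item is simply left empty on that row. The sketch below imitates this with the standard library's `csv.DictWriter` rather than Scrapy itself; the field names `url` and `title` and the helper `to_csv` are hypothetical names chosen for illustration.

```python
import csv
import io

# Hypothetical column names, used only for this illustration.
FIELDS = ["url", "title"]


def to_csv(items):
    """Write each dict as one CSV row, leaving missing fields empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


BOOK_URL = "http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
BOOK_TITLE = "A Light in the Attic"

# URL and title emitted as two separate items: two half-empty rows.
staggered = to_csv([{"url": BOOK_URL}, {"title": BOOK_TITLE}])

# URL and title emitted together in one item: a single aligned row.
aligned = to_csv([{"url": BOOK_URL, "title": BOOK_TITLE}])

print(staggered)
print(aligned)
```

If this is what is happening, the fix is to make sure both values end up in the same item before it is yielded.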

#####<Sample code>

Scrapy
# -*- coding: utf-8 -*-
import scrapy


class BooksSpider(scrapy.Spider):
    name = "books"
    allowed_domains = ["books.toscrape.com"]
    start_urls = [
        'http://books.toscrape.com/',
    ]

    def parse(self, response):
        # Follow the link to each book's detail page found in the listing.
        for book_url in response.css("article.product_pod > h3 > a ::attr(href)").extract():
            yield scrapy.Request(response.urljoin(book_url), callback=self.parse_book_page)
        # Follow the pagination link, if there is one.
        next_page = response.css("li.next > a ::attr(href)").extract_first()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)

    def parse_book_page(self, response):
        # Each item yielded here becomes one row of the exported CSV.
        item = {}
        product = response.css("div.product_main")
        item["title"] = product.css("h1 ::text").extract_first()
        yield item


1 answer

Best answer

Why not grab the current URL at the same time you get the title?
Rename the item["url"] key to whatever suits you.

python
def parse_book_page(self, response):
    item = {}
    product = response.css("div.product_main")
    item["title"] = product.css("h1 ::text").extract_first()
    # Added: the URL of the book page this response was fetched from
    item["url"] = response.url
    yield item
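One thing to watch with the code above: the item is a plain dict with `title` inserted before `url`, and when no field list is configured Scrapy takes the CSV columns from the first item, so the title would probably come out in the first column instead of the second. Scrapy's `FEED_EXPORT_FIELDS` setting pins the column order. A minimal sketch, assuming the keys stay `url` and `title` and that the output file is called `books.csv` (a hypothetical name):

```python
# settings.py of the Scrapy project
# (the same keys could also go in the spider's custom_settings dict)

# Column order for exported feeds: book URL first, then the h1 text.
FEED_EXPORT_FIELDS = ["url", "title"]

# Optional: declare the output file here instead of passing -o on the
# command line. "books.csv" is a hypothetical file name; the FEEDS
# setting needs Scrapy 2.1 or later.
FEEDS = {
    "books.csv": {"format": "csv"},
}
```

If you leave `FEEDS` out, running `scrapy crawl books -o books.csv` should produce the same file, still with the columns in the order given by `FEED_EXPORT_FIELDS`.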