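Python Scrapy spider does not produce JSON output

What I want to do

I want to create a JSON file using a Python Scrapy spider.
I am currently studying with 「PythonとJavaScriptではじめるデータビジュアライゼーション」 (Data Visualization with Python and JavaScript). While working through the scraping part, the JSON file ends up with no data in it, and I cannot figure out why.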

Directory layout

nobel_winners	scrapy.cfg

/nobel_winners:
__init__.py	items.py	pipelines.py	spiders
__pycache__	middlewares.py	settings.py

/nobel_winners/spiders:
__init__.py		__pycache__		nwinners_list_spider.py

Steps / code

I entered the following code in nwinners_list_spider.py under /nobel_winners/spiders.

python
# encoding: utf-8

import scrapy, re

class NWinnerItem(scrapy.Item):
    country = scrapy.Field()

class NWinnerSpider(scrapy.Spider):
    name = 'nwinners_list'
    allowed_domains = ['en.wikipedia.org']
    start_urls = ["https://en.wikipedia.org/wiki/List_of_Nobel_laureates_by_country"]

    def parse(self, response):

        # Select every <h2> heading on the page.
        h2s = response.xpath('//h2')

        for h2 in h2s:
            country = h2.xpath('span[@class="mw-headline"]/text()').extract()

I ran the following command from the project root directory.

scrapy crawl nwinners_list -o nobel_winners.json

Error

The following output appears, and no data is written to the JSON file.

2018-07-25 10:01:53 [scrapy.core.engine] INFO: Spider opened
2018-07-25 10:01:53 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
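
One way to confirm that the export really is empty is to load the file and count its entries. This is only a minimal sketch using the standard library: `count_exported_items` is a hypothetical helper name, and it assumes the export, when it has content, is a single JSON array. A missing or empty file is treated as zero items.

```python
import json
from pathlib import Path


def count_exported_items(path):
    """Return how many items a JSON export holds; 0 if missing or empty."""
    file = Path(path)
    if not file.exists():
        return 0
    text = file.read_text(encoding="utf-8").strip()
    if not text:
        # The file was created but nothing was written to it.
        return 0
    # Assumption: a non-empty export is one JSON array of items.
    return len(json.loads(text))


print(count_exported_items("nobel_winners.json"))
```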

What I tried

1. The source in the textbook was somewhat longer, but I cut it down to just country to check.
2. I ran scrapy shell and used its IPython-based shell to check each step one at a time.

python
h2s = response.xpath('//h2')

for h2 in h2s:
    country = h2.xpath('span[@class="mw-headline"]/text()').extract()
    print(country)

This confirmed that country was being populated with values as expected.

1 answer

Self-resolved

The following code solved the problem.

Python
import scrapy

class NWinnerItem(scrapy.Item):
    country = scrapy.Field()

class NWinnerSpider(scrapy.Spider):
    name = 'nwinners_list'
    allowed_domains = ['en.wikipedia.org']
    start_urls = ["https://en.wikipedia.org/wiki/List_of_Nobel_laureates_by_country"]

    def parse(self, response):

        h2s = response.xpath('//h2')

        for h2 in h2s:
            # Yield one item per heading so that it reaches the JSON export.
            yield NWinnerItem(
                country = h2.xpath('span[@class="mw-headline"]/text()').extract_first()
            )
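
As a rough illustration of why the original `parse` scraped nothing, here is a sketch using only the standard library. `collect_items` is a hypothetical stand-in for whatever consumes the callback's result, not Scrapy's actual internals. The only assumption is that a callback returning `None` contributes no items, while one that yields contributes each yielded item.

```python
def parse_without_yield(headings):
    # Mirrors the original spider: the value is computed, then discarded.
    for heading in headings:
        country = heading.strip()


def parse_with_yield(headings):
    # Mirrors the fixed spider: each value is handed back to the caller.
    for heading in headings:
        yield {"country": heading.strip()}


def collect_items(callback, headings):
    # Hypothetical consumer: a None result means "no items".
    result = callback(headings)
    return [] if result is None else list(result)


headings = [" Argentina ", " Australia "]
print(collect_items(parse_without_yield, headings))  # []
print(collect_items(parse_with_yield, headings))
```

The first version assigns `country` inside the loop but never returns or yields it, so the caller has nothing to export even though the value itself is correct, which matches what the scrapy shell check showed.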