Question edit history

Body

2018/08/20 13:42

Posted

ari1235

Score 11


@@ -1,107 +1 @@
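-### Background / what I want to achieve
-As Scrapy practice, I want to get the headline text and URL of each headline news item on [Yahoo! Finance](https://finance.yahoo.co.jp/).
-I am using XPath to pull out the links (Chrome's Inspect tool shows the path right away, so it seemed easy to use).
-### Problem / error message
-From `scrapy shell https://finance.yahoo.co.jp/`,
-I can see that `response.xpath('//*[@id="ytopContentIn"]/ul/li/a/span[@class="dtl"]//text()').extract()` returns the headlines, but I cannot write a working spider.
-Please tell me how it should be written.
+A problem came up, so I have deleted the content for now and will post the question again.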
-### Relevant source code
-spider
-```python
-# -*- coding: utf-8 -*-
-import scrapy
-from finance.items import FinanceItem
-class ArticlesSpider(scrapy.Spider):
-    name = 'articles'
-    allowed_domains = ['finance.yahoo.co.jp/']
-    start_urls = ['https://finance.yahoo.co.jp//']
-    def parse(self, response):
-      for article in response.xpath('//*[@id="ytopContentIn"]/ul'):
-          item = FinanceItem()
-          item['title'] = response.xpath('li/a/span[@class="dtl"]//text()').extract_first()
-          item['url'] = response.xpath('li/a/@href').extract_first()
-          # This is the part I especially don't know how to write.
-          yield item
-```
-items.py
-```
-# -*- coding: utf-8 -*-
-# Define here the models for your scraped items
-#
-# See documentation in:
-# https://doc.scrapy.org/en/latest/topics/items.html
-import scrapy
-class FinanceItem(scrapy.Item):
-    # define the fields for your item here like:
-    name = scrapy.Field()
-    price = scrapy.Field()
-    month = scrapy.Field()
-    title = scrapy.Field()
-    url = scrapy.Field()
-```
-### Additional information (framework/tool versions, etc.)
-Python 3.5.2

Rewrote the title because it was hard to understand.

2018/08/20 13:42

Posted

ari1235

Score 11


@@ -1 +1 @@
-I am practicing scraping with Python's Scrapy
+Python Scrapy: I don't understand how to extract data with XPath!
