Scraping with Scrapy

Question edit history

3

Added figure

2017/10/23 10:02

Posted

Score 13

title: File without changes

body: CHANGED

@@ -10,11 +10,11 @@
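 I am struggling because I lack the knowledge and understanding needed to write XPath and CSS selectors.
 First, I want to get the text and URL of the part outlined in red in the figure below.
-[Image description](5905d9491734b713cf56ada7e855c815.jpeg)
+![Image description](c9aa8a19174331c9a13544c75caa2b58.jpeg)
 ###Relevant source code (\shareshare\spiders\get_shareshare.py)
 ```Enter the language here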
-# -*- coding: utf-8 -*-!
+# -*- coding: utf-8 -*-
 import scrapy
 class shareshareSpider(scrapy.Spider):

2

Added a more specific explanation

2017/10/23 10:02

Posted

Score 13

title: File without changes

body: CHANGED

@@ -9,9 +9,12 @@
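 [Share house search site "シェアシェア" (share-share.jp)](http://share-share.jp/search/result/?limit=25&page=1&sort%5B1%5D=upd)
 I am struggling because I lack the knowledge and understanding needed to write XPath and CSS selectors.
+First, I want to get the text and URL of the part outlined in red in the figure below.
+[Image description](5905d9491734b713cf56ada7e855c815.jpeg)
 ###Relevant source code (\shareshare\spiders\get_shareshare.py)
 ```Enter the language here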
-# -*- coding: utf-8 -*-
+# -*- coding: utf-8 -*-!
 import scrapy
 class shareshareSpider(scrapy.Spider):

1

Edited code

2017/10/23 10:01

Posted

Score 13

title: File without changes

body: CHANGED

@@ -14,9 +14,6 @@
 # -*- coding: utf-8 -*-
 import scrapy
-#from shareshare.items import shareshareItem
-#from scrapy.selector import Selector # added
 class shareshareSpider(scrapy.Spider):
     name = "share_share"
     allowed_domains = ["share-share.jp"]
@@ -24,18 +21,12 @@
     start_urls = (
         'http://share-share.jp/search/result/?limit=25&page=1&sort%5B1%5D=upd'
     )
-    # indent
     def parse(self, response):
         for sel in response.css("div.result-list"):
             article = shareshareItem()
             article['title'] = sel.css("table > tbody > tr:nth-child(1) > td > div > h3 > a::text").extract_first()
             article['url'] = sel.css("div.result-list > table > tbody > tr:nth-child(1) > td > div > h3 > a::attr('href')").extract_first()
-        # Copied and pasted from the "Gunosy" (グノシー) example. Commented out because I don't know how to move to the next page
-        #next_page = response.css("div.page-link-option > a::attr('href')")
-        #if next_page:
-        #    url = response.urljoin(next_page[0].extract())
-        #    yield scrapy.Request(url, callback=self.parse)
 ```