XPath when scraping with Scrapy

Question edit history

1

Added source code and error output

2020/06/09 15:45

Posted

Score: 1

test CHANGED

File without changes

test CHANGED

@@ -22,30 +22,86 @@
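+### What I tried
+The reference book used code that gets the shop name as text.
+That approach only worked for the shop name, so I searched online and tried extracting the other fields with the XPath that Google Chrome can generate. No error was raised, but nothing was extracted.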
+```python
+import scrapy
+# Target page: https://ramendb.supleks.jp/s/4227.html
+# Start the shell from a terminal (this is a shell command, not Python):
+#   scrapy shell https://ramendb.supleks.jp/s/4227.html
+# Get the shop name text
+response.css('.shopname').xpath('string()').get()
+# Get the opening date text (attempt 1)
+response.xpath('//*[@id="data-table"]/tbody/tr[12]/text()').extract()
+# Get the opening date text (attempt 2)
+response.xpath('/html/body/div[5]/div/div[1]/div/div[5]/div[1]/div/table/tbody/tr[12]/td/text()').extract()
+```
+### Problem / error message
+```
+# The scrapy shell startup output is omitted
+>>> response.css('.shopname').xpath('string()').get()
+'ちばから'
+>>> # Get the opening date text (attempt 1)
+>>> response.xpath('//*[@id="data-table"]/tbody/tr[12]/text()').extract()
+[]
+>>> # Get the opening date text (attempt 2)
+>>> response.xpath('/html/body/div[5]/div/div[1]/div/div[5]/div[1]/div/table/tbody/tr[12]/td/text()').extract()
+[]
 ```
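-### What I tried
+### Additions and corrections
-I generated these with Google Chrome.
+Following octoparse's answer, I was able to extract the elements.
-Neither of them extracted anything.
+However, I would like to get only the text from these results.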
-```python
-response.xpath('//*[@id="data-table"]/tbody/tr[12]/text()').extract()
-response.xpath('/html/body/div[5]/div/div[1]/div/div[5]/div[1]/div/table/tbody/tr[12]/td/text()').extract()
 ```
+>>> response.xpath('//th[text()="開店日"]/following-sibling::td[1]').get()
+'<td>2004年10月8日</td>'
+>>> response.xpath('//div[@id="shop-data"]//span[@itemprop="address"]').get()
+'<span itemprop="address">〒290-0072 <a href="/search/shop?state=chiba">千葉県</a><a href="/search/shop?state=chiba&amp;city=%E5%B8%82%E5%8E%9F%E5%B8%82">市原市</a>西国分寺台1-3-16</span>'
+```
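
The last revision asks how to get only the text from the matched elements. A minimal sketch of two common XPath options: appending `/text()` to select an element's direct text nodes, and wrapping the path in `string(...)` to flatten nested markup such as the links inside the address. It runs offline against a hypothetical, simplified `SAMPLE` snippet modeled on the outputs above, and it uses `lxml` directly rather than a live `scrapy shell` session, so the real page's markup may differ. It also illustrates one possible explanation for the empty lists earlier: browsers add `<tbody>` to the DOM they display, so a path copied from Chrome can contain a step that is absent from the raw HTML. Whether that is the actual cause on this page is an assumption.

```python
from lxml import html

# Hypothetical, simplified markup modeled on the outputs shown above.
# The real page is larger and its structure may differ.
SAMPLE = """
<html><body>
<div id="shop-data">
  <table id="data-table">
    <tr><th>住所</th><td><span itemprop="address">〒290-0072 <a href="/search/shop?state=chiba">千葉県</a><a href="/search/shop?state=chiba&amp;city=x">市原市</a>西国分寺台1-3-16</span></td></tr>
    <tr><th>開店日</th><td>2004年10月8日</td></tr>
  </table>
</div>
</body></html>
"""

doc = html.document_fromstring(SAMPLE)

# /text() selects only the text nodes that are direct children of the <td>.
opened = doc.xpath('//th[text()="開店日"]/following-sibling::td[1]/text()')

# string(...) concatenates all descendant text, including text inside <a>.
address = doc.xpath('string(//div[@id="shop-data"]//span[@itemprop="address"])')

# A Chrome-style path with a /tbody step matches nothing here, because the
# raw markup has no <tbody> and this parser does not add one.
with_tbody = doc.xpath('//*[@id="data-table"]/tbody/tr[2]/td/text()')

# Using // before tr matches whether or not a <tbody> is present.
without_tbody = doc.xpath('//*[@id="data-table"]//tr[2]/td/text()')

print(opened)         # ['2004年10月8日']
print(address)        # 〒290-0072 千葉県市原市西国分寺台1-3-16
print(with_tbody)     # []
print(without_tbody)  # ['2004年10月8日']
```

In `scrapy shell`, the same expressions would presumably be passed to `response.xpath(...)`, for example `response.xpath('//th[text()="開店日"]/following-sibling::td[1]/text()').get()`, or combined with the `.xpath('string()')` pattern already used for the shop name.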