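What I want to achieve
I am developing a scraper with Scrapy to collect 1,085 records from an information site.
Each list page shows 15 records. From a list page, the spider follows the link to each record's detail page and extracts the name, address, contact details, and similar fields.
As long as a list page has a "next page" link, the spider moves on to the next list page, and it keeps scraping until no "next page" link remains,
at which point all 1,085 records should have been collected.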
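As a baseline for reading the crawl log, the flow described above implies a rough number of pages the spider should fetch. The sketch below works that out from the two figures in the post. It assumes one request per list page and one per detail page, and it ignores extras such as robots.txt or redirects.

```python
import math

TOTAL_ITEMS = 1085    # number of records, from the post
ITEMS_PER_PAGE = 15   # records per list page, from the post

# Number of list pages needed to show every record.
list_pages = math.ceil(TOTAL_ITEMS / ITEMS_PER_PAGE)

# Records left over on the final list page.
items_on_last_page = TOTAL_ITEMS - (list_pages - 1) * ITEMS_PER_PAGE

# Assumption: one request per list page plus one per detail page.
expected_requests = list_pages + TOTAL_ITEMS

print(list_pages, items_on_last_page, expected_requests)  # 73 5 1158
```

If the final "Crawled N pages" figure is well below this estimate, some detail pages were probably never requested at all, rather than requested and then failing to produce an item.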
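Problem and questions
After finishing the code, I ran a trial scrape of 10 list pages (15 records per page) and successfully retrieved all 150 records.
I then ran the full crawl over every page, but only 948 of the 1,085 records were retrieved.
Judging from the log, records have been going missing little by little from the very start (in the first minute, 19 pages were crawled but only 15 items were scraped):
INFO: Crawled 19 pages (at 19 pages/min), scraped 15 items (at 15 items/min)
No errors are reported. ① What could be causing some records not to be retrieved?
Could the site be defending itself against scraping?
② What approaches are there for retrieving all of the records?
I would appreciate advice on these two points.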
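One way to start on question ① might be a diagnostic re-run in which Scrapy reports the requests it would otherwise drop quietly. The fragment below is a sketch of a `settings.py` for such a run. The setting names are standard Scrapy settings, but the values and the log file name are assumptions, not taken from the original project.

```python
# settings.py fragment for a diagnostic re-run (illustrative values only)

# Show per-request log lines instead of only the per-minute INFO summary.
LOG_LEVEL = "DEBUG"

# Log every request dropped by the duplicate filter, not just the first one.
DUPEFILTER_DEBUG = True

# Hypothetical file name; writing to a file makes the long log easier to search.
LOG_FILE = "crawl_debug.log"
```

In the stats dump that Scrapy prints when the spider closes, counters such as `dupefilter/filtered` and `httperror/response_ignored_count` would then indicate whether requests were skipped as duplicates or responses were discarded because of their status code.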
Log
2023-08-19 09:45:16 [scrapy.core.engine] INFO: Spider opened
2023-08-19 09:45:16 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-08-19 09:45:16 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-08-19 09:46:16 [scrapy.extensions.logstats] INFO: Crawled 19 pages (at 19 pages/min), scraped 15 items (at 15 items/min)
2023-08-19 09:47:16 [scrapy.extensions.logstats] INFO: Crawled 37 pages (at 18 pages/min), scraped 32 items (at 17 items/min)
2023-08-19 09:48:16 [scrapy.extensions.logstats] INFO: Crawled 54 pages (at 17 pages/min), scraped 47 items (at 15 items/min)
2023-08-19 09:49:16 [scrapy.extensions.logstats] INFO: Crawled 70 pages (at 16 pages/min), scraped 62 items (at 15 items/min)
2023-08-19 09:50:16 [scrapy.extensions.logstats] INFO: Crawled 87 pages (at 17 pages/min), scraped 78 items (at 16 items/min)
2023-08-19 09:51:16 [scrapy.extensions.logstats] INFO: Crawled 104 pages (at 17 pages/min), scraped 93 items (at 15 items/min)
2023-08-19 09:52:16 [scrapy.extensions.logstats] INFO: Crawled 120 pages (at 16 pages/min), scraped 107 items (at 14 items/min)
2023-08-19 09:53:16 [scrapy.extensions.logstats] INFO: Crawled 137 pages (at 17 pages/min), scraped 123 items (at 16 items/min)
2023-08-19 09:54:16 [scrapy.extensions.logstats] INFO: Crawled 152 pages (at 15 pages/min), scraped 137 items (at 14 items/min)
2023-08-19 09:55:16 [scrapy.extensions.logstats] INFO: Crawled 169 pages (at 17 pages/min), scraped 153 items (at 16 items/min)
2023-08-19 09:56:16 [scrapy.extensions.logstats] INFO: Crawled 186 pages (at 17 pages/min), scraped 169 items (at 16 items/min)
2023-08-19 09:57:16 [scrapy.extensions.logstats] INFO: Crawled 203 pages (at 17 pages/min), scraped 184 items (at 15 items/min)
2023-08-19 09:58:16 [scrapy.extensions.logstats] INFO: Crawled 220 pages (at 17 pages/min), scraped 199 items (at 15 items/min)
2023-08-19 09:59:16 [scrapy.extensions.logstats] INFO: Crawled 237 pages (at 17 pages/min), scraped 215 items (at 16 items/min)
2023-08-19 10:00:16 [scrapy.extensions.logstats] INFO: Crawled 252 pages (at 15 pages/min), scraped 229 items (at 14 items/min)
2023-08-19 10:01:16 [scrapy.extensions.logstats] INFO: Crawled 268 pages (at 16 pages/min), scraped 244 items (at 15 items/min)
2023-08-19 10:02:16 [scrapy.extensions.logstats] INFO: Crawled 284 pages (at 16 pages/min), scraped 259 items (at 15 items/min)
2023-08-19 10:03:16 [scrapy.extensions.logstats] INFO: Crawled 301 pages (at 17 pages/min), scraped 275 items (at 16 items/min)
2023-08-19 10:04:16 [scrapy.extensions.logstats] INFO: Crawled 318 pages (at 17 pages/min), scraped 291 items (at 16 items/min)
2023-08-19 10:05:16 [scrapy.extensions.logstats] INFO: Crawled 334 pages (at 16 pages/min), scraped 305 items (at 14 items/min)
2023-08-19 10:06:16 [scrapy.extensions.logstats] INFO: Crawled 351 pages (at 17 pages/min), scraped 321 items (at 16 items/min)
2023-08-19 10:07:16 [scrapy.extensions.logstats] INFO: Crawled 366 pages (at 15 pages/min), scraped 335 items (at 14 items/min)
2023-08-19 10:08:16 [scrapy.extensions.logstats] INFO: Crawled 383 pages (at 17 pages/min), scraped 351 items (at 16 items/min)
2023-08-19 10:09:16 [scrapy.extensions.logstats] INFO: Crawled 398 pages (at 15 pages/min), scraped 365 items (at 14 items/min)
2023-08-19 10:10:16 [scrapy.extensions.logstats] INFO: Crawled 416 pages (at 18 pages/min), scraped 382 items (at 17 items/min)
2023-08-19 10:11:16 [scrapy.extensions.logstats] INFO: Crawled 432 pages (at 16 pages/min), scraped 397 items (at 15 items/min)
2023-08-19 10:12:16 [scrapy.extensions.logstats] INFO: Crawled 448 pages (at 16 pages/min), scraped 412 items (at 15 items/min)
2023-08-19 10:13:16 [scrapy.extensions.logstats] INFO: Crawled 465 pages (at 17 pages/min), scraped 428 items (at 16 items/min)
2023-08-19 10:14:16 [scrapy.extensions.logstats] INFO: Crawled 480 pages (at 15 pages/min), scraped 442 items (at 14 items/min)
2023-08-19 10:15:16 [scrapy.extensions.logstats] INFO: Crawled 497 pages (at 17 pages/min), scraped 458 items (at 16 items/min)
2023-08-19 10:16:16 [scrapy.extensions.logstats] INFO: Crawled 513 pages (at 16 pages/min), scraped 473 items (at 15 items/min)
2023-08-19 10:17:16 [scrapy.extensions.logstats] INFO: Crawled 528 pages (at 15 pages/min), scraped 487 items (at 14 items/min)
2023-08-19 10:18:16 [scrapy.extensions.logstats] INFO: Crawled 544 pages (at 16 pages/min), scraped 502 items (at 15 items/min)
2023-08-19 10:19:16 [scrapy.extensions.logstats] INFO: Crawled 560 pages (at 16 pages/min), scraped 517 items (at 15 items/min)
2023-08-19 10:20:16 [scrapy.extensions.logstats] INFO: Crawled 577 pages (at 17 pages/min), scraped 533 items (at 16 items/min)
2023-08-19 10:21:16 [scrapy.extensions.logstats] INFO: Crawled 595 pages (at 18 pages/min), scraped 550 items (at 17 items/min)
2023-08-19 10:22:16 [scrapy.extensions.logstats] INFO: Crawled 610 pages (at 15 pages/min), scraped 564 items (at 14 items/min)
2023-08-19 10:23:16 [scrapy.extensions.logstats] INFO: Crawled 626 pages (at 16 pages/min), scraped 579 items (at 15 items/min)
2023-08-19 10:24:16 [scrapy.extensions.logstats] INFO: Crawled 642 pages (at 16 pages/min), scraped 594 items (at 15 items/min)
2023-08-19 10:25:16 [scrapy.extensions.logstats] INFO: Crawled 659 pages (at 17 pages/min), scraped 610 items (at 16 items/min)
2023-08-19 10:26:16 [scrapy.extensions.logstats] INFO: Crawled 675 pages (at 16 pages/min), scraped 625 items (at 15 items/min)
2023-08-19 10:27:16 [scrapy.extensions.logstats] INFO: Crawled 691 pages (at 16 pages/min), scraped 640 items (at 15 items/min)
2023-08-19 10:28:16 [scrapy.extensions.logstats] INFO: Crawled 707 pages (at 16 pages/min), scraped 655 items (at 15 items/min)
2023-08-19 10:29:16 [scrapy.extensions.logstats] INFO: Crawled 722 pages (at 15 pages/min), scraped 669 items (at 14 items/min)
2023-08-19 10:30:16 [scrapy.extensions.logstats] INFO: Crawled 738 pages (at 16 pages/min), scraped 684 items (at 15 items/min)
2023-08-19 10:31:16 [scrapy.extensions.logstats] INFO: Crawled 754 pages (at 16 pages/min), scraped 699 items (at 15 items/min)
2023-08-19 10:32:16 [scrapy.extensions.logstats] INFO: Crawled 769 pages (at 15 pages/min), scraped 713 items (at 14 items/min)
2023-08-19 10:33:16 [scrapy.extensions.logstats] INFO: Crawled 786 pages (at 17 pages/min), scraped 729 items (at 16 items/min)
2023-08-19 10:34:16 [scrapy.extensions.logstats] INFO: Crawled 803 pages (at 17 pages/min), scraped 745 items (at 16 items/min)
2023-08-19 10:35:16 [scrapy.extensions.logstats] INFO: Crawled 819 pages (at 16 pages/min), scraped 760 items (at 15 items/min)
2023-08-19 10:36:16 [scrapy.extensions.logstats] INFO: Crawled 836 pages (at 17 pages/min), scraped 776 items (at 16 items/min)
2023-08-19 10:37:16 [scrapy.extensions.logstats] INFO: Crawled 852 pages (at 16 pages/min), scraped 790 items (at 14 items/min)
2023-08-19 10:38:16 [scrapy.extensions.logstats] INFO: Crawled 869 pages (at 17 pages/min), scraped 806 items (at 16 items/min)
2023-08-19 10:39:16 [scrapy.extensions.logstats] INFO: Crawled 884 pages (at 15 pages/min), scraped 818 items (at 12 items/min)
2023-08-19 10:40:16 [scrapy.extensions.logstats] INFO: Crawled 900 pages (at 16 pages/min), scraped 833 items (at 15 items/min)
2023-08-19 10:41:16 [scrapy.extensions.logstats] INFO: Crawled 917 pages (at 17 pages/min), scraped 849 items (at 16 items/min)
2023-08-19 10:42:16 [scrapy.extensions.logstats] INFO: Crawled 934 pages (at 17 pages/min), scraped 865 items (at 16 items/min)
2023-08-19 10:43:16 [scrapy.extensions.logstats] INFO: Crawled 949 pages (at 15 pages/min), scraped 879 items (at 14 items/min)
2023-08-19 10:44:16 [scrapy.extensions.logstats] INFO: Crawled 965 pages (at 16 pages/min), scraped 894 items (at 15 items/min)
2023-08-19 10:45:16 [scrapy.extensions.logstats] INFO: Crawled 980 pages (at 15 pages/min), scraped 908 items (at 14 items/min)
2023-08-19 10:46:16 [scrapy.extensions.logstats] INFO: Crawled 997 pages (at 17 pages/min), scraped 924 items (at 16 items/min)
2023-08-19 10:47:16 [scrapy.extensions.logstats] INFO: Crawled 1013 pages (at 16 pages/min), scraped 939 items (at 15 items/min)
2023-08-19 10:47:45 [scrapy.core.engine] INFO: Closing spider (finished)
2023-08-19 10:47:45 [scrapy.extensions.feedexport] INFO: Stored csv feed (948 items) in: 0819.csv
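The difference between "Crawled" and "scraped" in each logstats entry may be easier to interpret once it is extracted as a number. The following is a minimal sketch using only the Python standard library. The function name `crawl_gaps` is hypothetical, and the sample text is just the first and last per-minute entries from the log above.

```python
import re

# Matches the counters in a Scrapy logstats line; \s+ tolerates wrapped lines.
LOGSTATS = re.compile(
    r"Crawled\s+(\d+)\s+pages\s+\(at\s+\d+\s+pages/min\),\s+scraped\s+(\d+)\s+items"
)

def crawl_gaps(log_text):
    """Return a (crawled, scraped, crawled - scraped) tuple per logstats entry."""
    return [
        (int(crawled), int(scraped), int(crawled) - int(scraped))
        for crawled, scraped in LOGSTATS.findall(log_text)
    ]

sample = """\
2023-08-19 09:46:16 [scrapy.extensions.logstats] INFO: Crawled 19 pages (at 19 pages/min), scraped 15 items (at 15 items/min)
2023-08-19 10:47:16 [scrapy.extensions.logstats] INFO: Crawled 1013 pages (at 16 pages/min), scraped 939 items (at 15 items/min)
"""

for crawled, scraped, gap in crawl_gaps(sample):
    print(crawled, scraped, gap)
# 19 15 4
# 1013 939 74
```

The gap grows from 4 to 74 over the run. Since 1,085 records at 15 per page implies 73 list pages, and list pages are crawled without producing an item, a final gap of about 74 is roughly what list pages alone would account for. One possible reading, not a confirmed diagnosis, is that nearly every detail page that was fetched did yield an item, and the missing records correspond to detail pages that were never requested.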
What I tried
I tried changing the User-Agent, but the result was the same.
