I want to speed up web scraping in Python.

I've been scraping a website with Python, but it is far too slow. If anyone can tell which part of my code is causing the slowdown, I'd be grateful for your advice.
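My code works in this order:

1. Move through the pages in a `while` loop.
2. For each page, build a soup and loop over the entries with a `for` loop.
3. Append each entry to a pandas DataFrame one at a time.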

```python
import urllib.request

import pandas as pd
from bs4 import BeautifulSoup

basic_url = "-----"
url = "---"
count = 0  # 0 = page 1; later pages use "/<count>" (assumed: an offset of 50 per page)
follower = 100000
rows = []  # one dict per entry; DataFrame.append was removed in pandas 2.0

while follower > 10000:
    if count > 0:
        new_url = basic_url + url + "/" + str(count)
    else:
        new_url = basic_url + url

    html = urllib.request.urlopen(url=new_url)
    soup = BeautifulSoup(html, 'lxml')
    n_before = len(rows)
    for i in range(50):
        try:
            profile = soup('div', id="sideBreak4")[i].get_text().replace('\r', '').replace('\n', '').replace('\t', '')
            ranking = int(soup('td', class_="col1")[i].get_text())
            name = soup('td', class_="col3")[i].select('a')[0].get_text()
            idd = soup('td', class_="col3")[i].select('a')[1].get_text()
            follower = int(soup('span', class_="red")[i].get_text().replace(',', ''))
            follow = int(soup('td', class_="col4")[i].get_text())
            post_count = int(soup('td', class_="col7")[i].get_text())
        except (IndexError, ValueError):
            break  # fewer than 50 entries on this page, or a cell that is not a number
        # Placeholder column names: the original labels were elided in the question
        rows.append({"ranking": ranking, "name": name, "idd": idd, "profile": profile,
                     "follow": follow, "follower": follower, "post_count": post_count})
    if len(rows) == n_before:
        break  # nothing parsed on this page; stop instead of looping forever
    count += 50  # advance once per page, not once per entry

df = pd.DataFrame(rows)
df.to_csv(url + '.csv')
```
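For comparison, here is a minimal sketch of a common fix, assuming the page markup matches the selectors used in the question (not verified against the real site). Each `soup(...)` call inside the loop above searches the whole document again, so one entry costs 7 full-tree searches and one page about 350. The sketch runs one `find_all` per column, pairs the results with `zip`, and returns plain dicts. The function name `parse_page` and the dict keys are hypothetical placeholders, and `html.parser` is used instead of `lxml` only so the sketch needs no extra package.

```python
from bs4 import BeautifulSoup


def parse_page(html):
    """Extract every entry on one page, searching the tree once per column.

    Selectors are copied from the question's code; whether they match the
    real site is an assumption. Keys in the returned dicts are placeholders.
    """
    soup = BeautifulSoup(html, "html.parser")
    # One find_all per column instead of one search per column *per entry*
    profiles = soup.find_all("div", id="sideBreak4")
    ranks = soup.find_all("td", class_="col1")
    names = soup.find_all("td", class_="col3")
    followers = soup.find_all("span", class_="red")
    follows = soup.find_all("td", class_="col4")
    posts = soup.find_all("td", class_="col7")

    rows = []
    # zip stops at the shortest list, like the IndexError break in the question
    for prof, rank, name_td, fol, fw, pc in zip(profiles, ranks, names, followers, follows, posts):
        links = name_td.select("a")
        rows.append({
            "ranking": int(rank.get_text()),
            "name": links[0].get_text(),
            "idd": links[1].get_text(),
            "profile": prof.get_text().replace("\r", "").replace("\n", "").replace("\t", ""),
            "follow": int(fw.get_text()),
            "follower": int(fol.get_text().replace(",", "")),
            "post_count": int(pc.get_text()),
        })
    return rows
```

In the page loop you would call `rows.extend(parse_page(html))` and build the DataFrame once with `pd.DataFrame(rows)` after the loop, rather than growing a DataFrame row by row.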


2 answers

> I've been scraping a website with Python, but it is far too slow. If anyone can tell which part of my code is causing the slowdown, I'd be grateful for your advice.

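As a rule, scraping at high speed puts a burden on the target server. If you are scraping without the site owner's permission, keep your pace to at most about one page per second.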
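The one-page-per-second limit above can be enforced with a small helper. This is a minimal standard-library sketch; the class name `RateLimiter` is hypothetical. It sleeps only for whatever part of the interval has not already passed, so time spent downloading and parsing counts toward the one second.

```python
import time


class RateLimiter:
    """Ensure at least `min_interval` seconds pass between consecutive wait() calls."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last = None  # time.monotonic() value at the previous request

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Usage in the question's page loop (new_url as in the question):
# limiter = RateLimiter(1.0)
# while follower > 10000:
#     limiter.wait()
#     html = urllib.request.urlopen(url=new_url)
```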

Posted 2019/01/18 03:00

maisumakun

Total score 145183

Best answer

This is something you should debug yourself.

Use the snippet below as a guide and measure how long each step takes.

```python
from datetime import datetime

t1 = datetime.now()
# step 1: put the code you want to time here
t2 = datetime.now()
print(t2 - t1)
```
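A slightly more systematic version of the same idea, as a sketch: a hypothetical `timed` context manager adds up the time spent under each label across all loop iterations, so you can see whether downloading, parsing, or extracting dominates. It swaps `datetime.now()` for `time.perf_counter()`, the standard-library clock intended for measuring short durations.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)  # label -> accumulated seconds


@contextmanager
def timed(label):
    """Add the time spent inside the with-block to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start


def report():
    """Print each label's total time, slowest first."""
    for label, secs in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{label:>10}: {secs:.3f} s")


# Usage inside the question's page loop (names from the question's code):
# with timed("download"):
#     html = urllib.request.urlopen(url=new_url).read()
# with timed("parse"):
#     soup = BeautifulSoup(html, "lxml")
# with timed("extract"):
#     ...  # the inner for loop
# report()  # after the while loop
```

For a whole-program view, `python -m cProfile -s cumtime your_script.py` (script name hypothetical) lists the functions with the most cumulative time.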

Posted 2019/01/18 02:59

yamato_user

Total score 2321
