[Python] I want to run Selenium in parallel, but I get an error...

### Background / What I want to achieve

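I'm a Python beginner. I'm using Selenium to fetch service status information from the Tokyo Metro website, but it is slow, so I wanted to parallelize it with multiprocessing. I wrote some code by referring to various sites, but I get an error. I searched based on the situation and the error message, but couldn't find a solution at all... I'd be grateful if anyone could help.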

### Code

```python
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
from multiprocessing import Pool
import chromedriver_binary

def fetchclass(url):
    driver.get(url)
    time.sleep(3)
    html = driver.page_source
    soup = BeautifulSoup(html, 'lxml')
    text = soup.find(class_='v2_unkouReportInfo').text
    return print(text.strip())

# Launch the browser
options = Options()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

urls=['https://www.tokyometro.jp/unkou/history/ginza.html','https://www.tokyometro.jp/unkou/history/marunouchi.html',
      'https://www.tokyometro.jp/unkou/history/hibiya.html','https://www.tokyometro.jp/unkou/history/touzai.html',
      'https://www.tokyometro.jp/unkou/history/chiyoda.html','https://www.tokyometro.jp/unkou/history/yurakucho.html',
      'https://www.tokyometro.jp/unkou/history/hanzoumon.html','https://www.tokyometro.jp/unkou/history/nanboku.html',
      'https://www.tokyometro.jp/unkou/history/fukutoshin.html']

if __name__ == "__main__":
    p=Pool(4)
    result = p.map(fetchclass,urls)
    print(result)

# Quit the browser
driver.quit()
```

### Error message (excerpt)
```
multiprocessing.pool.RemoteTraceback: 

ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it.

During handling of the above exception, another exception occurred:

urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x00000206A123B688>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it.

The above exception was the direct cause of the following exception:
```
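A likely reading of this traceback (an interpretation, not something the asker confirmed): on Windows, `multiprocessing` uses the "spawn" start method, which re-runs the script in every worker process. Everything outside `if __name__ == "__main__":` runs again there, so each worker creates its own `driver` and then immediately reaches the module-level `driver.quit()`. When `fetchclass` later calls `driver.get(url)` in that worker, its chromedriver has already shut down, so the connection is refused (WinError 10061). The sketch below imitates that re-run with `runpy.run_path(..., run_name="__mp_main__")`, which is roughly how CPython's spawn start method loads the main script in a worker. The stand-in script and the helper `run_as` are hypothetical; the script only records which statements executed instead of launching Chrome.

```python
import os
import runpy
import tempfile
import textwrap

# Hypothetical stand-in for the question's script: it records which
# top-level statements run instead of starting Chrome.
SCRIPT = textwrap.dedent("""
    events = ["module level: driver = webdriver.Chrome(...)"]

    if __name__ == "__main__":
        events.append("main guard: Pool(4).map(fetchclass, urls)")

    events.append("module level: driver.quit()")
""")


def run_as(run_name):
    """Execute SCRIPT from a real file under the given __name__; return its events."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write(SCRIPT)
        return runpy.run_path(path, run_name=run_name)["events"]


if __name__ == "__main__":
    print(run_as("__main__"))     # the process you launched yourself
    print(run_as("__mp_main__"))  # roughly what each spawned worker runs
```

The second list has no main-guard entry: the worker creates a driver, quits it straight away, and any `fetchclass` call it receives then runs against that closed driver.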

3 answers

If you get the status from this URL, a single request is enough.

The Japanese entries are further down in the response.

https://www.tokyometro.jp/library/common/operation/status.json

```python
import requests
import json

url = "https://www.tokyometro.jp/library/common/operation/status.json"

r = requests.get(url)

r.raise_for_status()

# Convert JSONP to JSON: keep only what is inside "callback( ... )"
data_json = r.text.split("(", 1)[1].strip(")")

result = json.loads(data_json)

for line in result["jp"]["lines"]:
    print(f"{line['line_name']}:{line['contents']}")
```
Posted 2019/11/24 03:03

Edited 2019/11/24 03:26

barobaro

tsukas

2019/11/24 12:08

That was a 100% perfect answer for what I wanted to do... Thank you! How can I find a URL like this...?

barobaro

2019/11/25 01:43

I explained this before at https://teratail.com/questions/164050. Look at the responses the page receives and check their contents.

Best answer

```python
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time, sys
from multiprocessing import Pool

# import chromedriver_binary  # uncomment to use the chromedriver from the chromedriver-binary package

def fetchclass(url):
    # The driver is created inside the function, so every call
    # starts and quits its own headless Chrome.
    options = Options()
    options.add_argument('--headless')
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        time.sleep(5)
        html = driver.page_source
    finally:
        driver.quit()  # quit Chrome even if loading the page raised an error
    soup = BeautifulSoup(html, 'lxml')
    text = soup.find(class_='v2_unkouReportInfo').text
    #sys.stdout.buffer.write(text.encode('utf-8'))
    return text.strip()


urls=['https://www.tokyometro.jp/unkou/history/ginza.html','https://www.tokyometro.jp/unkou/history/marunouchi.html',
      'https://www.tokyometro.jp/unkou/history/hibiya.html','https://www.tokyometro.jp/unkou/history/touzai.html',
      'https://www.tokyometro.jp/unkou/history/chiyoda.html','https://www.tokyometro.jp/unkou/history/yurakucho.html',
      'https://www.tokyometro.jp/unkou/history/hanzoumon.html','https://www.tokyometro.jp/unkou/history/nanboku.html',
      'https://www.tokyometro.jp/unkou/history/fukutoshin.html']

if __name__ == "__main__":
    with Pool(9) as p:
        result = p.map(fetchclass, urls)
    for l in result:
        # Write each entry as UTF-8 bytes, one entry per line
        sys.stdout.buffer.write((l + '\n').encode('utf-8'))
        #print(l)
```
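The best answer starts and quits a new Chrome for every URL (about nine browsers at once with `Pool(9)`). If start-up time matters, one option is to give each worker process a single browser for a contiguous chunk of URLs. This is a sketch, untested against the live site, with several assumptions: the helper names `split_into_chunks`, `fetch_chunk` and `fetch_all` are hypothetical; it swaps the fixed `time.sleep` for Selenium's `WebDriverWait`, assuming the `v2_unkouReportInfo` element's text is non-empty once the page has loaded; and it reads the element's visible `.text` through Selenium instead of BeautifulSoup, which can differ if part of the text is hidden.

```python
from multiprocessing import Pool


def split_into_chunks(items, n):
    """Split items into at most n contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        if start < end:  # skip empty chunks when there are fewer items than n
            chunks.append(items[start:end])
        start = end
    return chunks


def fetch_chunk(urls):
    """Runs in a worker process: one headless Chrome handles every URL in the chunk."""
    # Imports are local here only as a style choice; top-level imports work too.
    # What matters is that the driver is created inside the worker, not at module level.
    from selenium import webdriver
    from selenium.common.exceptions import (
        NoSuchElementException,
        StaleElementReferenceException,
    )
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        # Poll (every 0.5 s by default) for up to 10 s; until() returns the first
        # truthy value, or raises TimeoutException if the text stays empty.
        wait = WebDriverWait(
            driver,
            10,
            ignored_exceptions=(NoSuchElementException, StaleElementReferenceException),
        )
        texts = []
        for url in urls:
            driver.get(url)
            texts.append(
                wait.until(
                    lambda d: d.find_element(By.CLASS_NAME, "v2_unkouReportInfo").text.strip()
                )
            )
        return texts
    finally:
        driver.quit()  # always stop this worker's chromedriver and Chrome


def fetch_all(urls, workers=4):
    """Fetch every URL using `workers` browsers; return the texts in URL order."""
    with Pool(workers) as p:
        per_chunk = p.map(fetch_chunk, split_into_chunks(urls, workers))
    # Contiguous chunks plus map's ordered results: flattening keeps URL order.
    return [text for chunk in per_chunk for text in chunk]
```

Call `fetch_all(urls)` only from inside the `if __name__ == "__main__":` guard, e.g. `for text in fetch_all(urls): print(text)`. With `workers=4`, the nine URLs are split into chunks of 3, 2, 2 and 2, so four browsers are started instead of nine.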