EC.presence_of_element_located does not work as expected in Python 3

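#Background
I want to write a Python 3 program that automatically collects aguse blacklist check results by scraping.

The program works as follows:
__1.__ Save the IP addresses to check in a file named IPlist.txt beforehand.
__2.__ Open the aguse search page and enter the IP addresses from IPlist.txt one at a time.
__3.__ Look at the blacklist results at the bottom of the results page: if every result is safe, record safe; if even one is caution, record caution.
Note: each verdict is appended to a list.
__4.__ Repeat steps 1 to 3 until every IP address in IPlist.txt has been searched.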
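
Step 3 above can be sketched as a small function. This is only an illustration under the assumption that the alt texts of the nine result images have already been collected into a list; the name `overall_verdict` is hypothetical and does not appear in the program below.

```python
def overall_verdict(alt_texts):
    """Return 'safe' only when every blacklist result is 'safe'.

    alt_texts is assumed to be a list of the alt attribute values
    ('safe' or 'caution') read from the result images.
    """
    if all(text == "safe" for text in alt_texts):
        return "safe"
    return "caution"


print(overall_verdict(["safe", "safe", "safe"]))
print(overall_verdict(["safe", "caution", "safe"]))
```

Comparing each text against "safe" explicitly avoids treating a page where every result is caution as safe, which a plain "are they all equal" check would do.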

The source code I wrote is shown below.

# -*- coding: Shift-JIS -*

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

from time import sleep
import os, argparse, csv

import pyautogui as pgui

from bs4 import BeautifulSoup

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from requests.exceptions import Timeout
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import UnexpectedAlertPresentException


driver = webdriver.Chrome(r'Path_to_chromedriver.exe')

#List that holds the IP addresses read from IPlist.txt
search = []

#List that holds the verdicts
result = []

with open(r'IPlist.txt', encoding='utf-8') as f:

    for rows in f:
        row = rows.rstrip('\n\n')
        search.append(row)

for ip in search:
    try:
        driver.get('https://www.aguse.jp/')
        sleep(1)
        #Select the search box
        id=driver.find_element_by_id('url')
        #Type the IP addresses from search into the search box one at a time
        id.send_keys(ip)


        #Click the search button
        driver.find_element_by_class_name('btn1').click()

        #Wait until every result is safe
        WebDriverWait(driver, 60).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_wwwphishtankcom"]/img[@alt="safe]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_codegooglecomphish"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_codegooglecomblack"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_bbarracudacentralorg"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_sbl-xblspamhausorg"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result__multisurblorg"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result___multisurblorg"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result____multisurblorg"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_cblabuseatorg"]/img[@alt="safe"]'))
        )
        
        #Get the HTML of the results page
        source = driver.page_source
        soup = BeautifulSoup(source, "html.parser")
    

    except TimeoutException:
        print('timeout')
        sleep(2)
        result.append("timeout")
        handle = driver.window_handles
        driver.switch_to_window(handle[0])
        pgui.keyDown('esc')
        pgui.keyUp('esc')
        continue
    

    except UnexpectedAlertPresentException:
        print('timeout')
        sleep(3)
        result.append("timeout")
        handle = driver.window_handles
        driver.switch_to_window(handle[0])
        pgui.keyDown('esc')
        pgui.keyUp('esc')
        continue
    

    wwwphishtankcom = soup.select('div#BL_result_wwwphishtankcom')
    a = wwwphishtankcom[0].select('img')
    a_text = a[0].attrs['alt']
    print(a_text)
    
    codegooglecomphish = soup.select('div#BL_result_codegooglecomphish')
    b = codegooglecomphish[0].select('img')
    b_text = b[0].attrs['alt']
    print(b_text)

    codegooglecomblack = soup.select('div#BL_result_codegooglecomblack')
    c = codegooglecomblack[0].select('img')
    c_text = c[0].attrs['alt']
    print(c_text)

    bbarracudacentralorg = soup.select('div#BL_result_bbarracudacentralorg')
    d = bbarracudacentralorg[0].select('img')
    d_text = d[0].attrs['alt']
    print(d_text)
    
    sbl_xblspamhausorg = soup.select('div#BL_result_sbl-xblspamhausorg')
    e = sbl_xblspamhausorg[0].select('img')
    e_text = e[0].attrs['alt']
    print(e_text)
    
    multisurblorg = soup.select('div#BL_result__multisurblorg')
    f = multisurblorg[0].select('img')
    f_text = f[0].attrs['alt']
    print(f_text)
    
    ___multisurblorg = soup.select('div#BL_result___multisurblorg')
    g = ___multisurblorg[0].select('img')
    g_text = g[0].attrs['alt']
    print(g_text)

    ____multisurblorg = soup.select('div#BL_result____multisurblorg')
    h = ____multisurblorg[0].select('img')
    h_text = h[0].attrs['alt']
    print(h_text)

    cblabuseatorg = soup.select('div#BL_result_cblabuseatorg')
    i = cblabuseatorg[0].select('img')
    i_text = i[0].attrs['alt']
    print(i_text)
    
    #If they all match, record safe
    if a==b==c==d==e==f==g==h==i:
        result.append('safe')
    #If even one is caution, record caution
    else:
        result.append("caution")
    
    print(result)

with open(r'Path_to_result_aguse.txt', mode='a') as f:
    for rows in result:
        f.write(rows+"\n")

#Question
##How aguse behaves
Right after the results page opens, each blacklist result at the bottom of the page is `<img src="image/indicator.gif">`.
When it changes to `<img alt="safe" src="/image/judge-safe.gif">`, safe is shown;
when it changes to `<img alt="caution" src="/image/judge-caution.gif">`, caution is shown.

##Wait time to handle this aguse behavior
The information I need only appears once `<img src="image/indicator.gif">` has changed to either `<img alt="safe" src="/image/judge-safe.gif">`
or `<img alt="caution" src="/image/judge-caution.gif">`,
so I wait for the change as follows.
Note: the same code also appears in the program above.

        #Wait until every result is safe
        WebDriverWait(driver, 60).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_wwwphishtankcom"]/img[@alt="safe]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_codegooglecomphish"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_codegooglecomblack"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_bbarracudacentralorg"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_sbl-xblspamhausorg"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result__multisurblorg"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result___multisurblorg"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result____multisurblorg"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_cblabuseatorg"]/img[@alt="safe"]'))
        )

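However, running this raises the following error:

b_text = b[0].attrs['alt']
KeyError: 'alt'

I expected the program to keep waiting, up to the one-minute limit, until the alt attribute appears, so I do not understand why it moves on to the next step before the alt attribute is present.

Could you tell me what the cause is?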


1 answer

Best answer

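I have never used selenium at all, but judging from the specification and the implementation, these conditions do not seem to support being combined with and.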

7. WebDriver API — Selenium Python Bindings 2 documentation
selenium/expected_conditions.py at master · SeleniumHQ/selenium · GitHub

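In particular, no conversion to bool is defined for them, so the default applies: a Python object that defines neither `__bool__` nor `__len__` is always truthy. `A and B` therefore simply evaluates to B without checking anything, which means until only ever receives the last condition in the chain.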
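
The following sketch illustrates that behavior in plain Python, without Selenium. `FakeCondition` is a hypothetical stand-in for an expected-condition object (callable, no `__bool__`); it is not part of the Selenium API.

```python
class FakeCondition:
    """Hypothetical stand-in for a condition: callable, no __bool__."""

    def __init__(self, outcome):
        self.outcome = outcome

    def __call__(self, driver):
        # A real condition would inspect the page through the driver.
        return self.outcome


first = FakeCondition(False)   # would report "not ready" if it were called
last = FakeCondition(True)     # would report "ready" if it were called

# Neither object defines __bool__ or __len__, so both are truthy and
# `and` evaluates to its right-hand operand without calling anything.
combined = first and last

print(bool(first))
print(combined is last)
```

Because `combined` is just `last`, a wait built on it never consults `first`, which is consistent with the KeyError appearing for one of the earlier result cells.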

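Also, the until method apparently treats its argument as a callable and calls it, so if you want to do what the question describes, try the following. Honestly, I am not sure whether this is proper coding.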

python
WebDriverWait(driver, 60).until(lambda driver:
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_wwwphishtankcom"]/img[@alt="safe]'))(driver) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_codegooglecomphish"]/img[@alt="safe"]'))(driver) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_codegooglecomblack"]/img[@alt="safe"]'))(driver) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_bbarracudacentralorg"]/img[@alt="safe"]'))(driver) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_sbl-xblspamhausorg"]/img[@alt="safe"]'))(driver) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result__multisurblorg"]/img[@alt="safe"]'))(driver) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result___multisurblorg"]/img[@alt="safe"]'))(driver) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result____multisurblorg"]/img[@alt="safe"]'))(driver) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_cblabuseatorg"]/img[@alt="safe"]'))(driver)
        )
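
The same idea can be written without repeating `(driver) and` on every line. The helper below is a minimal sketch, not something Selenium provides under this name: `all_conditions` is a hypothetical function that wraps several callables into one callable suitable for until.

```python
def all_conditions(*conditions):
    """Combine callables into one that is truthy only if all of them are.

    Each condition is assumed to accept the driver and return a truthy
    value once it is satisfied, as expected-condition objects do.
    """

    def check(driver):
        results = []
        for condition in conditions:
            result = condition(driver)   # actually call the condition
            if not result:
                return False             # until() will poll again
            results.append(result)
        return results

    return check


# Stand-in conditions so the sketch runs without a browser.
def ready(driver):
    return "element"


def pending(driver):
    return False


print(all_conditions(ready, pending)(None))
print(all_conditions(ready, ready)(None))
```

With Selenium, the result would be passed as `WebDriverWait(driver, 60).until(all_conditions(cond1, cond2, ...))`, where each `cond` is one of the `EC.presence_of_element_located(...)` objects above.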

Posted 2019/03/10 04:37

hayataka2049

Total score 30939

SZR0601

2019/03/11 15:10

Thank you for your answer. Unfortunately, the method you suggested did not work either. When I run the code you gave me, it ends in a timeout, so it seems to stop at the WebDriverWait step. I thought it over and rewrote the code as follows.

```
WebDriverWait(driver, 60).until(lambda drivers:
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_wwwphishtankcom"]/img[@alt="safe]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_codegooglecomphish"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_codegooglecomblack"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_bbarracudacentralorg"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_sbl-xblspamhausorg"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result__multisurblorg"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result___multisurblorg"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result____multisurblorg"]/img[@alt="safe"]')) and
        EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_cblabuseatorg"]/img[@alt="safe"]'))(drivers)
        )
```

After this change the code runs, but the following error occurred again.

```
b_text = b[0].attrs['alt']
KeyError: 'alt'
```

hayataka2049

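2019/03/11 15:31 edited

Expressions such as EC.presence_of_element_located((By.XPATH, '//*[@id="BL_result_wwwphishtankcom"]/img[@alt="safe]')) are all callables, and they mean nothing unless they are actually called. If the wait timed out, then the program itself should be working correctly. I think there is a logic error, though.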

hayataka2049

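2019/03/11 15:34 edited

Actually, for a start, there is a careless mistake in the first XPath (img[@alt="safe]). Also, is and really the right operator for the condition? I wonder whether part of it should be or (the question could be read that way).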

SZR0601

2019/03/13 16:04

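Thank you for your answer. After fixing the careless mistake you found, it worked correctly! Thank you very much. As for the condition, after some thought I changed /img[@alt="safe"]' to /img[@alt!="indicator"]', and now it also works correctly when the blacklist results include caution. Thank you again.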
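
A note on why that last change works: in XPath 1.0, `@alt!="indicator"` is false for an element that has no alt attribute at all, so the predicate effectively waits until each result image has an alt attribute, whether its value is safe or caution. The sketch below expresses the same check with the standard library only. It assumes the markup described in the question (one img inside each `BL_result_*` div); `ResultAltCollector` and `all_judged` are hypothetical names.

```python
from html.parser import HTMLParser


class ResultAltCollector(HTMLParser):
    """Collect the alt text of the <img> inside each BL_result_* div."""

    def __init__(self):
        super().__init__()
        self.current_id = None   # id of the BL_result_* div being parsed
        self.alts = {}           # div id -> alt text, None while loading

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        div_id = attrs.get("id") or ""
        if tag == "div" and div_id.startswith("BL_result_"):
            self.current_id = div_id
            self.alts[div_id] = None
        elif tag == "img" and self.current_id is not None:
            # .get() yields None for the indicator image, which has no alt
            self.alts[self.current_id] = attrs.get("alt")

    def handle_endtag(self, tag):
        if tag == "div":
            self.current_id = None


def all_judged(html):
    """True once every BL_result_* image carries an alt attribute."""
    collector = ResultAltCollector()
    collector.feed(html)
    alts = collector.alts.values()
    return bool(collector.alts) and all(alt is not None for alt in alts)


loading = ('<div id="BL_result_a"><img src="image/indicator.gif"></div>'
           '<div id="BL_result_b"><img alt="safe" src="/image/judge-safe.gif"></div>')
done = ('<div id="BL_result_a"><img alt="caution" src="/image/judge-caution.gif"></div>'
        '<div id="BL_result_b"><img alt="safe" src="/image/judge-safe.gif"></div>')

print(all_judged(loading))
print(all_judged(done))
```

Such a function could be used as `WebDriverWait(driver, 60).until(lambda d: all_judged(d.page_source))`; reading alt with `.get()` instead of `attrs['alt']` also avoids the KeyError when an image is still the indicator.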
