How to use find in BeautifulSoup

I have a question about scraping.

#Symptom

event_res = requests.get(user_url, params={'page': page_num})
e = BeautifulSoup(event_res.text, 'html.parser')

When I used find to look up a p tag, I could access its content with .text.

date = e.find('p', class_='hoge1').text
# This returns the text.

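With a span tag, however, .text is not found.
Why is that? I looked at the tag details on the Mozilla site and did not see anything relevant, so I am stuck.
Stripping the tags would solve what I am trying to do, but does anyone know why .text cannot be used?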

series_title = e.find('span', class_='hoge2')
#<span class="hoge2">aaaa</span>

#Details

I am modifying this code, which scrapes the connpass site.
https://github.com/achiku/cnps/blob/master/cnps/dump.py#L52

def _parse_event(soup):
    event_dates = []
    event_dates_soup = soup.find_all('div', class_='event_list vevent')
    for e in event_dates_soup:
        year = e.find('p', class_='year').text
        date = e.find('p', class_='date').text
        status = e.find('p', class_='label_status_tag').getText()
        series_title = e.find('span', class_='series_title')  # added
        a = e.find('a', class_='url summary').text  # added

        status_label = ''
        if status == 'キャンセル':
            status_label = 'canceled'
        elif status == '補欠':
            status_label = 'on_waitlist'
        elif status == '申込済':
            status_label = 'applyed'
        elif status == '抽選中':
            status_label = 'in_lottery'

        dt = datetime.strptime("{0}/{1}".format(year, date), '%Y/%m/%d')
        event_dates.append({'status': status_label, 'date': dt, 'series_title': series_title, 'a': a})
        # print("event")
        print(type(a))
        print(type(series_title))
    return event_dates
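
As a quick check of what those two `print(type(...))` calls would report, here is a small sketch using made-up markup (not the real connpass page). It assumes the span is present: `find` hands back a `Tag` object with its markup, while `.text` turns it into a plain `str`.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up markup that mimics the two elements added in the function above.
html = (
    '<div>'
    '<span class="series_title">aaaa</span>'
    '<a class="url summary" href="#">bbbb</a>'
    '</div>'
)
e = BeautifulSoup(html, 'html.parser')

series_title = e.find('span', class_='series_title')  # a Tag, markup included
a = e.find('a', class_='url summary').text            # a str, markup stripped

print(isinstance(series_title, Tag))
print(isinstance(a, str))
print(series_title.text)
```

So storing `series_title` as-is puts a `Tag` into the dictionary, whereas `series_title.text` would store the string.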

#How to run

git clone https://github.com/achiku/cnps.git
Apply the modification above
python setup.py install
cnps dump https://kikaigakushuu.connpass.com/event/65221/

View the page directly in the browser
https://connpass.com/user/keiichirou_miyamoto/

I ran these in the browser console.

$x('//p[@class="date"]') #OK
$x('//p[@class="date"][0]') #error
>(10) [p.date, p.date, p.date, p.date, p.date, p.date, p.date, p.date, p.date, p.date] #I do not really understand this result either. I do not know how to access the individual elements from the browser. Since it is an array, I suspect I have to loop over it.
$x('//p[@class="date"].text') #error
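
In the console, `$x` returns a plain array, so the index and the property access presumably belong outside the call, as in `$x('//p[@class="date"]')[0].textContent`. Inside XPath itself, positions are 1-based and `.text` is not valid syntax. The same pattern on the Python side can be sketched as follows, using made-up HTML rather than the real connpass page:

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the event list.
html = """
<p class="date">10/01</p>
<p class="date">10/02</p>
<p class="date">10/03</p>
"""

soup = BeautifulSoup(html, 'html.parser')

# find_all() returns a list-like ResultSet, so index it after the call.
dates = soup.find_all('p', class_='date')
first = dates[0].text

# Or loop over the matches to collect the text of every element.
all_dates = [p.text for p in dates]

print(first)
print(all_dates)
```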

I have tried various things but still do not really understand what is going on,
so if anyone knows, I would appreciate your guidance.

1 answer

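.text works on span tags just as it does on any other tag, so I recommend checking again. The error about .text not being available most likely means that find did not locate the tag in the first place (that is, it returned None).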

from bs4 import BeautifulSoup

html = """
<html>
  <head></head>
  <body>
    <p class="hoge1">ptext</p>
    <span class="hoge2">spantext</span>
  </body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')
print(soup.find('p', class_='hoge1').text)
print(soup.find('span', class_='hoge2').text)
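
To make the `None` case concrete, here is a minimal sketch, assuming made-up HTML in which one event block has no `span.series_title`. The markup and the helper name `text_or_default` are hypothetical and are not taken from connpass or the cnps code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second event has no <span class="series_title">.
html = """
<div class="event_list vevent">
  <span class="series_title">Group A</span>
  <p class="date">10/01</p>
</div>
<div class="event_list vevent">
  <p class="date">10/02</p>
</div>
"""


def text_or_default(tag, default=''):
    """Return the stripped text of a Tag, or `default` when find() returned None."""
    return tag.get_text(strip=True) if tag is not None else default


soup = BeautifulSoup(html, 'html.parser')
titles = []
for e in soup.find_all('div', class_='event_list vevent'):
    span = e.find('span', class_='series_title')
    # span is None for the second block, so span.text would raise AttributeError there.
    titles.append(text_or_default(span))

print(titles)
```

If the real page mixes events with and without a series title, a guard like this would let the loop continue instead of stopping at the first missing span.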