Beautiful Soup: I want to extract specific strings

Environment: Windows 10, Python 3, Jupyter Notebook

I want to extract specific strings from a website using Beautiful Soup, but I can't get it to work the way I want.
Here is the situation. When I run

from bs4 import BeautifulSoup
import urllib.request as req

url = "https://yakkun.com/swsh/zukan/n812"
html = req.urlopen(url)  # fetch the page before parsing it
parse_html = BeautifulSoup(html, 'html.parser')
aaa = parse_html.find_all('tr')
bbb = aaa[15:30]
print(bbb)

the output is

[<tr><th class="left" colspan="6"><a href="#" id="race" name="race" onclick="helpZukan(event,'race');return false;" onkeypress="helpZukan(event,'race');return false;"><img alt="ヘルプ" class="help" src="//78npc3br.user.webaccel.jp/page/help.gif"/></a>◆ ゴリランダーの種族値</th></tr>,
<tr><td class="c1" style="width:125px;">HP</td><td class="left" colspan="5"><img src="//78npc3br.user.webaccel.jp/bar.gif" style="width:60px;height:10px"/> 100</td></tr>,
<tr><td class="c1">こうげき</td><td class="left" colspan="5"><img src="//78npc3br.user.webaccel.jp/bar.gif" style="width:75px;height:10px"/> 125</td></tr>,
<tr><td class="c1">ぼうぎょ</td><td class="left" colspan="5"><img src="//78npc3br.user.webaccel.jp/bar.gif" style="width:54px;height:10px"/> 90</td></tr>,
<tr><td class="c1">とくこう</td><td class="left" colspan="5"><img src="//78npc3br.user.webaccel.jp/bar.gif" style="width:36px;height:10px"/> 60</td></tr>,
<tr><td class="c1">とくぼう</td><td class="left" colspan="5"><img src="//78npc3br.user.webaccel.jp/bar.gif" style="width:42px;height:10px"/> 70</td></tr>,
<tr><td class="c1">すばやさ</td><td class="left" colspan="5"><img src="//78npc3br.user.webaccel.jp/bar.gif" style="width:51px;height:10px"/> 85</td></tr>,
<tr><td class="c1">平均 / 合計</td><td class="left" colspan="5"><img src="//78npc3br.user.webaccel.jp/bar.gif" style="width:52.98px;height:10px"/> 88.3 / 530</td></tr>,
<tr><th class="left"><a href="#" id="stats" name="stats" onclick="helpZukan(event,'stats');return false;" onkeypress="helpZukan(event,'stats');return false;"><img alt="ヘルプ" class="help" src="//78npc3br.user.webaccel.jp/page/help.gif"/></a>◆ 実数値</th><th>最高</th><th>準</th><th>無振</th><th>下降</th><th>最低</th></tr>,
<tr><td class="c1">HP</td><td>207</td><td>207</td><td>175</td><td>175</td><td>160</td></tr>,
<tr><td class="c1">こうげき</td><td>194</td><td>177</td><td>145</td><td>130</td><td>117</td></tr>,
<tr><td class="c1">ぼうぎょ</td><td>156</td><td>142</td><td>110</td><td>99</td><td>85</td></tr>,
<tr><td class="c1">とくこう</td><td>123</td><td>112</td><td>80</td><td>72</td><td>58</td></tr>,
<tr><td class="c1">とくぼう</td><td>134</td><td>122</td><td>90</td><td>81</td><td>67</td></tr>,
<tr><td class="c1">すばやさ</td><td>150</td><td>137</td><td>105</td><td>94</td><td>81</td></tr>]

From this I want to extract the values in a form like

ゴリランダーの種族値
HP 100
こうげき 125
ぼうぎょ 90

(omitted)

すばやさ 150 137 105 94 81

but I don't know how to do it.
Incidentally, with

ccc = aaa.find_all(class_="c1")
print(ccc)

I get

<td class="c1" style="width:125px;">HP</td>, <td class="c1">こうげき</td>, <td class="c1">ぼうぎょ</td>, <td class="c1">とくこう</td>, <td class="c1">とくぼう</td>, <td class="c1">すばやさ</td>, <td class="c1">平均 / 合計</td>,

and ccc[0].string shows 'HP'. However,

ddd = aaa.find_all(class_="left")
ddd[1].string

shows nothing.

When I ran code that follows the links from the Pokédex list and displays each page, only this page failed:

from bs4 import BeautifulSoup
import urllib.request as req

url= "https://yakkun.com/swsh/zukan/n421"
html = req.urlopen(url)
parse_html = BeautifulSoup(html,'html.parser')
table = parse_html.find(class_="table layout_right")
for tr in table.find_all('tr'):
    if '種族値' in tr.text:
        title = tr.text
        print(title)
        hitpoint = tr.find_next_sibling()
        attack = hitpoint.find_next_sibling()
        defence = attack.find_next_sibling()
        special_attack = defence.find_next_sibling()
        special_defence = special_attack.find_next_sibling()
        speed = special_defence.find_next_sibling()
        average = speed.find_next_sibling()

        print(hitpoint.text.split())
        print(attack.text.split())
        print(defence.text.split())
        print(special_attack.text.split())
        print(special_defence.text.split())
        print(speed.text.split())
        print(average.text.split())

Running it prints

◆ チェリムの種族値
['HP', '70']
['こうげき', '60']
['ぼうぎょ', '70']
['とくこう', '87']
['とくぼう', '78']
['すばやさ', '85']
['平均', '/', '合計', '75.0', '/', '450']
フラワーギフト天気が『にほんばれ』の時、ポジフォルムにフォルムチェンジする。種族値などは変わらないが、自分とすべての味方の『こうげき』『とくぼう』が1.5倍になる。

and then raises
AttributeError Traceback (most recent call last)
<ipython-input-4-ed11b57b4eb5> in <module>
8 special_attack = defence.find_next_sibling()
9 special_defence = special_attack.find_next_sibling()
---> 10 speed = special_defence.find_next_sibling()
11 average = speed.find_next_sibling()
12

AttributeError: 'NoneType' object has no attribute 'find_next_sibling'

So it ends in an error. I don't understand why only this page behaves differently from the others.


2 answers

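Try changing ddd[1].string to ddd[1].text.
Roughly speaking, .string cannot return the text when the target tag also contains another tag; it returns None instead.
Reference:
How to fix .string returning None when you try to display the data inside a tag with BeautifulSoup.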

Posted 2020/10/22 23:32

jeanbiego

Total score: 3966

Best answer

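As jeanbiego has already answered,
both attributes can be used to get text, but
.string does not return a usable value (it is None) when the target element also has child elements.
.text returns all of the strings contained anywhere inside the target element.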
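
The difference can be checked on two cells taken from the output in the question. This is only a small sketch: the markup is abridged (the style attributes are dropped), and the variable names are chosen here for illustration.

```python
from bs4 import BeautifulSoup

# Two cells abridged from the output shown in the question.
html = (
    '<table><tr>'
    '<td class="c1">HP</td>'
    '<td class="left" colspan="5">'
    '<img src="//78npc3br.user.webaccel.jp/bar.gif"/> 100</td>'
    '</tr></table>'
)
soup = BeautifulSoup(html, 'html.parser')

name_cell = soup.find('td', class_='c1')
value_cell = soup.find('td', class_='left')

# This cell has exactly one child (a string), so .string returns it.
name_string = name_cell.string
# This cell has two children (<img> and a string), so .string is None.
value_string = value_cell.string
# .text joins every string inside the cell; strip() drops the leading space.
value_text = value_cell.text.strip()

print(name_string, value_string, value_text)
```

In a notebook, an expression that evaluates to None displays nothing, which matches what the question describes for ddd[1].string.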

I don't know exactly how much of the page you want to extract, so treat the following only as a reference example.

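First, get the element with the layout_right class.
A for loop then checks the tr elements inside it one by one.
When it finds a row containing the word 実数値 (actual stats), it takes the next six tr elements (HP through すばやさ).
Each of those tr elements is obtained with .find_next_sibling(), starting from the 実数値 row;
it returns the next sibling element, and the call is repeated six times.
Once the tr elements (HP through すばやさ) have been collected,
another for loop goes through the td elements of each row and reads the values.
The rows after 種族値 (base stats) are handled the same way, with a seventh row for the average / total.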

python
from bs4 import BeautifulSoup
import requests

url = 'https://yakkun.com/swsh/zukan/n812'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'html.parser')
table = soup.find('div', class_='layout_right')

for tr in table.find_all('tr'):
    # Base stats: the heading row is followed by seven rows (HP ... average / total).
    if '種族値' in tr.text:
        title = tr.text
        print(title)
        hitpoint = tr.find_next_sibling()
        attack = hitpoint.find_next_sibling()
        defence = attack.find_next_sibling()
        special_attack = defence.find_next_sibling()
        special_defence = special_attack.find_next_sibling()
        speed = special_defence.find_next_sibling()
        average = speed.find_next_sibling()

        # These cells contain an <img> next to the number, so use .text, not .string.
        print(hitpoint.text.split())
        print(attack.text.split())
        print(defence.text.split())
        print(special_attack.text.split())
        print(special_defence.text.split())
        print(speed.text.split())
        print(average.text.split())

    # Actual stats: the heading row is followed by six rows (HP ... すばやさ).
    if '実数値' in tr.text:
        hitpoint = tr.find_next_sibling()
        attack = hitpoint.find_next_sibling()
        defence = attack.find_next_sibling()
        special_attack = defence.find_next_sibling()
        special_defence = special_attack.find_next_sibling()
        speed = special_defence.find_next_sibling()

        # Each td here holds only text, so .string works.
        print([td.string for td in hitpoint.find_all('td')])
        print([td.string for td in attack.find_all('td')])
        print([td.string for td in defence.find_all('td')])
        print([td.string for td in special_attack.find_all('td')])
        print([td.string for td in special_defence.find_all('td')])
        print([td.string for td in speed.find_all('td')])

>> ◆ ゴリランダーの種族値
>> ['HP', '100']
>> ['こうげき', '125']
>> ['ぼうぎょ', '90']
>> ['とくこう', '60']
>> ['とくぼう', '70']
>> ['すばやさ', '85']
>> ['平均', '/', '合計', '88.3', '/', '530']

>> ['HP', '207', '207', '175', '175', '160']
>> ['こうげき', '194', '177', '145', '130', '117']
>> ['ぼうぎょ', '156', '142', '110', '99', '85']
>> ['とくこう', '123', '112', '80', '72', '58']
>> ['とくぼう', '134', '122', '90', '81', '67']
>> ['すばやさ', '150', '137', '105', '94', '81']
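
If the goal is the "HP 100" style layout from the question rather than Python lists, each row can be joined into one line. A minimal sketch, assuming the rows have the shape printed above; `format_row` is a helper name made up for this example, and the two sample rows are copied from that output.

```python
# Rows in the shape printed above (two of them copied from that output).
rows = [
    ['HP', '100'],
    ['すばやさ', '150', '137', '105', '94', '81'],
]


def format_row(cells):
    """Join the cells of one row with single spaces."""
    return ' '.join(cells)


lines = [format_row(row) for row in rows]
print('\n'.join(lines))
```

Note that `' '.join` needs every cell to be a string, so it would fail on a cell where .string returned None.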

The repeated sibling lookups can also be written more compactly with a loop:

python
# Continues from the previous snippet: `table` is the layout_right element.
for tr in table.find_all('tr'):
    if '種族値' in tr.text:
        title = tr.text
        print(title)
        box = []
        elem = tr
        # Walk forward over the seven rows that follow the heading.
        for _ in range(7):
            new_elem = elem.find_next_sibling()
            box.append(new_elem)
            elem = new_elem

        data = [elems.text.split() for elems in box]
        print(data)
    if '実数値' in tr.text:
        box = []
        elem = tr
        # Walk forward over the six rows that follow the heading.
        for _ in range(6):
            new_elem = elem.find_next_sibling()
            box.append(new_elem)
            elem = new_elem

        data = [[td.string for td in elems.find_all('td')] for elems in box]
        print(data)
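
One caveat about this approach: find_next_sibling() returns None when there is no further sibling, and `'種族値' in tr.text` also matches any row whose text merely mentions the word. The n421 output in the question looks consistent with that, since the フラワーギフト description is printed as a title just before the AttributeError. The following is a hedged sketch of one way to guard against both, assuming the real heading always sits in a th cell as in the output quoted in the question. The markup is hypothetical and heavily abridged, and `next_rows` is a name introduced only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, abridged markup: a heading row, two stat rows, and a
# description row that also mentions 種族値 and has no rows after it.
html = """
<table>
<tr><th class="left">◆ チェリムの種族値</th></tr>
<tr><td class="c1">HP</td><td class="left"> 70</td></tr>
<tr><td class="c1">こうげき</td><td class="left"> 60</td></tr>
<tr><td>フラワーギフト</td><td>種族値などは変わらない</td></tr>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')


def next_rows(header_row, count):
    """Collect up to `count` following <tr> siblings, stopping at None."""
    rows = []
    row = header_row.find_next_sibling('tr')
    while row is not None and len(rows) < count:
        rows.append(row.text.split())
        row = row.find_next_sibling('tr')
    return rows


results = []
for tr in soup.find_all('tr'):
    header = tr.find('th')
    # Only treat a row as a heading when the keyword is inside a <th> cell.
    if header is not None and '種族値' in header.text:
        results.append((header.text, next_rows(tr, 2)))

print(results)

# The last row has no following siblings, so this is an empty list, not an error.
last_row = soup.find_all('tr')[-1]
print(next_rows(last_row, 7))
```

Checking the th cell is a design choice based on the quoted markup; if a page puts the heading elsewhere, the condition would need to be adapted.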