Answer edit history

1

Added BeautifulSoup

2018/11/15 11:58

Posted

barobaro
Score 1286

answer CHANGED
@@ -13,4 +13,51 @@
  item['time'] = table_row.xpath('td[1]/text()').extract_first()
  item['note'] = table_row.xpath('td[3]/img/@src').extract()
  yield item
- ```
+ ```
+
+ With BeautifulSoup, you could do it like this:
+
+ ```python
+ from bs4 import BeautifulSoup
+
+ html = """
+ <table>
+   <tbody>
+     <tr>
+       <td>A3</td>
+       <td>B3</td>
+       <td>
+         <img src="../../media/test1.gif"> <!-- ◯ -->
+         <img src="../../media/test2.gif"> <!-- ★ -->
+       </td>
+     </tr>
+     <tr>
+       <td>A2</td>
+       <td>B2</td>
+       <td>
+         <img src="../../media/test3.gif">
+       </td>
+     </tr>
+   </tbody>
+ </table>
+ """
+
+ soup = BeautifulSoup(html, 'html.parser')
+
+ for trs in soup.select('tr'):
+
+     # Collect the values of one table row
+     result = []
+
+     for tds in trs.select('td'):
+         if tds.img:
+             # Cell contains images: collect every src attribute
+             for i in tds.select('img'):
+                 result.append(i.get('src'))
+         else:
+             # Plain cell: collect its stripped text
+             result.append(tds.get_text(strip=True))
+
+     print(result)
+ ```
+
+ Result
+ ['A3', 'B3', '../../media/test1.gif', '../../media/test2.gif']
+ ['A2', 'B2', '../../media/test3.gif']
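
The BeautifulSoup loop above flattens each row into a single list. If the goal is the same shape as the Scrapy item (`time` from the first cell, `note` as the list of image `src` values from the third cell), a minimal sketch could look like the following. The function name `extract_items` and the use of plain dicts in place of a Scrapy `Item` are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tbody>
    <tr>
      <td>A3</td>
      <td>B3</td>
      <td>
        <img src="../../media/test1.gif">
        <img src="../../media/test2.gif">
      </td>
    </tr>
    <tr>
      <td>A2</td>
      <td>B2</td>
      <td>
        <img src="../../media/test3.gif">
      </td>
    </tr>
  </tbody>
</table>
"""


def extract_items(markup):
    """Hypothetical helper: return one dict per table row."""
    soup = BeautifulSoup(markup, 'html.parser')
    items = []
    for tr in soup.select('tr'):
        # Only the direct <td> children of this row
        tds = tr.find_all('td', recursive=False)
        if len(tds) < 3:
            # Skip rows that do not have the expected three cells
            continue
        items.append({
            # Same as td[1]/text() in the XPath version
            'time': tds[0].get_text(strip=True),
            # Same as td[3]/img/@src in the XPath version
            'note': [img.get('src') for img in tds[2].select('img')],
        })
    return items


items = extract_items(html)
for item in items:
    print(item)
```

Keeping `note` as a list mirrors what `extract()` returns in the Scrapy version, so a cell with several images stays grouped with its row.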