Answer edit history
4
Update
test
CHANGED
@@ -37,3 +37,23 @@
 Crypto New Flash, CoinQuora, and U.TODAY.
 
 ```
+
+
+
+Note: the `class` attribute value apparently differs slightly from one `p` tag to the next, so only the first paragraph is captured.
+
+```python
+
+class="hy hz ct ia b ib ic id ie if ig ih ii ij ik il im in io ip iq ir is it iu iv cl dq"
+
+```
+
+
+
+Matching on the first 8 characters (`hy hz ct`) extracts `4626` characters, which appears to cover the entire body text.
+
+```python
+
+text = '\n'.join(i.text for i in soup.select(f'p[class*="{cls_name[:8]}"]'))
+
+```
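The difference between an exact `class` match and a match on the first 8 characters can be reproduced on a small, made-up document. This is only a sketch: the HTML and its class values are hypothetical stand-ins for the real page, and the `^=` (starts-with) selector is shown as a stricter alternative that the answer itself does not use.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class values are invented, but they follow the
# pattern described in the answer (shared prefix, slightly different tails).
html = """
<article>
  <p id="a1" class="hy hz ct ia b ib">First paragraph.</p>
  <p id="a2" class="hy hz ct ia b ic">Second paragraph.</p>
  <p class="footer note">Unrelated footer.</p>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

# Full class string of the first <p> that has both id and class.
cls_name = " ".join(soup.select_one("p[id][class]").get("class"))

# [class="..."] compares the whole attribute value, so only the first <p> matches.
exact = [p.text for p in soup.select(f'p[class="{cls_name}"]')]

# [class*="..."] matches when the value contains the substring anywhere.
substring = [p.text for p in soup.select(f'p[class*="{cls_name[:8]}"]')]

# [class^="..."] matches only when the value starts with the string.
prefix = [p.text for p in soup.select(f'p[class^="{cls_name[:8]}"]')]

print(exact)
print(substring)
print(prefix)
```

On this sample, `*=` and `^=` select the same two paragraphs. They would differ only if some unrelated element happened to contain `hy hz ct` in the middle of its class value.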
3
Update
test
CHANGED
@@ -16,7 +16,9 @@
 
 
 
+# p element that has both id and class attributes
+
-cls_name = ' '.join(soup.select_one('p').get('class'))
+cls_name = ' '.join(soup.select_one('p[id][class]').get('class'))
 
 text = '\n'.join(i.text for i in soup.select(f'p[class="{cls_name}"]'))
 
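One possible reason for switching from `p` to `p[id][class]` (an assumption, since the edit does not state it): if the first `p` in the document has no `class` attribute, `.get('class')` returns `None`, and `' '.join(None)` raises `TypeError`. A minimal sketch with made-up HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <p> carries no attributes at all.
html = """
<p>Byline without attributes.</p>
<p id="b1" class="hy hz ct ia">Body paragraph.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one('p') returns the first <p> in document order.
first_p = soup.select_one("p")
first_class = first_p.get("class")  # None, because the attribute is missing

# [id][class] only matches elements that carry both attributes,
# so the attribute-less byline is skipped.
body_p = soup.select_one("p[id][class]")
cls_name = " ".join(body_p.get("class"))

print(first_class)
print(cls_name)
```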
2
Update
test
CHANGED
@@ -1,4 +1,4 @@
-`class` attribute
+First, use `soup.select_one()` to get a single `p` element and read the value of its `class` attribute. Then use that value to extract the body text.
 
 ```python
 
@@ -16,28 +16,22 @@
 
 
 
+cls_name = ' '.join(soup.select_one('p').get('class'))
+
-text = '\n'.join(i.text for i in soup.select('p[class
+text = '\n'.join(i.text for i in soup.select(f'p[class="{cls_name}"]'))
 
 
-
-print(len(text))
 
 print(text)
 
 
 
-#
+# Line breaks inserted for readability
 
-
+Following the announcement of COTI’s growth plan, various media outlets have provided coverage
 
-
+on COTI’s roadmap to become a next-generation financial ecosystem. COTI was recently featured on
 
-
+Crypto New Flash, CoinQuora, and U.TODAY.
-
-
-
-:
-
-
 
 ```
1
Update
test
CHANGED
@@ -1,4 +1,4 @@
-`class` attribute containing the string `hy hz ct ia`
+Extract the `p` elements whose `class` attribute contains the string `hy hz ct ia`. Whether this captures the entire body text is unclear.
 
 ```python
 
@@ -34,6 +34,8 @@
 
 Why is this news so groundbreaking? First and foremost, it’s notable that enterprises and merchants across the world are beginning to accept crypto payments.
 
+
+
 :
 
 