
Answer edit history

4

Update

2021/11/23 12:10

Posted by melian

Score: 21527

answer CHANGED
@@ -17,4 +17,14 @@
  Following the announcement of COTI’s growth plan, various media outlets have provided coverage
  on COTI’s roadmap to become a next-generation financial ecosystem. COTI was recently featured on
  Crypto New Flash, CoinQuora, and U.TODAY.
+ ```
+
+ Note: the `class` attribute value apparently differs slightly from one `p` tag to the next, so only the first paragraph is retrieved.
+ ```python
+ class="hy hz ct ia b ib ic id ie if ig ih ii ij ik il im in io ip iq ir is it iu iv cl dq"
+ ```
+
+ Matching on the first 8 characters (`hy hz ct`) extracts `4626` characters, which appears to cover the whole body text.
+ ```python
+ text = '\n'.join(i.text for i in soup.select(f'p[class*="{cls_name[:8]}"]'))
  ```
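
The difference between the exact match and the 8-character prefix match can be reproduced on a small inline document. This is only a sketch: the markup and its class values are made up for illustration (they are not the real page), and it assumes `beautifulsoup4` is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: both article paragraphs share the leading class
# tokens, but the full attribute values differ in the last token.
html = """
<p id="a1" class="hy hz ct ia b ib">First paragraph.</p>
<p id="a2" class="hy hz ct ia b iw">Second paragraph.</p>
<p class="footer">Not part of the article.</p>
"""
soup = BeautifulSoup(html, 'html.parser')

# Class value of the first p element that has both id and class.
cls_name = ' '.join(soup.select_one('p[id][class]').get('class'))

# [class="..."] requires the whole attribute value to be identical.
exact = [p.text for p in soup.select(f'p[class="{cls_name}"]')]

# [class*="..."] only requires the value to contain the given substring.
partial = [p.text for p in soup.select(f'p[class*="{cls_name[:8]}"]')]

print(exact)
print(partial)
```

With this sample the exact match returns only the first paragraph, while the substring match returns both article paragraphs and still skips the footer.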

3

Update

2021/11/23 12:10

Posted by melian

Score: 21527

answer CHANGED
@@ -7,7 +7,8 @@
  r = requests.get(url)
  soup = BeautifulSoup(r.content , 'html.parser')
 
+ # p tag element which has id and class attributes
- cls_name = ' '.join(soup.select_one('p').get('class'))
+ cls_name = ' '.join(soup.select_one('p[id][class]').get('class'))
  text = '\n'.join(i.text for i in soup.select(f'p[class="{cls_name}"]'))
 
  print(text)
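
One plausible reason for switching from `p` to `p[id][class]` (the answer does not say so explicitly) is that the first `p` on a page may carry no `class` at all, in which case `.get('class')` returns `None` and `' '.join(...)` fails. The sketch below shows this on made-up markup; the HTML and its attribute values are assumptions, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first p has no attributes at all.
html = """
<p>Site banner without attributes.</p>
<p id="x1" class="hy hz">Article text.</p>
"""
soup = BeautifulSoup(html, 'html.parser')

# select_one('p') returns the banner; it has no class attribute,
# so get('class') yields None and cannot be passed to ' '.join().
first_p = soup.select_one('p')
first_classes = first_p.get('class')

# p[id][class] skips elements that lack either attribute.
target = soup.select_one('p[id][class]')
cls_name = ' '.join(target.get('class'))

print(first_classes)
print(cls_name)
```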

2

Update

2021/11/23 11:59

Posted by melian

Score: 21527

answer CHANGED
@@ -1,4 +1,4 @@
- Extract the `p` elements whose `class` attribute contains the string `hy hz ct ia`. Whether this retrieves the whole body text is unclear.
+ First, fetch a single `p` tag element with `soup.select_one()` and read the value of its `class` attribute. Then use that value to extract the body text.
  ```python
  import requests
  from bs4 import BeautifulSoup
@@ -7,16 +7,13 @@
  r = requests.get(url)
  soup = BeautifulSoup(r.content , 'html.parser')
 
+ cls_name = ' '.join(soup.select_one('p').get('class'))
- text = '\n'.join(i.text for i in soup.select('p[class*="hy hz ct ia"]'))
+ text = '\n'.join(i.text for i in soup.select(f'p[class="{cls_name}"]'))
 
- print(len(text))
  print(text)
 
- #
- 4626
- Following the announcement of COTI’s growth plan, various media outlets have provided coverage on COTI’s roadmap to become a next-generation financial ecosystem. COTI was recently featured on Crypto New Flash, CoinQuora, and U.TODAY.
- Why is this news so groundbreaking? First and foremost, it’s notable that enterprises and merchants across the world are beginning to accept crypto payments.
-
- :
-
+ # Line breaks added for readability
+ Following the announcement of COTI’s growth plan, various media outlets have provided coverage
+ on COTI’s roadmap to become a next-generation financial ecosystem. COTI was recently featured on
+ Crypto New Flash, CoinQuora, and U.TODAY.
  ```

1

Update

2021/11/23 11:29

Posted by melian

Score: 21527

answer CHANGED
@@ -1,4 +1,4 @@
- Extract the HTML whose `class` attribute contains the string `hy hz ct ia`. Whether this retrieves the whole body text is unclear.
+ Extract the `p` elements whose `class` attribute contains the string `hy hz ct ia`. Whether this retrieves the whole body text is unclear.
  ```python
  import requests
  from bs4 import BeautifulSoup
@@ -16,6 +16,7 @@
  4626
  Following the announcement of COTI’s growth plan, various media outlets have provided coverage on COTI’s roadmap to become a next-generation financial ecosystem. COTI was recently featured on Crypto New Flash, CoinQuora, and U.TODAY.
  Why is this news so groundbreaking? First and foremost, it’s notable that enterprises and merchants across the world are beginning to accept crypto payments.
+
  :
 
  ```
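
As a possible alternative to the substring match `p[class*="hy hz ct ia"]` used in this first revision, CSS class selectors (`p.hy.hz.ct.ia`) match on individual class tokens, so they do not depend on token order or on extra tokens in between. This is not the approach taken in the answer, just a sketch on made-up markup, and whether these tokens are stable on the real page is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for comparison; class values are illustrative only.
html = """
<p class="hy hz ct ia b ib">First paragraph.</p>
<p class="ct hy hz ia b iw">Second paragraph, tokens reordered.</p>
<p class="hyx hz ct">Unrelated element.</p>
"""
soup = BeautifulSoup(html, 'html.parser')

# Substring match on the raw attribute value: sensitive to token order.
substring = [p.text for p in soup.select('p[class*="hy hz ct ia"]')]

# Class selectors: every listed token must be present, in any order.
by_token = [p.text for p in soup.select('p.hy.hz.ct.ia')]

print(substring)
print(by_token)
```

Here the substring selector misses the paragraph whose tokens are reordered, while the token-based selector finds both article paragraphs and still excludes the unrelated element.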