Answer edit history

Update

2021/11/23 12:10

Posted

Score 20675

test CHANGED Viewed

@@ -37,3 +37,23 @@
 Crypto New Flash, CoinQuora, and U.TODAY.
 ```
+Note: The value of the `class` attribute apparently differs slightly from one `p` tag to the next, so only the first paragraph is extracted.
+```python
+class="hy hz ct ia b ib ic id ie if ig ih ii ij ik il im in io ip iq ir is it iu iv cl dq"
+```
+Matching on only the first 8 characters (`hy hz ct`) extracts `4626` characters, which appears to cover the entire body text.
+```python
+text = '\n'.join(i.text for i in soup.select(f'p[class*="{cls_name[:8]}"]'))
+```
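
The difference between the exact selector `p[class="..."]` and the substring selector `p[class*="..."]` can be sketched as below. This is only an illustration under stated assumptions: the sample markup, the `ParagraphCollector` class, and the helper functions are hypothetical, and the standard-library `html.parser` stands in for BeautifulSoup so that the matching logic itself is visible. It is not the code used in the answer.

```python
from html.parser import HTMLParser

# Hypothetical sample markup: the class strings share a prefix but differ at the tail,
# which is the situation described in the note above.
SAMPLE_HTML = (
    '<p id="a1" class="hy hz ct ia b ib">First paragraph.</p>'
    '<p id="a2" class="hy hz ct ia b ic">Second paragraph.</p>'
    '<p class="zz">Footer.</p>'
)


class ParagraphCollector(HTMLParser):
    """Collect [class string, text] pairs for every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append([dict(attrs).get("class") or "", ""])

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1][1] += data


def collect(html):
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def exact_match(paragraphs, cls_name):
    # Counterpart of p[class="..."]: the whole attribute value must be identical.
    return [text for cls, text in paragraphs if cls == cls_name]


def substring_match(paragraphs, fragment):
    # Counterpart of p[class*="..."]: the attribute value must contain the fragment.
    return [text for cls, text in paragraphs if fragment in cls]


paragraphs = collect(SAMPLE_HTML)
cls_name = paragraphs[0][0]                          # class string of the first <p>
exact = exact_match(paragraphs, cls_name)            # only the first paragraph
loose = substring_match(paragraphs, cls_name[:8])    # fragment is "hy hz ct"
print(exact)
print(loose)
```

A substring match is looser than an exact match, so it may also pick up unrelated elements whose `class` value happens to contain the same fragment. Checking the extracted length, as the answer does, is one way to sanity-check the result.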

Update

2021/11/23 12:10

Posted

Score 20675

test CHANGED Viewed

@@ -16,7 +16,9 @@
+# p tag element which has id and class attributes
-cls_name = ' '.join(soup.select_one('p').get('class'))
+cls_name = ' '.join(soup.select_one('p[id][class]').get('class'))
 text = '\n'.join(i.text for i in soup.select(f'p[class="{cls_name}"]'))

Update

2021/11/23 11:59

Posted

Score 20675

test CHANGED Viewed

@@ -1,4 +1,4 @@
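-Extract the `p` elements whose `class` attribute contains the string `hy hz ct ia`. Whether this captures the entire body text is unclear.
+First, get a single `p` element with `soup.select_one()` and read the value of its `class` attribute. Then use that value to extract the body text.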
 ```python
@@ -16,28 +16,22 @@
+cls_name = ' '.join(soup.select_one('p').get('class'))
-text = '\n'.join(i.text for i in soup.select('p[class*="hy hz ct ia"]'))
+text = '\n'.join(i.text for i in soup.select(f'p[class="{cls_name}"]'))
-print(len(text))
 print(text)
-#
+# Line breaks have been added for readability
-4626
+Following the announcement of COTI’s growth plan, various media outlets have provided coverage
-Following the announcement of COTI’s growth plan, various media outlets have provided coverage on COTI’s roadmap to become a next-generation financial ecosystem. COTI was recently featured on Crypto New Flash, CoinQuora, and U.TODAY.
+on COTI’s roadmap to become a next-generation financial ecosystem. COTI was recently featured on
-Why is this news so groundbreaking? First and foremost, it’s notable that enterprises and merchants across the world are beginning to accept crypto payments.
+Crypto New Flash, CoinQuora, and U.TODAY.
-                                  :
 ```

Update

2021/11/23 11:29

Posted

Score 20675

test CHANGED Viewed

@@ -1,4 +1,4 @@
-Extract the HTML whose `class` attribute contains the string `hy hz ct ia`. Whether this captures the entire body text is unclear.
+Extract the `p` elements whose `class` attribute contains the string `hy hz ct ia`. Whether this captures the entire body text is unclear.
 ```python
@@ -34,6 +34,8 @@
 Why is this news so groundbreaking? First and foremost, it’s notable that enterprises and merchants across the world are beginning to accept crypto payments.
                                   :