I want to get the links inside h2 elements (web scraping)

Edit history

Question edit history

2

Added enumerate

2018/08/28 10:08

Posted

Score 86

title CHANGED

File without changes

body CHANGED

@@ -74,15 +74,13 @@
 alltxt = soup_content.get_text()
 with open('h2textlink.csv', 'w+',newline='',encoding='utf-8') as f:
-    n = 0
     writer = csv.writer(f, lineterminator='\n')
     std_link = 'http://kondou.com/BS4/'
-    for subheading in soup_content.find_all('h2'):
+    for n, subheading in enumerate(soup_content.find_all('h2')):
         sh = subheading.get_text()
         h2link = subheading.a['href']
         writer.writerow([n, sh, std_link + h2link])
-        n += 1
 pass
 ```
 ## Code execution result
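
The change in this revision replaces a hand-maintained counter (`n = 0` ... `n += 1`) with `enumerate`. Below is a minimal sketch of the same pattern. It assumes hypothetical stand-in data (`headings`) in place of the scraped `<h2>` elements and writes to an in-memory buffer, so it runs without network access or touching disk.

```python
import csv
import io

# Hypothetical stand-ins for the (heading text, href) pairs scraped from the page.
headings = [("Getting help", "#id5"), ("Problems after installation", "#id9")]
std_link = "http://kondou.com/BS4/"

buffer = io.StringIO()  # in-memory file used only for this sketch
writer = csv.writer(buffer, lineterminator="\n")

# enumerate yields (index, item) pairs, so no separate counter variable is needed.
for n, (text, href) in enumerate(headings):
    writer.writerow([n, text, std_link + href])

csv_text = buffer.getvalue()
print(csv_text)
```

If the numbering should start at 1 instead of 0, `enumerate(headings, start=1)` would do that without changing the loop body.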

1

Added finished code

2018/08/28 10:08

Posted

Score 86

title CHANGED

	@@ -1,1 +1,1 @@
1	- I want to get the links inside h3 elements (web scraping)
1	+ I want to get the links inside h2 elements (web scraping)

body CHANGED

@@ -56,4 +56,40 @@
 ...(omitted)
 ```
+# Finished getting the h2 text and links!
-That is all. Thank you in advance m(_)m
+Thanks to everyone's help, it works! (≧∀≦)
+```python
+# -*- coding: utf-8 -*-
+from bs4 import BeautifulSoup
+import requests
+import csv
+"""
+Save the text and link of each <h2> to a CSV file
+"""
+r = requests.get("http://kondou.com/BS4/index.html")
+soup_content = BeautifulSoup(r.content, "html.parser")
+alltxt = soup_content.get_text()
+with open('h2textlink.csv', 'w+',newline='',encoding='utf-8') as f:
+    n = 0
+    writer = csv.writer(f, lineterminator='\n')
+    std_link = 'http://kondou.com/BS4/'
+    for subheading in soup_content.find_all('h2'):
+        sh = subheading.get_text()
+        h2link = subheading.a['href']
+        writer.writerow([n, sh, std_link + h2link])
+        n += 1
+pass
+```
+## Code execution result
+```csv
+0,(訳注)石鹸は食べられない¶,http://kondou.com/BS4/#id2
+1,この文書について¶,http://kondou.com/BS4/#id3
+2,助けてほしいときは¶,http://kondou.com/BS4/#id5
+3,インストール後の問題¶,http://kondou.com/BS4/#id9
+...(omitted)
+```
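
The finished code builds each link by string concatenation (`std_link + h2link`), which works for the output shown because every `href` there is a fragment such as `#id2`. As a possible alternative that is not part of the original post, `urllib.parse.urljoin` from the standard library resolves an `href` against a base URL and also copes with relative paths and absolute URLs. The `other.html` and `https://example.com/x` values below are hypothetical, added only to illustrate those cases.

```python
from urllib.parse import urljoin

page_url = "http://kondou.com/BS4/index.html"  # the page fetched in the post
base_dir = "http://kondou.com/BS4/"            # same value as std_link in the post

# A fragment-only href is appended to whichever base URL is given.
joined_dir = urljoin(base_dir, "#id2")
joined_page = urljoin(page_url, "#id2")

# Hypothetical hrefs: a relative path replaces the last path segment,
# and an absolute URL is returned unchanged.
joined_relative = urljoin(page_url, "other.html")
joined_absolute = urljoin(page_url, "https://example.com/x")

for link in (joined_dir, joined_page, joined_relative, joined_absolute):
    print(link)
```

Passing `base_dir` reproduces the links in the CSV output above, while passing `page_url` keeps `index.html` in the result. Which form is preferable depends on how the links will be used.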