Scraping Aozora Bunko with ruby text included

Edit history

Answer edit history

2

Update

2022/07/21 08:54

Posted

Score 21265

answer CHANGED

@@ -16,7 +16,9 @@
     ## Get the required content, keeping the tags
     contents = soup.find('div', class_='main_text')
+    contents = ''.join(
-    contents = ''.join(str(i) if i.name in ('br', 'ruby') else i.text.strip() for i in contents)
+        str(i) if i.name in ('br', 'ruby') else i.text.strip().replace('\r', '')
+        for i in contents)
     return title, author, contents
 if __name__ == '__main__':
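
This revision adds `.replace('\r', '')` because `str.strip()` only trims whitespace at the two ends of a string, so a carriage return in the middle of a text node survives it. A minimal stdlib-only sketch of that behavior, assuming the page is served with CRLF line endings; the sample string and the `clean` helper are hypothetical and not part of the answer:

```python
def clean(text: str) -> str:
    """Trim both ends, then drop any carriage returns left inside."""
    return text.strip().replace('\r', '')


# Hypothetical text node whose source lines end in CRLF.
node_text = '\r\nline one\r\nline two\r\n'

stripped_only = node_text.strip()
cleaned = clean(node_text)

print(repr(stripped_only))  # the interior '\r' is still present
print(repr(cleaned))        # the '\r' is gone, the '\n' is kept
```

Only `'\r'` is removed, so the line breaks inside a text node stay as plain `'\n'`.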

1

Update

2022/07/21 08:49

Posted

Score 21265

answer CHANGED

@@ -11,8 +11,8 @@
     soup = BeautifulSoup(html_text.content, 'html.parser')
     # Get the title and author name
-    title = soup.find('h1')
+    title = soup.find('h1').text
-    author = soup.find('h2')
+    author = soup.find('h2').text
     ## Get the required content, keeping the tags
     contents = soup.find('div', class_='main_text')
@@ -21,8 +21,8 @@
 if __name__ == '__main__':
     title, author, contents = get('https://www.aozora.gr.jp/cards/000329/files/18376_12100.html')
+    with open(f'青空文庫_{title}_{author}.txt', 'w') as f:
-    print(title)
+        f.write(f'{title}\n')
-    print(author)
+        f.write(f'{author}\n\n')
-    print()
-    print(contents)
+        f.write(contents)
 ```
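
The file-writing step in this revision can be tried on its own without fetching the page. The sketch below is one possible variant, not the answer's code: the `save_work` helper, the sample strings, and the temporary directory are hypothetical, and passing `encoding='utf-8'` explicitly is an assumption made here so the output does not depend on the platform's default encoding.

```python
import tempfile
from pathlib import Path


def save_work(directory, title, author, contents):
    """Write title, author, and body text to one file and return its path."""
    # Same file name pattern as in the answer.
    path = Path(directory) / f'青空文庫_{title}_{author}.txt'
    # encoding='utf-8' is an assumption; the answer relies on the default.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(f'{title}\n')
        f.write(f'{author}\n\n')
        f.write(contents)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for the values returned by get().
    saved = save_work(tmp, 'Sample Title', 'Sample Author',
                      'body<br/><ruby>text</ruby>')
    saved_name = saved.name
    text = saved.read_text(encoding='utf-8')

print(text)
```

Because `title` and `author` are plain strings here, as they are after `.text` is applied in this revision, they can be used directly in both the file name and the file body.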