I'm a beginner. I want to print the pages of a serialized web series to PDF.

Background

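I want to open each page of a series published on the web and print it to PDF.
I wrote the code by following a site that explained how to print to PDF.
Many of the series' pages have unrelated URLs, but many others differ only by a number that goes up by 1.
I thought I could use a for loop to repeat the process for those, but it doesn't work the way I expected.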

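What I want to achieve

For the pages whose URLs follow a consistent pattern, I'd like to convert them to PDF with a loop or similar, instead of writing out every URL.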

Problem / error message

The pages I set up with the loop return "404 Not Found".

Relevant source code

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import json
import time
import requests

def PrintSetUp():
    # Settings for saving to PDF via printing
    chopt=webdriver.ChromeOptions() # Defined as Chrome options
    appState = {
        "recentDestinations": [
            {
                "id": "Save as PDF",
                "origin": "local",
                "account":""
            }
        ],
        "selectedDestinationId": "Save as PDF",
        "version": 2,
        "isLandscapeEnabled": False, # Orientation: True = landscape, False = portrait
        "pageSize": 'A4', # Paper type (A3, A4, A5, Legal, Letter, Tabloid, etc.)
        #"mediaSize": {"height_microns": 355600, "width_microns": 215900}, # Paper size (10000 micrometres = 1 cm)
        #"marginsType": 0, # Margin type  0: default  1: none  2: minimum
        #"scalingType": 3 , # 0: default  1: fit to page  2: fit to paper  3: custom
        #"scaling": "141" , # Scale factor
        #"profile.managed_default_content_settings.images": 2,  # Do not load images
        "isHeaderFooterEnabled": False, # Headers and footers
        "isCssBackgroundEnabled": True, # Background graphics
        #"isDuplexEnabled": False, # Duplex: True = two-sided, False = one-sided
        #"isColorEnabled": True, # Color: True = color, False = black and white
        #"isCollateEnabled": True # Collate
    }
    
    prefs = {'printing.print_preview_sticky_settings.appState':
             json.dumps(appState),
             "download.default_directory": "~/Downloads" # Save location (on Windows, e.g. C:\\Users\\download)
             } # appState --> prefs
    chopt.add_experimental_option('prefs', prefs) # prefs --> chopt: store the print options
    chopt.add_argument('--kiosk-printing') # When the print dialog opens, press the Print button automatically
    return chopt

def main_WebToPDF(BlogURL):
    # Convert a web page or HTML file to PDF using Selenium
    chopt = PrintSetUp()
    driver_path = "./chromedriver" # Path to the webdriver
    driver = webdriver.Chrome(executable_path=driver_path, options=chopt)
    driver.implicitly_wait(10) # Implicit wait (seconds)
    driver.get(BlogURL) # Load the blog URL
    WebDriverWait(driver, 15).until(EC.presence_of_all_elements_located)  # Wait until all elements on the page have loaded (times out after 15 seconds)
    driver.execute_script('return window.print()') # Print as PDF
    time.sleep(10) # Wait 10 seconds for the file to download
    driver.quit() # Close the browser

if __name__ == '__main__': # Program start. Episodes 1-9 don't follow a pattern, so each URL is entered individually
        BlogURLList=['https://www.worldts.com/english-writing/eigoronbunwriting-1/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun-equipment/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun3/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun-writing4/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun-daimeishi/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun6-kekka/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun-kekka/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun-writing8/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun-topheavy/index.html',
                     'https://worldts.com/english-writing/no13-comments/index.html',
                     'https://worldts.com/english-writing/396/index.html',
                     'https://worldts.com/english-writing/398/index.html'] 
        for BlogURL in  BlogURLList:
            main_WebToPDF(BlogURL)

if __name__ == '__main__': # Program start. Episodes 10-79 follow a pattern, so I tried setting them up with a loop
        for i in range(9,78):
            BlogURLList=['https://worldts.com/english-writing/eigo-ronbun{i+1}/index.html'] 
        for BlogURL in  BlogURLList:
            main_WebToPDF(BlogURL)
```

What I tried

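I couldn't figure out how to put the for loop's number into the URL, so I tried things like building the list in a CSV file and loading it, but nothing worked at all.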
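Reading the list from a CSV file is possible too, although for the numbered episodes generating the URLs in a loop is simpler. A minimal sketch using Python's standard csv module, assuming a hypothetical file urls.csv with one URL in the first column of each row (the function name read_urls is also made up for this illustration):

```python
import csv


def read_urls(lines):
    """Return the first-column value of each non-empty CSV row as a URL."""
    urls = []
    for row in csv.reader(lines):
        # Skip blank lines (empty rows) and rows whose first cell is empty
        if row and row[0].strip():
            urls.append(row[0].strip())
    return urls


# Hypothetical usage with a real file:
# with open('urls.csv', newline='', encoding='utf-8') as f:
#     BlogURLList = read_urls(f)
```

read_urls accepts any iterable of text lines, so an open file object or a plain list of strings both work.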


novelistory

2022/09/06 03:05

The code display seems to be broken. It will be easier to read if you enter it in Markdown format.

Withdrawn user

2022/09/06 04:37

Thanks for pointing that out. I've fixed it and reposted.


1 answer

Best answer

```python
if __name__ == '__main__': # Program start. Episodes 10-79 follow a pattern, so I tried setting them up with a loop
   for i in range(9,78):
       BlogURLList=['https://worldts.com/english-writing/eigo-ronbun{i+1}/index.html']
   for BlogURL in BlogURLList:
       main_WebToPDF(BlogURL)
```

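About this part:
・There is already an if __name__ == '__main__': block above, so this second one is unnecessary.
・Writing {i+1} inside an ordinary string does nothing unless you use the format function (or an f-string). It is treated as the literal text {i+1}, so the browser requests a URL that really contains "{i+1}", which does not exist, hence the 404.
・The code is easier to follow if the loop range matches the page numbers. As written, range(9,78) with i+1 only produces 10 to 78, so episode 79 would be missed.
・Instead of adding to BlogURLList on each iteration, the loop replaces it with =, so the list ends up containing only the last page. Use append (which adds an element to the end of a list).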
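To see these problems concretely, here is a small reproduction that only builds the URL lists and does not open a browser. The URL pattern is the one from the question; the function names buggy_urls and fixed_urls are made up for this illustration, and the fixed version uses an f-string instead of format, which gives the same result.

```python
def buggy_urls():
    """Reproduce the loop from the question without launching Chrome."""
    for i in range(9, 78):
        # Plain string: {i+1} is NOT substituted, and = overwrites the list
        url_list = ['https://worldts.com/english-writing/eigo-ronbun{i+1}/index.html']
    return url_list


def fixed_urls():
    """Same idea with an f-string and append, covering episodes 10 to 79."""
    url_list = []
    for i in range(10, 80):
        url_list.append(f'https://worldts.com/english-writing/eigo-ronbun{i}/index.html')
    return url_list


print(buggy_urls())
# ['https://worldts.com/english-writing/eigo-ronbun{i+1}/index.html']
urls = fixed_urls()
print(len(urls), urls[0], urls[-1])
# 70 https://worldts.com/english-writing/eigo-ronbun10/index.html https://worldts.com/english-writing/eigo-ronbun79/index.html
```

The buggy version ends up with a single URL that still contains the braces, which matches the 404 described in the question.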

Taking all that into account, rewriting the code from the main part above gives:

```python
if __name__ == '__main__':
        BlogURLList=['https://www.worldts.com/english-writing/eigoronbunwriting-1/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun-equipment/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun3/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun-writing4/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun-daimeishi/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun6-kekka/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun-kekka/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun-writing8/index.html',
                     'https://www.worldts.com/english-writing/eigoronbun-topheavy/index.html',
                     'https://worldts.com/english-writing/no13-comments/index.html',
                     'https://worldts.com/english-writing/396/index.html',
                     'https://worldts.com/english-writing/398/index.html'] 

        for i in range(10,80): # 10 or more and less than 80 (episodes 10 to 79)
            # Use format to put i into {0}, then add the URL to BlogURLList
            BlogURLList.append('https://worldts.com/english-writing/eigo-ronbun{0}/index.html'.format(i))

        # Convert episodes 1-9 and episode 10 onward to PDF together at the end
        for BlogURL in BlogURLList:
            main_WebToPDF(BlogURL)
```
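Because the original problem only showed up as a 404 inside the browser, it may help to check the URLs before starting Chrome at all. A minimal sketch, assuming the server answers HEAD requests: it uses the standard library's urllib to get each URL's status and splits the list into reachable and missing URLs (requests.head from the requests library already imported in the question would serve the same purpose). The names status_of and split_by_status are hypothetical, and network failures other than HTTP error responses are not handled.

```python
import urllib.error
import urllib.request


def status_of(url, timeout=10):
    """Return the HTTP status code for url using a HEAD request."""
    request = urllib.request.Request(url, method='HEAD')
    try:
        # urlopen follows redirects, so this is the status of the final page
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses (such as 404 Not Found) are raised as HTTPError
        return error.code


def split_by_status(urls, get_status=status_of):
    """Split urls into (reachable, missing) based on an HTTP 200 status."""
    reachable, missing = [], []
    for url in urls:
        if get_status(url) == 200:
            reachable.append(url)
        else:
            missing.append(url)
    return reachable, missing
```

Usage: calling `ok, missing = split_by_status(BlogURLList)` before the final loop and printing `missing` shows which URLs would fail; the PDF loop can then run over `ok` only. The get_status parameter exists so the splitting logic can be tried without network access.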