I want to scrape multiple links from each page and write them to a CSV

Background

I read a list of URLs from a CSV file and visit each page.
I scrape multiple links from each page and write them to another CSV file,
but only the last link ends up in the CSV.

What I want to achieve

First, for every page listed in the input CSV, I want to write all of that page's links to a CSV file.

Relevant source code

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.firefox.options import Options
from subprocess import CREATE_NO_WINDOW
from pydrive.drive import GoogleDrive
from pydrive.auth import GoogleAuth
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import os
import time
import datetime
import csv
import glob
import subprocess

from PIL import Image
import io
from urllib import request

df = pd.read_csv('C:\\Users\\v0_thumb.csv', names=['urls'], encoding="utf-8")

# HEADER = ['title','url_link'] # added part

options = Options()
options.headless = True
options.add_argument('--headless')
path = r'C:\\Users\\geckodriver.exe'

# Loop over the CSV file up to its last row
for i in range(len(df)):
    # Get the value at row i, column 0 of the CSV data
    url = df.iloc[i, 0]
    driver = webdriver.Firefox(executable_path=path, options=options)
    driver.get(url)

    # Wait
    driver.implicitly_wait(40)

    # Send a request to the target URL

    element = driver.find_elements(by=By.ID, value='banners')
    
    with open('C:\\Users\\v1_thumb.csv','a', encoding='utf_8',newline='') as f:
        writer = csv.writer(f)        
        for elem in element:                   
            try:
                url_link = elem.find_elements(by=By.TAG_NAME, value="a")
                for url in url_link:
                    a = url.get_attribute("href")
                    print(a)
                    
                row = [a]  
                writer.writerow(row)
                df = pd.to_csv('C:\\Users\\v1_thumb.csv', index=False, na=True, encoding='utf_8')
                print(df)
                     
                # Wait
                time.sleep(5)                                                     
            
            except:
                pass 
            # Kill the Firefox process
            subprocess.Popen('taskkill /im firefox.exe /f', shell=True)
                       
driver.quit()  
time.sleep(5)

The contents of the input CSV are as follows.

0   https://www.irasutoya.com/p/event.html
1  https://www.irasutoya.com/p/figure.html
2    https://www.irasutoya.com/p/food.html
3  https://www.irasutoya.com/p/school.html

What I tried

I tried moving the CSV write (so that it sits inside the for loop), but the print output was not what I expected,
and I am stuck.


2 answers

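Even though writer.writerow(row) writes the output one row at a time,
df = pd.to_csv('C:\\Users\\v1_thumb.csv', ...) then overwrites the whole file.
Isn't that call unnecessary?
Alternatively, rethink the flow of the processing, for example by accumulating the rows and writing them all out once at the end.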
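
The second suggestion, collecting the rows first and writing them once at the end, could look roughly like the following. This is only a minimal sketch using the standard csv module: the helper name `write_links` and the example.com hrefs are hypothetical, and in the real script the list would be filled from `get_attribute("href")` inside the page and link loops.

```python
import csv

def write_links(links, path):
    """Write each link as its own one-column row (hypothetical helper)."""
    # Open the output file once, after all pages have been scraped.
    with open(path, "w", encoding="utf_8", newline="") as f:
        writer = csv.writer(f)
        writer.writerows([link] for link in links)

# Stand-in for the hrefs gathered while looping over pages and <a> tags.
collected = []
for page_links in (["https://example.com/a", "https://example.com/b"],
                   ["https://example.com/c"]):
    for href in page_links:
        collected.append(href)
```

Calling `write_links(collected, 'C:\\Users\\v1_thumb.csv')` after the scraping loop would then produce one row per link. Opening the file once in "w" mode also avoids piling up duplicate rows when the script is run again, which append mode would do.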

Posted 2023/01/20 01:52

can110

Total score 38268

Self-resolved

Thank you. I was able to solve it with the following.

    with open('C:\\Users\\v1_thumb.csv','a', encoding='utf_8',newline='') as f:
        writer = csv.writer(f)
        for elem in element:
            try:
                url_link = elem.find_elements(by=By.TAG_NAME, value="a")
                for url in url_link:
                    a = url.get_attribute("href")
                    print(a)

                    # Write one row per link, inside the inner loop
                    row = [a]
                    writer.writerow(row)
                    print(df)
            except:
                pass
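
The key difference from the original code is the indentation of `writer.writerow(row)`. In the original it ran after the inner `for` loop had finished, when `a` held only the last href. A small sketch with plain strings and an in-memory buffer shows the effect; the sample hrefs and both function names are made up for illustration and do not come from the original script.

```python
import csv
import io

hrefs = ["https://example.com/1", "https://example.com/2", "https://example.com/3"]

def write_after_loop(hrefs):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for a in hrefs:
        pass  # `a` is reassigned on every iteration
    writer.writerow([a])  # runs once, with the last value only
    return buf.getvalue().splitlines()

def write_inside_loop(hrefs):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for a in hrefs:
        writer.writerow([a])  # runs once per href
    return buf.getvalue().splitlines()

print(write_after_loop(hrefs))   # ['https://example.com/3']
print(write_inside_loop(hrefs))  # all three hrefs, one per row
```

The `pd.to_csv(...)` line is also gone in the fixed version. As far as I know, pandas exposes `to_csv` as a `DataFrame` method rather than as a module-level function, so that call would most likely have raised an error that the bare `except: pass` silently hid.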

Posted 2023/01/20 02:43

c-man

Total score 5