Edit history

Question edit history

Code revised again; still unresolved

2020/10/17 12:52

Posted

Dantesu

Score 8

title CHANGED Viewed

File without changes

body CHANGED Viewed

@@ -1,6 +1,15 @@
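-I want to scrape information from a search site (*terms of use checked), but the code below, which I put together while researching, cannot retrieve the information on the next page. (10/10 23:38) I had been asking a list for its text, so I rewrote the page-transition part, and now the program never finishes.
+I want to scrape information from a search site (*terms of use checked), but the code below, which I put together while researching, cannot retrieve the information on the next page. (10/17 21:48) I revised the code again. The error no longer appears, but the page does not advance, and no content is retrieved either.
+(10/10 23:38)
+I had been asking a list for its text, so I rewrote the page-transition part, and now the program never finishes.
 Thank you in advance for your help.
-```python
+```python edited 10/17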
+import time
+from selenium import webdriver
+driver=webdriver.Chrome()
+driver.get('https://www.mrso.jp/searches/?redirect&view=plan')
 def search(driver):
     i = 1               # loop counter, also the page number
     i_max = 5           # maximum number of pages to analyze
@@ -25,21 +34,16 @@
         # There is only one "Next" link, but find_elements is used on purpose: an empty list means this is the last page.
         for elem in  class_group:
-            next=elem.find_elements_by_class_name('-item -next')
+            next_list=elem.find_elements_by_class_name('-item -next')
-            if next==[]:
+        if next_list==[]:
-                i = i_max + 1
+            i = i_max + 1
         else:
-            # The next page's URL is the href attribute of -item -next
-            for elem in  class_group:
+            next_list.click()
-                next_page = elem.find_elements_by_class_name('-item -next').get_attribute('href')
-                driver.get(next_page)   # move to the next page
-                i = i + 1               # update i
+            i = i + 1               # update i
             time.sleep(3)           # wait 3 seconds
     return courses_list,facili_list, price_list,link_list    # return the title and link lists
+courses_list,facili_list,price_list,link_list=search(driver)
-courses_list,facili_list,price_list,link_list=search(driver)
-search.quit()
 ```
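
The loop in this revision has to do three things per page: read the results, look for a "Next" link, and stop when there is none or when the page limit is reached. Below is a minimal sketch of that control flow, kept independent of Selenium so it can be run as-is. `collect_pages`, `read_page`, and `go_next` are hypothetical names, and the Selenium-based `go_next` shown in the comment (including the `.-item.-next` selector) is an assumption about the page, not something verified against the site.

```python
from typing import Callable, List


def collect_pages(read_page: Callable[[], List[str]],
                  go_next: Callable[[], bool],
                  max_pages: int = 5) -> List[str]:
    """Read up to max_pages pages; stop early when go_next() finds no next page."""
    results: List[str] = []
    for page in range(max_pages):
        results.extend(read_page())
        # Do not navigate past the page limit; stop when there is no next page.
        if page == max_pages - 1 or not go_next():
            break
    return results


# Assumed shape of go_next with Selenium 4 (needs a live browser, not run here):
#   from selenium.webdriver.common.by import By
#   def go_next() -> bool:
#       links = driver.find_elements(By.CSS_SELECTOR, ".-item.-next")
#       if not links:          # empty list: this is the last page
#           return False
#       links[0].click()       # click one element, not the list itself
#       return True


if __name__ == "__main__":
    # Fake three-page site standing in for the browser.
    pages = [["a", "b"], ["c"], ["d"]]
    state = {"index": 0}

    def read_page() -> List[str]:
        return pages[state["index"]]

    def go_next() -> bool:
        if state["index"] + 1 >= len(pages):
            return False
        state["index"] += 1
        return True

    print(collect_pages(read_page, go_next))
```

Because `find_elements` returns a list, the click has to be issued on one of its elements; calling `.click()` on the list itself cannot advance the page.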

Revised the program based on hints from the answers I received.

2020/10/17 12:52

Posted

Dantesu

Score 8

title CHANGED Viewed

File without changes

body CHANGED Viewed

@@ -1,18 +1,6 @@
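-I want to scrape information from a search site (*terms of use checked), but the code below, which I put together while researching, cannot retrieve the information on the next page. It worked for a single page, but once I added the part that follows the Next button, the error below appears.
+I want to scrape information from a search site (*terms of use checked), but the code below, which I put together while researching, cannot retrieve the information on the next page. (10/10 23:38) I had been asking a list for its text, so I rewrote the page-transition part, and now the program never finishes.
-invalid selector: Compound class names not permitted
 Thank you in advance for your help.
 ```python
-I am writing this in Jupyter. Where I moved to a new line in Jupyter, the code below has two blank lines.
-Pressing Enter on the block that starts with courses_list raises the error.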
-import time
-from selenium import webdriver
-driver=webdriver.Chrome()
-driver.get('https://www.mrso.jp/searches/?redirect&view=plan')
 def search(driver):
     i = 1               # loop counter, also the page number
     i_max = 5           # maximum number of pages to analyze
@@ -20,11 +8,12 @@
     facili_list=[]
     price_list=[]
     link_list=[]
+    next_list=[]
     # Loop until the current page exceeds the maximum number of pages to analyze
     while i <= i_max:
-        class_group =driver.find_elements_by_class_name('page-search__wrap facility')
+        class_group =driver.find_elements_by_class_name('page-search__wrap.facility')
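-        # for loops that extract the course name, facility name, price, and link and append them to the lists
+        # for loops that extract the title and link and append them to the lists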
         for elem in  class_group:
             courses_list.append(elem.find_element_by_class_name('-name').text)
         for elem in  class_group:
@@ -35,17 +24,21 @@
             link_list.append(elem.find_element_by_class_name('-link').get_attribute('href'))
         # There is only one "Next" link, but find_elements is used on purpose: an empty list means this is the last page.
+        for elem in  class_group:
-        if  class_group.find_elements_by_class_name('-item -next') == []:
+            next=elem.find_elements_by_class_name('-item -next')
+            if next==[]:
-            i = i_max + 1
+                i = i_max + 1
         else:
             # The next page's URL is the href attribute of -item -next
+            for elem in  class_group:
-            next_page = class_group.find_elements_by_class_name('-item -next').get_attribute('href')
+                next_page = elem.find_elements_by_class_name('-item -next').get_attribute('href')
-            class_group.get(next_page)   # move to the next page
+                driver.get(next_page)   # move to the next page
-            i = i + 1               # update i
+                i = i + 1               # update i
             time.sleep(3)           # wait 3 seconds
-    return courses_list,facili_list, price_list,link_list    # return the course name, facility name, price, and link lists
+    return courses_list,facili_list, price_list,link_list    # return the title and link lists
 courses_list,facili_list,price_list,link_list=search(driver)
 search.quit()
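
The `invalid selector: Compound class names not permitted` error quoted in this revision is what Selenium reports when a space-separated value such as `'page-search__wrap facility'` is passed to a class-name lookup, which accepts a single class only. One possible workaround, assuming the target elements carry both classes, is to turn the value into a CSS selector. This is a minimal sketch: the helper name `class_names_to_css` is hypothetical, and the Selenium call in the comment is assumed usage that is not executed here.

```python
def class_names_to_css(class_attr: str) -> str:
    """Convert a space-separated class attribute into a CSS selector.

    Each class name gets a leading dot and the parts are joined without
    spaces, so the selector matches elements that carry all of the classes.
    """
    names = class_attr.split()
    if not names:
        raise ValueError("class_attr must contain at least one class name")
    return "".join("." + name for name in names)


if __name__ == "__main__":
    print(class_names_to_css("page-search__wrap facility"))
    print(class_names_to_css("-item -next"))
    # Assumed usage with Selenium 4 (requires a live browser session):
    #   from selenium.webdriver.common.by import By
    #   cards = driver.find_elements(
    #       By.CSS_SELECTOR, class_names_to_css("page-search__wrap facility"))
```

Separately, `search` is a function in this code, so the browser session would be closed with `driver.quit()` rather than `search.quit()`.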

Typo

2020/10/10 14:39

Posted

Dantesu

Score 8

title CHANGED Viewed

	@@ -1,1 +1,1 @@
1	- ごｓ scraping: moving to the next page within a search site
1	+ Web scraping: moving to the next page within a search site

body CHANGED Viewed

File without changes

Typo

2020/10/10 08:22

Posted

Dantesu

Score 8

title CHANGED Viewed

	@@ -1,1 +1,1 @@
1	- Scraping: moving to the next page within a search site
1	+ ごｓ scraping: moving to the next page within a search site

body CHANGED Viewed

@@ -43,7 +43,7 @@
             class_group.get(next_page)   # move to the next page
             i = i + 1               # update i
             time.sleep(3)           # wait 3 seconds
-    return courses_list,facili_list, price_list,link_list    # return the title and link lists
+    return courses_list,facili_list, price_list,link_list    # return the course name, facility name, price, and link lists
 courses_list,facili_list,price_list,link_list=search(driver)

Typo

2020/10/10 08:22

Posted

Dantesu

Score 8

title CHANGED Viewed

File without changes

body CHANGED Viewed

@@ -24,7 +24,7 @@
     # Loop until the current page exceeds the maximum number of pages to analyze
     while i <= i_max:
         class_group =driver.find_elements_by_class_name('page-search__wrap facility')
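-        # for loops that extract the title and link and append them to the lists
+        # for loops that extract the course name, facility name, price, and link and append them to the lists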
         for elem in  class_group:
             courses_list.append(elem.find_element_by_class_name('-name').text)
         for elem in  class_group:

Inserted a line break because the text was hard to read

2020/10/10 08:21

Posted

Dantesu

Score 8

title CHANGED Viewed

File without changes

body CHANGED Viewed

@@ -2,7 +2,8 @@
 invalid selector: Compound class names not permitted
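 Thank you in advance for your help.
 ```python
-I am writing this in Jupyter. Where I moved to a new line in Jupyter, the code below has two blank lines. Pressing Enter on the block that starts with courses_list raises the error.
+I am writing this in Jupyter. Where I moved to a new line in Jupyter, the code below has two blank lines.
+Pressing Enter on the block that starts with courses_list raises the error.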
 import time
 from selenium import webdriver