Edit history

Question edit history

Added supplementary information.

2020/06/14 23:51

Posted by: withdrawn user (score 0)

title CHANGED

File without changes

body CHANGED

@@ -127,134 +127,6 @@
       time.sleep(10)
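 ##Addendum
-When I ran the code, I got the error shown below.
-The last line of the error says "TypeError: 'in <string>' requires string as left operand, not NoneType", so is the problem in the for loop?
-I am asking because I do not know what to fix or how to fix it.
-Please advise.
-Thank you in advance.
-##Problem / error message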
-Traceback (most recent call last):
-  File "d:/Pythonからspreadsheetsへ/venv1/lancerswork/scraping.py", line 111, in <module>
-    set_with_dataframe(sh, dict_df)
-  File "D:\Pythonからspreadsheetsへ\venv1\lib\site-packages\gspread_dataframe.py", line 241, in set_with_dataframe
-    _cellrepr(cell_value, allow_formulas))
-  File "D:\Pythonからspreadsheetsへ\venv1\lib\site-packages\gspread_dataframe.py", line 49, in _cellrepr
-    if pd.isnull(value) is True:
-  File "D:\Pythonからspreadsheetsへ\venv1\lib\site-packages\pandas\core\dtypes\missing.py", line 126, in isna
-    return _isna(obj)
-  File "D:\Pythonからspreadsheetsへ\venv1\lib\site-packages\pandas\core\dtypes\missing.py", line 141, in _isna_new
-    elif isinstance(
-  File "D:\Pythonからspreadsheetsへ\venv1\lib\site-packages\pandas\core\dtypes\generic.py", line 12, in _check
-    return getattr(inst, attr, "_typ") in comp
-TypeError: 'in <string>' requires string as left operand, not NoneType
-##Relevant source code
-```python
-import requests
-from bs4 import BeautifulSoup as bs
-from selenium import webdriver
-import time
-from tqdm import tqdm
-import json
-import  gspread
-from googleapiclient import discovery
-from oauth2client.service_account import ServiceAccountCredentials
-from gspread_dataframe import get_as_dataframe
-from gspread_dataframe import set_with_dataframe
-import pandas as pd
-headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36"}
-office_names = []
-address_lists = []
-profile_names = []
-office_names2 = []
-for page in range(1,3):
-    url = 'https://www.zeiri4.com/firm/search/?FirmSearchForm%5BPrefecture_id%5D=13&FirmSearchForm%5BIndustry_id%5D%5B0%5D=1&FirmSearchForm%5BIndustry_id%5D%5B1%5D=4&page={}'.format(page)
-    response = requests.get(url, headers=headers)
-    soup = bs(response.content, 'html.parser')
-    for office in soup.find_all('h2'):
-      offices = office.text.strip('\n')
-      office_names.append(offices)
-      time.sleep(10)
-    for address in soup.find_all('dl', class_='b-firmSearchPanel__datalist'):
-      address_ = address.find('dd')
-      address_lists.append(address_)
-      time.sleep(10)
-    url1 = 'https://www.bengo4.com/tokyo/f_12/?page={}'.format(page)
-    response1 = requests.get(url1, headers=headers)
-    soup1 = bs(response1.content, 'html.parser')
-    for name in soup1.find_all(class_='profile__name'):
-      name_ = name.text
-      profile_names.append(name_)
-      time.sleep(10)
-    for office2 in soup1.find_all('p', class_='office'):
-      office2_ = office2.text
-      office_names2.append(office2_)
-      time.sleep(10)
-browser = webdriver.Chrome(r'D:\Pythonからspreadsheetsへ\venv1\lancerswork\chromedriver.exe')
-browser.implicitly_wait(10)
-address_lists2 = []
-for page in range(1,3):
-    url2 = 'https://www.bengo4.com/tokyo/f_12/?page={}'.format(page)
-    browser.get(url2)
-    for address2 in browser.find_elements_by_class_name('address'):
-      address2_ = address2.text
-      address_lists2.append(address2_)
-      time.sleep(10)
-scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/drive']
-credentials = ServiceAccountCredentials.from_json_keyfile_name('(withheld)', scope)
-gc = gspread.authorize(credentials)
-SPREADSHEET_KEY = '1qPbI3Rf995Z53sU3sK60SlgMwtFkO13w4P-wBQlTcv8'
-worksheet = gc.open_by_key(SPREADSHEET_KEY)
-wb = worksheet.sheet1
-mydict = {
-  '事務所名':office_names,
-  '住所':address_lists,
-  '':'',
-  '弁護士名':profile_names,
-  '事務所名1':office_names2,
-  '住所1':address_lists2,
-}
-dict_df = pd.DataFrame({key:pd.Series(value) for key, value in mydict.items()})
-sh = gc.open_by_key('1qPbI3Rf995Z53sU3sK60SlgMwtFkO13w4P-wBQlTcv8').worksheet('シート1')
-set_with_dataframe(sh, dict_df)
-```
-##What I tried
-When I kept only the first scraping loop below and commented out everything else, the data was written to the spreadsheet successfully.
-for office in soup.find_all('h2'):
-      offices = office.text.strip('\n')
-      office_names.append(offices)
-      time.sleep(10)
-##Addendum
 When I run it with a made-up dictionary like the one below, it works, but when I run it with the scraped data, the error occurs.
 mydict={

Added supplementary information.

2020/06/14 23:51

Posted by: withdrawn user (score 0)

title CHANGED

File without changes

body CHANGED

@@ -124,4 +124,142 @@
 for office in soup.find_all('h2'):
       offices = office.text.strip('\n')
       office_names.append(offices)
-      time.sleep(10)
+      time.sleep(10)
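+##Addendum
+When I ran the code, I got the error shown below.
+The last line of the error says "TypeError: 'in <string>' requires string as left operand, not NoneType", so is the problem in the for loop?
+I am asking because I do not know what to fix or how to fix it.
+Please advise.
+Thank you in advance.
+##Problem / error message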
+Traceback (most recent call last):
+  File "d:/Pythonからspreadsheetsへ/venv1/lancerswork/scraping.py", line 111, in <module>
+    set_with_dataframe(sh, dict_df)
+  File "D:\Pythonからspreadsheetsへ\venv1\lib\site-packages\gspread_dataframe.py", line 241, in set_with_dataframe
+    _cellrepr(cell_value, allow_formulas))
+  File "D:\Pythonからspreadsheetsへ\venv1\lib\site-packages\gspread_dataframe.py", line 49, in _cellrepr
+    if pd.isnull(value) is True:
+  File "D:\Pythonからspreadsheetsへ\venv1\lib\site-packages\pandas\core\dtypes\missing.py", line 126, in isna
+    return _isna(obj)
+  File "D:\Pythonからspreadsheetsへ\venv1\lib\site-packages\pandas\core\dtypes\missing.py", line 141, in _isna_new
+    elif isinstance(
+  File "D:\Pythonからspreadsheetsへ\venv1\lib\site-packages\pandas\core\dtypes\generic.py", line 12, in _check
+    return getattr(inst, attr, "_typ") in comp
+TypeError: 'in <string>' requires string as left operand, not NoneType
+##Relevant source code
+```python
+import requests
+from bs4 import BeautifulSoup as bs
+from selenium import webdriver
+import time
+from tqdm import tqdm
+import json
+import  gspread
+from googleapiclient import discovery
+from oauth2client.service_account import ServiceAccountCredentials
+from gspread_dataframe import get_as_dataframe
+from gspread_dataframe import set_with_dataframe
+import pandas as pd
+headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36"}
+office_names = []
+address_lists = []
+profile_names = []
+office_names2 = []
+for page in range(1,3):
+    url = 'https://www.zeiri4.com/firm/search/?FirmSearchForm%5BPrefecture_id%5D=13&FirmSearchForm%5BIndustry_id%5D%5B0%5D=1&FirmSearchForm%5BIndustry_id%5D%5B1%5D=4&page={}'.format(page)
+    response = requests.get(url, headers=headers)
+    soup = bs(response.content, 'html.parser')
+    for office in soup.find_all('h2'):
+      offices = office.text.strip('\n')
+      office_names.append(offices)
+      time.sleep(10)
+    for address in soup.find_all('dl', class_='b-firmSearchPanel__datalist'):
+      address_ = address.find('dd')
+      address_lists.append(address_)
+      time.sleep(10)
+    url1 = 'https://www.bengo4.com/tokyo/f_12/?page={}'.format(page)
+    response1 = requests.get(url1, headers=headers)
+    soup1 = bs(response1.content, 'html.parser')
+    for name in soup1.find_all(class_='profile__name'):
+      name_ = name.text
+      profile_names.append(name_)
+      time.sleep(10)
+    for office2 in soup1.find_all('p', class_='office'):
+      office2_ = office2.text
+      office_names2.append(office2_)
+      time.sleep(10)
+browser = webdriver.Chrome(r'D:\Pythonからspreadsheetsへ\venv1\lancerswork\chromedriver.exe')
+browser.implicitly_wait(10)
+address_lists2 = []
+for page in range(1,3):
+    url2 = 'https://www.bengo4.com/tokyo/f_12/?page={}'.format(page)
+    browser.get(url2)
+    for address2 in browser.find_elements_by_class_name('address'):
+      address2_ = address2.text
+      address_lists2.append(address2_)
+      time.sleep(10)
+scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/drive']
+credentials = ServiceAccountCredentials.from_json_keyfile_name('(withheld)', scope)
+gc = gspread.authorize(credentials)
+SPREADSHEET_KEY = '1qPbI3Rf995Z53sU3sK60SlgMwtFkO13w4P-wBQlTcv8'
+worksheet = gc.open_by_key(SPREADSHEET_KEY)
+wb = worksheet.sheet1
+mydict = {
+  '事務所名':office_names,
+  '住所':address_lists,
+  '':'',
+  '弁護士名':profile_names,
+  '事務所名1':office_names2,
+  '住所1':address_lists2,
+}
+dict_df = pd.DataFrame({key:pd.Series(value) for key, value in mydict.items()})
+sh = gc.open_by_key('1qPbI3Rf995Z53sU3sK60SlgMwtFkO13w4P-wBQlTcv8').worksheet('シート1')
+set_with_dataframe(sh, dict_df)
+```
+##What I tried
+When I kept only the first scraping loop below and commented out everything else, the data was written to the spreadsheet successfully.
+for office in soup.find_all('h2'):
+      offices = office.text.strip('\n')
+      office_names.append(offices)
+      time.sleep(10)
+##Addendum
+When I run it with a made-up dictionary like the one below, it works, but when I run it with the scraped data, the error occurs.
+mydict={
+  'yoo':2,
+  'poo':3,
+  'foo':4,
+  'too':5
+}
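
A note on the likely cause, offered as a hedged reading of the traceback rather than a confirmed diagnosis: the one loop that writes successfully stores `office.text`, while the address loop stores the result of `address.find('dd')` itself, which is a BeautifulSoup `Tag` object (or `None` when no `dd` exists). When pandas checks such a cell for nulls it looks up an attribute named `_typ` on the object, and BeautifulSoup appears to treat unknown attribute names as child-tag lookups that return `None`, which would explain the "not NoneType" message. The sketch below shows one way to reduce every scraped value to plain text before it reaches the DataFrame. `to_cell_text` and `FakeTag` are hypothetical names introduced here; `FakeTag` only stands in for a real `Tag` so the example runs without BeautifulSoup installed.

```python
def to_cell_text(value):
    """Return a plain string that is safe to store in a DataFrame cell.

    Hypothetical helper, not part of the original script.
    """
    if value is None:
        # find() returns None when no matching element exists
        return ""
    get_text = getattr(value, "get_text", None)
    if callable(get_text):
        # BeautifulSoup Tag objects expose get_text(); keep only the text
        return get_text(strip=True)
    return str(value)


class FakeTag:
    """Stand-in for a bs4 Tag so the sketch runs without BeautifulSoup."""

    def __init__(self, text):
        self._text = text

    def get_text(self, strip=False):
        return self._text.strip() if strip else self._text


# Mixed inputs: an element, a missing element, and a value that is already text.
address_lists = [
    to_cell_text(v) for v in (FakeTag(" Chiyoda-ku, Tokyo \n"), None, "already text")
]
print(address_lists)  # ['Chiyoda-ku, Tokyo', '', 'already text']
```

In the posted script this would correspond to appending `to_cell_text(address.find('dd'))` instead of the raw `find` result, so that every column holds strings only.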

Added supplementary information.

2020/06/14 23:49

Posted by: withdrawn user (score 0)

title CHANGED

File without changes

body CHANGED

@@ -116,4 +116,12 @@
 sh = gc.open_by_key('1qPbI3Rf995Z53sU3sK60SlgMwtFkO13w4P-wBQlTcv8').worksheet('シート1')
 set_with_dataframe(sh, dict_df)
-```
+```
+##What I tried
+When I kept only the first scraping loop below and commented out everything else, the data was written to the spreadsheet successfully.
+for office in soup.find_all('h2'):
+      offices = office.text.strip('\n')
+      office_names.append(offices)
+      time.sleep(10)
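
A side observation on the loop above, separate from the error itself: `time.sleep(10)` runs once per extracted element, although no HTTP request is made inside that loop, so the script waits far longer than politeness toward the site requires. A minimal sketch of pausing once per page request instead is shown below. `scrape_pages`, `fetch_page`, and `extract_items` are hypothetical names standing in for the `requests.get(...)` call and the `soup.find_all(...)` loops of the posted script, and the in-memory `fake_site` replaces real network access.

```python
import time


def scrape_pages(pages, fetch_page, extract_items, delay_seconds=10, sleep=time.sleep):
    """Collect items from several pages, pausing once between page requests.

    fetch_page and extract_items are hypothetical callables, not functions
    from the original script.
    """
    results = []
    for index, page in enumerate(pages):
        if index > 0:
            sleep(delay_seconds)  # wait between HTTP requests only
        html = fetch_page(page)
        results.extend(extract_items(html))
    return results


# Usage with in-memory stand-ins, so no network access is needed.
fake_site = {1: "A,B", 2: "C"}
naps = []
names = scrape_pages(
    pages=range(1, 3),
    fetch_page=lambda page: fake_site[page],
    extract_items=lambda html: html.split(","),
    sleep=naps.append,  # record the pauses instead of really sleeping
)
print(names)  # ['A', 'B', 'C']
print(naps)   # [10]
```

Passing `sleep` as a parameter is only a convenience for trying the function without waiting; the default remains `time.sleep`.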