I can't figure out how to resolve this error

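##Background / what I want to achieve
Running my code produced the error shown below.

The last line of the error says "TypeError: 'in <string>' requires string as left operand, not NoneType", so is the problem in one of the for loops?

I can't tell what to fix or how to fix it, so I'm asking here.
Please advise.
Thank you in advance.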

##The problem / error message
Traceback (most recent call last):
File "d:/Pythonからspreadsheetsへ/venv1/lancerswork/scraping.py", line 111, in <module>
set_with_dataframe(sh, dict_df)
File "D:\Pythonからspreadsheetsへ\venv1\lib\site-packages\gspread_dataframe.py", line 241, in set_with_dataframe
_cellrepr(cell_value, allow_formulas))
File "D:\Pythonからspreadsheetsへ\venv1\lib\site-packages\gspread_dataframe.py", line 49, in _cellrepr
if pd.isnull(value) is True:
File "D:\Pythonからspreadsheetsへ\venv1\lib\site-packages\pandas\core\dtypes\missing.py", line 126, in isna
return _isna(obj)
File "D:\Pythonからspreadsheetsへ\venv1\lib\site-packages\pandas\core\dtypes\missing.py", line 141, in _isna_new
elif isinstance(
File "D:\Pythonからspreadsheetsへ\venv1\lib\site-packages\pandas\core\dtypes\generic.py", line 12, in _check
return getattr(inst, attr, "_typ") in comp
TypeError: 'in <string>' requires string as left operand, not NoneType

##Relevant source code

```python
import requests
from bs4 import BeautifulSoup as bs

from selenium import webdriver
import time
from tqdm import tqdm

import json
import gspread
from googleapiclient import discovery
from oauth2client.service_account import ServiceAccountCredentials
from gspread_dataframe import get_as_dataframe
from gspread_dataframe import set_with_dataframe
import pandas as pd


headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36"}

office_names = []
address_lists = []

profile_names = []
office_names2 = []

for page in range(1, 3):
    url = 'https://www.zeiri4.com/firm/search/?FirmSearchForm%5BPrefecture_id%5D=13&FirmSearchForm%5BIndustry_id%5D%5B0%5D=1&FirmSearchForm%5BIndustry_id%5D%5B1%5D=4&page={}'.format(page)

    response = requests.get(url, headers=headers)
    soup = bs(response.content, 'html.parser')

    for office in soup.find_all('h2'):
        offices = office.text.strip('\n')
        office_names.append(offices)
        time.sleep(10)

    for address in soup.find_all('dl', class_='b-firmSearchPanel__datalist'):
        address_ = address.find('dd')
        address_lists.append(address_)
        time.sleep(10)

    url1 = 'https://www.bengo4.com/tokyo/f_12/?page={}'.format(page)
    response1 = requests.get(url1, headers=headers)
    soup1 = bs(response1.content, 'html.parser')

    for name in soup1.find_all(class_='profile__name'):
        name_ = name.text
        profile_names.append(name_)
        time.sleep(10)

    for office2 in soup1.find_all('p', class_='office'):
        office2_ = office2.text
        office_names2.append(office2_)
        time.sleep(10)


browser = webdriver.Chrome(r'D:\Pythonからspreadsheetsへ\venv1\lancerswork\chromedriver.exe')
browser.implicitly_wait(10)

address_lists2 = []

for page in range(1, 3):
    url2 = 'https://www.bengo4.com/tokyo/f_12/?page={}'.format(page)
    browser.get(url2)

    for address2 in browser.find_elements_by_class_name('address'):
        address2_ = address2.text
        address_lists2.append(address2_)
        time.sleep(10)


scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']
credentials = ServiceAccountCredentials.from_json_keyfile_name('伏せておきます', scope)  # key file path withheld by the poster
gc = gspread.authorize(credentials)
SPREADSHEET_KEY = '1qPbI3Rf995Z53sU3sK60SlgMwtFkO13w4P-wBQlTcv8'
worksheet = gc.open_by_key(SPREADSHEET_KEY)
wb = worksheet.sheet1

mydict = {
    '事務所名': office_names,
    '住所': address_lists,
    '': '',
    '弁護士名': profile_names,
    '事務所名1': office_names2,
    '住所1': address_lists2,
}

dict_df = pd.DataFrame({key: pd.Series(value) for key, value in mydict.items()})

sh = gc.open_by_key('1qPbI3Rf995Z53sU3sK60SlgMwtFkO13w4P-wBQlTcv8').worksheet('シート1')

set_with_dataframe(sh, dict_df)
```

##What I tried
When I kept only the first scraping loop below and commented out everything else, the script ran through and wrote to the spreadsheet.

```python
for office in soup.find_all('h2'):
    offices = office.text.strip('\n')
    office_names.append(offices)
    time.sleep(10)
```

##Addendum
If I run it with an arbitrary made-up dictionary like the one below, it works, but with the scraped data I get the error.

```python
mydict = {
    'yoo': 2,
    'poo': 3,
    'foo': 4,
    'too': 5
}
```

2 answers

Best answer

The error is raised at the last line of your script.
You can tell from the first frame of the traceback.

File "d:/Pythonからspreadsheetsへ/venv1/lancerswork/scraping.py", line 111, in <module>
set_with_dataframe(sh, dict_df)

gspread_dataframe.set_with_dataframe is documented on the official page below.
https://pythonhosted.org/gspread-dataframe/#gspread_dataframe.set_with_dataframe
I believe the first argument is the worksheet and the second is the DataFrame; the page describes them as follows.

worksheet – the gspread worksheet to set with content of DataFrame.
dataframe – the DataFrame.

Since dict_df has presumably already been turned into a DataFrame, sh needs to be made a str.

TypeError: 'in <string>' requires string as left operand, not NoneType

From this message, I judged that sh is probably NoneType at the moment and needs to be made a string.

Posted 2020/06/14 06:23

kabayan55

Total score: 389

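Deactivated user

2020/06/15 23:19

Thank you for your answer. It still isn't working and I would like to try various things, but considering the load on the server I can't scrape over and over, so I will stop for now. I appreciate the detailed explanation. Thank you very much.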

kabayan55

2020/06/16 02:54

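You say you can't scrape repeatedly, but you could scrape once and save the result. If you save dict_df (and sh?) to something like a pickle file, then from the second run onward you can load the saved file and experiment with just the last three lines. I think the problem will be solved once you do the following three things for dict_df and sh:
- check the types
- check the contents
- write the appropriate type conversion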
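
The "scrape once, then reuse" idea could look roughly like the sketch below. It is only an illustration under assumptions: `load_or_scrape`, the `scrape` callable, and the file name `scraped_cache.pkl` are hypothetical, and it caches a plain dict of lists with the standard `pickle` module rather than the DataFrame itself.

```python
import pickle
from pathlib import Path

CACHE_PATH = Path("scraped_cache.pkl")  # hypothetical file name


def load_or_scrape(scrape, cache_path=CACHE_PATH):
    """Return cached data if the file exists; otherwise call scrape() once and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    data = scrape()  # the slow, server-hitting part runs only on the first call
    with cache_path.open("wb") as f:
        pickle.dump(data, f)
    return data


def summarize(columns):
    """Map each column name to the set of element type names it contains."""
    return {name: {type(v).__name__ for v in values} for name, values in columns.items()}
```

Calling `summarize(load_or_scrape(my_scrape))` on later runs would show, per column, which types ended up in the lists, without touching the site again. Anything other than `str` in that summary is a candidate for conversion before the data goes into a DataFrame.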

Deactivated user

2020/06/16 03:21

Thank you for the comment. I will save the data to a pickle file and try that. Thank you for the advice.

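Deactivated user

2020/06/18 02:30 (edited)

I was able to write to the spreadsheet successfully. When I put address_lists into a DataFrame, I got KeyError: 0. It seems the cause of the error was that I was storing the scraped information as-is, without processing it. I should have applied .text.strip('\n') to each item before appending it to the list. Thank you for all your advice.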

set_with_dataframe(sh, dict_df)

Isn't one of sh or dict_df None?
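
One way to test that guess, and to look one level deeper at the cells themselves, is sketched below. `find_suspect_cells` and `ALLOWED` are hypothetical names, and the list of "safe" types is an assumption about what can be written to a sheet cell without conversion.

```python
# Types assumed to be safe as cell values (an assumption, not taken from the library docs).
ALLOWED = (str, int, float, bool, type(None))


def find_suspect_cells(columns):
    """Return (column, index, type name) for every value that is not a plain scalar."""
    suspects = []
    for name, values in columns.items():
        # A bare scalar such as the '' spacer column is treated as a single value.
        if not isinstance(values, (list, tuple)):
            values = [values]
        for i, value in enumerate(values):
            if not isinstance(value, ALLOWED):
                suspects.append((name, i, type(value).__name__))
    return suspects
```

Running `print(type(sh), type(dict_df))` followed by `print(find_suspect_cells(mydict))` just before `set_with_dataframe` would show whether either variable is None and whether any column holds objects rather than text.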

Posted 2020/06/14 06:09

y_waiwai

Total score: 88173

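Deactivated user

2020/06/15 23:18

Thank you for your answer. It still isn't working and I would like to try various things, but considering the load on the server I can't scrape over and over, so I will stop for now. Thank you very much.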

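Deactivated user

2020/06/18 02:29

I was able to write to the spreadsheet successfully. When I put address_lists into a DataFrame, I got KeyError: 0. It seems the cause of the error was that I was storing the scraped information as-is, without processing it. I should have applied .text.strip('\n') to each item before appending it to the list. I will study so that I can solve problems by reading the error message. Sorry for the trouble.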
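
The fix described above can be sketched as a small helper that turns each scraped element into text before it is stored. `to_cell_text` is a hypothetical name, and the sketch assumes the element exposes its text through a `.text` attribute, as the elements in the question's other loops do.

```python
def to_cell_text(element):
    """Return the element's text with surrounding newlines removed, or "" if nothing was found."""
    if element is None:  # e.g. a find() call that matched nothing
        return ""
    text = getattr(element, "text", None)
    if text is None:
        text = str(element)  # already a plain value; fall back to its string form
    return text.strip("\n")
```

In the question's address loop this would replace the direct append, for example `address_lists.append(to_cell_text(address.find('dd')))`, so that the list holds strings rather than element objects.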
