I want to fix an error that occurs in Python machine-learning code running on Colaboratory.

Background / what I want to achieve

http://www.algo-fx-blog.com/lstm-fx-predict/
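I am studying machine learning by following the site above, but I hit an error partway through and cannot proceed. I would appreciate it if someone could tell me the cause.
(The data I am using does not come from the API; it is Axiory historical data that I have preprocessed.)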

CSV contents:
date,time,open,high,low,close,volume
2021.02.01,00:00,143.568,143.572,143.568,143.572,2
2021.02.01,00:01,143.57,143.584,143.57,143.581,6
2021.02.01,00:02,143.586,143.62,143.586,143.62,3

Problem / error message

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
    142     try:
--> 143         result = expressions.evaluate(op, left, right)
    144     except TypeError:

8 frames

TypeError: unsupported operand type(s) for /: 'str' and 'str'


During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in masked_arith_op(x, y, op)
    110         if mask.any():
    111             with np.errstate(all="ignore"):
--> 112                 result[mask] = op(xrav[mask], y)
    113 
    114     result, _ = maybe_upcast_putmask(result, ~mask, np.nan)

TypeError: unsupported operand type(s) for /: 'str' and 'str'

Relevant source code

python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import configparser
import datetime
from datetime import datetime, timedelta

# Import the 1-minute-bar CSV file (only possible from Google Drive)
import csv

response = open('/content/drive/MyDrive/GBPJPY_2021_02.csv', encoding='utf8')
csvreader = csv.DictReader(response)

#for row in csvreader:
# print(row)
#-------------------------------------------
# Convert the dicts to a DataFrame
res = pd.DataFrame(csvreader)

# Show the CSV contents
print(res)


# Select the required columns (ask only)
df = res[['date', 'time', 'open', 'high', 'low', 'close', 'volume']]

# Show rows 399 to 410 of the DataFrame
df[399:410]

# Split into training and test sets by date
split_date = '2021.02.03'
train, test = df[df['date'] < split_date], df[df['date'] >= split_date]
del train['time']
del test['time']

# Check the shapes, just in case
train.shape, test.shape

# Set the window length
window_len = 10

# Prepare the LSTM input (training)
train_lstm_in = []
for i in range(len(train) - window_len):
    temp = train[i:(i + window_len)].copy()
    for col in train:
        temp.loc[:, col] = temp[col] / temp[col].iloc[0] - 1
    train_lstm_in.append(temp)
lstm_train_out = (train['close'][window_len:].values / train['close'][:-window_len].values) - 1


# Prepare the LSTM input (test)
test_lstm_in = []
for i in range(len(test) - window_len):
    temp = test[i:(i + window_len)].copy()
    for col in test:
        temp.loc[:, col] = temp[col] / temp[col].iloc[0] - 1
    test_lstm_in.append(temp)
lstm_test_out = (test['close'][window_len:].values / test['close'][:-window_len].values) - 1


# Convert the pandas DataFrames to NumPy arrays
train_lstm_in = [np.array(train_lstm_input) for train_lstm_input in train_lstm_in]
train_lstm_in = np.array(train_lstm_in)

test_lstm_in = [np.array(test_lstm_input) for test_lstm_input in test_lstm_in]
test_lstm_in = np.array(test_lstm_in)

What I tried

I could not identify the cause, so I was not able to try anything.



1 answer

Best answer

Most likely, as shown below, the type of df['close'].iloc[0] is a string rather than a number.

python
>>> print(type(df['close'].iloc[0]))
<class 'str'>

Convert these values from strings to numbers.
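
A minimal sketch of that conversion, assuming the DataFrame was built from csv.DictReader (which returns every field as a string) and that the numeric columns are the ones named in the question's CSV header. The sample rows are copied from the question; CSV_TEXT and numeric_cols are illustrative names, and io.StringIO stands in for the real file.

```python
import csv
import io

import pandas as pd

# Sample rows copied from the question's CSV.
CSV_TEXT = """date,time,open,high,low,close,volume
2021.02.01,00:00,143.568,143.572,143.568,143.572,2
2021.02.01,00:01,143.57,143.584,143.57,143.581,6
2021.02.01,00:02,143.586,143.62,143.586,143.62,3
"""

# csv.DictReader yields every field as str, so no column is numeric yet.
res = pd.DataFrame(csv.DictReader(io.StringIO(CSV_TEXT)))

# Convert only the price/volume columns; date and time stay as strings.
numeric_cols = ["open", "high", "low", "close", "volume"]
res[numeric_cols] = res[numeric_cols].apply(pd.to_numeric)

print(res.dtypes)
print(type(res["close"].iloc[0]))
```

The date column is deliberately left alone here, so any later loop that divides every column would still need to skip it.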

Posted 2021/03/05 13:25

ppaul

Total score: 24672

spa

2021/03/06 01:39

When I ran print(temp[col]), the output was 0 2021.02.01 1 2021.02.01 2 2021.02.01 3 2021.02.01 4 2021.02.01 5 2021.02.01 6 2021.02.01 7 2021.02.01 8 2021.02.01 9 2021.02.01 Name: date, dtype: object. The error message was: TypeError: unsupported operand type(s) for /: 'str' and 'str'. I have searched a lot but cannot work out how to fix it. Could you tell me how? Thank you in advance.

spa

2021/03/06 01:41

The result of print(type(temp[col].iloc[0])) was <class 'str'>, just as you said.

spa

2021/03/06 06:29

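I tried print(temp[col][1].replace('.', '').double()) to remove the '.' from the values that turned out to be dates and convert them to numbers. I then called .double() on the result, but it failed.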

spa

2021/03/06 09:02

I tried print(float(str(df["close"]))) to convert the strings to decimal numbers, but got ValueError: could not convert string to float: '0 143.572\n1 143.581\n2 143.62\n3 143.582\n4 143.585\n ... \n4440 143.264\n4441 143.295\n4442 143.293\n4443 143.298\n4444 143.284\nName: close, Length: 4445, dtype: object'. Extra characters ended up in the string, so it did not work. I have tried many things, but I give up.
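
The ValueError above arises because str() applied to a whole Series returns its printed representation (index, values, and the Name/dtype footer) as one string, which float() cannot parse. A small sketch of an element-wise conversion with Series.astype instead, using three close values from the question; the variable names are illustrative.

```python
import pandas as pd

# A Series of numeric strings, like the 'close' column read via csv.DictReader.
close = pd.Series(["143.572", "143.581", "143.62"], name="close")

# str(close) is the whole printed table, not a single number, so float() raises.
try:
    float(str(close))
    failed = False
except ValueError:
    failed = True

# astype(float) converts each element individually.
close_float = close.astype(float)

print(failed)
print(close_float.dtype)
```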

ppaul

2021/03/06 09:16

Reading the file with response = open('/content/drive/MyDrive/GBPJPY_2021_02.csv', encoding='utf8') csvreader = csv.DictReader(response) makes the conversion painful, so it is easier to read it with res = pd.read_csv('/content/drive/MyDrive/GBPJPY_2021_02.csv'). Numbers are converted automatically. https://note.nkmk.me/python-pandas-read-csv-tsv/
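
A short sketch of what pd.read_csv infers for the sample rows in the question. io.StringIO stands in for the Google Drive path so the snippet runs anywhere; CSV_TEXT is an illustrative name.

```python
import io

import pandas as pd

# Sample rows copied from the question's CSV.
CSV_TEXT = """date,time,open,high,low,close,volume
2021.02.01,00:00,143.568,143.572,143.568,143.572,2
2021.02.01,00:01,143.57,143.584,143.57,143.581,6
2021.02.01,00:02,143.586,143.62,143.586,143.62,3
"""

# read_csv infers numeric dtypes for the price and volume columns.
res = pd.read_csv(io.StringIO(CSV_TEXT))

print(res.dtypes)
```

Values such as 2021.02.01 and 00:00 are not valid numbers, so date and time remain string columns. A loop that divides every column would therefore still fail on date unless that column is excluded or converted first.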

spa

2021/03/06 12:13

Thank you for the reply. Replacing that part with res = pd.read_csv('/content/drive/MyDrive/GBPJPY_2021_02.csv') made things easier. I then tried converting to float with temp.loc[:, col] = temp[col].str.replace('.', '').astype(float) / temp[col].str.replace('.', '').astype(float).iloc[0] - 1, but got the error AttributeError: Can only use .str accessor with string values! This is quite tricky.

spa

2021/03/06 12:53

With temp.loc[:, col] = temp[col].astype(str).str.replace('.', '').astype(float) / temp[col].astype(str).str.replace('.', '').astype(float).iloc[0] - 1, I got it to run. Thank you for all the hints!
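
One caveat with the string-replace approach: removing the decimal point changes the scale of values printed with fewer decimal places (143.57 becomes 14357 while 143.572 becomes 143572), and it also normalizes the date column as if it were a price. A possible alternative, sketched under the assumption that only the numeric columns should feed the LSTM, is to drop date and time and divide the floats directly. The helper name make_windows is hypothetical, and the window length is shortened to 2 only so that the three sample rows are enough.

```python
import io

import numpy as np
import pandas as pd

# Sample rows copied from the question's CSV.
CSV_TEXT = """date,time,open,high,low,close,volume
2021.02.01,00:00,143.568,143.572,143.568,143.572,2
2021.02.01,00:01,143.57,143.584,143.57,143.581,6
2021.02.01,00:02,143.586,143.62,143.586,143.62,3
"""


def make_windows(frame, window_len):
    """Return an array of shape (n_windows, window_len, n_columns).

    Each window is expressed as the relative change from its first row.
    """
    windows = []
    for i in range(len(frame) - window_len):
        temp = frame.iloc[i:i + window_len]
        # Dividing a DataFrame by its first row scales every column at once.
        windows.append((temp / temp.iloc[0] - 1).to_numpy())
    return np.array(windows)


res = pd.read_csv(io.StringIO(CSV_TEXT))

# Keep only the numeric columns; date and time are not prices.
prices = res.drop(columns=["date", "time"])

windows = make_windows(prices, window_len=2)
print(windows.shape)
```

In the original script the date column is still needed for the train/test split, so the drop would happen after that split rather than before it.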
