Results change when I copy a df

http://www.algo-fx-blog.com/python-fx-trend-line/
Using this page as a reference, I am writing a method that detects the trend for an FX backtest. However, when I make the copies myself, the results change.

This is the df I am currently using. It contains 1-hour bars for January 4, 2012.

```
                       time    open    high     low   close  volume  weekday  time_id
0 2012-01-04 00:00:00+00:00  76.734  76.746  76.638  76.646   638.0        2        1
1 2012-01-04 01:00:00+00:00  76.644  76.678  76.620  76.672   247.0        2        2
2 2012-01-04 02:00:00+00:00  76.670  76.684  76.654  76.662   234.0        2        3
3 2012-01-04 03:00:00+00:00  76.664  76.696  76.658  76.690    98.0        2        4
4 2012-01-04 04:00:00+00:00  76.688  76.688  76.668  76.674   121.0        2        5
```
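To reproduce the issue without the original data, the five rows shown above can be rebuilt as a DataFrame. This is only a sketch: the `sma` and `divergence` columns used later in the question are not in the table, so they are omitted here, and `time_id` is recreated the same way the linked code does it.

```python
import pandas as pd

# The five 1-hour bars from the table above (sma/divergence are not shown, so omitted)
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-01-04 00:00:00+00:00",
        "2012-01-04 01:00:00+00:00",
        "2012-01-04 02:00:00+00:00",
        "2012-01-04 03:00:00+00:00",
        "2012-01-04 04:00:00+00:00",
    ]),
    "open": [76.734, 76.644, 76.670, 76.664, 76.688],
    "high": [76.746, 76.678, 76.684, 76.696, 76.688],
    "low": [76.638, 76.620, 76.654, 76.658, 76.668],
    "close": [76.646, 76.672, 76.662, 76.690, 76.674],
    "volume": [638.0, 247.0, 234.0, 98.0, 121.0],
    "weekday": [2, 2, 2, 2, 2],
})

# Same as the linked code: a 1-based sequential id used as the regression x value
df["time_id"] = df.index + 1
print(df)
```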

Code from the linked page:

```python
from scipy.stats import linregress

df['time_id'] = df.index + 1

# Split the original data by purpose
df_fin = df.copy()
df_high = df.copy()
df_low = df.copy()


# Uptrend line (fitted to the highs)
while len(df_high) > 3:
    reg_1 = linregress(
        x=df_high['time_id'],
        y=df_high['high'],
    )
    df_high = df_high.loc[df_high['high'] > reg_1[0] * df_high['time_id'] + reg_1[1]]

reg_1 = linregress(
    x=df_high['time_id'],
    y=df_high['high'],
)

df_fin['high_trend'] = reg_1[0] * df_fin['time_id'] + reg_1[1]

# Trendline of the lows
while len(df_low) > 3:
    reg_2 = linregress(
        x=df_low['time_id'],
        y=df_low['low'],
    )
    df_low = df_low.loc[df_low['low'] < reg_2[0] * df_low['time_id'] + reg_2[1]]

reg_2 = linregress(
    x=df_low['time_id'],
    y=df_low['low'],
)

df_fin['low_trend'] = reg_2[0] * df_fin['time_id'] + reg_2[1]

print(reg_1.slope)
print(reg_2.slope)
```

Output

```
0.005733333333332287
0.004571428571430098
```
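The loop above repeatedly fits a regression line and keeps only the bars above it (highs) or below it (lows) until three or fewer remain. A minimal sketch wrapping that loop in one reusable function, assuming a DataFrame with `time_id`, `high` and `low` columns; the name `trend_slope` and the synthetic `demo` data are hypothetical. It also adds a guard that is not in the linked code: it stops if a step would leave fewer than 2 rows (which `linregress` cannot fit) or would remove nothing.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress


def trend_slope(df: pd.DataFrame, col: str, upper: bool = True):
    """Hypothetical helper: iterative trendline in the style of the linked code.

    upper=True keeps rows strictly above the fitted line (high trendline);
    upper=False keeps rows strictly below it (low trendline).
    Returns the final linregress result.
    """
    sub = df[["time_id", col]]
    while len(sub) > 3:
        reg = linregress(x=sub["time_id"], y=sub[col])
        line = reg.slope * sub["time_id"] + reg.intercept
        mask = sub[col] > line if upper else sub[col] < line
        nxt = sub.loc[mask]
        # Added guard: stop if too few rows would remain or nothing was removed
        if len(nxt) < 2 or len(nxt) == len(sub):
            break
        sub = nxt
    return linregress(x=sub["time_id"], y=sub[col])


# Synthetic data for illustration only
t = np.arange(1, 51)
demo = pd.DataFrame({
    "time_id": t,
    "high": 100 + 0.01 * t + np.sin(t),
    "low": 99 + 0.01 * t + np.sin(t),
})
print(trend_slope(demo, "high", upper=True).slope)
print(trend_slope(demo, "low", upper=False).slope)
```

Because the helper takes a DataFrame, `len()` counts rows and `.loc` is available, which is exactly what the loop relies on.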

However, in my own backtest I split the df up as shown below, so I adapted the code accordingly.

```python
dfData = {
    "index": df.index,
    "time": df.time.values,
    "volume": df.volume.values,
    "open": df.open.values,
    "high": df.high.values,
    "low": df.low.values,
    "close": df.close.values,
    "weekday": df.weekday.values,
    "sma": df.sma.values,
    "divergence": df.divergence.values,
    "time_id": df.time_id,
}

for idx in range(df.shape[0]):
    # backtest processing
    ...
```

Here is the modified version.
The change is that I created dictionaries that hold copies of the columns.

```python
from scipy.stats import linregress

dfData = {
    "index": df.index,
    "time": df.time.values,
    "volume": df.volume.values,
    "open": df.open.values,
    "high": df.high.values,
    "low": df.low.values,
    "close": df.close.values,
    "weekday": df.weekday.values,
    "sma": df.sma.values,
    "divergence": df.divergence.values,
    "time_id": df.time_id,  # no .values here, so this stays a pandas Series
}

df_high = {
    "time_id": dfData["time_id"].copy(),
    "high": dfData["high"].copy(),
}

df_low = {
    "time_id": dfData["time_id"].copy(),
    "low": dfData["low"].copy(),
}

df_fin = {
    "time_id": dfData["time_id"].copy(),
    "high": dfData["high"].copy(),
    "low": dfData["low"].copy(),
}

# Uptrend line (fitted to the highs)
while len(df_high) > 3:
    reg_1 = linregress(
        x=df_high['time_id'],
        y=df_high['high'],
    )
    df_high = df_high.loc[df_high['high'] > reg_1[0] * df_high['time_id'] + reg_1[1]]

reg_1 = linregress(
    x=df_high['time_id'],
    y=df_high['high'],
)

df_fin['high_trend'] = reg_1[0] * df_fin['time_id'] + reg_1[1]

# Trendline of the lows
while len(df_low) > 3:
    reg_2 = linregress(
        x=df_low['time_id'],
        y=df_low['low'],
    )
    df_low = df_low.loc[df_low['low'] < reg_2[0] * df_low['time_id'] + reg_2[1]]

reg_2 = linregress(
    x=df_low['time_id'],
    y=df_low['low'],
)

df_fin['low_trend'] = reg_2[0] * df_fin['time_id'] + reg_2[1]

print(reg_1.slope)
print(reg_2.slope)
```

Output

```
0.0033991304347826324
0.0028386956521739303
```

The types of the values in the dictionaries I added are as follows:

```python
print(type(df_high["time_id"]))
# <class 'pandas.core.series.Series'>

print(type(df_high["high"]))
# <class 'numpy.ndarray'>

print(type(df_low['time_id']))
# <class 'pandas.core.series.Series'>

print(type(df_low['low']))
# <class 'numpy.ndarray'>

print(type(df_fin['time_id']))
# <class 'pandas.core.series.Series'>

print(type(df_fin['high']))
# <class 'numpy.ndarray'>

print(type(df_fin['low']))
# <class 'numpy.ndarray'>
```

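I suspected that the calculations were being performed on lists (arrays) rather than on a DataFrame, but I can't work out the exact cause or how to fix it, so I would appreciate your help.

Thank you in advance.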
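On the suspicion that the mix of `Series` and `ndarray` values is the cause: as far as I know, `scipy.stats.linregress` converts both arguments to arrays before fitting, so the container type by itself should not change the slope. A quick check using the `time_id` and `high` values from the table in the question (a sketch, not part of the original thread):

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# time_id and high from the five rows in the question's table
time_id = pd.Series([1, 2, 3, 4, 5])
high = np.array([76.746, 76.678, 76.684, 76.696, 76.688])

reg_series = linregress(x=time_id, y=high)        # Series x, ndarray y
reg_array = linregress(x=time_id.values, y=high)  # ndarray x, ndarray y

# Both calls should report the same fit
print(reg_series.slope, reg_array.slope)
```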


1 answer

Best answer

Could the problem be the following part?

```python
df_high = {
    "time_id": dfData["time_id"].copy(),
    "high": dfData["high"].copy(),
}
```

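In the original source code, df_high was a pandas DataFrame, but in the code above it has become a dict. As a result, len(df_high) always returns 2 (the number of keys), so the following while loop is never executed.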

```python
while len(df_high) > 3:
    ...
```
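The difference is easy to see in isolation. A minimal sketch using the `time_id` and `high` values from the question's table: `len()` on a dict counts its keys, while on a DataFrame it counts rows, and only the DataFrame has `.loc`. Wrapping the dict in `pd.DataFrame(...)` is one way to get the original behaviour back.

```python
import numpy as np
import pandas as pd

time_id = pd.Series([1, 2, 3, 4, 5])
high = np.array([76.746, 76.678, 76.684, 76.696, 76.688])

as_dict = {"time_id": time_id.copy(), "high": high.copy()}
print(len(as_dict))                # number of keys, not rows

as_df = pd.DataFrame(as_dict)      # same data, now one row per bar
print(len(as_df))                  # number of rows

# Only the DataFrame supports the .loc filtering used in the loop
print(hasattr(as_dict, "loc"), hasattr(as_df, "loc"))
```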

Posted 2020/08/19 00:02

yymmt


666_paru

2020/08/19 09:52

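I checked, and the while loop was indeed not being executed. I also found that even if it were executed, df_high.loc would raise an error, so it could not proceed. Since there seem to be many things that can only be done with a DataFrame, I'll adjust my code to keep using the df as is. Thank you very much.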
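Going back to the DataFrame is the simplest fix. If the backtest has to keep working on plain arrays like those in `dfData`, the same filtering can instead be done with NumPy boolean masks, which need neither `.loc` nor a row-counting `len()` on a dict. A sketch under that assumption: `trend_line_arrays` and the `sample` dict are hypothetical (the dict mimics `dfData` with the five rows from the question), and the loop mirrors the linked code with an added guard against ending up with fewer than 2 points.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress


def trend_line_arrays(x, y, upper=True):
    """Hypothetical helper: iterative trendline on plain arrays.

    upper=True keeps points strictly above the line (high trendline);
    upper=False keeps points strictly below it (low trendline).
    Returns (slope, intercept).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    while len(x) > 3:
        reg = linregress(x, y)
        line = reg.slope * x + reg.intercept
        mask = y > line if upper else y < line
        # Added guard: stop if too few points would remain or nothing is removed
        if mask.sum() < 2 or mask.all():
            break
        x, y = x[mask], y[mask]
    reg = linregress(x, y)
    return reg.slope, reg.intercept


# Hypothetical stand-in for dfData: time_id as a Series, prices as ndarrays
sample = {
    "time_id": pd.Series([1, 2, 3, 4, 5]),
    "high": np.array([76.746, 76.678, 76.684, 76.696, 76.688]),
    "low": np.array([76.638, 76.620, 76.654, 76.658, 76.668]),
}
slope_h, icpt_h = trend_line_arrays(sample["time_id"], sample["high"], upper=True)
slope_l, icpt_l = trend_line_arrays(sample["time_id"], sample["low"], upper=False)

x_all = np.asarray(sample["time_id"], dtype=float)
high_trend = slope_h * x_all + icpt_h
low_trend = slope_l * x_all + icpt_l
print(slope_h, slope_l)
```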
