Answer edit history

Added measurement results

Posted: 2018/02/21 12:57

Author: withdrawn user


@@ -77,3 +77,141 @@
 ```
 Just for your reference.
+--------
+Addendum
+I measured the speed. The bottom line: even for a single lookup, pandas turned out to be slower on average.
+First, create some arbitrary test data:
+```Python
+import pandas as pd
+from itertools import product
+temp = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
+test_dict = {s+t+u: [s+t+u+"0", s+t+u+"1", s+t+u+"2"] for s, t, u in product(temp, repeat=3)}
+df = pd.DataFrame.from_dict(test_dict, orient="index")
+print("head")
+print(df.head())
+print("tail")
+print(df.tail())
+print("number of records", len(df))
+# head
+#         0     1     2
+# aaa  aaa0  aaa1  aaa2
+# aab  aab0  aab1  aab2
+# aac  aac0  aac1  aac2
+# aad  aad0  aad1  aad2
+# aae  aae0  aae1  aae2
+# tail
+#         0     1     2
+# ZZV  ZZV0  ZZV1  ZZV2
+# ZZW  ZZW0  ZZW1  ZZW2
+# ZZX  ZZX0  ZZX1  ZZX2
+# ZZY  ZZY0  ZZY1  ZZY2
+# ZZZ  ZZZ0  ZZZ1  ZZZ2
+# number of records 140608
+```
+Now compare the original method and the pandas method on this data.
+```Python
+def get_value0(index, target):
+    # Original method: linear scan over the dict, stopping at the first match
+    for key, value in test_dict.items():
+        if target == value[index]:
+            return value
+def get_value1(index, target):
+    # Method using pandas: boolean mask on one column, then take the first matching row
+    return df[df[index] == target].values.tolist()[0]
+# Data to look up
+index = 2     # position (column) of the value to match
+target = "Gcw2"  # value to search for
+# Check that both methods return the same result
+assert get_value0(index, target) == get_value1(index, target)
+# Original method
+%timeit get_value0(index, target)
+# 7.78 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+# Method using pandas
+%timeit get_value1(index, target)
+# 11.7 ms ± 231 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+```
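+Depending on target, the original method may have to run the loop all the way to the end, so its performance is uneven. Even so, in the worst case it was only about 1 ms slower than pandas.
+Execution environment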
+Python 3.6.4
+pandas==0.22.0
+MBP
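
The unevenness of the original method mentioned in the addendum can be checked outside IPython as well. The following is only a minimal sketch, not part of the measurements above: it uses the standard-library `timeit` module instead of `%timeit`, a smaller alphabet (`letters`, an assumption made so it runs quickly), and a hypothetical helper `time_lookup`. It compares a target that matches the first record with one that matches the last record.

```python
import timeit
from itertools import product

# Assumption: a 20-letter alphabet (8000 records) instead of the addendum's
# 52 letters, so that the sketch finishes quickly.
letters = "abcdefghijklmnopqrst"
test_dict = {s + t + u: [s + t + u + "0", s + t + u + "1", s + t + u + "2"]
             for s, t, u in product(letters, repeat=3)}


def get_value0(index, target):
    # Same linear scan as the original method in the addendum.
    for key, value in test_dict.items():
        if target == value[index]:
            return value


def time_lookup(target, index=2, number=200):
    # Hypothetical helper: mean seconds per call, measured with timeit.
    total = timeit.timeit(lambda: get_value0(index, target), number=number)
    return total / number


best_case = time_lookup("aaa2")   # matches the first record, loop exits at once
worst_case = time_lookup("ttt2")  # matches the last record, loop visits every record
print(f"best:  {best_case * 1e6:.1f} µs per lookup")
print(f"worst: {worst_case * 1e6:.1f} µs per lookup")
```

Because dicts keep insertion order in Python 3.6 and later, `"aaa2"` is found on the first iteration and `"ttt2"` on the last, so the gap between the two timings gives a rough idea of how much the result depends on the chosen target.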