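If you already have the DataFrames, convert each one to a NumPy array and call `.tolist()` on it to get a plain list.
Applied to the code in the question, that means writing
`for e in itertools.product(array1.tolist(), array2.tolist(), array3.tolist(), array4.tolist()):`
and leaving the rest as it is.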
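As a small illustration of that loop, here is a sketch with just two DataFrames. The names `df1`, `df2`, and `combos` are made up for this example; `array1` and `array2` follow the naming in the question.

```python
import itertools

import pandas as pd

# Two small example DataFrames (hypothetical data for illustration).
df1 = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
df2 = pd.DataFrame([[11, 22], [33, 44]], columns=["C", "D"])

# DataFrame -> NumPy array -> nested Python list of rows.
array1 = df1.to_numpy()
array2 = df2.to_numpy()

# Each element e is a tuple holding one row from every input.
combos = list(itertools.product(array1.tolist(), array2.tolist()))
for e in combos:
    print(e)
```

Each `e` is a tuple of row lists, so the rows still have to be concatenated if a single flat row is wanted.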
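```python
import numpy as np
import pandas as pd
from sklearn.utils.extmath import cartesian


def df_product(df_list):
    arrs = [df.to_numpy() for df in df_list]
    lens = np.array([len(arr) for arr in arrs])

    # Every combination of row positions, one column per DataFrame.
    idx = cartesian([np.arange(length) for length in lens])
    # Starting row of each DataFrame inside the vertically stacked array.
    offset = lens.cumsum() - lens

    # Note: assumes every DataFrame has exactly two columns.
    new_arr = np.vstack(arrs)[idx + offset].reshape(-1, 2 * len(df_list))
    return pd.DataFrame(new_arr, columns=np.concatenate(
        [df.columns.to_numpy() for df in df_list]))
```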
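The indexing step in `df_product` may be easier to follow on a tiny input. The sketch below is not the function above: it swaps `cartesian` for `np.indices` so that it needs only NumPy, and the arrays and variable names are chosen purely for illustration.

```python
import numpy as np

# Two small row sets with different lengths (2 rows and 3 rows).
arrs = [
    np.array([[1, 2], [3, 4]]),
    np.array([[11, 22], [33, 44], [55, 66]]),
]
lens = np.array([len(a) for a in arrs])  # [2, 3]

# All combinations of row positions, in the same order as itertools.product.
# (np.indices is used here in place of sklearn's cartesian.)
idx = np.indices(lens.tolist()).reshape(len(lens), -1).T  # shape (6, 2)

# Row at which each array starts once everything is stacked vertically.
offset = lens.cumsum() - lens  # [0, 2]

stacked = np.vstack(arrs)            # shape (5, 2)
rows = stacked[idx + offset]         # shape (6, 2, 2): one row per input
result = rows.reshape(len(idx), -1)  # shape (6, 4): rows joined side by side

print(result)
```

Adding `offset` turns the per-array row positions into positions in the stacked array, so a single fancy-indexing operation picks up every combination at once.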
Verification
```python
In [11]: l_1 = [[1, 2], [3, 4]]
    ...: l_2 = [[11, 22], [33, 44]]
    ...: l_3 = [[111, 222], [333, 444]]
    ...: l_4 = [[1111, 2222], [3333, 4444]]
    ...:
    ...: df1 = pd.DataFrame(l_1, columns=['A', 'B'])
    ...: df2 = pd.DataFrame(l_2, columns=['C', 'D'])
    ...: df3 = pd.DataFrame(l_3, columns=['E', 'F'])
    ...: df4 = pd.DataFrame(l_4, columns=['G', 'H'])
    ...:
    ...: df_product([df1, df2, df3, df4])
Out[11]:
    A  B   C   D    E    F     G     H
0   1  2  11  22  111  222  1111  2222
1   1  2  11  22  111  222  3333  4444
2   1  2  11  22  333  444  1111  2222
3   1  2  11  22  333  444  3333  4444
4   1  2  33  44  111  222  1111  2222
5   1  2  33  44  111  222  3333  4444
6   1  2  33  44  333  444  1111  2222
7   1  2  33  44  333  444  3333  4444
8   3  4  11  22  111  222  1111  2222
9   3  4  11  22  111  222  3333  4444
10  3  4  11  22  333  444  1111  2222
11  3  4  11  22  333  444  3333  4444
12  3  4  33  44  111  222  1111  2222
13  3  4  33  44  111  222  3333  4444
14  3  4  33  44  333  444  1111  2222
15  3  4  33  44  333  444  3333  4444
```
It also works when there are more rows or more DataFrames, as shown below.
```python
In [12]: l_1 = [[1, 2], [3, 4], [5, 6]]  # <- added [5, 6]
    ...: l_2 = [[11, 22], [33, 44]]
    ...: l_3 = [[111, 222], [333, 444]]
    ...: l_4 = [[1111, 2222], [3333, 4444]]
    ...:
    ...: df1 = pd.DataFrame(l_1, columns=['A', 'B'])
    ...: df2 = pd.DataFrame(l_2, columns=['C', 'D'])
    ...: df3 = pd.DataFrame(l_3, columns=['E', 'F'])
    ...: df4 = pd.DataFrame(l_4, columns=['G', 'H'])
    ...:
    ...: df_product([df1, df2, df3, df4])
Out[12]:
    A  B   C   D    E    F     G     H
0   1  2  11  22  111  222  1111  2222
1   1  2  11  22  111  222  3333  4444
2   1  2  11  22  333  444  1111  2222
# (rows omitted)
13  3  4  33  44  111  222  3333  4444
14  3  4  33  44  333  444  1111  2222
15  3  4  33  44  333  444  3333  4444
16  5  6  11  22  111  222  1111  2222
17  5  6  11  22  111  222  3333  4444
18  5  6  11  22  333  444  1111  2222
19  5  6  11  22  333  444  3333  4444
20  5  6  33  44  111  222  1111  2222
21  5  6  33  44  111  222  3333  4444
22  5  6  33  44  333  444  1111  2222
23  5  6  33  44  333  444  3333  4444

In [13]: l_1 = [[1, 2], [3, 4]]
    ...: l_2 = [[11, 22], [33, 44]]
    ...: l_3 = [[111, 222], [333, 444]]
    ...: l_4 = [[1111, 2222], [3333, 4444]]
    ...: l_5 = [[11111, 22222], [33333, 44444]]  # <- added l_5
    ...:
    ...: df1 = pd.DataFrame(l_1, columns=['A', 'B'])
    ...: df2 = pd.DataFrame(l_2, columns=['C', 'D'])
    ...: df3 = pd.DataFrame(l_3, columns=['E', 'F'])
    ...: df4 = pd.DataFrame(l_4, columns=['G', 'H'])
    ...: df5 = pd.DataFrame(l_5, columns=['I', 'J'])
    ...:
    ...: df_product([df1, df2, df3, df4, df5])
Out[13]:
    A  B   C   D    E    F     G     H      I      J
0   1  2  11  22  111  222  1111  2222  11111  22222
1   1  2  11  22  111  222  1111  2222  33333  44444
2   1  2  11  22  111  222  3333  4444  11111  22222
3   1  2  11  22  111  222  3333  4444  33333  44444
4   1  2  11  22  333  444  1111  2222  11111  22222
5   1  2  11  22  333  444  1111  2222  33333  44444
6   1  2  11  22  333  444  3333  4444  11111  22222
7   1  2  11  22  333  444  3333  4444  33333  44444
8   1  2  33  44  111  222  1111  2222  11111  22222
# (rest omitted)
```
Speed comparison
```python
In [21]: from functools import reduce
    ...: from operator import add
    ...: import itertools
    ...:
    ...: def df_product_itertools(df_list):
    ...:     lsts = [df.to_numpy().tolist() for df in df_list]
    ...:     new_list = [reduce(add, e) for e in itertools.product(*lsts)]
    ...:     return pd.DataFrame(new_list, columns=np.concatenate(
    ...:         [df.columns.to_numpy() for df in df_list]))

In [22]: l_1 = [[1, 2], [3, 4]]
    ...: l_2 = [[11, 22], [33, 44]]
    ...: l_3 = [[111, 222], [333, 444]]
    ...: l_4 = [[1111, 2222], [3333, 4444]]
    ...: l_5 = [[11111, 22222], [33333, 44444]]
    ...:
    ...: df1 = pd.DataFrame(l_1, columns=['A', 'B'])
    ...: df2 = pd.DataFrame(l_2, columns=['C', 'D'])
    ...: df3 = pd.DataFrame(l_3, columns=['E', 'F'])
    ...: df4 = pd.DataFrame(l_4, columns=['G', 'H'])
    ...: df5 = pd.DataFrame(l_5, columns=['I', 'J'])

In [23]: %timeit df_product([df1, df2, df3, df4])
    ...: %timeit df_product_itertools([df1, df2, df3, df4])
355 µs ± 18.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
928 µs ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [24]: %timeit df_product([df1, df2, df3, df4, df5])
    ...: %timeit df_product_itertools([df1, df2, df3, df4, df5])
378 µs ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.12 ms ± 17.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Edited 2020/07/17 00:26
2020/07/17 00:41
2020/07/22 02:33