Answer edit history
1
Addendum
test
CHANGED
@@ -1,4 +1,8 @@
## Simple answer

```python
df.loc[:, (df != 'NaN').sum() > 5]
@@ -25,3 +29,81 @@
# 9 10 NaN NaN
```

This raises a `FutureWarning`, but it is safe to ignore. If the warning bothers you, there is a faster version that uses `.isin()`.

```python
%%timeit
df.loc[:, (df != 'NaN').sum() > 5]
# 1.23 ms ± 35.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
df.loc[:, (~df.isin({'NaN'})).sum() > 5]
# 1.06 ms ± 7.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
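
To check that the two expressions select the same columns, here is a minimal sketch on a made-up DataFrame. The sample data, the column names, and the assumption that missing values are stored as the string `'NaN'` in object columns are illustrative only; the DataFrame in the question may differ.

```python
import pandas as pd

# Hypothetical sample data: missing values are the *string* 'NaN', not np.nan.
df = pd.DataFrame({
    "a": ["1", "2", "3", "4", "5", "6", "7"],        # 7 non-'NaN' values
    "b": ["1", "NaN", "3", "NaN", "5", "NaN", "7"],  # 4 non-'NaN' values
    "c": ["1", "2", "3", "4", "5", "6", "NaN"],      # 6 non-'NaN' values
})

# Count non-'NaN' entries per column with an elementwise comparison.
counts_ne = (df != "NaN").sum()

# Same count via isin, negated.
counts_isin = (~df.isin({"NaN"})).sum()

# Keep only the columns with more than 5 non-'NaN' entries.
kept_ne = df.loc[:, counts_ne > 5]
kept_isin = df.loc[:, counts_isin > 5]

print(list(kept_ne.columns))    # ['a', 'c']
print(list(kept_isin.columns))  # ['a', 'c']
```

Because every column here holds strings, this sample should not trigger the `FutureWarning` mentioned above; that warning is expected only when a numeric column is compared against a string.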

## @can110's code

```python
%%timeit
df.drop(columns=[c for c in df.columns
                 if sum(~np.isnan(df.replace('NaN', np.nan)[c])) < 5])
# 5.56 ms ± 30.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

*In this case, `df.astype(float)` can be used in place of `df.replace('NaN', np.nan)`.
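
As a rough illustration of that note, the sketch below converts a made-up object-dtype DataFrame with `astype(float)` and counts the non-missing values per column. The sample data is hypothetical, and it assumes every cell is either a numeric string or the string `'NaN'`; `astype(float)` raises `ValueError` on anything that cannot be parsed as a float.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: numeric strings plus the string 'NaN'.
df = pd.DataFrame({
    "a": ["1", "2", "NaN"],
    "b": ["NaN", "NaN", "3.5"],
})

# float('NaN') parses to a real NaN, so the string marker becomes np.nan.
as_float = df.astype(float)

# Count the non-missing values per column, in the style of the code above.
valid_counts = {c: int((~np.isnan(as_float[c])).sum()) for c in as_float.columns}

print(as_float)
print(valid_counts)
```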

---

```python
%%timeit
df.loc[:, (~np.isnan(df.astype(float).to_numpy())).sum(0) > 5]
# 779 µs ± 3.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
df.loc[:, np.isfinite(df.astype(float).to_numpy()).sum(0) > 5]
# 767 µs ± 3.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

This is faster still.
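
One caveat worth keeping in mind: `~np.isnan` and `np.isfinite` agree only when the data contains no infinities, because `np.isfinite` also rejects `inf` and `-inf`. A minimal sketch with made-up values (the array below is hypothetical and not taken from the question):

```python
import numpy as np

# Hypothetical data: the last column contains infinities.
arr = np.array([
    [1.0, np.nan, np.inf],
    [2.0, 3.0, -np.inf],
    [np.nan, 4.0, 5.0],
])

not_nan = (~np.isnan(arr)).sum(0)  # counts every value that is not NaN
finite = np.isfinite(arr).sum(0)   # additionally excludes +inf and -inf

print(not_nan.tolist())  # [2, 2, 3]
print(finite.tolist())   # [2, 2, 1]
```

For data that holds only ordinary numbers and NaN, the two counts are identical, so either form selects the same columns.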