How do I resolve "TypeError: could not convert string to float: 'GP'"?

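What I want to achieve

I want to display the coefficient of variation of every column in the data with a single expression of the form ○○/○○.
When I ran `student_data_math.std(ddof = 0) / student_data_math.mean()`, I got an error.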
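For reference, this is the quantity the expression computes. With `ddof=0`, pandas divides by $n$ (the population standard deviation), so for a column with values $x_1, \dots, x_n$ the coefficient of variation is:

$$
\mathrm{CV} = \frac{\sigma}{\mu}, \qquad
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \mu\right)^2}
$$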

The data is provided with the course material, so I would like to solve this while modifying the data as little as possible.

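The problem / what I don't understand

Most likely, the strings contained in the data cannot be converted to float.
The data has columns that contain strings, such as school, sex, and address, and the school column contains the value "GP" shown in the error.
I ran the code from the course material as-is, but I still get the error.
In the course material's answer, the columns containing strings (school, etc.) are excluded automatically, and only the results for the float64 columns are shown.
Is there a way to display the results for all non-string columns at once, without specifying each column individually?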
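A quick way to confirm which columns trigger the error is to check each column's dtype. The sketch below uses a small hypothetical DataFrame as a stand-in for `student_data_math` (the column selection and values are made up; the real data has many more columns) and lists the columns that are not numeric:

```python
import pandas as pd

# Hypothetical miniature stand-in for student_data_math: string columns
# (school, sex) mixed with numeric columns (age, G3). Values are made up.
df = pd.DataFrame({
    "school": ["GP", "GP", "MS", "MS"],
    "sex": ["F", "M", "F", "M"],
    "age": [15, 16, 17, 18],
    "G3": [10.0, 12.0, 14.0, 16.0],
})

print(df.dtypes)

# Collect every column whose dtype is not numeric; these are the columns
# that std()/mean() try (and fail) to convert to float.
non_numeric = [col for col in df.columns
               if not pd.api.types.is_numeric_dtype(df[col])]
print(non_numeric)  # ['school', 'sex']
```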

Error message

```text
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:85, in disallow.__call__.<locals>._f(*args, **kwargs)
     84 try:
---> 85     return f(*args, **kwargs)
     86 except ValueError as e:
     87     # we want to transform an object array
     88     # ValueError message to the more typical TypeError
     89     # e.g. this is normally a disallowed function on
     90     # object arrays that contain strings

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:147, in bottleneck_switch.__call__.<locals>.f(values, axis, skipna, **kwds)
    146 else:
--> 147     result = alt(values, axis=axis, skipna=skipna, **kwds)
    149 return result

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:1013, in nanvar(values, axis, skipna, ddof, mask)
   1007 # xref GH10242
   1008 # Compute variance via two-pass algorithm, which is stable against
   1009 # cancellation errors and relatively accurate for small numbers of
   1010 # observations.
   1011 #
   1012 # See https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
-> 1013 avg = _ensure_numeric(values.sum(axis=axis, dtype=np.float64)) / count
   1014 if axis is not None:

File ~\anaconda3\Lib\site-packages\numpy\core\_methods.py:49, in _sum(a, axis, dtype, out, keepdims, initial, where)
     47 def _sum(a, axis=None, dtype=None, out=None, keepdims=False,
     48          initial=_NoValue, where=True):
---> 49     return umr_sum(a, axis, dtype, out, keepdims, initial, where)

ValueError: could not convert string to float: 'GP'

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In[86], line 1
----> 1 student_data_math.std(ddof = 0)

File ~\anaconda3\Lib\site-packages\pandas\core\frame.py:11748, in DataFrame.std(self, axis, skipna, ddof, numeric_only, **kwargs)
  11739 @doc(make_doc("std", ndim=2))
  11740 def std(
  11741     self,
   (...)
  11746     **kwargs,
  11747 ):
> 11748     result = super().std(axis, skipna, ddof, numeric_only, **kwargs)
  11749     if isinstance(result, Series):
  11750         result = result.__finalize__(self, method="std")

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:12358, in NDFrame.std(self, axis, skipna, ddof, numeric_only, **kwargs)
  12350 def std(
  12351     self,
  12352     axis: Axis | None = 0,
   (...)
  12356     **kwargs,
  12357 ) -> Series | float:
> 12358     return self._stat_function_ddof(
  12359         "std", nanops.nanstd, axis, skipna, ddof, numeric_only, **kwargs
  12360     )

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:12322, in NDFrame._stat_function_ddof(self, name, func, axis, skipna, ddof, numeric_only, **kwargs)
  12319 elif axis is lib.no_default:
  12320     axis = 0
> 12322 return self._reduce(
  12323     func, name, axis=axis, numeric_only=numeric_only, skipna=skipna, ddof=ddof
  12324 )

File ~\anaconda3\Lib\site-packages\pandas\core\frame.py:11562, in DataFrame._reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
  11558     df = df.T
  11560 # After possibly _get_data and transposing, we are now in the
  11561 #  simple case where we can use BlockManager.reduce
> 11562 res = df._mgr.reduce(blk_func)
  11563 out = df._constructor_from_mgr(res, axes=res.axes).iloc[0]
  11564 if out_dtype is not None and out.dtype != "boolean":

File ~\anaconda3\Lib\site-packages\pandas\core\internals\managers.py:1500, in BlockManager.reduce(self, func)
   1498 res_blocks: list[Block] = []
   1499 for blk in self.blocks:
-> 1500     nbs = blk.reduce(func)
   1501     res_blocks.extend(nbs)
   1503 index = Index([None])  # placeholder

File ~\anaconda3\Lib\site-packages\pandas\core\internals\blocks.py:404, in Block.reduce(self, func)
    398 @final
    399 def reduce(self, func) -> list[Block]:
    400     # We will apply the function and reshape the result into a single-row
    401     #  Block with the same mgr_locs; squeezing will be done at a higher level
    402     assert self.ndim == 2
--> 404     result = func(self.values)
    406     if self.values.ndim == 1:
    407         res_values = result

File ~\anaconda3\Lib\site-packages\pandas\core\frame.py:11481, in DataFrame._reduce.<locals>.blk_func(values, axis)
  11479         return np.array([result])
  11480 else:
> 11481     return op(values, axis=axis, skipna=skipna, **kwds)

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:147, in bottleneck_switch.__call__.<locals>.f(values, axis, skipna, **kwds)
    145         result = alt(values, axis=axis, skipna=skipna, **kwds)
    146 else:
--> 147     result = alt(values, axis=axis, skipna=skipna, **kwds)
    149 return result

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:950, in nanstd(values, axis, skipna, ddof, mask)
    947 orig_dtype = values.dtype
    948 values, mask = _get_values(values, skipna, mask=mask)
--> 950 result = np.sqrt(nanvar(values, axis=axis, skipna=skipna, ddof=ddof, mask=mask))
    951 return _wrap_results(result, orig_dtype)

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:92, in disallow.__call__.<locals>._f(*args, **kwargs)
     86 except ValueError as e:
     87     # we want to transform an object array
     88     # ValueError message to the more typical TypeError
     89     # e.g. this is normally a disallowed function on
     90     # object arrays that contain strings
     91     if is_object_dtype(args[0]):
---> 92         raise TypeError(e) from e
     93     raise

TypeError: could not convert string to float: 'GP'
```

Relevant source code

student_data_math.std(ddof = 0) / student_data_math.mean()

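What I tried / researched

Searched teratail, Google, etc.
Modified the source code on my own
Asked an acquaintance
Other

Details / results of the above

I was not able to solve it due to my own lack of skill.

Supplementary information

None in particular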

melian

Edited 2024/09/02 06:47

`pandas.DataFrame.std()` and `pandas.DataFrame.mean()` have a keyword option called `numeric_only`, so specifying it may work: `student_data_math.std(ddof = 0, numeric_only=True) / student_data_math.mean(numeric_only=True)`
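A minimal sketch of this suggestion on a small made-up DataFrame (a hypothetical stand-in for `student_data_math`). As far as I know, pandas 2.0 changed the default of `numeric_only` for `DataFrame` reductions to `False`, whereas pandas 1.x silently dropped columns it could not reduce; that is likely why the course material shows only numeric columns while the same code raises an error in a newer environment.

```python
import pandas as pd

# Hypothetical stand-in for student_data_math (made-up values).
student_data_math = pd.DataFrame({
    "school": ["GP", "GP", "MS", "MS"],
    "sex": ["F", "M", "F", "M"],
    "age": [15, 16, 17, 18],
    "G3": [10.0, 12.0, 14.0, 16.0],
})

# numeric_only=True makes each reduction skip non-numeric columns, so both
# results are Series indexed by the numeric column names only, and the
# division is aligned column by column.
cv = (student_data_math.std(ddof=0, numeric_only=True)
      / student_data_math.mean(numeric_only=True))
print(cv)
```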


1 answer

Best answer

How about using `select_dtypes()` to extract only the numeric columns?
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.select_dtypes.html

```python
num_data = student_data_math.select_dtypes(include='number')
num_data.std(ddof=0) / num_data.mean()
```
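To see what `select_dtypes(include='number')` keeps, here is a small sketch on made-up data (again a hypothetical stand-in for `student_data_math`). The `'number'` selector covers both integer and floating-point columns, so an integer column such as `age` is kept alongside float columns, while string columns are dropped:

```python
import pandas as pd

# Hypothetical stand-in for student_data_math (made-up values).
df = pd.DataFrame({
    "school": ["GP", "GP", "MS", "MS"],
    "sex": ["F", "M", "F", "M"],
    "age": [15, 16, 17, 18],          # integer column
    "G3": [10.0, 12.0, 14.0, 16.0],   # float column
})

# Keep only numeric (integer and float) columns; string columns are dropped.
num_data = df.select_dtypes(include="number")
print(num_data.dtypes)

# Coefficient of variation per numeric column (population std, ddof=0).
cv = num_data.std(ddof=0) / num_data.mean()
print(cv)
```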

Combining this into a single expression with `pipe()` looks like this:

```python
student_data_math.select_dtypes(include='number') \
    .pipe(lambda df: df.std(ddof=0) / df.mean())
```

Addendum

If you are only computing `mean()` and `std()`, you could also specify `numeric_only=True` so that only the numeric columns are included in the calculation.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.std.html

```python
student_data_math.std(ddof=0, numeric_only=True) / student_data_math.mean(numeric_only=True)
```
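If the calculation is needed more than once, it could be wrapped in a small helper. The function name `coefficient_of_variation` below is hypothetical (it is not part of pandas), and the DataFrame is made-up data; the sketch also checks that the `select_dtypes` approach and the `numeric_only=True` approach return the same result:

```python
import pandas as pd
from pandas.testing import assert_series_equal


def coefficient_of_variation(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: population CV (std with ddof=0 / mean) per numeric column."""
    return df.std(ddof=0, numeric_only=True) / df.mean(numeric_only=True)


# Hypothetical stand-in for student_data_math (made-up values).
df = pd.DataFrame({
    "school": ["GP", "GP", "MS", "MS"],
    "sex": ["F", "M", "F", "M"],
    "age": [15, 16, 17, 18],
    "G3": [10.0, 12.0, 14.0, 16.0],
})

via_flag = coefficient_of_variation(df)
via_select = df.select_dtypes(include="number").pipe(
    lambda d: d.std(ddof=0) / d.mean()
)

# Raises AssertionError if the index, dtype, or values differ.
assert_series_equal(via_flag, via_select)
print(via_flag)
```

Either form works; `numeric_only=True` keeps the original one-line expression, while `select_dtypes()` is handy when the numeric subset is reused for further calculations.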

Posted 2024/09/02 07:10

Edited 2024/09/02 07:18

bsdfan

Total score 4925

tatatan

2024/09/02 07:49

Solved. Thank you so much for showing me so many different ways to solve this; it helped a lot. 🙇

