How to average multiple CSV files

Background / what I want to achieve

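I am currently trying to work out how to average multiple CSV files for my job, but I cannot get it to work. Every CSV file has five columns, and columns 1 through 5 all hold physiological measurements. However, the files do not all have the same number of rows, because the recording durations differ.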

Problem / error message

Error message
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
    142     try:
--> 143         result = expressions.evaluate(op, left, right)
    144     except TypeError:

10 frames
TypeError: unsupported operand type(s) for /: 'str' and 'int'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in masked_arith_op(x, y, op)
    110         if mask.any():
    111             with np.errstate(all="ignore"):
--> 112                 result[mask] = op(xrav[mask], y)
    113 
    114     result, _ = maybe_upcast_putmask(result, ~mask, np.nan)

TypeError: unsupported operand type(s) for /: 'str' and 'int'

Relevant source code

Python

Source code

import pandas as pd
import csv

average_array = [[0 for i in range(5)] for j in range(450)]

# Read the data


df1 = pd.read_csv("/content/drive/Shareddrives/**, skiprows=3)

# (reads for df2 through df25 omitted)
df26 = pd.read_csv("/content/drive/Shareddrives/**, skiprows=3)

rowcount = max(len(df1),len(df2),len(df3),len(df4),len(df5),len(df6),len(df7),len(df8),len(df9),len(df10),len(df11),len(df12),len(df13),len(df14),len(df15),len(df16),len(df17),len(df18),len(df19),len(df20),len(df21),len(df22),len(df23),len(df24),len(df25),len(df26))
columncount = len(df1.columns)
if len(df1) < rowcount:
  df1 = df1.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df2) < rowcount:
  df2 = df2.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df3) < rowcount:
  df3 = df3.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df4) < rowcount:
  df4 = df4.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df5) < rowcount:
  df5 = df5.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df6) < rowcount:
  df6 = df6.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df7) < rowcount:
  df7 = df7.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df8) < rowcount:
  df8 = df8.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df9) < rowcount:
  df9 = df9.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df10) < rowcount:
  df10 = df10.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df11) < rowcount:
  df11 = df11.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df12) < rowcount:
  df12 = df12.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df13) < rowcount:
  df13 = df13.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df14) < rowcount:
  df14 = df14.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df15) < rowcount:
  df15 = df15.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df16) < rowcount:
  df16 = df16.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df17) < rowcount:
  df17 = df17.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df18) < rowcount:
  df18 = df18.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df19) < rowcount:
  df19 = df19.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df20) < rowcount:
  df20 = df20.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df21) < rowcount:
  df21 = df21.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df22) < rowcount:
  df22 = df22.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df23) < rowcount:
  df23 = df23.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df24) < rowcount:
  df24 = df24.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df25) < rowcount:
  df25 = df25.append(pd.Series([0]*columncount),ignore_index=True)
elif len(df26) < rowcount:
  df26 = df26.append(pd.Series([0]*columncount),ignore_index=True)


result = (df1 + df2 + df3 + df4 + df5 + df6 + df7 + df8 + df9 + df10 + df11 + df12 + df13 + df14 + df15 + df16 + df17 + df18 + df19 + df20 + df21 + df22 + df23 + df24 + df25 + df26) / 26
print(result)


# Write the data out
with open('ave_data.csv', 'w') as file:
    writer = csv.writer(file, lineterminator='\n')
    writer.writerows(average_array)

What I tried

Suspecting a mismatch between str and int types, I tweaked the right-hand side of the result expression, but that did not fix it.

Additional information (framework/tool versions, etc.)

I am using the current version of Google Colaboratory and trying to write the result out to Drive.


1 answer

python
df1 = pd.read_csv("/content/drive/Shareddrives/**, skiprows=3)

You read the file like this, but one of the rows after the skipped ones contains strings. As a result, the result expression fails when it tries to divide by 26.
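To make the failure concrete, here is a minimal sketch with two small in-memory frames. The column names and values are invented for illustration and are not taken from the real files; the assumption is only that a text row ended up among the data.

```python
import pandas as pd

# Hypothetical frames in which a text row survived into the data,
# so every cell in these columns is a string.
df_a = pd.DataFrame({"hr": ["bpm", "60"], "temp": ["degC", "36.5"]})
df_b = pd.DataFrame({"hr": ["bpm", "70"], "temp": ["degC", "36.7"]})

# Adding string columns concatenates text instead of summing numbers.
total = df_a + df_b
print(total)

# Dividing that text by an integer is what raises the TypeError.
try:
    total / 2
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```

Note that the addition itself does not fail, which is probably why the traceback points at the division rather than at the sum.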

The 4th line is the most suspicious. skiprows=3 skips rows 0 through 2, so the row that would get index 3 in a plain read (that is, the 4th line) is not skipped.
Try adding something like print(df1[0]) and check whether the dtype is object.
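The effect of skiprows can be checked on a small in-memory file. The layout below (three label lines, a header line, a units line, then numbers) is purely an assumption for illustration; the real files may look different.

```python
import io
import pandas as pd

# Hypothetical file layout (an assumption, not the asker's real file):
# three label lines, a header line, a units line, then numeric data.
raw = "\n".join([
    "device,A",
    "subject,B",
    "date,C",
    "hr,temp",
    "bpm,degC",
    "60,36.5",
    "62,36.6",
])

# skiprows=3 drops lines 0-2; line 3 becomes the header and line 4 is read as data.
df = pd.read_csv(io.StringIO(raw), skiprows=3)
print(df.dtypes)

# Passing a list skips exactly those 0-based line numbers, here also the units line.
df_clean = pd.read_csv(io.StringIO(raw), skiprows=[0, 1, 2, 4])
print(df_clean.dtypes)
```

Printing df.dtypes shows every column at once, so it works even when the column labels are not known in advance.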

Posted 2021/12/09 05:54

ppaul

Total score: 24670

sugizone5

2021/12/11 07:25

I use skiprows=3 because rows 0 through 2 contain label cells, so I skip them. I ran print(df1[0]), but it raised an error. Maybe the dtype is not object.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897             try:
-> 2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
2 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:
-> 2900                 raise KeyError(key) from err
   2901 
   2902         if tolerance is not None:

KeyError: 0

When you say that one of the later rows contains strings, which row do you mean?
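A KeyError: 0 means that no column is labelled 0, which is what happens when the column names come from a header line in the file; selecting by position with iloc avoids this. One possible way to find the rows that hold strings, and then average frames of different lengths once they are numeric, is sketched below. The frames, column names, and helper functions (non_numeric_rows, to_clean_numeric) are hypothetical stand-ins, not part of the original code.

```python
import pandas as pd

# Hypothetical stand-ins for two of the files; real data would come from pd.read_csv.
df1 = pd.DataFrame({"hr": ["bpm", "60", "62", "64"],
                    "temp": ["degC", "36.5", "36.6", "36.7"]})
df2 = pd.DataFrame({"hr": ["bpm", "70", "72"],
                    "temp": ["degC", "36.7", "36.8"]})

# Select the first column by position; df1[0] looks for a column labelled 0.
first_col = df1.iloc[:, 0]
print(first_col.dtype)


def non_numeric_rows(df):
    """Return the rows in which at least one cell cannot be parsed as a number."""
    converted = df.apply(pd.to_numeric, errors="coerce")
    return df[converted.isna().any(axis=1)]


def to_clean_numeric(df):
    """Coerce to numbers, drop rows that failed to parse, and renumber from 0.

    Note: rows with genuinely missing measurements are dropped as well.
    """
    converted = df.apply(pd.to_numeric, errors="coerce")
    return converted.dropna().reset_index(drop=True)


bad_rows = non_numeric_rows(df1)
print(bad_rows)

frames = [to_clean_numeric(df) for df in (df1, df2)]

# Row i of the result is the mean of row i over the frames that have a row i.
average = pd.concat(frames).groupby(level=0).mean()
print(average)

# DataFrame.to_csv writes the result directly; the file name is only an example.
average.to_csv("ave_data.csv", index=False)
```

This averages each row only over the files that are long enough to contain it, rather than padding shorter files with zeros. If zero padding is really what is wanted, each frame could be reindexed to the longest length with fill_value=0 before summing.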
