Error message about input files in a Python program

Question

### 前提・実現したいこと Python初心者です。 csvファイル2種類を読み込んで走査を行うプログラムを作成しているのですが、エラーの意味が分からず困っています。入力ファイルのサイズに関するエラーなのはわかるのですが、どのような訂正を行えば良いか分かりません。初歩的な質問ですが、ご回答いただけると幸いです。 ### 発生している問題・エラーメッセージ ``` PS C:\Users\suu\PycharmProjects\pythonProject> python scan.py Traceback (most recent call last): File "C:\Users\suu\PycharmProjects\pythonProject\scan.py", line 34, in dataset = np.append(dataset, data.reshape(1, -1), axis=0) File "<__array_function__ internals>", line 5, in append File "C:\Users\suu\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages umpy\lib\ function_base.py", line 4817, in append return concatenate((arr, values), axis=axis) File "<__array_function__ internals>", line 5, in concatenate ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 121 and the array at index 1 has size 110 ``` ### 該当のソースコード pythonファイル ```python import pandas as pd import numpy as np INPUT_STRAIN_CSV_PATH = r"C:\programming\strain.csv" INPUT_BGS_CSV_PATH = r"C:\programming\observed_bgs.csv" OUTPUT_CSV_PATH = 'test_bgs.csv' C = 50 # 比例定数[GHz] f_B = 11.08 # ひずみが無い場合の中心周波数[GHz] delta_f = 0.005 # 周波数間隔[GHz] delta_z = 0.1 # 計測位置間隔[m] M = 11 # 行 N = 11 # 列 df_STRAIN = pd.read_csv(INPUT_STRAIN_CSV_PATH, header=None, index_col=None, dtype='float') df_BGS = pd.read_csv(INPUT_BGS_CSV_PATH, header=0, index_col=0, dtype='float') df_STRAIN = df_STRAIN*C + f_B # 周波数シフト num_list = ['label'] for m in range(M): for n in range(N): num_list.append('['+str(m)+']['+str(n)+']') # リストに'+str(m)+str(n)'をそれぞれ11まで追加 dataset = np.empty((1, M*N)) # 1行121列の値が適当な配列の生成 column_num_z = int((N - 1) / 2) for true_f_B in df_STRAIN[0]: for approximate_f_B in df_BGS.index: if true_f_B - delta_f/2 <= approximate_f_B < true_f_B + delta_f/2: # 周波数間隔内に近い中心周波数がある場合 index_num_f = df_BGS.index.get_loc(approximate_f_B) # 観測BGS値のファイルにあるあてはまるデータの行数を返す start_f, end_f = int(index_num_f 
- (M - 1) / 2), int(index_num_f + (M - 1) / 2 + 1) # 行に観測周波数 start_z, end_z = int(column_num_z - (N - 1) / 2), int(column_num_z + (N - 1) / 2 + 1) # 列に観測位置 data = df_BGS.iloc[start_f:end_f, start_z:end_z].values # スタートの観測位置・周波数から終わりの観測位置・周波数の範囲にあるBGS値を取り出し、1行に並べる dataset = np.append(dataset, data.reshape(1, -1), axis=0) # 先ほど作成した1行121列の値が適当な配列に縦方向に上の行のBGS値を追加する column_num_z += 1 dataset = np.delete(dataset, 0, 0) # datasetの0行目（適当に作成した値の行）を削除 output = pd.DataFrame(dataset, index=df_STRAIN[0]) # 行のラベルがdf_STRAIN[0]となる配列datasetをoutputという名前で作成 print(len(output)) # outputの要素数を取得、ターミナルに表示 output = pd.DataFrame(output, columns=num_list, index=None) print(output) ``` 入力に用いているcsvファイル（1行目と1列目はラベル） ```observed_bgs.csv ,1.000000,1.100000,1.200000,1.300000,1.400000,1.500000,1.600000,1.700000,1.800000,1.900000,2.000000, 11.055000,1.490014e-001,2.124409e-001,3.007518e-001,1.808967e-001,3.054668e-001,3.431774e-001,-4.210322e-002,4.011200e-002,3.539206e-001,3.262569e-001,-5.839732e-002, 11.060000,2.035853e-001,1.660419e-001,3.366550e-001,1.037324e-002,4.397776e-001,2.541221e-001,1.118989e-001,7.178671e-002,3.731853e-001,8.672985e-002,-7.714465e-002, 11.065000,1.820284e-001,1.257406e-001,1.539731e-001,1.634584e-001,2.943186e-001,2.839953e-001,2.182672e-001,3.542577e-001,4.995189e-001,2.484616e-001,3.013704e-001, 11.070000,4.253567e-001,4.271026e-001,4.260287e-001,3.171626e-001,3.713739e-001,6.300043e-001,5.840841e-001,4.365117e-001,4.673692e-001,5.320352e-001,4.835563e-001, 11.075000,5.439099e-001,5.400301e-001,6.477162e-001,7.950613e-001,5.961943e-001,6.699225e-001,6.539321e-001,6.780738e-001,6.614270e-001,5.516639e-001,6.593625e-001, 11.080000,1.010312e+000,8.952163e-001,1.013250e+000,9.671943e-001,1.008806e+000,1.011744e+000,1.089623e+000,9.396322e-001,7.608374e-001,1.021424e+000,8.930884e-001, 11.085000,9.134308e-001,8.684833e-001,1.025697e+000,1.118062e+000,9.602207e-001,1.076239e+000,8.398812e-001,1.025173e+000,1.003321e+000,9.614065e-001,1.049188e+000, 
11.090000,8.405426e-001,8.809619e-001,8.758572e-001,7.283722e-001,9.289356e-001,1.013398e+000,7.797117e-001,8.230195e-001,7.077125e-001,9.423536e-001,8.999260e-001, 11.095000,7.321427e-001,5.268207e-001,6.224229e-001,4.072632e-001,5.110210e-001,6.265857e-001,5.455612e-001,4.626299e-001,4.156494e-001,4.014521e-001,5.963694e-001, 11.100000,3.861631e-001,2.377489e-001,3.210669e-001,5.498842e-001,3.387315e-001,3.426206e-001,3.899067e-001,5.276549e-001,3.496963e-001,4.343438e-001,3.474111e-001, 11.105000,2.037398e-001,3.452732e-001,2.145458e-001,2.452674e-001,3.469219e-001,3.969867e-001,3.397168e-001,3.521850e-001,2.853276e-001,3.189511e-001,1.221901e-001, ``` ```strain.csv 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 7.69E-05 ``` ### 試したこと二つ目のstrain.csvを行に並べ変えて実行しましたが、同じエラーが表示されました。

Accepted Answer

> ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 121 and the array at index 1 has size 110

Printing the shapes of `dataset` and `data.reshape(1, -1)` gives the following, which shows that they do not match.
```python
    # Take the BGS values between the start and end position/frequency and flatten them into one row
    print(f'{dataset.shape=}', f'{data.reshape(1, -1).shape=}')
    dataset = np.append(dataset, data.reshape(1, -1), axis=0)

# Output:
#   dataset.shape=(1, 121) data.reshape(1, -1).shape=(1, 110)
```
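One possible reason for the 110, offered as a sketch rather than a confirmed diagnosis of the asker's run: positional slicing with `iloc` does not raise when the stop index runs past the end of the table, it simply returns fewer rows. The example below uses a made-up 11 x 11 table and an assumed matched row index of 6 (both are illustrative, not taken from the asker's session) to show an 11-row window shrinking to 10 rows, which flattens to 110 values instead of 121.

```python
import numpy as np
import pandas as pd

M = N = 11

# Made-up 11 x 11 table standing in for the observed BGS values (assumption).
table = pd.DataFrame(np.arange(M * N, dtype=float).reshape(M, N))

index_num_f = 6            # assumed row position of the matched frequency
half = (M - 1) // 2        # 5 rows on each side of the match

start_f = index_num_f - half       # 1
end_f = index_num_f + half + 1     # 12, one past the last valid stop (11)

# iloc silently clips the out-of-range stop, so only rows 1..10 come back.
window = table.iloc[start_f:end_f, 0:N].values
flat = window.reshape(1, -1)

print(window.shape)  # (10, 11)
print(flat.shape)    # (1, 110)
```

If this is what happens in the real program, the window is centered too close to the edge of the table, and the fix depends on what the edge rows are supposed to contain.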

I do not know what the intended behavior is, so for now I will zero-pad the short row.
```python
    # Take the BGS values between the start and end position/frequency and flatten them into one row
    # dataset = np.append(dataset, data.reshape(1, -1), axis=0)
    data = data.reshape(1, -1)
    dataset = np.append(
        dataset, np.pad(data, ((0,0),(0,dataset.shape[1]-data.shape[1]))),
        axis=0)
```
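As a standalone illustration of the padding step, here is a minimal sketch with stand-in arrays (the zeros and ones are placeholders, not the asker's data). The pad width `((0, 0), (0, missing))` means: add nothing along axis 0, and add `missing` zeros only at the end of axis 1.

```python
import numpy as np

dataset = np.zeros((1, 121))   # stand-in for the rows accumulated so far
data = np.ones((1, 110))       # stand-in for a window that came out short

# Number of trailing columns needed to reach the width of dataset.
missing = dataset.shape[1] - data.shape[1]

# Pad only on the right-hand side of axis 1 with constant zeros.
padded = np.pad(data, ((0, 0), (0, missing)), mode="constant")

dataset = np.append(dataset, padded, axis=0)

print(padded.shape)    # (1, 121)
print(dataset.shape)   # (2, 121)
```

Note that the padded zeros are indistinguishable from real measurements of 0.0, so this makes the concatenation succeed without telling you whether the padded rows are meaningful.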

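With the change above the program runs further, but the final conversion to a DataFrame then fails, so I changed that part as well.
```python
# Build output from dataset with df_STRAIN[0] as the row labels
output = pd.DataFrame(dataset, index=df_STRAIN[0])
print(len(output))  # get the number of rows in output and print it to the terminal
#output = pd.DataFrame(output, columns=num_list, index=None)
output = pd.DataFrame(output.values, columns=num_list[1:], index=None)
print(output)
```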
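The reason for `num_list[1:]` can be checked in isolation. In the question's code `num_list` starts with `'label'` and then gets one entry per `[m][n]` pair, so it has 122 entries, while each row of `dataset` has 121 values. The sketch below uses a placeholder array of zeros (an assumption, standing in for the real data) to show that passing all 122 labels for a 121-column array raises `ValueError`, while dropping the first label fits.

```python
import numpy as np
import pandas as pd

M = N = 11

# Same labels as in the question: 'label' followed by '[m][n]' for every pair.
num_list = ['label'] + [f'[{m}][{n}]' for m in range(M) for n in range(N)]

values = np.zeros((2, M * N))   # placeholder data with 121 columns

# 122 labels for 121 columns: pandas rejects the mismatch.
try:
    pd.DataFrame(values, columns=num_list)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Dropping 'label' leaves exactly 121 column names.
output = pd.DataFrame(values, columns=num_list[1:])

print(len(num_list), mismatch_raised, output.shape)
```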

**Execution result**
```
121
       [0][0]    [0][1]    [0][2]    [0][3]  ...  [10][7]  [10][8]  [10][9]  [10][10]
0    0.203585  0.166042  0.336655  0.010373  ...      0.0      0.0      0.0       0.0
1    0.166042  0.336655  0.010373  0.439778  ...      0.0      0.0      0.0       0.0
2    0.336655  0.010373  0.439778  0.254122  ...      0.0      0.0      0.0       0.0
3    0.010373  0.439778  0.254122  0.111899  ...      0.0      0.0      0.0       0.0
4    0.439778  0.254122  0.111899  0.071787  ...      0.0      0.0      0.0       0.0
..        ...       ...       ...       ...  ...      ...      ...      ...       ...
116  0.000000  0.000000  0.000000  0.000000  ...      0.0      0.0      0.0       0.0
117  0.000000  0.000000  0.000000  0.000000  ...      0.0      0.0      0.0       0.0
118  0.000000  0.000000  0.000000  0.000000  ...      0.0      0.0      0.0       0.0
119  0.000000  0.000000  0.000000  0.000000  ...      0.0      0.0      0.0       0.0
120  0.000000  0.000000  0.000000  0.000000  ...      0.0      0.0      0.0       0.0

[121 rows x 121 columns]
```
