Python: How can I fix an array memory error when computing accuracy?

What I want to achieve

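I am using Google Colaboratory.
I wrote the code below to compute the accuracy between the ground truth labels and the predicted labels. However, each file has 140000 lines, and an array memory error occurs.
I cannot think of another approach, so I would appreciate it if you could tell me how to compute the accuracy without causing a memory error.
Thank you in advance.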

Problem / error message

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)

Relevant source code

Python
from sklearn.metrics import accuracy_score

groud_truth_file = 'a.txt'
pred_file = 'b.txt'

with open(groud_truth_file) as f:
    a = f.readlines()
groud_truth_label = [line.rstrip('\n') for line in a]

with open(pred_file) as f:
    b = f.readlines()
pred_label = [line.rstrip('\n') for line in b]

print('groud_truth_label', groud_truth_label)
print('pred_label', pred_label)

print('Accuracy : ', accuracy_score(groud_truth_label, pred_label))
del groud_truth_label, pred_label

Additional information

a.txt and b.txt each contain a single 0 or 1 per line, and each file has 140000 lines.
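
For files in this format, one possible approach is to compare the two files line by line and keep only running counts, so no full list is ever held in memory. This is a minimal sketch, not the code from the question: the function name `streaming_accuracy` is hypothetical, and it assumes both files have the same number of lines with one label per line.

```python
def streaming_accuracy(truth_path, pred_path):
    """Compare two label files line by line, keeping only two counters."""
    total = 0
    correct = 0
    with open(truth_path) as f_truth, open(pred_path) as f_pred:
        # zip stops at the shorter file, so equal line counts are assumed.
        for truth_line, pred_line in zip(f_truth, f_pred):
            total += 1
            if truth_line.strip() == pred_line.strip():
                correct += 1
    if total == 0:
        raise ValueError("no lines to compare")
    return correct / total

# Hypothetical usage with the file names from the question:
# print('Accuracy : ', streaming_accuracy('a.txt', 'b.txt'))
```

The result is a single float, so nothing large is sent to the notebook output.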

bsdfan

2022/08/15 04:09

Isn't the cause the part that tries to print lists of 140000 elements, rather than a memory error?
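
If the large print calls are indeed the cause, one way to check is to print only a short summary of each list instead of every element. The sketch below is an illustration under that assumption: the helper name `summarize` and the preview size of 10 are hypothetical, and the lists are small stand-ins for the data read from a.txt and b.txt.

```python
def summarize(name, labels, preview=10):
    """Return a one-line summary instead of the whole list."""
    head = labels[:preview]
    return f"{name}: {len(labels)} items, first {len(head)}: {head}"

# Stand-in data; in the question these lists are read from a.txt and b.txt.
groud_truth_label = ['0', '1'] * 70000
pred_label = ['1', '1'] * 70000

print(summarize('groud_truth_label', groud_truth_label))
print(summarize('pred_label', pred_label))
```

Each call prints one short line regardless of how long the list is, which keeps the notebook output well below the IOPub data rate limit shown in the error message.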


2 answers

Best answer

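I ran it on Google Colab, but no error occurred.

"a.txt and b.txt each contain a single 0 or 1 per line, and each file has 140000 lines."

I created files matching this description with random numbers by running the code below.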

python
import numpy as np

groud_truth_label_org = np.random.randint(2, size=(140000, 1))
pred_label_org = np.random.randint(2, size=(140000, 1))

groud_truth_file = 'a.txt'
pred_file = 'b.txt'

np.savetxt(groud_truth_file, groud_truth_label_org, fmt='%d')
np.savetxt(pred_file, pred_label_org, fmt='%d')

# Check the line counts of the generated files
!wc -l a.txt
!wc -l b.txt