I want to train a machine learning model on whether a column's value is NaN

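### What I want to achieve
I have a CSV file with roughly 25 feature columns and 50,000 rows, and I am building a regression model from this data. (The file looks like the example below.)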

No,name,message,...,score
1,bob,hello,...,80
2,taro,,...,67
3,keiko,how are you,...,77
:
:
50000,kai,,...,59

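I want to use the "message" column as a feature.
I would like to apply to the "message" column a function that returns 0 if the value is NaN and 1 if it contains any string, store the result in a new column "messageX", and train the model on it. However, my attempts fail with errors.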

python3
import pandas as pd
import numpy as np

prof = pd.read_csv("prof.csv")

def judge(x):
    if prof["message"] == 'NaN':
        return 0
    else:
        return 1

prof["messageX"] = prof["message"].apply(lambda x: judge(x))


ValueError       Traceback (most recent call last)
<ipython-input-109-2ee4dfdbc177> in <module>
----> 1 prof["messageX"] = prof["message"].apply(lambda x: judge(x))
:
:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

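If this approach is not feasible, a different one is fine too. If anyone knows a good way to make use of a feature like this, I would appreciate your advice. I am a programming beginner, so the code may contain odd parts.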

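### What I tried
I found an article saying that this error occurs because the condition in the function is evaluated against the whole column rather than against each individual element, so I tried the code below, but it also raised an error.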

python3
def judge(x):
    for i in len(prof):
        if prof["message"][i].any() == 'NaN':
            return 0
        else:
            return 1

prof["messageX"] = prof["message"].apply(lambda x: judge(x))


ValueError       Traceback (most recent call last)
<ipython-input-109-2ee4dfdbc177> in <module>
----> 1 prof["messageX"] = prof["message"].apply(lambda x: judge(x))
:
:
TypeError: 'int' object is not iterable

I also tried filling the missing values with fillna() before running the code, which made every value 1, and changing the condition in the function to `if prof["message"] == '':`, but neither worked.
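As a rough illustration of why these comparisons never match, the sketch below uses a hypothetical three-row stand-in for prof.csv (the inline CSV text and the variable names are assumptions, not part of the original data). It relies on pandas' default behaviour of reading an empty CSV field as NaN, which is a missing-value marker rather than the string 'NaN' or ''.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for prof.csv.
csv_text = (
    "No,name,message,score\n"
    "1,bob,hello,80\n"
    "2,taro,,67\n"
    "3,keiko,how are you,77\n"
)
prof = pd.read_csv(io.StringIO(csv_text))

# The empty field in the second row is read as NaN.
missing = prof.loc[1, "message"]

# NaN is neither the string 'NaN' nor the empty string.
equals_nan_string = missing == "NaN"
equals_empty_string = missing == ""

# NaN is not even equal to itself, so == cannot be used to detect it.
equals_itself = missing == missing

# pd.isna is an element-wise test that does detect it.
detected = pd.isna(missing)

# Comparing the whole column returns a Series of booleans, one per row.
# An `if` statement cannot reduce that to a single True/False, which is
# what the "truth value of a Series is ambiguous" message refers to.
column_comparison = prof["message"] == "NaN"

print(equals_nan_string, equals_empty_string, equals_itself, detected)
print(column_comparison)
```

This would presumably also explain the fillna() result: once the missing values are filled, no element is NaN any more, so any NaN check returns 1 for every row.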

### Environment
anaconda3
python3.8

1 answer

Best answer

Here is one way to do it.

python
df['messageX'] = 1 - df['message'].isnull()

Output

python
>>> print(df)
      No   name      message  ...  score
0      1    bob        hello  ...     80
1      2   taro          NaN  ...     67
2      3  keiko  how are you  ...     77
3  50000    kai          NaN  ...     59
>>> df['messageX'] = 1 - df['message'].isnull()
>>> print(df)
      No   name      message  ...  score  messageX
0      1    bob        hello  ...     80         1
1      2   taro          NaN  ...     67         0
2      3  keiko  how are you  ...     77         1
3  50000    kai          NaN  ...     59         0
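For comparison, here is a minimal sketch of two other ways to get the same 0/1 column. One is an element-wise `judge` function close to the code in the question, and the other is a vectorized `notna().astype(int)`. The small DataFrame is hypothetical sample data built to resemble the frame above, and the names `by_apply`, `by_mask` and `by_subtraction` are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the frame shown above.
df = pd.DataFrame(
    {
        "No": [1, 2, 3, 50000],
        "name": ["bob", "taro", "keiko", "kai"],
        "message": ["hello", np.nan, "how are you", np.nan],
        "score": [80, 67, 77, 59],
    }
)


def judge(x):
    """Return 0 if the single value x is missing, otherwise 1."""
    # x is one element of the column here, not the whole Series.
    return 0 if pd.isna(x) else 1


# Element-wise version: apply() passes each value of the column to judge.
by_apply = df["message"].apply(judge)

# Vectorized version: notna() gives a boolean mask, astype(int) maps it to 0/1.
by_mask = df["message"].notna().astype(int)

# The expression from the answer, for comparison.
by_subtraction = 1 - df["message"].isnull()

df["messageX"] = by_mask
print(df[["message", "messageX"]])
```

All three produce the same values. On a 50,000-row column the vectorized forms are usually preferable to `apply`, since they avoid calling a Python function once per row.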