How can I generate normalized data in Python while ignoring NaN data points in a numpy array?

Question

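Suppose I have a numpy array that contains float('nan'). I don't want to impute those values for now, so I want to normalize the data first and keep the NaN entries in their original positions. Is there a way to do that?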

Previously I used the normalize function from sklearn.preprocessing, but that function does not seem to accept arrays containing NaN as input.

Chiel · Accepted Answer

You can mask the array with the numpy.ma.array function and then apply any numpy operation to it:

    import numpy as np

    a = np.random.rand(10)                 # Generate random data.
    a = np.where(a > 0.8, np.nan, a)       # Set all values larger than 0.8 to NaN.
    a = np.ma.array(a, mask=np.isnan(a))   # Use a mask to mark the NaNs.
    a_norm = a / np.sum(a)                 # The sum function ignores the masked values.
    a_norm2 = a / np.std(a)                # The std function ignores the masked values.

You can still access the raw data:

    print(a.data)
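
If a plain ndarray with the NaNs back in their original positions is needed afterwards, one possible sketch is to fill the masked entries with NaN again. The helper name normalize_keep_nan is hypothetical and not part of numpy, and dividing by the sum is only one of the normalizations shown above:

```python
import numpy as np

def normalize_keep_nan(values):
    """Divide by the sum of the non-NaN entries; NaNs stay where they were.

    Hypothetical helper for illustration, not a numpy function.
    """
    values = np.asarray(values, dtype=float)
    # Mask the NaN entries so that reductions skip them.
    masked = np.ma.array(values, mask=np.isnan(values))
    # masked.sum() only adds up the unmasked values.
    scaled = masked / masked.sum()
    # Convert back to a regular ndarray, writing NaN into the masked slots.
    return scaled.filled(np.nan)

result = normalize_keep_nan([1.0, np.nan, 3.0])
print(result)
```

The non-NaN entries of the result sum to 1, and the NaN stays at index 1.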

Warren Weckesser · Answer

You can use numpy.nansum to compute the norm while ignoring nan:

    In [54]: x
    Out[54]: array([ 1.,  2., nan,  3.])

The norm, ignoring nan, is:

    In [55]: np.sqrt(np.nansum(np.square(x)))
    Out[55]: 3.7416573867739413

y is the normalized array:

    In [56]: y = x / np.sqrt(np.nansum(np.square(x)))

    In [57]: y
    Out[57]: array([ 0.26726124,  0.53452248,         nan,  0.80178373])

    In [58]: np.linalg.norm(y[~np.isnan(y)])
    Out[58]: 1.0
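
For a 2-D array, the same idea might be applied row by row, roughly mirroring what sklearn.preprocessing.normalize does with the L2 norm. This is only a sketch: the function name nan_l2_normalize and its axis parameter are assumptions for illustration, and rows whose non-NaN entries are all zero are not handled (they would divide by zero).

```python
import numpy as np

def nan_l2_normalize(x, axis=-1):
    """Scale x to unit L2 norm along `axis`, ignoring NaN entries.

    Hypothetical helper for illustration; NaNs are left in place.
    """
    x = np.asarray(x, dtype=float)
    # nansum treats NaN as zero, so the norm uses only the valid entries.
    norms = np.sqrt(np.nansum(np.square(x), axis=axis, keepdims=True))
    # NaN divided by a finite norm is still NaN, so positions are preserved.
    return x / norms

x = np.array([[1.0, 2.0, np.nan, 3.0],
              [3.0, np.nan, 4.0, 0.0]])
y = nan_l2_normalize(x, axis=1)
print(y)
```

Each row of y then has unit norm over its non-NaN entries, which can be checked with np.sqrt(np.nansum(y**2, axis=1)).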