NumPy: How to quickly normalize many vectors?

Question

How can a list of vectors be normalized elegantly in NumPy?

Here is an example that does not work:

from numpy import *

vectors = array([arange(10), arange(10)])  # All x's, then all y's
norms = apply_along_axis(linalg.norm, 0, vectors)

# Now, what I was expecting would work:
print(vectors.T / norms)  # vectors.T has 10 elements, as does norms, but this does not work

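The last operation fails with "shape mismatch: objects cannot be broadcast to a single shape".

How can the 2D vectors in vectors be normalized elegantly with NumPy?

Edit: Why does the above not work, whereas adding a dimension to norms does (as per my answer below)?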

Olivier Verdier · Accepted Answer

Well, unless I missed something, this does work:

vectors / norms

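The problem with your suggestion lies in the broadcasting rules.

vectors  # shape 2, 10
norms  # shape 10

The shapes do not have the same length! So the rule is to first extend the shorter shape by one on the left:

norms  # shape 1,10

You can do that manually: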

vectors / norms.reshape(1,-1) # same as vectors/norms

If you want to compute vectors.T/norms, you have to do the reshaping manually, as follows:

vectors.T / norms.reshape(-1,1) # this works
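
To see the broadcasting rule in action, here is a small sketch. It assumes the same layout as the question (a (2, 10) array holding all x's, then all y's) but starts the ranges at 1 so that no vector has zero length. It also uses np.linalg.norm with an axis argument instead of apply_along_axis, which is a substitution made here and not part of the answer.

```python
import numpy as np

# Same layout as in the question: row 0 holds the x's, row 1 the y's.
# The ranges start at 1 (an assumption made here) to avoid zero-length vectors.
vectors = np.array([np.arange(1, 11), np.arange(1, 11)], dtype=float)
norms = np.linalg.norm(vectors, axis=0)  # one norm per column, shape (10,)

print(vectors.shape, norms.shape)  # (2, 10) (10,)

# (2, 10) / (10,): norms is treated as (1, 10), so this broadcasts.
normalized = vectors / norms

# (10, 2) / (10,): norms is treated as (1, 10), which does not match (10, 2).
try:
    vectors.T / norms
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Adding the axis on the right by hand gives (10, 2) / (10, 1), which works.
normalized_t = vectors.T / norms.reshape(-1, 1)
```

Both successful divisions produce the same unit vectors, just in transposed layouts.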

Geoff · Answer

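Computing the magnitude

I came across this question and became curious about your method of normalizing. I use a different method to compute the magnitudes. Note: I also typically compute norms across the last index (rows in this case, not columns).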

magnitudes = np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]

Typically, however, I just normalize directly, like so:

vectors /= np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]
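
A sketch of why the [..., np.newaxis] indexing is convenient: it works for any number of leading dimensions. The array shape below and the use of keepdims=True as an alternative spelling are illustrative assumptions, not part of the answer.

```python
import numpy as np

rng = np.random.default_rng(0)
# A hypothetical batch: 4 groups of 5 three-component vectors.
# Adding 1.0 keeps every component away from zero.
vectors = rng.random((4, 5, 3)) + 1.0

# Norm over the last axis; the trailing new axis restores a length-1
# dimension so the result broadcasts against vectors.
magnitudes = np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]  # shape (4, 5, 1)

# keepdims=True yields the same shape without the extra indexing step.
magnitudes_alt = np.sqrt((vectors ** 2).sum(-1, keepdims=True))

# In-place division needs a floating-point array; an integer array would
# raise a casting error here.
vectors /= magnitudes
```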

Time comparison

I ran a test to compare the times and found that my method is faster by quite a bit, but Freddie Witherden's suggestion is faster still.

import numpy as np

vectors = np.random.rand(100, 25)

# OP's
%timeit np.apply_along_axis(np.linalg.norm, 1, vectors)
# Output: 100 loops, best of 3: 2.39 ms per loop

# Mine
%timeit np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]
# Output: 10000 loops, best of 3: 13.8 us per loop

# Freddie's (from comment below)
%timeit np.sqrt(np.einsum('...i,...i', vectors, vectors))
# Output: 10000 loops, best of 3: 6.45 us per loop

Beware, though: as noted in this StackOverflow answer, einsum skips some safety checks, so make sure the dtype of vectors is large enough to store the squared magnitudes accurately.
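
The dtype caveat can be illustrated with a small integer array. This is a sketch with made-up values; the exact number produced by the integer path is not important, only that converting to floating point first avoids the problem.

```python
import numpy as np

# Hypothetical small-integer data: each squared component of the first row
# (10000) is already too large for int8, whose maximum is 127.
vectors = np.array([[100, 100], [3, 4]], dtype=np.int8)

# The einsum result has the same integer dtype as its inputs, so the sum of
# squares for the first row cannot be represented and overflows.
squared_int = np.einsum('...i,...i', vectors, vectors)

# Converting to float64 first leaves plenty of headroom.
as_float = vectors.astype(np.float64)
magnitudes = np.sqrt(np.einsum('...i,...i', as_float, as_float))

print(squared_int.dtype, squared_int)
print(magnitudes)
```

The second row (3, 4) is small enough to survive either way, which is why this kind of problem can go unnoticed on small test inputs.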

Eric O Lebigot · Answer

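Alright: NumPy's array shape broadcasting adds dimensions to the left of the array shape, not to its right. NumPy can, however, be told to add a dimension to the right of the norms array:

print(vectors.T / norms[:, newaxis])

This works!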

SenhorSchaefers · Answer

scikit-learn already has a function for this:

import sklearn.preprocessing as preprocessing
norm = preprocessing.normalize(m, norm='l2')

More info:

http://scikit-learn.org/stable/modules/preprocessing.html

Fnord · Answer

My preferred way to normalize vectors is to use numpy's inner1d to compute their magnitudes. Here is what has been suggested so far, compared against inner1d:

import numpy as np
from numpy.core.umath_tests import inner1d

COUNT = 10**6  # 1 million points

points = np.random.random_sample((COUNT,3,))

A = np.sqrt(np.einsum('...i,...i', points, points))
B = np.apply_along_axis(np.linalg.norm, 1, points)
C = np.sqrt((points ** 2).sum(-1))
D = np.sqrt((points*points).sum(axis=1))
E = np.sqrt(inner1d(points,points))

print([np.allclose(E,x) for x in [A,B,C,D]])  # [True, True, True, True]

Testing performance with cProfile:

import cProfile

# Note: the trailing **0.5 applies a second square root on top of np.sqrt,
# so this timing includes one extra operation.
cProfile.run("np.sqrt(np.einsum('...i,...i', points, points))**0.5")  # 3 function calls in 0.013 seconds
cProfile.run('np.apply_along_axis(np.linalg.norm, 1, points)')  # 9000018 function calls in 10.977 seconds
cProfile.run('np.sqrt((points ** 2).sum(-1))')  # 5 function calls in 0.028 seconds
cProfile.run('np.sqrt((points*points).sum(axis=1))')  # 5 function calls in 0.027 seconds
cProfile.run('np.sqrt(inner1d(points,points))')  # 2 function calls in 0.009 seconds

inner1d computed the magnitudes a hair faster than einsum. So, using inner1d to normalize:

n = points/np.sqrt(inner1d(points,points))[:,None]
cProfile.run('points/np.sqrt(inner1d(points,points))[:,None]')  # 2 function calls in 0.026 seconds

Testing against scikit:

import sklearn.preprocessing as preprocessing

n_ = preprocessing.normalize(points, norm='l2')
cProfile.run("preprocessing.normalize(points, norm='l2')")  # 47 function calls in 0.047 seconds
np.allclose(n,n_)  # True

Conclusion: using inner1d seems to be the best way to go.
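
One caveat for readers on newer NumPy versions: numpy.core.umath_tests is an internal module, and importing it may emit a deprecation warning or fail outright depending on the release. The sketch below, which is not part of the answer, gets the same row-wise dot product from np.einsum with an explicit subscript string. normalize_rows is a hypothetical helper name, and no timing claim is made for it.

```python
import numpy as np


def normalize_rows(points):
    """Return points with each row scaled to unit Euclidean length.

    Hypothetical helper: assumes a 2-D float array with no all-zero rows.
    """
    # 'ij,ij->i' multiplies elementwise and sums over j, i.e. the dot
    # product of each row with itself (what inner1d(points, points) computes).
    squared = np.einsum('ij,ij->i', points, points)
    return points / np.sqrt(squared)[:, None]


rng = np.random.default_rng(0)
points = rng.random((1000, 3)) + 0.1  # offset keeps rows away from zero
unit = normalize_rows(points)

# Cross-check against np.linalg.norm with keepdims=True.
reference = points / np.linalg.norm(points, axis=1, keepdims=True)
```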

Viliam Vadocz · Answer

For the two-dimensional case, using np.hypot(vectors[:,0],vectors[:,1]) to compute the magnitudes looks to be faster than Freddie Witherden's np.sqrt(np.einsum('...i,...i', vectors, vectors)). (Referencing the answer by Geoff.)

import numpy as np

# Generate array of 2D vectors.
vectors = np.random.random((1000,2))

# Using Freddie's
%timeit np.sqrt(np.einsum('...i,...i', vectors, vectors))
# Output: 11.1 µs ± 173 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# Using numpy.hypot()
%timeit np.hypot(vectors[:,0], vectors[:,1])
# Output: 6.81 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

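To get the normalized vectors, divide by the magnitudes with a trailing axis added so that the (1000,) result broadcasts against the (1000, 2) array:

vectors /= np.hypot(vectors[:,0], vectors[:,1])[:, np.newaxis]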
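
A quick check of the np.hypot approach on made-up data. The hypotenuse array has shape (N,), so it needs a trailing axis before it can divide an (N, 2) array; the sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.random((1000, 2)) + 0.1  # offset keeps vectors away from zero

# hypot(x, y) computes sqrt(x**2 + y**2) elementwise, giving shape (1000,).
magnitudes = np.hypot(vectors[:, 0], vectors[:, 1])

# Add a trailing axis so (1000, 2) / (1000, 1) broadcasts row by row.
vectors /= magnitudes[:, np.newaxis]

# Every row should now have unit length.
lengths = np.hypot(vectors[:, 0], vectors[:, 1])
```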