NumPy: shuffle a multidimensional array by row only, keeping the column order unchanged

Question

How can I shuffle a multidimensional array by row only in Python (that is, without shuffling the columns)?

My matrix is very large, so I am looking for the most efficient solution. Can this also be done very efficiently on the original array itself?

Example:

    import numpy as np
    X = np.random.random((6, 2))
    print(X)
    Y = ???shuffle by row only, not columns???
    print(Y)

Suppose the original matrix is:

    [[ 0.48252164  0.12013048]
     [ 0.77254355  0.74382174]
     [ 0.45174186  0.8782033 ]
     [ 0.75623083  0.71763107]
     [ 0.26809253  0.75144034]
     [ 0.23442518  0.39031414]]

The output should have the rows shuffled, not the columns, for example:

    [[ 0.45174186  0.8782033 ]
     [ 0.48252164  0.12013048]
     [ 0.77254355  0.74382174]
     [ 0.75623083  0.71763107]
     [ 0.23442518  0.39031414]
     [ 0.26809253  0.75144034]]

Kasrâmvd · Accepted Answer

That is what numpy.random.shuffle() is for:

    >>> import numpy as np
    >>> X = np.random.random((6, 2))
    >>> X
    array([[ 0.9818058 ,  0.67513579],
           [ 0.82312674,  0.82768118],
           [ 0.29468324,  0.59305925],
           [ 0.25731731,  0.16676408],
           [ 0.27402974,  0.55215778],
           [ 0.44323485,  0.78779887]])
    >>> np.random.shuffle(X)
    >>> X
    array([[ 0.9818058 ,  0.67513579],
           [ 0.44323485,  0.78779887],
           [ 0.82312674,  0.82768118],
           [ 0.29468324,  0.59305925],
           [ 0.25731731,  0.16676408],
           [ 0.27402974,  0.55215778]])
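As a minimal sketch of the behaviour shown above: np.random.shuffle reorders the array in place along its first axis only, so every row keeps its values and its column order. The small integer array and the seeds are assumptions chosen for illustration. The Generator-based calls (np.random.default_rng, rng.shuffle, rng.permutation) belong to the newer NumPy random API and are not part of the original answer.

```python
import numpy as np

# Illustrative data (an assumption, not from the answer): 6 rows, 2 columns.
X = np.arange(12).reshape(6, 2)
original = X.copy()

# Legacy API used in the answer: shuffles the rows in place and returns None.
np.random.seed(0)  # seed only to make the run repeatable
result = np.random.shuffle(X)

# Every original row is still present, with its columns untouched.
rows_before = sorted(map(tuple, original))
rows_after = sorted(map(tuple, X))

# Newer Generator API (NumPy >= 1.17): the same in-place row shuffle.
rng = np.random.default_rng(0)
Z = original.copy()
rng.shuffle(Z)

# rng.permutation returns a shuffled copy and leaves its input unchanged.
shuffled_copy = rng.permutation(original)

print(X)
```

If the original order must be kept as well, the copy returned by rng.permutation is the simpler choice. For a very large matrix the in-place shuffle avoids allocating a second full-size array.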

Janmejaya Nanda · Answer

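After a bit of experimenting, I found that the most memory- and time-efficient way to shuffle the data of an nd-array (row-wise) is to shuffle the index and then get the data from the shuffled index: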

    Rand_num2 = np.random.randint(5, size=(6000, 2000))
    perm = np.arange(Rand_num2.shape[0])
    np.random.shuffle(perm)
    Rand_num2 = Rand_num2[perm]
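The index-shuffling idea can be sketched on a small array as follows. Note that indexing with an integer array builds a new array rather than reordering the original in place. The array contents, the seed, and the labels array are assumptions added for illustration; rng.permutation(n) is used here as a shorthand for shuffling np.arange(n).

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for repeatability

data = np.arange(20).reshape(5, 4)      # illustrative 5x4 array
perm = rng.permutation(data.shape[0])   # shuffled row indices 0..4

shuffled = data[perm]  # integer-array indexing builds a new array

# The same permutation can reorder a second array consistently
# (hypothetical per-row labels, purely for illustration).
labels = np.array(["a", "b", "c", "d", "e"])
shuffled_labels = labels[perm]

shares_memory = np.shares_memory(data, shuffled)  # False: a copy was made
print(shuffled)
print(shuffled_labels)
```

Keeping the permutation around is the main practical advantage of this approach: it can be reused to shuffle related arrays in the same order, or inverted later to restore the original order.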

Details
Here I use memory_profiler to measure memory usage and Python's built-in "time" module to record the run time, comparing all the previous answers:

    import time
    import numpy as np

    def main():
        # shuffle data itself
        Rand_num = np.random.randint(5, size=(6000, 2000))
        start = time.time()
        np.random.shuffle(Rand_num)
        print('Time for direct shuffle: {0}'.format((time.time() - start)))

        # Shuffle index and get data from shuffled index
        Rand_num2 = np.random.randint(5, size=(6000, 2000))
        start = time.time()
        perm = np.arange(Rand_num2.shape[0])
        np.random.shuffle(perm)
        Rand_num2 = Rand_num2[perm]
        print('Time for shuffling index: {0}'.format((time.time() - start)))

        # using np.take() with a random row order obtained from argsort
        Rand_num3 = np.random.randint(5, size=(6000, 2000))
        start = time.time()
        np.take(Rand_num3, np.random.rand(Rand_num3.shape[0]).argsort(), axis=0, out=Rand_num3)
        print("Time taken by np.take, {0}".format((time.time() - start)))

    if __name__ == '__main__':
        main()

Timing results

    Time for direct shuffle: 0.03345608711242676     # 33.4msec
    Time for shuffling index: 0.019818782806396484   # 19.8msec
    Time taken by np.take, 0.06726956367492676       # 67.2msec

Memory profiler results

    Line #    Mem usage    Increment   Line Contents
    ================================================
        39  117.422 MiB    0.000 MiB   @profile
        40                             def main():
        41                                 # shuffle data itself
        42  208.977 MiB   91.555 MiB       Rand_num = np.random.randint(5, size=(6000, 2000))
        43  208.977 MiB    0.000 MiB       start = time.time()
        44  208.977 MiB    0.000 MiB       np.random.shuffle(Rand_num)
        45  208.977 MiB    0.000 MiB       print('Time for direct shuffle: {0}'.format((time.time() - start)))
        46
        47                                 # Shuffle index and get data from shuffled index
        48  300.531 MiB   91.555 MiB       Rand_num2 = np.random.randint(5, size=(6000, 2000))
        49  300.531 MiB    0.000 MiB       start = time.time()
        50  300.535 MiB    0.004 MiB       perm = np.arange(Rand_num2.shape[0])
        51  300.539 MiB    0.004 MiB       np.random.shuffle(perm)
        52  300.539 MiB    0.000 MiB       Rand_num2 = Rand_num2[perm]
        53  300.539 MiB    0.000 MiB       print('Time for shuffling index: {0}'.format((time.time() - start)))
        54
        55                                 # using np.take()
        56  392.094 MiB   91.555 MiB       Rand_num3 = np.random.randint(5, size=(6000, 2000))
        57  392.094 MiB    0.000 MiB       start = time.time()
        58  392.242 MiB    0.148 MiB       np.take(Rand_num3, np.random.rand(Rand_num3.shape[0]).argsort(), axis=0, out=Rand_num3)
        59  392.242 MiB    0.000 MiB       print("Time taken by np.take, {0}".format((time.time() - start)))

Ben-Hur Cardoso · Answer

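You can use the np.vectorize() function to shuffle a two-dimensional array A row by row, that is, the elements within each row are permuted independently: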

    shuffle = np.vectorize(np.random.permutation, signature='(n)->(n)')
    A_shuffled = shuffle(A)
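To make the effect of this approach concrete, here is a small sketch under assumed inputs (the 3x4 array and the seed are illustrative only). With the signature '(n)->(n)', np.vectorize applies np.random.permutation to each 1-D slice along the last axis, so values move within a row but the rows themselves stay in their original order. This differs from the row reordering asked for in the question.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed for repeatability

A = np.arange(12).reshape(3, 4)  # illustrative 3x4 array, each row ascending

# The signature '(n)->(n)' makes np.vectorize call the function once per
# 1-D slice along the last axis, i.e. once per row of a 2-D array.
shuffle = np.vectorize(np.random.permutation, signature='(n)->(n)')
A_shuffled = shuffle(A)

# Sorting each row recovers the original rows in their original order,
# so values moved within rows but never between rows.
rows_preserved = np.array_equal(np.sort(A_shuffled, axis=1), A)
print(A_shuffled)
print(rows_preserved)
```

Because np.vectorize loops in Python over the rows, it is mainly a convenience and is unlikely to be the fastest option for very large arrays.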

TassosK · Answer

I have a question about this (or maybe it is an answer). Say we have a numpy array X with shape=(1000,60,11,1). Also suppose that X is an array of images of size 60x11 with a single channel (60x11x1).

If I want to shuffle the order of all these images, I would shuffle the indexes of X:

    import numpy as np

    def shuffling(X):
        indx = np.arange(len(X))  # create an array of indexes for X data
        np.random.shuffle(indx)
        X = X[indx]
        return X

Will that work? As far as I know, len(X) returns the size of the largest dimension.
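One way to check this is the sketch below. len(X) returns the size of the first axis (X.shape[0]), not the largest dimension; for the (1000, 60, 11, 1) array both happen to be 1000. The smaller stand-in array, the seed, and the helper name shuffle_first_axis are assumptions made for illustration. The first axis is deliberately not the largest one here, and the helper also returns the index array so the result can be verified.

```python
import numpy as np

def shuffle_first_axis(X, rng):
    """Return a copy of X with its first axis reordered, plus the indexes used.

    Hypothetical helper for illustration; mirrors the shuffling() idea above.
    """
    indx = rng.permutation(len(X))  # len(X) is X.shape[0]
    return X[indx], indx

rng = np.random.default_rng(0)  # arbitrary seed for repeatability

# Stand-in for the image stack: 10 images of 60x11x1, image i filled with i.
X = np.arange(10).reshape(10, 1, 1, 1) * np.ones((10, 60, 11, 1))
X_shuffled, indx = shuffle_first_axis(X, rng)

print(len(X), X.shape[0])  # both are the size of the first axis
print(X_shuffled.shape)    # the shape is unchanged
```

Each image is moved as a whole block, so the pixels inside an image are never mixed. Only the order of the images along the first axis changes.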