How do I create a list of random numbers without duplicates?

Question

I tried using random.randint(0, 100), but some of the numbers were the same. Is there a method or module for creating a list of unique random numbers?

def getScores():
    # open files to read and write
    f1 = open("page.txt", "r")
    p1 = open("pgRes.txt", "a")
    gScores = []
    bScores = []
    yScores = []
    # run 50 tests of 40 random queries to implement "bootstrapping" method
    for i in range(50):
        # get 40 random queries from the 50
        lines = random.sample(f1.readlines(), 40)

Greg Hewgill · Accepted Answer

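This returns a list of 10 numbers selected from the range 0 to 99, without duplicates.

import random
random.sample(range(100), 10)

With reference to your specific code example, you probably want to read all the lines from the file once and then select random lines from the list held in memory. For example:

all_lines = f1.readlines()
for i in range(50):
    lines = random.sample(all_lines, 40)

This way, you only need to read from the file once, before the loop. That is much more efficient than seeking back to the start of the file and calling f1.readlines() again on every iteration.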
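A minimal, self-contained sketch of the read-once pattern described above. It uses io.StringIO as a stand-in for page.txt (an assumption, so the example runs without a file on disk), and the helper name sample_batches is hypothetical.

```python
import io
import random


def sample_batches(file_obj, batches, batch_size):
    """Read all lines once, then draw `batches` independent samples."""
    all_lines = file_obj.readlines()  # single read, before the loop
    # Each call to random.sample picks batch_size distinct positions.
    return [random.sample(all_lines, batch_size) for _ in range(batches)]


# Stand-in for open("page.txt", "r"): 50 distinct lines held in memory.
fake_file = io.StringIO("".join(f"query {i}\n" for i in range(50)))
result = sample_batches(fake_file, batches=50, batch_size=40)
print(len(result), len(result[0]))
```

Each batch is drawn without replacement, but separate batches are independent of each other, so the same line can appear in more than one batch.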

Ricardo Murillo · Answer

You can use the shuffle function from the random module like this:

import random
my_list = list(range(1, 100))  # list of integers from 1 to 99
                               # adjust these boundaries to fit your needs
random.shuffle(my_list)
print(my_list)  # <- List of unique random numbers

Note that shuffle does not return a list, as one might expect. It only shuffles, in place, the list that is passed to it.
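To make the in-place behavior concrete, here is a short sketch. The variable names are illustrative only; random.sample with k equal to the list length is shown as one way to get a shuffled copy without touching the original.

```python
import random

my_list = list(range(1, 100))

# random.shuffle works in place and returns None.
returned = random.shuffle(my_list)

# random.sample with k == len(population) returns a new shuffled list
# and leaves the original untouched.
original = list(range(1, 100))
shuffled_copy = random.sample(original, len(original))
```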

ben · Answer

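You can first create a list of the numbers from a to b, where a and b are the smallest and largest numbers in your list, respectively, and then shuffle it with the Fisher-Yates algorithm or with Python's random.shuffle method.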
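As a rough illustration of the Fisher-Yates idea, the sketch below shuffles the integers a..b by hand. The function name fisher_yates_range is made up for this example; in practice random.shuffle does the same job.

```python
import random


def fisher_yates_range(a, b):
    """Return the integers a..b (inclusive) in a uniformly random order."""
    numbers = list(range(a, b + 1))
    # Walk from the last position down, swapping each slot with a
    # randomly chosen slot at or before it.
    for i in range(len(numbers) - 1, 0, -1):
        j = random.randint(0, i)  # inclusive on both ends
        numbers[i], numbers[j] = numbers[j], numbers[i]
    return numbers


shuffled = fisher_yates_range(1, 10)
```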

inspectorG4dget · Answer

The solution presented in this answer works, but it can run into memory problems when the sample size is small and the population is huge (e.g. random.sample(insanelyLargeNumber, 10)).

To fix that, I would go with this:

import random

answer = set()
sampleSize = 10
answerSize = 0
while answerSize < sampleSize:
    r = random.randint(0, 100)
    if r not in answer:
        answerSize += 1
        answer.add(r)
# answer now contains 10 unique, random integers from 0 to 100
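Regarding the memory concern: in Python 3, range is a lazy sequence, so passing it to random.sample does not require building a list of the whole population first. To my understanding of CPython, a sample that is much smaller than the population is drawn by indexing into the range. The population size below is an arbitrary choice for illustration.

```python
import random

# range(10**12) stores only start, stop, and step, not a trillion integers.
population = range(10**12)
picks = random.sample(population, 10)
```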

wowserx · Answer

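So, I realize this post is 6 years old, but there is another answer with (usually) better algorithmic performance, although its larger overhead makes it less practical.

The other answers include the shuffle method and the "try until valid" method using sets.

If we are randomly choosing K integers without replacement from the interval 0...N-1, the shuffle method uses O(N) storage and O(N) operations, which is annoying when choosing a small K from a large N. The set method uses only O(K) storage, but for K close to N its worst case is unbounded (O(inf)) and its expected cost is O(n*log(n)). (Imagine k = n-1 = 10^6: having already selected 999998 values, you are trying to randomly hit the last number out of two allowed answers.)

So the set method is fine for K~1, and the shuffle method is fine for K~N. Both use an expected >K RNG calls.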
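The retry cost of the set method can be observed directly by counting RNG calls. This is only a sketch: set_sample_counting is a hypothetical helper, and the sizes 1000, 10, and 999 are arbitrary.

```python
import random


def set_sample_counting(n, k):
    """Pick k distinct integers from 0..n-1 by retrying on collisions.

    Returns the chosen set and the number of RNG calls it took.
    """
    chosen = set()
    calls = 0
    while len(chosen) < k:
        calls += 1
        chosen.add(random.randrange(n))  # duplicates are silently ignored
    return chosen, calls


sparse, sparse_calls = set_sample_counting(1000, 10)   # K much smaller than N
dense, dense_calls = set_sample_counting(1000, 999)    # K close to N
print(sparse_calls, dense_calls)
```

The exact counts vary from run to run, but the dense case typically needs several times more calls than K, while the sparse case needs roughly K.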

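Another way: you can pretend to do the Fisher-Yates shuffle and, for every new random selection, perform a binary search on your already-selected elements to find the value that you would get if you were actually storing an array of all the elements you have not chosen yet.

If your already-selected values are [2,4], and your random number generator spits out 2 in the interval (N - num_already_selected), then you pretend to select from [0,1,3,5,6,...] by counting the already-selected values that are less than the answer. In this case, your third selected value would be 3. Then, in the next step, if your random number were 2 again, it would map to 5 (in the pretend list [0,1,5,6]), because (the potential index of 5 in the sorted list of already-selected values [2,3,4], which is 3) + 2 = 5.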
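The mapping in that worked example can be checked with a few lines of code. This sketch uses a linear scan purely for readability (the answer proposes a binary search), and the name map_to_unselected is hypothetical.

```python
def map_to_unselected(selected_sorted, r):
    """Map index r in the 'pretend' list of unselected values to its value.

    selected_sorted must be in ascending order.
    """
    value = r
    for s in selected_sorted:
        if s <= value:
            value += 1  # one more selected value sits at or below us
        else:
            break
    return value


# The two steps from the example: [2,4] with r=2, then [2,3,4] with r=2.
third = map_to_unselected([2, 4], 2)
fourth = map_to_unselected([2, 3, 4], 2)
```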

So store the already-selected values in a balanced binary search tree, store the rank (the number of values less than that value) at each node, and select a random number R from the range (0 ... n - (number already selected)). Then descend the tree as if searching, but with a search value of R plus the rank of whatever node you are at. When you reach a leaf node, add the random number to that node's rank and insert the sum into the balanced binary tree.

Once you have K elements, read them off the tree into an array and shuffle it (if order is important).

This takes O(K) storage, O(K*log(K)) performance, and exactly K randint calls.

An example implementation of random sampling (the final ordering is not random, but you can shuffle afterwards in O(K)), with O(k) storage and O(k*log^2(k)) performance (not O(k*log(k)), because this implementation cannot custom-descend the balanced binary tree):

import random
from sortedcontainers import SortedList

def sample(n, k):
    '''
    Return random k-length-subset of integers from 0 to n-1. Uses only O(k)
    storage. Bounded k*log^2(k) worst case. K RNG calls.
    '''
    ret = SortedList()
    for i in range(k):
        to_insert = random.randint(0, n-1 - len(ret))
        to_insert = binsearch_adding_rank(ret, to_insert)
        ret.add(to_insert)
    return ret

def binsearch_adding_rank(A, v):
    l, u = 0, len(A)-1
    m = 0
    while l <= u:
        m = l+(u-l)//2
        if v + m >= A[m]:
            l = m+1
            # We're binary searching for partitions, so if the last step was
            # to the right then add one to account for offset because that's
            # where our insert would be.
            m += 1
        elif v+m < A[m]:
            u = m-1
    return v+m

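And to show validity:

If we were doing the Fisher-Yates shuffle, having already chosen [1,4,6,7,8,9,15,16], with random number 5, our yet-to-be-chosen array would look like [0,2,3,5,10,11,12,...], so element 5 is 11. Thus our binsearch function should return 11, given 5 and [1,4,6,7,8,9,15,16]: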

assert binsearch_adding_rank([1,4,6,7,8,9,15,16], 5) == 11

The inverse of [1,2,3] is [0,4,5,6,7,8,...], the 5th element of which is 8, so:

assert binsearch_adding_rank([1,2,3], 5) == 8

The inverse of [2,3,5] is [0,1,4,6,...], the 1st element of which is (still) 1, so:

assert binsearch_adding_rank([2,3,5], 1) == 1

The inverse is [0,6,7,8,...], the 3rd element is 8, and:

assert binsearch_adding_rank([1,2,3,4,5,10], 3) == 8

And to test the overall function:

# Edge cases:
assert sample(50, 0) == []
assert sample(50, 50) == list(range(0,50))
# Variance should be small and equal among possible values:
x = [0]*10
for i in range(10_000):
    for v in sample(10, 5):
        x[v] += 1
for v in x:
    assert abs(5_000 - v) < 250, v
del x
# Check for duplication:
y = sample(1500, 1000)
assert len(frozenset(y)) == len(y)
del y

In practice, though, use the shuffle method for K ~> N/2 and the set method for K ~< N/2.
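That rule of thumb could be wired up roughly as follows. This is a sketch, not a tuned implementation: hybrid_sample is a hypothetical name, and the N/2 cutoff is taken directly from the sentence above rather than from any benchmark.

```python
import random


def hybrid_sample(n, k):
    """Return k distinct integers from 0..n-1, choosing a strategy by k/n."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    if k > n // 2:
        # Dense case: shuffle the whole population and keep a prefix.
        pool = list(range(n))
        random.shuffle(pool)
        return pool[:k]
    # Sparse case: retry on collisions; few retries are expected here.
    chosen = set()
    result = []
    while len(result) < k:
        r = random.randrange(n)
        if r not in chosen:
            chosen.add(r)
            result.append(r)
    return result
```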

Edit: here is another way to do it, using recursion! O(k*log(n)), I think.

import random

def divide_and_conquer_sample(n, k, l=0):
    u = n-1
    # Base cases:
    if k == 0:
        return []
    elif k == n-l:
        return list(range(l, n))
    elif k == 1:
        return [random.randint(l, u)]
    # Compute how many left and how many right:
    m = l + (u-l)//2
    k_right = 0
    k_left = 0
    for i in range(k):
        # Base probability: (# of available values in right interval) / (total available values)
        if random.random() <= (n-m - k_right)/(n-l-k_right-k_left):
            k_right += 1
        else:
            k_left += 1
    # Recur
    return divide_and_conquer_sample(n, k_right, m) + divide_and_conquer_sample(m, k_left, l)

Thomas Lux · Answer

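Linear Congruential Pseudo-random Number Generator

O(1) Memory

O(k) Operations

This problem can be solved with a simple Linear Congruential Generator. It requires constant memory overhead (8 integers) and at most 2*(sequence length) computations.

All the other solutions use more memory and more compute! If you only need a few random sequences, this method will be significantly cheaper. For a range of size N, if you want to generate on the order of N unique k-sequences or more, I recommend the accepted solution using the built-in random.sample(range(N), k), as this has been optimized in Python for speed.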
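To see why a linear congruential generator can visit every value exactly once, here is a tiny sketch with hand-picked parameters. The helper lcg_cycle and the specific values 16, 5, 3, and 7 are illustrative assumptions; they were chosen to satisfy the three full-period conditions that the answer's code comments list.

```python
def lcg_cycle(modulus, multiplier, offset, seed):
    """Return one full pass of value -> (value*multiplier + offset) % modulus."""
    values = []
    value = seed
    for _ in range(modulus):
        values.append(value)
        value = (value * multiplier + offset) % modulus
    return values


# Hand-picked illustrative parameters: modulus 16 (a power of two),
# multiplier 5 (5 - 1 is divisible by 4), offset 3 (odd, so coprime to 16).
cycle = lcg_cycle(modulus=16, multiplier=5, offset=3, seed=7)
```

With these parameters the 16 generated values are a permutation of 0..15, which is the property the answer relies on to avoid storing previously seen numbers.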

Code

# Return a randomized "range" using a Linear Congruential Generator
# to produce the number sequence. Parameters are the same as for
# python builtin "range".
#   Memory  -- storage for 8 integers, regardless of parameters.
#   Compute -- at most 2*"maximum" steps required to generate sequence.
#
def random_range(start, stop=None, step=None):
    import random, math
    # Set default values the same way "range" does.
    if stop is None: start, stop = 0, start
    if step is None: step = 1
    # Use a mapping to convert a standard range into the desired range.
    mapping = lambda i: (i*step) + start
    # Compute the number of numbers in this range.
    maximum = (stop - start) // step
    # An empty range yields nothing (math.log2 below needs maximum > 0).
    if maximum <= 0: return
    # Seed range with a random integer.
    value = random.randint(0, maximum)
    #
    # Construct an offset, multiplier, and modulus for a linear
    # congruential generator. These generators are cyclic and
    # non-repeating when they maintain the properties:
    #
    #   1) "modulus" and "offset" are relatively prime.
    #   2) ["multiplier" - 1] is divisible by all prime factors of "modulus".
    #   3) ["multiplier" - 1] is divisible by 4 if "modulus" is divisible by 4.
    #
    offset = random.randint(0, maximum) * 2 + 1      # Pick a random odd-valued offset.
    multiplier = 4*(maximum//4) + 1                  # Pick a multiplier 1 greater than a multiple of 4.
    modulus = int(2**math.ceil(math.log2(maximum)))  # Pick a modulus just big enough to generate all numbers (power of 2).
    # Track how many random numbers have been returned.
    found = 0
    while found < maximum:
        # If this is a valid value, yield it in generator fashion.
        if value < maximum:
            found += 1
            yield mapping(value)
        # Calculate the next value in the sequence.
        value = (value*multiplier + offset) % modulus

Usage

The function "random_range" is used in the same way as any other generator (like "range"). An example:

# Show off random range.
print()
for v in range(3,6):
    v = 2**v
    l = list(random_range(v))
    print("Need",v,"found",len(set(l)),"(min,max)",(min(l),max(l)))
    print("",l)
    print()

Sample Results

Required 8 cycles to generate a sequence of 8 values.
Need 8 found 8 (min,max) (0, 7)
 [1, 0, 7, 6, 5, 4, 3, 2]

Required 16 cycles to generate a sequence of 9 values.
Need 9 found 9 (min,max) (0, 8)
 [3, 5, 8, 7, 2, 6, 0, 1, 4]

Required 16 cycles to generate a sequence of 16 values.
Need 16 found 16 (min,max) (0, 15)
 [5, 14, 11, 8, 3, 2, 13, 1, 0, 6, 9, 4, 7, 12, 10, 15]

Required 32 cycles to generate a sequence of 17 values.
Need 17 found 17 (min,max) (0, 16)
 [12, 6, 16, 15, 10, 3, 14, 5, 11, 13, 0, 1, 4, 8, 7, 2, ...]

Required 32 cycles to generate a sequence of 32 values.
Need 32 found 32 (min,max) (0, 31)
 [19, 15, 1, 6, 10, 7, 0, 28, 23, 24, 31, 17, 22, 20, 9, ...]

Required 64 cycles to generate a sequence of 33 values.
Need 33 found 33 (min,max) (0, 32)
 [11, 13, 0, 8, 2, 9, 27, 6, 29, 16, 15, 10, 3, 14, 5, 24, ...]

Handcraftsman · Answer

If you need to sample extremely large numbers, you cannot use range

random.sample(range(10000000000000000000000000000000), 10)

because it throws:

OverflowError: Python int too large to convert to C ssize_t

Also, if random.sample cannot produce the number of items you want because the range is too small

 random.sample(range(2), 1000)

it throws:

 ValueError: Sample larger than population
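Both failure modes can be reproduced in one short script. This sketch assumes CPython 3, where the length of a range is limited to the platform's ssize_t; the exact error messages can differ between Python versions, so only the exception types are recorded.

```python
import random

errors = []
for population, k in [(range(10**31), 10), (range(2), 1000)]:
    try:
        random.sample(population, k)
    except (OverflowError, ValueError) as exc:
        # Keep only the exception type; message wording varies by version.
        errors.append(type(exc).__name__)
print(errors)
```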

This function resolves both problems:

import random

def random_sample(count, start, stop, step=1):
    def gen_random():
        while True:
            yield random.randrange(start, stop, step)

    def gen_n_unique(source, n):
        seen = set()
        seenadd = seen.add
        for i in (i for i in source() if i not in seen and not seenadd(i)):
            yield i
            if len(seen) == n:
                break

    return [i for i in gen_n_unique(gen_random,
                                    min(count, int(abs(stop - start) / abs(step))))]

Usage with extremely large numbers:

print('\n'.join(map(str, random_sample(10, 2, 10000000000000000000000000000000))))

Sample result:

7822019936001013053229712669368
6289033704329783896566642145909
2473484300603494430244265004275
5842266362922067540967510912174
6775107889200427514968714189847
9674137095837778645652621150351
9969632214348349234653730196586
1397846105816635294077965449171
3911263633583030536971422042360
9864578596169364050929858013943

Usage where the range is smaller than the number of requested items:

print(', '.join(map(str, random_sample(100000, 0, 3))))

Sample result:

2, 0, 1

It also works with negative ranges and steps:

print(', '.join(map(str, random_sample(10, 10, -10, -2))))
print(', '.join(map(str, random_sample(10, 5, -5, -2))))

Sample results:

2, -8, 6, -2, -4, 0, 4, 10, -6, 8
-3, 1, 5, -1, 3

Mitch Wheat · Answer

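If the list of N numbers from 1 to N is randomly generated, then yes, some numbers may be repeated.

If you want a list of the numbers from 1 to N in a random order, fill an array with the integers from 1 to N and then use a Fisher-Yates shuffle or Python's random.shuffle().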

dataLeo · Answer

You can use the Numpy library for a quick answer, as shown below.

The given code snippet lists 6 unique numbers in the range 0 to 5. You can adjust the parameters to suit your needs.

import numpy as np
import random
a = np.linspace( 0, 5, 6 )
random.shuffle(a)
print(a)

Output

[ 2. 1. 5. 3. 4. 0.]

It does not impose any constraints of the kind we see in random.sample, as referred to here.

Hope this helps a bit.

Vinicius Torino · Answer

A very simple function that also solves your problem

from random import randint

data = []

def unique_Rand(inicial, limit, total):
    data = []
    i = 0
    while i < total:
        number = randint(inicial, limit)
        if number not in data:
            data.append(number)
            i += 1
    return data

data = unique_Rand(1, 60, 6)
print(data)

"""
prints something like
[34, 45, 2, 36, 25, 32]
"""
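One possible refinement of the same idea, sketched under two assumptions that are not part of the original answer: a set is kept alongside the list so that the membership test does not scan the whole list, and a guard rejects requests that the range cannot satisfy (without it the loop would never finish). The name unique_rand_checked is hypothetical.

```python
from random import randint


def unique_rand_checked(first, last, total):
    """Return `total` distinct integers from first..last (inclusive)."""
    if total > last - first + 1:
        # Without this guard the retry loop could never finish.
        raise ValueError("range too small for the requested count")
    seen = set()   # O(1) average membership test
    data = []      # preserves the order in which numbers were drawn
    while len(data) < total:
        number = randint(first, last)
        if number not in seen:
            seen.add(number)
            data.append(number)
    return data


numbers = unique_rand_checked(1, 60, 6)
```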

orange · Answer

The problem with the set-based approaches ("if the random value is already in the return values, try again") is that their runtime is undetermined because of collisions (each of which requires another "try again" iteration), especially when a large number of random values is requested from the range.

An alternative that is not prone to this non-deterministic runtime is the following:

import bisect
import random

def fast_sample(low, high, num):
    """ Samples :param num: integer numbers in range of
        [:param low:, :param high:) without replacement
        by maintaining a list of ranges of values that
        are permitted.

        This list of ranges is used to map a random number
        of a contiguous a range (`r_n`) to a permissible
        number `r` (from `ranges`).
    """
    ranges = [high]
    high_ = high - 1
    while len(ranges) - 1 < num:
        # generate a random number from an ever decreasing
        # contiguous range (which we'll map to the true
        # random number).
        # consider an example with low=0, high=10,
        # part way through this loop with:
        #
        # ranges = [0, 2, 3, 7, 9, 10]
        #
        # r_n :-> r
        #   0 :-> 1
        #   1 :-> 4
        #   2 :-> 5
        #   3 :-> 6
        #   4 :-> 8
        r_n = random.randint(low, high_)
        range_index = bisect.bisect_left(ranges, r_n)
        r = r_n + range_index
        for i in range(range_index, len(ranges)):
            if ranges[i] <= r:
                # as many "gaps" we iterate over, as much
                # is the true random value (`r`) shifted.
                r = r_n + i + 1
            elif ranges[i] > r_n:
                break
        # mark `r` as another "gap" of the original
        # [low, high) range.
        ranges.insert(i, r)
        # Fewer values possible.
        high_ -= 1
    # `ranges` happens to contain the result.
    return ranges[:-1]

Recaiden · Answer

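If you want to ensure that the numbers being added are unique, you can use a Set object if you are on Python 2.7 or later, or import the sets module otherwise.

As others have mentioned, this means the numbers are not truly random.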

aak318 · Answer

The answer provided here works very well with respect to time and memory, but it is a bit more complicated because it uses advanced Python constructs such as yield. The simpler answer works well in practice, but the issue with it is that it may generate many spurious integers before actually constructing the required set. Try it with populationSize = 1000, sampleSize = 999. In theory, there is a chance that it never terminates.

The answer below addresses both issues: it is deterministic and somewhat efficient, though currently not as efficient as the other two.

def randomSample(populationSize, sampleSize):
    populationStr = str(populationSize)
    dTree, samples = {}, []
    for i in range(sampleSize):
        val, dTree = getElem(populationStr, dTree, '')
        samples.append(int(val))
    return samples, dTree

where the functions getElem and percolateUp are defined as follows:

import random

def getElem(populationStr, dTree, key):
    msd = int(populationStr[0])
    if key not in dTree:
        # list() so that pop/remove below also work on Python 3
        dTree[key] = list(range(msd + 1))
    idx = random.randint(0, len(dTree[key]) - 1)
    key = key + str(dTree[key][idx])
    if len(populationStr) == 1:
        dTree[key[:-1]].pop(idx)
        return key, (percolateUp(dTree, key[:-1]))
    newPopulation = populationStr[1:]
    if int(key[-1]) != msd:
        newPopulation = str(10**(len(newPopulation)) - 1)
    return getElem(newPopulation, dTree, key)

def percolateUp(dTree, key):
    while (dTree[key] == []):
        dTree[key[:-1]].remove( int(key[-1]) )
        key = key[:-1]
    return dTree

Finally, the timing was about 15 ms on average for a large value of n, as shown below:

In [3]: n = 10000000000000000000000000000000

In [4]: %time l,t = randomSample(n, 5)
Wall time: 15 ms

In [5]: l
Out[5]:
[10000000000000000000000000000000L,
 5731058186417515132221063394952L,
 85813091721736310254927217189L,
 6349042316505875821781301073204L,
 2356846126709988590164624736328L]