Find the most frequent number in a NumPy vector

Question

Suppose I have the following list in Python:

    a = [1,2,3,1,2,1,1,1,3,2,2,1]

Is there a clean way to find the most frequent number in this list?

JoshAdel · Accepted Answer

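If your list contains only non-negative integers, take a look at numpy.bincount:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html

and then probably use np.argmax:

    import numpy as np

    a = np.array([1,2,3,1,2,1,1,1,3,2,2,1])
    counts = np.bincount(a)      # counts[i] is the number of times i occurs
    print(np.argmax(counts))     # index of the largest count, i.e. the mode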

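For a more complicated list (perhaps containing negative numbers or non-integer values), you can use np.histogram in a similar way. Alternatively, if you just want to work in Python without using numpy, collections.Counter is a good way of handling this sort of data:

    from collections import Counter

    a = [1,2,3,1,2,1,1,1,3,2,2,1]
    b = Counter(a)
    print(b.most_common(1))      # list with one (value, count) pair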
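The np.histogram route mentioned above is not spelled out in the answer, so here is a minimal sketch of one way it could look. Assumption on my part: the bin edges are built from the distinct values with np.unique so that every distinct value gets its own bin. The sample data is made up for illustration.

```python
import numpy as np

# Hypothetical data with negative and non-integer values.
a = np.array([-1.5, 2.0, 2.0, -1.5, -1.5, 0.25])

# One bin per distinct value: each value is the left edge of its own bin,
# and one extra edge closes the last bin.
values = np.unique(a)
edges = np.append(values, values[-1] + 1)

counts, _ = np.histogram(a, bins=edges)
most_frequent = values[np.argmax(counts)]
print(most_frequent)
```

Since np.unique is already needed to build the edges, calling it with return_counts=True is usually the shorter path; the histogram version mainly pays off when you want to bin nearby floats together.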

Apogentus · Answer

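You can use:

    import numpy as np

    a = np.array([1,2,3,1,2,1,1,1,3,2,2,1])
    values, counts = np.unique(a, return_counts=True)
    ind = np.argmax(counts)
    print(values[ind])  # prints the most frequent element

If several elements are equally frequent, this code returns only the first of them, which is the smallest tied value because np.unique returns its values sorted.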
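If you need every tied value rather than just the first one, a small variation should do it. This is a sketch, not part of the answer above; the helper name all_modes is made up.

```python
import numpy as np

def all_modes(a):
    """Return every value that shares the highest count (sorted)."""
    values, counts = np.unique(a, return_counts=True)
    # Boolean mask keeps all values whose count equals the maximum count.
    return values[counts == counts.max()]

tied = all_modes([12, 3, 65, 33, 12, 3, 123, 888000])   # 3 and 12 both occur twice
single = all_modes([1, 2, 3, 1, 2, 1, 1, 1, 3, 2, 2, 1])
print(tied, single)
```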

Fred Foo · Answer

If you're willing to use SciPy:

    >>> from scipy.stats import mode
    >>> mode([1,2,3,1,2,1,1,1,3,2,2,1])
    (array([ 1.]), array([ 6.]))
    >>> most_frequent = mode([1,2,3,1,2,1,1,1,3,2,2,1])[0][0]
    >>> most_frequent
    1.0

iuridiniz · Answer

Performance of some of the solutions found here (using IPython):

    >>> # small array
    >>> a = [12,3,65,33,12,3,123,888000]
    >>>
    >>> import collections
    >>> collections.Counter(a).most_common()[0][0]
    3
    >>> %timeit collections.Counter(a).most_common()[0][0]
    100000 loops, best of 3: 11.3 µs per loop
    >>>
    >>> import numpy
    >>> numpy.bincount(a).argmax()
    3
    >>> %timeit numpy.bincount(a).argmax()
    100 loops, best of 3: 2.84 ms per loop
    >>>
    >>> import scipy.stats
    >>> scipy.stats.mode(a)[0][0]
    3.0
    >>> %timeit scipy.stats.mode(a)[0][0]
    10000 loops, best of 3: 172 µs per loop
    >>>
    >>> from collections import defaultdict
    >>> def jjc(l):
    ...     d = defaultdict(int)
    ...     for i in l:
    ...         d[i] += 1
    ...     return sorted(d.iteritems(), key=lambda x: x[1], reverse=True)[0]
    ...
    >>> jjc(a)[0]
    3
    >>> %timeit jjc(a)[0]
    100000 loops, best of 3: 5.58 µs per loop
    >>>
    >>> max(map(lambda val: (a.count(val), val), set(a)))[1]
    12
    >>> %timeit max(map(lambda val: (a.count(val), val), set(a)))[1]
    100000 loops, best of 3: 4.11 µs per loop
    >>>

For this small array, the fastest is 'max' with 'set'.
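These timings are for an eight-element list, so the ranking may well change with your data. Two things to keep in mind: np.bincount allocates an array of length max(a) + 1 (888001 entries here), and the max/set approach calls a.count once per distinct value. Below is a rough sketch for re-running the comparison outside IPython with the standard timeit module. The function names, the larger test list, and the repeat count are my own choices, not from the answer.

```python
import timeit
from collections import Counter

import numpy as np

def by_counter(a):
    return Counter(a).most_common(1)[0][0]

def by_bincount(a):
    # Only valid for non-negative integers.
    return int(np.bincount(a).argmax())

def by_max_set(a):
    # a.count is called once per distinct value.
    return max(set(a), key=a.count)

CANDIDATES = (by_counter, by_bincount, by_max_set)

def time_all(data, number=20):
    """Return {function name: total seconds for `number` calls}."""
    results = {}
    for func in CANDIDATES:
        results[func.__name__] = timeit.timeit(lambda: func(data), number=number)
    return results

# Hypothetical larger input: 100 distinct values, with 7 made the clear mode.
large = [i % 100 for i in range(5000)] + [7] * 50

for name, seconds in time_all(large).items():
    print("{:12s} {:.6f} s".format(name, seconds))
```

No expected timings are given on purpose; they depend on the machine and on the library versions.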

JJC · Answer

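Most of the answers above are useful, but in case you: 1) need to support values other than positive integers (e.g. floats or negative integers), 2) aren't on Python 2.7 (which collections.Counter requires), and 3) prefer not to add a dependency on scipy (or even numpy) to your code, then a pure Python 2.6 solution that is O(nlogn) (i.e., efficient) is just this:

    from collections import defaultdict

    a = [1,2,3,1,2,1,1,1,3,2,2,1]

    d = defaultdict(int)
    for i in a:
        d[i] += 1
    # Sort the (value, count) pairs by count, largest first, and keep the top pair
    most_frequent = sorted(d.iteritems(), key=lambda x: x[1], reverse=True)[0]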
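On Python 3, dict.iteritems() no longer exists, so the snippet above needs a small change. Here is a sketch of an equivalent that uses items(). I have also swapped the full sort for max(), which only needs a single pass over the counts; that swap is my own change, not part of the original answer, and the function name is made up.

```python
from collections import defaultdict

def most_frequent(values):
    """Return the (value, count) pair with the highest count."""
    counts = defaultdict(int)
    for v in values:
        counts[v] += 1
    # max() scans the pairs once; on a tie it keeps the first pair it sees.
    return max(counts.items(), key=lambda item: item[1])

a = [1, 2, 3, 1, 2, 1, 1, 1, 3, 2, 2, 1]
value, count = most_frequent(a)
print(value, count)
```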

Artsiom Rudzenka · Answer

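Also, if you want to get the most frequent value (positive or negative) without loading any modules, you can use the following code:

    lVals = [1,2,3,1,2,1,1,1,3,2,2,1]
    # Builds (count, value) pairs and prints the largest pair
    print(max(map(lambda val: (lVals.count(val), val), set(lVals))))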

Vikas · Answer

I like JoshAdel's solution.

But there is one catch.

The np.bincount() solution only works on numbers.

If you have strings, the collections.Counter solution will work for you.
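To illustrate with strings, here is a minimal sketch using made-up sample words. As far as I know, np.unique with return_counts=True also handles string arrays, so it is shown as a second option.

```python
from collections import Counter

import numpy as np

# Hypothetical sample data.
words = ["spam", "egg", "spam", "ham", "egg", "spam"]

# collections.Counter works on any hashable items, including strings.
top_word, top_count = Counter(words).most_common(1)[0]

# np.unique sorts the distinct strings and can count them as well.
values, counts = np.unique(words, return_counts=True)
top_word_np = str(values[np.argmax(counts)])

print(top_word, top_count, top_word_np)
```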

Lean Bravo · Answer

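Expanding on this method: it can be applied to finding the mode of the data when you also need the index into the actual array, for example to see how far the value is from the center of the distribution.

    import numpy as np

    a = np.array([1,2,3,1,2,1,1,1,3,2,2,1])
    (_, idx, counts) = np.unique(a, return_index=True, return_counts=True)
    index = idx[np.argmax(counts)]   # position of the first occurrence of the mode
    mode = a[index]

Remember to discard the mode when the maximum count is shared by more than one value, i.e. when np.sum(counts == counts.max()) > 1 (np.argmax itself returns only a single index, so its length cannot be tested).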
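One way the tie check could be folded into the lookup is sketched below. The helper name mode_with_index and the choice to return None on a tie are assumptions of mine, not part of the answer.

```python
import numpy as np

def mode_with_index(a):
    """Return (mode, index of its first occurrence), or None if the mode is tied."""
    a = np.asarray(a)
    _, idx, counts = np.unique(a, return_index=True, return_counts=True)
    # More than one value reaching the maximum count means there is no unique mode.
    if np.count_nonzero(counts == counts.max()) > 1:
        return None
    index = idx[np.argmax(counts)]
    return a[index], index

result = mode_with_index([3, 1, 2, 1, 2, 1])
tied_result = mode_with_index([1, 1, 2, 2])
print(result, tied_result)
```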

Yury Kliachko · Answer

In Python 3 the following should work:

    max(set(a), key=lambda x: a.count(x))

Devin Cairns · Answer

Here is a general solution that can be applied along an axis, regardless of the values, using pure numpy. I've also found that it is much faster than scipy.stats.mode when there are a lot of unique values.

    import numpy

    def mode(ndarray, axis=0):
        # Check inputs
        ndarray = numpy.asarray(ndarray)
        ndim = ndarray.ndim
        if ndarray.size == 1:
            return (ndarray[0], 1)
        elif ndarray.size == 0:
            raise Exception('Cannot compute mode on empty array')
        try:
            axis = range(ndarray.ndim)[axis]
        except (IndexError, TypeError):
            raise Exception('Axis "{}" incompatible with the {}-dimension array'.format(axis, ndim))

        # If array is 1-D and numpy version is >= 1.9 numpy.unique will suffice
        version = tuple(int(part) for part in numpy.__version__.split('.')[:2])
        if ndim == 1 and version >= (1, 9):
            modals, counts = numpy.unique(ndarray, return_counts=True)
            index = numpy.argmax(counts)
            return modals[index], counts[index]

        # Sort array
        sort = numpy.sort(ndarray, axis=axis)
        # Create array to transpose along the axis and get padding shape
        transpose = numpy.roll(numpy.arange(ndim)[::-1], axis)
        shape = list(sort.shape)
        shape[axis] = 1
        # Create a boolean array along strides of unique values
        strides = numpy.concatenate([numpy.zeros(shape=shape, dtype='bool'),
                                     numpy.diff(sort, axis=axis) == 0,
                                     numpy.zeros(shape=shape, dtype='bool')],
                                    axis=axis).transpose(transpose).ravel()
        # Count the stride lengths
        counts = numpy.cumsum(strides)
        counts[~strides] = numpy.concatenate([[0], numpy.diff(counts[~strides])])
        counts[strides] = 0
        # Get shape of padded counts and slice to return to the original shape
        shape = numpy.array(sort.shape)
        shape[axis] += 1
        shape = shape[transpose]
        slices = [slice(None)] * ndim
        slices[axis] = slice(1, None)
        # Reshape and compute final counts (index with a tuple, not a list)
        counts = counts.reshape(shape).transpose(transpose)[tuple(slices)] + 1

        # Find maximum counts and return modals/counts
        slices = [slice(None, i) for i in sort.shape]
        del slices[axis]
        index = list(numpy.ogrid[tuple(slices)])
        index.insert(axis, numpy.argmax(counts, axis=axis))
        return sort[tuple(index)], counts[tuple(index)]
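For checking results along an axis, a much simpler (and likely slower) reference can be handy. The sketch below is not the method above: it just applies np.unique to each 1-D slice through np.apply_along_axis. The helper names and the sample data are mine.

```python
import numpy as np

def mode_1d(x):
    """Mode of a 1-D array; on ties the smallest value wins (np.unique sorts)."""
    values, counts = np.unique(x, return_counts=True)
    return values[np.argmax(counts)]

def mode_along_axis(arr, axis=0):
    # Calls mode_1d on every 1-D slice taken along `axis`.
    return np.apply_along_axis(mode_1d, axis, np.asarray(arr))

data = np.array([[1, 2, 2],
                 [1, 3, 2],
                 [4, 3, 2]])

col_modes = mode_along_axis(data, axis=0)   # one mode per column
row_modes = mode_along_axis(data, axis=1)   # one mode per row
print(col_modes, row_modes)
```

Because it loops over slices in Python, this is only meant as a cross-check on small arrays, not as a replacement for the vectorized version.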