Constructing a Python set from a NumPy matrix

Question

I am trying to run the following:

    >>> from numpy import *
    >>> x = array([[3,2,3],[4,4,4]])
    >>> y = set(x)
    TypeError: unhashable type: 'numpy.ndarray'

How can I easily and efficiently create a set containing all the elements of the NumPy array?
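For context, here is a minimal sketch of why the call fails. Iterating over a 2-D array yields its rows, and each row is itself an ndarray, which is not hashable. The variable names row_types and error_message are only illustrative.

```python
import numpy as np

x = np.array([[3, 2, 3], [4, 4, 4]])

# Iterating over a 2-D array yields its rows, each one an ndarray.
row_types = [type(row).__name__ for row in x]

# set() has to hash every item it receives, so it fails on the first row.
try:
    set(x)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(row_types)
print(error_message)
```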

Eric O Lebigot · Accepted Answer

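If you want a set of the elements, here is another, probably faster way:

    y = set(x.flatten())

PS: After comparing x.flat, x.flatten(), and x.ravel() on a 10x100 array, I found that they all run at about the same speed. For a 3x3 array, the fastest version is the iterator version:

    y = set(x.flat)

I would recommend this one because it is the least memory-expensive version (it scales up well with the size of the array).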
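The three variants can be put side by side in a short sketch. It assumes the same small array as in the question and only checks that all three produce the same set, without measuring speed.

```python
import numpy as np

x = np.array([[3, 2, 3], [4, 4, 4]])

# x.flat is an iterator over the elements; no intermediate array is built.
from_flat = set(x.flat)

# flatten() always returns a copy of the data as a 1-D array.
from_flatten = set(x.flatten())

# ravel() returns a 1-D view when it can, and a copy otherwise.
from_ravel = set(x.ravel())

print(from_flat == from_flatten == from_ravel)
print(sorted(int(v) for v in from_flat))
```

The elements stored in these sets are NumPy scalars rather than plain Python ints, although they hash and compare equal to the corresponding ints.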

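PPS: There is also a NumPy function that does something similar:

    y = numpy.unique(x)

This produces the same elements as set(x.flat), but as a NumPy array rather than a set. It is very fast (almost 10 times faster), but if you need a set, then set(numpy.unique(x)) is a bit slower than the other approaches (building a set comes with a large overhead).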
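To check these timings on your own machine, something like the following sketch could be used. The 10x100 test array, the candidates dictionary, and the repetition count of 200 are all assumptions made for illustration. The results depend on array size, dtype, and NumPy version, so no expected numbers are shown.

```python
import timeit

import numpy as np

# Hypothetical 10x100 test array holding only seven distinct values.
x = np.arange(1000).reshape(10, 100) % 7

candidates = {
    "set(x.flat)": lambda: set(x.flat),
    "np.unique(x)": lambda: np.unique(x),
    "set(np.unique(x))": lambda: set(np.unique(x)),
}

# Time each candidate over the same number of repetitions.
timings = {name: timeit.timeit(fn, number=200) for name, fn in candidates.items()}

for name, seconds in timings.items():
    print(f"{name:>20}: {seconds:.4f} s for 200 runs")
```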

miku · Answer

The immutable counterpart to an array is the tuple, so try converting the array of arrays into an array of tuples:

    >>> from numpy import *
    >>> x = array([[3,2,3],[4,4,4]])
    >>> x_hashable = map(tuple, x)
    >>> y = set(x_hashable)
    set([(3, 2, 3), (4, 4, 4)])
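A sketch of the same idea for Python 3, where map returns a lazy iterator. Calling tolist() first is an addition not present in the answer; it converts the rows to plain Python ints so the tuples do not hold NumPy scalars. The extra duplicate row is there only to show the deduplication.

```python
import numpy as np

x = np.array([[3, 2, 3], [4, 4, 4], [3, 2, 3]])

# tolist() gives nested lists of Python ints; each row becomes a tuple.
rows = set(map(tuple, x.tolist()))

print(rows)
```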

xperroni · Answer

The above answers work if you want to create a set out of the elements contained in an ndarray. If you instead want to create a set of ndarray objects, or use ndarray objects as keys in a dictionary, then you will have to provide a hashable wrapper for them. See the code below for a simple example:

    from hashlib import sha1

    from numpy import all, array, uint8


    class hashable(object):
        r'''Hashable wrapper for ndarray objects.

            Instances of ndarray are not hashable, meaning they cannot be added to
            sets, nor used as keys in dictionaries. This is by design - ndarray
            objects are mutable, and therefore cannot reliably implement the
            __hash__() method.

            The hashable class allows a way around this limitation. It implements
            the required methods for hashable objects in terms of an encapsulated
            ndarray object. This can be either a copied instance (which is safer)
            or the original object (which requires the user to be careful enough
            not to modify it).
        '''
        def __init__(self, wrapped, tight=False):
            r'''Creates a new hashable object encapsulating an ndarray.

                wrapped
                    The wrapped ndarray.

                tight
                    Optional. If True, a copy of the input ndarray is created.
                    Defaults to False.
            '''
            self.__tight = tight
            self.__wrapped = array(wrapped) if tight else wrapped
            self.__hash = int(sha1(wrapped.view(uint8)).hexdigest(), 16)

        def __eq__(self, other):
            return all(self.__wrapped == other.__wrapped)

        def __hash__(self):
            return self.__hash

        def unwrap(self):
            r'''Returns the encapsulated ndarray.

                If the wrapper is "tight", a copy of the encapsulated ndarray is
                returned. Otherwise, the encapsulated ndarray itself is returned.
            '''
            if self.__tight:
                return array(self.__wrapped)

            return self.__wrapped

Using the wrapper class is simple enough:

    >>> from numpy import arange
    >>> a = arange(0, 1024)
    >>> d = {}
    >>> d[a] = 'foo'
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
    TypeError: unhashable type: 'numpy.ndarray'
    >>> b = hashable(a)
    >>> d[b] = 'bar'
    >>> d[b]
    'bar'
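If a full wrapper class is more than you need, a lighter alternative (not the approach in this answer) is to derive an immutable key from the array's shape, dtype, and raw bytes. The helper name array_key is hypothetical. The sketch assumes a numeric dtype and that equal contents should map to the same key.

```python
import numpy as np


def array_key(arr):
    """Return a hashable snapshot of arr (hypothetical helper).

    The key is a tuple of the shape, the dtype string, and a copy of the
    raw bytes, so later changes to arr do not affect the key.
    """
    arr = np.asarray(arr)
    return (arr.shape, arr.dtype.str, arr.tobytes())


a = np.arange(0, 1024)
d = {array_key(a): 'bar'}

# An equal-valued array of the same dtype and shape finds the same entry.
print(d[array_key(np.arange(0, 1024))])
```

Unlike the wrapper, the key cannot be turned back into an array without extra bookkeeping, so this only suits cases where the array is needed for lookup and not for retrieval.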

Marcelo Cantos · Answer

If you want a set of the elements:

    >>> y = set(e for r in x for e in r)
    set([2, 3, 4])

For a set of the rows:

    >>> y = set(tuple(r) for r in x)
    set([(3, 2, 3), (4, 4, 4)])

bmc · Answer

Adding to @EricLebigot and his great post.

The following did the trick for building a tensor lookup table:

    import numpy as np

    a = np.array([[1, 0, 0], [1, 0, 0], [2, 3, 4]])
    np.unique(a, axis=0)

Output:

    array([[1, 0, 0],
           [2, 3, 4]])

See the np.unique documentation.
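For a lookup table you usually also need to know which unique row each original row maps to. A minimal sketch, assuming the return_inverse option of np.unique fits your use case; the names table and inverse are illustrative.

```python
import numpy as np

a = np.array([[1, 0, 0], [1, 0, 0], [2, 3, 4]])

# table holds the distinct rows; inverse holds, for every row of a,
# the index of that row inside table.
table, inverse = np.unique(a, axis=0, return_inverse=True)

# Flatten defensively: the shape of the inverse array has differed
# between some NumPy releases when axis is given.
inverse = inverse.reshape(-1)

print(table)
print(inverse)
print(np.array_equal(table[inverse], a))
```

Indexing table with inverse rebuilds the original array, which is a quick way to check the mapping.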

Askold Ilvento · Answer

I liked xperroni's idea. But I think the implementation can be simplified by using direct inheritance from ndarray instead of wrapping it.

    from hashlib import sha1

    from numpy import ndarray, uint8, array


    class HashableNdarray(ndarray):
        def __hash__(self):
            # Cache the digest on first use. A single leading underscore is
            # used so that hasattr() sees the same attribute name (a
            # double-underscore name would be mangled inside the class).
            if not hasattr(self, '_hash'):
                self._hash = int(sha1(self.view(uint8)).hexdigest(), 16)
            return self._hash

        def __eq__(self, other):
            if not isinstance(other, HashableNdarray):
                return super(HashableNdarray, self).__eq__(other)
            return super(HashableNdarray, self).__eq__(super(HashableNdarray, other)).all()

A NumPy ndarray can be viewed as the derived class and used as a hashable object. view(ndarray) can be used for the back transformation, but in most cases it is not needed.

    >>> a = array([1,2,3])
    >>> b = array([2,3,4])
    >>> c = array([1,2,3])
    >>> s = set()
    >>> s.add(a.view(HashableNdarray))
    >>> s.add(b.view(HashableNdarray))
    >>> s.add(c.view(HashableNdarray))
    >>> print(s)
    {HashableNdarray([2, 3, 4]), HashableNdarray([1, 2, 3])}
    >>> d = next(iter(s))
    >>> print(d == a)
    [False False False]
    >>> import ctypes
    >>> print(d.ctypes.data_as(ctypes.POINTER(ctypes.c_double)))
    <__main__.LP_c_double object at 0x7f99f4dbe488>
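One caveat with any content-based hash is that sets and dictionaries assume a key does not change after insertion. A possible guard, shown here as a sketch rather than as part of this answer, is to hash a read-only copy. The names key and modified are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])

# Work on a copy so the caller's array stays writable.
key = a.copy()
key.setflags(write=False)  # mark the copy as read-only

# Any attempt to modify the read-only copy raises ValueError.
try:
    key[0] = 99
    modified = True
except ValueError:
    modified = False

print(modified)
print(key)
```

The read-only copy could then be viewed as a hashable subclass, with less risk that its contents drift away from the hash computed for it.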