Building and updating a sparse matrix in Python

Question

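I am trying to build and update a sparse matrix as I read data from a file. The matrix is of size 100000 x 40000.

What is the most efficient way to update multiple entries of the sparse matrix? Specifically, I need to increment each entry by 1.

Let's say I have the row indices [2, 236, 246, 389, 1691]

and the column indices [117, 3, 34, 2757, 74, 1635, 52]

so all of the following entries must be incremented by one:

(2,117) (2,3) (2,34) (2,2757) ...

(236,117) (236,3) (236,34) (236,2757) ...

and so on.

I am already using lil_matrix, because I got a warning when I tried to update a single entry with another format.

The lil_matrix format does not support updating multiple entries at once, though: matrix[1:3,0] += [2,3] gives me a NotImplemented error.

I can do this naively by incrementing every entry individually. I was wondering whether there is a better way to do it, or a better sparse matrix implementation I could use.

My computer is also an average i5 machine with 4GB of RAM, so I have to be careful not to blow it up :)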

Jaime · Accepted Answer

One possible way of doing this is to create a second matrix with 1s at your new coordinates and add it to the existing one:

>>> import numpy as np
>>> import scipy.sparse as sps
>>> shape = (1000, 2000)
>>> rows, cols = 1000, 2000
>>> sps_acc = sps.coo_matrix((rows, cols)) # empty matrix
>>> for j in range(100): # add 100 sets of 100 1's (xrange on Python 2)
...     r = np.random.randint(rows, size=100)
...     c = np.random.randint(cols, size=100)
...     d = np.ones((100,))
...     sps_acc = sps_acc + sps.coo_matrix((d, (r, c)), shape=(rows, cols))
...
>>> sps_acc
<1000x2000 sparse matrix of type '<type 'numpy.float64'>'
    with 9985 stored elements in Compressed Sparse Row format>
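Applied to the indices from the question, the same idea might look like the sketch below. The helper name add_ones_block is made up for illustration and is not part of SciPy; np.repeat and np.tile are used here to expand the two index lists into every (row, column) pair.

```python
import numpy as np
import scipy.sparse as sps


def add_ones_block(acc, row_idx, col_idx):
    """Return acc plus 1 at every (r, c) pair of row_idx x col_idx.

    Hypothetical helper for illustration only.
    """
    # Expand the two lists into one coordinate per (row, col) combination.
    r = np.repeat(row_idx, len(col_idx))
    c = np.tile(col_idx, len(row_idx))
    d = np.ones(r.size, dtype=acc.dtype)
    # Sparse + sparse addition returns a new sparse matrix.
    return acc + sps.coo_matrix((d, (r, c)), shape=acc.shape)


acc = sps.csr_matrix((100000, 40000), dtype=np.int32)  # empty accumulator
rows = [2, 236, 246, 389, 1691]
cols = [117, 3, 34, 2757, 74, 1635, 52]

acc = add_ones_block(acc, rows, cols)
acc = add_ones_block(acc, rows, cols)  # same block again: entries become 2
print(acc.nnz, acc[2, 117])
```

Each call allocates a new result matrix, so with many small updates it may be cheaper to batch several blocks before adding them.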

Ray · Answer

import scipy.sparse

rows = [2, 236, 246, 389, 1691]
cols = [117, 3, 34, 2757, 74, 1635, 52]
prod = [(x, y) for x in rows for y in cols] # combinations
r = [x for (x, y) in prod] # x_coordinate
c = [y for (x, y) in prod] # y_coordinate
data = [1] * len(r)
m = scipy.sparse.coo_matrix((data, (r, c)), shape=(100000, 40000))

I think it works well and doesn't need loops. I am following the doc directly.

<100000x40000 sparse matrix of type '<type 'numpy.int32'>' with 35 stored elements in COOrdinate format>
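Since the data comes from a file, one variation is to collect the coordinates of all updates first and build the matrix once at the end. This relies on the documented behaviour that duplicate (i, j) entries of a coo_matrix are summed when it is converted to CSR or CSC. A minimal sketch, where the updates list is a made-up stand-in for whatever the file parser would yield:

```python
import numpy as np
import scipy.sparse as sps

# Hypothetical stand-in for (row indices, column indices) pairs read from the file.
updates = [
    ([2, 236], [117, 3, 34]),
    ([2, 246], [3, 52]),
]

r_parts, c_parts = [], []
for row_idx, col_idx in updates:
    # One coordinate per (row, col) combination of this update.
    r_parts.append(np.repeat(row_idx, len(col_idx)))
    c_parts.append(np.tile(col_idx, len(row_idx)))

r = np.concatenate(r_parts)
c = np.concatenate(c_parts)
data = np.ones(r.size, dtype=np.int32)

# Duplicate coordinates, here (2, 3), are summed during the CSR conversion.
m = sps.coo_matrix((data, (r, c)), shape=(100000, 40000)).tocsr()
print(m.nnz, m[2, 3])
```

The coordinate arrays grow with the total number of increments rather than the number of distinct entries, so on a 4GB machine it may be worth converting and summing in chunks instead of holding everything until the end.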

Warren Weckesser · Answer

This answer expands on the comment by @behzad.nouri. To increment the values at the "outer product" of your lists of row and column indices, just create them as numpy arrays configured for broadcasting. In this case, that means putting the rows into a column. For example (assuming numpy has been imported as np and lil_matrix has been imported from scipy.sparse),

In [59]: a = lil_matrix((4,4), dtype=int)

In [60]: a.toarray()
Out[60]:
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [61]: rows = np.array([1,3]).reshape(-1, 1)

In [62]: rows
Out[62]:
array([[1],
       [3]])

In [63]: cols = np.array([0, 2, 3])

In [64]: a[rows, cols] += np.ones((rows.size, cols.size))

In [65]: a.toarray()
Out[65]:
array([[0, 0, 0, 0],
       [1, 0, 1, 1],
       [0, 0, 0, 0],
       [1, 0, 1, 1]])

In [66]: rows = np.array([0, 1]).reshape(-1,1)

In [67]: cols = np.array([1, 2])

In [68]: a[rows, cols] += np.ones((rows.size, cols.size))

In [69]: a.toarray()
Out[69]:
array([[0, 1, 1, 0],
       [1, 1, 2, 1],
       [0, 0, 0, 0],
       [1, 0, 1, 1]])
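The session above could be wrapped in a small function so that it can be called once per line of the input file. This is only a sketch: increment_outer is a made-up name, and it simply repeats the broadcasting pattern shown above on plain Python lists.

```python
import numpy as np
from scipy.sparse import lil_matrix


def increment_outer(m, row_idx, col_idx):
    """Add 1 in place to m at every (r, c) pair of row_idx x col_idx.

    Hypothetical helper for illustration only.
    """
    rows = np.asarray(row_idx).reshape(-1, 1)  # column vector, so it broadcasts
    cols = np.asarray(col_idx)
    m[rows, cols] += np.ones((rows.size, cols.size), dtype=m.dtype)


a = lil_matrix((4, 4), dtype=int)
increment_outer(a, [1, 3], [0, 2, 3])
increment_outer(a, [0, 1], [1, 2])
print(a.toarray())
```

Run on the same indices as the session, it should end with the same matrix as Out[69].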