Column/row slicing a torch sparse tensor

Question

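I have a pytorch sparse tensor that I need to slice row/column-wise using the slice [idx][:,idx], where idx is a list of indices. On an ordinary float tensor this slice gives the result I want. Is it possible to apply the same slicing to a sparse tensor? Example: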

    import numpy as np
    import torch
    from torch import autograd

    # constructing sparse matrix
    i = np.array([[0,1,2,2],[0,1,2,1]])
    v = np.ones(4)
    i = torch.from_numpy(i.astype("int64"))
    v = torch.from_numpy(v.astype("float32"))
    test1 = torch.sparse.FloatTensor(i, v)

    # constructing float tensor
    test2 = np.array([[1,0,0],[0,1,0],[0,1,1]])
    test2 = autograd.Variable(torch.cuda.FloatTensor(test2), requires_grad=False)

    # slicing
    idx = [1,2]
    print(test2[idx][:,idx])

Output:

    Variable containing:
     1  0
     1  1
    [torch.cuda.FloatTensor of size 2x2 (GPU 0)]

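I am holding a 250,000 x 250,000 adjacency matrix, from which I need to slice n rows and n columns by sampling n random indices into idx. Since the dataset is so large, converting it to a more convenient datatype is not realistic.

Can I achieve the same slicing result on test1? Is it even possible? If not, are there any workarounds?

Right now I am running my model with the following "hack" of a solution: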

    import random

    idx = sorted(random.sample(range(0, np.shape(test1)[0]), 9000))
    test1 = test1AsCsr[idx][:,idx].todense().astype("int32")
    test1 = autograd.Variable(torch.cuda.FloatTensor(test1), requires_grad=False)

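Here test1AsCsr is my test1 converted to a numpy CSR matrix. This solution works, but it is very slow and keeps my GPU utilization very low, since it constantly has to read from and write to CPU memory.

Edit: a non-sparse tensor as the result is fine.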

benjaminplanche · Answer

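Possible answer for 2-dimensional sparse indices

Find an answer below, playing with several pytorch methods (torch.eq(), torch.unique(), torch.sort(), etc.) in order to output a compact, sliced tensor of shape (len(idx), len(idx)).

I tested several edge cases (unordered idx, v containing 0s, i containing the same index pair several times, etc.), though I may have forgotten some. Performance should also be checked.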

    import torch
    import numpy as np

    def in1D(x, labels):
        """
        Sub-optimal equivalent to numpy.in1d().
        Hopefully this feature will be properly covered soon
        c.f. https://github.com/pytorch/pytorch/issues/3025
        Snippet by Aron Barreira Bordin
        Args:
            x (Tensor):             Tensor to search values in
            labels (Tensor/list):   1D array of values to search for

        Returns:
            Tensor: Byte mask y of same shape as x, with y[ind] = 1 if x[ind] in labels

        Example:
            >>> in1D(torch.FloatTensor([1, 2, 0, 3]), [2, 3])
            tensor([0, 1, 0, 1], dtype=torch.uint8)
        """
        mapping = torch.zeros(x.size()).byte()
        for label in labels:
            mapping = mapping | x.eq(label)
        return mapping

    def compact1D(x):
        """
        "Compact" the values of a 1D integer tensor, so that all values are in
        [0, len(unique(x)) - 1]. Note: `x` is modified in place.
        Args:
            x (Tensor): integer Tensor

        Returns:
            Tensor: integer Tensor of same shape as x

        Example:
            >>> compact1D(torch.ByteTensor([5, 8, 7, 3, 8, 42]))
            ByteTensor([1, 3, 2, 0, 3, 4])
        """
        x_sorted, x_sorted_ind = torch.sort(x, descending=True)
        x_sorted_unique, x_sorted_unique_ind = torch.unique(x_sorted, return_inverse=True)
        x[x_sorted_ind] = x_sorted_unique_ind
        return x

    # Input sparse tensor:
    i = torch.from_numpy(np.array([[0,1,4,3,2,1],[0,1,3,1,4,1]]).astype("int64"))
    v = torch.from_numpy(np.arange(1, 7).astype("float32"))
    test1 = torch.sparse.FloatTensor(i, v)
    print(test1.to_dense())
    # tensor([[ 1.,  0.,  0.,  0.,  0.],
    #         [ 0.,  8.,  0.,  0.,  0.],
    #         [ 0.,  0.,  0.,  0.,  5.],
    #         [ 0.,  4.,  0.,  0.,  0.],
    #         [ 0.,  0.,  0.,  3.,  0.]])

    # note: test1[1, 1] = v[1] + v[5] = 2 + 6 = 8
    #       since both i[:,1] and i[:,5] are [1,1]

    # Input slicing indices:
    idx = [4,1,3]

    # Getting the elements in `i` which correspond to `idx`
    # (an entry is kept only if both its row and its column are in `idx`):
    v_idx = in1D(i, idx).byte()
    v_idx = v_idx.sum(dim=0).squeeze() == i.size(0) # or `v_idx.all(dim=0)` for pytorch 0.5+
    v_idx = v_idx.nonzero().squeeze(1)

    # Slicing `v` and `i` accordingly:
    v_sliced = v[v_idx]
    i_sliced = i.index_select(dim=1, index=v_idx)

    # Building sparse result tensor.
    # note: compact1D ranks the surviving indices, so rows/columns come out in
    #       ascending index order (here 1, 3, 4), not in the order given in `idx`.
    i_sliced[0] = compact1D(i_sliced[0])
    i_sliced[1] = compact1D(i_sliced[1])

    # To make sure to have a square dense representation:
    size_sliced = torch.Size([len(idx), len(idx)])
    res = torch.sparse.FloatTensor(i_sliced, v_sliced, size_sliced)

    print(res)
    # torch.sparse.FloatTensor of size (3,3) with indices:
    # tensor([[ 0,  2,  1,  0],
    #         [ 0,  1,  0,  0]])
    # and values:
    # tensor([ 2.,  3.,  4.,  6.])

    print(res.to_dense())
    # tensor([[ 8.,  0.,  0.],
    #         [ 4.,  0.,  0.],
    #         [ 0.,  3.,  0.]])
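One case the rank-based compaction above may not handle as intended: it numbers whichever indices survive the filter, so the output follows ascending index order rather than the order of idx, and an entry of idx with no stored values could shift the numbering. A possible alternative is to remap through a lookup table from original index to position in idx. The following is only a sketch under stated assumptions: slice_sparse_square and lut are hypothetical names that are not part of the answer above, the input is assumed to be a 2-D sparse COO tensor, idx is assumed to contain no repeated entries, and a PyTorch version providing torch.sparse_coo_tensor is assumed.

```python
import torch

def slice_sparse_square(sp, idx):
    """Sparse equivalent of sp.to_dense()[idx][:, idx] (hypothetical helper).

    Assumes `sp` is a 2-D sparse COO tensor and `idx` has no repeated entries.
    """
    sp = sp.coalesce()                       # merge duplicate index pairs
    i, v = sp.indices(), sp.values()
    idx = torch.as_tensor(idx, dtype=torch.long, device=i.device)

    # Lookup table: original index -> position in `idx`, or -1 if not selected.
    lut = torch.full((max(sp.shape),), -1, dtype=torch.long, device=i.device)
    lut[idx] = torch.arange(idx.numel(), device=i.device)

    new_i = lut[i]                           # remap rows and columns at once
    keep = (new_i >= 0).all(dim=0)           # keep entries whose row AND column are selected
    size = (idx.numel(), idx.numel())
    return torch.sparse_coo_tensor(new_i[:, keep], v[keep], size)

# Same input as in the answer above.
i = torch.tensor([[0, 1, 4, 3, 2, 1], [0, 1, 3, 1, 4, 1]])
v = torch.arange(1, 7, dtype=torch.float32)
test1 = torch.sparse_coo_tensor(i, v, (5, 5))

idx = [4, 1, 3]
res = slice_sparse_square(test1, idx)
# Compare against the dense reference slicing from the question.
print(torch.equal(res.to_dense(), test1.to_dense()[idx][:, idx]))
```

The lookup table costs one integer per row/column of the original matrix, which should stay small next to the matrix itself, and everything runs on whatever device the sparse tensor lives on.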

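Previous answer for 1-dimensional sparse indices

Here is a (probably sub-optimal and not covering all edge cases) solution, following the intuitions shared in a related open issue (hopefully this feature will be properly covered soon):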

    import torch
    from torch import autograd

    # Constructing a sparse tensor a bit more complicated for the sake of demo:
    i = torch.LongTensor([[0, 1, 5, 2]])
    v = torch.FloatTensor([[1, 3, 0], [5, 7, 0], [9, 9, 9], [1,2,3]])
    test1 = torch.sparse.FloatTensor(i, v)

    # note: if you directly have sparse `test1`, you can get `i` and `v`:
    # i, v = test1._indices(), test1._values()

    # Getting the slicing indices:
    idx = [1,2]

    # Preparing to slice `v` according to `idx`.
    # For that, we gather the list of indices `v_idx` such that i[0, v_idx[k]] == idx[k]:
    i_squeeze = i.squeeze()
    v_idx = [(i_squeeze == j).nonzero() for j in idx] # <- doesn't seem optimal...
    v_idx = torch.cat(v_idx, dim=1)

    # Slicing `v` accordingly:
    v_sliced = v[v_idx.squeeze()][:,idx]

    # Now defining your resulting sparse tensor.
    # I'm not sure what kind of indexing you want, so here are 2 possibilities:
    # 1) "Dense" indexing:
    test1x = torch.sparse.FloatTensor(torch.arange(v_idx.size(1)).long().unsqueeze(0), v_sliced)
    print(test1x)
    # torch.sparse.FloatTensor of size (2,2) with indices:
    #
    #  0  1
    # [torch.LongTensor of size (1,2)]
    # and values:
    #
    #  7  0
    #  2  3
    # [torch.FloatTensor of size (2,2)]

    # 2) "Sparse" indexing using the original `idx`:
    test1x = torch.sparse.FloatTensor(autograd.Variable(torch.LongTensor(idx)).unsqueeze(0), v_sliced)
    # note: this indexing would fail if elements of `idx` were not in `i`.
    print(test1x)
    # torch.sparse.FloatTensor of size (3,2) with indices:
    #
    #  1  2
    # [torch.LongTensor of size (1,2)]
    # and values:
    #
    #  7  0
    #  2  3
    # [torch.FloatTensor of size (2,2)]
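On the hope that this would eventually be covered natively: in PyTorch versions where index_select accepts sparse COO tensors, the slice from the question can likely be written as two chained calls, one per dimension. The sketch below assumes such a version and rebuilds the 3x3 matrix from the question with torch.sparse_coo_tensor; it is meant as an illustration, not as a statement about performance on a 250,000 x 250,000 matrix.

```python
import torch

# The sparse matrix from the question (assumes torch.sparse_coo_tensor is available).
i = torch.tensor([[0, 1, 2, 2], [0, 1, 2, 1]])
v = torch.ones(4)
test1 = torch.sparse_coo_tensor(i, v, (3, 3)).coalesce()

idx = torch.tensor([1, 2])

# Select rows, then columns; the result is still a sparse tensor.
sliced = test1.index_select(0, idx).index_select(1, idx)

# A dense result is acceptable per the question's edit.
dense = sliced.to_dense()
print(dense)
```

If the sparse tensor and idx are both moved to the GPU first, the selection should run there without the CPU round trip described in the question.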