How does PyTorch backpropagate through argmax?

Question

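I'm building k-means in PyTorch using gradient descent on the centroid locations instead of expectation-maximization. The loss is the sum of squared distances from each point to its nearest centroid. To identify which centroid is nearest to each point, I use argmin, which is not differentiable everywhere. However, PyTorch is still able to backpropagate and update the weights (the centroid locations), giving performance similar to sklearn's KMeans on the same data.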

Any ideas on how this is working, or how I can figure this out within PyTorch? Discussion on the PyTorch GitHub suggests that argmax is not differentiable: https://github.com/pytorch/pytorch/issues/1339.

Example code below (on random points):

    import numpy as np
    import torch

    num_pts, batch_size, n_dims, num_clusters, lr = 1000, 100, 200, 20, 1e-5

    # generate random points
    vector = torch.from_numpy(np.random.rand(num_pts, n_dims)).float()

    # randomly pick starting centroids
    idx = np.random.choice(num_pts, size=num_clusters)
    kmean_centroids = vector[idx][:, None, :]  # [num_clusters, 1, n_dims]
    # make the centroids a leaf tensor that autograd tracks
    kmean_centroids = kmean_centroids.clone().detach().requires_grad_(True)

    for t in range(4001):
        # get batch
        idx = np.random.choice(num_pts, size=batch_size)
        vector_batch = vector[idx]

        distances = vector_batch - kmean_centroids    # [num_clusters, #pts, #dims]
        distances = torch.sum(distances**2, dim=2)    # [num_clusters, #pts]

        # argmin
        membership = torch.min(distances, 0)[1]       # [#pts]

        # cluster distances
        cluster_loss = 0
        for i in range(num_clusters):
            subset = torch.transpose(distances, 0, 1)[membership == i]
            if len(subset) != 0:  # to prevent NaN
                cluster_loss += torch.sum(subset[:, i])

        cluster_loss.backward()
        print(cluster_loss.item())

        with torch.no_grad():
            kmean_centroids -= lr * kmean_centroids.grad
            kmean_centroids.grad.zero_()

prosti · Answer

Imagine this:

    t = torch.tensor([-0.0627, 0.1373, 0.0616, -1.7994, 0.8853,
                      -0.0656, 1.0034, 0.6974, -0.2919, -0.0456])
    torch.argmax(t).item()  # outputs 6

Increase t[0] by some δ close to 0. Will this update the argmax? It will not, so you are dealing with zero gradients all the time. Just ignore this layer, or treat it as frozen.
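A minimal sketch of that thought experiment, reusing the tensor above. The value of `delta` is an arbitrary choice for illustration, and the names `nudged`, `base` and `after` are not from the original post.

```python
import torch

t = torch.tensor([-0.0627, 0.1373, 0.0616, -1.7994, 0.8853,
                  -0.0656, 1.0034, 0.6974, -0.2919, -0.0456])

base = torch.argmax(t).item()

# Nudge t[0] by a small, arbitrarily chosen delta and recompute the index.
delta = 1e-3
nudged = t.clone()
nudged[0] += delta
after = torch.argmax(nudged).item()

# The index is piecewise constant in t, so a small nudge leaves it unchanged.
print(base, after)

# The result of argmax is an integer tensor that autograd does not track.
x = t.clone().requires_grad_(True)
idx = torch.argmax(x)
print(idx.dtype, idx.requires_grad)
```

Because the index does not move under small perturbations, there is nothing for a gradient to measure there, and autograd does not attach one to the result.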

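The same holds for argmin, or for any other function whose dependent variable takes discrete steps. In the k-means code, the indices returned by argmin only select which squared distances enter the loss. Those distances are differentiable with respect to the centroids, which is why the centroids still receive gradients.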
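To see this on the k-means loss itself, here is a small sketch under simplified assumptions: a handful of 2-D points, three centroids, and a vectorized loss built with `gather` rather than the loop from the question. The variable names and sizes are illustrative only. It compares the autograd gradient with the gradient derived by hand while holding the assignment fixed, which is 2 times the sum of (c_k - x_i) over the points assigned to centroid k.

```python
import torch

torch.manual_seed(0)
points = torch.rand(8, 2)                          # [num_pts, n_dims]
centroids = torch.rand(3, 2, requires_grad=True)   # [num_clusters, n_dims]

# Squared distance from every point to every centroid: [num_pts, num_clusters]
sq_dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(dim=2)

# argmin yields integer indices; autograd treats them as constants.
membership = torch.argmin(sq_dist, dim=1)          # [num_pts]

# Selecting values with those indices is differentiable w.r.t. the values.
nearest = sq_dist.gather(1, membership[:, None]).squeeze(1)
loss = nearest.sum()
loss.backward()

# Hand-derived gradient with the assignment held fixed:
# d(loss)/d(c_k) = 2 * sum over points assigned to k of (c_k - x_i)
expected = torch.zeros_like(centroids)
with torch.no_grad():
    for k in range(centroids.shape[0]):
        assigned = points[membership == k]
        expected[k] = 2 * (centroids[k] - assigned).sum(dim=0)

print(torch.allclose(centroids.grad, expected, atol=1e-6))
```

Under these assumptions the two gradients should agree, which suggests that each gradient step moves every centroid toward the mean of the points currently assigned to it, much like the update step of standard k-means.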