Convert array of indices to 1-hot encoded numpy array

Question

Let's say I have a 1d numpy array

a = array([1,0,3])

I would like to encode this as a 2D one-hot array

b = array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])

Is there a quick way to do this? Quicker than just looping over a to set elements of b, that is.
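For reference, the loop the question wants to beat might look like the minimal sketch below. `one_hot_loop` is a hypothetical name, and the sketch assumes the output width should be `a.max() + 1`:

```python
import numpy as np

def one_hot_loop(a):
    """Naive baseline: set one element of b per entry of a (hypothetical helper)."""
    a = np.asarray(a)
    b = np.zeros((a.size, a.max() + 1), dtype=int)
    for row, col in enumerate(a):
        b[row, col] = 1  # one Python-level assignment per element
    return b

print(one_hot_loop([1, 0, 3]))
```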

YXD · Accepted Answer

Your array a defines the columns of the nonzero elements in the output array. You need to also define the rows and then use fancy indexing:

>>> a = np.array([1, 0, 3])
>>> b = np.zeros((3, 4))
>>> b[np.arange(3), a] = 1
>>> b
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])
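The example hard-codes the shape `(3, 4)`. Here is a minimal sketch that derives it from `a` instead, assuming `a` holds non-negative integers and that the number of columns should be `a.max() + 1` unless given explicitly; `one_hot_fancy` is a hypothetical name:

```python
import numpy as np

def one_hot_fancy(a, num_classes=None):
    """One-hot encode a 1-D integer array with fancy indexing (hypothetical helper)."""
    a = np.asarray(a)
    if num_classes is None:
        num_classes = a.max() + 1
    b = np.zeros((a.size, num_classes), dtype=int)
    # Row i gets its 1 in column a[i]
    b[np.arange(a.size), a] = 1
    return b

print(one_hot_fancy(np.array([1, 0, 3])))
```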

K3---rnc · Answer

>>> values = [1, 0, 3]
>>> n_values = np.max(values) + 1
>>> np.eye(n_values)[values]
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])

Franck Dernoncourt · Answer

You can use sklearn.preprocessing.LabelBinarizer:

Example:

import sklearn.preprocessing

a = [1,0,3]
label_binarizer = sklearn.preprocessing.LabelBinarizer()
label_binarizer.fit(range(max(a)+1))
b = label_binarizer.transform(a)
print('{0}'.format(b))

Output:

[[0 1 0 0]
 [1 0 0 0]
 [0 0 0 1]]

Among other options, you can initialize sklearn.preprocessing.LabelBinarizer() so that the output of transform is sparse (via its sparse_output parameter).

D.Samchuk · Answer

Here is what I find useful:

import numpy as np

def one_hot(a, num_classes):
    return np.squeeze(np.eye(num_classes)[a.reshape(-1)])

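Here num_classes stands for the number of classes you have. So if you have an a vector with shape (10000,), this function transforms it to shape (10000, C), where C = num_classes. Note that a is zero-indexed, i.e. one_hot(np.array([0, 1]), 2) will give [[1, 0], [0, 1]].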

Exactly what you wanted, I believe.

PS: the source is Sequence Models - deeplearning.ai
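One side effect of `np.squeeze` in `one_hot` is worth knowing: when `a` has a single element, it also drops the row axis, so the result has shape `(C,)` rather than `(1, C)`. The sketch below compares it with a variant that skips the squeeze (`one_hot_keep_shape` is a hypothetical name); that variant returns `a.shape + (C,)` for any input shape, so a column vector of shape `(n, 1)` would give `(n, 1, C)` and should be flattened first:

```python
import numpy as np

def one_hot(a, num_classes):
    # Function from this answer
    return np.squeeze(np.eye(num_classes)[a.reshape(-1)])

def one_hot_keep_shape(a, num_classes):
    # Indexing the identity matrix with a appends a class axis: a.shape + (num_classes,)
    return np.eye(num_classes)[np.asarray(a)]

single = np.array([2])
print(one_hot(single, 3).shape)             # (3,)
print(one_hot_keep_shape(single, 3).shape)  # (1, 3)
```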

Jodo · Answer

If you are using keras, there is a built-in utility for that:

from keras.utils.np_utils import to_categorical

categorical_labels = to_categorical(int_labels, num_classes=3)

And it does pretty much the same as @YXD's answer (see the source code).

Karma · Answer

numpy.eye(number_of_classes)[vector_to_convert]

stackoverflowuser2010 · Answer

Here is a function that converts a 1-D vector to a 2-D one-hot array.

#!/usr/bin/env python
import numpy as np

def convertToOneHot(vector, num_classes=None):
    """
    Converts an input 1-D vector of integers into an output
    2-D array of one-hot vectors, where an i'th input value
    of j will set a '1' in the i'th row, j'th column of the
    output array.

    Example:
        v = np.array((1, 0, 4))
        one_hot_v = convertToOneHot(v)
        print(one_hot_v)

        [[0 1 0 0 0]
         [1 0 0 0 0]
         [0 0 0 0 1]]
    """

    assert isinstance(vector, np.ndarray)
    assert len(vector) > 0

    if num_classes is None:
        num_classes = np.max(vector)+1
    else:
        assert num_classes > 0
        # Valid columns are 0 .. num_classes-1, so the largest value must be strictly smaller
        assert num_classes > np.max(vector)

    result = np.zeros(shape=(len(vector), num_classes))
    result[np.arange(len(vector)), vector] = 1
    return result.astype(int)

Here is an example of its usage:

>>> a = np.array([1, 0, 3])
>>> convertToOneHot(a)
array([[0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1]])
>>> convertToOneHot(a, num_classes=10)
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]])

David Nemeskey · Answer

I think the short answer is no. For a more generic case in n dimensions, I came up with this:

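import numpy as np

# For 2-dimensional data, 4 values
a = np.array([[0, 1, 2], [3, 2, 1]])
z = np.zeros(list(a.shape) + [4])
# Index with a tuple of arrays; newer NumPy versions reject a plain list here
z[tuple(list(np.indices(z.shape[:-1])) + [a])] = 1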

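I'm wondering if there is a better solution; I don't like that I have to create those lists in the last two lines. Anyway, I did some measurements with timeit, and it seems that the numpy-based (indices/arange) and the iterative versions perform about the same.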
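On the question of a better solution: indexing an identity matrix, as in the `np.eye` answers, also covers the n-dimensional case without building index lists, because row k of the identity is the one-hot vector for k. A sketch checking that the two agree on the same data (NumPy only):

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 2, 1]])
n_values = 4

# Approach from this answer: explicit index arrays for the leading axes
z = np.zeros(a.shape + (n_values,))
z[tuple(np.indices(a.shape)) + (a,)] = 1

# Alternative: fancy-index the rows of an identity matrix
z_eye = np.eye(n_values)[a]

print(z_eye.shape)               # (2, 3, 4)
print(np.array_equal(z, z_eye))  # True
```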

Inaam Ilahi · Answer

You can use the following code to convert to a one-hot vector:

Let x be a plain vector with a single column, holding class labels from 0 up to some number:

import numpy as np

np.eye(x.max()+1)[x]

If 0 is not a class (labels start at 1), remove the +1 and shift the labels down by one: np.eye(x.max())[x - 1].

Hans T · Answer

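I recently ran into the same kind of problem and found that the solutions above are only satisfactory if your numbers fall within a compact range starting at 0. For example, if you want to one-hot encode the following list: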

all_good_list = [0,1,2,3,4]

then go ahead: the solutions posted above already handle it. But what if you consider this data:

problematic_list = [0,23,12,89,10]

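If you do it with the methods above, you will end up with 90 one-hot columns. This is because all the answers include something like n = np.max(a)+1. I found a more generic solution that worked out for me and wanted to share it with you: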

import numpy as np
import sklearn.preprocessing

sklb = sklearn.preprocessing.LabelBinarizer()
a = np.asarray([1,2,44,3,2])
n = np.unique(a)
sklb.fit(n)
b = sklb.transform(a)

I hope this comes in handy for anyone who ran into the same limitation with the solutions above.
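If you'd rather avoid the scikit-learn dependency, `np.unique(..., return_inverse=True)` gives a similarly compact encoding: it returns the sorted distinct labels plus, for each element, the position of its label in that sorted list, which can then be one-hot encoded as usual. This is a swapped-in NumPy technique, not the `LabelBinarizer` API, and `one_hot_labels` is a hypothetical name:

```python
import numpy as np

def one_hot_labels(a):
    """Return (classes, one_hot) with one column per distinct label in a."""
    classes, inverse = np.unique(np.asarray(a), return_inverse=True)
    inverse = inverse.reshape(-1)  # ensure a flat index vector
    one_hot = np.zeros((inverse.size, classes.size), dtype=int)
    one_hot[np.arange(inverse.size), inverse] = 1
    return classes, one_hot

classes, b = one_hot_labels([1, 2, 44, 3, 2])
print(classes)  # [ 1  2  3 44]
print(b)
```

One difference to keep in mind: with exactly two distinct labels, `LabelBinarizer` returns a single column, whereas this sketch still returns two.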

Inaam Ilahi · Answer

Use the following code. It works best.

import numpy as np

def one_hot_encode(x):
    """
    argument
        - x: a list of labels
    return
        - one hot encoding matrix (number of labels, number of class)
    """
    # Assumes exactly 10 classes, labelled 0-9
    encoded = np.zeros((len(x), 10))
    for idx, val in enumerate(x):
        encoded[idx][val] = 1

    return encoded

Found it here. P.S. You don't need to go into the link.

MiFi · Answer

Find which value is the highest in each row, put a 1 there and 0 everywhere else.
A clean and easy solution:

import numpy as np

# p: 2-D array of scores, one row per sample
max_elements_i = np.expand_dims(np.argmax(p, axis=1), axis=1)
one_hot = np.zeros(p.shape)
np.put_along_axis(one_hot, max_elements_i, 1, axis=1)
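A self-contained run of the snippet above, assuming `p` is a 2-D array of scores with one row per sample (the values are made up for illustration). Note that `np.argmax` returns the first index when a row has ties, so exactly one 1 is placed per row:

```python
import numpy as np

p = np.array([[0.1, 0.7, 0.2],
              [0.5, 0.5, 0.0],   # tie: argmax picks the first maximum (column 0)
              [0.2, 0.3, 0.9]])

max_elements_i = np.expand_dims(np.argmax(p, axis=1), axis=1)  # shape (n, 1)
one_hot = np.zeros(p.shape)
np.put_along_axis(one_hot, max_elements_i, 1, axis=1)
print(one_hot)
```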

Emil Melnikov · Answer

To elaborate on the excellent answer from K3---rnc, here is a more generic version:

def onehottify(x, n=None, dtype=float):
    """1-hot encode x with the max value n (computed from data if n is None)."""
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    return np.eye(n, dtype=dtype)[x]

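Also, here is a quick-and-dirty benchmark of this method and a method from the currently accepted answer by YXD (slightly changed, so that they offer the same API except that the latter works only with 1D ndarrays):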

def onehottify_only_1d(x, n=None, dtype=float):
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    b = np.zeros((len(x), n), dtype=dtype)
    b[np.arange(len(x)), x] = 1
    return b

The latter method is ~35% faster (MacBook Pro 13 2015), but the former is more general:

>>> import numpy as np
>>> np.random.seed(42)
>>> a = np.random.randint(0, 9, size=(10_000,))
>>> a
array([6, 3, 7, ..., 5, 8, 6])
>>> %timeit onehottify(a, 10)
188 µs ± 5.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit onehottify_only_1d(a, 10)
139 µs ± 2.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
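To reproduce the comparison outside IPython, the standard-library `timeit` module can be used instead of `%timeit`. A sketch that also checks both functions return the same array; timings depend on the machine, so none are shown here:

```python
import timeit
import numpy as np

def onehottify(x, n=None, dtype=float):
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    return np.eye(n, dtype=dtype)[x]

def onehottify_only_1d(x, n=None, dtype=float):
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    b = np.zeros((len(x), n), dtype=dtype)
    b[np.arange(len(x)), x] = 1
    return b

np.random.seed(42)
a = np.random.randint(0, 9, size=(10_000,))
assert np.array_equal(onehottify(a, 10), onehottify_only_1d(a, 10))

for f in (onehottify, onehottify_only_1d):
    # Average over 1000 calls; timeit disables garbage collection while timing
    t = timeit.timeit(lambda: f(a, 10), number=1000)
    print(f"{f.__name__}: {t / 1000 * 1e6:.1f} µs per call")
```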

Sudeep K Rana · Answer

This type of encoding is usually part of a numpy array. If you are using a numpy array like this:

a = np.array([1,0,3])

then there is a very simple way to convert it to 1-hot encoding:

out = (np.arange(4) == a[:,None]).astype(np.float32)

That's it.
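The comparison works through broadcasting: `a[:, None]` has shape `(3, 1)` and `np.arange(4)` has shape `(4,)`, so `==` produces a `(3, 4)` boolean array that is `True` exactly where the column index equals the label. A sketch that derives the width from the data instead of hard-coding `4`, assuming non-negative integer labels:

```python
import numpy as np

a = np.array([1, 0, 3])
n_classes = a.max() + 1

mask = np.arange(n_classes) == a[:, None]  # (3, 1) vs (4,) -> (3, 4) booleans
out = mask.astype(np.float32)
print(mask.shape)  # (3, 4)
print(out)
```

Writing `a[..., None]` instead of `a[:, None]` extends the same trick to N-D label arrays.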

eqzx · Answer

Here is a dimensionality-independent standalone solution.

This converts an N-dimensional array arr of nonnegative integers into a one-hot (N+1)-dimensional array one_hot, where one_hot[i_1,...,i_N,c] = 1 means arr[i_1,...,i_N] = c. You can recover the input via np.argmax(one_hot, -1).

import numpy as np

def expand_integer_grid(arr, n_classes):
    """
    :param arr: N dim array of size i_1, ..., i_N
    :param n_classes: C
    :returns: one-hot N+1 dim array of size i_1, ..., i_N, C
    :rtype: ndarray
    """
    one_hot = np.zeros(arr.shape + (n_classes,))
    axes_ranges = [range(arr.shape[i]) for i in range(arr.ndim)]
    flat_grids = [_.ravel() for _ in np.meshgrid(*axes_ranges, indexing='ij')]
    # Index with a tuple of arrays; newer NumPy versions reject a plain list here
    one_hot[tuple(flat_grids + [arr.ravel()])] = 1
    assert((one_hot.sum(-1) == 1).all())
    assert(np.allclose(np.argmax(one_hot, -1), arr))
    return one_hot
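A quick check of the round-trip property described above, using `np.eye(n_classes)[arr]` as a compact way to build the same (N+1)-dimensional array. This is a swapped-in construction, not the meshgrid code from this answer, and the random input is only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(0, 5, size=(2, 3, 4))   # N = 3 dimensions, labels 0..4
n_classes = 5

one_hot = np.eye(n_classes, dtype=int)[arr]  # shape (2, 3, 4, 5)

assert one_hot.shape == arr.shape + (n_classes,)
assert (one_hot.sum(-1) == 1).all()                 # exactly one 1 per position
assert np.array_equal(np.argmax(one_hot, -1), arr)  # input recovered
```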

Aaron Lelevier · Answer

Here is an example function that I wrote to do this, based on the answers above and my own use case:

import numpy as np

def label_vector_to_one_hot_vector(vector, one_hot_size=10):
    """
    Use to convert a column vector to a 'one-hot' matrix

    Example:
        vector: [[2], [0], [1]]
        one_hot_size: 3
        returns:
            [[ 0.,  0.,  1.],
             [ 1.,  0.,  0.],
             [ 0.,  1.,  0.]]

    Parameters:
        vector (np.array): of size (n, 1) to be converted
        one_hot_size (int) optional: size of 'one-hot' row vector

    Returns:
        np.array size (vector.size, one_hot_size): converted to a 'one-hot' matrix
    """
    squeezed_vector = np.squeeze(vector, axis=-1)

    one_hot = np.zeros((squeezed_vector.size, one_hot_size))

    one_hot[np.arange(squeezed_vector.size), squeezed_vector] = 1

    return one_hot

label_vector_to_one_hot_vector(vector=[[2], [0], [1]], one_hot_size=3)

Jordy Van Landeghem · Answer

I'm adding, for completeness, a simple function that uses only numpy operators:

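import numpy as np

def probs_to_onehot(output_probabilities):
    argmax_indices_array = np.argmax(output_probabilities, axis=1)
    # Use the number of classes (columns), not the number of distinct argmax values;
    # otherwise the output has too few columns, or raises IndexError, when a class never wins
    onehot_output_array = np.eye(output_probabilities.shape[1])[argmax_indices_array.reshape(-1)]
    return onehot_output_array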

It takes a probability matrix as input:

[[0.03038822 0.65810204 0.16549407 0.3797123 ]
 ...
 [0.02771272 0.2760752  0.3280924  0.33458805]]

and returns

[[0 1 0 0]
 ...
 [0 0 0 1]]
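To see why the number of columns should come from the number of classes rather than from the count of distinct argmax values, consider a batch in which one class never has the highest probability. A sketch with made-up probabilities; `probs_to_onehot_fixed` is a hypothetical name:

```python
import numpy as np

def probs_to_onehot_fixed(output_probabilities):
    output_probabilities = np.asarray(output_probabilities)
    n_classes = output_probabilities.shape[1]          # one column per class
    argmax_indices = np.argmax(output_probabilities, axis=1)
    return np.eye(n_classes, dtype=int)[argmax_indices]

probs = np.array([[0.1, 0.2, 0.7],
                  [0.6, 0.3, 0.1]])   # class 1 never wins

# Only 2 distinct argmax values (0 and 2), although there are 3 classes
print(np.unique(np.argmax(probs, axis=1)).shape[0])  # 2
print(probs_to_onehot_fixed(probs))
# Sizing the identity by that count, np.eye(2)[[2, 0]], raises IndexError: row 2 does not exist
```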