What is the main difference between a histogram, countplot and distplot in the Seaborn library?

Question

They all look the same to me, but there must be some difference.

All of them take a single column as input, and the y axis shows the count in every plot.

ImportanceOfBeingErnest · Answer

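The plotting functions pyplot.hist, seaborn.countplot and seaborn.distplot are all helper tools for plotting the frequency of a single variable. Depending on the nature of that variable, each may be more or less suitable for the visualization.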

Continuous variable

A continuous variable x can be histogrammed to show its frequency distribution.

import matplotlib.pyplot as plt
import numpy as np
x = np.random.rand(100) * 100
hist, edges = np.histogram(x, bins=np.arange(0, 101, 10))
# One bar per bin, left-aligned on the bin's lower edge
plt.bar(edges[:-1], hist, align="edge", ec="k", width=np.diff(edges))
plt.show()

The same histogram can be produced with pyplot.hist or seaborn.distplot:

plt.hist(x, bins=np.arange(0,101,10), ec="k")

or

import seaborn as sns
# Note: distplot has been deprecated since seaborn 0.11
sns.distplot(x, bins=np.arange(0, 101, 10), kde=False, hist_kws=dict(ec="k"))

distplot wraps pyplot.hist but adds some other features, for example the option to show a kernel density estimate.
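The binning shared by the approaches in this section can be checked without drawing anything. The following is a minimal sketch, assuming a seeded generator (the seed is arbitrary and not part of the original answer); it recounts each bin by hand to show that a bin holds the values v with lower edge <= v < upper edge.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility
x = rng.random(100) * 100       # 100 values in [0, 100)

edges = np.arange(0, 101, 10)   # 0, 10, ..., 100 -> 10 bins
hist, returned_edges = np.histogram(x, bins=edges)

# Every sample lands in exactly one bin, so the counts add up to len(x).
total = int(hist.sum())

# Recount by hand: bin i holds values with edges[i] <= v < edges[i + 1].
manual = np.array(
    [np.sum((x >= lo) & (x < hi)) for lo, hi in zip(edges[:-1], edges[1:])]
)

print(total)
print(np.array_equal(hist, manual))
```

The heights passed to plt.bar in the example above are exactly these counts, so the bar chart and the plt.hist call draw the same numbers.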

Discrete variable

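For a discrete variable, a histogram may or may not be suitable. If you use numpy.histogram, the bin edges need to lie exactly midway between the expected discrete observations, so that each bin is centered on one value.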

x1 = np.random.randint(1, 11, 100)
# Edges at 0.5, 1.5, ..., 10.5 center each bin on an integer
hist, edges = np.histogram(x1, bins=np.arange(1, 12) - 0.5)
plt.bar(edges[:-1], hist, align="edge", ec="k", width=np.diff(edges))
plt.xticks(np.arange(1, 11))
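Why the half-integer edges matter can be seen on a small hand-made sample (the values below are illustrative, not from the original answer). With integer edges, numpy.histogram closes the last bin on both sides, so the two largest values end up in the same bin.

```python
import numpy as np

x1 = np.array([1, 2, 2, 9, 9, 9, 10, 10])  # hand-made sample of integers in 1..10

# Integer edges 1, 2, ..., 10 give only 9 bins, and the last bin [9, 10]
# is closed on both sides, so the 9s and 10s are counted together.
hist_int, _ = np.histogram(x1, bins=np.arange(1, 11))

# Half-integer edges 0.5, 1.5, ..., 10.5 give 10 bins, one per integer.
hist_half, _ = np.histogram(x1, bins=np.arange(1, 12) - 0.5)

print(hist_int)   # [1 2 0 0 0 0 0 0 5]
print(hist_half)  # [1 2 0 0 0 0 0 0 3 2]
```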

Alternatively, one can count the unique elements of x1:

u, counts = np.unique(x1, return_counts=True)
plt.bar(u, counts, align="center", ec="k", width=1)
plt.xticks(u)

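This gives the same plot as above. The main difference appears when not every possible observation is occupied. Say 5 is not part of the data at all: the histogram approach still shows it as an empty bin, whereas it is simply absent from the unique elements.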

x2 = np.random.choice([1, 2, 3, 4, 6, 7, 8, 9, 10], size=100)
# Left panel: histogram, which keeps an empty bin for the missing 5
plt.subplot(1, 2, 1)
plt.title("histogram")
hist, edges = np.histogram(x2, bins=np.arange(1, 12) - 0.5)
plt.bar(edges[:-1], hist, align="edge", ec="k", width=np.diff(edges))
plt.xticks(np.arange(1, 11))
# Right panel: unique counts, plotted as categories, with no slot for 5
plt.subplot(1, 2, 2)
plt.title("counts")
u, counts = np.unique(x2, return_counts=True)
plt.bar(u.astype(str), counts, align="center", ec="k", width=1)

Counting the unique values, as in the "counts" panel, is what seaborn.countplot does.

sns.countplot(x=x2, color="C0")

It is therefore suitable for discrete or categorical variables.
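The difference between the two approaches for a value that never occurs can also be checked numerically. This is a small sketch on a hand-made sample that contains no 5; the dictionary names are illustrative only.

```python
import numpy as np

x2 = np.array([1, 2, 2, 3, 4, 4, 6, 7, 8, 9, 10, 10])  # hand-made sample without any 5

# Histogram approach: the bin for 5 exists and reports a count of zero.
hist, edges = np.histogram(x2, bins=np.arange(1, 12) - 0.5)
centers = (edges[:-1] + edges[1:]) / 2
hist_by_value = {int(c): int(n) for c, n in zip(centers, hist)}

# Unique-count approach: 5 simply does not appear among the keys.
u, counts = np.unique(x2, return_counts=True)
counts_by_value = {int(v): int(n) for v, n in zip(u, counts)}

print(hist_by_value[5])      # 0
print(5 in counts_by_value)  # False
```

For every value that does occur, both dictionaries hold the same count, which is why the two plots agree wherever the data is populated.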

Summary

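The functions pyplot.hist, seaborn.countplot and seaborn.distplot all act as wrappers around a matplotlib bar plot and can be used when drawing such a bar plot by hand is considered too cumbersome.
For continuous variables, use pyplot.hist or seaborn.distplot. For discrete variables, seaborn.countplot is more convenient.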