Pythonで文字列のリストからヒストグラムを作成する方法は？

Question

文字列のリストがあります：

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']

文字の頻度分布を表示するためのヒストグラムを作成します。次のコードを使用して、各文字の数を含むリストを作成できます。

from itertools import groupby b = [len(list(group)) for key, group in groupby(a)]

ヒストグラムを作成するにはどうすればよいですか？リストaには、100万個のそのような要素があります。

notconfusing · Accepted Answer

Pandasを使用すると非常に簡単です。

import pandas from collections import Counter a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e'] letter_counts = Counter(a) df = pandas.DataFrame.from_dict(letter_counts, orient='index') df.plot(kind='bar')

Counterは頻度をカウントしているため、プロットタイプは'bar' not 'hist'。

histogram of letter counts

drammock · Answer

ここに簡潔なオールパンダのアプローチがあります：

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e'] pd.Series(a).value_counts().plot('bar')

Ramon Martinez · Answer

上記で指摘した@notconfusingのように、これはPandasとCounterで解決できます。何らかの理由でPandasを使用する必要がない場合次のコードの関数を使用して、matplotlibのみで：

from collections import Counter import numpy as np import matplotlib.pyplot as plt a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e'] letter_counts = Counter(a) def plot_bar_from_counter(counter, ax=None): """" This function creates a bar plot from a counter. :param counter: This is a counter object, a dictionary with the item as the key and the frequency as the value :param ax: an axis of matplotlib :return: the axis wit the object in it """ if ax is None: fig = plt.figure() ax = fig.add_subplot(111) frequencies = counter.values() names = counter.keys() x_coordinates = np.arange(len(counter)) ax.bar(x_coordinates, frequencies, align='center') ax.xaxis.set_major_locator(plt.FixedLocator(x_coordinates)) ax.xaxis.set_major_formatter(plt.FixedFormatter(names)) return ax plot_bar_from_counter(letter_counts) plt.show()

これは

Martijn Pieters · Answer

groupby()（入力をソートする必要がある）を使用する代わりに、 collections.Counter() ;を使用します。これは、入力をカウントするためだけに中間リストを作成する必要はありません。

_from collections import Counter counts = Counter(a) _

「ヒストグラム」とみなすものを実際に指定していません。端末でこれを実行したいと仮定します。

_width = 120 # Adjust to desired width longest_key = max(len(key) for key in counts) graph_width = width - longest_key - 2 widest = counts.most_common(1)[0][1] scale = graph_width / float(widest) for key, size in sorted(counts.items()): print('{}: {}'.format(key, int(size * scale) * '*')) _

デモ：

_>>> from collections import Counter >>> a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e'] >>> counts = Counter(a) >>> width = 120 # Adjust to desired width >>> longest_key = max(len(key) for key in counts) >>> graph_width = width - longest_key - 2 >>> widest = counts.most_common(1)[0][1] >>> scale = graph_width / float(widest) >>> for key, size in sorted(counts.items()): ... print('{}: {}'.format(key, int(size * scale) * '*')) ... a: ********************************************************************************************* b: ********************************************** c: ********************************************************************** d: *********************** e: ********************************************************************************************************************* _

numpy.histogram() および matplotlib.pyplot.hist() 関数には、より洗練されたツールがあります。これらはあなたのために集計を行い、matplotlib.pyplot.hist()もグラフ出力を提供します。

tommyo · Answer

チェックアウト matplotlib.pyplot.bar。もあります numpy.histogramより広いビンが必要な場合により柔軟です。

Mitul Panchal · Answer

Pythonで文字ヒストグラムを作成するシンプルで効果的な方法

import numpy as np import matplotlib.pyplot as plt from collections import Counter a = [] count =0 d = dict() filename = raw_input("Enter file name: ") with open(filename,'r') as f: for Word in f: for letter in Word: if letter not in d: d[letter] = 1 else: d[letter] +=1 num = Counter(d) x = list(num.values()) y = list(num.keys()) x_coordinates = np.arange(len(num.keys())) plt.bar(x_coordinates,x) plt.xticks(x_coordinates,y) plt.show() print x,y