How to find the most common elements of a list?

Question

Assume I have the following list:

['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']

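I am trying to count how many times each word appears and display the top 3.

However, I only want the top three among words whose first letter is capitalized, ignoring every word that does not start with a capital letter.

I am sure there is a better way than this, but my idea was to do the following: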

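put the first word in the list into another list called uniquewords
delete the first word and all its duplicates from the original list
add the new first word to uniquewords
delete the first word and all its duplicates from the original list
etc...
until the original list is empty....
count how many times each word in uniquewords appears in the original list
find the top 3 and print them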
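
For reference, the steps listed above can be sketched roughly as follows. This is only an illustration of the plan as described, not a recommended solution; the function name `top_three_capitalized` and the shortened sample list are made up for the example.

```python
def top_three_capitalized(words):
    """Follow the steps from the question: collect unique words, then count them."""
    remaining = list(words)  # work on a copy so the caller's list is untouched
    uniquewords = []
    while remaining:
        first = remaining[0]
        uniquewords.append(first)
        # Drop the first word and all of its duplicates.
        remaining = [w for w in remaining if w != first]
    # Count each unique word in the original list, keeping capitalized words only.
    counts = [(w, words.count(w)) for w in uniquewords if w[:1].isupper()]
    # Sort by count, highest first; ties keep their first-seen order.
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts[:3]


# Shortened, hypothetical sample (not the full list from the question).
word_list = ['Jellicle', 'Cats', 'are', 'Jellicle', 'and', 'And', 'Cats', 'Jellicle', '']
print(top_three_capitalized(word_list))
```

Note that calling `words.count(w)` once per unique word makes this quadratic in the worst case, which is one reason a dictionary-based count is usually preferred.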

Johnsyweb · Accepted Answer

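If you are using an earlier version of Python, or you have a very good reason to roll your own word counter (I'd like to hear it!), you could try the following approach using a dict.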

    Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> Word_list = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
    >>> Word_counter = {}
    >>> for Word in Word_list:
    ...     if Word in Word_counter:
    ...         Word_counter[Word] += 1
    ...     else:
    ...         Word_counter[Word] = 1
    ...
    >>> popular_words = sorted(Word_counter, key = Word_counter.get, reverse = True)
    >>>
    >>> top_3 = popular_words[:3]
    >>>
    >>> top_3
    ['Jellicle', 'Cats', 'and']

Top tip: the interactive Python interpreter is your friend whenever you want to play with an algorithm like this.

Mark Byers · Answer

In Python 2.7 and above there is a class called Counter which can help you:

    from collections import Counter

    # Keep only the words whose first character is uppercase.
    words_to_count = (Word for Word in Word_list if Word[:1].isupper())
    c = Counter(words_to_count)
    print(c.most_common(3))

Result:

    [('Jellicle', 6), ('Cats', 5), ('And', 2)]

You wrote: "I am quite new to programming, so please try to do it in the most barebones fashion."

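You could instead do this using a dictionary where the key is a word and the value is the count for that word. First iterate over the words, adding each one to the dictionary if it is not present, or increasing its count if it is. Then, to find the top three, you can either use a simple O(n*log(n)) sort and take the first three elements of the result, or use an O(n) algorithm that scans the items once while remembering only the top three.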
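
A minimal sketch of that dictionary approach, assuming Python 3. The helper names (`count_words`, `top_three_by_sorting`, `top_three_by_scan`) and the short sample list are made up for illustration, and `heapq.nlargest` stands in here for the single-scan idea, since it keeps only the three best candidates while walking the items once.

```python
import heapq


def count_words(words):
    """Build a {word: count} dictionary by hand."""
    counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


def top_three_by_sorting(counts):
    # O(n log n): sort every (word, count) pair, then slice off the first three.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:3]


def top_three_by_scan(counts):
    # One pass over the items, remembering only the three largest counts.
    return heapq.nlargest(3, counts.items(), key=lambda kv: kv[1])


# Hypothetical sample; only capitalized words are counted.
sample = ['Cats', 'are', 'Cats', 'Jellicle', 'Cats', 'Jellicle', 'and', 'Moon']
counts = count_words(w for w in sample if w[:1].isupper())
print(top_three_by_sorting(counts))
print(top_three_by_scan(counts))
```

For a list this small the two give the same answer and the difference in speed is negligible; the scan only starts to matter when there are many distinct words.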

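An important observation for beginners is that by using built-in classes designed for the purpose you can save yourself a lot of work and often get better performance as well. It is good to be familiar with the standard library and the features it offers.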

unlockme · Answer

To return a list containing the most common words:

    from collections import Counter

    words = ["i", "love", "you", "i", "you", "a", "are", "you", "you", "fine", "green"]
    # Keep only the word from each (word, count) pair.
    most_common_words = [Word for Word, Word_count in Counter(words).most_common(3)]
    print(most_common_words)

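This prints (the third entry is a tie among the words that occur only once, so which one you get can depend on the Python version):

    ['you', 'i', 'a']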

The 3 in most_common(3) specifies the number of items to return. Counter(words).most_common() returns a list of tuples, each with the word as its first member and the frequency as its second. The tuples are ordered by the frequency of the word.

    most_common = [item for item in Counter(words).most_common()]
    print(str(most_common))
    [('you', 4), ('i', 2), ('a', 1), ('are', 1), ('green', 1), ('love',1), ('fine', 1)]

The "Word for Word, Word_count in" part extracts only the first member of each tuple.
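
As a small illustration of that tuple unpacking (a sketch assuming Python 3; the names `pairs`, `top_words` and `top_counts` are just for this example), the same pattern can pull out either member:

```python
from collections import Counter

words = ["i", "love", "you", "i", "you", "a", "are", "you", "you", "fine", "green"]

# most_common(2) returns a list of (word, count) tuples.
pairs = Counter(words).most_common(2)

# Unpack each tuple and keep whichever member is needed.
top_words = [word for word, count in pairs]
top_counts = [count for word, count in pairs]

print(pairs)
print(top_words)
print(top_counts)
```

Only the top two are requested here because their counts are unambiguous; from third place on, the sample has ties.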

Tim Seed · Answer

Isn't it just this....

    Word_list = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']

    from collections import Counter
    c = Counter(Word_list)
    c.most_common(3)

Which outputs:

[('Jellicle', 6), ('Cats', 5), ('are', 3)]

mmmdreg · Answer

nltk is convenient for a lot of language processing tasks. It has methods for frequency distributions built in. Something like:

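    import nltk

    fdist = nltk.FreqDist(your_list)  # creates a frequency distribution from a list
    most_common = fdist.max()         # returns a single element
    # In NLTK 3, FreqDist is a Counter subclass, so use most_common();
    # the older fdist.keys()[:3] relied on NLTK 2 returning keys sorted by frequency.
    top_three = [item for item, count in fdist.most_common(3)]  # returns a list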

Chrigi · Answer

A simple, two-line solution that does not require any extra modules is the following code:

    lst = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']

    lst_sorted = sorted([ss for ss in set(lst) if len(ss) > 0 and ss.istitle()], key=lst.count, reverse=True)
    print(lst_sorted[0:3])

Output:

    ['Jellicle', 'Cats', 'And']

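The expression in square brackets returns all unique strings in the list that are not empty and start with a capital letter. The sorted() function then sorts them by how often they appear in the list (using lst.count as the key), in reverse order.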
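
One caveat with using lst.count as the sort key is that the whole list is rescanned for every unique word. If that ever matters, a variant along these lines counts everything once up front. This is only a sketch assuming Python 3.7+ (where dictionaries keep insertion order); it does pull in collections.Counter, and the shortened list and the names `counts`, `candidates` and `top` are made up for the example.

```python
from collections import Counter

# Shortened, hypothetical sample list.
lst = ['Jellicle', 'Cats', 'are', 'Jellicle', 'And', 'Cats', 'Jellicle', 'Moon', '']

# Count every string once instead of calling lst.count per unique word.
counts = Counter(lst)

# Same filter as above: non-empty, title-cased strings only.
candidates = [s for s in counts if s and s.istitle()]

# Sort by the precomputed counts, most frequent first.
top = sorted(candidates, key=counts.get, reverse=True)[:3]
print(top)
```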

JJC · Answer

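The answer from @Mark Byers is best, but if you are on a version of Python < 2.7 (but at least 2.5, which is pretty old these days), you can replicate the functionality of the Counter class very simply with defaultdict (otherwise, for python < 2.5, three extra lines of code are needed before d[i] += 1, as in @Johnsyweb's answer).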

    from collections import defaultdict

    class Counter():
        ITEMS = []

        def __init__(self, items):
            d = defaultdict(int)
            for i in items:
                d[i] += 1
            # Store (item, count) pairs sorted by count, highest first.
            self.ITEMS = sorted(d.iteritems(), reverse=True, key=lambda i: i[1])

        def most_common(self, n):
            return self.ITEMS[:n]

Then you use the class exactly as in Mark Byers's answer, i.e.:

    words_to_count = (Word for Word in Word_list if Word[:1].isupper())
    c = Counter(words_to_count)
    print c.most_common(3)

Matthew D. Scholefield · Answer

There are two standard library ways to find the most frequent value in a list:

statistics.mode:

    from statistics import mode

    most_common = mode([3, 2, 2, 2, 1, 1])  # 2
    most_common = mode([3, 2])  # StatisticsError: no unique mode (before Python 3.8; 3.8+ returns 3)

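Raises an exception if there is no unique most frequent value (before Python 3.8; later versions return the first mode encountered)
Only returns a single most frequent value

collections.Counter.most_common: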

    from collections import Counter

    # most_common(1) returns a one-element list, so take [0] to unpack the pair.
    most_common, count = Counter([3, 2, 2, 2, 1, 1]).most_common(1)[0]  # 2, 3
    (most_common_1, count_1), (most_common_2, count_2) = Counter([3, 2, 2]).most_common(2)  # (2, 2), (3, 1)

Can return multiple most frequent values
Returns the element count as well

So for the question as asked, the second one is the right choice. As a side note, the two perform about the same.
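
If what you actually need is every value tied for most frequent, Counter can cover that too. The helper below is a hypothetical sketch (the name `most_frequent` is made up for this example); Python 3.8 and later also provide statistics.multimode for this case.

```python
from collections import Counter


def most_frequent(values):
    """Return every value that shares the highest count, in first-seen order."""
    counts = Counter(values)
    if not counts:
        return []  # nothing to count in an empty input
    best = max(counts.values())
    return [value for value, count in counts.items() if count == best]


print(most_frequent([3, 2, 2, 2, 1, 1]))
print(most_frequent([3, 2]))
```

Unlike mode on older Python versions, this does not raise when several values tie; it simply returns all of them.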

jvdneste · Answer

The simple way of doing this would be (assuming your list is in 'l'):

    >>> counter = {}
    >>> for i in l: counter[i] = counter.get(i, 0) + 1
    >>> sorted([ (freq,Word) for Word, freq in counter.items() ], reverse=True)[:3]
    [(6, 'Jellicle'), (5, 'Cats'), (3, 'to')]

Complete sample:

    >>> l = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
    >>> counter = {}
    >>> for i in l: counter[i] = counter.get(i, 0) + 1
    ...
    >>> counter
    {'and': 3, '': 1, 'merry': 1, 'rise.': 1, 'small;': 1, 'Moon': 1, 'cheerful': 1, 'bright': 1, 'Cats': 5, 'are': 3, 'have': 2, 'bright,': 1, 'for': 1, 'their': 1, 'rather': 1, 'when': 1, 'to': 3, 'airs': 1, 'black': 2, 'They': 1, 'practise': 1, 'caterwaul.': 1, 'pleasant': 1, 'hear': 1, 'they': 1, 'white,': 1, 'wait': 1, 'And': 2, 'like': 1, 'Jellicle': 6, 'eyes;': 1, 'the': 1, 'faces,': 1, 'graces': 1}
    >>> sorted([ (freq,Word) for Word, freq in counter.items() ], reverse=True)[:3]
    [(6, 'Jellicle'), (5, 'Cats'), (3, 'to')]

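By simple I mean that it works in nearly every version of Python.

If you don't understand some of the functions used in this sample, you can always look them up in the interpreter (after pasting the code above):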

    >>> help(counter.get)
    >>> help(sorted)

drew · Answer

If you are using Counter, or have created your own Counter-style dict, and want to show the name of each item along with its count, you can iterate over it like so:

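    from collections import Counter

    # most_common(10) gives a list of (word, count) pairs; iterating the
    # Counter itself would only yield the words.
    top_10_words = Counter(my_long_list_of_words).most_common(10)
    for Word in top_10_words:
        # print the word
        print(Word[0])
        # print the count
        print(Word[1])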

Or to iterate through this in a template:

    {% for Word in top_10_words %}
      <p>Word: {{ Word.0 }}</p>
      <p>Count: {{ Word.1 }}</p>
    {% endfor %}

Hope this helps someone.