Count the frequency of words in a list and sort by frequency

Question

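I am using Python 3.3.

I need to create two lists: one for the unique words and one for the frequencies of those words.

I have to sort the unique word list based on the frequency list, so that the word with the highest frequency comes first in the list.

I have the design in text but am unsure how to implement it in Python.

The methods I have found so far use either Counter or dictionaries, which we have not learned. I have already created the list from the file containing all the words, but I do not know how to find the frequency of each word in the list. I know I will need a loop to do this but cannot figure it out.

Here is the basic design: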

    original list = ["the", "car", ....]
    newlst = []
    frequency = []
    for word in the original list
        if word not in newlst:
            newlst.append(word)
            set frequency = 1
        else
            increase the frequency
    sort newlst based on frequency list

Ashif Abdulrahman · Answer

Use this:

    from collections import Counter

    list1 = ['Apple', 'Egg', 'Apple', 'banana', 'Egg', 'Apple']
    counts = Counter(list1)
    print(counts)
    # Counter({'Apple': 3, 'Egg': 2, 'banana': 1})
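To get the two lists the question asks for, the pairs returned by `most_common()` can be split apart. This is only a minimal sketch using the same sample list; the names `pairs`, `unique_words` and `frequencies` are illustrative and not part of the answer.

```python
from collections import Counter

list1 = ['Apple', 'Egg', 'Apple', 'banana', 'Egg', 'Apple']
counts = Counter(list1)

# most_common() returns (word, count) pairs, highest count first
pairs = counts.most_common()

# Split the pairs into two parallel lists
unique_words = [word for word, _ in pairs]
frequencies = [count for _, count in pairs]

print(unique_words)  # ['Apple', 'Egg', 'banana']
print(frequencies)   # [3, 2, 1]
```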

tdolydong · Answer

You can use

    from collections import Counter

which is supported since Python 2.7. See the collections.Counter documentation for details.

1.

    >>> c = Counter('abracadabra')
    >>> c.most_common(3)
    [('a', 5), ('b', 2), ('r', 2)]

Using a dict:

    >>> d = {1: 'one', 2: 'one', 3: 'two'}
    >>> c = Counter(d.values())
    >>> c.most_common()
    [('one', 2), ('two', 1)]

But you have to read the file first and convert it to a dict.

2. This is the example from the Python docs, using re and Counter:

    >>> # Find the ten most common words in Hamlet
    >>> import re
    >>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
    >>> Counter(words).most_common(10)
    [('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
     ('you', 554), ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]

kyle k · Answer

    # Read the words into a list.
    with open("test.txt", "r") as f:
        words = f.read().split()

    # Remove duplicate words and sort them.
    uniqWords = sorted(set(words))
    for word in uniqWords:
        print(words.count(word), word)

Gadi · Answer

You can use reduce() for a functional approach:

    from functools import reduce  # needed in Python 3

    words = "Apple banana Apple strawberry banana lemon"
    reduce(lambda d, c: d.update([(c, d.get(c, 0) + 1)]) or d, words.split(), {})

returns:

    {'Apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1}

Milo P · Answer

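One way would be to make a list of lists, with each sub-list in the new list containing a word and a count:

    list1 = []  # this is your original list of words
    list2 = []  # this is a new list of [word, count] pairs

    for word in list1:
        for pair in list2:
            if pair[0] == word:
                pair[1] += 1
                break
        else:
            # no existing pair matched, so start a new count
            list2.append([word, 1])

Alternatively, using try/except:

    for word in list1:
        try:
            i = [pair[0] for pair in list2].index(word)
            list2[i][1] += 1
        except ValueError:
            list2.append([word, 1])

This is less efficient than using a dictionary, but it uses more basic concepts.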
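For comparison, the dictionary version mentioned above might look like the following. This is a sketch only; the contents of `list1` are made-up sample data and `counts` is an illustrative name.

```python
list1 = ["the", "car", "the", "red", "car", "the"]  # sample data (assumed)

counts = {}
for word in list1:
    # dict.get returns 0 for a word that has not been seen yet
    counts[word] = counts.get(word, 0) + 1

print(counts)
```

Each lookup in the dictionary takes roughly constant time, whereas the list-of-lists version has to scan `list2` for every word.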

Reza Abtin · Answer

Yet another solution, with a different algorithm and without using collections:

    def countWords(A):
        dic = {}
        for x in A:
            if x not in dic:  # Python 2.7: if not dic.has_key(x):
                dic[x] = A.count(x)
        return dic

    dic = countWords(['Apple', 'Egg', 'Apple', 'banana', 'Egg', 'Apple'])
    sorted_items = sorted(dic.items())  # if you want it sorted
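Note that `sorted(dic.items())` orders the pairs alphabetically by word. If the goal is the order the question asks for, highest frequency first, one option is to sort on the count instead. A sketch, assuming a dictionary shaped like the one `countWords` returns; the names `by_freq`, `words` and `freqs` are illustrative.

```python
# A word -> count dictionary, shaped like the result of countWords (sample values)
dic = {'banana': 1, 'Apple': 3, 'Egg': 2}

# Sort the (word, count) pairs by count, highest first
by_freq = sorted(dic.items(), key=lambda item: item[1], reverse=True)

# Split into the two lists the question asks for
words = [word for word, _ in by_freq]
freqs = [count for _, count in by_freq]

print(words)  # ['Apple', 'Egg', 'banana']
print(freqs)  # [3, 2, 1]
```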

Karan Goel · Answer

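The ideal way is to use a dictionary that maps each word to its count. If you can't use that, you can use two lists: one storing the words and the other storing their counts. Note that the order of the words and counts matters here, because each count must stay at the same index as its word. Implementing this is harder and not very efficient.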
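The two-list approach described above can be sketched as follows. This is one possible implementation under stated assumptions, not the answerer's own code: `original_list` holds made-up sample data, and `order` is a helper list introduced here for the sort.

```python
original_list = ["the", "car", "is", "red", "the", "is", "the"]  # sample data (assumed)

words = []      # unique words, in order of first appearance
frequency = []  # frequency[i] is the count for words[i]

for word in original_list:
    if word not in words:
        words.append(word)
        frequency.append(1)
    else:
        frequency[words.index(word)] += 1

# Sort the positions by count, highest first, then rebuild both lists in that order
order = sorted(range(len(words)), key=lambda i: frequency[i], reverse=True)
words = [words[i] for i in order]
frequency = [frequency[i] for i in order]

print(words)      # ['the', 'is', 'car', 'red']
print(frequency)  # [3, 2, 1, 1]
```

Sorting the index positions rather than the lists themselves keeps each word paired with its count. Words with equal counts keep their order of first appearance because Python's sort is stable.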

johannestaas · Answer

Using Counter would be the best way, but if you don't want to do that, you can implement it yourself this way:

    # The list you already have
    word_list = ['words', ..., 'other', 'words']

    # Get a set of unique words from the list
    word_set = set(word_list)

    # Create your frequency dictionary
    freq = {}

    # Iterate through them, once per unique word
    for word in word_set:
        freq[word] = word_list.count(word) / float(len(word_list))

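freq will end up holding the frequency of each word in the list you already have.

You need float in there to convert one of the integers to a float, so that the resulting value is a float. (This matters in Python 2; in Python 3, / already performs true division.)

Edit:

If you can't use a dict or set, here is another, less efficient way: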

    # The list you already have
    word_list = ['words', ..., 'other', 'words']

    unique_words = []
    for word in word_list:
        if word not in unique_words:
            unique_words += [word]

    word_frequencies = []
    for word in unique_words:
        word_frequencies += [float(word_list.count(word)) / len(word_list)]

    for i in range(len(unique_words)):
        # str() is needed because a float cannot be concatenated to a string
        print(unique_words[i] + ": " + str(word_frequencies[i]))

The indices of unique_words and word_frequencies will match.

Michaelpanicci · Answer

A pandas answer:

    import pandas as pd

    original_list = ["the", "car", "is", "red", "red", "red", "yes", "it", "is", "is", "is"]
    pd.Series(original_list).value_counts()

If you want it in ascending order instead, it is as simple as:

    pd.Series(original_list).value_counts().sort_values(ascending=True)

skay · Answer

Here is code that addresses your question. The is_word() check accepts only tokens made up entirely of letters, digits, or '$', and the hashmap is a Python dictionary.

    def is_word(word):
        cnt = 0
        for c in word:
            if 'a' <= c <= 'z' or 'A' <= c <= 'Z' or '0' <= c <= '9' or c == '$':
                cnt += 1
        if cnt == len(word):
            return True
        return False

    def words_freq(s):
        d = {}
        for i in s.split():
            if is_word(i):
                if i in d:
                    d[i] += 1
                else:
                    d[i] = 1
        return d

    print(words_freq('the the sky$ is blue not green'))

Paige Goulding · Answer

Try this:

    words = []
    freqs = []

    for line in sorted(original_list):  # takes all the lines in the text and sorts them
        line = line.rstrip()            # strips trailing whitespace
        if line not in words:           # checks whether line is already in words
            words.append(line)          # if not, adds it to the end of words
            freqs.append(1)             # and adds 1 to the end of freqs
        else:
            index = words.index(line)   # if it is, finds its position in words
            freqs[index] += 1           # and adds 1 to the matching index in freqs