ハフマン符号化と復号化のツリーを作成するにはどうすればよいですか？

Question

私の任務では、ハフマンツリーのエンコードとデコードを行います。ツリーの作成に問題があり、行き詰まっています。

Printステートメントは気にしないでください。これらは、関数の実行時に出力が何であるかをテストして確認するためのものです。

最初のforループでは、テスト用にメインブロックで使用したテキストファイルからすべての値とインデックスを取得しました。

2番目のforループでは、すべてのものを優先キューに挿入しました。

私は次にどこに行くべきかについてとても行き詰まっています-私はノードを作ろうとしていますが、どのように進めるかについて混乱しています。私がこれを正しく行っているかどうか誰かに教えてもらえますか？

def _create_code(self, frequencies): '''(HuffmanCoder, sequence(int)) -> NoneType iterate over index into the sequence keeping it 256 elements long, ''' #fix docstring p = PriorityQueue() print frequencies index = 0 for value in frequencies: if value != 0: print value #priority print index #Elm print '-----------' index = index + 1 for i in range(len(frequencies)): if frequencies[i] != 0: p.insert(i, frequencies[i]) print i,frequencies[i] if p.is_empty(): a = p.get_min() b = p.get_min() n1 = self.HuffmanNode(None, None, a) n2 = self.HuffmanNode(None, None, b) print a, b, n1, n2 while not p.is_empty(): p.get_min()

最初の2つを手動で挿入してツリーを開始しましたが、それは正しいですか？

どうすれば続行できますか？私はそれの考えを知っています、ちょうどコードに関して私は非常に立ち往生しています。

ちなみに、これはpythonを使用しています。ウィキペディアを見てみました。手順はわかっています。コードのヘルプが必要で、どのように続行する必要がありますか。ありがとうございます。

HuffmanNodeは、次のネストされたクラスに由来します。

class HuffmanNode(object): def __init__(self, left=None, right=None, root=None): self.left = left self.right = right self.root = root

Dave · Answer

ウィキペディアのハフマンアルゴリズムは、ノードツリーの作成方法を正確に示しているため、プログラムはそのアルゴリズムまたはそれに類似したアルゴリズムに基づくことができます。これはPythonプログラムで、対応するウィキペディアのアルゴリズムステップを示すコメントが付いています。テストデータは、英語のテキストのアルファベットの文字の頻度です。

ノードツリーが作成されたら、それを下に移動して、データセット内の各シンボルにハフマンコードを割り当てる必要があります。これは宿題なので、そのステップはあなた次第ですが、再帰的アルゴリズムはそれを処理するための最も簡単で最も自然な方法です。あと6行のコードです。

import queue class HuffmanNode(object): def __init__(self, left=None, right=None, root=None): self.left = left self.right = right self.root = root # Why? Not needed for anything. def children(self): return((self.left, self.right)) freq = [ (8.167, 'a'), (1.492, 'b'), (2.782, 'c'), (4.253, 'd'), (12.702, 'e'),(2.228, 'f'), (2.015, 'g'), (6.094, 'h'), (6.966, 'i'), (0.153, 'j'), (0.747, 'k'), (4.025, 'l'), (2.406, 'm'), (6.749, 'n'), (7.507, 'o'), (1.929, 'p'), (0.095, 'q'), (5.987, 'r'), (6.327, 's'), (9.056, 't'), (2.758, 'u'), (1.037, 'v'), (2.365, 'w'), (0.150, 'x'), (1.974, 'y'), (0.074, 'z') ] def create_tree(frequencies): p = queue.PriorityQueue() for value in frequencies: # 1. Create a leaf node for each symbol p.put(value) # and add it to the priority queue while p.qsize() > 1: # 2. While there is more than one node l, r = p.get(), p.get() # 2a. remove two highest nodes node = HuffmanNode(l, r) # 2b. create internal node with children p.put((l[0]+r[0], node)) # 2c. add new node to queue return p.get() # 3. tree is complete - return root node node = create_tree(freq) print(node) # Recursively walk the tree down to the leaves, # assigning a code value to each symbol def walk_tree(node, prefix="", code={}): return(code) code = walk_tree(node) for i in sorted(freq, reverse=True): print(i[1], '{:6.2f}'.format(i[0]), code[i[1]])

アルファベットデータで実行すると、結果のハフマンコードは次のようになります。

e 12.70 100 t 9.06 000 a 8.17 1110 o 7.51 1101 i 6.97 1011 n 6.75 1010 s 6.33 0111 h 6.09 0110 r 5.99 0101 d 4.25 11111 l 4.03 11110 c 2.78 01001 u 2.76 01000 m 2.41 00111 w 2.37 00110 f 2.23 00100 g 2.02 110011 y 1.97 110010 p 1.93 110001 b 1.49 110000 v 1.04 001010 k 0.75 0010111 j 0.15 001011011 x 0.15 001011010 q 0.10 001011001 z 0.07 001011000

user1762278 · Answer

@Davewalk_treeにツリー処理コードがありません

# Recursively walk the tree down to the leaves, # assigning a code value to each symbol def walk_tree(node, prefix="", code={}): if isinstance(node[1].left[1], HuffmanNode): walk_tree(node[1].left,prefix+"0", code) else: code[node[1].left[1]]=prefix+"0" if isinstance(node[1].right[1],HuffmanNode): walk_tree(node[1].right,prefix+"1", code) else: code[node[1].right[1]]=prefix+"1" return(code)

Pavel Prochazka · Answer

ディクショナリ{label:code}と結果のグラフを含む再帰ディクショナリtreeを返すもう1つのソリューション。入力valsは辞書{label:freq}の形式です。

def assign_code(nodes, label, result, prefix = ''): childs = nodes[label] tree = {} if len(childs) == 2: tree['0'] = assign_code(nodes, childs[0], result, prefix+'0') tree['1'] = assign_code(nodes, childs[1], result, prefix+'1') return tree else: result[label] = prefix return label def Huffman_code(_vals): vals = _vals.copy() nodes = {} for n in vals.keys(): # leafs initialization nodes[n] = [] while len(vals) > 1: # binary tree creation s_vals = sorted(vals.items(), key=lambda x:x[1]) a1 = s_vals[0][0] a2 = s_vals[1][0] vals[a1+a2] = vals.pop(a1) + vals.pop(a2) nodes[a1+a2] = [a1, a2] code = {} root = a1+a2 tree = {} tree = assign_code(nodes, root, code) # assignment of the code for the given binary tree return code, tree

これは次のように使用できます。

freq = [ (8.167, 'a'), (1.492, 'b'), (2.782, 'c'), (4.253, 'd'), (12.702, 'e'),(2.228, 'f'), (2.015, 'g'), (6.094, 'h'), (6.966, 'i'), (0.153, 'j'), (0.747, 'k'), (4.025, 'l'), (2.406, 'm'), (6.749, 'n'), (7.507, 'o'), (1.929, 'p'), (0.095, 'q'), (5.987, 'r'), (6.327, 's'), (9.056, 't'), (2.758, 'u'), (1.037, 'v'), (2.365, 'w'), (0.150, 'x'), (1.974, 'y'), (0.074, 'z') ] vals = {l:v for (v,l) in freq} code, tree = Huffman_code(vals) text = 'hello' # text to encode encoded = ''.join([code[t] for t in text]) print('Encoded text:',encoded) decoded = [] i = 0 while i < len(encoded): # decoding using the binary graph ch = encoded[i] act = tree[ch] while not isinstance(act, str): i += 1 ch = encoded[i] act = act[ch] decoded.append(act) i += 1 print('Decoded text:',''.join(decoded))

Graphvizでツリーを次のように視覚化できます：

この図は、次のスクリプトによって生成されました（Graphvizが必要です）。

def draw_tree(tree, prefix = ''): if isinstance(tree, str): descr = 'N%s [label="%s:%s", fontcolor=blue, fontsize=16, width=2, shape=box];
'%(prefix, tree, prefix) else: # Node description descr = 'N%s [label="%s"];
'%(prefix, prefix) for child in tree.keys(): descr += draw_tree(tree[child], prefix = prefix+child) descr += 'N%s -> N%s;
'%(prefix,prefix+child) return descr import subprocess with open('graph.dot','w') as f: f.write('digraph G {
') f.write(draw_tree(tree)) f.write('}') subprocess.call('dot -Tpng graph.dot -o graph.png', Shell=True)

B Abali · Answer

@DaveクラスHuffmanNode（object）には微妙なバグがあります。 2つの周波数が等しい場合、例外がスローされます。たとえば、

 freq = [ (200/3101, 'd'), (100/3101, 'e'), (100/3101, 'f') ]

次に、TypeError：unorderable types：HuffmanNode（）<str（）が発生します。問題は、PriorityQueueの実装に関係しています。タプルの最初の要素が等しい場合、PriorityQueueは2番目の要素を比較したいと考えています。そのうちの1つはpythonオブジェクトです。ltメソッドを追加します。クラスと問題が解決されます。

 def __lt__(self,other): return 0

max_new · Answer

私は今日この問題を解決し、上記の応答で結果を一致させようとしました。ほとんどの場合、このソリューションはうまく機能しますが、直感的ではないと思うのは、非ノード（リーフ）の印刷に[0]と[1]を追加することだけです。しかし、これは奇跡の質問に答えます-基本的に、任意のトラバーサルメカニズムを使用して印刷できます

import queue class HuffmanNode(object): def __init__(self,left=None,right=None,root=None): self.left = left self.right = right self.root = root def children(self): return (self.left,self.right) def preorder(self,path=None): if path is None: path = [] if self.left is not None: if isinstance(self.left[1], HuffmanNode): self.left[1].preorder(path+[0]) else: print(self.left,path+[0]) if self.right is not None: if isinstance(self.right[1], HuffmanNode): self.right[1].preorder(path+[1]) else: print(self.right,path+[1]) freq = [ (8.167, 'a'), (1.492, 'b'), (2.782, 'c'), (4.253, 'd'), (12.702, 'e'),(2.228, 'f'), (2.015, 'g'), (6.094, 'h'), (6.966, 'i'), (0.153, 'j'), (0.747, 'k'), (4.025, 'l'), (2.406, 'm'), (6.749, 'n'), (7.507, 'o'), (1.929, 'p'), (0.095, 'q'), (5.987, 'r'), (6.327, 's'), (9.056, 't'), (2.758, 'u'), (1.037, 'v'), (2.365, 'w'), (0.150, 'x'), (1.974, 'y'), (0.074, 'z') ] def encode(frequencies): p = queue.PriorityQueue() for item in frequencies: p.put(item) #invariant that order is ascending in the priority queue #p.size() gives list of elements while p.qsize() > 1: left,right = p.get(),p.get() node = HuffmanNode(left,right) p.put((left[0]+right[0],node)) return p.get() node = encode(freq) print(node[1].preorder())