Recursive diff of two Python dictionaries (keys and values)

Question

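I have a Python dictionary, call it d1, and a version of that dictionary at a later point in time, call it d2. I want to find all the changes between d1 and d2. In other words, everything that was added, removed, or changed. The tricky bit is that the values can be ints, strings, lists, or dicts, so it needs to be recursive. This is what I have so far: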

    def dd(d1, d2, ctx=""):
        print("Changes in " + ctx)
        for k in d1:
            if k not in d2:
                print(k + " removed from d2")
        for k in d2:
            if k not in d1:
                print(k + " added in d2")
                continue
            if d2[k] != d1[k]:
                if type(d2[k]) not in (dict, list):
                    print(k + " changed in d2 to " + str(d2[k]))
                else:
                    if type(d1[k]) != type(d2[k]):
                        print(k + " changed to " + str(d2[k]))
                        continue
                    else:
                        if type(d2[k]) == dict:
                            dd(d1[k], d2[k], k)
                            continue
        print("Done with changes in " + ctx)
        return

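It works fine unless the value is a list. I can't come up with an elegant way to handle lists without repeating a huge, slightly modified version of this function after an if type(d2) == list check.

Any thoughts?

Edit: This is different from this post.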

Andrew Clark · Accepted Answer

One option is to convert any lists you run into to dictionaries, with the index as the key. For example:

    # add this function to the same module
    def list_to_dict(l):
        return dict(zip(map(str, range(len(l))), l))

    # add this code under the 'if type(d2[k]) == dict' block
    elif type(d2[k]) == list:
        dd(list_to_dict(d1[k]), list_to_dict(d2[k]), k)

Here is the output with the sample dictionaries you gave in the comments:

    >>> d1 = {"name":"Joe", "Pets":[{"name":"spot", "species":"dog"}]}
    >>> d2 = {"name":"Joe", "Pets":[{"name":"spot", "species":"cat"}]}
    >>> dd(d1, d2, "base")
    Changes in base
    Changes in Pets
    Changes in 0
    species changed in d2 to cat
    Done with changes in 0
    Done with changes in Pets
    Done with changes in base

Note that this compares index by index, so it will need some modifications to work well when list items are added or removed.
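For reference, the two pieces can be combined into one self-contained Python 3 sketch. This is only an illustration under a few assumptions: the `changes` accumulator (which collects the messages in a list instead of printing them) and the `str(k)` conversions are additions for this sketch, not part of the question's code.

```python
def list_to_dict(l):
    # Treat a list as a dict keyed by the stringified index.
    return dict(zip(map(str, range(len(l))), l))


def dd(d1, d2, ctx="", changes=None):
    # 'changes' is a hypothetical accumulator added for this sketch.
    if changes is None:
        changes = []
    changes.append("Changes in " + ctx)
    for k in d1:
        if k not in d2:
            changes.append(str(k) + " removed from d2")
    for k in d2:
        if k not in d1:
            changes.append(str(k) + " added in d2")
        elif d2[k] != d1[k]:
            if type(d2[k]) not in (dict, list):
                changes.append(str(k) + " changed in d2 to " + str(d2[k]))
            elif type(d1[k]) != type(d2[k]):
                changes.append(str(k) + " changed to " + str(d2[k]))
            elif type(d2[k]) == dict:
                dd(d1[k], d2[k], str(k), changes)
            else:
                # Both values are lists: compare them index by index.
                dd(list_to_dict(d1[k]), list_to_dict(d2[k]), str(k), changes)
    changes.append("Done with changes in " + ctx)
    return changes


d1 = {"name": "Joe", "Pets": [{"name": "spot", "species": "dog"}]}
d2 = {"name": "Joe", "Pets": [{"name": "spot", "species": "cat"}]}
result = dd(d1, d2, "base")
print("\n".join(result))
```

Because lists are still compared by position, removing the first element of a list would be reported as a change at every following index.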

Seperman · Answer

If you want to find the differences recursively, I have written a package for Python: https://github.com/seperman/deepdiff

Installation

Install from PyPI:

pip install deepdiff

Example usage

Importing

    >>> from deepdiff import DeepDiff
    >>> from pprint import pprint
    >>> from __future__ import print_function # In case running on Python 2

Same object returns empty

    >>> t1 = {1:1, 2:2, 3:3}
    >>> t2 = t1
    >>> print(DeepDiff(t1, t2))
    {}

Type of an item has changed

    >>> t1 = {1:1, 2:2, 3:3}
    >>> t2 = {1:1, 2:"2", 3:3}
    >>> pprint(DeepDiff(t1, t2), indent=2)
    { 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                     'newvalue': '2',
                                     'oldtype': <class 'int'>,
                                     'oldvalue': 2}}}

Value of an item has changed

    >>> t1 = {1:1, 2:2, 3:3}
    >>> t2 = {1:1, 2:4, 3:3}
    >>> pprint(DeepDiff(t1, t2), indent=2)
    {'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

Item added and/or removed

    >>> t1 = {1:1, 2:2, 3:3, 4:4}
    >>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(ddiff)
    {'dic_item_added': ['root[5]', 'root[6]'],
     'dic_item_removed': ['root[4]'],
     'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

String difference

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
    >>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(ddiff, indent=2)
    { 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                          "root[4]['b']": { 'newvalue': 'world!',
                                            'oldvalue': 'world'}}}

String difference 2

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(ddiff, indent=2)
    { 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                    '+++ \n'
                                                    '@@ -1,5 +1,4 @@\n'
                                                    '-world!\n'
                                                    '-Goodbye!\n'
                                                    '+world\n'
                                                    ' 1\n'
                                                    ' 2\n'
                                                    ' End',
                                            'newvalue': 'world\n1\n2\nEnd',
                                            'oldvalue': 'world!\n'
                                                        'Goodbye!\n'
                                                        '1\n'
                                                        '2\n'
                                                        'End'}}}
    >>>
    >>> print(ddiff['values_changed']["root[4]['b']"]["diff"])
    --- 
    +++ 
    @@ -1,5 +1,4 @@
    -world!
    -Goodbye!
    +world
     1
     2
     End

Type change

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(ddiff, indent=2)
    { 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                          'newvalue': 'world\n\n\nEnd',
                                          'oldtype': <class 'list'>,
                                          'oldvalue': [1, 2, 3]}}}

List difference

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(ddiff, indent=2)
    {'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}

List difference 2:

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(ddiff, indent=2)
    { 'iterable_item_added': {"root[4]['b'][3]": 3},
      'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                          "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}

List difference ignoring order or duplicates (with the same dictionaries as above):

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
    >>> ddiff = DeepDiff(t1, t2, ignore_order=True)
    >>> print(ddiff)
    {}

List that contains a dictionary:

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(ddiff, indent=2)
    { 'dic_item_removed': ["root[4]['b'][2][2]"],
      'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}

Sets:

    >>> t1 = {1, 2, 8}
    >>> t2 = {1, 2, 3, 5}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint(DeepDiff(t1, t2))
    {'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}

Named tuples:

    >>> from collections import namedtuple
    >>> Point = namedtuple('Point', ['x', 'y'])
    >>> t1 = Point(x=11, y=22)
    >>> t2 = Point(x=11, y=23)
    >>> pprint(DeepDiff(t1, t2))
    {'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}

Custom objects:

    >>> class ClassA(object):
    ...     a = 1
    ...     def __init__(self, b):
    ...         self.b = b
    ...
    >>> t1 = ClassA(1)
    >>> t2 = ClassA(2)
    >>>
    >>> pprint(DeepDiff(t1, t2))
    {'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Object attribute added:

    >>> t2.c = "new attribute"
    >>> pprint(DeepDiff(t1, t2))
    {'attribute_added': ['root.c'],
     'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

martineau · Answer

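Just a thought: you could try an object-oriented approach and derive your own dictionary class that keeps track of any changes made to it (and reports them). This seems to have several advantages over trying to compare two dictionaries; one of them is noted at the end.

To show how this might be done, here is a reasonably complete and minimally tested sample implementation that should work with both Python 2 and 3: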

    import sys

    _NUL = object()  # unique object

    if sys.version_info[0] > 2:
        def iterkeys(d, **kw):
            return iter(d.keys(**kw))
    else:
        def iterkeys(d, **kw):
            return d.iterkeys(**kw)


    class TrackingDict(dict):
        """ Dict subclass which tracks all changes in a _changelist attribute. """
        def __init__(self, *args, **kwargs):
            super(TrackingDict, self).__init__(*args, **kwargs)
            self.clear_changelist()
            for key in sorted(iterkeys(self)):
                self._changelist.append(AddKey(key, self[key]))

        def clear_changelist(self):  # additional public method
            self._changelist = []

        def __setitem__(self, key, value):
            modtype = ChangeKey if key in self else AddKey
            super(TrackingDict, self).__setitem__(key, value)
            self._changelist.append(modtype(key, self[key]))

        def __delitem__(self, key):
            super(TrackingDict, self).__delitem__(key)
            self._changelist.append(RemoveKey(key))

        def clear(self):
            # copy the keys first: on Python 3, keys() is a live view that
            # would be empty after clear()
            deletedkeys = list(self.keys())
            super(TrackingDict, self).clear()
            for key in sorted(deletedkeys):
                self._changelist.append(RemoveKey(key))

        def update(self, other=_NUL):
            if other is not _NUL:
                otherdict = dict(other)  # convert to dict if necessary
                changedkeys = set(k for k in otherdict if k in self)
                # pass the converted dict: 'other' may be a one-shot iterator
                # (e.g. zip on Python 3) that dict() has already consumed
                super(TrackingDict, self).update(otherdict)
                for key in sorted(iterkeys(otherdict)):
                    if key in changedkeys:
                        self._changelist.append(ChangeKey(key, otherdict[key]))
                    else:
                        self._changelist.append(AddKey(key, otherdict[key]))

        def setdefault(self, key, default=None):
            if key not in self:
                self[key] = default  # will append an AddKey to _changelist
            return self[key]

        def pop(self, key, default=_NUL):
            if key in self:
                ret = self[key]  # save value
                self.__delitem__(key)
                return ret
            elif default is not _NUL:  # default specified
                return default
            else:  # not there & no default
                self[key]  # allow KeyError to be raised

        def popitem(self):
            key, value = super(TrackingDict, self).popitem()
            self._changelist.append(RemoveKey(key))
            return key, value


    # change-tracking record classes

    class DictMutator(object):
        def __init__(self, key, value=_NUL):
            self.key = key
            self.value = value

        def __repr__(self):
            return '%s(%r%s)' % (self.__class__.__name__, self.key,
                                 '' if self.value is _NUL else ': '+repr(self.value))

    class AddKey(DictMutator): pass
    class ChangeKey(DictMutator): pass
    class RemoveKey(DictMutator): pass


    if __name__ == '__main__':
        import traceback
        import sys

        td = TrackingDict({'one': 1, 'two': 2})
        print('changelist: {}'.format(td._changelist))

        td['three'] = 3
        print('changelist: {}'.format(td._changelist))

        td['two'] = -2
        print('changelist: {}'.format(td._changelist))

        td.clear()
        print('changelist: {}'.format(td._changelist))

        td.clear_changelist()

        td['newkey'] = 42
        print('changelist: {}'.format(td._changelist))

        td.setdefault('another')  # default None value
        print('changelist: {}'.format(td._changelist))

        td.setdefault('one more', 43)
        print('changelist: {}'.format(td._changelist))

        td.update(zip(('another', 'one', 'two'), (17, 1, 2)))
        print('changelist: {}'.format(td._changelist))

        td.pop('newkey')
        print('changelist: {}'.format(td._changelist))

        try:
            td.pop("won't find")
        except KeyError:
            print("KeyError as expected:")
            traceback.print_exc(file=sys.stdout)
        print('...and no change to _changelist:')
        print('changelist: {}'.format(td._changelist))

        td.clear_changelist()
        while td:
            td.popitem()
        print('changelist: {}'.format(td._changelist))

Note that unlike a simple comparison of the before and after state of a dictionary, this class will tell you about keys that were added and then deleted. In other words, it keeps a complete history until its _changelist is cleared.

Output:

    changelist: [AddKey('one': 1), AddKey('two': 2)]
    changelist: [AddKey('one': 1), AddKey('two': 2), AddKey('three': 3)]
    changelist: [AddKey('one': 1), AddKey('two': 2), AddKey('three': 3), ChangeKey('two': -2)]
    changelist: [AddKey('one': 1), AddKey('two': 2), AddKey('three': 3), ChangeKey('two': -2), RemoveKey('one'), RemoveKey('three'), RemoveKey('two')]
    changelist: [AddKey('newkey': 42)]
    changelist: [AddKey('newkey': 42), AddKey('another': None)]
    changelist: [AddKey('newkey': 42), AddKey('another': None), AddKey('one more': 43)]
    changelist: [AddKey('newkey': 42), AddKey('another': None), AddKey('one more': 43), ChangeKey('another': 17), AddKey('one': 1), AddKey('two': 2)]
    changelist: [AddKey('newkey': 42), AddKey('another': None), AddKey('one more': 43), ChangeKey('another': 17), AddKey('one': 1), AddKey('two': 2), RemoveKey('newkey')]
    KeyError as expected:
    Traceback (most recent call last):
      File "trackingdict.py", line 122, in <module>
        td.pop("won't find")
      File "trackingdict.py", line 67, in pop
        self[key]  # allow KeyError to be raised
    KeyError: "won't find"
    ...and no change to _changelist:
    changelist: [AddKey('newkey': 42), AddKey('another': None), AddKey('one more': 43), ChangeKey('another': 17), AddKey('one': 1), AddKey('two': 2), RemoveKey('newkey')]
    changelist: [RemoveKey('one'), RemoveKey('two'), RemoveKey('another'), RemoveKey('one more')]

Winston Ewert · Answer

Your function should begin by checking the types of its arguments. Write it so that it can handle lists, dictionaries, ints, and strings. That way you don't have to duplicate anything; you just call it recursively.

Pseudocode:

    def compare(d1, d2):
        if d1 and d2 are dicts
            compare the keys, pass values to compare
        if d1 and d2 are lists
            compare the lists, pass values to compare
        if d1 and d2 are strings/ints
            compare them

Gabe · Answer

Here is an implementation inspired by Winston Ewert's answer:

    def recursive_compare(d1, d2, level='root'):
        if isinstance(d1, dict) and isinstance(d2, dict):
            if d1.keys() != d2.keys():
                s1 = set(d1.keys())
                s2 = set(d2.keys())
                print('{:<20} + {} - {}'.format(level, s1-s2, s2-s1))
                common_keys = s1 & s2
            else:
                common_keys = set(d1.keys())

            for k in common_keys:
                recursive_compare(d1[k], d2[k], level='{}.{}'.format(level, k))

        elif isinstance(d1, list) and isinstance(d2, list):
            if len(d1) != len(d2):
                print('{:<20} len1={}; len2={}'.format(level, len(d1), len(d2)))
            common_len = min(len(d1), len(d2))

            for i in range(common_len):
                recursive_compare(d1[i], d2[i], level='{}[{}]'.format(level, i))

        else:
            if d1 != d2:
                print('{:<20} {} != {}'.format(level, d1, d2))

    if __name__ == '__main__':
        d1={'a':[0,2,3,8], 'b':0, 'd':{'da':7, 'db':[99,88]}}
        d2={'a':[0,2,4], 'c':0, 'd':{'da':3, 'db':7}}

        recursive_compare(d1, d2)

This prints:

    root                 + {'b'} - {'c'}
    root.a               len1=4; len2=3
    root.a[2]            3 != 4
    root.d.db            [99, 88] != 7
    root.d.da            7 != 3
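If the differences are needed as data rather than printed lines, the same traversal can be written as a generator. This is only a sketch and not part of the answer above: the name `iter_diffs` and the `(path, kind, old, new)` tuple layout are assumptions made for illustration, and lists are still compared index by index.

```python
def iter_diffs(d1, d2, path='root'):
    # Yield (path, kind, old, new) tuples; the layout is illustrative only.
    if isinstance(d1, dict) and isinstance(d2, dict):
        # Sort by repr so that mixed key types still have a stable order.
        for k in sorted(set(d1) - set(d2), key=repr):
            yield ('{}.{}'.format(path, k), 'removed', d1[k], None)
        for k in sorted(set(d2) - set(d1), key=repr):
            yield ('{}.{}'.format(path, k), 'added', None, d2[k])
        for k in sorted(set(d1) & set(d2), key=repr):
            yield from iter_diffs(d1[k], d2[k], '{}.{}'.format(path, k))
    elif isinstance(d1, list) and isinstance(d2, list):
        common = min(len(d1), len(d2))
        for i in range(common):
            yield from iter_diffs(d1[i], d2[i], '{}[{}]'.format(path, i))
        # Trailing items exist in only one of the two lists.
        for i in range(common, len(d1)):
            yield ('{}[{}]'.format(path, i), 'removed', d1[i], None)
        for i in range(common, len(d2)):
            yield ('{}[{}]'.format(path, i), 'added', None, d2[i])
    elif d1 != d2:
        yield (path, 'changed', d1, d2)


d1 = {'a': [0, 2, 3, 8], 'b': 0, 'd': {'da': 7, 'db': [99, 88]}}
d2 = {'a': [0, 2, 4], 'c': 0, 'd': {'da': 3, 'db': 7}}
diffs = list(iter_diffs(d1, d2))
for diff in diffs:
    print(diff)
```

Returning tuples keeps the comparison separate from the reporting, so the caller can filter, count, or format the differences as needed.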

zeekay · Answer

Consider using hasattr(obj, '__iter__') as you recurse through your object. If an object implements the __iter__ method, you know you can iterate over it.
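A minimal sketch of that check follows; the helper name `is_container` is made up for this example. One caveat worth keeping in mind: in Python 3, `str` and `bytes` also define `__iter__`, so they usually have to be excluded explicitly before recursing.

```python
def is_container(obj):
    # Hypothetical helper: True for iterables we would want to recurse into.
    # Strings and bytes are iterable too, so they are excluded explicitly.
    if isinstance(obj, (str, bytes)):
        return False
    return hasattr(obj, '__iter__')


print(is_container([1, 2]))    # True
print(is_container({'a': 1}))  # True
print(is_container('abc'))     # False
print(is_container(42))        # False
```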

Matt Faus · Answer

As suggested by Serge, I found this solution helpful for getting a quick boolean answer on whether two dictionaries match "all the way down":

    import json

    def match(d1, d2):
        return json.dumps(d1, sort_keys=True) == json.dumps(d2, sort_keys=True)
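A short usage sketch, assuming both dictionaries are JSON-serializable (the sample data is made up). With `sort_keys=True`, key order is irrelevant at every nesting level, while the order of list items still matters.

```python
import json


def match(d1, d2):
    return json.dumps(d1, sort_keys=True) == json.dumps(d2, sort_keys=True)


a = {"name": "Joe", "pets": [{"name": "spot", "species": "dog"}]}
b = {"pets": [{"species": "dog", "name": "spot"}], "name": "Joe"}
c = {"name": "Joe", "pets": [{"name": "spot", "species": "cat"}]}

print(match(a, b))  # True: same content, different key order
print(match(a, c))  # False: one nested value differs
print(match({"x": [1, 2]}, {"x": [2, 1]}))  # False: list order is significant
```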

Serge · Answer

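Writing it yourself is fun and a good way to learn, but for important tasks a ready-made, maintained package often works better.

Consider converting to JSON and using a proper "semantic" JSON comparator, such as https://www.npmjs.com/package/compare-json or the online tool http://jsondiff.com . You will need to stringify numeric keys.

If you really need to, try converting jsondiff to Python.

See also: Conversion from JavaScript to Python code?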
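The remark about numeric keys can be seen with Python's own `json` module, which turns integer dictionary keys into strings on the way out. The sketch below uses a hypothetical helper, `stringify_keys`, to normalize the original dictionary the same way so that it can be compared with its JSON round trip.

```python
import json


def stringify_keys(obj):
    # Recursively convert dict keys to strings, as JSON objects require.
    if isinstance(obj, dict):
        return {str(k): stringify_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [stringify_keys(v) for v in obj]
    return obj


original = {1: "one", 2: {3: ["a", {4: "b"}]}}
round_tripped = json.loads(json.dumps(original))

print(round_tripped == original)                  # False: keys became strings
print(round_tripped == stringify_keys(original))  # True
```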

wangzhiwei · Answer

You can try the following simple implementation (written for Python 2):

    import collections
    from datetime import date, datetime


    def recursive_compare(obj1, obj2):
        """
        Compare python objects recursively, support type:
        "int, float, long, basestring, set, datetime, date, dict, Sequence"

        Example:
        >>> recursive_compare([1, 2, 3], [1, 2, 3])
        True
        >>> recursive_compare([1, 2, 3], [1, 2, 4])
        False
        >>> recursive_compare({'a': 1}, {'a': 2})
        False
        """
        def _diff(obj1, obj2):
            # exclude type basestring for backward-compatible python2:
            # <str, unicode>
            if type(obj1) != type(obj2) and not isinstance(obj1, basestring):
                return False
            elif isinstance(obj1, (int, float, long, basestring, set, datetime, date)):
                if obj1 != obj2:
                    return False
            elif isinstance(obj1, dict):
                # the key sets must be identical; checking only the
                # intersection would miss keys present on one side only
                if obj1.viewkeys() != obj2.viewkeys():
                    return False
                for k in obj1:
                    if _diff(obj1[k], obj2[k]) is False:
                        return False
            elif isinstance(obj1, collections.Sequence):
                # require sorted sequence object
                if len(obj1) != len(obj2):
                    return False
                for i in range(len(obj1)):
                    if _diff(obj1[i], obj2[i]) is False:
                        return False
            else:
                raise TypeError('do not support type {} to compare'.format(
                    type(obj1)))

        return False if _diff(obj1, obj2) is False else True