How to compare a list of lists/sets in Python?

Question

What is the easiest way to compare two lists/sets and output the differences? Are there any built-in functions that will help me compare nested lists/sets?

Input:

```python
First_list = [['Test.doc', '1a1a1a', 1111], ['Test2.doc', '2b2b2b', 2222], ['Test3.doc', '3c3c3c', 3333]]
Secnd_list = [['Test.doc', '1a1a1a', 1111], ['Test2.doc', '2b2b2b', 2222], ['Test3.doc', '8p8p8p', 9999], ['Test4.doc', '4d4d4d', 4444]]
```

Expected output:

```python
Differences = [['Test3.doc', '3c3c3c', 3333], ['Test3.doc', '8p8p8p', 9999], ['Test4.doc', '4d4d4d', 4444]]
```

dr jimbob · Accepted Answer

So you want the difference between two lists of items.

```python
first_list = [['Test.doc', '1a1a1a', 1111], ['Test2.doc', '2b2b2b', 2222], ['Test3.doc', '3c3c3c', 3333]]
secnd_list = [['Test.doc', '1a1a1a', 1111], ['Test2.doc', '2b2b2b', 2222], ['Test3.doc', '8p8p8p', 9999], ['Test4.doc', '4d4d4d', 4444]]
```

First, turn each list of lists into a list of tuples. Tuples are hashable (lists are not), so you can convert a list of tuples into a set of tuples:
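A quick way to check the hashability claim, as a small sketch using one row from the question's data (`row`, `list_is_hashable`, `row_as_tuple`, and `row_set` are illustrative names, not from the answer):

```python
row = ['Test.doc', '1a1a1a', 1111]

# Hashing a list raises TypeError because lists are mutable
try:
    hash(row)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# A tuple of hashable items (str, int) can be hashed, so it can be a set member
row_as_tuple = tuple(row)
row_set = {row_as_tuple}

print(list_is_hashable)         # False
print(row_as_tuple in row_set)  # True
```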

```python
first_tuple_list = [tuple(lst) for lst in first_list]
secnd_tuple_list = [tuple(lst) for lst in secnd_list]
```

Then you can make the sets:

```python
first_set = set(first_tuple_list)
secnd_set = set(secnd_tuple_list)
```

Edit (suggested by sdolan): you can do the last two steps for each list in a one-liner:

```python
first_set = set(map(tuple, first_list))
secnd_set = set(map(tuple, secnd_list))
```

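Note: map is a functional programming command that applies the function given as its first argument (in this case, tuple) to each item of its second argument (in this case, a list of lists).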
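One Python 3 detail worth knowing here: `map()` returns a lazy iterator rather than a list, which `set()` then consumes. A short sketch (assuming Python 3; variable names are illustrative) showing that the map form and a set comprehension give the same set, and that the iterator can only be used once:

```python
first_list = [['Test.doc', '1a1a1a', 1111], ['Test2.doc', '2b2b2b', 2222], ['Test3.doc', '3c3c3c', 3333]]

# In Python 3, map() returns a lazy iterator, not a list
rows = map(tuple, first_list)
print(type(rows).__name__)  # map

# set() consumes the iterator once; the result matches a set comprehension
via_map = set(rows)
via_comprehension = {tuple(lst) for lst in first_list}
print(via_map == via_comprehension)  # True

# The iterator is now exhausted, so a second pass yields nothing
print(list(rows))  # []
```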

Then find the symmetric difference between the sets:

```python
>>> first_set.symmetric_difference(secnd_set)
{('Test3.doc', '3c3c3c', 3333), ('Test3.doc', '8p8p8p', 9999), ('Test4.doc', '4d4d4d', 4444)}
```

Note: `first_set ^ secnd_set` is equivalent to `symmetric_difference`.
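A small sketch (trimmed-down data, illustrative names) of how the operator relates to the other set operations, and of one practical difference: the method accepts any iterable, while the `^` operator requires both operands to be sets:

```python
first_set = {('Test.doc', '1a1a1a', 1111), ('Test3.doc', '3c3c3c', 3333)}
secnd_set = {('Test.doc', '1a1a1a', 1111), ('Test4.doc', '4d4d4d', 4444)}

sym = first_set ^ secnd_set

# The operator and the method return the same set
print(sym == first_set.symmetric_difference(secnd_set))  # True

# Equivalent definition: items only in the first set, plus items only in the second
print(sym == ((first_set - secnd_set) | (secnd_set - first_set)))  # True

# The method accepts a plain list; `first_set ^ some_list` would raise TypeError
from_list = first_set.symmetric_difference([('Test.doc', '1a1a1a', 1111), ('Test4.doc', '4d4d4d', 4444)])
print(from_list == sym)  # True
```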

Also, if you don't want to use sets (e.g., on Python 2.2), it's quite straightforward to do, for example with list comprehensions:

```python
>>> [x for x in first_list if x not in secnd_list] + [x for x in secnd_list if x not in first_list]
[['Test3.doc', '3c3c3c', 3333], ['Test3.doc', '8p8p8p', 9999], ['Test4.doc', '4d4d4d', 4444]]
```

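Or use the functional filter command with lambda functions. (You have to test in both directions and combine the results. In Python 3, filter returns an iterator, so wrap each call in list() before concatenating.)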

```python
>>> list(filter(lambda x: x not in secnd_list, first_list)) + list(filter(lambda x: x not in first_list, secnd_list))
[['Test3.doc', '3c3c3c', 3333], ['Test3.doc', '8p8p8p', 9999], ['Test4.doc', '4d4d4d', 4444]]
```
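Both non-set versions rescan the other list for every row (`x not in secnd_list`), so they do roughly len(first) × len(secnd) comparisons. If you want list-of-lists output in the original order but faster lookups, one option is to build tuple sets just for the membership tests. `ordered_difference` below is a hypothetical helper, a sketch rather than part of the answer:

```python
first_list = [['Test.doc', '1a1a1a', 1111], ['Test2.doc', '2b2b2b', 2222], ['Test3.doc', '3c3c3c', 3333]]
secnd_list = [['Test.doc', '1a1a1a', 1111], ['Test2.doc', '2b2b2b', 2222], ['Test3.doc', '8p8p8p', 9999], ['Test4.doc', '4d4d4d', 4444]]


def ordered_difference(a, b):
    """Return rows found in only one of a or b: a's rows first, each side in original order."""
    # Build hashable lookup sets once, so each membership test is O(1) on average
    a_keys = {tuple(row) for row in a}
    b_keys = {tuple(row) for row in b}
    only_a = [row for row in a if tuple(row) not in b_keys]
    only_b = [row for row in b if tuple(row) not in a_keys]
    return only_a + only_b


print(ordered_difference(first_list, secnd_list))
# [['Test3.doc', '3c3c3c', 3333], ['Test3.doc', '8p8p8p', 9999], ['Test4.doc', '4d4d4d', 4444]]
```

Unlike the set-based versions, this keeps duplicate rows and returns lists rather than tuples.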

pyfunc · Answer

```python
>>> First_list = [['Test.doc', '1a1a1a', '1111'], ['Test2.doc', '2b2b2b', '2222'], ['Test3.doc', '3c3c3c', '3333']]
>>> Secnd_list = [['Test.doc', '1a1a1a', '1111'], ['Test2.doc', '2b2b2b', '2222'], ['Test3.doc', '3c3c3c', '3333'], ['Test4.doc', '4d4d4d', '4444']]
>>> z = [tuple(y) for y in First_list]
>>> z
[('Test.doc', '1a1a1a', '1111'), ('Test2.doc', '2b2b2b', '2222'), ('Test3.doc', '3c3c3c', '3333')]
>>> x = [tuple(y) for y in Secnd_list]
>>> x
[('Test.doc', '1a1a1a', '1111'), ('Test2.doc', '2b2b2b', '2222'), ('Test3.doc', '3c3c3c', '3333'), ('Test4.doc', '4d4d4d', '4444')]
>>> set(x) - set(z)
{('Test4.doc', '4d4d4d', '4444')}
```
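Note that `set(x) - set(z)` only reports rows that are in the second list and missing from the first; in this answer's data the Test3.doc row is identical in both lists, so nothing else shows up. With the question's original data (where the Test3.doc row changes), taking the difference in both directions recovers the expected rows and also tells you which side each came from. A sketch, assuming Python 3 (`added`/`removed` are illustrative names):

```python
First_list = [['Test.doc', '1a1a1a', 1111], ['Test2.doc', '2b2b2b', 2222], ['Test3.doc', '3c3c3c', 3333]]
Secnd_list = [['Test.doc', '1a1a1a', 1111], ['Test2.doc', '2b2b2b', 2222], ['Test3.doc', '8p8p8p', 9999], ['Test4.doc', '4d4d4d', 4444]]

z = set(map(tuple, First_list))
x = set(map(tuple, Secnd_list))

added = x - z    # rows only in Secnd_list
removed = z - x  # rows only in First_list

print(sorted(removed))  # [('Test3.doc', '3c3c3c', 3333)]
print(sorted(added))    # [('Test3.doc', '8p8p8p', 9999), ('Test4.doc', '4d4d4d', 4444)]
```

Together, `added | removed` is the same set as `x ^ z`.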

Sam Magura · Answer

I'm not sure if there is a nice function for this, but the "manual" way to do it isn't difficult:

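```python
# Collect the rows of firstList that are missing from secondList (one direction only).
# The loop variable is named `item` so it does not shadow the built-in `list`.
differences = []
for item in firstList:
    if item not in secondList:
        differences.append(item)
```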
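A loop like this keeps duplicate rows, whereas converting to sets collapses them. If it matters how many times a row appears, one option (not part of the answer above) is to count rows with `collections.Counter` and compare the counts. A sketch on hypothetical data where the first list contains the same row twice:

```python
from collections import Counter

# Hypothetical data: the first list contains the same row twice
first = [['Test.doc', '1a1a1a', 1111], ['Test.doc', '1a1a1a', 1111], ['Test2.doc', '2b2b2b', 2222]]
second = [['Test.doc', '1a1a1a', 1111], ['Test4.doc', '4d4d4d', 4444]]

# Count each row (as a hashable tuple) in both lists
c1 = Counter(map(tuple, first))
c2 = Counter(map(tuple, second))

# Counter subtraction keeps only positive counts, so each side holds its surplus rows
surplus = (c1 - c2) + (c2 - c1)

# elements() repeats each row as many times as its count
differences = [list(row) for row in surplus.elements()]
print(sorted(differences))
# [['Test.doc', '1a1a1a', 1111], ['Test2.doc', '2b2b2b', 2222], ['Test4.doc', '4d4d4d', 4444]]
```

A set-based comparison would not report Test.doc here, since it appears in both lists; the Counter version reports the one surplus copy.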

Stephen Neal · Answer

Old question, but here's a solution I use to return the unique elements that are not in both lists.

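I use it to compare values returned from a database with values generated by a directory crawler package. I didn't like the other solutions I found, because many of them couldn't dynamically handle both flat lists and nested lists.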

```python
def differentiate(x, y):
    """
    Retrieve a unique list of elements that do not exist in both x and y.
    Capable of parsing one-dimensional (flat) and two-dimensional (lists of lists) lists.

    :param x: list #1
    :param y: list #2
    :return: list of unique values
    """
    # If either list is empty, every value in the other one is unique
    if len(x) == 0:
        return y
    if len(y) == 0:
        return x

    # Get the input type to convert back to before returning
    input_type = type(x[0])

    if input_type is list:
        # 2D dataset (list of lists): tuples are immutable and hashable, so they fit in a set
        first_set = set(map(tuple, x))
        secnd_set = set(map(tuple, y))
    else:
        # 1D dataset (list of hashable items)
        first_set = set(x)
        secnd_set = set(y)

    # Values present in exactly one of the two lists, converted back to the original type
    return [input_type(i) for i in first_set ^ secnd_set]
```

user126284 · Answer

I think you need to convert the lists to sets:

```python
>>> a = {('a', 'b'), ('c', 'd'), ('e', 'f')}
>>> b = {('a', 'b'), ('h', 'g')}
>>> a.symmetric_difference(b)
{('e', 'f'), ('h', 'g'), ('c', 'd')}
```
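The question title also mentions lists of *sets*. Sets are unhashable just like lists, so a list of sets cannot be turned into a set of sets directly; `frozenset` is the immutable, hashable counterpart. A sketch with data modeled on this answer (it assumes the order inside each inner collection does not matter, so `{'b', 'a'}` matches `{'a', 'b'}`):

```python
# Lists of sets: the inner sets are unhashable, so convert each one to a frozenset
first = [{'a', 'b'}, {'c', 'd'}, {'e', 'f'}]
second = [{'b', 'a'}, {'g', 'h'}]

first_set = set(map(frozenset, first))
secnd_set = set(map(frozenset, second))

diff = first_set ^ secnd_set

# Sort for display only; set iteration order is arbitrary
print(sorted(sorted(s) for s in diff))  # [['c', 'd'], ['e', 'f'], ['g', 'h']]
```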

Sukrit Gupta · Answer

You can make it a one-liner with set comprehensions, if you like.

To get a set of tuples:

```python
Differences = {tuple(i) for i in First_list} ^ {tuple(i) for i in Secnd_list}
```

Or, to get a list of tuples:

```python
Differences = list({tuple(i) for i in First_list} ^ {tuple(i) for i in Secnd_list})
```

Or, to get a list of lists (if you really need it):

```python
Differences = [list(j) for j in {tuple(i) for i in First_list} ^ {tuple(i) for i in Secnd_list}]
```

PS: I read here (https://stackoverflow.com/a/10973817/4900095) that the map() function is not the Pythonic way of doing things.
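All three set-comprehension variants return rows in arbitrary set order. If you want a stable, repeatable result that matches the question's expected output, you can sort it, as in this sketch (it assumes values at the same position are mutually comparable, e.g. all strings or all ints):

```python
First_list = [['Test.doc', '1a1a1a', 1111], ['Test2.doc', '2b2b2b', 2222], ['Test3.doc', '3c3c3c', 3333]]
Secnd_list = [['Test.doc', '1a1a1a', 1111], ['Test2.doc', '2b2b2b', 2222], ['Test3.doc', '8p8p8p', 9999], ['Test4.doc', '4d4d4d', 4444]]

# Set iteration order is arbitrary, so sort the rows to get a repeatable result
Differences = sorted(
    list(j) for j in {tuple(i) for i in First_list} ^ {tuple(i) for i in Secnd_list}
)
print(Differences)
# [['Test3.doc', '3c3c3c', 3333], ['Test3.doc', '8p8p8p', 9999], ['Test4.doc', '4d4d4d', 4444]]
```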

btilly · Answer

http://docs.python.org/library/difflib.html is a good starting point for what you're looking for.

If you apply it recursively to the deltas, you should be able to handle nested data structures. But that takes a bit of work.
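As a rough sketch of that starting point (assuming Python 3; `removed`/`added` are illustrative names), `difflib.SequenceMatcher` can compare the top-level rows once they are tuples, since it requires hashable elements. Unlike the set-based answers, it is position-aware and reports the changed span through its opcodes:

```python
import difflib

first_list = [['Test.doc', '1a1a1a', 1111], ['Test2.doc', '2b2b2b', 2222], ['Test3.doc', '3c3c3c', 3333]]
secnd_list = [['Test.doc', '1a1a1a', 1111], ['Test2.doc', '2b2b2b', 2222], ['Test3.doc', '8p8p8p', 9999], ['Test4.doc', '4d4d4d', 4444]]

# SequenceMatcher needs hashable elements, so compare tuples rather than lists
a = [tuple(row) for row in first_list]
b = [tuple(row) for row in secnd_list]

removed, added = [], []
matcher = difflib.SequenceMatcher(None, a, b)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    # Every non-'equal' opcode ('replace', 'delete', 'insert') marks a changed span
    if tag != 'equal':
        removed.extend(a[i1:i2])
        added.extend(b[j1:j2])

print("removed:", removed)
print("added:", added)
```

This only compares whole rows; recursing into the changed rows to find which field differs is the extra work mentioned above.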