How can I get 2.x-like sorting behaviour in Python 3.x?

Question

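I'm trying to replicate (and if possible improve on) Python 2.x's sorting behaviour in 3.x, so that mutually orderable types like int, float etc. are sorted as expected, and mutually unorderable types are grouped within the output.

Here's an example of what I'm talking about: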

    >>> sorted([0, 'one', 2.3, 'four', -5])  # Python 2.x
    [-5, 0, 2.3, 'four', 'one']

    >>> sorted([0, 'one', 2.3, 'four', -5])  # Python 3.x
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: str() < int()

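My previous attempt at this, using a class for the key parameter to sorted() (see "Why does this key class for sorting heterogeneous sequences behave oddly?"), is fundamentally broken, because its approach of

trying to compare values, and
if that fails, falling back to comparing the string representation of their types

can lead to intransitive ordering, as explained by BrenBarn's excellent answer.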
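The intransitivity can be reproduced in a few lines. This is a minimal sketch rather than the code from the earlier question: `fallback_lt` is a hypothetical helper that compares values directly and falls back to comparing type names when that raises TypeError.

```python
def fallback_lt(a, b):
    """Hypothetical helper: direct comparison, else compare type names."""
    try:
        return a < b
    except TypeError:
        return type(a).__name__ < type(b).__name__


x, y, z = 1, 2.0, frozenset()

# 1 < 2.0 is decided by direct numeric comparison.
print(fallback_lt(x, y))
# float vs frozenset is decided by name: 'float' < 'frozenset'.
print(fallback_lt(y, z))
# frozenset vs int is decided by name: 'frozenset' < 'int'.
print(fallback_lt(z, x))
```

All three calls return True, so x < y < z < x: a cycle, which means no sort order can be consistent with this comparison.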

A naive approach, which I initially rejected without even trying to code it, would be to use a key function that returns a (type, value) tuple:

    def motley(value):
        return repr(type(value)), value

However, this doesn't do what I want. In the first place, it breaks the natural ordering of mutually orderable types:

    >>> sorted([0, 123.4, 5, -6, 7.89])
    [-6, 0, 5, 7.89, 123.4]
    >>> sorted([0, 123.4, 5, -6, 7.89], key=motley)
    [7.89, 123.4, -6, 0, 5]

Secondly, it raises an exception when the input contains two objects of the same intrinsically unorderable type:

    >>> sorted([{1:2}, {3:4}], key=motley)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: dict() < dict()

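... which admittedly is the standard behaviour in both Python 2.x and 3.x, but ideally I'd like such types to be grouped together (I don't especially care about their ordering, but it would seem in keeping with Python's guarantee of stable sorting that they retain their original order).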
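The "grouped, but in original order" behaviour follows directly from sort stability when the key ignores the value. A minimal sketch, assuming a hypothetical key function `type_name` that looks only at the type:

```python
def type_name(value):
    """Hypothetical key: the value itself is never compared."""
    return type(value).__name__


data = [{1: 2}, 'b', {3: 4}, 'a']

# The dicts and the strings are never compared with each other or among
# themselves; sorted() is stable, so each group keeps its input order.
grouped = sorted(data, key=type_name)
print(grouped)  # [{1: 2}, {3: 4}, 'b', 'a']
```

This only groups by type; it does not sort within a group, which is the part the rest of the question is about.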

I can work around the first problem for numeric types by special-casing them:

    from numbers import Real
    from decimal import Decimal

    def motley(value):
        numeric = Real, Decimal
        if isinstance(value, numeric):
            typeinfo = numeric
        else:
            typeinfo = type(value)
        return repr(typeinfo), value

... which works as far as it goes:

    >>> sorted([0, 'one', 2.3, 'four', -5], key=motley)
    [-5, 0, 2.3, 'four', 'one']

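... but doesn't account for the fact that there may be other distinct (possibly user-defined) types which are mutually orderable, and of course still fails with intrinsically unorderable types: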

    >>> sorted([{1:2}, {3:4}], key=motley)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: dict() < dict()

Is there another approach which solves both the problem of arbitrary, distinct-but-mutually-orderable types and that of intrinsically unorderable types?

Bas Swinckels · Accepted Answer

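Stupid idea: make a first pass to divide all the different items into groups that can be compared with each other, sort the individual groups, and finally concatenate them. I assume that an item is comparable to all members of a group if it is comparable with the first member of that group. Something like this (Python 3):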

    import itertools

    def python2sort(x):
        it = iter(x)
        groups = [[next(it)]]
        for item in it:
            for group in groups:
                try:
                    item < group[0]  # exception if not comparable
                    group.append(item)
                    break
                except TypeError:
                    continue
            else:  # did not break, make new group
                groups.append([item])
        print(groups)  # for debugging
        return itertools.chain.from_iterable(sorted(group) for group in groups)

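This will have quadratic running time in the pathological case where none of the items are comparable, but I guess the only way to know that for sure is to check all possible combinations. See the quadratic behaviour as a deserved punishment for anyone trying to sort a long list of unsortable items, such as complex numbers. In the more common case of a mix of some strings and some integers, the speed should be similar to that of a normal sort. Quick test: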

    In [19]: x = [0, 'one', 2.3, 'four', -5, 1j, 2j, -5.5, 13 , 15.3, 'aa', 'zz']

    In [20]: list(python2sort(x))
    [[0, 2.3, -5, -5.5, 13, 15.3], ['one', 'four', 'aa', 'zz'], [1j], [2j]]
    Out[20]: [-5.5, -5, 0, 2.3, 13, 15.3, 'aa', 'four', 'one', 'zz', 1j, 2j]

This seems to be a "stable sort" too, since the groups are formed in the order in which the incomparable items are encountered.
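One detail of the grouping function above is that `next(it)` raises StopIteration on empty input. The following is a hedged variation, not the answer's own code: `python2sort_list` is a hypothetical name, it drops the debugging print, returns a list, and starts from an empty list of groups so that empty input yields an empty result. It relies on the same assumption that comparability with a group's first member implies comparability with the whole group.

```python
def python2sort_list(iterable):
    """Group mutually comparable items, sort each group, concatenate."""
    groups = []
    for item in iterable:
        for group in groups:
            try:
                item < group[0]  # raises TypeError if not comparable
            except TypeError:
                continue
            group.append(item)
            break
        else:  # no compatible group found, start a new one
            groups.append([item])

    result = []
    for group in groups:
        result.extend(sorted(group))
    return result


mixed = [0, 'one', 2.3, 'four', -5, 1j, 2j]
print(python2sort_list(mixed))
print(python2sort_list([]))
```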

Martijn Pieters · Answer

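This answer aims to faithfully re-create the Python 2 sort order in Python 3, in every detail.

The actual Python 2 implementation is quite involved, but default_3way_compare in object.c does the final fallback after instances have been given a chance to implement normal comparison rules. That is, it runs after the individual types have been given a chance to compare (via the __cmp__ or __lt__ hooks).

Implementing that function as pure Python in a wrapper, plus emulating the exceptions to the rule (dict and complex numbers specifically), gives us the same Python 2 sorting semantics in Python 3: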

    from numbers import Number


    # decorator for type to function mapping special cases
    def per_type_cmp(type_):
        try:
            mapping = per_type_cmp.mapping
        except AttributeError:
            mapping = per_type_cmp.mapping = {}
        def decorator(cmpfunc):
            mapping[type_] = cmpfunc
            return cmpfunc
        return decorator


    class python2_sort_key(object):
        _unhandled_types = {complex}

        def __init__(self, ob):
            self._ob = ob

        def __lt__(self, other):
            _unhandled_types = self._unhandled_types
            self, other = self._ob, other._ob  # we don't care about the wrapper

            # default_3way_compare is used only if direct comparison failed
            try:
                return self < other
            except TypeError:
                pass

            # hooks to implement special casing for types, dict in Py2 has
            # a dedicated __cmp__ method that is gone in Py3 for example.
            for type_, special_cmp in per_type_cmp.mapping.items():
                if isinstance(self, type_) and isinstance(other, type_):
                    return special_cmp(self, other)

            # explicitly raise again for types that won't sort in Python 2 either
            if type(self) in _unhandled_types:
                raise TypeError('no ordering relation is defined for {}'.format(
                    type(self).__name__))
            if type(other) in _unhandled_types:
                raise TypeError('no ordering relation is defined for {}'.format(
                    type(other).__name__))

            # default_3way_compare from Python 2 as Python code
            # same type but no ordering defined, go by id
            if type(self) is type(other):
                return id(self) < id(other)

            # None always comes first
            if self is None:
                return True
            if other is None:
                return False

            # Sort by typename, but numbers are sorted before other types
            self_tname = '' if isinstance(self, Number) else type(self).__name__
            other_tname = '' if isinstance(other, Number) else type(other).__name__

            if self_tname != other_tname:
                return self_tname < other_tname

            # same typename, or both numbers, but different type objects, order
            # by the id of the type object
            return id(type(self)) < id(type(other))


    @per_type_cmp(dict)
    def dict_cmp(a, b, _s=object()):
        if len(a) != len(b):
            return len(a) < len(b)
        adiff = min((k for k in a if a[k] != b.get(k, _s)), key=python2_sort_key, default=_s)
        if adiff is _s:
            # All keys in a have a matching value in b, so the dicts are equal
            return False
        bdiff = min((k for k in b if b[k] != a.get(k, _s)), key=python2_sort_key)
        if adiff != bdiff:
            return python2_sort_key(adiff) < python2_sort_key(bdiff)
        return python2_sort_key(a[adiff]) < python2_sort_key(b[bdiff])

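I incorporated handling of dictionary sorting as implemented in Python 2, since that would be supported by the type itself via a __cmp__ hook. Naturally, I've stuck to the Python 2 ordering for the keys and values as well.

I've also added special casing for complex numbers, as Python 2 raises an exception when you try to sort these: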

    >>> sorted([0.0, 1, (1+0j), False, (2+3j)])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: no ordering relation is defined for complex numbers

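You may have to add more special cases if you want to emulate Python 2 behaviour exactly.

If you want to sort complex numbers anyway, you'll need to consistently put them with the non-numbers group; e.g.: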

    # Sort by typename, but numbers are sorted before other types
    if isinstance(self, Number) and not isinstance(self, complex):
        self_tname = ''
    else:
        self_tname = type(self).__name__
    if isinstance(other, Number) and not isinstance(other, complex):
        other_tname = ''
    else:
        other_tname = type(other).__name__

Some test cases:

    >>> sorted([0, 'one', 2.3, 'four', -5], key=python2_sort_key)
    [-5, 0, 2.3, 'four', 'one']
    >>> sorted([0, 123.4, 5, -6, 7.89], key=python2_sort_key)
    [-6, 0, 5, 7.89, 123.4]
    >>> sorted([{1:2}, {3:4}], key=python2_sort_key)
    [{1: 2}, {3: 4}]
    >>> sorted([{1:2}, None, {3:4}], key=python2_sort_key)
    [None, {1: 2}, {3: 4}]

Fred S · Answer

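I'm not running Python 3, but maybe something like this would work. Test whether doing a "less than" comparison on "value" raises an exception, and then do "something" to handle that case, such as converting it to a string.

Of course, you'd still need more special handling if your list contains other types that are not the same type but are mutually orderable.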

    from numbers import Real
    from decimal import Decimal

    def motley(value):
        numeric = Real, Decimal
        if isinstance(value, numeric):
            typeinfo = numeric
        else:
            typeinfo = type(value)

        try:
            x = value < value
        except TypeError:
            value = repr(value)

        return repr(typeinfo), value

    >>> print sorted([0, 'one', 2.3, 'four', -5, (2+3j), (1-3j)], key=motley)
    [-5, 0, 2.3, (1-3j), (2+3j), 'four', 'one']

Efron Licht · Answer

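This problem can be solved in the following way:

Group by type.
Find which types are comparable by attempting to compare a single representative of each type.
Merge groups of comparable types.
Sort the merged groups, if possible.
Yield from the (sorted) merged groups.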

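We can get a deterministic and orderable key function from types by using repr(type(x)). Note that the "type hierarchy" here is determined by the repr of the types themselves. A flaw in this method is that if two types have an identical __repr__ (the types themselves, not the instances), they will be "confused" with each other. This can be solved by using a key function that returns a tuple (repr(type), id(type)), but I have not implemented that in this solution.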
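The tuple key mentioned above could look roughly like this. It is only a sketch of the idea, not part of the solution below; `type_key` and `make_class` are hypothetical names introduced for the illustration.

```python
def type_key(x):
    """Hypothetical key: repr of the type, with id() as a tie-breaker."""
    t = type(x)
    return repr(t), id(t)


def make_class():
    # Each call creates a distinct class object with the same repr.
    class Widget:
        pass
    return Widget


A, B = make_class(), make_class()

print(repr(A) == repr(B))             # True: repr alone confuses the types
print(type_key(A()) == type_key(B())) # False: the id() component differs
```

The id() component keeps distinct types apart, at the cost that the relative order of two same-named types is arbitrary and can differ between runs.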

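The advantage of my method over Bas Swinckels's is cleaner handling of a group of unorderable elements. We don't have quadratic behaviour; instead, the function gives up after the first attempted ordering during sorted().

My method performs worst in the scenario where there is an extremely large number of different types in the iterable. This is a rare scenario, but I suppose it could come up.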

    from fractions import Fraction
    from itertools import groupby


    def py2sort(iterable):
        by_type_repr = lambda x: repr(type(x))
        iterable = sorted(iterable, key=by_type_repr)
        types = {type_: list(group) for type_, group in groupby(iterable, by_type_repr)}

        def merge_compatible_types(types):
            representatives = [(type_, items[0]) for (type_, items) in types.items()]

            def mergable_types():
                for i, (type_0, elem_0) in enumerate(representatives, 1):
                    for type_1, elem_1 in representatives[i:]:
                        if _comparable(elem_0, elem_1):
                            yield type_0, type_1

            def merge_types(a, b):
                try:
                    types[a].extend(types[b])
                    del types[b]
                except KeyError:
                    pass  # already merged

            for a, b in mergable_types():
                merge_types(a, b)
            return types

        def gen_from_sorted_comparable_groups(types):
            for _, items in types.items():
                try:
                    items = sorted(items)
                except TypeError:
                    pass  # unorderable type
                yield from items

        types = merge_compatible_types(types)
        return list(gen_from_sorted_comparable_groups(types))


    def _comparable(x, y):
        try:
            x < y
        except TypeError:
            return False
        else:
            return True


    if __name__ == '__main__':
        print('before py2sort:')
        test = [2, -11.6, 3, 5.0, (1, '5', 3), (object, object()), complex(2, 3),
                [list, tuple], Fraction(11, 2), '2', type, str, 'foo', object(), 'bar']
        print(test)
        print('after py2sort:')
        print(py2sort(test))

Collin Anderson · Answer

I tried to implement the Python 2 sorting C code in Python 3 as faithfully as possible.

Use it like so: mydata.sort(key=py2key()) or mydata.sort(key=py2key(lambda x: mykeyfunc))

    def default_3way_compare(v, w):  # Yes, this is how Python 2 sorted things :)
        tv, tw = type(v), type(w)
        if tv is tw:
            return -1 if id(v) < id(w) else (1 if id(v) > id(w) else 0)
        if v is None:
            return -1
        if w is None:
            return 1
        if isinstance(v, (int, float)):
            vname = ''
        else:
            vname = type(v).__name__
        if isinstance(w, (int, float)):
            wname = ''
        else:
            wname = type(w).__name__
        if vname < wname:
            return -1
        if vname > wname:
            return 1
        return -1 if id(type(v)) < id(type(w)) else 1


    def py2key(func=None):  # based on cmp_to_key
        class K(object):
            __slots__ = ['obj']
            __hash__ = None

            def __init__(self, obj):
                self.obj = func(obj) if func else obj

            def __lt__(self, other):
                try:
                    return self.obj < other.obj
                except TypeError:
                    return default_3way_compare(self.obj, other.obj) < 0

            def __gt__(self, other):
                try:
                    return self.obj > other.obj
                except TypeError:
                    return default_3way_compare(self.obj, other.obj) > 0

            def __eq__(self, other):
                try:
                    return self.obj == other.obj
                except TypeError:
                    return default_3way_compare(self.obj, other.obj) == 0

            def __le__(self, other):
                try:
                    return self.obj <= other.obj
                except TypeError:
                    return default_3way_compare(self.obj, other.obj) <= 0

            def __ge__(self, other):
                try:
                    return self.obj >= other.obj
                except TypeError:
                    return default_3way_compare(self.obj, other.obj) >= 0

        return K

Chris_Rands · Answer

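One way for Python 3.2+ is to use functools.cmp_to_key(). With this you can quickly implement a solution that tries to compare the values and then falls back on comparing the string representations of the types. You can also avoid an error being raised when comparing unorderable types and leave their order as it was in the input: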

    from functools import cmp_to_key

    def cmp(a, b):
        try:
            return (a > b) - (a < b)
        except TypeError:
            s1, s2 = type(a).__name__, type(b).__name__
            return (s1 > s2) - (s1 < s2)

Examples (input lists taken from Martijn Pieters's answer):

    sorted([0, 'one', 2.3, 'four', -5], key=cmp_to_key(cmp))
    # [-5, 0, 2.3, 'four', 'one']
    sorted([0, 123.4, 5, -6, 7.89], key=cmp_to_key(cmp))
    # [-6, 0, 5, 7.89, 123.4]
    sorted([{1:2}, {3:4}], key=cmp_to_key(cmp))
    # [{1: 2}, {3: 4}]
    sorted([{1:2}, None, {3:4}], key=cmp_to_key(cmp))
    # [None, {1: 2}, {3: 4}]

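This has the disadvantage that the three-way comparison is always carried out, which increases the running time. However, the solution is low-overhead, short and clean, and I think cmp_to_key() was developed for exactly this kind of Python 2 emulation use case.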
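To get a rough feel for that overhead, one can count how often the comparison function is invoked during a sort. This is only an illustrative sketch: `counting_cmp` and the module-level `calls` counter are hypothetical additions, and the body otherwise mirrors the cmp function above. Each invocation evaluates both `a > b` and `a < b`.

```python
from functools import cmp_to_key

calls = 0  # hypothetical counter, only for this illustration


def counting_cmp(a, b):
    global calls
    calls += 1
    try:
        return (a > b) - (a < b)  # two rich comparisons per call
    except TypeError:
        s1, s2 = type(a).__name__, type(b).__name__
        return (s1 > s2) - (s1 < s2)


data = [0, 'one', 2.3, 'four', -5]
result = sorted(data, key=cmp_to_key(counting_cmp))
print(result)  # [-5, 0, 2.3, 'four', 'one']
print(calls)   # number of comparison calls made by the sort
```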

ABri · Answer

To avoid the use of exceptions and go for a type-based solution, I came up with this:

    #! /usr/bin/python3

    import itertools

    def p2Sort(x):
        notImpl = type(0j.__gt__(0j))
        it = iter(x)
        first = next(it)
        groups = [[first]]
        types = {type(first): 0}
        for item in it:
            item_type = type(item)
            if item_type in types.keys():
                groups[types[item_type]].append(item)
            else:
                types[item_type] = len(types)
                groups.append([item])

        # debugging
        for group in groups:
            print(group)
            for it in group:
                print(type(it),)
        #

        for i in range(len(groups)):
            if type(groups[i][0].__gt__(groups[i][0])) == notImpl:
                continue
            groups[i] = sorted(groups[i])

        return itertools.chain.from_iterable(group for group in groups)

    x = [0j, 'one', 2.3, 'four', -5, 3j, 0j, -5.5, 13, 15.3, 'aa', 'zz']
    print(list(p2Sort(x)))

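Note that this needs an additional dictionary to hold the different types in the list, plus a type-holding variable (notImpl). Note further that floats and ints aren't mixed here.

Output: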

    ================================================================================
    05.04.2017 18:27:57
    ~/Desktop/sorter.py
    --------------------------------------------------------------------------------
    [0j, 3j, 0j]
    <class 'complex'>
    <class 'complex'>
    <class 'complex'>
    ['one', 'four', 'aa', 'zz']
    <class 'str'>
    <class 'str'>
    <class 'str'>
    <class 'str'>
    [2.3, -5.5, 15.3]
    <class 'float'>
    <class 'float'>
    <class 'float'>
    [-5, 13]
    <class 'int'>
    <class 'int'>
    [0j, 3j, 0j, 'aa', 'four', 'one', 'zz', -5.5, 2.3, 15.3, -5, 13]
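The NotImplemented check that p2Sort relies on can be isolated as follows. This is a sketch with a hypothetical helper name, `is_self_orderable`. It also hints at why ints and floats end up in separate groups: calling the dunder method directly on mixed types returns NotImplemented even though the `>` operator works for them, so this kind of check only makes sense within a single type.

```python
def is_self_orderable(value):
    """Hypothetical helper: does the type implement '>' against itself?"""
    return value.__gt__(value) is not NotImplemented


print(is_self_orderable(1))        # True
print(is_self_orderable('a'))      # True
print(is_self_orderable(0j))       # False: complex defines no ordering
print(is_self_orderable({1: 2}))   # False: dicts are unorderable in Python 3

# Across types the direct call is not a reliable test: int.__gt__ does not
# know about float, but the operator falls back to float's reflected method.
print((1).__gt__(2.0) is NotImplemented)  # True
print(1 > 2.0)                            # False, no exception
```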

Eugene Lisitsky · Answer

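I would recommend starting this sort of task (imitating the behaviour of another system that is very close to this one) with a detailed clarification of the target system: how should it behave in the various corner cases? One of the best ways to do that is to write a set of tests that pin down the correct behaviour. Having such tests gives you:

A better understanding of which elements should precede which
Basic documentation
A system that is robust against refactoring and added functionality. For example, if one more rule is added, how do you make sure the previous ones are not broken?

One can write test cases like these: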

sort2_test.py

    import unittest

    from sort2 import sorted2


    class TestSortNumbers(unittest.TestCase):
        """
        Verifies that numbers get sorted correctly.
        """

        def test_sort_empty(self):
            self.assertEqual(sorted2([]), [])

        def test_sort_one_element_int(self):
            self.assertEqual(sorted2([1]), [1])

        def test_sort_one_element_real(self):
            self.assertEqual(sorted2([1.0]), [1.0])

        def test_ints(self):
            self.assertEqual(sorted2([1, 2]), [1, 2])

        def test_ints_reverse(self):
            self.assertEqual(sorted2([2, 1]), [1, 2])


    class TestSortStrings(unittest.TestCase):
        """
        Verifies that strings get sorted correctly.
        """

        def test_sort_one_element_str(self):
            self.assertEqual(sorted2(["1.0"]), ["1.0"])


    class TestSortIntString(unittest.TestCase):
        """
        Verifies that numbers and strings get sorted correctly.
        """

        def test_string_after_int(self):
            self.assertEqual(sorted2([1, "1"]), [1, "1"])
            self.assertEqual(sorted2([0, "1"]), [0, "1"])
            self.assertEqual(sorted2([-1, "1"]), [-1, "1"])
            self.assertEqual(sorted2(["1", 1]), [1, "1"])
            self.assertEqual(sorted2(["0", 1]), [1, "0"])
            self.assertEqual(sorted2(["-1", 1]), [1, "-1"])


    class TestSortIntDict(unittest.TestCase):
        """
        Verifies that numbers and dicts get sorted correctly.
        """

        def test_string_after_int(self):
            self.assertEqual(sorted2([1, {1: 2}]), [1, {1: 2}])
            self.assertEqual(sorted2([0, {1: 2}]), [0, {1: 2}])
            self.assertEqual(sorted2([-1, {1: 2}]), [-1, {1: 2}])
            self.assertEqual(sorted2([{1: 2}, 1]), [1, {1: 2}])
            self.assertEqual(sorted2([{1: 2}, 1]), [1, {1: 2}])
            self.assertEqual(sorted2([{1: 2}, 1]), [1, {1: 2}])

Next, one might have a sorting function like this:

sort2.py

    from numbers import Real
    from decimal import Decimal
    from itertools import tee, filterfalse


    def sorted2(iterable):
        """
        :param iterable: An iterable (array or alike)
            entity which elements should be sorted.
        :return: List with sorted elements.
        """
        def predicate(x):
            return isinstance(x, (Real, Decimal))

        t1, t2 = tee(iterable)
        numbers = filter(predicate, t1)
        non_numbers = filterfalse(predicate, t2)
        sorted_numbers = sorted(numbers)
        sorted_non_numbers = sorted(non_numbers, key=str)
        return sorted_numbers + sorted_non_numbers

Usage is quite simple and is documented in the tests:

    >>> from sort2 import sorted2
    >>> sorted2([1,2,3, "aaa", {3:5}, [1,2,34], {-8:15}])
    [1, 2, 3, [1, 2, 34], 'aaa', {-8: 15}, {3: 5}]

appills · Answer

Here is one way to accomplish this:

    lst = [0, 'one', 2.3, 'four', -5]
    a = [x for x in lst if type(x) == type(1) or type(x) == type(1.1)]
    b = [y for y in lst if type(y) == type('string')]
    a.sort()
    b.sort()
    c = a + b
    print(c)

Ashish Bansal · Answer

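@martijn-pieters I'm not sure whether list in Python 2 also has a __cmp__ to handle comparing list objects, or how this was handled in Python 2.

Anyway, in addition to @martijn-pieters's answer, I used the following list comparator, so that at least the sorted output doesn't differ depending on the order of the elements in the same input set.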

    @per_type_cmp(list)
    def list_cmp(a, b):
        for a_item, b_item in zip(a, b):
            if a_item == b_item:
                continue
            return python2_sort_key(a_item) < python2_sort_key(b_item)
        return len(a) < len(b)

So, combining it with the original answer by Martijn:

    from numbers import Number


    # decorator for type to function mapping special cases
    def per_type_cmp(type_):
        try:
            mapping = per_type_cmp.mapping
        except AttributeError:
            mapping = per_type_cmp.mapping = {}
        def decorator(cmpfunc):
            mapping[type_] = cmpfunc
            return cmpfunc
        return decorator


    class python2_sort_key(object):
        _unhandled_types = {complex}

        def __init__(self, ob):
            self._ob = ob

        def __lt__(self, other):
            _unhandled_types = self._unhandled_types
            self, other = self._ob, other._ob  # we don't care about the wrapper

            # default_3way_compare is used only if direct comparison failed
            try:
                return self < other
            except TypeError:
                pass

            # hooks to implement special casing for types, dict in Py2 has
            # a dedicated __cmp__ method that is gone in Py3 for example.
            for type_, special_cmp in per_type_cmp.mapping.items():
                if isinstance(self, type_) and isinstance(other, type_):
                    return special_cmp(self, other)

            # explicitly raise again for types that won't sort in Python 2 either
            if type(self) in _unhandled_types:
                raise TypeError('no ordering relation is defined for {}'.format(
                    type(self).__name__))
            if type(other) in _unhandled_types:
                raise TypeError('no ordering relation is defined for {}'.format(
                    type(other).__name__))

            # default_3way_compare from Python 2 as Python code
            # same type but no ordering defined, go by id
            if type(self) is type(other):
                return id(self) < id(other)

            # None always comes first
            if self is None:
                return True
            if other is None:
                return False

            # Sort by typename, but numbers are sorted before other types
            self_tname = '' if isinstance(self, Number) else type(self).__name__
            other_tname = '' if isinstance(other, Number) else type(other).__name__

            if self_tname != other_tname:
                return self_tname < other_tname

            # same typename, or both numbers, but different type objects, order
            # by the id of the type object
            return id(type(self)) < id(type(other))


    @per_type_cmp(dict)
    def dict_cmp(a, b, _s=object()):
        if len(a) != len(b):
            return len(a) < len(b)
        adiff = min((k for k in a if a[k] != b.get(k, _s)), key=python2_sort_key, default=_s)
        if adiff is _s:
            # All keys in a have a matching value in b, so the dicts are equal
            return False
        bdiff = min((k for k in b if b[k] != a.get(k, _s)), key=python2_sort_key)
        if adiff != bdiff:
            return python2_sort_key(adiff) < python2_sort_key(bdiff)
        return python2_sort_key(a[adiff]) < python2_sort_key(b[bdiff])


    @per_type_cmp(list)
    def list_cmp(a, b):
        for a_item, b_item in zip(a, b):
            if a_item == b_item:
                continue
            return python2_sort_key(a_item) < python2_sort_key(b_item)
        return len(a) < len(b)

PS: It would make more sense to post this as a comment, but I don't have enough reputation to comment, so I'm posting it as an answer instead.