Sorting HOW TO: Guido Van Rossum Fred L. Drake, JR., Editor
Release 2.7.5
Contents
1 Sorting Basics
2 Key Functions
3 Operator Module Functions
4 Ascending and Descending
5 Sort Stability and Complex Sorts
6 The Old Way Using Decorate-Sort-Undecorate
7 The Old Way Using the cmp Parameter
8 Odd and Ends
Author: Andrew Dalke and Raymond Hettinger
Release: 0.1

Python lists have a built-in list.sort() method that modifies the list in-place. There is also a sorted() built-in function that builds a new sorted list from an iterable. In this document, we explore the various techniques for sorting data using Python.
1 Sorting Basics
A simple ascending sort is very easy: just call the sorted() function. It returns a new sorted list:

>>> sorted([5, 2, 3, 1, 4])
[1, 2, 3, 4, 5]
You can also use the list.sort() method of a list. It modifies the list in-place (and returns None to avoid confusion). Usually it's less convenient than sorted(), but if you don't need the original list, it's slightly more efficient.

>>> a = [5, 2, 3, 1, 4]
>>> a.sort()
>>> a
[1, 2, 3, 4, 5]
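Another difference is that the list.sort() method is only defined for lists. In contrast, the sorted() function accepts any iterable. Iterating over a dictionary yields its keys, so sorting a dictionary returns a sorted list of its keys:

>>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
[1, 2, 3, 4, 5]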
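The difference between the two calls can be sketched as follows. The variable names are illustrative only; the point is that sorted() accepts tuples, strings, and generators alike, while list.sort() exists only on lists and returns None.

```python
# sorted() accepts any iterable and always returns a new list.
letters = ("d", "b", "a", "c")            # a tuple has no .sort() method
sorted_letters = sorted(letters)          # the tuple itself is left untouched

sorted_chars = sorted("python")           # strings are iterable too
sorted_squares = sorted(n * n for n in [3, -1, 2])   # so are generators

# list.sort() reorders the list in place and returns None,
# so assigning its result is almost always a mistake.
numbers = [5, 2, 3, 1, 4]
result = numbers.sort()
```

After this runs, numbers holds the sorted values and result is None, which is why code such as `a = a.sort()` loses the list.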
2 Key Functions
Starting with Python 2.4, both list.sort() and sorted() added a key parameter to specify a function to be called on each list element prior to making comparisons.

For example, here's a case-insensitive string comparison:

>>> sorted("This is a test string from Andrew".split(), key=str.lower)
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']

The value of the key parameter should be a function that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.

A common pattern is to sort complex objects using some of the object's indices as keys. For example:

>>> student_tuples = [
...     ('john', 'A', 15),
...     ('jane', 'B', 12),
...     ('dave', 'B', 10),
... ]
>>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The same technique works for objects with named attributes. For example:

>>> class Student:
...     def __init__(self, name, grade, age):
...         self.name = name
...         self.grade = grade
...         self.age = age
...     def __repr__(self):
...         return repr((self.name, self.grade, self.age))
...
>>> student_objects = [
...     Student('john', 'A', 15),
...     Student('jane', 'B', 12),
...     Student('dave', 'B', 10),
... ]
>>> sorted(student_objects, key=lambda student: student.age)   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
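The claim that the key function runs once per input record can be checked with a small sketch. The recording helper lowercase_key and the calls list are hypothetical names introduced here for illustration; they are not part of any library.

```python
calls = []

def lowercase_key(word):
    """Return the lowercased word, recording each call so the count can be inspected."""
    calls.append(word)
    return word.lower()

words = "This is a test string from Andrew".split()
result = sorted(words, key=lowercase_key)

# One key computation per element, regardless of how many comparisons the sort makes.
key_calls = len(calls)
```

Here key_calls equals len(words). The comparisons themselves operate on the precomputed keys, which is what makes an expensive key function affordable.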
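>>> from operator import attrgetter
>>> s = sorted(student_objects, key=attrgetter('age'))     # sort on secondary key
>>> sorted(s, key=attrgetter('grade'), reverse=True)       # now sort on primary key, descending
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]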
The Timsort algorithm used in Python does multiple sorts efficiently because it can take advantage of any ordering already present in a dataset.
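The multi-pass approach generalizes to any number of keys. The following is a minimal sketch, assuming records are tuples addressed by index; the helper name multisort and its specs argument are hypothetical and introduced here only for illustration.

```python
from operator import itemgetter

def multisort(records, specs):
    """Sort records by several keys.

    specs is a list of (index, reverse) pairs, most significant key first.
    """
    result = list(records)
    # Sort on the least significant key first and the most significant key last.
    # Because the sort is stable, each pass keeps the order established by the
    # earlier passes among elements that compare equal.
    for index, reverse in reversed(specs):
        result.sort(key=itemgetter(index), reverse=reverse)
    return result

student_tuples = [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

# Grade descending, then age ascending within each grade.
by_grade_desc = multisort(student_tuples, [(1, True), (2, False)])

# Grade ascending, then age descending within each grade.
by_grade_asc = multisort(student_tuples, [(1, False), (2, True)])
```

Each pass is a full sort, but later passes tend to be cheap because the data already contains ordered runs.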
Or you can reverse the order of comparison with:

>>> def reverse_numeric(x, y):
...     return y - x
...
>>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric)
[5, 4, 3, 2, 1]

When porting code from Python 2.x to 3.x, you may have a user-supplied comparison function that needs to be converted to a key function. The following wrapper makes that easy to do:

def cmp_to_key(mycmp):
    'Convert a cmp= function into a key= function'
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(self.obj, other.obj) >= 0
        def __ne__(self, other):
            return mycmp(self.obj, other.obj) != 0
    return K

To convert to a key function, just wrap the old comparison function:

>>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
[5, 4, 3, 2, 1]

In Python 2.7, the cmp_to_key() function was added to the functools module.
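With the standard-library version, the hand-written wrapper is unnecessary. The sketch below uses functools.cmp_to_key() with the reverse_numeric function from above, plus a second comparison function, compare_length_then_text, which is a hypothetical example added here for illustration.

```python
from functools import cmp_to_key

def reverse_numeric(x, y):
    """Old-style comparison: negative if x sorts first, positive if y sorts first."""
    return y - x

descending = sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))

def compare_length_then_text(a, b):
    """Hypothetical comparison: shorter strings first, ties broken alphabetically."""
    if len(a) != len(b):
        return len(a) - len(b)
    # Equivalent to the old cmp(a, b), written so it also runs on Python 3.
    return (a > b) - (a < b)

words = sorted(["pear", "fig", "apple", "kiwi"],
               key=cmp_to_key(compare_length_then_text))
```

When the ordering can be expressed directly as a key, for example `key=lambda w: (len(w), w)` for the second case, that form is usually simpler than wrapping a comparison function.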
>>> sorted(student_objects)
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

For general purpose comparisons, the recommended approach is to define all six rich comparison operators. The functools.total_ordering() class decorator makes this easy to implement.

Key functions need not depend directly on the objects being sorted. A key function can also access external resources. For instance, if the student grades are stored in a dictionary, they can be used to sort a separate list of student names:

>>> students = ['dave', 'john', 'jane']
>>> grades = {'john': 'F', 'jane': 'A', 'dave': 'C'}
>>> sorted(students, key=grades.__getitem__)
['jane', 'dave', 'john']
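As a rough illustration of the total_ordering() approach, the sketch below defines only __eq__ and __lt__ and lets the decorator supply the remaining operators. Ordering students by age is an assumption made for this example, and the class is a simplified stand-in for the Student class used earlier.

```python
from functools import total_ordering

@total_ordering
class Student(object):
    """Student record ordered by age (an assumption made for this sketch)."""

    def __init__(self, name, grade, age):
        self.name = name
        self.grade = grade
        self.age = age

    def __repr__(self):
        return repr((self.name, self.grade, self.age))

    # total_ordering derives __le__, __gt__ and __ge__ from these two methods.
    def __eq__(self, other):
        return self.age == other.age

    def __lt__(self, other):
        return self.age < other.age

john = Student('john', 'A', 15)
jane = Student('jane', 'B', 12)
dave = Student('dave', 'B', 10)

ordered = sorted([john, jane, dave])   # no key needed: the objects compare directly
john_is_oldest = john >= jane and john > dave
```

The derived operators are somewhat slower than hand-written ones, so defining all six explicitly may still be preferable in performance-sensitive code.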