Python heapq.nlargest vs list.sort
David Allouche — 2025-07-10
While browsing the code of DeepFace, I noticed this snippet in the face detection function:
```python
if max_faces is not None and max_faces < len(facial_areas):
    facial_areas = nlargest(
        max_faces, facial_areas,
        key=lambda facial_area: facial_area.w * facial_area.h
    )
```
(Reformatted to fit in the page width)
I was not familiar with the heapq.nlargest function, so I looked it up. In hindsight, this is an appropriate use of a standard library function that matches the intent. But at the time I thought: “nlargest is Python code, and relatively obscure. Why not use list.sort() and slicing, which is optimized C code and familiar?”
So, how much faster is list.sort(), really? And at what list size does the
better algorithmic complexity of heapq.nlargest overcome the overhead of
being written in Python?
Yes, this is a micro-optimization. In the context of deep learning models, this snippet’s runtime is negligible. But I had already nerd-sniped myself—I had to know.
heapq.nlargest has the biggest advantage over list sorting for max_faces ==
1. And it is likely the most common case: either the application supports
multiple faces, or it only wants one.
So the side quest began: at what list size does heapq.nlargest(1, x) outperform x.sort(); x[-1]? And how does it compare to max(x), which should be the fastest in this case?
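For concreteness, the three contenders can be sketched like this (the list of random integers is just a stand-in for the face areas):

```python
from heapq import nlargest
import random

# Stand-in data: random "areas" instead of facial_area.w * facial_area.h.
x = [random.randrange(1000) for _ in range(100)]

# Contender 1: heapq.nlargest — O(n log k), here with k == 1.
top_heap = nlargest(1, x)[0]

# Contender 2: sort in place and take the last element — O(n log n).
y = list(x)  # copy, since sort() mutates its receiver
y.sort()
top_sort = y[-1]

# Contender 3: the built-in max — O(n), the obvious baseline.
top_max = max(x)

assert top_heap == top_sort == top_max
```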
Benchmarking gotchas
A simple problem, but plenty of ways to get it wrong.
For example, x.sort() modifies x in place. In a timeit loop, only the first iteration is meaningful unless you reset x. So I used timeit.repeat with number=1, a high repeat count, and a setup that initialized x with random integers. According to the documentation, one should take the minimum of the repeat values. But for small lists, that measures the time for the best-case input.
Instead, generate a fixed list, take the minimum time over repeated runs, and average over multiple lists. For small lists, you can test every permutation. For larger ones, sample random inputs.
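A sketch of that protocol, benchmarking max(x) as an example (the function name and parameter values are illustrative, not the exact harness I used):

```python
import random
import statistics
import timeit

def bench_max(size, lists=20, repeat=200):
    """Average, over several fixed random lists, of the minimum
    time measured for max(x) on each list."""
    per_list_minima = []
    for _ in range(lists):
        x = [random.randrange(10**6) for _ in range(size)]
        # number=1 runs the statement exactly once per measurement;
        # min() over the repeats discards scheduling noise for THIS input.
        times = timeit.repeat("max(x)", globals={"x": x},
                              number=1, repeat=repeat)
        per_list_minima.append(min(times))
    # Averaging over several lists avoids rewarding the best-case input.
    return statistics.mean(per_list_minima)

print(bench_max(30))
```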
For fast operations, timer resolution matters. So it is better to time y =
list(x); y.sort(); y[-1] in a loop, and subtract the time for y = list(x)
alone.
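One way to do that subtraction, under the same timeit.repeat setup (the loop counts here are illustrative):

```python
import random
import timeit

x = [random.randrange(10**6) for _ in range(30)]
n = 10_000  # run the statement many times so it dwarfs timer resolution

# Time copy + sort + index, then the copy alone, and subtract.
full = min(timeit.repeat("y = list(x); y.sort(); y[-1]",
                         globals={"x": x}, number=n, repeat=5))
copy_only = min(timeit.repeat("y = list(x)",
                              globals={"x": x}, number=n, repeat=5))
sort_cost = (full - copy_only) / n  # per-call cost of sort + index
```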
CPython
After several false starts, here is what I got:
list.sort is faster than heapq.nlargest for lists with 27 elements or
fewer. Surprisingly, it even outperforms max for lists up to 9 elements long.
This was on Python 3.10, because that is the version included in the Google Deep Learning images. So what about the latest Python release (3.13.5)?
Interesting! heapq.nlargest is nearly twice as fast for small lists. But
list.sort got a bit slower. Now the cutoff is at 18 elements. list.sort is
still a bit faster than max for tiny lists. The improvements in
interpreter speed make a clear difference.
PyPy
And what about PyPy? That should be a different story.
One more gotcha: when the list was built with random.sample, timings were worse than with itertools.permutations or random.shuffle.
It seems random.sample confuses PyPy's type inference. Adding a = [int(x) for
x in a] fixed the issue.
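In the benchmark setup, the workaround looked something like this (a sketch; on CPython the extra list comprehension changes nothing observable):

```python
import random

# Build the input list with random.sample, as in the slow variant.
a = random.sample(range(10**6), 100)

# Rebuilding the list from plain ints gave PyPy's JIT a representation
# it optimized well; with the raw sample() output, timings were worse.
a = [int(x) for x in a]
```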
On PyPy, heapq.nlargest is the fastest, except for one-element lists. It is
no longer penalized for being written in Python, because so is
list.sort.
Surprisingly, heapq.nlargest is slightly faster than max.
Oddly, PyPy 3.11 is a bit slower than PyPy 3.10 or earlier on this benchmark.
I used pyenv to run the benchmarks on all the versions of Python that it
could install on a recent MacBook Pro. These were the most interesting results.
All the measurements on one page.
Have fun.