Python heapq.nlargest vs list.sort
David Allouche — 2025-07-10
While browsing the code of DeepFace, I noticed this snippet in the face detection function:
```python
if max_faces is not None and max_faces < len(facial_areas):
    facial_areas = nlargest(
        max_faces, facial_areas,
        key=lambda facial_area: facial_area.w * facial_area.h
    )
```
(Reformatted to fit in the page width)
I was not familiar with the `heapq.nlargest` function, so I looked it up. In hindsight, this is an appropriate use of a standard library function that matches the intent. But at the time I thought: “`nlargest` is Python code, and relatively obscure. Why not use `list.sort()` and slicing, which is optimized C code and familiar?”
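For context, a minimal illustration of what `heapq.nlargest` computes (the sample values here are mine, not from DeepFace):

```python
import heapq

xs = [5, 1, 9, 3, 7]

# heapq.nlargest returns the n largest items, in descending order...
top2 = heapq.nlargest(2, xs)

# ...which is equivalent to sorting a copy in reverse and slicing.
top2_sorted = sorted(xs, reverse=True)[:2]

print(top2, top2_sorted)  # [9, 7] [9, 7]
```

The difference is algorithmic: the sort-and-slice version is O(m log m) in the list size m, while `nlargest` is roughly O(m log n) for n requested items.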
So, how much faster is `list.sort()`, really? And at what list size does the better algorithmic complexity of `heapq.nlargest` overcome the overhead of being written in Python?
Yes, this is a micro-optimization. In the context of deep learning models, this snippet’s runtime is negligible. But I had already nerd-sniped myself—I had to know.
`heapq.nlargest` has the biggest advantage over list sorting for `max_faces == 1`. And it is likely the most common case: either the application supports multiple faces, or it only wants one.

So the side quest began: at what list size does `heapq.nlargest(1, x)` outperform `x.sort(); x[-1]`? And how does it compare to `max(x)`, which should be the fastest in this case?
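The three contenders, sketched side by side (I use `sorted` rather than in-place `x.sort()` here so `x` survives for the other calls; all three must agree on the answer):

```python
import heapq
import random

x = random.sample(range(1000), 50)

a = heapq.nlargest(1, x)[0]  # heap-based selection, written in Python
b = sorted(x)[-1]            # full Timsort in C, then take the last element
c = max(x)                   # single linear pass, also in C

assert a == b == c
```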
Benchmarking gotchas
A simple problem, but plenty of ways to get it wrong.
For example, `x.sort()` modifies `x` in place. In a `timeit` loop, only the first iteration is meaningful unless you reset `x`. So I used `timeit.repeat` with `number=1`, a high `repeat` count, and a setup that initialized `x` with random integers. According to the documentation, one should take the minimum of the `repeat` values. But for small lists, that measures the time for the best-case input.
Instead, generate a fixed list, take the minimum time over repeated runs, and average over multiple lists. For small lists, you can test every permutation. For larger ones, sample random inputs.
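A minimal sketch of that methodology, with illustrative sizes and repetition counts of my own choosing rather than the ones used for the actual measurements:

```python
import random
import timeit

def best_of(stmt, setup, repeat=5):
    """Minimum of several one-shot timings for one fixed input."""
    return min(timeit.repeat(stmt, setup=setup, number=1, repeat=repeat))

def bench(size, n_lists=10):
    """Average, over several random lists, of each list's best time."""
    times = []
    for _ in range(n_lists):
        data = random.sample(range(size * 10), size)
        setup = f"x = {data!r}"  # each repetition starts from this fixed list
        times.append(best_of("y = list(x); y.sort(); y[-1]", setup))
    return sum(times) / len(times)

print(bench(30))
```

Taking the minimum per fixed list filters out scheduler noise, while averaging across lists averages over input orderings instead of selecting the luckiest one.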
For fast operations, timer resolution matters. So it is better to time `y = list(x); y.sort(); y[-1]` in a loop, and subtract the time for `y = list(x)` alone.
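One way to sketch that subtraction (the loop count and list size are illustrative, not the post's actual parameters):

```python
import random
import timeit

x = random.sample(range(1000), 20)
loops = 10_000

# Time copy + sort + index together...
full = timeit.timeit("y = list(x); y.sort(); y[-1]",
                     globals={"x": x}, number=loops)
# ...then the copy alone, and subtract to isolate sort-and-index.
copy_only = timeit.timeit("y = list(x)", globals={"x": x}, number=loops)

sort_time = full - copy_only
print(f"~{sort_time / loops * 1e9:.0f} ns per sort of {len(x)} elements")
```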
CPython
After several false starts, here is what I got:
`list.sort` is faster than `heapq.nlargest` for lists with 27 elements or fewer. Surprisingly, it even outperforms `max` for lists up to 9 elements long.
This was on Python 3.10, because that is the version included in the Google Deep Learning images. So what about the latest Python release (3.13.5)?
Interesting! `heapq.nlargest` is nearly twice as fast for small lists. But `list.sort` got a bit slower. Now the cutoff is at 18 elements. `list.sort` is still a bit faster than `max` for tiny lists. The improvements in interpreter speed make a clear difference.
PyPy
And what about PyPy? That should be a different story.
One more gotcha: when the list was built with `random.sample`, timings were worse than with `itertools.permutations` or `random.shuffle`. It seems `random.sample` confuses PyPy's type inference. Adding `a = [int(x) for x in a]` fixed the issue.
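A sketch of that workaround in context; the surrounding setup is my guess, only the rebuilding comprehension comes from the measurements:

```python
import random

a = random.sample(range(1000), 100)

# Rebuilding the list appears to hand PyPy's JIT a homogeneous
# list of plain ints that it can specialize on. Harmless on CPython.
a = [int(x) for x in a]
```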
On PyPy, `heapq.nlargest` is the fastest, except for one-element lists. It is no longer penalized for being written in Python, because so is `list.sort`. Surprisingly, `heapq.nlargest` is slightly faster than `max`.
Oddly, PyPy 3.11 is a bit slower than PyPy 3.10 or earlier on this benchmark.
I used `pyenv` to run the benchmarks on all the versions of Python that it could install on a recent MacBook Pro. These were the most interesting results.
All the measurements on one page.
Have fun.