Python heapq.nlargest vs list.sort
David Allouche — 2025-07-10
While browsing the code of DeepFace, I noticed this snippet in the face detection function:
```python
if max_faces is not None and max_faces < len(facial_areas):
    facial_areas = nlargest(
        max_faces, facial_areas,
        key=lambda facial_area: facial_area.w * facial_area.h
    )
```
(Reformatted to fit in the page width)
I was not familiar with the heapq.nlargest function, so I looked it up. In hindsight, this is an appropriate use of a standard library function that matches the intent. But at the time I thought: “nlargest is Python code, and relatively obscure. Why not use list.sort() and slicing, which is optimized C code and familiar?”
So, how much faster is list.sort(), really? And at what list size does the
better algorithmic complexity of heapq.nlargest overcome the overhead of
being written in Python?
Yes, this is a micro-optimization. In the context of deep learning models, this snippet’s runtime is negligible. But I had already nerd-sniped myself—I had to know.
heapq.nlargest has the biggest advantage over list sorting for max_faces ==
1. And it is likely the most common case: either the application supports
multiple faces, or it only wants one.
So the side quest began: at what list size does heapq.nlargest(1, x) outperform x.sort(); x[-1]? And how does it compare to max(x), which should be the fastest in this case?
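For concreteness, the three contenders can be sketched like this (the list of random integers is just a stand-in for the face areas):

```python
from heapq import nlargest
import random

# Stand-in data: random "areas" instead of facial_area.w * facial_area.h.
x = [random.randrange(1000) for _ in range(100)]

# Contender 1: heapq.nlargest — O(n log k), here with k == 1.
top_heap = nlargest(1, x)[0]

# Contender 2: sort in place and take the last element — O(n log n).
y = list(x)  # copy, since sort() mutates its receiver
y.sort()
top_sort = y[-1]

# Contender 3: the built-in max — O(n), the obvious baseline.
top_max = max(x)

assert top_heap == top_sort == top_max
```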
Benchmarking gotchas
A simple problem, but plenty of ways to get it wrong.
For example, x.sort() modifies x in place. In a timeit loop, only the first iteration is meaningful unless you reset x. So I used timeit.repeat with number=1, a high repeat count, and a setup that initialized x with random integers. According to the documentation, one should take the minimum of the repeat values. But for small lists, that measures the time for the best-case input.
Instead, generate a fixed list, take the minimum time over repeated runs, and average over multiple lists. For small lists, you can test every permutation. For larger ones, sample random inputs.
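A sketch of that protocol, benchmarking max(x) as an example (the function name and parameter values are illustrative, not the exact harness I used):

```python
import random
import statistics
import timeit

def bench_max(size, lists=20, repeat=200):
    """Average, over several fixed random lists, of the minimum
    time measured for max(x) on each list."""
    per_list_minima = []
    for _ in range(lists):
        x = [random.randrange(10**6) for _ in range(size)]
        # number=1 runs the statement exactly once per measurement;
        # min() over the repeats discards scheduling noise for THIS input.
        times = timeit.repeat("max(x)", globals={"x": x},
                              number=1, repeat=repeat)
        per_list_minima.append(min(times))
    # Averaging over several lists avoids rewarding the best-case input.
    return statistics.mean(per_list_minima)

print(bench_max(30))
```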
For fast operations, timer resolution matters. So it is better to time y =
list(x); y.sort(); y[-1] in a loop, and subtract the time for y = list(x)
alone.
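One way to do that subtraction, under the same timeit.repeat setup (the loop counts here are illustrative):

```python
import random
import timeit

x = [random.randrange(10**6) for _ in range(30)]
n = 10_000  # run the statement many times so it dwarfs timer resolution

# Time copy + sort + index, then the copy alone, and subtract.
full = min(timeit.repeat("y = list(x); y.sort(); y[-1]",
                         globals={"x": x}, number=n, repeat=5))
copy_only = min(timeit.repeat("y = list(x)",
                              globals={"x": x}, number=n, repeat=5))
sort_cost = (full - copy_only) / n  # per-call cost of sort + index
```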
CPython
After several false starts, here is what I got:
list.sort is faster than heapq.nlargest for lists with 27 elements or
fewer. Surprisingly, it even outperforms max for lists up to 9 elements long.
This was on Python 3.10, because that is the version included in the Google Deep Learning images. So what about the latest Python release (3.13.5)?
Interesting! heapq.nlargest is nearly twice as fast for small lists. But
list.sort got a bit slower. Now the cutoff is at 18 elements. list.sort is
still a bit faster than max for tiny lists. The improvements in
interpreter speed make a clear difference.
PyPy
And what about PyPy? That should be a different story.
One more gotcha: when the list was built with random.sample, timings were worse than with itertools.permutations or random.shuffle.
It seems random.sample confuses PyPy's type inference. Adding a = [int(x) for
x in a] fixed the issue.
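In the benchmark setup, the workaround looked something like this (a sketch; on CPython the extra list comprehension changes nothing observable):

```python
import random

# Build the input list with random.sample, as in the slow variant.
a = random.sample(range(10**6), 100)

# Rebuilding the list from plain ints gave PyPy's JIT a representation
# it optimized well; with the raw sample() output, timings were worse.
a = [int(x) for x in a]
```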
On PyPy, heapq.nlargest is the fastest, except for one-element lists. It is
no longer penalized for being written in Python, because so is
list.sort.
Surprisingly, heapq.nlargest is slightly faster than max.
Oddly, PyPy 3.11 is a bit slower than PyPy 3.10 or earlier on this benchmark.
I used pyenv to run the benchmarks on all the versions of Python that it
could install on a recent MacBook Pro. These were the most interesting results.
All the measurements on one page.
Have fun.