Django filter. What is more efficient – .filter(date__range) or .all() and filter via Python

I have quite a large database table (1M+ rows) that experiences issues when filtering results in Django.

Currently the filter logic is the following:

results = Result.objects.filter(date__range=(date_from, date_to))
for result in results:
    # do stuff

On some periods it causes a crash (probably due to memory exhaustion)

I am wondering, would it be more efficient replacing it with the following:

results = Result.objects.all().order_by('-id')
for result in results:
    if result.date > date_to:
        continue
    if result.date < date_from:
        break
    # do stuff

In theory, since .all() creates a lazy QuerySet, it might perform better than range filtering in database with 1M+ rows. Also would be interesting to know about memory consumption in both cases.

Maybe there is another solution how to do it more efficiently and in constant memory?

Thanks!

Answer

In theory, since .all() creates a lazy QuerySet, it might perform better than range filtering in database with 1M+ rows. Also would be interesting to know about memory consumption in both cases.

A QuerySet is lazy in the sense that it will only retrieve the objects if necessary, but once it has to do that, it will fetch all items. Not only .all() is lazy by the way, all QuerySets are lazy, so if you define a Result.objects.filter(…) queryset, that QuerySet is lazy as well.

But regardless how it is implemented, filtering at the database side is more efficient, since databases are designed to do this, and it results in less bandwidth from the database to the Python/Django layer.

In case there are memory issues, it likely means that your QuerySet, even if filtered, is too large to store in memory. You can work with the .iterator(…) method [Django-doc] to load batches of items that then can be processed:

results = Result.objects.filter(date__range=(date_from, date_to)).iterator()

for result in results:
    # …
    pass

If will each time load a chunk of records into memory that then can be processed. If you do not store the items (for example in a list), then Python can reuse the memory for the next chunk.