Group data by month, if the month is in a date range of two fields in Django

I have a contract model containing a start and end datetime field. I want to show in a graph how many contracts are active per month (the month is between start and end time).

How can I get this information without multiple database requests per month?

I can annotate it for each field like this

start_month_contracts = contracts.annotate(
            start_month=TruncMonth("start")
        ) 
            .values("start_month") 
            .annotate(count=Count("start_month"))

end_month_contracts = contracts.annotate(
            end_month=TruncMonth("end")
        ) 
            .values("end_month") 
            .annotate(count=Count("end_month"))

but how do I combine both to get the active contracts per month?

Answer

Suppose you have the following model with start and end dates:

class Contract(models.Model):
    ...
    start = models.DateTimeField()
    end = models.DateTimeField()

Basic query for “active” contracts in a month

The basic formula is as you stated:

the month is between start and end time

A query can get us this for any given month…

# Get active contracts for December 2020
month = datetime.datetime(2020, 12, 1)

# all Contract records active in december
qs = Contract.objects.filter(start__lte=month, end__gte=month)
# Or, since we just care about the count, we can use `.count()` instead:
december_active_count = Contract.objects.filter(start__lte=month, end__gte=month).count()

If you find you need to tweak the basic query, that’s fine. It’s not so much the query as much as it is the methods, which carry through this explanation, without regard to what the query happens to be.

multiple counts in a single query

There’s a few ways you can do a single query and chart out your contracts…

Counting records in the django application

A simple naïve approach is to pull all the relevant contracts first in a single query, then count them for each month in Python…

This works fine, but there’s a few potential problems:

  1. The DB will send data for each record. If you have many records, the number of bytes required to be sent by the DB could get excessive.

  2. While the calculations here are fairly lightweight, it does require some CPU power for Python to crunch these numbers for every record and could take a while if there are many records

Really, we probably want to have the DB do the counting for us.

Counting on the database

If you wanted to handle this on the database, rather than in Python, you can develop a query to do aggregations DB-side using .aggregate. The benefit here is that the DB only has to transmit the counts, rather than all the records, which is a significantly smaller number of bytes. It also offloads some number crunching from your app to the DB.

Extending on the first example, let’s try to get the counts for more than 1 month in a single query. We do this by using aggregate along with the Count aggregation function.

from django.db.models import Count, Q

november = datetime.datetime(2020, 11, 1)
december = datetime.datetime(2020, 12, 1)

contract_counts = Contract.objects.aggregate(
    november_counts=Count('pk', filter=Q(start__lte=november, end__gte=november))
    december_counts=Count('pk', filter=Q(start__lte=december, end__gte=december))
)
print(contract_counts)
{'november_counts': 376, 'december_counts': 393}  # <-- output

We can apply this same principle to get the counts for all months over a specified time range. In order to do this, we pre-determine each month between start and end that will be counted and use Case and Count for each of those months.

Really, this is now just a matter of generating the keyword arguments like above, but dynamically.

I’ll also create a custom manager for this model, so make the interface a little nicer.

import calendar
from django.db.models import Count, Q
class ContractManager(models.Manager):
    def month_counts(self, start, end):
        qs = self.get_queryset()
        # generate keyword arguments for .aggregate
        aggregations = {}
        for month in months(start, end):  # the start of each month in the range
            month_name = calendar.month_name[month.month]
            aggregation_name = f'{month_name}_{month.year}'
            aggregations[aggregation_name] = Count(
                'pk', filter=Q(start__lte=month, end__gte=month)
            )
        return qs.aggregate(**aggregations)


class Contract(models.Model):
    start = models.DateTimeField()
    end = models.DateTimeField()
    objects = ContractManager()

You can then produce the counts like so:

start = datetime(2020, 1, 1)
end = datetime(2021, 1, 1)
print(Contract.objects.month_counts(start, end))

The output, gathered from of this might look something like this:

{'January_2020': 2,
 'February_2020': 90,
 'March_2020': 163,
 'April_2020': 234,
 'May_2020': 272,
 'June_2020': 284,
 'July_2020': 284,
 'August_2020': 275,
 'September_2020': 247,
 'October_2020': 205,
 'November_2020': 128,
 'December_2020': 68,
 'January_2021': 3}

You can also see only 1 query is used:

from django.db import connection
print(len(connection.queries))
# 1

Final thoughts and notes

I should mention This is not the most efficient way to do this and there’s a lot of room for optimization. You could probably also generate the month intervals on the DB side, instead of in Python, if you wanted. Specific backends may have more performant options available, too, like the daterange functions of Postgres. Though, what we have here should provide enough context for using aggregate to get the counts you want.

I can annotate it for each field like this

I don’t think your code here gets you the counts you really want. You’re counting the number of contracts that either started or ended in a particular month… but this won’t be able to tell you how many contracts were active in any single given month.

P.S.

I omitted the code for the months() function above for brevity. The code can be found here if you’re interested. Something like pandas might be more performant, though it shouldn’t be a concern, unless your time intervals go over thousands of years 🙂