How to compute counts on a pandas DataFrame column in Python, given an input list of values?

I have a list that specifies a date range (obtained with pandas.date_range()), let’s say:

dates = ["2016-1-1", "2016-1-2", "2016-1-3", "2016-1-4", "2016-1-5", "2016-1-6"]

and a pandas DataFrame which has start dates in one column, for example:

df = {"start_date": ["2016-1-3", "2016-1-2", "2016-1-6", "2016-1-6", "2016-1-1"],
      "event": ["task1", "task43", "task2", "task9", "task6"]}

As a result, I want to know for each element in the dates list how often it occurs in the start_date column of the DataFrame.

According to this example, my desired output would be:

[1, 1, 1, 0, 0, 2]

I have to compute some statistics on the output, so a simple list of counts suffices. Normally I could run a counter on the DataFrame column, but since I also want the zero counts for the dates that don’t occur in the DataFrame column, this is not possible to my knowledge.

What is the most efficient way to solve this? The actual DataFrame on which I have to run this is pretty huge, so if multiple solutions are possible I’m looking for the least time-consuming one.

Answer

First, convert the dictionary to a DataFrame:

import pandas as pd
df = pd.DataFrame(df)

The first idea is to convert both the column and the list to datetimes, then combine Series.value_counts with Series.reindex:

dates = ["2016-1-1", "2016-1-2", "2016-1-3", "2016-1-4", "2016-1-5", "2016-1-6"]

df['start_date'] = pd.to_datetime(df['start_date'])
dates = pd.to_datetime(dates)

L = df['start_date'].value_counts().reindex(dates, fill_value=0).tolist()
print(L)
[1, 1, 1, 0, 0, 2]
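
To make the two steps visible, here is a self-contained sketch of the same approach with the intermediate result kept in its own variable. The names start, counts, full_counts and result are only illustrative, and the date range is built with pd.date_range on the assumption that this is how the list was produced in the question:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "start_date": ["2016-1-3", "2016-1-2", "2016-1-6", "2016-1-6", "2016-1-1"],
    "event": ["task1", "task43", "task2", "task9", "task6"],
})

# Parse the column and build the full range as real datetimes
start = pd.to_datetime(df["start_date"])
dates = pd.date_range("2016-01-01", "2016-01-06")

# Step 1: value_counts only lists dates that actually occur in the column
counts = start.value_counts()

# Step 2: reindex aligns the counts to the full range;
# dates missing from the column get fill_value instead of NaN
full_counts = counts.reindex(dates, fill_value=0)

result = full_counts.tolist()
print(result)  # [1, 1, 1, 0, 0, 2]
```

Without fill_value=0 the missing dates would come back as NaN and the counts would be upcast to floats, which is why the argument matters here.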

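Alternatively, skip the conversion and work on the original string column. This only works if the strings in dates are formatted exactly like the strings in start_date: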

dates = ["2016-1-1", "2016-1-2", "2016-1-3", "2016-1-4", "2016-1-5", "2016-1-6"]

L = df['start_date'].value_counts().reindex(dates, fill_value=0).tolist()
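
A caveat with the string-based variant is that reindex matches labels literally. The sketch below, using the sample data from the question, shows the expected result and what happens when the labels are zero-padded, as they would be if they were produced from pd.date_range with strftime("%Y-%m-%d") (that formatting step is an assumption, not something stated in the question):

```python
import pandas as pd

# Start dates kept as plain strings, as in the question
df = pd.DataFrame({
    "start_date": ["2016-1-3", "2016-1-2", "2016-1-6", "2016-1-6", "2016-1-1"],
})
counts = df["start_date"].value_counts()

# Labels written exactly like the column values: every date is found
dates = ["2016-1-1", "2016-1-2", "2016-1-3", "2016-1-4", "2016-1-5", "2016-1-6"]
matched = counts.reindex(dates, fill_value=0).tolist()
print(matched)  # [1, 1, 1, 0, 0, 2]

# Zero-padded labels ("2016-01-01", ...) are different strings,
# so none of them match and every count silently becomes 0
padded = pd.date_range("2016-01-01", "2016-01-06").strftime("%Y-%m-%d").tolist()
mismatched = counts.reindex(padded, fill_value=0).tolist()
print(mismatched)  # [0, 0, 0, 0, 0, 0]
```

If the two sources might format dates differently, converting both sides with pd.to_datetime first is probably the safer choice.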