How to filter by a date slice on the second level of a MultiIndex DataFrame

I have a DataFrame with dates as the second level of its index. How can I filter rows between two dates?
Here is the code to generate the DataFrame:

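import numpy as np
import pandas as pd

# freq='M' means month end; pandas 2.2+ spells this alias 'ME'
dates = pd.date_range(start='2015-01-01', end='2018-12-01', freq='M')
persons = ['John', 'Paul', 'Susan', 'Steve', 'Anne', 'Carol']
miindex = pd.MultiIndex.from_product([persons, dates],
                                     names=['persons', 'dates'])
# 6 persons x 47 month ends = 282 rows
df = pd.DataFrame(np.random.randn(282, 4), columns=list('ABCD'), index=miindex)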

                           A         B         C         D
persons dates
John    2015-01-31 -1.381854  0.438590 -1.838329  0.085944
        2015-02-28 -1.870273  0.040513  1.116906  0.473218
        2015-03-31  0.522960 -0.190412 -0.650339 -0.532672
        2015-04-30  0.147605 -0.045129  1.209839  1.831272
        2015-05-31 -0.331290 -0.413971 -2.418138  0.149583
                     ...       ...       ...       ...
Carol   2018-07-31 -0.344657  0.871752 -0.040436  0.132283
        2018-08-31  0.168781  0.776657 -0.103212 -0.082286
        2018-09-30  0.019738  0.151568 -0.794741 -1.316847
        2018-10-31 -1.047699  0.913352  1.009840  0.070882
        2018-11-30 -1.360346 -0.850818 -0.824563  0.305373

How can I select the rows whose dates are:

  • within 2016
  • between 2015 and 2017
  • from 01-02-2016 onward
  • from 01-01-2018 onward

For example, filtering for dates from 01-01-2018 onward, I should get

                               A         B         C         D
persons dates                                             
John    2018-01-31  1.092697 -0.534817  1.498770 -0.746335
        2018-02-28  0.141443  0.286186 -0.652946 -0.331205
        2018-03-31 -0.547728  0.942533 -0.315792 -1.564275
        2018-04-30  2.383790  1.117817 -0.419611  1.603313
        2018-05-31  0.405304 -1.468452 -0.713453  0.605490
                     ...       ...       ...       ...
Carol   2018-07-31  0.711990  0.615596  1.198836  2.283507
        2018-08-31 -0.071486 -0.102290 -1.855148  0.284160
        2018-09-30  1.461128 -1.163214  1.142434  0.183197
        2018-10-31 -1.994097 -0.275098  0.877738 -1.094145
        2018-11-30  0.225581  2.194110  0.160663  1.582566 

Note that the values in columns A, B, C and D of my expected output should be ignored: I generated that DataFrame randomly, and only the index shows the rows I expect to get.

Answer

Use partial string indexing on the MultiIndex, after first sorting the index with DataFrame.sort_index:

df = df.sort_index()
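
The sort matters because range slices on an inner level (such as "2015":"2017") generally need a lexsorted MultiIndex, and from_product keeps the persons in the order given, which is not alphabetical. A minimal sketch to check this follows; the names sorted_before and sorted_after are only illustrative, and the dates are built from month starts (an assumption made here so the snippet does not depend on the 'M' vs 'ME' alias):

```python
import numpy as np
import pandas as pd

# Month-end dates 2015-01-31 .. 2018-11-30, derived from month starts
# to avoid the "M" / "ME" frequency alias difference between pandas versions.
dates = pd.date_range("2015-02-01", periods=47, freq="MS") - pd.Timedelta(days=1)
persons = ["John", "Paul", "Susan", "Steve", "Anne", "Carol"]
miindex = pd.MultiIndex.from_product([persons, dates], names=["persons", "dates"])
df = pd.DataFrame(np.random.randn(len(miindex), 4), columns=list("ABCD"), index=miindex)

# The persons level is in insertion order, so the index is not sorted yet.
sorted_before = df.index.is_monotonic_increasing

df = df.sort_index()

# After sorting, the index is monotonic and safe to slice.
sorted_after = df.index.is_monotonic_increasing
print(sorted_before, sorted_after)
```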

idx = pd.IndexSlice
print (df.loc[idx[:, "2016"], :])
                           A         B         C         D
persons dates                                             
Anne    2016-01-31  1.189332  1.240492  1.948487  1.049944
        2016-02-29  0.155651  0.172096 -1.315934  2.447474
        2016-03-31  0.258901  1.052156  0.194412  0.551807
        2016-04-30  0.817727 -0.039305  0.196576 -1.163072
        2016-05-31 -0.379003 -0.640898 -0.412814 -0.507134
                     ...       ...       ...       ...
Susan   2016-08-31  0.944875  0.655981 -1.167568  1.087909
        2016-09-30 -0.533770  0.271889  0.743089 -1.021702
        2016-10-31 -0.548632  0.980111  1.288285 -1.130429
        2016-11-30  0.843035 -1.019152  0.394127  0.375720
        2016-12-31  0.789154  0.660676 -0.097020 -0.392890

[72 rows x 4 columns]

print (df.loc[idx[:, "2015":"2017"], :])
                           A         B         C         D
persons dates                                             
Anne    2015-01-31  0.340056 -0.084973 -0.160449  0.476274
        2015-02-28  1.521403  2.075643 -0.089913 -3.556345
        2015-03-31  1.871844 -1.933054  0.360196 -1.184768
        2015-04-30  1.996072 -0.671001  1.001818  0.787014
        2015-05-31  0.642655 -0.685923 -0.854484 -0.311828
                     ...       ...       ...       ...
Susan   2017-08-31 -0.349868  1.095051  0.950181  1.365780
        2017-09-30  0.937602  0.456578  0.169026 -0.559212
        2017-10-31 -0.404749  0.595979 -0.434110  2.312148
        2017-11-30  1.381366 -1.470635  0.773891 -0.686727
        2017-12-31 -0.611788  0.963277  0.564169 -0.647526

[216 rows x 4 columns]

print (df.loc[idx[:, "01-02-2016":], :])
                           A         B         C         D
persons dates                                             
Anne    2016-01-31  1.189332  1.240492  1.948487  1.049944
        2016-02-29  0.155651  0.172096 -1.315934  2.447474
        2016-03-31  0.258901  1.052156  0.194412  0.551807
        2016-04-30  0.817727 -0.039305  0.196576 -1.163072
        2016-05-31 -0.379003 -0.640898 -0.412814 -0.507134
                     ...       ...       ...       ...
Susan   2018-07-31 -0.180213 -0.613854 -0.143997  0.938364
        2018-08-31 -1.232334 -1.066170  2.074717 -0.219996
        2018-09-30 -0.014457  0.350130 -0.920580  0.040339
        2018-10-31  1.651722 -0.399346 -1.647574  0.323075
        2018-11-30  1.465342  0.182188  0.039446 -1.155651

[210 rows x 4 columns]
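
Note that the output above starts at 2016-01-31: pandas reads "01-02-2016" month first, as 2 January 2016. If the intended start was 1 February 2016 (an assumption, since the question does not say which format it uses), an ISO-style string avoids the ambiguity. A small sketch, with from_jan and from_feb as illustrative names and the same month-start construction of the dates as a stand-in for freq='M':

```python
import numpy as np
import pandas as pd

# Same index as in the question: 6 persons x 47 month ends.
dates = pd.date_range("2015-02-01", periods=47, freq="MS") - pd.Timedelta(days=1)
persons = ["John", "Paul", "Susan", "Steve", "Anne", "Carol"]
miindex = pd.MultiIndex.from_product([persons, dates], names=["persons", "dates"])
df = pd.DataFrame(np.random.randn(len(miindex), 4), columns=list("ABCD"), index=miindex)
df = df.sort_index()

idx = pd.IndexSlice

# "01-02-2016" is parsed month first, i.e. as 2 January 2016.
parsed = pd.Timestamp("01-02-2016")

from_jan = df.loc[idx[:, "01-02-2016":], :]   # starts at 2016-01-31
from_feb = df.loc[idx[:, "2016-02-01":], :]   # starts at 2016-02-29

print(parsed, len(from_jan), len(from_feb))
```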

print (df.loc[idx[:, "01-01-2018":], :])
                           A         B         C         D
persons dates                                             
Anne    2018-01-31  0.072784 -0.093604 -0.896780 -0.336099
        2018-02-28 -0.591907 -0.439462 -0.189500  0.172523
        2018-03-31  0.027810 -0.932447  0.547707 -0.148938
        2018-04-30 -0.114616  0.116554 -0.840459 -1.807368
        2018-05-31 -0.017403  0.562685  0.157102  1.739236
                     ...       ...       ...       ...
Susan   2018-07-31 -0.180213 -0.613854 -0.143997  0.938364
        2018-08-31 -1.232334 -1.066170  2.074717 -0.219996
        2018-09-30 -0.014457  0.350130 -0.920580  0.040339
        2018-10-31  1.651722 -0.399346 -1.647574  0.323075
        2018-11-30  1.465342  0.182188  0.039446 -1.155651

[66 rows x 4 columns]
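
As an alternative to the slicing shown in this answer (not part of it), a boolean mask built from the dates level gives the same row counts and does not need a sorted index. This is only a sketch; the variable names are illustrative, and the dates are again built from month starts as a stand-in for freq='M':

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2015-02-01", periods=47, freq="MS") - pd.Timedelta(days=1)
persons = ["John", "Paul", "Susan", "Steve", "Anne", "Carol"]
miindex = pd.MultiIndex.from_product([persons, dates], names=["persons", "dates"])
df = pd.DataFrame(np.random.randn(len(miindex), 4), columns=list("ABCD"), index=miindex)

# Pull the second level out as a DatetimeIndex aligned with the rows.
d = df.index.get_level_values("dates")

in_2016 = df[d.year == 2016]                          # within 2016
in_2015_2017 = df[(d.year >= 2015) & (d.year <= 2017)]  # 2015 through 2017
from_2018 = df[d >= pd.Timestamp("2018-01-01")]       # 2018-01-01 onward

print(len(in_2016), len(in_2015_2017), len(from_2018))
```

The mask keeps the original row order, so the persons appear as John, Paul, Susan, ... rather than alphabetically.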
