Pandas dataframe to multikey dictionary

I’m trying to transform a DataFrame like this into a dictionary with multiple nested keys.

import pandas as pd
import datetime

columns = ['country', 'city', 'from_date', 'to_date', 'sales']
data = [
    ['UK', 'London', datetime.date(2021, 8, 26), datetime.date(2099, 5, 5), 2500],
    ['Mexico', 'Mexico City', datetime.date(2011, 3, 3), datetime.date(2012, 4, 5), 5670],
    ['Mexico', 'Mexico City', datetime.date(2014, 3, 3), datetime.date(2017, 4, 5), 5680],
]
df = pd.DataFrame(data, columns=columns)
df
  country         city from_date     to_date  sales
0      UK       London 2021-08-26  2099-05-05   2500
1  Mexico  Mexico City 2011-03-03  2012-04-05   5670
2  Mexico  Mexico City 2014-03-03  2017-04-05   5680

Result #1 I’m looking for:

{'Mexico':
   {'Mexico City':
      [
         {'from_date': 2011-03-03, 'to_date': 2012-04-05, 'sales': 5670},
         {'from_date': 2014-03-03, 'to_date': 2017-04-05, 'sales': 5680}
      ]},
'UK':
   {'London':
      [
         {'from_date': 2021-08-26, 'to_date': 2099-05-05, 'sales': 2500}
      ]},
}

Or Result #2:

{'Mexico':
   {'Mexico City':
      {2011-03-03: 5670,   # from_date: sales
       2014-03-03: 5680}   # from_date: sales
   },
'UK':
   {'London':
      {2021-08-26: 2500}   # from_date: sales
   },
}

I don’t know how to get result #1. For result #2, I’ve tried this:

df.groupby(['country', 'city', 'from_date'])['sales'].apply(float).to_dict()
{('Mexico', 'Mexico City', Timestamp('2011-03-03 00:00:00')): 5670.0, ('Mexico', 'Mexico City', Timestamp('2014-03-03 00:00:00')): 5680.0, ('UK', 'London', Timestamp('2021-08-26 00:00:00')): 2500.0}

But I need from_date as a separate key, because I will be comparing it to another date. Ideally, I’d like to learn how to get both results, but any help is appreciated!

Answer

You can build a Series with a MultiIndex by passing a lambda function that calls DataFrame.to_dict to GroupBy.apply, and then use the first index level as the outer dictionary keys:

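# Format the dates as ISO strings so they print cleanly in the dictionary
df['from_date'] = pd.to_datetime(df['from_date']).dt.strftime('%Y-%m-%d')
df['to_date'] = pd.to_datetime(df['to_date']).dt.strftime('%Y-%m-%d')

# One list of row dictionaries per (country, city) group
f = lambda x: x.to_dict(orient='records')
s = df.groupby(['country', 'city'])[['from_date', 'to_date', 'sales']].apply(f)

# Outer key is the country (first index level), inner key is the city
d = {level: s.xs(level).to_dict() for level in s.index.levels[0]}
print(d)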

{
    'Mexico': {
        'Mexico City': [{
                'from_date': '2011-03-03',
                'to_date': '2012-04-05',
                'sales': 5670
            },
            {
                'from_date': '2014-03-03',
                'to_date': '2017-04-05',
                'sales': 5680
            }
        ]
    },
    'UK': {
        'London': [{
            'from_date': '2021-08-26',
            'to_date': '2099-05-05',
            'sales': 2500
        }]
    }
}
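
Since the question mentions comparing from_date to another date, it may be handier to skip the strftime step and keep the datetime.date objects in the dictionary. The following is only a minimal sketch of that variation, assuming the sample df from the question; it uses a plain loop over the groups instead of GroupBy.apply, and the names nested, cutoff and recent are made up for illustration.

```python
import datetime

import pandas as pd

columns = ['country', 'city', 'from_date', 'to_date', 'sales']
data = [
    ['UK', 'London', datetime.date(2021, 8, 26), datetime.date(2099, 5, 5), 2500],
    ['Mexico', 'Mexico City', datetime.date(2011, 3, 3), datetime.date(2012, 4, 5), 5670],
    ['Mexico', 'Mexico City', datetime.date(2014, 3, 3), datetime.date(2017, 4, 5), 5680],
]
df = pd.DataFrame(data, columns=columns)

# Build {country: {city: [row dicts]}} one (country, city) group at a time.
# The date columns are left untouched, so the values stay datetime.date objects.
nested = {}
for (country, city), group in df.groupby(['country', 'city']):
    records = group[['from_date', 'to_date', 'sales']].to_dict(orient='records')
    nested.setdefault(country, {})[city] = records

# Hypothetical comparison: keep the records that start on or after a cutoff date.
cutoff = datetime.date(2013, 1, 1)
recent = [r for r in nested['Mexico']['Mexico City'] if r['from_date'] >= cutoff]
print(recent)
```

Keeping real date objects means the comparison works directly, at the cost of a dictionary that no longer serializes to JSON without a custom encoder.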

For the second result, only the lambda function changes:

# Map from_date to sales within each (country, city) group
f = lambda x: x.set_index('from_date')['sales'].to_dict()
s2 = df.groupby(['country', 'city'])[['from_date', 'sales']].apply(f)
print(s2)
country  city       
Mexico   Mexico City    {'2011-03-03': 5670, '2014-03-03': 5680}
UK       London                             {'2021-08-26': 2500}
dtype: object

d2 = {level: s2.xs(level).to_dict() for level in s2.index.levels[0]}
print(d2)
{'Mexico': {'Mexico City': {'2011-03-03': 5670, '2014-03-03': 5680}}, 
 'UK': {'London': {'2021-08-26': 2500}}}
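
The from_date keys in this result are ISO-formatted strings, so to compare them to another date they can be parsed back with datetime.date.fromisoformat. Here is a rough sketch of that lookup, assuming the d2 dictionary printed above; the helper sales_from and the cutoff date are hypothetical and only illustrate the idea.

```python
import datetime

# Same structure as the d2 result printed above
d2 = {'Mexico': {'Mexico City': {'2011-03-03': 5670, '2014-03-03': 5680}},
      'UK': {'London': {'2021-08-26': 2500}}}


def sales_from(nested, country, city, cutoff):
    """Return {from_date: sales} for entries starting on or after cutoff."""
    result = {}
    for key, sales in nested[country][city].items():
        from_date = datetime.date.fromisoformat(key)  # 'YYYY-MM-DD' -> date
        if from_date >= cutoff:
            result[from_date] = sales
    return result


selected = sales_from(d2, 'Mexico', 'Mexico City', datetime.date(2012, 1, 1))
print(selected)
```

Because the strings are in YYYY-MM-DD form, comparing them as plain strings would also give the right order, but parsing makes the intent explicit and works with date arithmetic.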