I have a dictionary in which each key holds a list of float values. The lists are not all the same length.
I'd like to convert this dictionary to a pandas DataFrame so that I can easily run analysis functions on the data, such as min, max, mean, and standard deviation.
My dictionary looks like this:
{ 'key1': [10, 100.1, 0.98, 1.2], 'key2': [72.5], 'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7] }
What is the best way to get this into a dataframe so that I can utilize basic functions like sum, mean, describe, std?
The examples I find (like the link above) all assume that every key has the same number of values in its list.
Answer
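import pandas as pd

d = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}
# orient='index' makes one row per key and pads short rows with NaN;
# transpose() then turns the keys into columns.
df = pd.DataFrame.from_dict(d, orient='index').transpose()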
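Then df is (first five rows shown; the full frame has eight rows, one per element of the longest list, and the column order can vary with the Python and pandas versions):

    key3  key2    key1
0   1.00  72.5   10.00
1   5.20   NaN  100.10
2  71.20   NaN    0.98
3   9.00   NaN    1.20
4  10.11   NaN     NaN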
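As an alternative sketch (a swapped-in technique, not the approach shown in the answer), each list can be wrapped in a pandas Series before building the frame. The DataFrame constructor then aligns the Series on their indexes and fills the gaps with NaN. The name df_alt is only illustrative, and the column order shown assumes Python 3.7+ where dicts keep insertion order.

```python
import pandas as pd

d = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}

# Each list becomes a Series indexed 0..n-1. The constructor aligns the
# Series on the union of their indexes and pads shorter ones with NaN.
df_alt = pd.DataFrame({key: pd.Series(values) for key, values in d.items()})

print(df_alt.shape)              # (8, 3): one row per element of the longest list
print(df_alt.count().to_dict())  # number of non-NaN entries in each column
```

This avoids the transpose step and keeps the keys as column names directly.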
Note that numpy has some built-in functions that can do calculations while ignoring NaN values, which may be relevant here. For example, if you want to find the mean of the 'key1' column, you can do it as follows:
import numpy as np

np.nanmean(df[['key1']])  # approximately 28.07
Other useful functions include numpy.nanstd, numpy.nanvar, numpy.nanmedian, numpy.nansum.
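A minimal sketch of applying these NaN-aware functions to every column at once, assuming the frame is built as in the answer. The names values, col_sums, and col_medians are illustrative only.

```python
import numpy as np
import pandas as pd

d = {
    'key1': [10, 100.1, 0.98, 1.2],
    'key2': [72.5],
    'key3': [1, 5.2, 71.2, 9, 10.11, 12.21, 65, 7],
}
df = pd.DataFrame.from_dict(d, orient='index').transpose()

# Plain float array of shape (8, 3); NaN marks positions where a list was shorter.
values = df.to_numpy(dtype=float)

# axis=0 reduces down the rows, so each result has one entry per column.
col_sums = dict(zip(df.columns, np.nansum(values, axis=0)))
col_medians = dict(zip(df.columns, np.nanmedian(values, axis=0)))

print(col_sums)
print(col_medians)
```

Passing axis=0 is what keeps the columns separate; without it the functions reduce over the whole array and return a single number.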
EDIT: Note that the pandas functions from your basic functions link can also handle NaN values (they skip them by default). However, their estimators may differ from numpy's. For example, pandas var divides by n - 1 by default (ddof=1), which gives the unbiased estimator of the variance, while numpy.nanvar divides by n by default (ddof=0). The same defaults apply to pandas std and numpy.nanstd.
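The difference between the two defaults can be checked with a short sketch. It uses the 'key1' values with a single NaN appended to stand in for the padding, and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# The 'key1' values plus one NaN standing in for the padded entries.
key1 = pd.Series([10, 100.1, 0.98, 1.2, np.nan])

pandas_std = key1.std()                             # skips NaN, ddof=1 by default
numpy_std = np.nanstd(key1.to_numpy())              # skips NaN, ddof=0 by default
numpy_std_ddof1 = np.nanstd(key1.to_numpy(), ddof=1)

print(pandas_std, numpy_std, numpy_std_ddof1)
```

With ddof=1 passed explicitly, numpy matches the pandas result; alternatively, key1.std(ddof=0) matches the numpy default.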