Python Pandas can do these tasks?

I have a time series data frame for eight Years (2013-2020) has Hourly data, each Year has Nine zones, under each zone two columns(“Gen”, “Load”) as follows:

                            A ZONE         B ZONE               ...   G ZONE       H ZONE           I ZONE         
        date_time           GEN    LOAD    GEN    LOAD          ...    LOAD      GEN    LOAD         GEN    LOAD
    2013-01-01 00:00:00    725.7 5,859.5  312.2 3,194.7        ...    77.1      706.0 227.1        495.0   861.9
    2013-01-01 01:00:00    436.2 450.5    248.0 198.0          ...    865.5     240.7 107.9        640.5 767.3
    2013-01-01 02:00:00    464.5 160.2    144.2 068.3          ...    738.7     044.7 32.7         509.3 700.4
    2013-01-01 03:00:00    169.9 733.8    268.1 869.5          ...    671.7     649.4 951.3        626.8 652.1
    2013-01-01 04:00:00    145.4 553.4    280.2 872.8          ...    761.5     561.0 912.9        552.1 637.3
...                  ...        ...     ...        ...          ...  ...        ...         ...     ...         ...     ...
    2020-12-31 19:00:00    450.9 951.7    371.4 516.3          ...    461.7     808.9 471.4        983.7 447.8
    2020-12-31 20:00:00    553.0 936.5    848.7 233.9          ...    397.3     978.3 404.3        490.9 233.0
    2020-12-31 21:00:00    458.6 735.6    716.8 121.7          ...    385.1     808.0 192.0        131.5 70.1
    2020-12-31 22:00:00    515.8 651.6    693.5 142.4          ...    291.4     826.1 16.8         591.9 863.2
    2020-12-31 23:00:00    218.6 293.4    448.2 14.2           ...    340.6     435.0 897.4        622.5 768.3

What I want is the following:

1- Detect outliers in each column which is more or less three time Standard Deviation of that column and put it in a new column its name “A_gen_outliers” if the there is outliers in “GEN”column under “A Zone” as well as “A_load_outliers” if the there is outliers in “LOAD”column under “A Zone”. Number of new columns are 18 columns.

2- A new column represents sum of “Gen” columns 3- A new column represents sum of “Load” columns 4- A new column represents “GEN” column calculate A_GEN_div = cell value/maximum value of “GEN column under A Zone for each year for example 725.7/725.7=1 for the first cell and 436.2/725.1 for second cell and for last cell 218.6/553. etc. and the same for all “GEN” columns and also for “LOAD” columns- proposed names “A_Load_div” . Number of new columns are 18 columns.

Number of total new columns are “18 *2 + 2” columns

Thanks in advance.

Answer

I think this might help. Note that this will keep the columns MultiIndex. Your points above seem to imply that you want to flatten your MultIndex. If this is the case, you might want to look at this question.

1:

df.join(df>(3*df.std()), rsuffix='_outlier')

2 and 3:

df.groupby(level=-1, axis=1).sum()

Note that it is not clear from what the first level of the columns MultIndex should be for this.

4:

maxima = df.resample('1Y').max().reindex(df.index, method='bfill').ffill()
df.join(df.divide(maxima), rsuffix='_div')