Writing a recursive (?) method to map reorganizations over time

I am working on mapping reorganizations over time. The data we have shows which group became which, when, and at what ratio (i.e., how much of the old group became the new one). In that form it is not very useful, because what I need to see is how much of each old group ends up in the groups of today.

The data set I have is much larger and contains many groups but looks like this base case:

Became Was Date Ratio
9230X 9139X 2020-10-01 1
9139X 9179X 2017-01-01 0,8
9139X 9189X 2017-01-01 1
9179X 9101X 2013-01-01 0,5

To clarify: The current group is 9230X. On 2020-10-01 100% of the group 9139X became 9230X. On 2017-01-01 80% of the group 9179X became 9139X, and 100% of 9189X also became 9139X. On 2013-01-01 half of 9101X became 9179X.

The data I need to produce should look like this:

Today  Was    Date start  Date end    Ratio
9230X  9139X  2017-01-01  2020-10-01  1
9230X  9179X  2013-01-01  2017-01-01  1 * 0,8
9230X  9189X              2017-01-01  1 * 1
9230X  9101X              2013-01-01  1 * 0,8 * 0,5

So basically I am after a table that shows all the groups that a current group (in this case 9230X) has previously been, and how much of each of those groups has contributed to the current group. This is so that it can be merged with cost data.

To solve this, I am trying to write a recursive method to take a group and search in the column “Became” and then for each hit do the same search for the groups in the column “Was”. The method I have written works, in the sense that it can go all the way down to the ends, but I cannot figure out how to calculate the ratios going back up.

Maybe a recursive method isn’t the best way, but this is the method I have come up with so far… My argument for recursion is that the chains are not fixed in length, and going back in time the current group can have been a few or many groups. I am aware that it doesn’t calculate or return anything at this moment, but that is the part I cannot figure out. Any help or ideas are appreciated!

I am using Pandas to load a spreadsheet into a dataframe. The conditions df['Date'] < date and df['Was'] != df['Became'] are necessary because in some cases group names are reused in another timespan, and some transitions occur between the same groups, e.g. 9385X –> 9385X.

import pandas as pd

df = pd.read_excel(r'to_py.xlsx')
search_date = pd.to_datetime('2021-01-01')
search_group = '9230X'
search_ratio = 1

def recursiveFind(group, date, ratio):
  temp = df[(df['Became'] == group) & (df['Date'] < date ) & (df['Was'] != df['Became'])]

  if temp.empty:
    return
  else:
    for idx, row in temp.iterrows():
      recursiveFind(row.loc['Was'], row.loc['Date'], row.loc['Ratio'])

recursiveFind(search_group, search_date, search_ratio)
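
To illustrate what the two extra filter conditions do, here is a small sketch on made-up rows (the group 9999X and all dates except the first are hypothetical, chosen only to trigger each condition):

```python
import pandas as pd

# Hypothetical rows: 9385X "becomes" itself once, and the name 9230X
# is reused for a later transition after the search date.
demo = pd.DataFrame(
    {
        "Became": ["9230X", "9385X", "9230X"],
        "Was": ["9139X", "9385X", "9999X"],
        "Date": pd.to_datetime(["2020-10-01", "2018-01-01", "2022-06-01"]),
        "Ratio": [1.0, 1.0, 1.0],
    }
)

cutoff = pd.to_datetime("2021-01-01")
mask = (
    (demo["Became"] == "9230X")
    & (demo["Date"] < cutoff)          # drops the 2022 reuse of the name
    & (demo["Was"] != demo["Became"])  # drops self-transitions
)
print(demo[mask])
```

Only the 9139X row passes the filter here.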

Answer

Not a pandas expert, but can help with the logic. When you run this function you get data like

A was B
A was C
A was D

Then you want to see if C can be broken out to get a final result of

A was B
A was X
A was Z
A was D

So, to make your thing work, find the matching rows. Build a list. For each returned row, call the recursive function. If that returns rows, change the Became property of each returned row to the current group name and append it to the temporary list. If it doesn’t, append the original row to the list. Return the list of rows.

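def recursiveFind(group_name, date):
  rows = []
  # get_rows is a placeholder for the filtered lookup from the question
  for row in get_rows(group_name, date):
    expanded_rows = []
    # look for ancestors of this row's "Was" group, before this row's date
    for origin_row in recursiveFind(row["Was"], row["Date"]):
      # relabel the ancestor row so it points at the current group
      expanded_rows.append(dict(origin_row, Became=group_name))
    # no ancestors found: keep the original row (wrapped in a list for extend)
    rows.extend(expanded_rows or [dict(row)])
  return rows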
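
The ratios can be handled by multiplying them on the way down and passing the running product into each recursive call, rather than computing them on the way back up. Below is a minimal sketch of that idea, assuming the base-case table from the question with decimal points instead of commas. The function name trace_origins and the output column names are made up for illustration, and the "Date start" column is left out. Unlike the replacement logic described above, it keeps the intermediate groups, since the desired output in the question lists them.

```python
import pandas as pd

# Base case from the question; ratios written with a decimal point.
df = pd.DataFrame(
    {
        "Became": ["9230X", "9139X", "9139X", "9179X"],
        "Was": ["9139X", "9179X", "9189X", "9101X"],
        "Date": pd.to_datetime(
            ["2020-10-01", "2017-01-01", "2017-01-01", "2013-01-01"]
        ),
        "Ratio": [1.0, 0.8, 1.0, 0.5],
    }
)


def trace_origins(df, today, group, date, ratio=1.0):
    """Return one dict per ancestor of `group`, with the cumulative ratio."""
    # Same filter as in the question: earlier transitions into `group`,
    # skipping rows where a group "became" itself.
    hits = df[
        (df["Became"] == group)
        & (df["Date"] < date)
        & (df["Was"] != df["Became"])
    ]
    rows = []
    for _, row in hits.iterrows():
        cumulative = ratio * row["Ratio"]  # running product along the chain
        rows.append(
            {
                "Today": today,
                "Was": row["Was"],
                "Date end": row["Date"],
                "Ratio": cumulative,
            }
        )
        # Recurse into the ancestor, carrying the cumulative ratio along.
        rows.extend(trace_origins(df, today, row["Was"], row["Date"], cumulative))
    return rows


result = pd.DataFrame(
    trace_origins(df, "9230X", "9230X", pd.to_datetime("2021-01-01"))
)
print(result)
```

On the base case this gives a ratio of 1.0 for 9139X, 0.8 for 9179X, 0.4 for 9101X and 1.0 for 9189X. Because each recursive call only looks at strictly earlier dates, the recursion terminates even when group names are reused.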