How can I fix the directed graph so that the topmost parent is always the first level?

I have the following pandas dataframe:

parent, child 
40066, 50106
30029, 40066
40066, 50100
40066, 50106
50106, 60061
50106, 60063
50106, 60062
50100, 60057
50100, 60058

I am trying to get to the following structure:

child, level1, level2, level3
60063, 30029, 40066, 50106
60062, 30029, 40066, 50106
60061, 30029, 40066, 50106
60058, 30029, 40066, 50100
60057, 30029, 40066, 50100

I have used the following code:

import pandas as pd
import networkx as nx 
# df is the parent/child DataFrame shown above
leaves = set(df.child).difference(set(df.parent))
g = nx.from_pandas_edgelist(df, 'parent', 'child', create_using=nx.DiGraph(), edge_attr=True)
ancestors = {n: nx.algorithms.dag.ancestors(g, n) for n in leaves}
df2 = pd.DataFrame.from_dict(ancestors, orient='index')

Which gives me this output:

60063, 30029, 50106, 40066
60062, 30029, 50106, 40066
60061, 30029, 50106, 40066
60058, 30029, 50100, 40066
60057, 50100, 40066, 30029

This is incorrect: the first four rows should read 30029 -> 40066 -> …, and the last row is in completely the wrong order.

Answer

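nx.ancestors returns a set, which is unordered, so the ancestor columns come out in arbitrary order. Use nx.all_simple_paths instead to find all paths from roots to leaves; each path is a list ordered from root to leaf.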
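As a small illustration, here is a minimal sketch comparing what the two calls return. The three-edge graph is a cut-down chain taken from the question's data (an assumption made for brevity, not the full data set):

```python
import networkx as nx

# Single chain from the question's data: 30029 -> 40066 -> 50106 -> 60061
g = nx.DiGraph([(30029, 40066), (40066, 50106), (50106, 60061)])

# ancestors() returns a set: the members are right, but no order is guaranteed
anc = nx.ancestors(g, 60061)

# all_simple_paths() yields lists ordered from the source to the target
paths = list(nx.all_simple_paths(g, 30029, 60061))
```

The set holds the right nodes, but only the path list keeps them in root-to-leaf order.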

Full code:

import pandas as pd
import numpy as np
import networkx as nx


# setup initial data
df = pd.DataFrame({'parent': [40066, 30029, 40066, 40066,
                              50106, 50106, 50106, 50100, 50100],
                   'child': [50106, 40066, 50100, 50106,
                             60061, 60063, 60062, 60057, 60058]})
g = nx.from_pandas_edgelist(df, 'parent', 'child', create_using=nx.DiGraph)

# get leaves and roots
leaves = [node for node, degree in g.out_degree() if degree == 0]
roots = [node for node, degree in g.in_degree() if degree == 0]

# find all paths
paths = []
for root in roots:
    for leaf in leaves:
        for path in nx.all_simple_paths(g, root, leaf):
            paths.append(path)

# create dataframe
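# axis=1 rolls each row separately so the leaf moves to the first column;
# without it, np.roll shifts the flattened array and mixes up the rows
df1 = pd.DataFrame(np.roll(paths, shift=1, axis=1))
df1 = df1.add_prefix('level').rename(columns={'level0': 'child'})

Output:

>>> df1
   child  level1  level2  level3
0  60061   30029   40066   50106
1  60063   30029   40066   50106
2  60062   30029   40066   50106
3  60057   30029   40066   50100
4  60058   30029   40066   50100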
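
The np.roll step relies on every path having the same length. If some leaves sit at a shallower depth, one possible variation is to build each row as a dict instead. This is only a sketch: the edge list below is a shortened version of the question's data, the node 99999 is made up to create a shallow leaf, and df_levels is a hypothetical name.

```python
import networkx as nx
import pandas as pd

# Shortened version of the question's edges plus one made-up shallow leaf (99999)
edges = [(30029, 40066), (40066, 50106), (40066, 50100),
         (50106, 60061), (50100, 60057), (30029, 99999)]
g = nx.DiGraph(edges)

leaves = [n for n, d in g.out_degree() if d == 0]
roots = [n for n, d in g.in_degree() if d == 0]

rows = []
for root in roots:
    for leaf in leaves:
        for path in nx.all_simple_paths(g, root, leaf):
            # The last node is the leaf; the nodes before it become level1, level2, ...
            row = {'child': path[-1]}
            row.update({f'level{i}': node
                        for i, node in enumerate(path[:-1], start=1)})
            rows.append(row)

# Levels missing from shorter paths become <NA>; Int64 keeps the IDs as integers
df_levels = pd.DataFrame(rows).astype('Int64')
```

Rows for shallow leaves simply leave the deeper level columns empty, so the topmost parent still lands in level1 for every row.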