I have a pandas DataFrame:

From | To |
---|---|
A | B |
A | C |
D | E |
F | F |
B | G |
B | H |
B | I |
G | J |
G | K |
L | L |
M | M |
N | N |

I want to convert it into a multi-column hierarchy. The expected result looks like this:

Level_1 | Level_2 | Level_3 | Level_4 |
---|---|---|---|
A | B | G | J |
A | B | G | K |
A | B | H | |
A | B | I | |
A | C | | |
D | E | | |
F | F | | |
L | L | | |
M | M | | |
N | N | | |

Is there a built-in way in pandas to achieve this? I know I can use recursion, but is there a simpler way?

## Answer

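You can get the expected result with `networkx` by treating each row as a directed edge and collecting every path from a root (no incoming edge) to a leaf (no outgoing edge). Rows that point to themselves, such as `F -> F`, are neither roots nor leaves, so they are appended separately: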

    # Python env: pip install networkx
    # Anaconda env: conda install networkx
    import networkx as nx
    import pandas as pd

    df = pd.DataFrame({'From': ['A', 'A', 'D', 'F', 'B', 'B', 'B', 'G', 'G', 'L', 'M', 'N'],
                       'To': ['B', 'C', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N']})

    G = nx.from_pandas_edgelist(df, source='From', target='To',
                                create_using=nx.DiGraph)

    roots = [v for v, d in G.in_degree() if d == 0]
    leaves = [v for v, d in G.out_degree() if d == 0]

    all_paths = []
    for root in roots:
        for leaf in leaves:
            paths = nx.all_simple_paths(G, root, leaf)
            all_paths.extend(paths)

    for node in nx.nodes_with_selfloops(G):
        all_paths.append([node, node])

Output:

    >>> pd.DataFrame(sorted(all_paths)).add_prefix('Level_').fillna('')
      Level_0 Level_1 Level_2 Level_3
    0       A       B       G       J
    1       A       B       G       K
    2       A       B       H
    3       A       B       I
    4       A       C
    5       D       E
    6       F       F
    7       L       L
    8       M       M
    9       N       N
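Note that the columns here start at `Level_0`, while the expected table in the question starts at `Level_1`. One way to get 1-based names is to assign the column labels explicitly instead of using `add_prefix`. The following is a minimal sketch that wraps the same steps in a function; the name `paths_to_levels` is hypothetical and not part of pandas or networkx:

```python
import networkx as nx
import pandas as pd


def paths_to_levels(df, source="From", target="To"):
    """Hypothetical helper: one row per root-to-leaf path, columns Level_1..Level_n."""
    G = nx.from_pandas_edgelist(df, source=source, target=target,
                                create_using=nx.DiGraph)
    roots = [v for v, d in G.in_degree() if d == 0]
    leaves = [v for v, d in G.out_degree() if d == 0]

    paths = []
    for root in roots:
        for leaf in leaves:
            paths.extend(nx.all_simple_paths(G, root, leaf))
    # Self-referencing rows (e.g. F -> F) are neither roots nor leaves,
    # so they are added explicitly.
    paths.extend([node, node] for node in nx.nodes_with_selfloops(G))

    out = pd.DataFrame(sorted(paths)).fillna("")
    # Label the columns from 1 to match the table in the question.
    out.columns = [f"Level_{i}" for i in range(1, out.shape[1] + 1)]
    return out


df = pd.DataFrame({
    "From": list("AADFBBBGGLMN"),
    "To":   list("BCEFGHIJKLMN"),
})
result = paths_to_levels(df)
```

With the sample data, `result` should have the four columns `Level_1` to `Level_4` and the same ten rows as above.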

Documentation: `networkx.algorithms.simple_paths.all_simple_paths`
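If adding a dependency is not an option, roughly the same enumeration can be sketched with the standard library, using an explicit stack rather than recursion. This is only an illustration, not how `all_simple_paths` is implemented, and it assumes the edges contain no cycles other than self-loops. The name `enumerate_paths` is hypothetical:

```python
from collections import defaultdict


def enumerate_paths(edges):
    """Hypothetical helper: list every root-to-leaf path without recursion.

    Assumes the edges are acyclic apart from self-loops such as ("F", "F").
    """
    children = defaultdict(list)
    has_parent = set()
    nodes = set()
    self_loops = []
    for src, dst in edges:
        nodes.update((src, dst))
        if src == dst:
            self_loops.append([src, dst])
        else:
            children[src].append(dst)
            has_parent.add(dst)

    paths = list(self_loops)
    looped = {loop[0] for loop in self_loops}
    # Roots have no incoming edge; self-looping nodes are handled above.
    roots = [n for n in nodes if n not in has_parent and n not in looped]

    stack = [[root] for root in roots]
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            # No outgoing edge: the path ends at a leaf.
            paths.append(path)
        for kid in kids:
            stack.append(path + [kid])
    return sorted(paths)


edges = list(zip("AADFBBBGGLMN", "BCEFGHIJKLMN"))
paths = enumerate_paths(edges)
```

The resulting list of lists can be passed to `pd.DataFrame(paths).fillna('')` in the same way as `all_paths` above.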