The Girvan-Newman algorithm for community detection in networks detects communities by progressively removing edges from the original graph. At each step, the algorithm removes the “most valuable” edge, traditionally the edge with the highest betweenness centrality. As the graph breaks down into pieces, the tightly knit community structure is exposed and the result can be depicted as a dendrogram.

In NetworkX, the implementation returns an iterator over tuples of sets. The first tuple is the first cut, consisting of 2 communities; the second tuple is the second cut, consisting of 3 communities; and so on, until the last tuple with n sets for n separate nodes (the leaves of the dendrogram).

    import networkx as nx

    G = nx.path_graph(10)
    comp = nx.community.girvan_newman(G)
    list(comp)

    [({0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}),
     ({0, 1}, {2, 3, 4}, {5, 6, 7, 8, 9}),
     ({0, 1}, {2, 3, 4}, {5, 6}, {8, 9, 7}),
     ({0, 1}, {2}, {3, 4}, {5, 6}, {8, 9, 7}),
     ({0, 1}, {2}, {3, 4}, {5, 6}, {7}, {8, 9}),
     ({0}, {1}, {2}, {3, 4}, {5, 6}, {7}, {8, 9}),
     ({0}, {1}, {2}, {3}, {4}, {5, 6}, {7}, {8, 9}),
     ({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8, 9}),
     ({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9})]
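To make the structure of this output concrete, here is a minimal sketch that counts the communities at each level and reports which community was split at each step. The helper `describe_splits` is a hypothetical name introduced only for illustration, and it assumes that exactly one community splits in two between consecutive levels, which holds for the output above.

```python
# Output of list(nx.community.girvan_newman(nx.path_graph(10))), copied from above.
levels = [
    ({0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}),
    ({0, 1}, {2, 3, 4}, {5, 6, 7, 8, 9}),
    ({0, 1}, {2, 3, 4}, {5, 6}, {8, 9, 7}),
    ({0, 1}, {2}, {3, 4}, {5, 6}, {8, 9, 7}),
    ({0, 1}, {2}, {3, 4}, {5, 6}, {7}, {8, 9}),
    ({0}, {1}, {2}, {3, 4}, {5, 6}, {7}, {8, 9}),
    ({0}, {1}, {2}, {3}, {4}, {5, 6}, {7}, {8, 9}),
    ({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8, 9}),
    ({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}),
]


def describe_splits(levels):
    """Return (parent, parts) for each step between consecutive levels.

    Hypothetical helper; assumes exactly one community splits per step.
    """
    splits = []
    for coarse, fine in zip(levels, levels[1:]):
        coarse_set = {frozenset(c) for c in coarse}
        fine_set = {frozenset(c) for c in fine}
        # The community that exists only in the coarser level is the one that split.
        (parent,) = coarse_set - fine_set
        # The communities that exist only in the finer level are its parts.
        parts = sorted(sorted(p) for p in fine_set - coarse_set)
        splits.append((sorted(parent), parts))
    return splits


sizes = [len(level) for level in levels]   # number of communities per level
splits = describe_splits(levels)
for parent, parts in splits:
    print(parent, "->", parts)
```

Read in reverse, each of these splits is one merge of the dendrogram, which is the information a linkage matrix has to encode.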

The question is: how do I plot this dendrogram?

I’ve looked at `scipy.cluster.hierarchy.dendrogram`, but it expects a “linkage matrix”, presumably like the one created by `scipy.cluster.hierarchy.linkage`, and I’m not sure how I would convert this list of tuples into such a linkage matrix.

So I’m asking how to draw this dendrogram, with or without the help of SciPy’s `dendrogram`.

## Answer

Following @ItamarMushkin, I adapted @mdml’s answer with slight modifications and got what I wanted. At a high level, I’m turning NetworkX’s Girvan-Newman iterator output into another `DiGraph()` that I eventually want to see as a dendrogram. Then I build `Z`, a linkage matrix that I pass to `scipy.cluster.hierarchy.dendrogram`, in the form of an edge list that includes the actual height for each dendrogram merge.

Two modifications I had to make to @mdml’s answer:

- Not that important: I sort the tuple keys of the nodes entering `index`.
- More important: I add a `get_merge_height` function, which gives each merge a unique height according to the order in which Girvan-Newman removes edges. Otherwise, all merges of two nodes would have the same height in the dendrogram, all merges at the next level (two nodes plus another one) would have the same height, and so on.
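The idea behind the merge height can be sketched in isolation: the integer part is the number of leaves in the merged cluster, and a fraction below 1 is added according to where the merge ranks among merges of the same size. The standalone function `merge_height` and its parameters are hypothetical names used only for this illustration; the ranks in the example are made up.

```python
def merge_height(size, rank, min_rank, max_rank):
    """Height of a merge that produces a cluster of `size` leaves.

    `rank` is the merge's position in the removal order; `min_rank` and
    `max_rank` are the extremes among merges of the same size. The 0.8
    factor keeps the result strictly below `size + 1`.
    """
    spread = (max_rank - min_rank) if max_rank > min_rank else 1
    return float(size) + 0.8 * (rank - min_rank) / spread


# Three made-up merges that each produce a 2-node cluster, ranked 3, 4 and 5.
heights = [merge_height(2, r, 3, 5) for r in (3, 4, 5)]
print(heights)  # three distinct values between 2.0 and 2.8
```

Because every height for a cluster of size k falls in the interval [k, k + 0.8], a parent cluster always sits above its smaller children, which is what SciPy’s `dendrogram` needs from a linkage matrix.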

I understand there may be some redundant iterations here; I haven’t thought about optimization yet.

    import networkx as nx
    from itertools import chain, combinations
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram

    # get simulated Graph() and Girvan-Newman communities list
    G = nx.path_graph(10)
    communities = list(nx.community.girvan_newman(G))

    # building initial dict of node_id to each possible subset:
    node_id = 0
    init_node2community_dict = {node_id: communities[0][0].union(communities[0][1])}
    for comm in communities:
        for subset in list(comm):
            if subset not in init_node2community_dict.values():
                node_id += 1
                init_node2community_dict[node_id] = subset

    # turning this dictionary to the desired format in @mdml's answer
    node_id_to_children = {e: [] for e in init_node2community_dict.keys()}
    for node_id1, node_id2 in combinations(init_node2community_dict.keys(), 2):
        set1 = init_node2community_dict[node_id1]
        set2 = init_node2community_dict[node_id2]
        if set1 & set2:
            continue  # only disjoint subsets can be siblings
        for node_id_parent, group in init_node2community_dict.items():
            if group == set1 | set2:
                node_id_to_children[node_id_parent].append(node_id1)
                node_id_to_children[node_id_parent].append(node_id2)

    # also recording node_labels dict for the correct label for dendrogram leaves
    node_labels = dict()
    for node_id, group in init_node2community_dict.items():
        if len(group) == 1:
            node_labels[node_id] = list(group)[0]
        else:
            node_labels[node_id] = ''

    # also needing a subset to rank dict to later know within all k-length merges which came first
    subset_rank_dict = dict()
    rank = 0
    for e in communities[::-1]:
        for p in list(e):
            key = tuple(sorted(p))  # look up the sorted tuple so each subset is ranked only once
            if key not in subset_rank_dict:
                subset_rank_dict[key] = rank
                rank += 1
    subset_rank_dict[tuple(sorted(chain.from_iterable(communities[-1])))] = rank

    # my function to get a merge height so that it is unique (probably not that efficient)
    def get_merge_height(sub):
        sub_tuple = tuple(sorted([node_labels[i] for i in sub]))
        n = len(sub_tuple)
        other_same_len_merges = {k: v for k, v in subset_rank_dict.items() if len(k) == n}
        min_rank, max_rank = min(other_same_len_merges.values()), max(other_same_len_merges.values())
        # renamed from `range` to avoid shadowing the builtin
        rank_range = (max_rank - min_rank) if max_rank > min_rank else 1
        return float(len(sub)) + 0.8 * (subset_rank_dict[sub_tuple] - min_rank) / rank_range

    # finally using @mdml's magic, slightly modified:
    G = nx.DiGraph(node_id_to_children)
    nodes = G.nodes()
    leaves = set(n for n in nodes if G.out_degree(n) == 0)
    inner_nodes = [n for n in nodes if G.out_degree(n) > 0]

    # Compute the size of each subtree
    subtree = dict((n, [n]) for n in leaves)
    for u in inner_nodes:
        children = set()
        node_list = list(node_id_to_children[u])
        while len(node_list) > 0:
            v = node_list.pop(0)
            children.add(v)
            node_list += node_id_to_children[v]
        subtree[u] = sorted(children & leaves)

    inner_nodes.sort(key=lambda n: len(subtree[n]))  # <-- order inner nodes ascending by subtree size, root is last

    # Construct the linkage matrix
    leaves = sorted(leaves)
    index = dict((tuple([n]), i) for i, n in enumerate(leaves))
    Z = []
    k = len(leaves)
    for n in inner_nodes:
        children = node_id_to_children[n]
        x = children[0]
        for y in children[1:]:
            z = tuple(sorted(subtree[x] + subtree[y]))
            i, j = index[tuple(sorted(subtree[x]))], index[tuple(sorted(subtree[y]))]
            Z.append([i, j, get_merge_height(subtree[n]), len(z)])  # <-- float is required by the dendrogram function
            index[z] = k
            subtree[z] = list(z)
            x = z
            k += 1

    # dendrogram
    plt.figure()
    dendrogram(Z, labels=[node_labels[node_id] for node_id in leaves])
    plt.savefig('dendrogram.png')
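For comparison, here is a shorter alternative sketch that builds a linkage matrix directly from the list of partitions, without the intermediate `DiGraph`. This is not the method used above: it walks the levels from finest to coarsest, treats each step as one merge, and uses the step number as the merge height instead of `get_merge_height`. The function name `linkage_from_partitions` is hypothetical, and the sketch assumes that each level splits exactly one community in two, as in the example output from the question.

```python
# Girvan-Newman output for nx.path_graph(10), copied from the question.
partitions = [
    ({0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}),
    ({0, 1}, {2, 3, 4}, {5, 6, 7, 8, 9}),
    ({0, 1}, {2, 3, 4}, {5, 6}, {8, 9, 7}),
    ({0, 1}, {2}, {3, 4}, {5, 6}, {8, 9, 7}),
    ({0, 1}, {2}, {3, 4}, {5, 6}, {7}, {8, 9}),
    ({0}, {1}, {2}, {3, 4}, {5, 6}, {7}, {8, 9}),
    ({0}, {1}, {2}, {3}, {4}, {5, 6}, {7}, {8, 9}),
    ({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8, 9}),
    ({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}),
]


def linkage_from_partitions(partitions):
    """Build a SciPy-style linkage matrix (as a list of rows) from partitions.

    `partitions` is ordered coarsest first and ends with all singletons.
    Assumes each level splits exactly one community in two.
    Returns (Z, labels), where labels[i] is the node shown for leaf i.
    """
    levels = [{frozenset(c) for c in level} for level in partitions]
    # Prepend the root: everything in a single community, before the first cut.
    all_nodes = frozenset().union(*levels[0])
    levels.insert(0, {all_nodes})

    # Leaves get cluster ids 0..n-1 in sorted label order.
    labels = sorted(all_nodes)
    cluster_id = {frozenset([v]): i for i, v in enumerate(labels)}
    next_id = len(labels)

    Z = []
    # Walk from the finest level back to the root; each step undoes one split.
    for step in range(1, len(levels)):
        fine, coarse = levels[-step], levels[-step - 1]
        (merged,) = coarse - fine  # community present only in the coarser level
        left, right = sorted(fine - coarse, key=cluster_id.__getitem__)
        # Row format: [cluster a, cluster b, height, number of leaves].
        Z.append([cluster_id[left], cluster_id[right], float(step), len(merged)])
        cluster_id[merged] = next_id
        next_id += 1
    return Z, labels


Z, labels = linkage_from_partitions(partitions)
```

With SciPy and Matplotlib installed, `dendrogram(Z, labels=labels)` followed by `plt.savefig(...)` should draw the result in the same way as the code above. The trade-off is that every merge gets its own integer height, so the plot shows removal order only and no longer groups merges of equal cluster size at similar heights.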