I am using Python networkx to compute the Jaccard similarity of the following 2 networks:

G.edges(data=True) OutEdgeDataView([('v1', 'a', {'weight': 1}), ('v1', 'b', {'weight': 1}), ('a', 'd', {'weight': 1}), ('a', '11', {'weight': 7}), ('b', 'v4', {'weight': 1}), ('b', 'v9', {'weight': 1}), ('v9', 'b', {'weight': 1})]) K.edges(data=True) OutEdgeDataView([('v1', 'asssssss', {'weight': 1}), ('v1', 'b', {'weight': 1}), ('asssssss', 'd', {'weight': 1}), ('asssssss', '11', {'weight': 7}), ('b', 'asssssss', {'weight': 10}), ('b', 'd', {'weight': 10}), ('v9', 'b', {'weight': 1})])

I am using the following solution kindly provided in this answer:

def jaccard_similarity(g, h): i = set(g).intersection(h) return round(len(i) / (len(g) + len(h) - len(i)),6) jaccard_similarity(G.edges(), K.edges()) 0.166667

However, I need to extend this calculation to a weighted jaccard measure (distance or similarity) that **takes into account arc weights** and works both for directed and undirected networks, which could have different edges and nodes.

Can you help me?

## Answer

For the weighted Jaccard, we need to define a vector. If we treat union of all edges as our vector dimensions, then our weights for each graph form a coordinate in that space.

def weighted_jaccard(g1, g2): edges = set(g1.edges()).union(g2.edges()) mins, maxs = 0, 0 for edge in edges: weight1 = g1.get_edge_data(*edge, {}).get('weight', 0) weight2 = g2.get_edge_data(*edge, {}).get('weight', 0) mins += min(weight1, weight2) maxs += max(weight1, weight2) return mins / maxs

This definition is consistent with the non-weighted Jaccard when all of the weights are 1.

It’s worth pointing out that once you’ve defined your vectors in this way, you could apply pretty much any R^{n} distance metric.