Sort by value then by key returns different results upon multiple runs?

I have a dict of terms (words) and scores assigned to them, like:

{'goose': 12.34521, 'egg': 8.54021}

My task is to sort them by score (highest first) and, when scores are equal, lexicographically by key, then print the top 10 terms in this format:

term1, term2, ..., term10

This is what I’ve done:

import functools

# term_idf_dict is the dict explained above
term_idf_sorted = sorted(term_idf_dict.items(), key=functools.cmp_to_key(cmp_term_score))
terms_sorted = list(map(lambda p: p[0], term_idf_sorted[:n_top_terms]))
print(", ".join(terms_sorted))

where the comparison function is:

def cmp_term_score(term_score_1, term_score_2):
    if term_score_1[1] == term_score_2[1]:
        return term_score_1[0].casefold() < term_score_2[0].casefold()
    return term_score_2[1] - term_score_1[1]

When I create a dict from a chunk of text and print the sorted version, I get something like:

[('parish', 7.427144133408616), ('saar', 4.406719247264253), ('saaremaa', 4.406719247264253), ('jõe', 4.406719247264253), ('villag', 4.208268308540415) ...]

The problem is that 'jõe' should come before 'saar' and 'saaremaa', but when I run the app multiple times, 'jõe' sometimes ends up in the middle and sometimes in first place, which really confuses me. I tried changing the comparison function, but then my other test cases fail because of the lexicographical comparison. [This second failure occurs on {'egg': 3.05, 'descend': 3.05}, where 'egg' ends up printed before 'descend', but 'descend' should come first.]
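A minimal sketch of why the output can vary: for tied scores, the comparator returns a boolean, and `functools.cmp_to_key` compares the comparator's result against zero. Neither `True` (1) nor `False` (0) is negative, so tied terms are never treated as "less than" each other, and because Python's sort is stable they simply keep their input order. If the dict's insertion order changes between runs (for example, if the terms pass through a `set`, whose iteration order for strings can change between runs because of hash randomization; this is an assumption about how the dict is built), the tied terms come out differently each time. `cmp_term_score_fixed` is a hypothetical corrected comparator, not code from the question; scores are rounded to 4.4 for readability.

```python
import functools

# The question's comparator, with its final return at function level
def cmp_term_score(term_score_1, term_score_2):
    if term_score_1[1] == term_score_2[1]:
        return term_score_1[0].casefold() < term_score_2[0].casefold()
    return term_score_2[1] - term_score_1[1]

# For tied scores it returns True (1) or False (0), never a negative number,
# so cmp_to_key never treats either tied term as smaller than the other
print(cmp_term_score(('jõe', 4.4), ('saar', 4.4)))  # True
print(cmp_term_score(('saar', 4.4), ('jõe', 4.4)))  # False

key = functools.cmp_to_key(cmp_term_score)
order_a = [('saar', 4.4), ('jõe', 4.4)]
order_b = [('jõe', 4.4), ('saar', 4.4)]

# sorted() is stable, so tied terms keep whatever order they arrived in
print(sorted(order_a, key=key))  # [('saar', 4.4), ('jõe', 4.4)]
print(sorted(order_b, key=key))  # [('jõe', 4.4), ('saar', 4.4)]

# Hypothetical corrected comparator: returns a negative, zero or positive int
def cmp_term_score_fixed(term_score_1, term_score_2):
    term_1, score_1 = term_score_1
    term_2, score_2 = term_score_2
    if score_1 != score_2:
        return -1 if score_1 > score_2 else 1  # higher score sorts first
    folded_1, folded_2 = term_1.casefold(), term_2.casefold()
    return (folded_1 > folded_2) - (folded_1 < folded_2)  # then ascending term

fixed_key = functools.cmp_to_key(cmp_term_score_fixed)
print(sorted(order_a, key=fixed_key))  # [('jõe', 4.4), ('saar', 4.4)]
print(sorted(order_b, key=fixed_key))  # [('jõe', 4.4), ('saar', 4.4)]
```

With the corrected comparator the result no longer depends on the input order.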

How can I make this secondary lexicographical sort consistent?

Note: the dict's terms are read from a file as UTF-8 strings.


By default, tuples are compared in field order: they are sorted by their first fields, ties are broken by their second fields, and so on. So if your challenge is to sort by score followed by name, it may be as simple as leveraging this inherent feature of tuples, with one wrinkle: you want the numeric sort to go from high to low, while you want the lexicographical sort to go from low to high. The following example handles that, albeit in a somewhat tricky way: it negates the score, so sorting the negated values from low to high orders the original scores from high to low.
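A small illustration of the tuple rule and the negation trick described above, using rounded scores (4.4, 7.4) for readability:

```python
# Tuples compare field by field; the first field that differs decides
print((1, 'b') < (2, 'a'))  # True: 1 < 2, so the second field is never consulted
print((1, 'a') < (1, 'b'))  # True: first fields tie, so 'a' < 'b' decides

# Negating the score makes higher scores sort first, while terms stay ascending
pairs = [('saar', 4.4), ('parish', 7.4), ('jõe', 4.4)]
keys = [(-score, term) for term, score in pairs]
print(sorted(keys))  # [(-7.4, 'parish'), (-4.4, 'jõe'), (-4.4, 'saar')]
```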


word_scores = [('parish', 7.427144133408616), ('saar', 4.406719247264253), ('saaremaa', 4.406719247264253), ('jõe', 4.406719247264253), ('villag', 4.208268308540415)]

# Negate the score so higher scores sort first; ties fall back to the term (ascending)
sorted_word_scores = sorted(word_scores, key=lambda ws: (-ws[1], ws[0]))
print(sorted_word_scores)

This produces:

[('parish', 7.427144133408616), ('jõe', 4.406719247264253), ('saar', 4.406719247264253), ('saaremaa', 4.406719247264253), ('villag', 4.208268308540415)]
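Applied to the question's dict and output format, a minimal sketch might look like the following. `top_terms` and `top_terms_two_pass` are hypothetical helper names; the `casefold()` tie-break follows the question's comparator rather than the plain `ws[0]` used above, with the raw term as a last tie-break so that terms differing only in case still get a fixed order. The second helper shows the multiple-pass approach from the Python Sorting HOWTO, which relies on sort stability instead of negation and so also works when the primary key is not numeric.

```python
def top_terms(term_idf_dict, n_top_terms=10):
    """Highest scores first; ties broken by the case-folded term, then the raw term."""
    ranked = sorted(
        term_idf_dict.items(),
        key=lambda item: (-item[1], item[0].casefold(), item[0]),
    )
    return [term for term, _ in ranked[:n_top_terms]]


def top_terms_two_pass(term_idf_dict, n_top_terms=10):
    """Same ordering via two stable sorts instead of negating the score."""
    # Secondary key first: ascending case-folded term, then raw term
    ranked = sorted(term_idf_dict.items(), key=lambda item: (item[0].casefold(), item[0]))
    # Primary key second: reverse=True still keeps equal scores in their previous order
    ranked.sort(key=lambda item: item[1], reverse=True)
    return [term for term, _ in ranked[:n_top_terms]]


sample = {'parish': 7.427144133408616, 'saar': 4.406719247264253,
          'saaremaa': 4.406719247264253, 'jõe': 4.406719247264253,
          'villag': 4.208268308540415}
print(", ".join(top_terms(sample)))  # parish, jõe, saar, saaremaa, villag
print(", ".join(top_terms({'egg': 3.05, 'descend': 3.05})))  # descend, egg
```

Because every tie is resolved by the key itself, the output no longer depends on the order in which terms were inserted into the dict.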