How does the time complexity of iterating through two unsorted lists of integers compare to sorting and then iterating through the lists?

This question is regarding the Time Complexity of two algorithms.

So I am solving a question involving two lists (not sorted) of integers, and I want to return the pair of integers (one from each list) with the smallest absolute difference.

  1. The naive approach is simply to iterate through both lists, keep track of the smallest difference throughout, and at the end return the pair that produced this difference. I believe this would take O(mn) time, where n and m are the lengths of the two lists, since we simply use two nested for loops.

  2. Another approach is to sort both lists, then iterate through them, and there is some logic that lets us break out of the loops in many cases without iterating through the whole of the second list. I believe this would take O(n log n + m log m) time, again where n and m are the lengths of the two lists.

I read on the solution page that solution 2 is more efficient, but I don’t see why. If the time complexities I mentioned above are correct, how does O(n log n + m log m) end up smaller than O(mn)?

Answer

We can see the difference more clearly with an example. Let’s say we have two lists, each containing the numbers from 1 to 100 in some shuffled order, so both lists have 100 items.

  • list1 = [11, 2, 33, 4, 5, 67, ...] = n items where n is 100
  • list2 = [8, 90, 1, 23, 45, 7, ...] = m items where m is 100

Naive approach

Steps:

  • Iterate over each element of list1’s n items
    • Time complexity: n = 100
  • For each of those iterations, iterate over all of list2’s m items to find the number that produces the smallest difference with the current item of list1 (a code sketch follows below).
    • Time complexity: n * m = 100 * 100 = 10,000

Total time complexity: n * m = 10,000
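
For reference, here is a minimal Python sketch of this naive approach; the function and variable names are just illustrative, they are not from the original question.

```python
def closest_pair_naive(list1, list2):
    # Compare every pair (a, b): O(n * m) time, O(1) extra space.
    best_pair = None
    best_diff = float("inf")
    for a in list1:              # n outer iterations
        for b in list2:          # m inner iterations for each a
            diff = abs(a - b)
            if diff < best_diff:
                best_diff = diff
                best_pair = (a, b)
    return best_pair
```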

Sorted data approach 1

Steps:

  • Sort list2 producing [1, 2, 3, 4, 5, ...]
    • Time complexity: m * log(m) = 100 * log2(100) ≈ 664.4, or roughly 700
  • Iterate over each element of list1’s n items
    • Time complexity: n = 100
  • For each of those iterations, perform a binary search on the sorted items of list2 to find the value closest to the current item of list1 (see the sketch after this section).
    • To further explain this, let’s say we have a sorted 8-item list [0, 5, 10, 15, 20, 25, 30, 35]
    • If the number whose closest value we want to find is 22, we don’t need to scan the whole list. It’s like looking up the word “Potato” in a dictionary or thesaurus: you wouldn’t browse from page 1; you open to the middle, and if that page is before “P” (say, “M”) you move forward, if it’s after (say, “S”) you move back, and repeat.
    • So the items we would check are the middle item 15, then 25, then 20. These 3 checks illustrate the binary search time complexity of log2(8) = 3, since we halve the remaining list at every step.
    • Once we land on 20, we check its surrounding numbers, namely [15, 20, 25], to see which is closest to 22; that is 20, so it is the number to pair with 22 for the smallest difference. This is just a constant, O(1), check.
    • Time complexity: n * log(m) = 100 * log2(100) ≈ 664.4, or roughly 700

Total time complexity: (m * log(m)) + (n * log(m)) = 700 + 700 = 1,400
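
A possible Python sketch of this approach, using the standard bisect module for the binary search (the function name is illustrative, not from the original solution page):

```python
import bisect

def closest_pair_binary_search(list1, list2):
    # Sort list2 once: O(m log m).
    sorted2 = sorted(list2)
    best_pair = None
    best_diff = float("inf")
    for a in list1:                               # n iterations
        i = bisect.bisect_left(sorted2, a)        # O(log m) binary search
        # The value closest to a must be a neighbour of the insertion point.
        for j in (i - 1, i):
            if 0 <= j < len(sorted2):
                diff = abs(a - sorted2[j])
                if diff < best_diff:
                    best_diff = diff
                    best_pair = (a, sorted2[j])
    return best_pair
```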

Sorted data approach 2

  • Sort list1 producing [1, 2, 3, 4, 5, ...]
    • Time complexity: n * log(n) = 100 * log2(100) ≈ 664.4, or roughly 700
  • Sort list2 producing [1, 2, 3, 4, 5, ...]
    • Time complexity: m * log(m) = 100 * log2(100) ≈ 664.4, or roughly 700
  • Iterate through both sorted lists at the same time to find the pair that produces the smallest difference, always moving the iterator that points at the smaller value. The idea is like two people carrying a plank of wood up a staircase: the lower person is the one who steps up, so the two stay in sync and remain as close to each other as possible at all times (otherwise the wood falls ^_^). A code sketch of this two-pointer walk is shown below.
    • Time complexity: n + m = 100 + 100 = 200

Total time complexity: (n * log(n)) + (m * log(m)) + n + m ≈ (n * log(n)) + (m * log(m)) = 700 + 700 = 1,400

  • The n + m term was dropped because it is non-dominant compared to n * log(n) and m * log(m)
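
A possible Python sketch of this two-pointer approach (again, names are illustrative, not taken from the original solution page):

```python
def closest_pair_two_pointers(list1, list2):
    # Sort both lists: O(n log n + m log m).
    s1, s2 = sorted(list1), sorted(list2)
    i = j = 0
    best_pair = None
    best_diff = float("inf")
    while i < len(s1) and j < len(s2):   # at most n + m steps in total
        diff = abs(s1[i] - s2[j])
        if diff < best_diff:
            best_diff = diff
            best_pair = (s1[i], s2[j])
        if s1[i] < s2[j]:                # advance the pointer at the smaller value
            i += 1
        else:
            j += 1
    return best_pair
```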

Conclusion

For two lists of 100 items each, the sorted-data approach therefore takes roughly 8,600 fewer operations than the naive approach (about 1,400 versus 10,000), and the gap only widens as the lists grow, because m * n grows much faster than (n * log(n)) + (m * log(m)).
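
If it helps, here is a quick back-of-the-envelope script that shows how the gap grows with the list size. It ignores constant factors and simply counts roughly n * log2(n) steps per sort plus a linear walk, so the numbers are rough estimates, not measured runtimes.

```python
import math

for n in (100, 1_000, 10_000):
    naive = n * n                                   # ~n * m pair comparisons
    sorted_based = 2 * n * math.log2(n) + 2 * n     # two sorts plus the linear walk
    print(f"n = m = {n:>6}: naive ~{naive:,}  sorted ~{round(sorted_based):,}")
```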