Problem

Solving challenges on project Euler or HackerRank is a good past time. For folks working in the wider analaytical / data science field, places like project Euler provide an excellent opportunity to work with academic programming concepts that do not frequently appear in real-life. I was looking at common problem:

You are given an unordered array consisting of consecutive integers [1, 2, 3, …, n] without any duplicates. You are allowed to swap any two elements. Find the minimum number of swaps required to sort the array in ascending order.
Example Perform the following steps:

i   arr                     swap (indices)
0   [7, 1, 3, 2, 4, 5, 6]   swap (0,3)  
1   [2, 1, 3, 7, 4, 5, 6]   swap (0,1)  
2   [1, 2, 3, 7, 4, 5, 6]   swap (3,4)  
3   [1, 2, 3, 4, 7, 5, 6]   swap (4,5)  
4   [1, 2, 3, 4, 5, 7, 6]   swap (5,6)  
5   [1, 2, 3, 4, 5, 6, 7]  

It took swaps to sort the array.

Solution

First attempt

After completing the problem, I like to explore literature and search or post on CodeReview for feedback. My initial solution simply involved re-starting array sort and counting each approach. As arrays are consisting of unordered consecutive integers ∈ [1, 2, 3, …, n], the easiest solution

def minimumSwaps(arr):
    # Add zero to avoid the need of shifting the index so all the loops
    # are working fine
    arr.insert(0, 0)
    num_sorts = 0
    while arr != sorted(arr):
        for idx, val in enumerate(arr):
            if idx != val:
                arr[idx], arr[val] = arr[val], arr[idx]
                num_sorts += 1
                break
    return num_sorts

The solution returns the correct results but times out for 6 cases that consists of bigger arrays.

Passed tests: 8 / 14

Second attempt

My first thought was to quickly optimise the existing code by looking for quick wins. The sorting operation in while arr != sorted(arr) could be optimised by storing sorted object (as each array has only one sorted order that meets the criteria).

def minimumSwaps(arr):
    # Add zero to avoid the need of shifting the index so all the loops
    # are working fine
    arr.insert(0, 0)
    num_sorts = 0
    array_sorted = sorted(arr) # First optimisation
    while arr != array_sorted:
        for idx, val in enumerate(arr):
            if idx != val:
                arr[idx], arr[val] = arr[val], arr[idx]
                num_sorts += 1
                break
    return num_sorts

The line array_sorted = sorted(arr) makes the solution time out only once.

Passed tests: 13/14

Third attempt

Understandably, there is unnecessary computation taking place. The for loop will be restarted and always iterate over all elements until finding the element that does not match its correct place. This is computationally expensive. One approach would involve working with the loop to make the iteration less expensive. However, before attempting this I wanted to attempt another solution. For arrays that are simply sorted in reverse order arriving at incremental sorting can be achieved with a number of moves equal to half of the array lengths. In those cases, the computation and array swapping is unnecessary. This is implemented in the example below.

def minimumSwaps(arr):
    # Check if dealing with simple reverse array, as sorting reverse array
    # will be equivalent to half of length
    if arr == sorted(arr, reverse=True):
        return int(len(arr) / 2)
    # Add zero to avoid the need of shifting the index so all the loops
    # are working fine
    arr.insert(0, 0)
    num_sorts = 0
    array_sorted = sorted(arr) # First optimisation
    while arr != array_sorted:
        for idx, val in enumerate(arr):
            if idx != val:
                arr[idx], arr[val] = arr[val], arr[idx]
                num_sorts += 1
                break
    return num_sorts

Passed tests: 14/14

Conclusions

I don’t remember where I’ve seen this statement but someone once said:

RAM is cheap, thinking is expensive

It’s always easy and tempting to start working through concrete example instead of approaching the problem algebraically.

  • First a good program computes what is has to and only that. The task is to return the numbers of steps not to sort. Sorting implies actual movement of elements in array and potentially will have a different (likely worse) memory footprint to a purely algebraic solution. The proposed solution does not adhere to this principle fully as purely algebraic solution is applied only for reverse arrays.
  • Second, a premature optimisation is frequent source of bugs; also, alogirthm is often the source of poor performance. In the proposed solution, there is a little focus on the actual performance, and optimisation is achieved mostly in the course of cheap / common sense steps, like storing sorted object or skipping computation for specific cases.

Main learning points are that thinking how to achieve the outcome is frequently easier than building a detail mental image of the process that lends itself well to estimating the computational effort.