Algorithms (Week 1)


As part of my journey towards data science I am taking a MOOC on algorithms by Tim Roughgarden from Stanford University. I took my first MOOC some years ago from MIT and enjoyed it quite a lot. Algorithms are things I know very little about and I hope this is no longer true by the end of the course. As part of my study I will keep some lecture notes with important points from each of the six weeks. Currently, I am planning on solving the programming questions with Python (using this as an opportunity to write some code in Python 3.x).

Introduction

An algorithm is "a set of well-defined rules, a recipe in effect, for solving some computational problem".

Define a computational problem, with an input and an output, then provide an algorithm that transforms the input into the output.

The algorithm's designer mantra:

"Perhaps the most important principle for the good algorithm designer is to refuse to be content." - Aho, Hopcroft, and Ullman, The Design and Analysis of Computer Algorithms

In other words: Can we do better?

Merge Sort

Merge sort is an old algorithm and a good example of the divide-and-conquer paradigm: one breaks a problem into smaller sub-problems, solves them recursively, and combines the results. Merge sort achieves sorting with a number of operations that grows less than quadratically in the input size.

The sorting problem: as input we are given an array of \(N\) numbers, which are not sorted, and we are asked to provide the \(N\) numbers in sorted order (say in increasing order).

I wrote a short function that takes as input a non-negative integer n, and gives as output an unsorted list containing the n + 1 integers between 0 and n. Here it is:

from random import choice

def unsorted_list(n):
    # Start with the n + 1 integers 0, 1, ..., n in order.
    l1 = list(range(n + 1))
    l2 = list()
    # Repeatedly move a randomly chosen remaining element from l1 to l2,
    # producing a shuffled copy.
    for _ in range(n + 1):
        y = choice(l1)
        l2.append(y)
        l1.remove(y)
    return l2

With this function I can create lists of integers that need to be sorted. Next, I implement Merge Sort. My implementation involves two parts. The first part is a function that takes as input two sorted lists of integers a and b, and gives as output a third sorted list c made by merging a and b. Here is the code:

def mel_merge(a, b):
    i_a = 0
    i_b = 0
    c = list()
    # Walk through both lists, always appending the smaller front element.
    while (i_a < len(a)) and (i_b < len(b)):
        if a[i_a] < b[i_b]:
            c.append(a[i_a])
            i_a += 1
        elif b[i_b] < a[i_a]:
            c.append(b[i_b])
            i_b += 1
        else:
            # Equal elements: append both and advance both indices.
            c.append(a[i_a])
            c.append(b[i_b])
            i_a += 1
            i_b += 1
    # One list is exhausted; append the remainder of the other.
    if (i_a == len(a)) and (i_b < len(b)):
        c = c + b[i_b:]
    elif (i_b == len(b)) and (i_a < len(a)):
        c = c + a[i_a:]
    return c
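
As a quick sanity check (my own example, not from the course), here is what mel_merge does on two small sorted lists that share a value:

mel_merge([1, 3, 5], [2, 3, 6])  # returns [1, 2, 3, 3, 5, 6]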

The second part is a recursive function that takes an unsorted list l, splits it into two smaller lists, recursively calls itself on those smaller lists to produce sorted lists l1 and l2, and finally calls mel_merge to merge them into a single sorted list:

def mel_merge_sort(l):
    len_l = len(l)
    # Base case: a list with zero or one elements is already sorted.
    if (len_l == 0) or (len_l == 1):
        return l
    else:
        # Split the list in half, sort each half recursively, then merge.
        k = len_l // 2
        l1 = mel_merge_sort(l[:k])
        l2 = mel_merge_sort(l[k:])
        return mel_merge(l1, l2)
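
Putting the two pieces together (a minimal usage sketch; the input list is random, so the intermediate result shown is just one possibility):

l = unsorted_list(9)   # e.g. [4, 0, 7, 2, 9, 5, 1, 8, 3, 6]
mel_merge_sort(l)      # returns [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]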

Analysis of Algorithms

One can adopt three guiding principles for how to reason about algorithms and also how to define a fast algorithm.

The first guiding principle is worst-case analysis. This principle will help us find results that hold quite generally with no assumptions about the specific details of the input. A related, but different principle is average-case analysis, where one makes some assumptions about the frequencies of certain kinds of inputs and develops algorithms optimized for those particular cases. However, this requires some domain knowledge, whereas the worst-case analysis holds in general.

The second guiding principle consists of not worrying too much about constant factors or lower-order terms. This means that we are really interested in looking at the leading (dominant) behavior.

The third guiding principle consists of using asymptotic analysis. This means that we will focus on large input sizes. In some sense, the interesting problems are the ones with large inputs.

With these three principles we can define a fast algorithm as one whose worst-case running time grows slowly with the input size. For most of the problems we will encounter, the "holy grail" is to find an algorithm that solves the problem in linear time (or close to linear time).

Asymptotic Analysis

Asymptotic analysis is the language used to discuss the high-level performance of an algorithm. It is coarse enough to suppress architecture-, language-, and compiler-dependent details, yet sharp enough to make useful comparisons between different algorithms, especially on large inputs.

The high-level idea is to suppress constant factors (which are system-specific) and lower-order terms (which are irrelevant for large inputs). A somewhat careful analysis of Merge Sort yields a running time of the form

$$T(n) = 6n \lg(n) + 6n$$

Using asymptotic analysis we just provide \(n \lg(n)\). That is, we ignore the constant factor \(6\) and the lower-order term \(6n\). In this sense we say that the running time of Merge Sort is \(O(n \lg(n))\).

Big-O

In general, we only consider functions defined for positive integers (i.e. sizes of inputs). One such function is the worst-case running time of an algorithm as a function of the input size \(n\), which we denote by \(T(n)\). If \(T(n) = O(f(n))\), this means that eventually (i.e. for all sufficiently large \(n\)), \(T(n)\) is bounded from above by a constant multiple of \(f(n)\). A more formal definition is:

Big-O Notation: \(T(n) = O(f(n))\) if and only if there exist positive constants \(c\) and \(n_{0}\) such that

$$T(n) \leq c f(n)$$
for all \(n \geq n_{0}\).
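
As a worked example (my own check, not from the lecture), the Merge Sort running time given above satisfies this definition with \(f(n) = n \lg(n)\): for \(n \geq 2\) we have \(\lg(n) \geq 1\), so \(6n \leq 6n \lg(n)\) and therefore

$$T(n) = 6n \lg(n) + 6n \leq 12 n \lg(n)$$
for all \(n \geq 2\). That is, \(T(n) = O(n \lg(n))\) with witnesses \(c = 12\) and \(n_{0} = 2\).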

Big-Omega and Big-Theta

Similar to big-O, we use \(T(n) = \Omega(g(n))\) to mean that eventually \(T(n)\) is bounded from below by a constant multiple of \(g(n)\). A more formal definition is:

Big-Omega Notation: \(T(n) = \Omega(g(n))\) if and only if there exist positive constants \(c\) and \(n_{0}\) such that

$$T(n) \geq c g(n)$$
for all \(n \geq n_{0}\).

Finally, we can use \(T(n) = \Theta(h(n))\) to mean that both \(T(n) = O(h(n))\) and \(T(n) = \Omega(h(n))\). That is,

Big-Theta Notation: \(T(n) = \Theta(h(n))\) if and only if there exist positive constants \(c_{1}\), \(c_{2}\) and \(n_{0}\) such that

$$c_{1} h(n) \leq T(n) \leq c_{2} h(n)$$
for all \(n \geq n_{0}\).
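
Continuing the Merge Sort example (again my own check), \(T(n) = 6n \lg(n) + 6n \geq 6n \lg(n)\) for every \(n \geq 1\), and we already saw that \(T(n) \leq 12 n \lg(n)\) for \(n \geq 2\). Hence \(T(n) = \Theta(n \lg(n))\) with \(c_{1} = 6\), \(c_{2} = 12\) and \(n_{0} = 2\).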

Divide and Conquer Algorithms

The divide-and-conquer paradigm for solving problems consists of a few main steps. One first divides the main problem into smaller subproblems, then solves (conquers) those subproblems recursively, and finally combines their solutions into a solution of the original problem.

Counting Inversions

Given an array \(A\) of \(n\) distinct integers, an inversion is a pair of array indices \(i\) and \(j\) such that \(i < j\) and \(A_{i} > A_{j}\).
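
For instance, the array \((1, 3, 5, 2, 4, 6)\) has three inversions, corresponding to the out-of-order pairs of values \((3, 2)\), \((5, 2)\) and \((5, 4)\). As a baseline (a sketch of my own, not the divide-and-conquer solution developed in the course), the number of inversions can be counted by brute force in quadratic time:

def count_inversions_brute_force(a):
    # Check every pair of indices (i, j) with i < j and
    # count those whose elements are out of order.
    count = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            if a[i] > a[j]:
                count += 1
    return count

count_inversions_brute_force([1, 3, 5, 2, 4, 6])  # returns 3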