COMP4804: Algorithms II
$\newcommand{\R}{\mathbb{R}}\DeclareMathOperator{\E}{\mathbf{E}}\DeclareMathOperator{\deg}{deg}$
Closest Pair and Linear Programming

This lecture looks at two randomized algorithms for two geometric problems. Both algorithms are randomized incremental algorithms: They each compute a random permutation of the input and then incrementally solve the problem by adding one input element at a time and updating the current solution after the addition of each element.

Closest Pair

For simplicity, we will assume that all inter-point distances are unique, so there are no ties. Recall that a hash table can store a set of key/value pairs so that, we can test if any key is in the set and look up its value in $O(1)$ expected time.

We will solve the closest-pair problem with this randomized incremental algorithm:

Lemma: The probability that $p_{i}$ is part of the closest pair among $p_1,\ldots,p_i$ is at most $2/i$.

Proof: Remember that $p_1,\ldots,p_n$ is a random permutation of $P$, so $p_1,\ldots,p_i$ is a random permutation of $P_i=\{p_1,\ldots,p_i\}$. Exactly two elements $a$ and $b$ of $P_i$ define the closest pair and $p_i$ is equally likely to be any of the $i$ elements of $P_i$ so the probability that $p_i\in\{a,b\}$ is $2/i$. ∎

Theorem: Given a set of $n$ points in $\R^2$, we can find the closest pair in $O(n)$ expected time

Proof: After computing the random permutation, the rest of the algorithm examines each point, does a constant number of work, and possibly rebuilds the hash table. Let $c>0$ be some constant and for each $i\in\{2,\ldots,n\}$, let
\[ I_i = \begin{cases} ci & \text{if $p_i$ is part of the CP of $p_1,\ldots,p_i$} \\ 0 & \text{otherwise .} \end{cases} \] Exercise: Check that $\E[I_i] \le 2c$. Then, the expected amount of time spent rebuilding hash tables is \[ \E\left[\sum_{i=2}^n I_i\right] = \sum_{i=2}^n \E[I_i] = \sum_{i=2}^n 2c = \sum_{i=2}^n O(1) = O(n) \enspace . \] The other steps of the algorithm easily run in $O(n)$ time. If you're not sure how the random permutation is generated, you can read about the Fisher–Yates Shuffle. ∎

The preceding algorithm easily generalizes to points in $\R^{d}$, though the expected running time becomes $O(C^d n)$ for some constant $C$. So the algorithm is linear in $n$ but exponential in $d$. This is pretty common for geometric problems and is sometimes called the curse of dimensionality.

Linear Programming

Exercise: How much better can you do than the trivial algorithm? $O(n^2)$, $O(n\log n)$?

To make life simple, we assume there are no horizontal lines and no three lines interect in a common point. We can solve this problem again with a randomized incremental algorithm.

This should look familiar. All we need to show now is that the new solution is not likely to involve $\ell_i$. Exactly 2 lines in $\ell_1,\ldots,\ell_i$ define the optimal solution for $\ell_1,\ldots,\ell_i$. Now, $\ell_i$ is uniformly chosen from a set of size $i-2$, so the probability that $\ell_i$ is one of the two that defines the solution is at most $2/(i-2)$. The rest of the analysis is just like the analysis of the closest pair algorithm.

Theorem: Given $n$ lines in $\R^2$, we can find the lowest point that is above all the lines in $O(n)$ expected time.

The preceding algorithm also generalizes to $\R^d$, and again, the expected running-time is $O(C^dn)$ for some constant $C$. The generalization is less straightforward, though, and involves a recursion on the dimension, $d$.