Author : Girija Limaye
Introduction
The Summer School on Theoretical Foundations of Computer Science was held at IIIT Bangalore from June 18th to 22nd, 2018. It covered five themes. In this blog post, I am writing about Approximation Algorithms, Parameterized Complexity and Foundations of Open World Computing. Apart from these, Cryptography and Program Analysis were also on the agenda of the school.
Disclaimer : If there are any mistakes in the technical details written below, the onus is on the author and not on the faculty of the school.
Note: I have not included the definitions of the problems since all of them are well-known NP-hard problems. Most of the content here is based on the notes I took during the school. In some places, I have augmented them with web references.
Approximation Algorithms
Prof. V N Muralidhara delivered 3 sessions on Approximation Algorithms. For NP-hard optimization problems, since the exact optimal solution cannot be computed efficiently (unless P = NP), one tries to compute an approximate solution efficiently, with provable guarantees on how far it can be from the optimum.
Approximation factor and some definitions
Consider a minimization problem. If $OPT$ is the optimal solution for the problem and $A$ is the solution given by an approximation algorithm for that problem, then we can compare $A$ with $OPT$ using the approximation factor or ratio. E.g. if $OPT = 100$ and $A = 120$ then $A$ is off by $20\%$, and we say that the approximation factor is $\frac{120}{100} = 1.2$. In general, the approximation factor can take the following forms.
- $c$ : constant factor approximation e.g. $A \leq 2 * OPT$
- $c$ : constant percent approximation e.g. $A \leq 1.1 * OPT$ i.e. within 10% of $OPT$
- $(1+\epsilon)$ : $\epsilon$ approximation e.g. $A \leq (1+\epsilon) * OPT, \epsilon > 0$
- $f(n)$ : where $f(n)$ is some function of input size $n$ e.g. $A \leq \log{n} * OPT$, $A \leq \sqrt{n} * OPT$
- $p(n)$ : where $p(n)$ is polynomial in $n$ e.g. $n^\epsilon, n$
Even though NP-hard problems are reducible to each other, the approximation guarantees for one NP-hard problem may not carry over to another NP-hard problem. We will see this in the next sections. Some definitions follow.
- PTAS - Polynomial Time Approximation Scheme - If, for every fixed $\epsilon > 0$, the approximation algorithm runs in time polynomial in $n$ (e.g. $O(n^{\frac{1}{\epsilon}})$) and gives a solution $A$ such that $A \leq (1+\epsilon) * OPT$, then the problem under consideration is said to have a PTAS.
- FPTAS - Fully Polynomial Time Approximation Scheme - If the approximation algorithm runs in time $O(p(n, \frac{1}{\epsilon}))$ where $p(n, \frac{1}{\epsilon})$ is a polynomial in both $n$ and $\frac{1}{\epsilon}$ and gives a solution $A$ such that $A \leq (1+\epsilon) * OPT$, then the problem under consideration is said to have an FPTAS.
This gives a handle on the running time based on the accuracy we target. But, as we will see later, some problems do not even have a PTAS. For such problems, the best we can hope for is a constant factor / constant percentage, or a function of $n$, as the approximation factor.
We saw the following main results at the school. I will not prove each and every claim below, but will give the overall design and how the claimed guarantee is achieved.
Vertex cover has a 2-approximation algorithm
Since we don't know $OPT$ and still want to guarantee an approximation factor relative to $OPT$, the first step is to lower bound $OPT$. For maximization problems, as we will see later in the section on Knapsack, we instead need to upper bound $OPT$. For Vertex Cover, $OPT \geq$ the size of any maximal matching in the graph.
This can be shown using a bipartite graph constructed from two vertex sets - one set has a vertex for every matched edge of the maximal matching, and the other set represents the vertices of any Vertex Cover of the graph. Since the edges of a matching are pairwise vertex-disjoint by the definition of matching, to cover each of them we must pick at least one of its endpoints into the cover. So, if $M$ is any maximal matching and $VC$ is any Vertex Cover, then the cardinality of $VC$ is at least the cardinality of $M$, i.e. $|VC| \geq |M|$, and hence $OPT \geq |M|$.
Now we have a lower bound. Let's try to design an algorithm and bound the solution it gives w.r.t. this lower bound on $OPT$. We follow a simple strategy:
- Find a maximal matching $M$ of the graph.
- Pick both endpoints of every matched edge into the VC.
This gives $A = 2 * |M|$. Since $|M| \leq OPT$, we get $A \leq 2 * OPT$. Thus, we have a 2-approximation algorithm.
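A minimal sketch of this algorithm in Python (my own illustration, not code from the lectures; the edge-list representation and the example path are arbitrary choices):

```python
def vertex_cover_2_approx(edges):
    """2-approximation for Vertex Cover: greedily build a maximal matching
    and take both endpoints of every matched edge."""
    cover = set()
    for u, v in edges:
        # (u, v) is unmatched iff neither endpoint was matched before;
        # matched edges are therefore vertex-disjoint by construction.
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover

# Path a-b-c-d: OPT is {b, c}; the output has size 2 or 4 depending on the
# order in which edges are scanned, i.e. never more than 2 * OPT.
print(vertex_cover_2_approx([("a", "b"), ("b", "c"), ("c", "d")]))
```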
The simplest strategy for Vertex Cover would have been greedy: include the maximum-degree vertex in $A$, remove the edges it covers, pick the next maximum-degree vertex, and so on. This gives a $\log{n}$-approximation algorithm; refer to the section on Set Cover for more details. Since $\log{n}$ grows with $n$, the 2-approximation algorithm above, which gives a constant factor approximation, is better than this one.
Analyzing an approximation algorithm
As we saw above, the design had 2 main steps. For each step, we can ask the following questions to see if anything better can be achieved.
- Lower bounding $OPT$ - Keeping the algorithm the same, can we find a better lower bound?
- Analyzing the approximation guarantee - Can the same algorithm be analyzed in a better way to get a better guarantee?
- If the answer to both is no, can we design a new, better algorithm from scratch?
Set Cover has a $\log{n}$-approximation algorithm
In the Vertex Cover section, I said that the greedy strategy gives a $\log{n}$-approximation. Here, we apply the same greedy strategy - at each step, pick the set $f_i$ among the $m$ sets that covers the maximum number of yet-uncovered elements of the universe.
So, in the beginning, we take the maximum cardinality set since all elements are uncovered. WLOG, call this set $f_1$. Let $c(e_j)$ be the cost charged to the $j^{th}$ element when it is covered for the first time, namely $1$ divided by the number of newly covered elements at that step. Then, for every element in $f_1$, $c(e) = \frac{1}{|f_1|}$. Since we don't know the maximum cardinality and hence the cost, we bound it. If $OPT$ is the optimal number of sets, then these $OPT$ sets cover all $n$ elements. Since $f_1$ has maximum cardinality, $c(e)|_{e \in f_1} \leq \frac{OPT}{n}$.
This is easy to see: the optimal solution covers all $n$ elements with $OPT$ sets, so some set covers at least $\frac{n}{OPT}$ elements; since $f_1$ has the largest cardinality, $|f_1| \geq \frac{n}{OPT}$ and hence $\frac{1}{|f_1|} \leq \frac{OPT}{n}$. Equivalently, the least possible cost of covering a new element cannot be larger than the average cost $\frac{OPT}{n}$ (otherwise it would raise the average itself!).
The total cost of the greedy algorithm is $\sum_{j=1}^{n}{c(e_j)}$. This can be shown to be at most $OPT * (\frac{1}{n} + \frac{1}{n-1} + \ldots + 1) = OPT * H_n$, where $H_n \approx \ln{n}$, i.e. $A \leq \log{n} * OPT$.
Note that here we did not explicitly bound $OPT$, but related it to $A$ using the notion of cost per element. $O(\log{n})$ is the best known approximation factor for Set Cover, and there are examples showing that this analysis is tight. There are also hardness results proving matching inapproximability.
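A sketch of the greedy procedure in Python (again my own illustration; the universe and the family of sets at the bottom are a hypothetical example):

```python
def greedy_set_cover(universe, sets):
    """Greedy log(n)-approximation for Set Cover: repeatedly pick the set
    covering the largest number of yet-uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the set with the maximum overlap with the uncovered elements.
        best = max(sets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= best
    return chosen

# Hypothetical instance over the universe {1, ..., 6}.
family = [{1, 2, 3, 4}, {3, 4, 5}, {5, 6}, {1, 6}]
print(greedy_set_cover(range(1, 7), family))
```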
Hitting Set and relationship with Vertex Cover
We also studied a dual of the Set Cover problem in which we are asked to find a hitting set, i.e. the smallest subset of the $n$-element universe whose intersection with each of the $m$ given sets is non-empty. In other words, we need to select the smallest number of elements such that they hit all the $m$ sets. Thus, this is exactly the dual of the Set Cover problem, and the equivalence can be shown by visualizing both problems using bipartite graphs. Since they are duals, Hitting Set can also be approximated within a $\log{n}$ factor.
If, in the Hitting Set problem, each of the $m$ given sets has cardinality 2, then we can easily see that Hitting Set becomes equivalent to Vertex Cover, since we can represent the elements by the vertices of a graph and the sets by edges between vertices. So, Vertex Cover is a special case of the Hitting Set problem. From the section on Vertex Cover, we know that this special case has a 2-approximation, whereas the general case, i.e. Hitting Set itself, has a $\log{n}$-approximation algorithm.
An attempt towards approximating TSP
We studied TSP in its general form as well as metric TSP (i.e. with distances obeying the triangle inequality). A minimum spanning tree (MST) touches every vertex of the graph, a property it shares with a TSP tour. Like a TSP tour, the MST is also a minimum-cost structure; the only difference is that a TSP solution is a tour, i.e. a cycle. But we can lower bound $OPT$ for TSP by $c(MST)$, where $c(MST)$ is the cost or weight of the MST.
This bound is easy to see: if $OPT < c(MST)$, then we can take the optimal tour, delete its maximum weight edge and get a spanning tree with cost less than that of the MST - a contradiction.
We can augment the MST by doubling its edges (the well-known algorithm; the doubled tree has all even degrees, which enables an Euler tour), taking an Euler tour, and then short-cutting the tour to eliminate repeated occurrences of a vertex. By the triangle inequality, the cost can only decrease (if at all) during the short-cutting phase. So, $A \leq 2 * c(MST) \leq 2 * OPT$. There are, however, examples in which short-cutting keeps the cost the same, and hence the bound is tight.
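Here is a compact sketch of this "double tree" idea in Python. It assumes the metric comes from points in the plane (Euclidean distances satisfy the triangle inequality), builds the MST with Prim's algorithm, and uses a preorder walk of the MST, which is exactly the short-cut Euler tour of the doubled MST; the point set at the bottom is hypothetical.

```python
import math

def metric_tsp_2_approx(points):
    """2-approximation for metric TSP on points in the plane: build an MST
    with Prim's algorithm, then output a preorder walk of the MST, which is
    exactly the short-cut Euler tour of the doubled MST."""
    n = len(points)
    dist = lambda i, j: math.hypot(points[i][0] - points[j][0],
                                   points[i][1] - points[j][1])

    # Prim's algorithm: grow the MST from vertex 0, remembering tree children.
    children = {i: [] for i in range(n)}
    best = {i: (dist(0, i), 0) for i in range(1, n)}   # vertex -> (cost, parent)
    while best:
        v = min(best, key=lambda i: best[i][0])
        _, parent = best.pop(v)
        children[parent].append(v)
        for u in best:
            if dist(v, u) < best[u][0]:
                best[u] = (dist(v, u), v)

    # Preorder walk of the MST; the triangle inequality guarantees that the
    # implicit short-cuts do not increase the cost.
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour

# Hypothetical instance: four points on a unit square.
print(metric_tsp_2_approx([(0, 0), (0, 1), (1, 0), (1, 1)]))
```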
We can extend the above idea further to get a 1.5-approximation factor, using a minimum-weight perfect matching on the odd-degree vertices of the MST and the fact that the number of odd-degree vertices in any graph is even. This algorithm is due to Christofides.
For general TSP, we show that no $\alpha(n)$-approximation algorithm exists (unless P = NP). This can be seen by reducing the Hamiltonian Cycle (HC) problem to TSP as follows.
For the given graph, consider a complete graph on the same set of vertices. For the edges which are present in the original graph, assign distance 1, and for the other edges assign distance $n * \alpha(n)$. Now, assume that an $\alpha(n)$-approximation algorithm for TSP exists, i.e. that we can compute an approximate solution which is at most $\alpha(n)$ times $OPT$. We can argue that if such an algorithm exists, then we can solve HC using that algorithm and the above reduction as follows.
If the original instance indeed has a HC, then those edges have distance 1 and hence the $OPT$ of the corresponding TSP instance is $n$. If no HC is present in the original instance, then every tour must use at least one edge of weight $n * \alpha(n)$, so the $OPT$ of the TSP instance is more than $n * \alpha(n)$. Since our claimed algorithm produces a solution which is at most $\alpha(n)$ times $OPT$, it will produce a solution with cost at most $n * \alpha(n)$ when a HC exists in the original instance, and a solution with cost more than $n * \alpha(n)$ when no HC is present. So, based on the solution cost $A$, we can decide in polynomial time whether the original instance has a HC. Since HC is NP-complete, this is impossible unless P = NP. Hence, such an algorithm cannot exist.
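The construction itself is tiny; a sketch in Python (the choice $\alpha(n) = \log_2{n}$ at the bottom is only a hypothetical example of an approximation factor):

```python
import math

def hc_to_tsp_instance(n, edges, alpha):
    """Build the TSP distance matrix used in the hardness reduction: distance 1
    for edges of the original graph and n * alpha(n) for all other pairs."""
    big = n * alpha(n)
    edge_set = {frozenset(e) for e in edges}
    return [[0 if i == j else (1 if frozenset((i, j)) in edge_set else big)
             for j in range(n)] for i in range(n)]

# A Hamiltonian cycle exists iff the optimal tour on this instance costs n;
# an alpha(n)-approximate tour then costs at most n * alpha(n), while any tour
# of a graph without a HC costs strictly more than n * alpha(n).
dist = hc_to_tsp_instance(4, [(0, 1), (1, 2), (2, 3), (3, 0)], lambda n: math.log2(n))
print(dist)
```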
FPTAS for 0-1 Knapsack problem
We know that fractional Knapsack is polynomial-time solvable, but 0-1 Knapsack is not (unless P = NP). We went through the standard dynamic programming based algorithm for 0-1 Knapsack, which is pseudo-polynomial in the profit values, i.e. $O(n^2 * P)$ if $P$ is the maximum profit. We can then remove the dependency on $P$ by introducing a suitable scaling factor $k$ satisfying $P * \epsilon = k * n$, i.e. $k = \frac{\epsilon P}{n}$. We scale down every profit value by $k$ (taking floors) and solve the scaled instance using the DP-based algorithm. The optimal solution of the scaled-down version is a feasible solution of the original instance since we have not touched the weights of the items. We then proved that the original profit of this solution is at least $(1-\epsilon)$ times the $OPT$ of the original 0-1 Knapsack instance. So, there exists an FPTAS for 0-1 Knapsack.
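A sketch of this FPTAS in Python (my own simplified take: instead of a full table over all profit values, the dictionary only keeps reachable scaled profits; the instance at the bottom is hypothetical):

```python
import math

def knapsack_fptas(profits, weights, capacity, eps):
    """FPTAS for 0-1 Knapsack: scale every profit down by k = eps * P / n and
    run a profit-indexed dynamic program on the scaled values."""
    n, P = len(profits), max(profits)
    k = eps * P / n
    scaled = [int(p // k) for p in profits]

    # dp maps a scaled profit value to (minimum weight achieving it, items used).
    dp = {0: (0, [])}
    for i in range(n):
        for q, (w, items) in list(dp.items()):
            q2, w2 = q + scaled[i], w + weights[i]
            if w2 <= capacity and w2 < dp.get(q2, (math.inf,))[0]:
                dp[q2] = (w2, items + [i])

    # The feasible set with the best scaled profit has original profit
    # at least (1 - eps) * OPT.
    return dp[max(dp)][1]

# Hypothetical instance.
chosen = knapsack_fptas(profits=[60, 100, 120], weights=[10, 20, 30], capacity=50, eps=0.1)
print(chosen)
```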
Parameterized Complexity
Prof. Pradeesha Ashok delivered 3 sessions on Parameterized Complexity, which is a comparatively recent topic of research and addresses the hardness of problems by taking a different approach from approximation algorithms. The main idea here is to parameterize the given problem. The parameter could be the output size, the maximum degree of the graph, etc. Consider the MaxClique problem: the output clique size is the parameter for the parameterized version of the Clique problem. Our goal now is to get an exact solution in running time $O(p(n) * f(k))$, where $k$ is the parameter and $p(n)$ is a polynomial in the input size. Note that this looks similar to approximation algorithms, but it isn't: the main difference is that we are still targeting the exact solution, not an approximate one, and the parameter $k$ has a completely different role than the $\epsilon$ in a PTAS or FPTAS.
Techniques
Following are the three main techniques we learned for designing algorithms for parameterized versions of NP-hard problems. Each is illustrated with one example problem.
Branching technique (Bounded Search Trees)
Here we take the Vertex Cover problem. The parameter is the size of the vertex cover, say $k$. Our goal is to find an exact vertex cover of size at most $k$ (or report that none exists) in running time polynomial in $n$ (the size of the graph) times some function of $k$.
Approach 1: Initially, the VC is $\emptyset$ and the budget is given as $k$.
- Pick any edge $(u, v)$. To cover this edge, either $u$ or $v$ must be in VC.
- Branch out on either $u$ or $v$ indicating that vertex is picked.
- So, at level 2, we have VC as $\{u\}$ in left branch and as $\{v\}$ in right branch.
- As we have included 1 vertex in the VC, reduce the budget $k$ by 1.
- Remove edges incident on $u$ or $v$ in respective branches (as they are covered now) to get new sub-graphs.
- Repeat the process as long as the budget lasts. Hence the name - bounded search tree.
In the end, if there is no branch in which all edges are covered, it means that a VC within our given budget $k$ does not exist. Otherwise, we have a branch that leads to an empty graph, along which we have computed a VC. Output that VC.
It is easy to see that such a bounded search tree has 2 branches at each level and at most $k$ levels, thus the size of this tree is $O(2^k)$. At each node, we do work which is polynomial in $n$, so the running time is $O(p(n) * 2^k)$.
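A sketch of this branching algorithm in Python (my own illustration; it returns a vertex cover of size at most $k$ if one exists, and None otherwise):

```python
def vc_branching(edges, k):
    """Bounded search tree for Vertex Cover: pick any remaining edge (u, v)
    and branch on putting u or v into the cover; the depth is at most k."""
    if not edges:
        return set()        # no edges left, so everything is covered
    if k == 0:
        return None         # budget exhausted but edges remain
    u, v = edges[0]
    for w in (u, v):
        # Edges touching w are covered and disappear from the sub-instance.
        remaining = [e for e in edges if w not in e]
        sub = vc_branching(remaining, k - 1)
        if sub is not None:
            return sub | {w}
    return None

# A triangle needs 2 vertices: budget 1 fails, budget 2 succeeds.
triangle = [(1, 2), (2, 3), (1, 3)]
print(vc_branching(triangle, 1), vc_branching(triangle, 2))
```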
Improved running time
Instead of picking an edge, we can pick a vertex $v$ and decide whether it is part of the VC or not. The two branches correspond to: $v$ is in the VC, or (if not, then it must be the case that) all neighbours of $v$ are in the VC. In the left branch we use up 1 from the budget $k$, and in the right branch we use up $d(v)$, where $d(v)$ is the degree of vertex $v$. If no vertex of the graph has degree more than 2, then the problem is easy to solve directly, so we assume that $\exists v, d(v) \geq 3$ and always branch on such a vertex. In that case, the budget drops to $k-1$ in one branch and to at most $k-3$ in the other. So, the branching tree satisfies the recurrence
$T(k) \leq T(k-1) + T(k-3) + 1$
which solves to $O(1.466^k)$.
So, the running time is improved to $O(p(n) * 1.466^k)$.
Kernelization
We take the Edge Clique Cover problem as the illustration here. We want an exact solution, i.e. to decide whether the edges of the given graph can be covered by at most $k$ cliques.
The idea behind kernelization is to convert an instance $(I, k)$ into an equivalent smaller instance $(I', k')$ such that the size of $I'$ is bounded by a function of the parameter $k$ instead of the input size $n$. For that, we do some preprocessing of the instance. After this, the kernel of the instance will have certain properties, as we will see below.
Preprocessing the instance to get the kernel We perform the following reductions as long as any rule applies (a code sketch of these rules appears after the proof below).
- Remove isolated vertex if present. This makes sense since an isolated vertex has no edges to cover in the edge clique cover.
- Remove isolated edge $(u, v)$ if present. Decrease budget $k$ by 1. An edge $(u, v)$ is isolated if $u$ and $v$ share no neighbour. This makes sense since an isolated edge can be covered by itself as a 2-clique.
- If there are two adjacent vertices $u$ and $v$ such that $N[u] = N[v]$, where $N[.]$ denotes the closed neighbourhood, then remove either $u$ or $v$. This makes sense because every neighbour of $v$ is also a neighbour of $u$, so any clique that $u$ is part of can be extended with $v$, and these extended cliques cover all edges incident on $v$.
Claim The kernel size (number of vertices) is $O(2^k)$ if the graph has an edge clique cover of size at most $k$.
Proof Suppose the graph has an edge clique cover of size at most $k$ and let the cliques be
$C_1, C_2, \ldots, C_k$. Pick a vertex $v$ in the kernel. Set $b_i = 1$ if $v$ is part of $C_i$ and $0$ otherwise. Then $b_1, b_2, \ldots, b_k$ gives a binary signature of vertex $v$. Since there are at most $k$ cliques, at most $2^k$ signatures are possible. Assume on the contrary that the kernel has more than $2^k$ vertices left; then by the pigeonhole principle there are at least two vertices which have the same signature. There are two cases -
- If both signatures are all-zero, it means both corresponding vertices are not part of any clique. Since at most $k$ cliques are enough to edge-cover the graph, both these vertices have no edges incident on them, i.e. they are isolated. This contradicts reduction rule 1, which we applied exhaustively.
- Else, both vertices are part of the same non-empty set of cliques. This implies they are adjacent (they lie in a common clique) and have the same closed neighbourhood. This again contradicts reduction rule 3 above, which we applied exhaustively.
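Here is a sketch of the three reduction rules in Python (my own illustration; the adjacency-set representation and the example instance are arbitrary choices, and rule 3 uses the closed-neighbourhood version discussed above):

```python
def ecc_kernelize(adj, k):
    """Exhaustively apply the three Edge Clique Cover reduction rules.
    `adj` maps every vertex to the set of its neighbours; returns the
    reduced graph (the kernel) together with the remaining budget k."""
    adj = {v: set(ns) for v, ns in adj.items()}
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v not in adj:
                continue
            # Rule 1: remove an isolated vertex.
            if not adj[v]:
                del adj[v]
                changed = True
                continue
            for u in list(adj[v]):
                # Rule 2: if u and v share no neighbour, the edge (u, v) can
                # only be covered by the 2-clique {u, v}: remove it, spend 1.
                if not (adj[v] & adj[u]):
                    adj[v].discard(u)
                    adj[u].discard(v)
                    k -= 1
                    changed = True
                # Rule 3: adjacent twins with equal closed neighbourhoods;
                # dropping one of them does not change the answer.
                elif adj[u] | {u} == adj[v] | {v}:
                    for w in adj[u]:
                        adj[w].discard(u)
                    del adj[u]
                    changed = True
    # If k went negative, the original instance has no cover of size <= k.
    return adj, k

# Hypothetical instance: a triangle 1-2-3 with a pendant vertex 4.
print(ecc_kernelize({1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}, k=2))
```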
Iterative Compression
Let's assume we have budget of $k$ and we have a solution $Z$ of size slightly larger than budget $k$ like $k+1, 2k,$ etc. Idea of iterative compression is to compress the solution $Z$ to get a feasible solution under budget $k$.
For the Vertex Cover problem, here is the iterative compression algorithm.
- Run the 2-approximation algorithm. Let the solution $Z$ have size $z$.
- If $z > 2k$, declare that no solution within budget $k$ exists and exit (since $OPT \geq \frac{z}{2} > k$).
- Let $X$ be the unknown solution within budget $k$.
- Imagine $X_Z = X \cap Z$. Since $X$ is unknown, we need to consider all possible subsets of $Z$ which could be possible $X_Z$s.
- We try to extend the partial solution $X_Z$, assuming our guess is correct. If so, every edge incident on a vertex in $X_Z$ is already covered, and every edge incident on a vertex in $Z \setminus X_Z$ must be covered by a vertex outside $Z$ (if some edge has both endpoints in $Z \setminus X_Z$, this guess fails). So, for every edge $(u, v)$ with $u \in Z \setminus X_Z$ and $v \notin Z$, pick $v$.
- Adjust budget according to how many vertices we pick.
- If we are still within $k$, we got the solution.
- If not, repeat for next imagined intersection of $X$ and $Z$.
For each guessed intersection, $p(n)$ time is spent in extending the partial solution to a VC. There are $2^{|Z|}$ subsets to consider as possible intersections. So, for $|Z| = 2k$, the running time is $O(4^k * p(n))$.
This can be improved to $O(n * p(n) * 2^k)$ by compressing a solution of size $k+1$ instead of one of size $2k$ (adding vertices one at a time and compressing the current solution at each step).
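A sketch of the compression step in Python (my own illustration; it assumes $Z$ is a vertex cover, e.g. the output of the 2-approximation, and tries every guess for $X_Z = X \cap Z$):

```python
from itertools import combinations

def vc_compress(edges, Z, k):
    """Compression step for Vertex Cover: Z must be a vertex cover (e.g. the
    output of the 2-approximation). Guess X_Z, the intersection of the unknown
    cover X of size <= k with Z, extend it, and return a cover of size <= k
    if some guess works, otherwise None."""
    Z = set(Z)
    for r in range(len(Z) + 1):
        for X_Z in map(set, combinations(Z, r)):
            outside = Z - X_Z            # guessed to be outside the small cover X
            forced, ok = set(), True
            for u, v in edges:
                if u in X_Z or v in X_Z:
                    continue             # covered by the guessed part of X
                if u in outside and v in outside:
                    ok = False           # this edge cannot be covered; bad guess
                    break
                # exactly one endpoint lies in Z \ X_Z; the other is forced into X
                forced.add(v if u in outside else u)
            if ok and len(X_Z) + len(forced) <= k:
                return X_Z | forced
    return None

# Path a-b-c-d-e with the cover Z = {a, b, c, d} from the 2-approximation.
path = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(vc_compress(path, Z={"a", "b", "c", "d"}, k=2))
```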
More problems
In addition to the problems discussed above, the following are some of the problems we studied or solved as part of the problem set. I briefly write about them here.
- Clique on $r$-regular graphs - The Bounded Search Tree technique gives a running time of $O(r^k * p(n))$. Similar to VC, at each level, branch in $r$ possible ways and build the clique along the path from root to leaf, keeping track of the budget.
- Cluster Vertex Deletion and Cluster Editing problems - Bounded Search Tree technique. We first need to prove that a graph is a cluster graph iff it has no $P_3$ (path on 3 vertices) as an induced subgraph. Once we have this in place, we need to eliminate the $P_3$s. A $P_3$ can be destroyed in 3 possible ways - by deleting any one of the 3 vertices on that $P_3$. So, at each level, the branching factor is 3 and we get an $O(3^k * p(n))$ running time (a sketch of the deletion variant follows this list). The same idea works for the editing problem, except that to make the graph a cluster graph, we either delete one of the two edges of an existing $P_3$ or add the third edge, making it a triangle. The branching factor is still 3, and hence the running time is the same as above.
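A sketch of the Cluster Vertex Deletion branching in Python (my own illustration; `find_p3` and the two-triangles instance are hypothetical helpers/examples I made up):

```python
def cluster_vertex_deletion(adj, k):
    """Branching for Cluster Vertex Deletion: while an induced P3 (u - v - w
    with u and w non-adjacent) exists, branch on deleting one of its 3 vertices."""
    p3 = find_p3(adj)
    if p3 is None:
        return set()        # already a cluster graph (disjoint union of cliques)
    if k == 0:
        return None         # a P3 remains but the budget is spent
    for x in p3:
        # Delete x and recurse with one unit less budget.
        smaller = {v: ns - {x} for v, ns in adj.items() if v != x}
        sub = cluster_vertex_deletion(smaller, k - 1)
        if sub is not None:
            return sub | {x}
    return None

def find_p3(adj):
    """Return some induced path u - v - w (u, w not adjacent), or None."""
    for v, ns in adj.items():
        for u in ns:
            for w in ns:
                if u != w and w not in adj[u]:
                    return (u, v, w)
    return None

# Two triangles {1,2,3} and {4,5,6} joined by the edge 3-4; deleting vertex 3
# (or 4) leaves a cluster graph, so budget k = 1 suffices.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(cluster_vertex_deletion(adj, 1))
```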
Foundations of Open World Computing
Prof. Srinath Srinivasa delivered 3 sessions on building foundations of Open World Computing. I'll very briefly write about the contents he covered.
We started with set theory - naive set theory followed by axiomatic approaches to set theory. We discussed Cantor's paradox and Russell's paradox. We revised the power and limitations of 3 computing models, namely Finite State Automata, Pushdown Automata and Turing Machines. We then learned about the hierarchy of their corresponding language classes - Regular, Context Free, Turing-decidable (Recursive) and Turing-recognizable (Recursively Enumerable). We also discussed the Entscheidungsproblem.
We also learned how the cardinality of a set can be studied by looking for a one-one mapping between the set and the set of natural numbers. This led us to defining enumerable, i.e. countably infinite, sets and to the fact that the set of even numbers, the set of integers and the set of rational numbers all have the same cardinality as the natural numbers and are countably infinite. However, the power set of a countably infinite set is not countable, which we saw using Cantor's diagonalization technique. A similar argument can be used to prove that the Halting Problem is undecidable. We thus learned that even infinity has a hierarchy - $\aleph_0, \aleph_1, \ldots$
With these foundations, we understood the concept of uncomputability, which plays a vital role in Open World Computing. In the end, we also had a somewhat philosophical discussion on the difference between man-made machines and nature-made machines, which was very enlightening. This closed the loop back to where we started: Open World Computing is about something like "we don't know what we don't know".