Monday, 25 June 2018

Glimpses of Summer School held at IIIT Bangalore in June 2018


Author : Girija Limaye

Introduction

Summer School on Theoretical Foundations of Computer Science was held at IIIT Bangalore from June 18th to 22nd, 2018. It covered 5 themes. In this blog post, I am writing about Approximation Algorithms, Parameterized Complexity and Foundations of Open World Computing. Apart from these, Cryptography and Program Analysis were also on the agenda of the school.

Disclaimer : If there are any mistakes in the technical details written below, the onus is on the author and not on the faculty of the school.

Note: I have not included the definitions of the problems since all of these are well-known NP-Hard problems. Most of the content here is based on the notes I took during the school. At some places, I have augmented them with web references.


Approximation Algorithms

Prof. V N Muralidhara delivered 3 sessions on Approximation Algorithms. For NP-Hard optimization problems, since the exact optimal solution cannot be computed efficiently (unless P = NP), one tries to compute an approximate solution efficiently, with provable guarantees on the quality of that solution.

Approximation factor and some definitions

Consider a minimization problem. If $OPT$ is the value of the optimal solution for the problem and $A$ is the value of the solution given by an approximation algorithm solving that problem, then we can compare $A$ with $OPT$ using the approximation factor or ratio. E.g. if $OPT = 100$ and $A = 120$ then $A$ is off by $20\%$. Here we can say that the approximation factor is $\frac{120}{100} = 1.2$. In general, one can have the following forms of approximation factor.

  • $c$ : constant factor approximation e.g. $A \leq 2 * OPT$
  • $c$ : constant percent approximation e.g. $A \leq 1.1 * OPT$ i.e. within 10% of $OPT$
  • $(1+\epsilon)$ : $\epsilon$ approximation e.g. $A \leq (1+\epsilon) * OPT, \epsilon > 0$
  • $f(n)$ : where $f(n)$ is some function of input size $n$ e.g. $A \leq \log{n} * OPT$, $A \leq \sqrt{n} * OPT$
  • $p(n)$ : where $p(n)$ is polynomial in $n$ e.g. $n^\epsilon, n$

Even though NP-Hard problems are reducible to each other, the approximation guarantees for one NP-Hard problem may not carry over to another NP-Hard problem. We will see this in the next sections. Some definitions follow.

  • PTAS - Polynomial Time Approximation Scheme - If, for every fixed $\epsilon > 0$, the approximation algorithm runs in time polynomial in $n$ (the exponent may depend on $\epsilon$, e.g. $O(n^{\frac{1}{\epsilon}})$) and gives a solution $A$ such that $A \leq (1+\epsilon) * OPT$, then the problem under consideration is said to have a PTAS.
  • FPTAS - Fully Polynomial Time Approximation Scheme - If approximation algorithm runs in time $O(p(n, \frac{1}{\epsilon}))$ where $(p(n, \frac{1}{\epsilon}))$ is a polynomial in $n$ and $\frac{1}{\epsilon}$ and gives a solution $A$ such that $A \leq (1+\epsilon) * OPT$, then it is said that the problem under consideration has a FPTAS.
Note that the running time is a function of the reciprocal of $\epsilon$. This is intuitive: since the problem is NP-Hard, the closer we want to get to $OPT$ (i.e. the smaller the $\epsilon$), the more we have to pay in terms of running time. That said, a running time that is a polynomial function of $\frac{1}{\epsilon}$ is better than one that is exponential in it, so the best is to have an FPTAS and, failing that, at least a PTAS for a problem.

This gives a handle on running time based on the accuracy we target. But, as we will see later, some problems do not even have PTAS. It means we can at most hope to get either a constant fraction / constant percentage or a function of $n$ as approximation factor.

We saw the following main results in the school. I will not be proving each and every claim below, but will give the overall design and how the claimed running time is achieved.

Vertex cover has a 2-approximation algorithm

Since we don't know $OPT$ and still want to guarantee the approximation factor relative to $OPT$, the first step is to lower bound $OPT$. For maximization problems, as we will see later in the Section on Knapsack, we instead need to upper bound $OPT$. For Vertex Cover, $OPT \geq$ the size of any maximal matching in the graph.

This can be shown using a bipartite graph constructed from two vertex sets - one set has a vertex for every matched edge of the maximal matching and the other set represents the vertices in any Vertex Cover of the graph. Since the edges in a maximal matching are pairwise disjoint by the definition of matching, to cover each of them we must pick at least one of its end-points into the cover. So, if $M$ is any maximal matching and $VC$ is any Vertex Cover, then the cardinality of $VC$ is at least the cardinality of $M$. So, $|VC| \geq |M|$ and hence $OPT \geq |M|$.

Now we have a lower bound. Let's try to design an algorithm and bound the solution it gives w.r.t. the lower bound on $OPT$. We follow a simple strategy:
  • Find a maximal matching on graph.
  • Pick both vertices of every matched edge as VC.

This gives $A = 2 * |M|$. Since, $|M| \leq OPT$, $A \leq 2 * OPT$. Thus, we have a 2-approximation algorithm.
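A minimal sketch of this matching-based algorithm, assuming the graph is given as a list of edges (the function name and the tiny example below are my own, not from the lecture):

def vertex_cover_2approx(edges):
    # Greedily build a maximal matching; both endpoints of every matched edge form the cover.
    cover = set()
    for u, v in edges:
        # The edge is still unmatched only if neither endpoint has been picked yet.
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover

# Example: path a-b-c-d. The optimum is {b, c} (size 2); the algorithm returns at most twice that.
print(vertex_cover_2approx([("a", "b"), ("b", "c"), ("c", "d")]))   # {'a', 'b', 'c', 'd'}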

The simplest strategy for Vertex Cover could have been to start with the maximum degree vertex, include it in $A$, then pick the next vertex with maximum degree among the remaining edges, and so on. This greedy strategy gives a $\log{n}$-approximation algorithm; refer to the Section on Set Cover for more details. Since $\log{n}$ grows with $n$, the 2-approximation algorithm above, which gives a constant factor approximation, is better than this one.

Analyzing an approximation algorithm

As we saw above, we did 2 main steps. For each step, we can ask the following questions to see if anything better can be achieved.
  • Lower bounding $OPT$ - Keeping the algorithm the same, can we find a better lower bound?
  • Analyzing the approximation guarantee - Can the same algorithm be analyzed in a better way to get a better guarantee?
  • If the answer to both is no, can we design a new, better algorithm from scratch?
It turns out that the 2-approximation algorithm for Vertex Cover is the best we can do with this approach. There exist examples which show that the lower bound as well as the analysis is tight. So, the question is - can we design a completely new algorithm with a different strategy altogether and get a better bound? Assuming the Unique Games Conjecture, the answer is no.

Set Cover has a $\log{n}$-approximation algorithm

In the Vertex Cover section, I said that the greedy strategy gives a $\log{n}$ approximation. Here, we can try applying the same greedy strategy - at each step, pick the set $f_i$ among the $m$ sets which covers the maximum number of elements of the Universe that are yet uncovered.

So, in the beginning, we will take the maximum cardinality set since all elements are uncovered. WLOG, call this set $f_1$. Let $c(e_j)$ be the cost charged to the $j^{th}$ element when it is covered for the first time. Then, for every element in $f_1$, $c(e) = \frac{1}{|f_1|}$. Since we don't know the maximum cardinality and hence the cost, we bound it. If $OPT$ is the size of the optimal solution, then $OPT$ sets suffice to cover all $n$ elements. Since $f_1$ has maximum cardinality, $c(e)|_{e \in f_1} \leq \frac{OPT}{n}$.

This is easy to see: $\frac{1}{|f_1|}$ is the least possible cost because $f_1$ is the largest cardinality set, and the average cost of all elements under $OPT$ is $\frac{OPT}{n}$. So $\frac{1}{|f_1|}$, i.e. the least possible cost of covering an uncovered element, cannot be larger than the average cost under $OPT$ (otherwise it would raise the average itself!).

The total cost of greedy algorithm is $\sum_{j=1}^{n}{c(e_j)}$. This can be shown to be at most $OPT * (\frac{1}{n} + \frac{1}{n-1} + \ldots + 1)$ i.e. $A \leq \log{n} * OPT$.

Note that here we did not explicitly bound $OPT$ but related it to $A$ using the notion of cost per element. $O(\log{n})$ is the best known approximation factor for Set Cover and there are examples showing that this analysis is tight. Also, there are results proving this inapproximability.
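A minimal sketch of the greedy rule described above (repeatedly pick the set covering the most still-uncovered elements); the names and the tiny instance are mine, purely for illustration:

def greedy_set_cover(universe, sets):
    # universe: a set of elements; sets: a list of subsets of the universe.
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the set that covers the largest number of still-uncovered elements.
        best = max(sets, key=lambda s: len(s & uncovered))
        if not (best & uncovered):
            break                      # the remaining elements cannot be covered at all
        chosen.append(best)
        uncovered -= best
    return chosen

universe = {1, 2, 3, 4, 5}
sets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
print(greedy_set_cover(universe, sets))    # [{1, 2, 3}, {4, 5}]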

Hitting Set and relationship with Vertex Cover

We also studied a dual problem of Set Cover in which we are asked to find a hitting set, which is the smallest subset of the $n$ elements of the Universe such that its intersection with every one of the $m$ sets is non-empty. In other words, we need to select the smallest number of elements such that they hit all the $m$ sets. Thus, this is exactly the dual of the Set Cover problem, and this equivalence can be shown by visualizing the problems using bipartite graphs. Since they are duals, Hitting Set can also be approximated within a logarithmic factor.

If, in the Hitting Set problem, each of the $m$ given sets has cardinality 2, then we can easily see that Hitting Set becomes equivalent to Vertex Cover: we can represent the elements by vertices of a graph and the sets by edges between vertices. So, Vertex Cover is a special case of the Hitting Set problem. From the Section on Vertex Cover, we know that this special case has a 2-approximation, whereas the general case, i.e. Hitting Set itself, has a logarithmic-factor approximation algorithm.

An attempt towards approximating TSP

We studied TSP in its general form as well as metric TSP (i.e. distances obeying the triangle inequality). A minimum spanning tree (MST) touches every vertex in the graph, a property it shares with a TSP tour, and like the optimal tour it is a minimum-cost structure. The only difference is that TSP asks for a tour, i.e. a cycle. But we can lower bound $OPT$ for TSP by $c(MST)$, where $c(MST)$ is the cost or weight of the MST.

This bound is easy to see. Since, if $OPT < c(MST)$ then we can take the $OPT$ for TSP, delete the maximum weight edge and get a spanning tree with cost less than that of MST - a contradiction.

We can easily augment the MST by doubling its edges (the well-known algorithm, which gives us a closed walk), taking an Euler tour of the doubled tree and then short-cutting the tour to eliminate multiple occurrences of a vertex. Using the triangle inequality while short-cutting, we can say that the cost can only decrease during the short-cutting phase, if at all. So, $A \leq 2 * c(MST) \leq 2 * OPT$. There are, however, examples in which short-cutting keeps the cost the same and hence the bound is tight.
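A minimal sketch of this double-tree heuristic, assuming the networkx library is available for the MST; the DFS preorder of the tree plays the role of the short-cut Euler tour (the instance is made up for illustration):

import networkx as nx

def metric_tsp_2approx(G, start):
    # MST -> (implicit) doubled Euler tour -> short-cut by visiting vertices in DFS preorder.
    T = nx.minimum_spanning_tree(G, weight="weight")
    order = list(nx.dfs_preorder_nodes(T, source=start))
    tour = order + [start]                      # close the cycle
    cost = sum(G[u][v]["weight"] for u, v in zip(tour, tour[1:]))
    return tour, cost

G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 1), ("a", "c", 2), ("a", "d", 2),
                           ("b", "c", 1), ("b", "d", 2), ("c", "d", 1)])
print(metric_tsp_2approx(G, "a"))   # a tour of cost at most 2 * c(MST)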

We can extend the above idea further to get a 1.5-approximation factor, using the idea of a minimum-weight perfect matching on the odd-degree vertices of the MST and the fact that the number of odd-degree vertices in any graph is even. This algorithm is due to Christofides.

For general TSP, we showed that no polynomial-time $\alpha(n)$-approximation algorithm exists (unless P = NP). This can be seen by reducing the Hamiltonian Cycle (HC) problem to TSP as follows.

For the given graph, consider a complete graph on the same set of vertices. For the edges which are present in the original graph, assign distance 1, and for the other edges assign distance $n * \alpha(n)$. Now, assume that a polynomial-time $\alpha(n)$-approximation algorithm for TSP exists. That means we can compute an approximate solution which is at most $\alpha(n)$ times $OPT$. We can argue that if such an algorithm exists, then we can solve HC using that algorithm and the above reduction as follows.

If the original instance indeed has a HC, then those edges have distance 1 and hence the $OPT$ of the corresponding TSP instance is $n$. If a HC is not present in the original instance, then any tour must use at least one edge of weight $n * \alpha(n)$, so the $OPT$ of the TSP instance is more than $n * \alpha(n)$. Since our claimed algorithm produces a solution which is at most $\alpha(n)$ times $OPT$, it will produce a solution with cost at most $n * \alpha(n)$ when a HC exists in the original instance and a solution with cost more than $n * \alpha(n)$ when a HC is not present. So, based on the solution cost, i.e. $A$, we can decide in polynomial time whether the original instance has a HC. Since HC is NP-complete, such an algorithm cannot exist unless P = NP.
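A small sketch of the instance built by this reduction (alpha is passed in as a function, and the representation is my own):

from itertools import combinations

def hc_to_tsp_instance(n, edges, alpha):
    # Vertices are 0..n-1; edges are those of the HC instance.
    edge_set = {frozenset(e) for e in edges}
    dist = {}
    for u, v in combinations(range(n), 2):
        # Original edges get distance 1, every other pair gets the "expensive" distance n * alpha(n).
        dist[(u, v)] = 1 if frozenset((u, v)) in edge_set else n * alpha(n)
    return dist

# If a claimed alpha(n)-approximate TSP algorithm returned a tour of cost <= n * alpha(n)
# on this instance, we could conclude that the original graph has a Hamiltonian Cycle.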

FPTAS for 0-1 Knapsack problem

We know that fractional Knapsack is polynomial-time solvable but 0-1 Knapsack is not. We went through the standard dynamic programming based algorithm for the 0-1 Knapsack problem, which is pseudo-polynomial in the profit values, i.e. $O(n^2 * P)$ if $P$ is the maximum profit. We can then remove the dependency on $P$ by introducing a suitable scaling factor $k$ which satisfies $P * \epsilon = k * n$, i.e. $k = \frac{\epsilon P}{n}$. We then scale down every profit value by $k$ (taking the floor) and solve the scaled instance using the DP based algorithm. The optimal solution to the scaled-down version is a feasible solution to the original instance, since we have not touched the weights of the items. We then proved that the profit of the optimal solution to the scaled-down version is at least $(1-\epsilon)$ times the $OPT$ of the original 0-1 Knapsack instance. So, there exists an FPTAS for 0-1 Knapsack.
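A minimal sketch of this FPTAS, with the DP done over (scaled) profit values; the instance at the bottom is made up for illustration:

import math

def knapsack_fptas(profits, weights, W, eps):
    # Returns a profit value that is at least (1 - eps) * OPT for the 0-1 Knapsack instance.
    n = len(profits)
    P = max(profits)
    k = eps * P / n                                  # scaling factor chosen so that P * eps = k * n
    scaled = [math.floor(p / k) for p in profits]

    # dp[q] = minimum weight needed to reach scaled profit exactly q.
    max_q = sum(scaled)
    INF = float("inf")
    dp = [0] + [INF] * max_q
    for i in range(n):
        for q in range(max_q, scaled[i] - 1, -1):    # classic 0-1 DP, iterating profits downwards
            if dp[q - scaled[i]] + weights[i] < dp[q]:
                dp[q] = dp[q - scaled[i]] + weights[i]

    best_q = max(q for q in range(max_q + 1) if dp[q] <= W)
    return best_q * k                                # a lower bound on the profit of the returned set

print(knapsack_fptas(profits=[60, 100, 120], weights=[10, 20, 30], W=50, eps=0.1))   # 220.0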


Parameterized Complexity

Prof. Pradeesha Ashok delivered 3 sessions on Parameterized Complexity, which is a comparatively recent topic of research and tries to address the hardness of problems by taking a different approach than approximation algorithms. Here the main idea is to parameterize the given problem. The parameter could be the output size, the maximum degree in the graph, etc. Consider the Clique problem: the desired clique size is the parameter for its parameterized version. Our goal now is to get an exact solution in running time $O(p(n) * f(k))$ where $k$ is the parameter and $p$ is a polynomial. Note that this looks similar to approximation algorithms but it isn't. The main difference is that we are still targeting the exact solution, not an approximate one, and the parameter $k$ has a completely different role than the $\epsilon$ in a PTAS or FPTAS.

Techniques

Following are the three main techniques we learned for designing algorithms for parameterized version of NP-Hard problems. Here, each is illustrated with one example problem.

Branching technique (Bounded Search Trees)

Here we take Vertex Cover problem. Parameter is size of Vertex Cover, say $k$. Our goal is to find exact vertex cover of size at most $k$ in running time polynomial in $n$ (size of graph) and some function of $k$.

Approach 1 Initially, VC is $\phi$ and budget is given as $k$.
  • Pick any edge $(u, v)$. To cover this edge, either $u$ or $v$ must be in VC.
  • Branch out on either $u$ or $v$ indicating that vertex is picked.
  • So, at level 2, we have VC as $\{u\}$ in left branch and as $\{v\}$ in right branch.
  • As, we have included 1 vertex in VC, reduce budget $k$ by 1.
  • Remove edges incident on $u$ or $v$ in respective branches (as they are covered now) to get new sub-graphs.
  • Repeat the process till budget lasts. Hence the name - bounded tree.

In the end, if there exists no branch in which all edges are covered, it means that a VC within our budget $k$ does not exist. Otherwise, we have a branch that leads to an empty graph, along which we have computed a VC. Output this VC.

It is easy to see that such a bounded search tree has 2 branches at each level and there are at most $k$ levels, thus the size complexity of this tree is $O(2^k)$. At each step, we do work which is polynomial in $n$, so the running time is $O(p(n) * 2^k)$.
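A minimal sketch of this edge-branching search, assuming the graph is a list of edges (names and the example are mine):

def vc_branching(edges, k):
    # Returns True iff the graph given by 'edges' has a vertex cover of size at most k.
    if not edges:
        return True                      # every edge is covered
    if k == 0:
        return False                     # edges remain but the budget is exhausted
    u, v = edges[0]
    # Branch: either u or v goes into the cover; drop the edges that choice covers and recurse.
    rest_u = [(a, b) for a, b in edges if u not in (a, b)]
    rest_v = [(a, b) for a, b in edges if v not in (a, b)]
    return vc_branching(rest_u, k - 1) or vc_branching(rest_v, k - 1)

edges = [("a", "b"), ("b", "c"), ("c", "d")]      # the path a-b-c-d
print(vc_branching(edges, 2), vc_branching(edges, 1))   # True False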

Improved running time

Instead of picking an edge, we can pick a vertex and decide whether it is part of the VC or not. So the two branches correspond to - $v$ is in the VC, or (if not, then it must be the case that) all neighbours of $v$ are in the VC. In the left branch, we used up 1 from the budget $k$ and in the right branch we used up $d(v)$ from the budget, where $d(v)$ is the degree of vertex $v$. If no vertex has degree more than 2, the graph is a union of paths and cycles and the problem is polynomial-time solvable, so we assume that $\exists v, d(v) \geq 3$. In this case the budget drops by 1 in one branch and by at least 3 in the other. So the size of such a branching tree satisfies the recurrence

$T(k) \leq T(k-1) + T(k-3)$

which is $O(1.47^k)$.

So, the running time is improved to $O(p(n) * 1.47^k)$.

Kernelization

We take the Edge Clique Cover problem as the illustration here. We want to decide exactly whether the edges of the given graph can be covered by at most $k$ cliques.

The idea behind kernelization is to convert an instance $(I, k)$ into a smaller instance $(I', k')$ such that the size of $I'$ is bounded by a function of the parameter $k$ instead of the input size $n$. For that, we do some preprocessing of the instance. After this, the kernel of the instance will have certain properties, as we will see below.

Preprocessing the instance to get the kernel We perform the following reductions as long as some rule is applicable (a sketch of these rules appears after the kernel-size proof below).
  • Remove an isolated vertex if present. This makes sense since an isolated vertex has no edges to cover in the edge clique cover.
  • Remove an isolated edge $(u, v)$ if present, and decrease the budget $k$ by 1. An edge $(u, v)$ is isolated if $u$ and $v$ share no neighbour. This makes sense since an isolated edge can be covered by itself as a 2-clique.
  • If there are two vertices $u$ and $v$ such that $N(u) = N(v)$, where $N(.)$ denotes the neighbourhood, then remove either $u$ or $v$. This makes sense because, since $v$ has the same neighbours as $u$, the cliques used for covering the edges incident on $u$ can also be used (with $v$ in place of $u$) to cover the edges incident on $v$.
After applying the above reduction rules repeatedly until none applies, we get the kernel.

Claim The kernel size (number of vertices) is $O(2^k)$ if the graph has an edge clique cover of size at most $k$.

Proof Suppose the graph has an edge clique cover of size at most $k$ and let the cliques be
$C_1, C_2, \ldots, C_k$. Pick a vertex $v$ in the kernel. Set $b_i = 1$ if $v$ is part of $C_i$. So, $b_1, b_2, \ldots, b_k$ gives a binary map of vertex $v$. Since there are at most $k$ cliques, at most $2^k$ binary maps are possible. Assume on the contrary that the kernel has more than $2^k$ vertices left. Then, by the Pigeonhole principle, there are at least two vertices which have the same binary map. There are two cases -
  • If both binary maps are all-zero, it means both the corresponding vertices are not part of any clique. Since at most $k$ cliques suffice to edge-cover the graph, it means that both these vertices have no edges incident on them, i.e. they are isolated. This contradicts reduction rule 1, which we applied exhaustively.
  • Else, it means that both vertices are part of the same set of cliques. This implies that both share the same neighbourhood. This again contradicts reduction rule 3 above, which we applied exhaustively.
So, there cannot exist two such vertices, and hence the number of vertices in the kernel is at most $2^k$. After this, we can do an exhaustive search on the kernel instance, whose size is $O(2^k)$. Since the reduction rules take time polynomial in $n$, the overall running time is $O(p(n) * f(k))$, which is exponential in $k$ but polynomial in $n$.
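A minimal sketch of the three reduction rules exactly as stated above (the open-neighbourhood test from the notes is kept as is); the function name and representation are mine:

def ecc_kernelize(adj, k):
    # adj: dict mapping each vertex to the set of its neighbours (undirected, no self-loops).
    changed = True
    while changed:
        changed = False
        # Rule 1: drop isolated vertices.
        for v in [v for v in adj if not adj[v]]:
            del adj[v]
            changed = True
        # Rule 2: an edge (u, v) whose endpoints share no neighbour is covered by itself as a 2-clique.
        for u in list(adj):
            for v in list(adj.get(u, ())):
                if u in adj and v in adj and v in adj[u] and not (adj[u] & adj[v]):
                    adj[u].discard(v)
                    adj[v].discard(u)
                    k -= 1
                    changed = True
        # Rule 3: if two vertices have identical (non-empty) neighbourhoods, keep only one of them.
        verts = list(adj)
        for i, u in enumerate(verts):
            for v in verts[i + 1:]:
                if u in adj and v in adj and adj[u] and adj[u] == adj[v]:
                    for w in adj[v]:
                        adj[w].discard(v)
                    del adj[v]
                    changed = True
    return adj, k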

Iterative Compression

Let's assume we have a budget of $k$ and we already have a solution $Z$ of size slightly larger than the budget $k$, like $k+1$, $2k$, etc. The idea of iterative compression is to compress the solution $Z$ to get a feasible solution within the budget $k$.

For Vertex Cover problem, here is the iterative compression algorithm
  • Run the 2-approximation algorithm. Let the solution $Z$ be of size $z$.
  • If $z > 2k$, declare no solution under budget $k$ exists and exit.
  • Let $X$ be the unknown solution within budget $k$.
  • Imagine $X_Z = X \cap Z$. Since $X$ is unknown, we need to consider all possible subsets of $Z$ which could be possible $X_Z$s.
  • We want to try extending the partial solution $X_Z$, assuming our guess is correct. If so, then every edge incident on a vertex in $X_Z$ is covered, and every edge incident on a vertex in $Z \setminus X_Z$ has to be covered by a vertex outside $Z$. So, for every edge $(u, v)$ with $u \in Z \setminus X_Z$ and $v \notin Z$, pick $v$.
  • Adjust budget according to how many vertices we pick.
  • If we are still within $k$, we got the solution.
  • If not, repeat for next imagined intersection of $X$ and $Z$.

For each imagined intersection, $p(n)$ time is spent in extending the partial solution to a VC. There are at most $2^{|Z|}$ subsets to consider as intersections. So, the running time for $|Z| = 2k$ is $O(4^k * p(n))$.

This can be improved to $O(n * p(n) * 2^k)$ by starting with a solution of size $k+1$ instead of $2k$: this is where the 'iterative' part of the name comes from, as the solution is built up step by step and recompressed each time its size exceeds $k$.
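A minimal sketch of the compression step described above: guess X_Z = X ∩ Z, after which the extension outside Z is forced (the function name and the small example are mine):

from itertools import combinations

def compress_vertex_cover(edges, Z, k):
    # Z is a known vertex cover (possibly larger than k); returns a cover of size <= k, or None.
    Z = set(Z)
    for r in range(len(Z) + 1):
        for guess in combinations(sorted(Z), r):
            X_Z = set(guess)
            rest = Z - X_Z                   # vertices of Z assumed NOT to be in the solution
            forced, ok = set(), True
            for u, v in edges:
                if u in X_Z or v in X_Z:
                    continue                 # already covered by the guessed part of the solution
                if u in rest and v in rest:
                    ok = False               # this edge cannot be covered under the current guess
                    break
                # exactly one endpoint lies in 'rest' (Z is a vertex cover), so take the other one
                forced.add(v if u in rest else u)
            if ok and len(X_Z) + len(forced) <= k:
                return X_Z | forced
    return None

edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(compress_vertex_cover(edges, Z={"a", "b", "c", "d"}, k=2))   # e.g. {'a', 'c'}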

More problems

In addition to the problems discussed above, following are some of the problems we studied or solved as part of problem set. Here, I briefly write about them.
  • Clique on $r$-regular graphs - The Bounded Search Tree technique gives a running time of $O(r^k * p(n))$. Similar to VC, at each level we branch in at most $r$ possible ways and build the clique along the path from root to leaf, taking care of the budget.
  • Cluster Vertex Deletion and Editing problems - Bounded Search Tree technique. We first prove that a graph is a cluster graph iff there is no $P_3$ (path on 3 vertices) as an induced subgraph. Once we have this in place, we need to eliminate all induced $P_3$s. For deletion, a $P_3$ can be destroyed in 3 possible ways - by deleting any one of its 3 vertices. So, at each level the branching factor is 3 and we get $O(3^k)$ running time. A similar idea works for the editing problem, except that to destroy an existing induced $P_3$ we either delete one of its two edges or add the missing third edge, turning it into a triangle. The branching factor is still 3 and hence the running time is the same as above.

Foundations of Open World Computing

Prof. Srinath Srinivasa delivered 3 sessions on building foundations of Open World Computing. I'll very briefly write about the contents he covered.

We started with set theory - naive set theory followed by axiomatic approaches to set theory. We discussed Cantor's paradox and Russell's paradox. We revised the power and limitations of 3 computing models, namely Finite State Automata, Push Down Automata and Turing Machines. We then learned about the hierarchy of their corresponding languages - Regular, Context Free, Turing decidable (Recursive) and Turing recognizable (Recursively enumerable). We also discussed the Entscheidungsproblem.

We also learned how countability of a set can be defined via a one-to-one correspondence between the set and the set of natural numbers. This led us to defining enumerable, i.e. countably infinite, sets and to the fact that the set of even numbers, the set of integers and the set of rational numbers all have the same cardinality as the natural numbers and are countably infinite. However, the power set of a countably infinite set is not countable, which we saw using Cantor's diagonalization technique. A similar argument can be used to prove that the Halting Problem is undecidable. We thus studied that even infinity has a hierarchy - $\aleph_0, \aleph_1, \ldots$

With these foundations, we understood the concept of uncomputability. This uncomputability plays a vital role in Open World Computing. In the end, we also had somewhat philosophical discussion on the difference between man-made machines and nature-made machines which was very much enlightening. This closed the loop of where we started to discuss Open World Computing being something like "We don't know what we don't know".

Friday, 22 June 2018

Motivating Open World Computing

Mathematicians have been fascinated by the study of collections of items, leading to the development and evolution of Set Theory. Attempts have been made to represent the entirety of Math using sets. Indeed, is this even possible? The Entscheidungsproblem asks if an algorithm can decide whether a given statement is provable from the axioms using the rules of logic.

It seems possible. Consider the null set {}. This represents zero. Now consider a set containing zero {{}}. This represents one. The set for two contains zero and one {{}, {{}}}. And so on. So we can represent numbers.
Now consider a general mathematical problem. It can be viewed as a function that maps a set of inputs to a set of outputs. One way to represent this is as a set of ordered pairs. A more powerful way is of the form { x | p(x) }, where p(x) is the predicate. Any element x that satisfies the predicate is a member of the set.
So far, so good. But can we really use sets for everything?
Here are some questions that make us squirm in discomfort (Take some time to reflect on each of these) -
i) Can a set contain itself?
ii) Does the set containing all the sets in the universe also contain its power set? (Cantor's paradox)
iii) Consider all the sets that do not contain themselves. Does the set containing all these sets contain itself? (Russell's paradox)

Here's something related, but different. A model is something that can represent a problem. It is specified using a "language", which is a collection of strings that use an "alphabet".
A language is regular if it can be recognized by a finite state machine. This can express an infinite number of problems, but not all.
A context free grammar is a set of rules that describe possible strings. This can express all the problems that a regular language can, and more. But again, not all.
A Turing machine is a model that defines a hypothetical machine that manipulates symbols on a strip of tape. A function maps the current state and symbol to a new state, symbol and direction to move. A subset of its states are halting states, that specify whether the input is accepted or rejected.

As expected, Turing machines are much more expressive (Chomsky's hierarchy), and in fact the theoretical basis of modern computing systems. A Universal Turing Machine can read another Turing machine and emulate its behavior. In other words, it executes a stored program.
But are we done yet? Can we express every problem? Not really.

Returning to Set theory, let's compare sizes of sets. A set is said to be countably infinite if every element in the set can be mapped to from the natural numbers. Elements are "indexed". Let's represent a subset of such a set by a vector (also countably infinite) of 1s and 0s, where a 1 specifies that the element at that index is present in the subset.
Here's an experiment
Arrange all the vectors as a matrix, in any arbitrary manner. Consider another vector formed by taking the diagonal items in this matrix, and inverting individual bits. Does this newly formed vector exist in our matrix? A little reasoning tells us that it cannot be (Cantor's diagonal argument). The takeaway is that you may have two infinite sets, and yet they may have different cardinalities. Looks like "some infinities are greater than others". Funny as it sounds, we have a hierarchy of infinities, denoted by cardinal numbers. The smallest cardinal number is called Aleph null.
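A tiny finite illustration of the diagonal construction (the real argument is about infinite 0/1 vectors; this only shows the mechanics, and the matrix is made up):

# Each row of the "matrix" is one 0/1 vector.
matrix = [
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
]
# Flip the diagonal: the new vector differs from row i in position i, so it can equal no row.
diagonal_flip = [1 - matrix[i][i] for i in range(len(matrix))]
print(diagonal_flip)            # [1, 0, 1, 0]
print(diagonal_flip in matrix)  # False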

So how many problems can Turing machines express? The answer is Aleph null: there are only countably many Turing machines, while the set of all problems is strictly larger. The best known mathematical models, our most advanced computing systems, are all at the lowest level of cardinal numbers. The possibilities are limitless (I was careful not to say infinite).

When faced with the task of "finding" something that we "don't know", only one thing is clear - "We don't know what we don't know". This is enough to motivate us to study the field of Open World Computing. 

Thursday, 21 June 2018

IIITB's Summer School - More like 5 days of Computer Science Enlightenment !!

Coming to the last day of the Summer School, I can safely say it was quite an enlightening and enriching experience for me. The last 5 days have been quite remarkable, and ones that will be embedded in my memories for the rest of my life. Being an undergrad student, I was quite apprehensive at first regarding the feasibility of the course for me. Even then, I decided to give this School a try - partly because I had nothing better to do and had never come to Bangalore before, and partly because the food and lodging were being provided by the Institute free of cost!!!  I thought of this as nothing more than an excuse to break away from my day-to-day life and didn't have any expectations from the whole course. How wrong I was!!

The moment I stepped into the IIITB campus, I was quite taken aback by the greenery all around. The sight was quite in contrast to the concrete jungle of huge IT buildings surrounding the campus. I was also quite impressed by the hospitality provided by the Institute for the whole duration of my stay. Special kudos to the Institute's mess food!!

Having a background in algorithms, I was looking forward to the talks by Prof V N Muralidhara on Approximation Algorithms more than any of the other talks. The whole series of lectures given by him was quite an experience in itself. I was simply enamoured by his ability to teach highly complex proofs of approximation algorithms in such a structured and simple manner. Within 3 lectures, he covered almost all of the traditional introductory approximation algorithms without anything feeling rushed. He has surely rejuvenated my interest in the field of NP-hard problems and their approximate solutions.

Prof. Srinivas Vivek talked about Cryptography and its applications and implications in the IT world. The most interesting and unique aspect of his talks was his spin on the traditional way of teaching this topic. It is to his credit that he turned a topic which has its roots deep in traditional mathematics, and could easily have seemed boring to a student like me with little-to-no cryptographic or mathematical background, into something quite appealing and engaging. The way he explained these theoretical concepts with an array of interesting examples and animations helped me grasp the core of Cryptography, and now I am greatly motivated and interested to know more about it.

Prof. Meenakshi D'Souza's talks on Formal Verification simply gave me a new-found respect and understanding for the topic in general. The way she structured her lectures - starting with the history and importance of this particular field, continuing with some introductory simple models and then culminating with some real-world practical models and examples - really helped me understand the relevance and need of such verification algorithms and models in modern day computing.

Prof. Pradeesha Ashok talked about Parameterised Algorithms. This opened a whole new world for me and introduced me to a completely different perspective on algorithmic complexity, one which is quite different from the classical notion and yet quite practical. Her lectures have now motivated me to look at algorithmic solutions to problems in a whole new way.

The series of lectures given by Prof. Srinath Srinivasa was truly an experience in itself. He adopted quite an unorthodox approach to teaching. For the first 2 of his 3 lectures, most of us were pretty clueless regarding the direction he was taking. In spite of that, his style of teaching, which included some really cool quotes, anecdotes and real-life examples, kept us engaged throughout the first 2 lectures. We were learning something new each day, but the overall picture eluded my naive mind. Then, on the 3rd day, he masterfully wove the teachings of the first 2 days into a coherent, structured narrative and we all saw the bigger picture. This goes to show his great expertise in this field. Before his lectures, I didn't have any clue regarding "Open-World Computing", but now I am genuinely excited to know more about it. We truly "don't know what we don't know"!!

Lastly, I had the great privilege of attending lectures by some of the "Theoretical Computer Science Gurus" who were invited by the Institute. Dr. Venkatesan Chakravarthy from IBM Research spoke about approximation in decision trees, which has some really cool applications in the fields of High Performance Computing and Artificial Intelligence. Prof. K.V Raghavan from IISc Bangalore continued from where Prof. Meenakshi had left off on Formal Verification and beautifully illustrated the concept of Abstract Interpretation. Finally, Dr. Satyalokam from Microsoft Research, Bangalore gave a talk on Cryptography and Blockchains which helped us understand this new-age technology in a comprehensive yet simple manner.

This learning experience has truly been a privilege. More than anything, it opened my mind to a realm of possibilities not inhibited by the limitations of an undergrad curriculum. Also, it helped all of us to see things from different perspectives. The school motivated us towards the very basis of Computer Science, i.e. its foundations and the theories driving it - which we, as naive IT-job seekers, tend to overlook. Personally, it helped me mature as a would-be engineer and made me realise that the real fun of computer science lies not only in the applications but also in the theories behind them. This was truly an enlightening experience for me.

Abhivandit
(3rd year, C.S.E, IIITG)





The man of technology has learned to believe in justification, not by faith, but by verification!!!

Hi all...!! This is Farzana here and I hope you guys are enjoying the Summer School..😃 So well, here is a glimpse from Program Analysis and Verification (my all time favourite😍) lectured by Prof. K V Raghavan. He visited our college a couple of months ago for a 2-day lecture on the same topic. Today, time didn't allow for the complete lecture, so with this post I will be adding some more elements that he taught us. So here we go.....👇

What is Program Verification?

Well...It is the algorithmic discovery of properties of a program by inspection of the source text. It is also known as static analysis, static program analysis, formal methods...

A verification approach: Abstract Interpretation!!

So here is the exact definition of abstract interpretation - a kind of program execution in which variables store abstract values from bounded domains, not concrete values. The program is executed on all abstract inputs and we observe the program properties from these runs.

Let's try an example!
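The worked examples from the lecture are not reproduced here, so below is a minimal sketch of the idea in a very small setting of my own: abstractly executing the program z = x*x + y*y over the domain of signs ('+', '-', '0', with 'T' meaning unknown) instead of over concrete numbers:

def abs_mul(a, b):
    if a == '0' or b == '0':
        return '0'
    if a == 'T' or b == 'T':
        return 'T'
    return '+' if a == b else '-'

def abs_add(a, b):
    if a == '0':
        return b
    if b == '0':
        return a
    if a == b:
        return a
    return 'T'              # '+' plus '-' (or anything involving T) could have any sign

# "Run" the program on every abstract input instead of on concrete values.
for x in ['+', '-', '0']:
    for y in ['+', '-', '0']:
        z = abs_add(abs_mul(x, x), abs_mul(y, y))
        print(x, y, '->', z)
# Every run yields '+' or '0', so the analysis proves z >= 0 without ever running a concrete input.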

So, that's all about abstract interpretation and hope you guys got a brief idea about it..😇 Now let's move on to the next topic.

Pointer Analysis!!!!

Let's begin with some definitions...

Points-to Analysis: Determine the set of possible values of a pointer-variable
Alias Analysis: Determine if two pointer-variables may point to the same location

What does it really mean when I say "p points to x"?????😕Well, it means “p stores the value &x” or “*p denotes the location x”. So let's try working out an example!

Example:
x = &a;                                    
y = &b;
if (?) {
p = &x;
} else {
p = &y;
}
*x = &c;   ---> x points to a, therefore a points to c
*p = &c;   ---> p points to x and y, therefore x points to a & c and y points to b & c

Solution (points-to sets after each statement):
after the if-else:   x->a       y->b       p->x,y     a->null
after *x = &c:       x->a       y->b       p->x,y     a->c
after *p = &c:       x->a,c     y->b,c     p->x,y     a->c
The above example will let us know about 2 related terms called:
Strong Update: a->c
Weak Update :  x->a,c  and  y->b,c

(Try finding out why it is called so. Answers are welcomed on comment box😊)

Let's plow into some more analysis!! happy?????😃

Andersen's Analysis!!!

Let's have a look into this example:

1. x = &a;
2. *x = &w;
3. y = x;
4. x = &b;
5. *x = &t;
6. z = x;

So, how do you find a solution for this? Here is how the solution is built👇
In Andersen's analysis, initially all the variables point to null. The table from the lecture (not reproduced here) lists what each variable points to after each statement. For example, at statement 1, x points to a (and null) while all the other variables still point to null. At statement 2, *x = &w means that the variables x points to now point to w; since x points to a, a therefore points to w.
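Since the lecture's step-by-step table is not reproduced, here is a minimal, flow-insensitive sketch of Andersen-style constraint solving for the six statements above (my own illustration; it computes the final merged points-to sets rather than the per-statement table):

from collections import defaultdict

# The six statements, encoded as (kind, lhs, rhs):  addr: p = &q,  copy: p = q,  store: *p = &q
stmts = [("addr", "x", "a"), ("store", "x", "w"), ("copy", "y", "x"),
         ("addr", "x", "b"), ("store", "x", "t"), ("copy", "z", "x")]

pts = defaultdict(set)
changed = True
while changed:                          # iterate the subset constraints to a fixed point
    changed = False
    for kind, p, q in stmts:
        before = {v: set(s) for v, s in pts.items()}
        if kind == "addr":              # p = &q   :  q is in pts(p)
            pts[p].add(q)
        elif kind == "copy":            # p = q    :  pts(q) is a subset of pts(p)
            pts[p] |= pts[q]
        elif kind == "store":           # *p = &q  :  q is in pts(r) for every r in pts(p)
            for r in list(pts[p]):
                pts[r].add(q)
        if any(pts[v] != before.get(v, set()) for v in pts):
            changed = True

print({v: sorted(s) for v, s in pts.items()})
# Merging all statements gives, e.g.: x -> {a, b}, y -> {a, b}, z -> {a, b}, a -> {t, w}, b -> {t, w}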

Steensgaard's Algorithm!!!

We will try finding a solution for the same example as above!
1. x = &a;
2. *x = &w;
3. y = x;
4. x = &b;
5. *x = &t;
6. z = x;

Solution..

The above solution is based on the following rules👇 (a small sketch implementing these rules appears after the list):
1. p = q
   Combine p's box and q's box.
2. p = &q
   Make p's box point to q's box. Whenever a box b points to 2 boxes b1 and b2, combine b1 and b2.
3. p = *q
   Combine p's box with the box pointed to by q. Whenever a box b points to 2 boxes b1 and b2, combine b1 and b2.
4. *p = q
   Combine the box pointed to by p with q's box.
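A minimal union-find sketch that implements these four rules exactly as stated (a statement like *x = &w can first be normalized as tmp = &w; *x = tmp); the class and function names are mine:

class Box:
    # A union-find node; each merged class keeps at most one outgoing points_to box.
    def __init__(self, name):
        self.name, self.parent, self.points_to = name, self, None

def find(b):
    while b.parent is not b:
        b.parent = b.parent.parent
        b = b.parent
    return b

def combine(a, b):
    a, b = find(a), find(b)
    if a is b:
        return a
    b.parent = a
    # "Whenever a box points to 2 boxes, combine them":
    if a.points_to and b.points_to:
        a.points_to = combine(a.points_to, b.points_to)
    else:
        a.points_to = a.points_to or b.points_to
    return a

def pointee(p):
    p = find(p)
    if p.points_to is None:
        p.points_to = Box("*" + p.name)      # fresh box, so the pointee always exists
    return find(p.points_to)

boxes = {v: Box(v) for v in ["x", "y", "z", "a", "b", "w", "t", "tmp"]}
def rule1(p, q): combine(boxes[p], boxes[q])               # p = q   : combine p's box and q's box
def rule2(p, q): combine(pointee(boxes[p]), boxes[q])      # p = &q  : make p's box point to q's box
def rule3(p, q): combine(boxes[p], pointee(boxes[q]))      # p = *q  : combine p's box with q's pointee
def rule4(p, q): combine(pointee(boxes[p]), boxes[q])      # *p = q  : combine p's pointee with q's box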

Applications!!!💪
  • Compiler Optimizations
  • Verification and Bug finding
That's all about PAV...:)



Wishing you all great health and happiness!
God Bless!
With Regards,

Farzana Ishak
M.Tech Candidate
Government Engineering College Idukki, Kerala.

Glimpses from Cryptography: Cryptographic Hash functions


A hash function is a one-way mapping, typically from an infinite set to a finite set. Hash functions are deterministic functions. Not all hash functions are suitable for cryptographic applications; hence we have a class of functions called Cryptographic Hash functions which possess certain properties that make them suitable for cryptographic applications. For explaining the properties, consider an infinite set A such that a is a member of A. The elements of set A are mapped to a finite set B. Let the mapping be represented by the function h(). By the Pigeonhole principle, it is clear that more than one element of A must map to the same element of B. Even so, we aim to design a computationally secure hash function with the following properties (illustrated in the small sketch after the list).
  • Collision resistance: It is computationally infeasible to find two distinct elements having the same hash value. That is, to find a1 ≠ a2 such that h(a1) = h(a2)
  • Pre-image Resistance: Given a hash value, it is computationally infeasible to find a corresponding element of set A. Let b be an element of B; then it is infeasible to find a such that b = h(a)
  • Second-Preimage resistance: Given a pair (a1, b) such that h(a1) = b, it is computationally infeasible to find a2 ≠ a1 such that h(a2) = b
  • Pseudorandomness: For a single bit change in the input, the hash value should change significantly
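A tiny demonstration of determinism and pseudorandomness using SHA-256 from Python's hashlib (the messages are made up for illustration):

import hashlib

m1 = b"transfer 100 rupees to Alice"
m2 = b"transfer 100 rupees to Alicf"     # the input differs in a single character

h1 = hashlib.sha256(m1).hexdigest()
h2 = hashlib.sha256(m2).hexdigest()
print(h1)
print(h2)
# Determinism: hashing m1 again always gives exactly h1.
# Pseudorandomness ("avalanche"): h1 and h2 look unrelated and differ in roughly half of their bits,
# even though the inputs differ only in the last character.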

Some of the applications of cryptographic hash functions are listed below
  • Digital Signatures
  • Password storage
  • Malware analysis (using signature of malware)
  • Data Integrity Check

Glimpses from Cryptography: Foundation of Security

I would like to share some glimpses from the topic of cryptography, it being my favorite one. Why cryptography matters to me is because it is the spine of information security, which is essential for our day to day life. The basic foundation of Information Security lies in the CIA triad (not the Central Intelligence Agency!!). CIA stands for Confidentiality, Integrity and Availability. We use services from cryptography to achieve a couple of the above foundations of Information Security.
To explain the foundations, let's introduce the classic characters of security stories: Alice, Bob and Eve. Alice is communicating with Bob, and Eve is trying to intrude into the communication and do some malicious activity. In between the communication of Alice and Bob, Eve can
  1. Read the messages communicated by Alice to Bob
  2. Modify/replace the message sent by Alice before reaching Bob
  3. Prevent Bob from getting any message from Alice.
Confidentiality: If Eve reads the message sent by Alice, then it is a violation of Confidentiality. Hence Alice uses encryption to secure the communication. Once the message is encrypted, Eve sees only a set of characters which doesn't make any sense.

Integrity: If Eve modifies the message, then it is a violation of Integrity. Hence Alice may use techniques such as Hash functions or a MAC (Message Authentication Code) for ensuring the integrity of the message (a small MAC sketch appears at the end of this post).

Availability: When Eve blocks the message entirely, Bob has no way to know that Alice wants to communicate with him. Cryptography doesn't directly address the issues of Availability.

Authentication: In the case above, only Alice and Bob are permitted to communicate over the channel; in other words, they are authorized to use the channel for communication. How is it verified that the communicating parties are indeed Alice and Bob? It is achieved using one or more Authentication mechanisms. The most common authentication is using passwords, i.e. a secret known only by the legitimate user. Authentication can also be done using bio-metrics or a physical device such as a USB key.

Non-Repudiation: Non-repudiation has nothing to do with Eve. If Alice sends Bob a message and at a future point refutes the fact that she sent it, then how can one ensure that the message was sent by Alice and not by someone else? This also deals with the matter of trust. Hence, for effectively achieving non-repudiation, DSS (Digital Signature Systems) have to be used.
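As a small follow-up to the Integrity point above, here is a sketch of a MAC check using HMAC-SHA256 from Python's standard library (the key and messages are hypothetical, for illustration only):

import hmac, hashlib

key = b"shared-secret-between-alice-and-bob"    # illustrative shared key
message = b"Pay Bob 500 rupees"
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

# Bob recomputes the tag over whatever message he received; if Eve modified it in transit,
# the recomputed tag will not match and the tampering is detected.
received = b"Pay Eve 500 rupees"
ok = hmac.compare_digest(hmac.new(key, received, hashlib.sha256).hexdigest(), tag)
print(ok)    # False: the integrity check fails for the modified message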

Glimpses from Cryptography: Basic Terminologies


Here are some of the terminologies used in cryptography in simple words.

Cryptography: Simply means "secret writing". It is the set of techniques used to secure data, mainly powered by concepts from mathematics.

Cryptanalysis: It is the process of analyzing a cryptographic technique in order to break the system, so that the secret message can be obtained.

Plain text: The message in the readable form.

Cipher text: The set of characters derived from the plain text and the key using an encryption algorithm.

KeySpace: The set of possible keys used for the process of encryption.

Key: An element randomly selected from KeySpace for encryption.

Encryption: The process of transforming plain text into cipher text using a key.

Decryption: The process of regaining the plain text from the cipher text using a key.

Symmetric cryptosystem: A cryptosystem in which encryption and decryption are done using the same key.

Asymmetric cryptography (Public Key cryptography): A system in which encryption is done using a key which is different from the key used for decryption.

Side Channel Attack: Apart from the mathematical techniques used for cryptanalysis, attacks that analyze physical characteristics such as the power consumption of the system, electromagnetic emissions, acoustic signals, etc. to obtain meaningful information from a cryptographic system are collectively called Side Channel Attacks.

Wednesday, 20 June 2018

Summer School on Theoretical Foundations of Computer Science

Summer School on Theoretical Foundations of Computer Science has been a great learning experience. The most remarkable thing about the course is that the most complex concepts and techniques which form the foundation of the world of Computer Science and the IT industry are explained in the simplest manner, which makes things easier to comprehend, yet without losing out on the essence. Some of the most important and industry-centric subjects, such as Approximation Algorithms, Parameterized Algorithms, Cryptography and the modern approaches to it, and Formal Verification - its significance and implications in the field of Software Engineering - were explained, and trust me, these lectures were a treat for any Computer Science engineering aspirant. We were also introduced to the sphere of Open World Computing, the driving force behind its origin and the theoretical aspects that it encompasses.

All the professors, namely Prof. VN Muralidhara, Prof. Srinivas Vivek, Prof. Meenakshi D'Souza, Prof. Srinath Srinivasa and Prof. Pradeesha Ashok, have been kind enough to share with us a part of the huge bundle of knowledge that they possess. One of the notable things about the course, which indubitably left a mark on me, is that beyond the theoretical concepts themselves, the respected lecturers also shed light on the history behind them, the story behind the origin of the concepts, different people's points of view on the same and the revelations that followed, which not all books would probably do. The connection of the concepts with the real world, information about the widely used tools in the industry, and the practical applications associated with them were also elucidated. The guest lecture by Dr. Venkatesan Chakaravarthy, a dignitary from IBM, on the art of calculating approximation ratios, their relationship with decision trees and the significance of the decision-tree approach in the modern world of computing was indeed quite comprehensive and captured the attention of the students.

Constant motivation from each of the professors instigated a zeal for learning and acquainted us with a slew of opportunities that could come our way in the near future, as far as Information Technology domain is concerned. Being introduced to different perspectives augmented imagination and the scope of contemplation. The learning approach that is followed presses for the practice of questioning, logical reasoning and finding alternative solutions to a particular problem. It castigates the ideology of blindly accepting things as facts. Rather than grasping things as they come, it urges to take things with a pinch of salt.
Learning at IIITB has been a privilege and most importantly, learning has been fun, which for me, still remains the paramount consideration. Thank you IIITB for this opportunity.

Tuesday, 19 June 2018

Summer School 2 Days In.


There is nothing like a refresher course in theoretical CS to show you how the industry has changed you over the past 4 years.

It was a great recap of a variety of topics through the first two days of the school, spanning Approximation Algorithms, Cryptography, Formal Verification, Parameterized Algorithms and Open World Computing.

Every professor tries to motivate us with real world implications or the bigger picture of what they are discussing to help us set a better perspective.

Particularly interesting for me personally were cryptography and open world computing taken by
Prof Srinivas Vivek and Prof Srinath Srinivasa respectively.

The cryptography examples were all based on real-world scenarios, which helped in relating better to the security issues all of us face in every single interaction with software. The problem set provided was a throwback to my college days and made me realize that college is definitely harder than work.

Open world computing, on the other hand, has been a great philosophical exercise from lecture 1. Starting from the constructs of system design, so far we have covered the mathematical foundations of computing, starting from set theory, finite state machines and regular languages. Further ahead, we are going to delve into open world computing and multi agent systems. Excited to learn more, as these set the foundations of intelligent agent based systems, which is where commercial AI is leading us.

Though at times being one of the only people from industry makes me feel out of my depth (homework is a must to survive in class, I must say!), it has been a great 2 days so far, and I am looking forward to the remaining three days.


Monday, 18 June 2018

TFoCS summer school @ IIIT Bangalore



Dear Organizers,
 
I must say the workshop fulfilled my expectations and I learned a lot in today's session. I am excited to attend the further sessions. The tremendous effort that your team has obviously put in, and your overwhelming generosity, are greatly appreciated.
I am sure I will use the knowledge gained to write good technical papers. I must thank my mentor, principal and management for their support.
Above are some snapshots of the sessions.

Fondly,

Sharath Yaji
Research Scholar (Ph.D Candidate)
NMAM Institute of Technology-Nitte.

Tuesday, 12 June 2018

Summer School on TFoCS 2018

Welcome to the blog posts from the Summer School on Theoretical Foundations of Computer Science 2018 held at IIIT Bangalore! We have collected here the blog posts volunteered by the participants who attended the School. The posts by the participants have undergone little/no review by the organisers of the School. Hence the authors are solely responsible for the content of their posts.

When Randomness meets Automation- The Intersection of Cryptography with Machine Learning

Author- Anuj Srivastava Here is a brief summary, or rather the cornerstone of what Dr. Arpita Patra, a renowned faculty and esteemed r...