# diskodev

DESIGN & DEVELOPMENT AGENCY

# Data Structures and Operations

Let us look at this problem definition: given a word, we need to find all the words that can appear next to it in a word ladder. The list of words is given in a text file, and a word ladder is a sequence of words made by changing one letter at a time. For example, the sequence below defines a word ladder:

cold -> cord -> card -> ward -> warm

What we need to find out are,
1) Given a word, list all the words that can appear next to it in a word ladder
2) Which word has the longest ladder?
3) How many words can be reached, from a source word, in n or fewer steps?

Let us try to come up with a naive algorithm that, given a word, finds all the words that can appear next to it. The code would look something like the below,
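A naive version might look like the sketch below. Only getDiffCount() is named in the post; the list type and the other function names are my assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the number of positions at which two words differ,
// or -1 if the words have different lengths.
int getDiffCount(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return -1;
    int diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++diff;
    return diff;
}

// Naive scan: compare the given word against every word in the list
// and collect the ones that differ by exactly one letter.
std::vector<std::string> getNextWords(const std::string& word,
                                      const std::vector<std::string>& words) {
    std::vector<std::string> next;
    for (const std::string& w : words)
        if (getDiffCount(word, w) == 1)
            next.push_back(w);
    return next;
}
```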

The above naive algorithm requires O(n) time: for a given word, we scan the entire list and check whether the difference between each pair of words is 1, hence the O(n) time. [The function in the above gist, getDiffCount(), returns the number of positions at which a pair of words differ.]

Now, let us try to find the word that has the longest ladder. Reusing the code we wrote earlier, we can write the below function,
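A sketch of that function follows; the function name and the choice of keeping the first maximum on ties are mine.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Helper from the naive algorithm: positions at which two words differ.
int getDiffCount(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return -1;
    int diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++diff;
    return diff;
}

// For every word, count its ladder neighbours with a full scan,
// and remember the word with the highest count.
std::string findWordWithLongestLadder(const std::vector<std::string>& words) {
    std::string best;
    int bestCount = -1;
    for (const std::string& w : words) {
        int count = 0;
        for (const std::string& other : words)
            if (getDiffCount(w, other) == 1)
                ++count;
        if (count > bestCount) {
            bestCount = count;
            best = w;
        }
    }
    return best;
}
```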

The runtime of the above code is O(n^2), because we find the next words (an O(n) operation) for each of the n words.

The importance of Data Structures
What if these operations were not one-off, but had to be repeated often? Using the above naive functions for them would be sub-optimal. This is where a good Data Structure can help us. We need to come up with a DS that maintains the state of the program, so that we do not have to recreate that state for every operation. (In the naive functions above, recreating the state was more expensive than the rest of the work we did.)

Operations define a Data Structure
How do we choose which Data Structure to use? The golden rule is to choose one that easily supports the operations we plan to perform during the life of the program.

Let us come up with a data structure for the above operations. Looking at the operations we are going to perform, let us represent each word as an element in the DS as follows,
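A sketch of such an element in C++; the member names visited and distance are my guesses for the two traversal aids, and the original gist may differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One element per word in the ladder data structure.
struct Element {
    std::string word;             // the word itself
    int noOfLadders = 0;          // number of words that can replace this word
    std::vector<Element*> links;  // pointers to those words' elements
    bool visited = false;         // traversal aid for the depth-limited walk
    int distance = 0;             // distance from the source word
};
```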

Here, the given word is stored as a string in the element. noOfLadders holds the number of ladder words for that particular element, and links holds the list of all the words that can replace the element's word in a word ladder. The last two members of the element struct aid us in traversing the DS, specifically to support operation 3 (how many words can be reached, from a source word, in n or fewer steps).

The data structure (similar to an adjacency list) can be shown diagrammatically as below,

Each node holds an element, and its links point to the words that can replace the current word.

The code to initialize the vector of elements is given below.
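A sketch of the initialization; wordToVectorIndex is named in the post, while the function name and overall shape are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One element per word in the ladder data structure.
struct Element {
    std::string word;
    int noOfLadders = 0;
    std::vector<Element*> links;
    bool visited = false;   // not visited initially
    int distance = 0;       // distance from source starts at 0
};

// Collect the words into the vector of elements and remember
// each word's index in the map.
void initializeElements(const std::vector<std::string>& words,
                        std::vector<Element>& elements,
                        std::map<std::string, int>& wordToVectorIndex) {
    elements.clear();
    elements.reserve(words.size());  // keeps element pointers stable when linking
    for (const std::string& w : words) {
        wordToVectorIndex[w] = static_cast<int>(elements.size());
        Element e;
        e.word = w;
        elements.push_back(e);
    }
}
```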

Above, wordToVectorIndex is a map that maps a word to a vector index. Each element is marked as not visited and its distance from the source is set to 0. Once the words are collected in the vector, we need to build the ladder DS by setting the links to point to the list of words that can replace the element's word in a word ladder. The code to do that is given below,
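One way the linking step might look; the function name is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One element per word in the ladder data structure.
struct Element {
    std::string word;
    int noOfLadders = 0;
    std::vector<Element*> links;
    bool visited = false;
    int distance = 0;
};

// Positions at which two words differ; -1 if lengths differ.
int getDiffCount(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return -1;
    int diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++diff;
    return diff;
}

// Link every pair of elements whose words differ by exactly one letter.
// Each link stores a pointer to the other element, not a copy.
void buildLadder(std::vector<Element>& elements) {
    for (std::size_t i = 0; i < elements.size(); ++i)
        for (std::size_t j = i + 1; j < elements.size(); ++j)
            if (getDiffCount(elements[i].word, elements[j].word) == 1) {
                elements[i].links.push_back(&elements[j]);
                elements[j].links.push_back(&elements[i]);
            }
}
```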

Here, we link a pair of words in their respective links lists if the difference between the words is 1. Note that each link holds a pointer to the element, not a copy of it. The built structure is similar to the figure shown earlier.

After building the ladder data structure, we need to update the number of ladders (links) for each element. The code to do that is given below,
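Since the links are already in place, the count is just the size of each element's links list. A minimal sketch:

```cpp
#include <cassert>
#include <string>
#include <vector>

// One element per word in the ladder data structure.
struct Element {
    std::string word;
    int noOfLadders = 0;
    std::vector<Element*> links;
    bool visited = false;
    int distance = 0;
};

// Once the links are built, each element's ladder count
// is simply the number of links it holds.
void updateLadderCounts(std::vector<Element>& elements) {
    for (Element& e : elements)
        e.noOfLadders = static_cast<int>(e.links.size());
}
```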

Now, let us try to come up with the runtimes of the operations using our new data structure.

To find the list of words that can appear next to a word in a word ladder, we can use the following code,
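A sketch of the lookup, which just reads the neighbours off the element:

```cpp
#include <cassert>
#include <string>
#include <vector>

// One element per word in the ladder data structure.
struct Element {
    std::string word;
    int noOfLadders = 0;
    std::vector<Element*> links;
    bool visited = false;
    int distance = 0;
};

// Read the neighbours straight off the element's links:
// O(m), where m is the number of linked words.
std::vector<std::string> getNextWords(const Element& e) {
    std::vector<std::string> next;
    for (const Element* linked : e.links)
        next.push_back(linked->word);
    return next;
}
```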

The runtime of the above code block is O(m), where m is the number of words that can replace the given word in a ladder. This is a big improvement over the naive solution seen earlier.

Similarly, finding the longest ladder in the set of words is easy, since we have stored the state in each element. The runtime of the operation is O(n), where n is the number of words. The code to find the longest ladder is given below,
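A sketch of that single pass over the stored counts:

```cpp
#include <cassert>
#include <string>
#include <vector>

// One element per word in the ladder data structure.
struct Element {
    std::string word;
    int noOfLadders = 0;
    std::vector<Element*> links;
    bool visited = false;
    int distance = 0;
};

// One O(n) pass over the stored counts; no pairwise comparisons needed.
std::string findWordWithLongestLadder(const std::vector<Element>& elements) {
    const Element* best = nullptr;
    for (const Element& e : elements)
        if (best == nullptr || e.noOfLadders > best->noOfLadders)
            best = &e;
    return best != nullptr ? best->word : std::string();
}
```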

Now, how do we find the words that can be reached, from a source word, in n or fewer steps? Using the ladder data structure, we can simply do a depth-limited BFS walk from the source down to the given depth. The code to do it is given below,
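A sketch of such a depth-limited walk; the function name and the convention of not counting the source are my choices.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// One element per word in the ladder data structure.
struct Element {
    std::string word;
    int noOfLadders = 0;
    std::vector<Element*> links;
    bool visited = false;
    int distance = 0;
};

// Depth-limited BFS: walk outward from the source, stopping at depth n,
// and count every word reached (excluding the source itself).
int countReachableWords(Element* source, int n) {
    std::queue<Element*> frontier;
    source->visited = true;
    source->distance = 0;
    frontier.push(source);
    int reached = 0;
    while (!frontier.empty()) {
        Element* current = frontier.front();
        frontier.pop();
        if (current != source)
            ++reached;
        if (current->distance == n)
            continue;               // do not expand beyond the depth limit
        for (Element* next : current->links)
            if (!next->visited) {
                next->visited = true;
                next->distance = current->distance + 1;
                frontier.push(next);
            }
    }
    return reached;
}
```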

The runtime of the above code block is O(V + E), where V is the number of words within that depth and E is the number of links between them.

As you can see, we have greatly improved our runtimes using the ladder data structure we built. Remember this: you need a data structure that can help you with your future operations. Hence, operations always define the data structure, and not the other way around.

Let me know if anything can be improved.

Also feel free to follow my projects on github.

# Multiplication Algorithm and Recurrences

Multiplication of two numbers is a well-defined computational problem. It is such a basic step that we do not give a thought to the algorithms behind it. In this post we shall look at various implementations of the multiplication algorithm. We will only be looking at the basic algorithms; for the complete set, refer here. This will also be an introductory post on analyzing recurrence relations and finding the upper bound on the running time of a recursive algorithm.

Let us try to multiply two numbers X and Y. Let us use the basic algorithm that we all studied in our younger days.

567 X 236

---------------

3402 + 17010 + 113400

---------------

133812

---------------

Here each digit in Y is multiplied against each digit in X and the partial product is shifted accordingly. Let us try to frame pseudo code for the above algorithm.
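One possible rendering of that pseudo code in C++, extracting digits with division and modulus; the function names are assumptions.

```cpp
#include <cassert>

// length(): the number of digits in a number, e.g. length(567) == 3.
int length(long x) {
    int len = 0;
    do { ++len; x /= 10; } while (x > 0);
    return len;
}

// Grade-school multiplication: every digit of y is multiplied
// against every digit of x, and each partial product is shifted
// by its digit's place before being added in.
long gradeSchoolMultiply(long x, long y) {
    long result = 0;
    long place = 1;                      // 10^i for the i-th digit of y
    long yRest = y;
    for (int i = 0; i < length(y); ++i) {
        long digit = yRest % 10;
        yRest /= 10;
        long partial = 0;
        long shift = 1;
        long xRest = x;
        for (int j = 0; j < length(x); ++j) {
            partial += (xRest % 10) * shift * digit;
            xRest /= 10;
            shift *= 10;
        }
        result += partial * place;
        place *= 10;
    }
    return result;
}
```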

In the above code block, we assume there is a function, length(), that returns the number of digits in a number. For example, for the number 567, it returns 3. The running time of the above algorithm is easy to guess: since each digit is multiplied against every other digit, we need two loops to do the work. Hence the upper bound on the above algorithm is O(n^2) (n-squared).

Let us try to do better than the n-squared algorithm above, using a recursive algorithm to multiply two numbers. The basic idea is to break the input down into smaller sub-problems, solve them, and then combine the solutions.

The working of the algorithm is as follows :
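A sketch of the divide-and-conquer step, assuming the numbers are split at half their digit count; the helper names are mine.

```cpp
#include <cassert>
#include <algorithm>

// Number of digits in a number, e.g. length(567) == 3.
int length(long x) {
    int len = 0;
    do { ++len; x /= 10; } while (x > 0);
    return len;
}

// Divide and conquer: split x = a * 10^m + b and y = c * 10^m + d,
// so that x * y = ac * 10^(2m) + (ad + bc) * 10^m + bd.
// Four recursive multiplications per call.
long recursiveMultiply(long x, long y) {
    if (x < 10 && y < 10) return x * y;      // base case: single digits
    int m = std::max(length(x), length(y)) / 2;
    long p = 1;
    for (int i = 0; i < m; ++i) p *= 10;     // p = 10^m
    long a = x / p, b = x % p;
    long c = y / p, d = y % p;
    long ac = recursiveMultiply(a, c);
    long ad = recursiveMultiply(a, d);
    long bc = recursiveMultiply(b, c);
    long bd = recursiveMultiply(b, d);
    return ac * p * p + (ad + bc) * p + bd;
}
```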

In the algorithm above, we divide both numbers into halves of n/2 digits each, make four recursive calls to multiply them (which might make more recursive calls), and combine the partial results. The base case is that when we get single digit numbers, we return their product.

Now let us try to come up with the upper bound on the running time of the recursive algorithm above. The recurrence for the algorithm is of the form,

T(n) = 4T(n/2) + f(n)

Here, there are four sub-problems, and the size of each sub-problem is half of the original problem. f(n) represents the work done outside the recursive calls, which is the additions and shifts needed to combine the partial results, and hence O(n). (These are loose bounds, and we assume that multiplying single digit numbers is an elementary operation.)

The solution of the recurrence can easily be found using the master theorem. But let us analyze it using the recursion tree method to understand it intuitively.

The recursion tree for the above recurrence can be shown as below,

In the above recursion tree, every call except the leaves makes four further recursive calls. To come up with the upper bound on the running time, we calculate the cost at each level and add the levels up along the height of the tree. To simplify the calculation, let us assume that n is a power of 2.

The number of sub-problems at level j is (4 ^ j), and the size of each sub-problem at level j is n/(2 ^ j). Hence the cost accrued at each level of the recursion tree is (4 ^ j) * c * (n/(2 ^ j)) = (2 ^ j) * cn. The height of the tree is log n to the base 2, since the input is halved at every level. (Strictly it is log n - 1, but we can be sloppy here and it does not affect the analysis.) Summing the cost of each level over the entire tree gives us the upper bound on the runtime:

T(n) <= sum for j = 0 to log n of (2 ^ j) * cn

This is a geometric series (the base of the logarithm is 2):

T(n) <= cn * (2 ^ (log n + 1) - 1) <= 2cn * (2 ^ (log n))

Since 2 ^ (log n) = n when the logarithm is to the base 2, we get

T(n) <= 2cn * n = O(n ^ 2)

So we have used a more complex algorithm to multiply two numbers without any improvement in the runtime: the recursive algorithm runs in the same O(n^2) time as the "naive" algorithm we saw earlier. Let us see if we can improve upon it. Note that the recursive algorithm makes four recursive multiplications at each step. Can we reduce the number of recursive calls, and thereby the running time of the algorithm? There is a clever algorithm, the Karatsuba algorithm, which does exactly this. It makes use of a trick due to Gauss that needs only three recursive calls instead of four to multiply two numbers. The working of the algorithm is as follows:
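The trick can be sketched as follows, using the same split as before, x = a * 10^(n/2) + b and y = c * 10^(n/2) + d:

ac = a * c
bd = b * d
t = (a + b) * (c + d) = ac + ad + bc + bd

so that ad + bc = t - ac - bd, and

x * y = ac * 10^n + (t - ac - bd) * 10^(n/2) + bd

Only ac, bd and t are computed recursively: three multiplications instead of four.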

The recurrence of the above algorithm is given as,

T(n) = 3T(n/2) + f(n)

Here, there are three sub-problems, and the size of each sub-problem is half of the original problem. f(n) again represents the work done outside the recursive calls, the additions, subtractions and shifts, and hence O(n).

The recursion tree for the above recurrence can be shown as below,

The number of sub-problems at level j is (3 ^ j), and the size of each sub-problem at level j is n/(2 ^ j). Hence the cost accrued at each level of the recursion tree is (3 ^ j) * c * (n/(2 ^ j)) = ((3/2) ^ j) * cn. The height of the tree is again log n to the base 2, since the input is still halved at every level. Summing the cost of each level over the entire tree:

T(n) <= sum for j = 0 to log n of ((3/2) ^ j) * cn

This is a geometric series with ratio 3/2, dominated by its last term (the base of the logarithm is 2):

T(n) <= 3cn * ((3/2) ^ (log n)) = 3cn * (n ^ (log 3 - 1)) = O(n ^ (log 3))

The last step makes use of the logarithmic identity a ^ (log b) = b ^ (log a), so (3/2) ^ (log n) = n ^ (log (3/2)) = n ^ (log 3 - 1).

We have actually reduced the running time from n-squared to n ^ (log 3) ≈ n ^ 1.585, which is a slower growing function than n-squared.

The code to do the multiplication is given in the below gist
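A possible sketch of that code; getOrderOfNumber() is named in the post, and the rest of the shape is assumed.

```cpp
#include <cassert>
#include <algorithm>

// getOrderOfNumber(): the number of digits in a number,
// e.g. getOrderOfNumber(567) == 3.
int getOrderOfNumber(long x) {
    int order = 0;
    do { ++order; x /= 10; } while (x > 0);
    return order;
}

// Karatsuba multiplication: three recursive calls per step.
long karatsuba(long x, long y) {
    if (x < 10 && y < 10) return x * y;      // base case: single digits
    int m = std::max(getOrderOfNumber(x), getOrderOfNumber(y)) / 2;
    long p = 1;
    for (int i = 0; i < m; ++i) p *= 10;     // p = 10^m
    long a = x / p, b = x % p;
    long c = y / p, d = y % p;
    long ac = karatsuba(a, c);
    long bd = karatsuba(b, d);
    long t  = karatsuba(a + b, c + d);       // the Gauss trick
    return ac * p * p + (t - ac - bd) * p + bd;   // ad + bc = t - ac - bd
}
```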

The function getOrderOfNumber() returns the number of digits in a number. The code makes three recursive calls, as described earlier, and returns the result.

Note: We can confirm the upper bound we calculated using the recursion tree with the (easier) master theorem. The first step is to calculate the number of leaves in the recursion tree, which is given by

3 ^ (log n to the base 2) = n ^ (log 3 to the base 2) ≈ n ^ 1.585

Since the work at the root, f(n) = O(n), grows more slowly than the number of leaves, the leaves dominate and T(n) = O(n ^ (log 3 to the base 2)), matching the recursion tree analysis.


# Fibonacci Numbers & its Implementations

The Fibonacci sequence is defined by the recurrence,

F(0) = 0
F(1) = 1
F(i) = F(i - 2) + F(i - 1)         [For all i ≥ 2]

The series that yield from the above recurrence is

0, 1, 1, 2, 3, 5, 8, 13, …

The Fibonacci series appears throughout nature: the rate at which rabbits multiply, branch patterns in trees and the arrangement of a pinecone are some practical examples.

There are various implementations of the Fibonacci series. Let's look at some of them along with their running times.
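The recursive version presumably looks like this sketch, straight from the recurrence:

```cpp
#include <cassert>

// Naive recursive Fibonacci. Runs in exponential time because
// the same subproblems are recomputed over and over.
long fib(int n) {
    if (n < 2) return n;        // F(0) = 0, F(1) = 1
    return fib(n - 2) + fib(n - 1);
}
```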

The above gist contains the recursive implementation of the Fibonacci sequence. This is by far the worst implementation. The running time of the above algorithm is O(2^n), where n is the input to the recurrence F in the definition above. This algorithm might suffice when n is small, but when n is large, the time taken by the program grows exponentially.
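An iterative sketch that keeps only the last two values of the sequence:

```cpp
#include <cassert>

// Iterative Fibonacci: O(n) time, constant space.
long fibIterative(int n) {
    if (n < 2) return n;
    long prev = 0, current = 1;
    for (int i = 2; i <= n; ++i) {
        long next = prev + current;   // F(i) = F(i - 2) + F(i - 1)
        prev = current;
        current = next;
    }
    return current;
}
```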

The above gist is the iterative implementation of the algorithm. The running time of the algorithm is O(n). This is simple, and faster than the recursive version. Most of the time, this algorithm is fast enough to compute the Fibonacci sequence. But if you still want to push the boundary, there are a few techniques available.

The Fibonacci sequence is closely related to the golden ratio (ϕ), which is given by the following formula,

ϕ = (1 + √5) / 2 ≈ 1.61803

And the Fibonacci number of i is given by

F(i) = ϕ^i / √5, rounded to the nearest integer

Using the above formula, we can calculate the Fibonacci number of a given i. The running time of this method is O(1). But it is advised not to use this method in non-mathematical software, because overflows and rounding errors in the floating-point computation mean you might not end up with the right result.
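A sketch of the closed-form computation, subject to the floating-point caveat above; the function name is mine.

```cpp
#include <cassert>
#include <cmath>

// Closed-form (Binet's formula) Fibonacci: O(1) arithmetic,
// but floating-point error makes it unsafe for large i.
long fibClosedForm(int i) {
    const double phi = (1.0 + std::sqrt(5.0)) / 2.0;   // the golden ratio
    return static_cast<long>(std::round(std::pow(phi, i) / std::sqrt(5.0)));
}
```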

The fastest way then, in normal circumstances, to find the Fibonacci number of i is to use a matrix of a special form and square it recursively. It turns out that

[[1, 1], [1, 0]] ^ n = [[F(n+1), F(n)], [F(n), F(n-1)]]

The Fibonacci number is present in the upper right corner and in the lower left corner of the result matrix. For example, raising the matrix to the 5th power gives [[8, 5], [5, 3]], where F(5) = 5 sits in both of those corners.

The initial matrix is of constant size (2x2), and when you multiply the matrix by itself you get another 2x2 matrix. Only four numbers are involved, which keeps the algorithm simple. The running time of this algorithm is O(lg n) using repeated squaring, and this is the algorithm mostly used to calculate F(i) when i is large.
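The repeated-squaring approach can be sketched as follows; the type alias and function names are mine.

```cpp
#include <cassert>
#include <array>

using Mat = std::array<std::array<long long, 2>, 2>;

// Plain 2x2 matrix multiplication.
Mat multiplyMat(const Mat& x, const Mat& y) {
    Mat r{};   // zero-initialized
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                r[i][j] += x[i][k] * y[k][j];
    return r;
}

// Raise a 2x2 matrix to the n-th power by repeated squaring: O(lg n).
Mat power(Mat m, int n) {
    Mat result{{{1, 0}, {0, 1}}};   // identity matrix
    while (n > 0) {
        if (n % 2 == 1) result = multiplyMat(result, m);
        m = multiplyMat(m, m);
        n /= 2;
    }
    return result;
}

long long fibMatrix(int n) {
    if (n == 0) return 0;
    Mat base{{{1, 1}, {1, 0}}};
    return power(base, n)[0][1];   // F(n) sits in the upper right corner
}
```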


# Do. Then Tell. Integrity. Failure.

Lately, I have been thinking about how a Leader inspires a team. What are their actions? How do they carry themselves? How 'differently' do they communicate? Do they all inspire a team? Are Leaders even as necessary as everyone wants us to believe? Can leadership not be shared by everyone in the team?

Imagine a game. Let's say a team member misses a sitter, and the Captain takes him to the cleaners. Does the Leader show passion which in turn inspires everyone around him? Or is he just being an ass****? In the same game, another team member misses a sitter and the Captain does not show any emotion. What happens now? Does it show he is not bothered?

Compare the same in a corporate setup. Which style really pushes the organization forward, and makes people like their Leaders and listen to them? Some Leaders make you wonder whether being an ass**** is what it takes to get to the top, or whether you survive there only if you turn into one. And every one of us has had our share of those ass******.

But the best Leaders are the ones who are not concerned about what other people think of them or their style. If you aspire to be a good Leader, just keep the following in mind. These are the things great Leaders do best.

Do. Then Tell.
A preacher never makes a good Leader; after all, all he does is preach. Great Leaders do not sit on a pedestal and bark out orders. They go out and face the music. You want your team to play hard? Show them. The Leader needs to show the team how to share, respect and win. He cannot expect others to do what he is not willing to do himself. If you expect your team to never give up, to go the extra mile or to take risks, show them the way, even in the smallest of tasks.

Integrity.
This is the one quality a team looks for in its Leader. The world loves talent, but pays off on character. You need to treat the members of your team equally and fairly. The team will respond positively to your honesty, and the members will be themselves with you. You set a benchmark within the team. This creates a culture of positivity and nurtures team spirit. People enjoy working with you. They know that you are not going to climb on their shoulders. They believe you.

Failure.
You learn about a Leader in times of failure. A great Leader takes the blame when things go wrong and credits the team members when things go right. They do not point fingers when things go bad; instead, they try to set things right. That does not mean they do not let people take responsibility for their work or actions. It actually does the opposite: team members give their best work knowing that you will stand up for them. Good Leaders know that failures are when you can build a great team.

So go on. Anyone can be a great Leader.


# Graph Theory

(This is the first post of a two-part series giving an introduction to Graph Theory and its applications. This post contains more of the theory on graphs; the next one will concentrate on applications.)

The different types of graphs discussed here are - Directed Graphs, Undirected Graphs & Trees (which are similar to graphs as we shall see)

A directed graph, or digraph, G is a pair (V, E), where V is a finite set of vertices and E is a set of edges forming a binary relation on V. Given below is an example of a digraph.

In an undirected graph G = (V, E), the edge set E consists of unordered pairs of vertices: the edge (u, v) is the same as (v, u). No self-loops are allowed, although they are possible in a digraph. Given below is an example of an undirected graph.

Terminologies used in graphs:

Related to edges of a graph

• If (u, v) is an edge in a directed graph G = (V, E), we say that it is incident from u and incident to v. For example, in the above directed graph, the edge (1, 2) is incident from 1 and incident to 2
• If (u, v) is an edge in an undirected graph G = (V, E), we say that it is incident on vertices u and v. For example, in the above undirected graph, the edge (1, 2) is incident on vertices 1 and 2

Related to vertices of a graph

• If (u, v) is an edge in the graph G = (V, E), we say that vertex v is adjacent to vertex u. This relation is symmetric if the graph is undirected. In the directed graph, w.r.t. the edge (2, 3), vertex 3 is adjacent to vertex 2; and in the first undirected graph, w.r.t. the edge (2, 4), vertex 2 is adjacent to vertex 4 and vertex 4 is also adjacent to vertex 2
• The degree of a vertex in an undirected graph G = (V, E) is the number of edges incident to it. A vertex is said to be isolated if its degree is 0. In the first undirected graph, the degree of vertex 4 is four (In fact the graph is a complete graph with all vertices of degree four). And in the second undirected graph, vertex 2 is isolated and has a degree of zero
• The degree of a vertex in a directed graph G = (V, E) is the sum of its in-degree (the number of edges incident to the vertex) and its out-degree (the number of edges incident from the vertex). In the above directed graph, the in-degree of vertex 4 is three and the out-degree is two
• A path from a vertex u to a vertex v in a graph G = (V, E) is a sequence <V0, V1, V2, V3, … , Vn> where V0 = u, Vn = v, and v is reachable from u. A path is simple if all vertices in the path are unique. An example path in the directed graph is 1, 2, 3, 5
• A subpath of a path P is a contiguous subsequence of its vertices. An example subpath in the directed graph is 2, 3, 4 of the complete path 1, 2, 3, 4
• A path forms a cycle if one or more of its vertices is repeated along the path. A path with no cycle is called a simple path. A graph with no cycles is called acyclic; a dag is a directed acyclic graph. An example of a cycle in the directed graph is 1, 2, 3, 4, 3, 5
• A connected graph is a graph G = (V, E) in which every vertex is reachable from every other vertex using a path. The first undirected graph in the above illustration is connected, whereas the second undirected graph is not. In the directed graph, too, all vertices form a connected graph
• The strongly connected components of a directed graph are the sets of vertices in which each pair of vertices is reachable from each other. In the first undirected graph, all the vertices form a single connected component

Trees are related to graphs but use some slightly different notions. A tree is a connected, acyclic, undirected graph. Many algorithms that work on graphs also work on trees. Below are some illustrations of trees.

Terminologies used in trees:

• A root is a node which is the base of the tree. No edges come into it; edges only go out of it. In the illustrations above, 15 and 1 are examples of roots
• If x is a node in the tree with root r, then any node y on the path from r to x is called an ancestor of x, and x is a descendant of y. In the first example, 12 is a descendant of 15 and 22 is an ancestor of 3
• If there is an edge (x,y), then x is called the parent of y and y is called the child of x. In the second example, 2 is the parent of 5 and 5 is a child of 2
• The length of the path from the root r to a node x is the depth of x in the tree. The depth of the first illustration is 4 and of the second is 2 (Note: the depth starts from 0)
• The height of the tree is the depth of the lowest node from the root of the tree. The height of the first tree is 4 and of the second one is 2
• A binary tree is a tree in which each node has at most two children; formally, it is made up of three disjoint sets of nodes: a root, a left subtree and a right subtree, either of which may be empty. The first example is not a binary tree, whereas the second one is
• A node to the left of the root is called the left child of the root and a node to the right of the root is called the right child of the root
• A full binary tree is a tree where each node is a leaf or has two children. The second illustration is a full binary tree
• A complete binary tree is a tree where all the leaves are at the same level and each non-leaf node has two children. The second illustration is again a complete binary tree

Some properties of graphs and trees:

• In any undirected graph, the sum of the degrees of all vertices is equal to 2 times the number of edges in the graph, i.e. for v ∈ V, ∑ degree(v) = 2 | E |
• For any connected, undirected graph G = (V, E), | E | ≥ | V | - 1
• Any two vertices in a tree are connected by a unique simple path
• In a tree, | E | = | V | - 1
• A complete binary tree has 2 ^ (h + 1) - 1 nodes, where h is the height of the tree (with the root at depth 0, as above)
• A non empty binary tree with n nodes has a height at least floor(lg n)
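The first property above (the handshaking lemma) is easy to check in code. A small sketch, representing an undirected graph by its edge list; the function name is mine.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Sum of vertex degrees in an undirected graph given as an edge list.
// Each edge touches exactly two vertices, so the sum is always 2|E|.
int degreeSum(int vertexCount, const std::vector<std::pair<int, int>>& edges) {
    std::vector<int> degree(vertexCount, 0);
    for (const auto& e : edges) {
        ++degree[e.first];
        ++degree[e.second];
    }
    int sum = 0;
    for (int d : degree) sum += d;
    return sum;
}
```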