Data Structures and Operations
Let us look at this problem definition. Here, given a word, we need to find all the words that can appear next to it in a word ladder. The list of words is given in a text file. And a word ladder is a sequence of words made by changing one letter at a time. For example, the below sequence defines a word ladder,
cold -> cord -> card -> ward -> warm
What we need to find out are,
1) Given a word, list all the words that can appear next to it in a word ladder
2) Which word has the longest ladder?
3) How many words can be reached, from a source word, in n or fewer steps?
Let us try to come up with a naive algorithm that given a word, it finds all the words that can appear next to it. The code would be like something given below,
In the above naive algorithm, O(n) of time is required. For a given word, we parse the entire list to check if the diff between a pair is 1 and hence the 0(n) time. [The function in the above gist, getDiffCount(), gives the diff count between the pair of words.]
Now, let us try to find the word that has the longest ladder. Making use of the code we have written earlier, we can write the below function,
The runtime of the above code is O(n^2). This is so because we try to get the next words (which is a O(n) operation) for each word which makes the algorithm run in O(n^2) time.
The importance of Data Structures
What if, now, these operations were not one off and are to be repeated often. Using the above naive functions to do these operations is sub-optimal. This is where a good Data Structure can help us. We need to come up with a DS such that we can maintain the state of the program. This is so because during every operation we do not need to create the state afresh.(In the above naive functions, re-creating the state was more expensive than other work we did)
Operations define a Data Structure
How do we choose what Data Structure that we are going to use? The golden rule to effectively choose a Data Structure is to choose it in such a way so as to easily support the operations that we plan to do during the life of the program.
Let us come up with a data structure for the above operations. Looking at the operations that we are going to perform, let us represent each words as an element in the DS as follows,
where, the given word is stored as a string in the element. The noOfLadders contains the value of number of ladders for that particular element. The links contains the list of all the words that can replace the element's word in a word ladder. The next two members in the element struct are there to aid us in traversing the DS specifically to support operation 3 (How many words can be reached, from a source word, in n or fewer steps?).
The data structure (similar to a adjacency lists) can be diagrammatically shown below as,
Each node has an element and the links contains the list to the words that can replace the current word.
The code to initialize the vector of elements is given below.
Above, wordToVectorIndex is a map that maps a word to an vector index. Each element is marked as not visited and the distance from source is set as 0. Once the words are collected in the vector, we need to build the ladder DS where we set the links to point to the list of words that can replace the element's word in a word ladder. The code to do that is given below,
Here, we link a pair of words in their respective links if the difference between the words is 1. Note that, the link contains the pointer to the element and not a copy of the element. The built structure is similar to the figure shown earlier.
After building the ladder data structure, we need to update the number of ladders (links) for each element. The code to do that is given below,
Now, let us try to come with the runtime of the operations using our new data structure.
To find the list of words that can appear next to a word in a word ladder, we can use the following code,
The runtime of the above code block is O(m) where m is the number of words that can replace the given word in a ladder. This is very optimal with respect to the naive solution seen earlier.
Similarly, to find the longest ladder in the set of words, it very easy since we have stored the state in each element. Hence the runtime of the operation is 0(n) where n is the number of words. The code to find the longest ladder is given below
Now to find the words that can be reached, from a source word, in n or fewer steps? Using the ladder data structure, we can easily do a depth limited BFS walk from the source to the given depth. The code to do it is given below,
The runtime of the above code block is O(V + E) where V is the number of words in that depth and E is the relationship between the words.
you can see, we have really improved upon our runtime using the ladder
data structure we have built. Remember this - you need a data struture
that can help you with your future operations hence operations always
define the data structure and not the other way around.
Let me know if anything can be improved.
If you have read this far, you should follow me on twitter.
Also feel free to follow my projects on github.