CS106B Assignment 2: Using Collection Classes

Swallowing a hundred rivers, as it is virtuous vast is the sea;
Towering thousands of feet high, as it is desireless the cliff’s mighty.
— Couplet, Lin Zexu (1785-1850), Qing Dynasty
海纳百川，有容乃大；壁立千仞，无欲则刚

Now that you’ve been introduced to the handy Stanford C++ class library, it’s time to put these objects to work! In your role as a client of these Abstraction Data Types (ADTs) the low-level details abstracted away, you can put your energy toward solving more interesting problems. In this assignment, your job is to write two short client programs that use these classes to do nifty things. The tasks may sound a little daunting at first, but given the power tools in your arsenal, each requires only a page or two of code. Let’s hear it for abstraction!

The assignment has several purposes:

To let you experience more fully the joy of using powerful library classes.
To stress the notion of abstraction as a mechanism for managing data and providing functionality without revealing the representational details.
To increase your familiarity with using C++ class templates.
To give you some practice with classic data structures such as the stack, queue, vector, map, and lexicon.

Problem 1 Anagram Clusters

Two words are anagrams of one another if the letters in one can be rearranged into the other. For examples:

“senator” and “treason”
“praising” and “aspiring”
“arrogant” and “tarragon”

Nifty fact: two words are anagrams if you get the same string when you write the letters in those words in sorted order. For example, “praising” and “aspiring” are anagrams because, in both cases, you get the same string as “aiignprs” if you sort the letters in the two words.

Enter a word: treason
{"atoners", "senator", "treason"}
Enter a word: praising
{"aspiring", "praising"}
Enter a word: arrogant
{"arrogant", "tarragon"}
Enter a word:

Write the program group all words in English into “clusters” of words that are all anagrams of one another. You can use a Map. Each key is a string of letters in sorted order described in the slides 05 for Lecture. Each value is the collection of English words that have those letters in that order.

The starter project for this problem includes a copy of the EnglishWords.txt file (included in res/directory) described in Lecture. Before you write the Anagram Clusters program, you might experiment with a simpler program that uses the lexicon in simpler ways. For example, you might write a program that prints out all the English words that by specific first letters.

Problem 2 Area Codes

Telephone numbers in the United States and Canada are organized into various three-digit area codes. A single state or province will often have many area codes, but a single area code will not cross a state boundary. This rule makes it possible to list the geographical locations of each area code in a data file. For this problem, assume that you have access to the file AreaCodes.txt, which lists all the area codes paired with their locations as illustrated by the first ten lines of that file:

201-New Jersey
202-District of Columbia
203-Connecticut
204-Manitoba
205-Alabama
206-Washington
207-Maine
208-Idaho
209-California
210-Texas
……

Using the Airport Codes program as a model, write the code necessary to read this file into a Map, where the key is the area code and the value is the location. Once you’ve read in the data, write a main program that repeatedly asks the user for an area code and then looks up the corresponding location, as illustrated in the following sample run:

Enter area code or state name: 650
California
Enter area code or state name: 202
District of Columbia
Enter area code or state name: 778
British Columbia

As the prompt suggests, however, your program should also allow users to enter the name of a state or province and have the program list all the area codes that serve that area, as illustrated by the following sample run:

Enter area code or state name: Oregon
458
503
541
971
Enter area code or state name: Manitoba
204

Extension: When you wrote the FindAreaCode program for the previous exercise, it is likely that you generated the list of area codes for a state by looping through the entire map and printing out any area codes that mapped to that state. Although this strategy is fine for small maps like the area code example, efficiency might become an issue in working with much larger collections of data. This strategy also feels uncomfortably asymmetric. When you want to translate an area code to a state name, you ask the map and it gives you the answer immediately; translating in the opposite direction requires a lot more work.

What you would like to do is invert the map so that you could perform lookup operations in either direction. You can’t, however, declare the inverted map as a Map, because there is often more than one area code associated with a state. An int can’t hold all the necessary information. A better strategy is to make the inverted map a Map< string, Vector > that maps each state name to a vector of the area codes that serve that state. Rewrite the FindAreaCode program so that it creates an inverted map after reading in the data file and then uses that map to list the area codes for a state.

Problem 3 Word Ladders

A word ladder is a connection from one word to another formed by changing one letter at a time with the constraint that at each step the sequence of letters still forms a valid word. For example, here is a word ladder connecting “code” to “data”:

code → core → care → dare → date → data

That word ladder, however, is not the shortest possible one. Although the words may be a little less familiar, the following ladder is one step shorter:

code → cade → cate → date → data

Imagine the word ladder from “boost” to “happy”, the connections given by Donald E. Knuth is “boost” → “boast” → “beast” → “least” → “leapt” → “leaps” → “leads” → “lends”→ “lands” → “hands” → “handy” → “hardy” → “harpy” → “happy”. It may not be the shortest one, but may be appropriate one for this year. (https://www-cs-faculty.stanford.edu/~knuth/news.html)

Your job in this problem is to write a program that finds a minimal word ladder between two words. Your code will make use of several of the ADTs from Chapter 5, along with a powerful algorithm called breadth-first search to find the shortest such sequence. Here, for example, is a sample run of the word-ladder program in operation:

Enter start word (RETURN to quit) : work
Enter destination word: play
Found ladder: work fork form foam flam flay play
Enter start word(RETURN to quit) : awake
Enter destination word: sleep
Found ladder: awake aware sware share shire shirr shier sheer sheep sleep
Enter start word (RETURN to quit) : airplane
Enter destination word: tricycle
No ladder found.
Enter start word (RETURN to quit) :

A sketch of the word ladder implementation

Finding a word ladder is a specific instance of a shortest-path problem, in which the challenge is to find the shortest path from a starting position to a goal. Shortest-path problems come up in a variety of situations such as routing packets in the Internet, robot motion planning, determining proximity in social networks, comparing gene mutations, and more.

One strategy for finding a shortest path is the classic algorithm known as breadth-first search, which is a search process that expands outward from the starting position, considering first all possible solutions that are one step away from the start, then all possible solutions that are two steps away, and so on, until an actual solution is found. Because you check all the paths of length 1 before you check any of length 2, the first successful path you encounter must be as short as any other.

For word ladders, the breadth-first strategy starts by examining those ladders that are one step away from the original word, which means that only one letter has been changed. If any of these single-step changes reach the destination word, you’re done. If not, you can then move on to check all ladders that are two steps away from the original, which means that two letters have been changed. In computer science, each step in such a process is called a hop.

The breadth-first algorithm is typically implemented by using a queue to store partial ladders that represent possibilities to explore. The ladders are enqueued in order of increasing length. The first elements enqueued are all the one-hop ladders, followed by the two-hop ladders, and so on. Because queues guarantee first-in/first-out processing, these partial word ladders will be dequeued in order of increasing length.

To get the process started, you simply add a ladder consisting of only the start word to the queue. From then on, the algorithm operates by dequeueing the ladder from the front of the queue and determining whether it ends at the goal. If it does, you have a complete ladder, which must be minimal. If not, you take that partial ladder and extend it to reach words that are one additional hop away, and enqueue those extended ladders, where they will be examined later. If you exhaust the queue of possibilities without having found a completed ladder, you can conclude that no ladder exists. It is possible to make the algorithm considerably more concrete by implementing it in pseudocode, which is simply a combination of actual code and English. The pseudocode for the word-ladder problem appears in Figure 1.

Create an empty queue.
Add the start word to the end of the queue.
while (the queue is not empty) {
  Dequeue the first ladder from the queue.
  if (the final word in this ladder is the destination word){
    Return this ladder as the solution.
  }
  for (each word in the lexicon of English words that differs by one letter){ 
    if (that word has not already been used in a ladder) {
      Create a copy of the current ladder.
      Add the new word to the end of the copy. 
      Add the new ladder to the end of the queue.
    }
  }
}
Report that no word ladder exists.

As is generally the case with pseudocode, several of the operations that are expressed in English need to be fleshed out a bit. For example, the loop that reads

for (each word in the lexicon of English words that differs by one letter)

is a conceptual description of the code that belongs there. It is, in fact, unlikely that this idea will correspond to a single for loop in the final version of the code. The basic idea, however, should still make sense. What you need to do is iterate over all the words that differ from the current word by one letter. One strategy for doing so is to use two nested loops; one that goes through each character position in the word and one that loops through the letters of the alphabet, replacing the character in that index position with each of the 26 letters in turn. Each time you generate a word using this process, you need to look it up in the lexicon of English words to make sure that it is actually a legal word.

Another issue that is a bit subtle is the restriction that you not reuse words that have been included in a previous ladder. One advantage of making this check is that doing so reduces the need to explore redundant paths. For example, suppose that you have previously added the partial ladder

cat → cot → cog

to the queue and that you are now processing the ladder

cat → cot → con

One of the words that is one hop away from con, of course, is cog, so you might be tempted to enqueue the ladder

cat → cot → con → cog

Doing so, however, is unnecessary. If there is a word ladder that begins with these four words, then there must be a shorter one that, in effect, cuts out the middleman by eliminating the unnecessary word con. In fact, as soon as you’ve enqueued a ladder ending with a specific word, you never have to enqueue that word again.

The simplest way to implement this strategy is to keep track of the words that have been used in any ladder (which you can easily do using another lexicon) and ignore those words when they come up again. Keeping track of what words you’ve used also eliminates the possibility of getting trapped in an infinite loop by building a circular ladder, such as

cat → cot → cog → bog → bag → bat → cat

One of the other questions you will need to resolve is what data structure you should use to represent word ladders. Conceptually, each ladder is just an ordered list of words, which should make your mind scream out “Vector!” (Given that all the growth is at one end, stacks are also a possibility, but vectors will be more convenient when you are trying to print out the results.) The individual components of the Vector are of type string.

Implementing the application

At this point, you have everything you need to start writing the actual C++ code to get this project done. It’s all about leveraging the class library—you’ll find your job is just to coordinate the activities of various different queues, vectors, and lexicons necessary to get the job done. The finished assignment requires less than a page of code, so it’s not a question of typing in statements until your fingers get tired. It will, however, certainly help to think carefully about the problem before you actually begin that typing.

As always, it helps to plan your implementation strategy in phases rather than try to get everything working at once. Here, for example, is one possible breakdown of the tasks:

Task 1—Try out the demo program. Play with the demo just for fun and to see how it works from a user’s perspective.
Task 2—Read over the descriptions of the classes you’ll need. For this part of the assignment, the classes you need from Chapter 5 are Vector, Queue, and Lexicon. If you have a good sense of how those classes work before you start coding, things will go much more smoothly than they will if you try to learn how they work on the fly.
Task 3—Think carefully about your algorithm and data-structure design. Be sure you understand the breadth-first algorithm and what data types you will be using.
Task 4—Play around with the lexicon. The starter project for this problem includes a copy of the EnglishWords.txt file described in Lecture. Before you write the word ladder application, you might experiment with a simpler program that uses the lexicon in simpler ways. For example, you might write a program that reads in a word and then prints out all the English words that are one letter away.
Task 5—Implement the breadth-first search algorithm. Now you’re ready for the meaty part. The code is not long, but it is dense, and all those templates will conspire to trip you up. We recommend writing some test code to set up a small dictionary (with just ten words or so) to make it easier for you to test and trace your algorithm while you are in development. Test your program using the large dictionary only after you know it works in the small test environment.

Note that breadth-first search is not the most efficient algorithm for generating minimal word ladders. As the lengths of the partial word ladders increase, the size of the queue grows exponentially, leading to exorbitant memory usage when the ladder length is long and tying up your computer for quite a while examining them all.

Problem 1 Anagram Clusters

Problem 2 Area Codes

Problem 3 Word Ladders

发送评论 编辑评论

发送评论编辑评论