6: String Processing

What We Will Cover


Continuations

Questions from last class?

  • Exam reminder
  • What will be printed after the following C++ statements have executed?
    int count = 1;
    while (count <= 3) {
        cout << count << " ";
        count++;
    }
    
    1. 1 2
    2. 1 2 3
    3. 2 3
    4. 1 2 3 4

Homework Questions?

6.1: More About Strings and Characters

Learner Outcomes

At the end of the lesson the student will be able to:

  • Iterate through a string and extract each character
  • Convert characters to digits
  • Use string functions

6.1.1: Strings Versus Characters

  • Remember that a string is a series of characters enclosed in double quotes such as:
    "Hello"  "b"  "3.14159"  "$3.95"  "My name is Bruce"
  • We can store text in a variable of type string, like:
    string firstName;             // declaration
    firstName = "Bruce";         // assignment
    string lastName = "Hartman";  // declaration + assignment
    cout << firstName << " " << lastName << endl;
    
  • On the other hand, a character is a single letter, digit or special symbol
  • We enclose characters in single quotes, rather than double quotes, like:
    'a'   'b'   'Z'   '3'   'q'   '$'   '*'
  • Also, we store a single character using a variable of type char, such as:
    char letterA = 'A';
    char letterB = 'B';
    
  • Each character is stored as a number, using its code from the ASCII table
  • By declaring a char variable or using single quotes, C++ knows to treat the number as a character
  • Thus, when we print a character, we see a letter rather than a number:
    char letter = 'A';
    cout << letter << 'B' << endl;
    
  • As we can see, a string is made up of characters and characters are numerical codes
  • We can use this information to work with characters and strings

String Concatenation and Functions

  • Recall that we can join (concatenate) two strings or a string with a character
    string str = "abc";
    str = str + "1"; // allowed
    str = str + '1'; // allowed
    str = str + 1;   // NO
    str = str + 1.2; // NO
    
  • However, we cannot concatenate a string with a number
  • Because strings are objects, they have member functions
  • Two useful member functions we have studied are length() and substr()
  • length(): Returns the number of characters in a string
    string str = "Hello";
    cout << "The number of characters is " << str.length()
         << ".\n";
    
  • substr(i, n): Returns a substring of length n starting at index i
    string greeting = "Hello, World!\n";
    string sub = greeting.substr(0, 4);
    cout << sub << endl;
    
  • The position numbers in a string start at 0. The index of the last character is always one less than the length of the string
    H  e  l  l  o  ,     W  o  r  l  d  !
    0  1  2  3  4  5  6  7  8  9  10 11 12
  • Index 6 holds the space; the trailing '\n' in greeting (index 13) is not shown
  • string w = greeting.substr(7, 5); extracts "World", the 5 characters starting at index 7

Check Yourself

  1. True or false: strings are a sequence of characters.
  2. True or false: "A" and 'A' are the same.
  3. The following code is wrong because ________.
    cout << "3.14159" * 2;
    
    1. you cannot double PI
    2. 3.14159 is not exact enough to represent PI
    3. strings may be added but not multiplied
    4. "3.14159" is not a number
  4. After the following code executes, it displays ________.
    char ch;
    ch = 'd' - 'a' + 'A';
    cout << ch << endl;
    
    1. 'D'
    2. D
    3. 68
    4. d

6.1.2: Indexing a String

  • Strings are stored in a character sequence starting at 0 (zero)

    String character positions

  • We can access any individual character of a string variable using square brackets [ ]
  • The general syntax is:
    stringVariable[index];
    
  • Where:
    • stringVariable: the name of your string variable
    • index: the number of the character position
  • For example:
    string str = "abcdef";
    char firstLetter = str[0];
    cout << firstLetter << str[1] << endl;
    
  • The above code displays:
    ab
  • Notice that the square bracket notation returns a char data type

Check Yourself

For the following string declaration, answer the questions below:

string str = "C++ Rules!";
  1. The value of str[0] is: ________
  2. The value of str[2] is: ________
  3. The value of str[4] is: ________
  4. The value of str[str.length() - 1] is: ________

6.1.3: Iterating Strings

  • Recall that member function length() returns the number of characters in a string variable:
    string s = "abcdef";
    unsigned n = s.length();
    
  • Since a string's length is always 0 or a positive number, the length() function returns an unsigned integer type (size_t)
  • After we know the length, it is easy to iterate through the individual characters of a string using a counting loop:
    cout << "Enter a word: ";
    string msg;
    cin >> msg;
    for (unsigned i = 0; i < msg.length(); i++) {
        cout << "Char[" << i << "]: " << msg[i] << endl;
    }
    

Using unsigned

  • Note the use of unsigned i in the for loop
  • Specifying unsigned assumes int by default
  • So rather than unsigned int we may just code unsigned as the data type
  • Recall from lesson 3.1.3 that unsigned ranges from 0 to 4294967295 rather than -2147483648 to 2147483647 for int (assuming the usual 32-bit int)
  • The length() function returns an unsigned number because the length of a string is never less than zero
  • If you compare a signed number with an unsigned number, the compiler may issue a warning:

    warning: comparison between signed and unsigned integer expressions

  • By using unsigned as the counting variable type in the for loop you avoid the warning

Try It: Iterating Strings (4m)

  1. Copy the following program into a text editor, save it as test.cpp, and then compile and run the starter program to make sure you copied it correctly.
    #include <iostream>
    #include <string>
    using namespace std;
    
    int main() {
        // Enter your code here
    
        return 0;
    }
    
  2. Add the code to prompt for and read a message from the user:
    cout << "Enter a word: ";
    string msg;
    cin >> msg;
    
  3. Next add the following for-loop code to the main() function.
    for (unsigned int i = 0; i < msg.length(); i++) {
        cout << i << ": " << msg[i] << endl;
    }
    
  4. Compile and run your code. What do you see when you compile?
  5. Be prepared to answer the following Check Yourself questions when called upon.

Check Yourself

  1. True or false: the length() function of a string returns an unsigned integer.
  2. For the following code, the output the second time through the loop is ________
    string msg = "aeiou";
    for (unsigned i = 0; i < msg.length(); i++) {
        cout << "Char[" << i << "]: " << msg[i] << endl;
    }
  3. True or false: the compiler may give a warning if you compare an unsigned int with a signed int.
  4. Each character in the above loop is printed on its own line because of the ________.

6.1.4: String Input With Spaces

  • We have been using the >> operator to enter data into a string variable:
    string something;
    cout << "Enter something: ";
    cin >> something;
    cout << "You entered: " << something << "END OF OUTPUT\n";
    
  • However, there are some complications
  • >> skips whitespace and stops on encountering more whitespace
  • Thus, we only get a single word for each input variable
  • If a user types in "Hello Mom!", we would only read "Hello" and not " Mom!"
  • This is because cin >> s1 works as follows:
    1. Skips whitespace
    2. Reads non-whitespace characters into the variable
    3. Stops reading when whitespace is found

Input Using getline()

  • To read an entire line we use function getline()
  • Syntax:
    getline(cin, stringVariable);
    
  • Where:
    • stringVariable: the name of the string variable
  • For example:
    string line;
    cout << "Enter a line of input:\n";
    getline(cin, line);
    cout << line << "END OF OUTPUT\n";
    
  • Note that getline() stops reading when it encounters a '\n', which it removes from the input but does not store in the variable

The Problem with Newlines

  • When you press the Enter key, a newline character ('\n') is inserted as part of the input
  • The newline character can cause problems when you mix cin >> with getline()
  • Recall that cin >> s1:
    1. Skips whitespace
    2. Reads non-whitespace characters into the variable
    3. Stops reading when whitespace is found
  • Since whitespace includes newline characters, using cin >> will leave a newline character in the input stream
  • However, getline() stops reading as soon as it finds a newline character, so it reads that leftover newline as an empty line
  • In code like the following, name ends up empty and the program does not wait for the user to type a name:
    cout << "Enter your age: ";
    int age;
    cin >> age;
    cout << "Enter your full name: ";
    string name;
    getline(cin, name);
    cout << "Your age: " << age << endl
         << "Your full name: " << name << endl;
    
  • To correct this problem we use cin >> ws just before getline()
    cin >> ws; // clear whitespace from input stream
    
  • We can see how to use this fix in the following example

Example Using cin >> ws

#include <iostream>
#include <string>
using namespace std;

int main() {
    cout << "Enter your age: ";
    int age;
    cin >> age;
    cout << "Enter your full name: ";
    string name;
    cin >> ws; // clear whitespace from buffer
    getline(cin, name);
    cout << "Your age: " << age << endl
         << "Your full name: " << name << endl;
}

Check Yourself

  1. True or false: Using the >> operator with string variables only reads one word at a time.
  2. To read strings containing multiple words use the ________ function.
  3. True or false: before you switch from using the >> operator to using getline(), you must clear the next newline character from the input buffer.
  4. To clear whitespace from the input buffer use: ________.

6.1.5: Processing Text Input

  • Sometimes we need to read input as words and sometimes as lines
  • To input a sequence of words, use the loop:
    string word;
    while (cin >> word) {
       // process word
       cout << word << endl;
    }
    
  • Used as a loop test, cin >> word is true when the read succeeded, which is the same test as !cin.fail() (see lesson 6.1.7)
  • To process input one line at a time, use the getline() function
    string line;
    while (getline(cin, line)) {
       // process line
       cout << line << endl;
    }
    
  • Used as a loop test, getline(cin, line) is true as long as a line was read successfully
  • The following example processes text input by counting words
  • When typing input at the keyboard for a loop like this, you need to signal the end of input (close the stream) using:
    • Ctrl + Z in Windows
    • Ctrl + D in Linux or OS X
  • Closing the stream acts as a sentinel value for the loop
  • When the stream fails the loop exits

Example Program that Counts Words

#include <iostream>
#include <string>

using namespace std;

int main() {
    cout << "Enter a phrase followed by the Enter"
         << " key and Ctrl-Z/D.\n";
    string word;
    int count = 0;
    while (cin >> word) {
        count++;
    }
    cout << "Number of words: " << count << endl;

    return 0;
}

Redirection of Input and Output

  • We could use the above program by typing words at the command line
  • However, that quickly gets tedious
  • A better way is to use redirection of input (see textbook page 154)
  • The command line interfaces of most operating systems have a way to link a file to the input of a program
  • The content of the file gets fed into the program as if all the characters had been typed by a user
  • For example, after compiling the above program we type something like the following at the command line:
    ./words < input.txt
    
  • Where input.txt is the text file on which we want to count words
  • You can redirect program output to a file as well using something like:
    ./words > output.txt
    
  • You can combine input and output redirection in one command:
    ./words < input.txt > output.txt
    

Check Yourself

  1. True or false: the following code reads input one word at a time.
    string str;
    while (cin >> str) {
       cout << str << endl;
    }
    
  2. True or false: the following code reads input one line at a time.
    string str;
    while (getline(cin, str)) {
       cout << str << endl;
    }
    
  3. To close the cin input stream use the Ctrl key plus the ________ key.
  4. True or false: most operating systems let you redirect input and output at the command line.

Exercise 6.1: Finding Words (5m)

In this exercise we write code to find words in a text file. Compile and test after each step to verify your work.

Specifications

  1. Copy the following program into a text editor, save it to the home folder of Cygwin or your Terminal window as findword.cpp, and then compile and run the starter program to make sure you copied it correctly.
    #include <iostream>
    #include <string>
    using namespace std;
    
    int main() {
        // Enter your code here
    
        return 0;
    }
    
  2. Inside main(), declare both a string variable named word and an integer variable named count, like:
    string word;
    int count = 0;
    
  3. Add a while loop to read one word at a time from cin, like:
    while (cin >> word) {
        // Add if statements here
    }
    
  4. Inside the while loop write code to add one to the count variable.
  5. Add two if-statements, one to test for the word "Shazam" and one to test for the word "bogus", reporting the word count where the word was found. For example:
    if (word == "Shazam") {
        cout << "Shazam is word " << count << endl;
    }
    
  6. Test your program by saving the words.txt file into the home folder of Cygwin or your Terminal window.

    words.txt

  7. Run the program from the command line using input redirection:
    ./findword < words.txt
    
  8. Save your program source code to submit to Canvas as part of assignment 6.

Finding Words

Code to process strings in a loop

As time permits, read the following sections and be prepared to answer the Check Yourself questions in the section: 6.1.6: Summary.

6.1.6: Summary

  • A string is a series of characters enclosed in double quotes
  • We can store text in a variable of type string, like:
    string s1 = "Hello Mom!";
  • A character is a single letter, number or special symbol
  • We can store a single character using a variable of type char, such as:
    char letterA = 'A';
    char letterB = 'B';
    
  • Each character is stored as a number, using its ASCII code
  • Strings are stored in a character sequence starting at 0 (zero)

    String character position

  • We can access individual characters of a string using []
  • Strings are a special type of variable called objects, just like a Turtle
  • Because a string is an object, it has member functions
  • We can iterate through a string using a loop and the length() member function:
    string s = "abcdef";
    for (unsigned i = 0; i < s.length(); i++) {
        cout << "Char[" << i << "]: " << s[i] << endl;
    }
    
  • To read an entire line, you need to use the getline() function:
    getline(cin, line);
  • Sometimes cin >> can leave a '\n' character in the input stream
  • To get around this problem you can use cin >> ws before getline()
    cin >> ws; // clear whitespace from buffer
    

Check Yourself

Answer these questions to check your understanding. You can find more information by following the links after the question.

  1. Strings are enclosed in double quotes. What type of quote marks enclose characters? (6.1.1)
  2. The characters of a string variable can be accessed using what brackets? (6.1.2)
  3. The leftmost character of a string is accessed using which index number? (6.1.2)
  4. To print the following string vertically down the page, what code do you write? (6.1.3)
    string str = "Hi mom!";
  5. To convert the following char variable to a number, what code do you write? (6.1.4)
    char ch = '7';
  6. What is the value of the expression: 'd' - 'a' + 'A'? (6.1.4)
  7. To convert the following string variable to a number, what code do you write? (6.1.4)
    string str = "7";
  8. How many words can you enter with the following code? (6.1.4)
    string something;
    cout << "Enter something: ";
    cin >> something;
    cout << "You entered: " << something << endl;
    
  9. How can you change the previous code to read a string that includes spaces? (6.1.4)
  10. What code can you use to clear newlines and other whitespace from the input stream? (6.1.4)

MidTerm Prep (10m)

  • shut down your computers
  • take 10, return ready to take test
  • 3x5, blank paper, pencil - put name on card and paper
  • no phones or other devices
  • sit in a different place, next to different people

MidTerm - 45 minutes

  • Log into Canvas and proceed to MidTerm - page will be locked with password
  • Nothing open on your desktop except Canvas (no TextPad, no other browser pages!!!)
  • Instructor will come by and inspect your materials
  • Wait for instructor to start the test
  • When you are finished, give 3x5 and scratch paper to instructor
  • Please shut down your computer
  • Leave quietly

Due Today:
A5-Midterm 1 Preparation (3/7/18)
Due Next:
A6-Loopy Programs (3/14/18)
Last Updated: Wed Mar 7 14:47:52 PST 2018