Characters and Strings

FIELDS OF STUDY

Software Development; Coding Techniques; Computer Science

ABSTRACT

Characters and strings are basic units of data that computer programs store, retrieve, and manipulate. Characters represent the individual letters, digits, punctuation marks, and other symbols used in programming languages. Strings are groups of characters arranged in a specific sequence.

WHAT ARE CHARACTERS AND STRINGS?

Characters are the most basic units of programming languages. Characters include letters, digits, punctuation marks, blank spaces, and other symbols. In computer programming, characters may be grouped together to form units of information called strings. A string can be created by placing one or more characters in sequential order. For example, the letters O, P, and T are characters. Those three characters can be combined to create the strings “TOP,” “OPT,” and “POT.” Strings that use the same characters can have different meanings and outputs depending on the sequence in which the characters are arranged.
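
The point about ordering can be illustrated with a minimal Python sketch. The variable names used here (characters, top, opt, pot) are chosen for illustration only and do not come from any particular program.

```python
# The same three characters, arranged in three different orders.
characters = ["O", "P", "T"]

top = characters[2] + characters[0] + characters[1]  # "TOP"
opt = characters[0] + characters[1] + characters[2]  # "OPT"
pot = characters[1] + characters[0] + characters[2]  # "POT"

# Each string is built from exactly the same characters...
same_characters = sorted(top) == sorted(opt) == sorted(pot)

# ...but no two of the strings are equal, because order matters.
all_different = len({top, opt, pot}) == 3

print(top, opt, pot)
print(same_characters, all_different)
```

Sorting the characters of each string shows that the three strings share the same contents, while the set comparison shows that they are nonetheless three distinct values.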

Computer programs use variables to store and manipulate data. Variables are referenced by unique symbolic names. The data stored in a variable is referred to as a value. Different types of data can be stored in variables, including numeric and textual information. For example, the variable used to hold an error message might be named ErrorMessage. If a computer program needs to notify the user that an error has occurred, it could accomplish this by storing the string “Error!” in the variable ErrorMessage. The program could then issue a command to instruct the computer to display the error message.

The rules that govern how words and symbols may be arranged in a programming language are called its syntax. For example, one computer language might use the following syntax to display the error message.




In this example, ErrorMessage is the variable. “Error!” is the string. The command is named echo, and the statements are separated by line breaks. On the other hand, another language might use different syntax to accomplish the same task:
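
As one hedged illustration of a different syntax, the same task might look like this in Python. The name error_message is an adaptation of ErrorMessage to Python naming habits, and print is the Python function that takes the place of an echo command.

```python
# Store the string "Error!" in a variable.
error_message = "Error!"

# Display the stored value to the user.
print(error_message)
```

The task is the same, but the command name and the use of parentheses differ from the echo-based form described above.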




There are several ways in which string values may be created. One way is through the use of string literals. A string literal is a fixed string value written directly in the source code. In many languages, string literals are enclosed in quotation marks. For example, the string “John Smith” might be stored in the variable CustomerName as follows:
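
A minimal Python sketch of such an assignment is given here. The name customer_name is an adaptation of CustomerName, and the choice of Python is an assumption made for illustration.

```python
# The text between the quotation marks is the string literal.
customer_name = "John Smith"

# The quotation marks delimit the literal; they are not part of the value.
print(customer_name)

# The stored value has 10 characters, counting the blank space.
print(len(customer_name))
```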




String values can also be created by joining existing strings together. For example, the value of CustomerName might instead be built from two variables as follows:




The process of joining two strings end to end in this manner is called concatenation.
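
Concatenation might be sketched in Python as follows. The variables first_name and last_name are assumptions introduced for this illustration, and the + operator is how Python joins strings.

```python
# Two string variables to be joined.
first_name = "John"
last_name = "Smith"

# The + operator concatenates strings; the literal " " supplies the space.
customer_name = first_name + " " + last_name

print(customer_name)  # John Smith
```

Without the blank-space literal in the middle, the result would be the single word “JohnSmith,” because concatenation adds nothing between the strings it joins.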

Strings may also be created by combining variables, string literals, and constants. When a computer program evaluates placeholders within a string and replaces them with their values, the process is called interpolation. For example, the variables FirstName and LastName, along with the string literal “Dear” and blank spaces, might be used to store the string value “Dear John Smith” in the variable Salutation, as follows:




In this example, the string value “John” is stored in the variable FirstName, and the string value “Smith” is stored in the variable LastName. The final line of code uses interpolation to combine the data stored in the variables with the string literals “Dear” and blank spaces to create a new variable named Salutation, which stores the string value “Dear John Smith.”
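
One way to sketch this interpolation is with a Python formatted string literal (an f-string, available in Python 3.6 and later). The lowercase names are adaptations of FirstName, LastName, and Salutation, and the f-string is only one of several interpolation mechanisms a language might offer.

```python
first_name = "John"
last_name = "Smith"

# Each {placeholder} is evaluated and replaced by the variable's value.
salutation = f"Dear {first_name} {last_name}"

print(salutation)  # Dear John Smith
```

The text outside the braces, including the word “Dear” and the blank spaces, is copied into the result unchanged.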

USING SPECIAL CHARACTERS IN STRINGS

As demonstrated in the previous examples, the syntax for a computer language might use double quotation marks to define a string as follows:




When a character, such as the double quotation mark, has a reserved role such as marking where a string begins and ends, it is called a special character. The use of a special character to define a string creates a problem when the special character itself needs to be used within the string. Programming languages solve this problem through character combinations called escape sequences. For example, a double quotation mark that is used in a string enclosed by double quotation marks might be preceded by a backslash as follows:




This code would output the text “Hello there, Mr. Jones!” to the user's monitor. The backslash changes the meaning of the quotation mark that it precedes from marking the end of a string to representing an ordinary character within the string.
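
A minimal Python sketch of this kind of escaping is given here. The variable names are hypothetical, and the example assumes that the quotation marks are meant to appear in the displayed text.

```python
# \" places a literal double quotation mark inside a double-quoted string.
greeting = "\"Hello there, Mr. Jones!\""

# The backslashes are not stored; only the quotation marks are.
print(greeting)

# Python also allows single quotes as delimiters, which avoids the escapes.
same_greeting = '"Hello there, Mr. Jones!"'
print(greeting == same_greeting)
```

Both forms produce the same 25-character value. The escape sequence affects only how the string is written in the source code, not what is stored.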

STRINGS AND CHARACTERS IN REAL-WORLD DATA PROCESSING

The use of characters and strings to store and manipulate the symbols that make up written languages is commonplace in computer programming. Strings are used to process textual data of all types. Computer programs use strings and characters to sort, search, and combine a vast array of textual information rapidly and accurately. The widespread use of textual data in computer programming has led to the development of industry standards that promote sharing of text across programming languages and software. For example, Unicode, the most widely used system for processing, displaying, and storing textual data, standardized the encoding of more than 128,000 characters covering 135 scripts as of version 9.0 (2016). Due to their widespread use, strings and characters form one of the cornerstones of computer and information science.
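
The operations mentioned above can be sketched briefly in Python, whose strings are sequences of Unicode code points. The sample words and variable names are assumptions chosen for illustration.

```python
words = ["TOP", "OPT", "POT"]

# Sort, search, and combine textual data.
sorted_words = sorted(words)       # ['OPT', 'POT', 'TOP']
has_pot = "POT" in words           # True
joined = ", ".join(sorted_words)   # 'OPT, POT, TOP'

# Unicode assigns every character a numeric code point.
code_point = ord("A")              # 65
euro_sign = chr(0x20AC)            # the euro sign, U+20AC

# An encoding such as UTF-8 turns code points into bytes for storage.
utf8_bytes = "\u00e9".encode("utf-8")  # b'\xc3\xa9' (the letter e with an acute accent)

print(sorted_words, has_pot, joined)
print(code_point, euro_sign, utf8_bytes)
```

The accented letter is a single character but occupies two bytes in UTF-8, which is why the number of characters in a string and its storage size can differ.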

—Maura Valentino, MSLIS
