Concepts of Programming Languages

6.3 Character String Types

The second option is to allow strings to have varying length up to a
declared and fixed maximum set by the variable’s definition, as exemplified
by the strings in C and the C-style strings of C++. These are called limited
dynamic length strings. Such string variables can store any number of
characters between zero and the maximum. Recall that strings in C use a
special character, the null character ('\0'), to mark the end of the string’s
characters, rather than maintaining the string length.
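As an illustration, a limited dynamic length string can be sketched in C as a fixed-size character buffer whose contents end at the null character. The names MAX_LEN and set_limited are hypothetical, and silently dropping characters beyond the maximum is an assumption made for this sketch, not behavior that C itself prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 15  /* hypothetical declared maximum, fixed at definition */

/* Stores src in a buffer that holds at most MAX_LEN characters plus the
   terminating '\0'. Returns the number of characters actually stored;
   characters beyond the maximum are dropped (an assumption of this sketch). */
size_t set_limited(char dest[MAX_LEN + 1], const char *src) {
    assert(dest != NULL && src != NULL);
    size_t n = strlen(src);
    if (n > MAX_LEN) {
        n = MAX_LEN;
    }
    memcpy(dest, src, n);
    dest[n] = '\0';  /* the end marker, not a stored length, delimits the string */
    return n;
}
```

A variable declared as char s[MAX_LEN + 1] can then hold the empty string, "hello", or any other value of up to 15 characters. Its current length is found by scanning for the null character, which is what strlen does.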
The third option is to allow strings to have varying length with no
maximum, as in JavaScript, Perl, and the standard C++ library. These are called
dynamic length strings. This option requires the overhead of dynamic storage
allocation and deallocation but provides maximum flexibility.
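The allocation and deallocation overhead mentioned above can be made visible with a small C sketch. C has no built-in dynamic length string, so the DynString type and the dyn_append and dyn_free functions below are hypothetical names introduced only for illustration. Reallocating on every append is the simplest strategy, not necessarily what a production library would do.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dynamic length string: no declared maximum. */
typedef struct {
    char  *data;    /* heap buffer, '\0'-terminated whenever non-NULL */
    size_t length;  /* current number of characters */
} DynString;

/* Appends text to s, growing the heap buffer as needed.
   Returns 1 on success, 0 if allocation fails (s is then unchanged).
   text must not point into s's own buffer. */
int dyn_append(DynString *s, const char *text) {
    assert(s != NULL && text != NULL);
    size_t extra = strlen(text);
    char *grown = realloc(s->data, s->length + extra + 1);
    if (grown == NULL) {
        return 0;
    }
    memcpy(grown + s->length, text, extra + 1);  /* copies the '\0' as well */
    s->data = grown;
    s->length += extra;
    return 1;
}

/* Releases the storage; the string becomes empty again. */
void dyn_free(DynString *s) {
    assert(s != NULL);
    free(s->data);
    s->data = NULL;
    s->length = 0;
}
```

Starting from DynString s = {NULL, 0}, each call to dyn_append may allocate or move storage, and dyn_free must eventually be called. That bookkeeping is the price of having no maximum length.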
Ada 95+ supports all three string length options.

6.3.4 Evaluation


String types are important to the writability of a language. Dealing with strings
as arrays can be more cumbersome than dealing with a primitive string type.
For example, consider a language that treats strings as arrays of characters
and does not have a predefined function that does what strcpy in C does.
Then, a simple assignment of one string to another would require a loop. The
addition of strings as a primitive type to a language is not costly in terms of
either language or compiler complexity. Therefore, it is difficult to justify the
omission of primitive string types in some contemporary languages. Of course,
providing strings through a standard library is nearly as convenient as having
them as a primitive type.
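To make the writability argument concrete, the following C sketch contrasts the loop a programmer would have to write by hand with the single library call. The function names copy_by_loop and copy_by_library are hypothetical. Both assume the destination array is large enough for the source string and its terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What string assignment looks like when strings are plain character
   arrays and no library routine is available: an explicit loop. */
void copy_by_loop(char *dest, const char *src) {
    assert(dest != NULL && src != NULL);
    size_t i = 0;
    while (src[i] != '\0') {
        dest[i] = src[i];
        i++;
    }
    dest[i] = '\0';  /* copy the terminator as well */
}

/* The same assignment using the predefined function. */
void copy_by_library(char *dest, const char *src) {
    assert(dest != NULL && src != NULL);
    strcpy(dest, src);
}
```

In a language with a primitive string type, either function would be replaced by an ordinary assignment statement.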
String operations such as simple pattern matching and catenation are
essential and should be included for string type values. Although
dynamic-length strings are obviously the most flexible, the overhead of their
implementation must be weighed against that additional flexibility.

6.3.5 Implementation of Character String Types


Character string types could be supported directly in hardware, but in most
cases, software is used to implement string storage, retrieval, and manipulation.
When character string types are represented as character arrays, the language
often supplies few operations.
A descriptor for a static character string type, which is required only
during compilation, has three fields. The first field of every descriptor is the name
of the type. In the case of static character strings, the second field is the type’s
length (in characters). The third field is the address of the first character. This
descriptor is shown in Figure 6.2. Limited dynamic strings require a run-time
descriptor to store both the fixed maximum length and the current length,
as shown in Figure 6.3. Dynamic length strings require a simpler run-time
descriptor because only the current length needs to be stored. Although we
depict descriptors as independent blocks of storage, in most cases, they are
stored in the symbol table.
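The three descriptors can be sketched as C structures. The struct and field names are hypothetical. The address fields in the two run-time descriptors are an assumption of this sketch, since the text above names only the length fields for them. The function limited_assign is likewise hypothetical and shows one use a run-time system might make of the stored maximum.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Static string: needed only at compile time. */
typedef struct {
    const char *type_name;  /* name of the type */
    size_t      length;     /* length in characters */
    char       *address;    /* address of the first character */
} StaticStringDescriptor;

/* Limited dynamic string: kept at run time. */
typedef struct {
    size_t max_length;      /* fixed maximum length */
    size_t current_length;  /* current length */
    char  *address;         /* assumed field: where the characters live */
} LimitedDynamicDescriptor;

/* Dynamic string: only the current length is needed at run time. */
typedef struct {
    size_t current_length;
    char  *address;         /* assumed field */
} DynamicDescriptor;

/* Assigns src to the string described by d, checking the new length
   against the maximum. Returns 1 on success, 0 if src is too long
   (the string is then left unchanged). */
int limited_assign(LimitedDynamicDescriptor *d, const char *src) {
    assert(d != NULL && src != NULL);
    size_t n = strlen(src);
    if (n > d->max_length) {
        return 0;
    }
    memcpy(d->address, src, n);
    d->current_length = n;  /* length lives in the descriptor, not in the data */
    return 1;
}
```

Note that this sketch keeps the current length in the descriptor, unlike C-style strings, which mark the end of the characters with the null character instead.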