Strings are immutable sequences of JavaScript characters. Each such character is a 16-bit UTF-16 code unit. That means that a single Unicode character is represented by either one or two JavaScript characters. You mainly need to worry about the two-character case whenever you are counting characters or splitting strings (see Chapter 24).
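To make the two-character case concrete, here is a minimal sketch that counts Unicode characters (code points) rather than JavaScript characters (code units). The helper name countCodePoints is hypothetical and not part of the language. It assumes well-formed input and treats an unpaired surrogate as a single character.

```javascript
// Hypothetical helper: counts Unicode code points instead of UTF-16 code units.
function countCodePoints(str) {
    var count = 0;
    for (var i = 0; i < str.length; i++) {
        var unit = str.charCodeAt(i);
        // A high (leading) surrogate followed by a low (trailing) surrogate
        // encodes one code point, so skip the second half of the pair.
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
            var next = str.charCodeAt(i + 1);
            if (next >= 0xDC00 && next <= 0xDFFF) {
                i++;
            }
        }
        count++;
    }
    return count;
}

var smiley = '\uD83D\uDE00'; // U+1F600, stored as two code units
console.log(smiley.length);           // 2
console.log(countCodePoints(smiley)); // 1
console.log(countCodePoints('abc'));  // 3
```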
Both single and double quotes can be used to delimit string literals:
'He said: "Hello"'
"He said: \"Hello\""
'Everyone\'s a winner'
"Everyone's a winner"
Thus, you are free to use either kind of quote. There are several considerations, though:
Your code will look cleaner if you quote consistently. But sometimes, a different quote means that you don’t have to escape, which can justify your being less consistent (e.g., you may normally use single quotes, but temporarily switch to double quotes to write the last one of the preceding examples).
Most characters in string literals simply represent themselves. The backslash is used for escaping and enables several special features:
You can spread a string over multiple lines by escaping the end of the line (the line-terminating character, the line terminator) with a backslash:
var str = 'written \
over \
multiple \
lines';
console.log(str === 'written over multiple lines'); // true
An alternative is to use the plus operator to concatenate:
var str = 'written ' +
    'over ' +
    'multiple ' +
    'lines';
These sequences start with a backslash:
Control characters: \b is a backspace, \f is a form feed, \n is a line feed (newline), \r is a carriage return, \t is a horizontal tab, and \v is a vertical tab.
Escaped characters that represent themselves: \' is a single quote, \" is a double quote, and \\ is a backslash. All characters except b f n r t v x u and decimal digits represent themselves, too. Here are two examples:
> '\"'
'"'
> '\q'
'q'
The NUL character (Unicode code point 0) is represented by \0.
\xHH (HH are two hexadecimal digits) specifies a character via its code in the range 0x00 to 0xFF, which covers all ASCII codes. For example:
> '\x4D'
'M'
\uHHHH (HHHH are four hexadecimal digits) specifies a UTF-16 code unit (see Chapter 24). Here are two examples:
> '\u004D'
'M'
> '\u03C0'
'π'
There are two operations that return the nth character of a string.[16] Note that JavaScript does not have a special data type for characters; these operations return strings:
> 'abc'.charAt(1)
'b'
> 'abc'[1]
'b'
Some older browsers don’t support the array-like access to characters via square brackets.
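The two operations agree for valid positions, but they behave differently when the position is out of range. The following short sketch illustrates that difference; the variable name str is only for illustration.

```javascript
var str = 'abc';

// For valid positions, both operations return a one-character string.
console.log(str.charAt(1) === str[1]); // true
console.log(typeof str.charAt(1));     // string

// Out of range: charAt() returns the empty string,
// whereas bracket access returns undefined.
console.log(str.charAt(5) === ''); // true
console.log(str[5]);               // undefined
```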
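Values are converted to a string as follows:
Value | Result
undefined | 'undefined'
null | 'null'
A boolean | false → 'false', true → 'true'
A number | The number as a string (e.g., 7.35 → '7.35')
A string | Same as input (nothing to convert)
An object | Call ToPrimitive(value, String) and convert the resulting primitive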
The three most common ways to convert any value to a string are:
String(value) | (Invoked as a function, not as a constructor)
'' + value |
value.toString() | (Does not work for undefined and null)
I prefer String(), because it is more descriptive. Here are some examples:
> String(false)
'false'
> String(7.35)
'7.35'
> String({ first: 'John', last: 'Doe' })
'[object Object]'
> String([ 'a', 'b', 'c' ])
'a,b,c'
Note that for displaying data, JSON.stringify() (see JSON.stringify(value, replacer?, space?)) often works better than the canonical conversion to string:
> console.log(JSON.stringify({ first: 'John', last: 'Doe' }))
{"first":"John","last":"Doe"}
> console.log(JSON.stringify([ 'a', 'b', 'c' ]))
["a","b","c"]
Naturally, you have to be aware of the limitations of JSON.stringify(): it doesn’t always show everything. For example, it hides properties whose values it can’t handle (functions and more!). On the plus side, its output can be parsed back into a value via JSON.parse(), and it can display deeply nested data as nicely formatted trees.
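The following sketch illustrates both the limitation and the formatting option. The object data and its property names are made up for the example. The third argument of JSON.stringify() controls indentation, and JSON.parse() is used here to read the output back.

```javascript
var data = {
    first: 'John',
    greet: function () { return 'hi'; }, // functions are skipped
    middle: undefined,                   // so are undefined values
    tags: ['a', 'b']
};

// Compact output: only the properties that JSON can represent survive.
var compact = JSON.stringify(data);
console.log(compact); // {"first":"John","tags":["a","b"]}

// Indented output: the third argument is the number of spaces per level.
var pretty = JSON.stringify(data, null, 2);
console.log(pretty);

// The output can be turned back into a value.
var copy = JSON.parse(compact);
console.log(copy.tags[1]); // b
```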
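Note that the conversion to string is not always invertible. For example, converting a boolean to a string and back does not necessarily produce the original value:
> String(false)
'false'
> Boolean('false')
true
For undefined and null, we face similar problems.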
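As a rough comparison of the three conversion approaches, the sketch below applies each of them to a handful of sample values. The helper tryToString is hypothetical; it only exists to show that calling toString() on undefined or null throws a TypeError.

```javascript
var values = [undefined, null, true, 7.35, 'abc', ['a', 'b'], {}];

// String(value) and '' + value handle every sample value.
var viaString = values.map(function (v) { return String(v); });
var viaPlus = values.map(function (v) { return '' + v; });
console.log(viaString.join(' | '));
// undefined | null | true | 7.35 | abc | a,b | [object Object]

// Hypothetical helper: value.toString() fails for undefined and null.
function tryToString(v) {
    try {
        return v.toString();
    } catch (e) {
        return e instanceof TypeError ? 'TypeError' : 'other error';
    }
}
console.log(tryToString(7.35)); // 7.35
console.log(tryToString(null)); // TypeError
```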
There are two ways of comparing strings. First, you can use the comparison operators: <, >, ===, <=, >=. They have the following drawbacks:
They’re case-sensitive:
> 'B' > 'A'  // ok
true
> 'B' > 'a'  // should be true
false
They don’t handle umlauts and accents well:
> 'ä' < 'b'  // should be true
false
> 'é' < 'f'  // should be true
false
Second, you can use String.prototype.localeCompare(other), which tends to fare better, but isn’t always supported (consult Search and Compare for details).
The following is an interaction in Firefox’s console:
> 'B'.localeCompare('A')
2
> 'B'.localeCompare('a')
2
> 'ä'.localeCompare('b')
-2
> 'é'.localeCompare('f')
-2
A result less than zero means that the receiver is “smaller” than the argument. A result greater than zero means that the receiver is “larger” than the argument.
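One common workaround for the case-sensitivity drawback is to lowercase both strings before comparing them. The function compareIgnoreCase below is a hypothetical helper that sketches this idea; it does not address umlauts and accents. Because the exact numbers returned by localeCompare() vary between engines, the last line only checks the sign.

```javascript
// Hypothetical helper: compares two strings while ignoring case.
// Returns a negative number, zero, or a positive number, like localeCompare().
function compareIgnoreCase(a, b) {
    var x = a.toLowerCase();
    var y = b.toLowerCase();
    if (x < y) return -1;
    if (x > y) return 1;
    return 0;
}

console.log(compareIgnoreCase('B', 'a')); // 1

// The default sort puts all uppercase letters before lowercase ones.
var byCodeUnit = ['b', 'A', 'C'].sort();
// Sorting with the helper yields the order a reader would expect.
var ignoringCase = ['b', 'A', 'C'].sort(compareIgnoreCase);
console.log(byCodeUnit.join(','));   // A,C,b
console.log(ignoringCase.join(',')); // A,b,C

// Only the sign of a localeCompare() result is meaningful.
console.log('a'.localeCompare('b') < 0); // true
```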
There are two main approaches for concatenating strings.
The operator + does string concatenation as soon as one of its operands is a string. If you want to collect string pieces in a variable, the compound assignment operator += is useful:
> var str = '';
> str += 'Say hello ';
> str += 7;
> str += ' times fast!';
> str
'Say hello 7 times fast!'
It may seem that the previous approach creates a new string whenever a piece is added to str. Older JavaScript engines do it that way, which means that you can improve the performance of string concatenation by collecting all the pieces in an array first and joining them as a last step:
> var arr = [];
> arr.push('Say hello ');
> arr.push(7);
> arr.push(' times fast');
> arr.join('')
'Say hello 7 times fast'
However, newer engines optimize string concatenation via + and use a similar method internally. Therefore, the plus operator is faster on those engines.
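Both approaches produce the same result, so the choice is mostly about readability and the engines you target. The two functions below are hypothetical and merely put the approaches side by side; they make no claim about which one is faster on a given engine.

```javascript
// Hypothetical example: builds '1 2 3 ... n ' with the += operator.
function buildWithPlus(n) {
    var str = '';
    for (var i = 1; i <= n; i++) {
        str += i + ' '; // the number is converted to a string
    }
    return str;
}

// Hypothetical example: collects the pieces in an array and joins them once.
function buildWithJoin(n) {
    var pieces = [];
    for (var i = 1; i <= n; i++) {
        pieces.push(i + ' ');
    }
    return pieces.join('');
}

console.log(buildWithPlus(3));                     // '1 2 3 '
console.log(buildWithPlus(3) === buildWithJoin(3)); // true
```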
The function String can be invoked in two ways:
String(value)
As a normal function, it converts value to a primitive string (see Converting to String):
> String(123)
'123'
> typeof String('abc')  // no change
'string'
new String(str)
As a constructor, it creates a new instance of String (see Wrapper Objects for Primitives), an object that wraps str (nonstrings are coerced to string). For example:
> typeof new String('abc')
'object'
The former invocation is the common one.
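The difference matters in practice, because a wrapper object is compared by identity rather than by content. The short sketch below shows the effect; the variable names are only for illustration.

```javascript
var prim = String(123);          // primitive string
var wrapped = new String('abc'); // wrapper object

console.log(typeof prim);    // string
console.log(typeof wrapped); // object

// Strict equality never considers an object equal to a primitive.
console.log(wrapped === 'abc'); // false
// Lenient equality converts the wrapper to a primitive first.
console.log(wrapped == 'abc');  // true
// Two wrappers are different objects, even with the same content.
console.log(new String('abc') === new String('abc')); // false

// valueOf() unwraps the primitive string again.
console.log(wrapped.valueOf() === 'abc'); // true
```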
String.fromCharCode(codeUnit1, codeUnit2, ...) produces a string whose characters are the UTF-16 code units specified by the 16-bit unsigned integers codeUnit1, codeUnit2, and so on. For example:
> String.fromCharCode(97, 98, 99)
'abc'
If you want to turn an array of numbers into a string, you can do so via apply() (see func.apply(thisValue, argArray)):
> String.fromCharCode.apply(null, [97, 98, 99])
'abc'
The inverse of String.fromCharCode() is String.prototype.charCodeAt().
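To illustrate that the two operations are inverses, here is a minimal round-trip sketch. The helper names toCodeUnits and fromCodeUnits are hypothetical.

```javascript
// Hypothetical helper: string -> array of UTF-16 code units.
function toCodeUnits(str) {
    var units = [];
    for (var i = 0; i < str.length; i++) {
        units.push(str.charCodeAt(i));
    }
    return units;
}

// Hypothetical helper: array of UTF-16 code units -> string.
function fromCodeUnits(units) {
    return String.fromCharCode.apply(null, units);
}

var units = toCodeUnits('abc');
console.log(units);                // [ 97, 98, 99 ]
console.log(fromCodeUnits(units)); // abc
```

For very large arrays, engines may limit the number of arguments that apply() can pass, so processing the array in chunks may be necessary.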
The length property indicates the number of JavaScript characters in the string and is immutable:
> 'abc'.length
3
All methods of primitive strings are stored in String.prototype (refer back to Primitives Borrow Their Methods from Wrappers). Next, I describe how they work for primitive strings, not for instances of String.
The following methods extract substrings from the receiver:
String.prototype.charAt(pos)
Returns a string with the character at position pos. For example:
> 'abc'.charAt(1)
'b'
The following two expressions return the same result, but some older JavaScript engines support only charAt() for accessing characters:
str.charAt(n)
str[n]
String.prototype.charCodeAt(pos)
Returns the code (a 16-bit unsigned integer) of the JavaScript character (a UTF-16 code unit; see Chapter 24) at position pos.
This is how you create an array of character codes:
> 'abc'.split('').map(function (x) { return x.charCodeAt(0) })
[ 97, 98, 99 ]
The inverse of charCodeAt() is String.fromCharCode().
String.prototype.slice(start, end?)
Returns the substring starting at position start up to and excluding position end. If end is omitted, the substring extends to the end of the string. Both parameters can be negative, in which case the length of the string is added to them:
> 'abc'.slice(2)
'c'
> 'abc'.slice(1, 2)
'b'
> 'abc'.slice(-2)
'bc'
String.prototype.substring(start, end?)
Should be avoided in favor of slice(), which is similar, but can handle negative positions and is implemented more consistently across browsers.
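The following sketch contrasts the two methods on inputs where they disagree. substring() treats negative positions as 0 and swaps its arguments if start is greater than end, whereas slice() counts negative positions from the end and returns an empty string for an empty range.

```javascript
var str = 'abcdef';

// Negative start: slice() counts from the end, substring() clamps to 0.
console.log(str.slice(-2));     // ef
console.log(str.substring(-2)); // abcdef

// start > end: slice() returns '', substring() swaps the arguments.
console.log(str.slice(4, 1) === ''); // true
console.log(str.substring(4, 1));    // bcd

// For ordinary, nonnegative ranges, both methods agree.
console.log(str.slice(1, 4) === str.substring(1, 4)); // true
```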
String.prototype.split(separator?, limit?)
Extracts the substrings of the receiver that are delimited by separator and returns them in an array. The method has two parameters:
separator: Either a string or a regular expression. If missing, the complete string is returned, wrapped in an array.
limit: If given, the returned array contains at most limit elements.
Here are some examples:
> 'a, b,c, d'.split(',')  // string
[ 'a', ' b', 'c', ' d' ]
> 'a, b,c, d'.split(/,/)  // simple regular expression
[ 'a', ' b', 'c', ' d' ]
> 'a, b,c, d'.split(/, */)  // more complex regular expression
[ 'a', 'b', 'c', 'd' ]
> 'a, b,c, d'.split(/, */, 2)  // setting a limit
[ 'a', 'b' ]
> 'test'.split()  // no separator provided
[ 'test' ]
If there is a group, then the matches are also returned as array elements:
> 'a, b , '.split(/(,)/)
[ 'a', ',', ' b ', ',', ' ' ]
> 'a, b , '.split(/ *(,) */)
[ 'a', ',', 'b', ',', '' ]
Use '' (empty string) as a separator to produce an array with the characters of a string:
> 'abc'.split('')
[ 'a', 'b', 'c' ]
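As a slightly larger example, split() can be applied twice to take apart a string of key-value pairs. The function parsePairs and the input format ('key=value' pairs separated by semicolons) are assumptions made up for this sketch.

```javascript
// Hypothetical helper: turns 'a=1; b=2' into { a: '1', b: '2' }.
function parsePairs(text) {
    var result = {};
    // A regular expression separator swallows optional spaces after ';'.
    var pairs = text.split(/; */);
    for (var i = 0; i < pairs.length; i++) {
        if (pairs[i] === '') continue; // skip empty pieces, e.g. after a trailing ';'
        // With a limit of 2, only the first two pieces are returned.
        var parts = pairs[i].split('=', 2);
        result[parts[0]] = parts[1];
    }
    return result;
}

var settings = parsePairs('user=jane; lang=en;theme=dark;');
console.log(settings.user);  // jane
console.log(settings.lang);  // en
console.log(settings.theme); // dark
```

Note that the limit does not keep the remainder of the string: a value that itself contains '=' would be cut off at the second '='.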
While the previous section was about extracting substrings, this section is about transforming a given string into a new one. These methods are typically used as follows:
var str = str.trim();
In other words, the original string is discarded after it has been (nondestructively) transformed:
String.prototype.trim()
Removes all whitespace from the beginning and the end of the string:
> '\r\nabc \t'.trim()
'abc'
String.prototype.concat(str1?, str2?, ...)
Returns the concatenation of the receiver and str1, str2, etc.:
> 'hello'.concat(' ', 'world', '!')
'hello world!'
String.prototype.toLowerCase()
Creates a new string with all of the original string’s characters converted to lowercase:
> 'MJÖLNIR'.toLowerCase()
'mjölnir'
String.prototype.toLocaleLowerCase()
Works the same as toLowerCase(), but respects the rules of the current locale. According to the ECMAScript spec: “There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.”
String.prototype.toUpperCase()
Creates a new string with all of the original string’s characters converted to uppercase:
> 'mjölnir'.toUpperCase()
'MJÖLNIR'
String.prototype.toLocaleUpperCase()
Works the same as toUpperCase(), but respects the rules of the current locale.
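Because each of these methods returns a new string, calls can be chained. The helpers normalize and capitalize below are hypothetical and only sketch how the transformations combine; the original string is never modified.

```javascript
// Hypothetical helper: strips surrounding whitespace and lowercases.
function normalize(input) {
    return input.trim().toLowerCase();
}

// Hypothetical helper: uppercases the first character only.
function capitalize(word) {
    return word.charAt(0).toUpperCase() + word.slice(1);
}

var raw = '  Hello World \n';
console.log(normalize(raw));             // hello world
console.log(capitalize(normalize(raw))); // Hello world
console.log(raw.length);                 // 15 (raw itself is unchanged)
```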
The following methods are used for searching and comparing strings:
String.prototype.indexOf(searchString, position?)
Searches for searchString starting at position (the default is 0). It returns the position where searchString has been found or -1 if it can’t be found:
> 'aXaX'.indexOf('X')
1
> 'aXaX'.indexOf('X', 2)
3
Note that when it comes to finding text inside a string, a regular expression works just as well. For example, the following two expressions are equivalent:
str.indexOf('abc') >= 0
/abc/.test(str)
String.prototype.lastIndexOf(searchString, position?)
Searches for searchString, starting at position (the default is the end), backward. It returns the position where searchString has been found or -1 if it can’t be found:
> 'aXaX'.lastIndexOf('X')
3
> 'aXaX'.lastIndexOf('X', 2)
1
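The position parameter makes it possible to continue a search where the previous one stopped. As a sketch of that technique, the hypothetical function countOccurrences below counts non-overlapping occurrences of a search string. Returning 0 for an empty search string is a convention chosen for this example.

```javascript
// Hypothetical helper: counts non-overlapping occurrences of search in str.
function countOccurrences(str, search) {
    if (search === '') return 0; // convention for this sketch; avoids an endless loop
    var count = 0;
    var pos = str.indexOf(search);
    while (pos !== -1) {
        count++;
        // Continue searching right after the match that was just found.
        pos = str.indexOf(search, pos + search.length);
    }
    return count;
}

console.log(countOccurrences('aXaX', 'X'));  // 2
console.log(countOccurrences('aaaa', 'aa')); // 2
console.log(countOccurrences('abc', 'z'));   // 0
```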
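String.prototype.localeCompare(other)
Performs a locale-sensitive comparison of the string with other. It returns a number:
A negative number if the string comes before other
0 if the string is equivalent to other
A positive number if the string comes after other
For example:
> 'apple'.localeCompare('banana')
-2
> 'apple'.localeCompare('apple')
0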
Not all JavaScript engines implement this method properly. Some just base it on the comparison operators. However, the ECMAScript Internationalization API (see The ECMAScript Internationalization API) does provide a Unicode-aware implementation. That is, if that API is available in an engine, localeCompare() will work.
If it is supported, localeCompare() is a better choice for comparing strings than the comparison operators. Consult Comparing Strings for more information.
The following methods work with regular expressions:
String.prototype.search(regexp) (more thoroughly explained in String.prototype.search: At What Index Is There a Match?)
Returns the first index at which regexp matches in the receiver (or -1 if it doesn’t):
> '-yy-xxx-y-'.search(/x+/)
4
String.prototype.match(regexp) (more thoroughly explained in String.prototype.match: Capture Groups or Return All Matching Substrings)
Matches regexp against the receiver. If the flag /g is not set, it returns a match object for the first match:
> '-abb--aaab-'.match(/(a+)b/)
[ 'ab', 'a', index: 1, input: '-abb--aaab-' ]
If the flag /g is set, then all complete matches (group 0) are returned in an array:
> '-abb--aaab-'.match(/(a+)b/g)
[ 'ab', 'aaab' ]
String.prototype.replace(search, replacement) (more thoroughly explained in String.prototype.replace: Search and Replace)
Searches for search and replaces it with replacement. search can be a string or a regular expression, and replacement can be a string or a function. Unless you use a regular expression as search whose flag /g is set, only the first occurrence will be replaced:
> 'iixxxixx'.replace('i', 'o')
'oixxxixx'
> 'iixxxixx'.replace(/i/, 'o')
'oixxxixx'
> 'iixxxixx'.replace(/i/g, 'o')
'ooxxxoxx'
A dollar sign ($) in a replacement string allows you to refer to the complete match or a captured group:
> 'iixxxixx'.replace(/i+/g, '($&)')  // complete match
'(ii)xxx(i)xx'
> 'iixxxixx'.replace(/(i+)/g, '($1)')  // group 1
'(ii)xxx(i)xx'
You can also compute a replacement via a function:
> function repl(all) { return '('+all.toUpperCase()+')' }
> 'axbbyyxaa'.replace(/a+|b+/g, repl)
'(A)x(BB)yyx(AA)'
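A replacement function also receives the captured groups as additional arguments, after the complete match. The sketch below uses that to fill in a simple text template. The function fillTemplate and the {name} placeholder syntax are assumptions made up for this example.

```javascript
// Hypothetical helper: replaces {key} placeholders with values from data.
function fillTemplate(template, data) {
    // all is the complete match (e.g. '{name}'), key is group 1 (e.g. 'name').
    return template.replace(/\{(\w+)\}/g, function (all, key) {
        if (Object.prototype.hasOwnProperty.call(data, key)) {
            return String(data[key]);
        }
        return all; // leave unknown placeholders untouched
    });
}

var message = fillTemplate(
    'Hello {name}, you have {count} new messages {unknown}',
    { name: 'Jane', count: 3 }
);
console.log(message);
// Hello Jane, you have 3 new messages {unknown}
```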
[16] Strictly speaking, a JavaScript string consists of a sequence of UTF-16 code units. That is, JavaScript characters are Unicode code units (see Chapter 24).