26. Unicode in ES6

This chapter explains the improved support for Unicode that ECMAScript 6 brings. For a general introduction to Unicode, read Chap. “Unicode and JavaScript” in “Speaking JavaScript”.



26.1 Unicode is better supported in ES6

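ECMAScript 6 improves support for Unicode in three areas: Unicode code point escapes (\u{···}) for code points beyond 16 bits; strings (iteration honors code points, and String.prototype.codePointAt() and String.fromCodePoint() read and create code points); and regular expressions (the new flag /u improves the handling of surrogate pairs).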

Additionally, ES6 is based on Unicode version 5.1.0, whereas ES5 is based on Unicode version 3.0.
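The following is a minimal sketch of Unicode-aware ES6 features at work. The example string and the variable names are assumptions chosen for illustration; they do not come from the chapter.

```javascript
// Assumed example string: the letter "a" followed by a rocket (U+1F680).
const str = 'a\u{1F680}';

// The rocket occupies two UTF-16 code units, so .length reports 3.
console.log(str.length); // 3

// Iteration (here via the spread operator) proceeds by code point.
const codePoints = [...str];
console.log(codePoints.length); // 2

// Reading and creating code points beyond 16 bits.
console.log(str.codePointAt(1).toString(16)); // '1f680'
console.log(String.fromCodePoint(0x1F680) === '\u{1F680}'); // true

// With flag /u, the dot matches a whole code point instead of one code unit.
console.log(/^a.$/u.test(str)); // true
console.log(/^a.$/.test(str)); // false
```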

26.2 Escape sequences in ES6

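There are three parameterized escape sequences for representing characters in JavaScript: the hex escape \xHH (exactly two hexadecimal digits), the Unicode escape \uHHHH (exactly four hexadecimal digits), and the Unicode code point escape \u{···} (one or more hexadecimal digits).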
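As a small sketch, the same character can be written with each of the three escape sequences. The choice of the letter "z" (code point 0x7A) and the variable names are assumptions made for illustration.

```javascript
// Hex escape: exactly two hexadecimal digits.
const viaHex = '\x7A';

// Unicode escape: exactly four hexadecimal digits.
const viaUnicode = '\u007A';

// Unicode code point escape: one or more hexadecimal digits in braces.
const viaCodePoint = '\u{7A}';

console.log(viaHex === 'z'); // true
console.log(viaUnicode === 'z'); // true
console.log(viaCodePoint === 'z'); // true
```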

Unicode code point escapes are new in ES6. They let you specify code points beyond 16 bits. If you wanted to do that in ECMAScript 5, you had to encode each code point as two UTF-16 code units (a surrogate pair). These code units could be expressed via Unicode escapes. For example, the following statement logs a rocket (code point 0x1F680) to most consoles:

console.log('\uD83D\uDE80');

With a Unicode code point escape, you can directly specify code points that need more than 16 bits:

console.log('\u{1F680}');
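To make the relationship between the two notations concrete, the sketch below computes the surrogate pair for the rocket by hand and checks that both spellings produce the same string. The helper toSurrogatePair is hypothetical and only valid for code points above 0xFFFF.

```javascript
// Hypothetical helper: splits a code point above 0xFFFF into its two
// UTF-16 code units (high surrogate, low surrogate).
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10); // upper 10 bits
  const low = 0xDC00 + (offset & 0x3FF); // lower 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1F680);
console.log(high.toString(16), low.toString(16)); // d83d de80

// Both notations denote the same two code units.
const sameString = '\u{1F680}' === '\uD83D\uDE80';
console.log(sameString); // true
console.log(String.fromCharCode(high, low) === '\u{1F680}'); // true
```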

26.2.1 Where can escape sequences be used?

The escape sequences can be used in the following locations:

                             \uHHHH   \u{···}             \xHH
Identifiers                  yes      yes                 no
String literals              yes      yes                 yes
Template literals            yes      yes                 yes
Regular expression literals  yes      Only with flag /u   yes
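One way to probe which escapes are accepted in identifiers is to compile small snippets at runtime and see whether they parse. This is only a sketch: the helper parses is hypothetical, and it relies on new Function, which some environments (for example, pages with a strict Content Security Policy) disallow.

```javascript
// Hypothetical helper: reports whether a snippet parses as a function body.
function parses(source) {
  try {
    new Function(source); // compiles the snippet without running it
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// The doubled backslashes put a single backslash into the compiled source.
console.log(parses('var hell\\u006F;')); // true
console.log(parses('var hell\\u{6F};')); // true
console.log(parses('var hell\\x6F;')); // false: \xHH is not allowed in identifiers
```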

Identifiers:

> const hello = 123;
> hell\u{6F}
123

String literals:
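A minimal sketch for string literals, mirroring the other examples in this section (the snippet and its variable names are an illustration, not taken from the original text):

```javascript
// All three escape sequences are allowed in string literals.
const viaCodePointEscape = 'hell\u{6F}';
const viaUnicodeEscape = 'hell\u006F';
const viaHexEscape = 'hell\x6F';

console.log(viaCodePointEscape); // hello
console.log(viaUnicodeEscape === 'hello'); // true
console.log(viaHexEscape === 'hello'); // true
```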

Template literals:

> `hell\u{6F}` // cooked
'hello'
> String.raw`hell\u{6F}` // raw
'hell\\u{6F}'

Regular expressions:
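A minimal sketch for regular expression literals (an illustration, not taken from the original text). It also shows what happens without flag /u in engines that implement the web-compatibility grammar of the spec's Annex B, such as browsers and Node.js: there, \u followed by braces is not a code point escape.

```javascript
// With flag /u, \u{6F} is a code point escape for "o".
const withFlagU = /^hell\u{6F}$/u.test('hello');
console.log(withFlagU); // true

// Without flag /u, \u{6F} is not interpreted as "o".
const withoutFlagU = /^hell\u{6F}$/.test('hello');
console.log(withoutFlagU); // false

// Without flag /u, \u is just "u" and {3} is a quantifier.
const quantifiedU = /^\u{3}$/.test('uuu');
console.log(quantifiedU); // true
```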

26.2.2 Escape sequences in the ES6 spec

Various information:

26.2.2.1 Regular expressions

The spec distinguishes between BMP patterns (flag /u not set) and Unicode patterns (flag /u set). Sect. “Pattern Semantics” explains that they are handled differently and how.
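The practical difference between the two kinds of patterns can be sketched as follows: a BMP pattern works on UTF-16 code units, while a Unicode pattern works on code points. The test strings and variable names are assumptions chosen for illustration.

```javascript
const rocket = '\uD83D\uDE80'; // U+1F680 as a surrogate pair

// BMP pattern: a lone high surrogate matches the first half of the pair.
const bmpLoneSurrogate = /\uD83D/.test(rocket);
console.log(bmpLoneSurrogate); // true

// Unicode pattern: the pair is one atomic code point, so no match.
const unicodeLoneSurrogate = /\uD83D/u.test(rocket);
console.log(unicodeLoneSurrogate); // false

// BMP pattern: the quantifier applies to the last code unit only.
const bmpQuantifier = /^\uD83D\uDE80{2}$/.test('\uD83D\uDE80\uDE80');
console.log(bmpQuantifier); // true

// Unicode pattern: the quantifier applies to the whole code point.
const unicodeQuantifier = /^\uD83D\uDE80{2}$/u.test(rocket + rocket);
console.log(unicodeQuantifier); // true
```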

As a reminder, here is how grammar rules are parameterized in the spec:

You can see this parameterization in action in Sect. “Patterns”, where the subscript [U] creates separate grammars for BMP patterns and Unicode patterns:

Sect. “CharacterEscape” explains how various escape sequences are translated to characters (roughly: either code units or code points).
