c# - What is `\U`?

I came across the following task:

```csharp
using System;

string str1 = "\U0010FADE";
string str2 = "\U0000FADE";
Console.WriteLine(str1.Length);
Console.WriteLine(str2.Length);
```

It turns out this prints 2 and 1. What is going on here?

I only know the lowercase `\u`, which must be followed by exactly four hexadecimal digits.

MSDN does not list `\U` for `char`, which makes sense: the result is clearly not a single `char`.

For strings it is mentioned, but I still don't understand it:

The escape code `\udddd` (where `dddd` is a four-digit number) represents the Unicode character U+dddd. Eight-digit Unicode escape codes are also recognized: `\Udddddddd`.

Elsewhere it says that the eight-digit form is for generating surrogate pairs, but again without further explanation:

`\Uxxxxxxxx`: Unicode escape sequence for character with hex value xxxxxxxx (for generating surrogates)

So what does `\U` actually do, and why does the second string contain a single character rather than a surrogate pair?

I tried running it on ideone, but the characters it printed did not have the codes specified in the source. That may just be a quirk of ideone, though.

Answer 1, Authority 100%

The documentation is entirely correct. The `\Udddddddd` syntax simply inserts into the string literal the Unicode character whose code point is dddddddd. Depending on that code point, the character may need a surrogate pair (two UTF-16 code units), or it may be an ordinary character that occupies a single code unit.
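One way to confirm this, without depending on how a console (ideone's included) renders characters outside the Basic Multilingual Plane, is to print each UTF-16 code unit in hex. This is a minimal sketch using C# 9 top-level statements; the hex dump format is just an illustrative choice, while `char.IsSurrogatePair` and `char.ConvertToUtf32` are standard `System.Char` methods:

```csharp
using System;

// The two literals from the question.
string str1 = "\U0010FADE"; // code point above U+FFFF
string str2 = "\U0000FADE"; // code point inside the BMP

// Dump every UTF-16 code unit in hex, independent of console fonts/encodings.
foreach (string s in new[] { str1, str2 })
{
    Console.Write($"Length {s.Length}:");
    foreach (char c in s)
        Console.Write($" U+{(int)c:X4}");
    Console.WriteLine();
}
// Length 2: U+DBFE U+DEDE
// Length 1: U+FADE

// str1 holds a surrogate pair that decodes back to the original code point.
bool isPair = char.IsSurrogatePair(str1[0], str1[1]);
int codePoint = char.ConvertToUtf32(str1[0], str1[1]);
Console.WriteLine($"{isPair} U+{codePoint:X}"); // True U+10FADE
```

So the "two characters" in `str1` are really the high surrogate U+DBFE and the low surrogate U+DEDE, which together encode the single code point U+10FADE.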

ECMA-334

7.4.2 Unicode character escape sequences

A Unicode escape sequence represents a Unicode code point. Unicode escape sequences are processed in identifiers (§7.4.3), character literals (§7.4.5.5), and regular string literals (§7.4.5.6). A Unicode escape sequence is not processed in any other location (for example, to form an operator, punctuator, or keyword).

unicode-escape-sequence::
`\u` hex-digit hex-digit hex-digit hex-digit
`\U` hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit

A Unicode character escape sequence represents the single Unicode code point formed by the hexadecimal number following the “\u” or “\U” characters. Since C# uses a 16-bit encoding of Unicode code points in character and string values, a Unicode code point in the range U+10000 to U+10FFFF is represented using two Unicode surrogate code units. Unicode code points above U+FFFF are not permitted in character literals. Unicode code points above U+10FFFF are invalid and are not supported.
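The specification does not spell out how the two surrogate code units are computed. The sketch below uses the standard UTF-16 formula from the Unicode Standard (not from ECMA-334): subtract 0x10000, put the top 10 bits into a high surrogate (0xD800 base) and the bottom 10 bits into a low surrogate (0xDC00 base). `EncodeUtf16` is a hypothetical helper name for illustration; in real code `char.ConvertFromUtf32` does the same job.

```csharp
using System;

// Hypothetical helper: standard UTF-16 encoding of a single code point.
static string EncodeUtf16(int codePoint)
{
    // Reject values above U+10FFFF and lone surrogate code points.
    if (codePoint < 0 || codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
        throw new ArgumentOutOfRangeException(nameof(codePoint));

    if (codePoint <= 0xFFFF)
        return ((char)codePoint).ToString(); // BMP: one code unit

    int v = codePoint - 0x10000;             // 20-bit value
    char high = (char)(0xD800 + (v >> 10));  // top 10 bits
    char low = (char)(0xDC00 + (v & 0x3FF)); // bottom 10 bits
    return new string(new[] { high, low });
}

string manual = EncodeUtf16(0x10FADE);
Console.WriteLine(manual == "\U0010FADE");                    // True
Console.WriteLine(manual == char.ConvertFromUtf32(0x10FADE)); // True
Console.WriteLine(EncodeUtf16(0xFADE) == "\U0000FADE");       // True
```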

In the first case the code point (U+10FADE) lies in the range U+10000 to U+10FFFF, so it is represented by two code units. In the second case (U+FADE) it is below U+10000, so it takes only one.
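To see the difference between code units and code points directly, here is a sketch that assumes .NET Core 3.0 or later, where `System.Text.Rune` and `string.EnumerateRunes()` are available. `string.Length` counts UTF-16 code units, while enumerating runes counts whole code points:

```csharp
using System;
using System.Linq;
using System.Text;

string str1 = "\U0010FADE";
string str2 = "\U0000FADE";

// Length = UTF-16 code units; EnumerateRunes() yields whole code points.
int units1 = str1.Length, points1 = str1.EnumerateRunes().Count();
int units2 = str2.Length, points2 = str2.EnumerateRunes().Count();

Console.WriteLine($"str1: Length = {units1}, code points = {points1}"); // 2, 1
Console.WriteLine($"str2: Length = {units2}, code points = {points2}"); // 1, 1

// Rune.Utf16SequenceLength reports how many code units a code point needs.
Console.WriteLine(new Rune(0x10FADE).Utf16SequenceLength); // 2
Console.WriteLine(new Rune(0xFADE).Utf16SequenceLength);   // 1
```

Both strings contain exactly one code point; only the first one needs two code units to store it.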

In other words, writing `\U0000FADE` is equivalent to `\uFADE`, not to `\u0000\uFADE` as it might seem at first glance (the latter really does consist of two code units).
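A short check of that equivalence, as a minimal sketch with top-level statements: the leading zeros of the eight-digit form are just part of one hex number, whereas `\u0000\uFADE` is two separate escapes, a NUL character followed by U+FADE.

```csharp
using System;

string a = "\U0000FADE";   // eight-digit form with leading zeros
string b = "\uFADE";       // four-digit form
string c = "\u0000\uFADE"; // two escapes: U+0000 followed by U+FADE

Console.WriteLine(a == b);    // True
Console.WriteLine(a == c);    // False
Console.WriteLine(c.Length);  // 2
Console.WriteLine((int)c[0]); // 0
```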
