Text to Unicode Escape Sequence Converter - Online

Convert text to Unicode escape sequences and decode them back. Supports JavaScript \uXXXX, ES6 \u{XXXXX}, HTML entities, and more.

Frequently Asked Questions

A Unicode escape sequence represents a Unicode character using only ASCII characters. It is commonly used in programming languages, JSON data, and HTML when direct Unicode input isn't available or practical. For example, the Chinese character 世 (U+4E16) can be written as \u4E16 in JavaScript, U+4E16 in Unicode notation, or &#x4E16; in HTML. Escape sequences keep text portable across different encodings and systems.

In JavaScript, \uXXXX represents a UTF-16 code unit. For characters in the Basic Multilingual Plane (BMP, U+0000 to U+FFFF), a single \uXXXX suffices. For characters above U+FFFF (like emoji 😀 at U+1F600), JavaScript uses surrogate pairs — two consecutive \uXXXX sequences: \uD83D\uDE00. ES6 introduced the extended \u{XXXXX} syntax (note the curly braces) that directly accepts the full code point, e.g., \u{1F600}, making it much easier to work with supplementary characters.
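The difference between the two JavaScript forms can be sketched in a few lines. This is a minimal illustration, not the converter's actual code; the function names toJsEscapes and toEs6Escapes are hypothetical.

```javascript
// Hypothetical helper: encode every UTF-16 code unit as \uXXXX.
// Supplementary characters come out as two escapes (a surrogate pair).
function toJsEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i); // one 16-bit code unit
    out += "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");
  }
  return out;
}

// Hypothetical helper: encode every code point as \u{X...} (ES6 syntax).
function toEs6Escapes(text) {
  let out = "";
  for (const ch of text) { // for...of iterates by code point, not code unit
    out += "\\u{" + ch.codePointAt(0).toString(16).toUpperCase() + "}";
  }
  return out;
}

console.log(toJsEscapes("\u{1F600}"));  // \uD83D\uDE00
console.log(toEs6Escapes("\u{1F600}")); // \u{1F600}
```

The first function walks code units and the second walks code points, which is why only the first one produces a surrogate pair for 😀.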

\uXXXX is a programming language syntax used in JavaScript, Java, C#, and JSON to embed Unicode characters in source code or data. U+XXXX is the Unicode Standard notation used in documentation, character charts, and specifications to identify a code point. U+XXXX is not valid syntax in most programming languages — it's purely a human-readable reference format. For example, the character "A" is U+0041 in Unicode notation but \u0041 in JavaScript.
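Because U+XXXX is only a notation, code has to produce or parse it explicitly. The sketch below shows one possible way to do that in JavaScript; toUPlus and fromUPlus are hypothetical names, and the parser assumes 4 to 6 hex digits per code point with at most one whitespace character between entries.

```javascript
// Hypothetical helper: list the code points of a string in U+XXXX notation.
function toUPlus(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

// Hypothetical helper: turn U+XXXX notation back into text.
// One optional whitespace separator after each entry is consumed.
// Values above 10FFFF make String.fromCodePoint throw a RangeError.
function fromUPlus(notation) {
  return notation.replace(/U\+([0-9A-Fa-f]{4,6})\s?/g, (_, hex) =>
    String.fromCodePoint(parseInt(hex, 16))
  );
}

console.log(toUPlus("A"));        // U+0041
console.log(fromUPlus("U+0041")); // A
console.log("\u0041" === "A");    // true: \u0041 is real JavaScript syntax
```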

Characters with code points above U+FFFF (called supplementary characters) include most emoji, many CJK characters, and historic scripts. In UTF-16 based systems (JavaScript, Java, Windows), these are encoded as surrogate pairs: two 16-bit code units. For example, 😀 (U+1F600) becomes \uD83D\uDE00. Our tool automatically handles surrogate pairs for the \uXXXX format. For the ES6 \u{XXXXX} and U+ formats, the full code point is used directly. HTML entities (&#x1F600; or &#128512;) also support the full code point natively.
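Decoding works the same way in reverse. The following is a rough sketch of a decoder that accepts both JavaScript forms; it is an assumption about how such a decoder could look, not the tool's implementation, and decodeJsEscapes is a hypothetical name.

```javascript
// Hypothetical helper: decode \u{X...} and \uXXXX escapes back to text.
// Two consecutive \uXXXX escapes that form a surrogate pair become two
// adjacent code units in the result, which together are one character.
// A braced value above 10FFFF makes String.fromCodePoint throw a RangeError.
function decodeJsEscapes(escaped) {
  return escaped.replace(
    /\\u\{([0-9A-Fa-f]{1,6})\}|\\u([0-9A-Fa-f]{4})/g,
    (match, braced, plain) =>
      braced !== undefined
        ? String.fromCodePoint(parseInt(braced, 16)) // full code point
        : String.fromCharCode(parseInt(plain, 16))   // single code unit
  );
}

console.log(decodeJsEscapes("\\uD83D\\uDE00") === "\u{1F600}"); // true
console.log(decodeJsEscapes("\\u{1F600}") === "\u{1F600}");     // true
```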

Surrogate pairs are a mechanism in UTF-16 encoding to represent characters outside the BMP (U+10000 to U+10FFFF). A surrogate pair consists of a high surrogate (U+D800 to U+DBFF) and a low surrogate (U+DC00 to U+DFFF). Together they encode a single supplementary code point. This matters because naively counting characters or splitting strings can break surrogate pairs, producing invalid results. Our converter properly handles surrogate pairs when encoding to \uXXXX format and when decoding them back to text.
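The arithmetic behind a surrogate pair is short enough to write out. The sketch below follows the standard UTF-16 calculation: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The function names are hypothetical.

```javascript
// Hypothetical helper: split a supplementary code point into [high, low].
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("not a supplementary code point");
  }
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

// Hypothetical helper: combine a high and low surrogate into a code point.
function fromSurrogatePair(high, low) {
  return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

const [high, low] = toSurrogatePair(0x1F600);
console.log(high.toString(16), low.toString(16)); // d83d de00

// The pitfall mentioned above: length counts code units, not characters.
console.log("\u{1F600}".length);      // 2
console.log([..."\u{1F600}"].length); // 1
```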

HTML supports two types of numeric character references: hexadecimal (&#xXXXX;) and decimal (&#DDDD;). Both refer to Unicode code points and can represent any character, including emoji. For example, the euro sign € can be written as &#x20AC; or &#8364;. Unlike \uXXXX, HTML entities use the full code point (not UTF-16 code units), so supplementary characters are straightforward: 😀 is &#x1F600; or &#128512;. Our converter supports both HTML entity formats.
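Both reference forms can be generated from the code point alone, as this small sketch shows. The function names toHtmlHex and toHtmlDecimal are hypothetical, and the sketch escapes every character rather than only non-ASCII ones.

```javascript
// Hypothetical helper: hexadecimal numeric character references.
function toHtmlHex(text) {
  return Array.from(text, (ch) =>
    "&#x" + ch.codePointAt(0).toString(16).toUpperCase() + ";"
  ).join("");
}

// Hypothetical helper: decimal numeric character references.
function toHtmlDecimal(text) {
  return Array.from(text, (ch) => "&#" + ch.codePointAt(0) + ";").join("");
}

console.log(toHtmlHex("\u20AC"));        // &#x20AC;
console.log(toHtmlDecimal("\u20AC"));    // &#8364;
console.log(toHtmlHex("\u{1F600}"));     // &#x1F600;
console.log(toHtmlDecimal("\u{1F600}")); // &#128512;
```

Array.from iterates by code point, so no surrogate handling is needed here.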

CSS also supports Unicode escapes: a backslash followed by the hexadecimal code point, \XXXX (1 to 6 hex digits). This is useful in content properties and selectors. For example, content: '\2605'; displays a ★ star. Supplementary characters need 5 or 6 hex digits, for example \1F600. Unlike JavaScript's \uXXXX, CSS doesn't use the u prefix. If the character after a short escape is itself a hex digit, add a single space after the escape or pad it to 6 digits so the next character isn't read as part of it. CSS escapes are distinct from the formats in this tool, but the underlying code points are the same, so you can use the U+ format from this converter as a reference.
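Since the converter itself does not emit CSS escapes, a CSS form can be derived from the code point with a few lines of JavaScript. This is a sketch under one assumption: every escape is padded to six hex digits, which is the maximum CSS reads, so a following hex-digit character cannot be absorbed. The name toCssEscapes is hypothetical.

```javascript
// Hypothetical helper: CSS escapes padded to 6 hex digits.
// Padding avoids the need for a terminating space after each escape.
function toCssEscapes(text) {
  return Array.from(text, (ch) =>
    "\\" + ch.codePointAt(0).toString(16).toUpperCase().padStart(6, "0")
  ).join("");
}

console.log(toCssEscapes("\u2605"));    // \002605
console.log(toCssEscapes("\u{1F600}")); // \01F600
```

The shorter form '\2605' from the example above is equivalent; the padded form is just safer when more text follows the escape.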