String to Unicode Code Points - Online Full Decoder

A full decoder that converts any string to Unicode code points, with detailed encoding information for each character. Supports all Unicode planes, including emoji.

Input String
Hello 中文 😀🎉 Café αβγ
Unicode Code Points
Summary: total chars, code points, BMP chars, supplementary chars, UTF-8 bytes, UTF-16 units
Character Details
Columns: #, Char, Code Point (Hex), Decimal, UTF-8 Encoding, UTF-16 Encoding, Plane, Category
Reverse: Unicode Code Points → String
Auto-detects formats: U+XXXX, \uXXXX, &#XXXX;, 0xXXXX, hex, decimal
Frequently Asked Questions

What is a Unicode code point?

A Unicode code point is a unique number that the Unicode standard assigns to a character. Code points range from U+0000 to U+10FFFF, giving 1,114,112 possible values (just over 1.1 million), not all of which are assigned yet. Each assigned code point identifies a specific character, symbol, or control code, from basic Latin letters (U+0041 for 'A') to emoji (U+1F600 for 😀). Code points are conventionally written in hexadecimal with a "U+" prefix and at least four digits.
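To make the definition concrete, here is a minimal Python sketch. Python is a convenient choice because its strings are sequences of code points, so `ord()` and `chr()` map directly between a character and its number, even for emoji. The helper names `code_point_info` and `from_code_point` are illustrative and not part of this tool.

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode defines


def code_point_info(ch: str) -> tuple[str, int, int]:
    """Return (U+ notation, decimal value, plane) for a single code point."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    cp = ord(ch)       # ord() returns the code point, even for emoji
    plane = cp >> 16   # each plane spans 0x10000 code points
    return f"U+{cp:04X}", cp, plane


def from_code_point(cp: int) -> str:
    """Inverse of ord(): build the character for a code point."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"{cp:#x} is outside U+0000..U+10FFFF")
    return chr(cp)


print(code_point_info("A"))           # ('U+0041', 65, 0)
print(code_point_info("\U0001F600"))  # ('U+1F600', 128512, 1)
print(from_code_point(0x41))          # A
print(MAX_CODE_POINT + 1)             # 1114112 possible values
```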

How do I convert a string to Unicode code points?

Paste or type your text into the input box above, and the tool instantly converts each character to its Unicode code point. Use the format dropdown to choose the output notation: U+XXXX (standard), \uXXXX (JavaScript/JSON), &#XXXX; (HTML entity), 0xXXXX (hex literal), or plain decimal. All Unicode planes are supported, including emoji (e.g., 😀 → U+1F600).
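As a rough sketch of the forward conversion, the Python below lists the code points of a string in U+XXXX notation and computes totals similar to the ones the tool reports (code points, BMP vs. supplementary, UTF-8 bytes, UTF-16 units). The function names are hypothetical, and this is not the tool's own implementation.

```python
def to_u_notation(text: str) -> str:
    """Space-separated U+XXXX list, one entry per code point."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def string_stats(text: str) -> dict[str, int]:
    """Totals similar to the tool's summary line."""
    cps = [ord(ch) for ch in text]  # iterating a Python str yields code points
    bmp = sum(1 for cp in cps if cp <= 0xFFFF)
    return {
        "code_points": len(cps),
        "bmp": bmp,
        "supplementary": len(cps) - bmp,
        "utf8_bytes": len(text.encode("utf-8")),
        # UTF-16-LE adds no byte-order mark, so bytes / 2 = 16-bit code units
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


sample = "Caf\u00e9 \U0001F600"  # "Café 😀" with a precomposed é
print(to_u_notation(sample))     # U+0043 U+0061 U+0066 U+00E9 U+0020 U+1F600
print(string_stats(sample))
# {'code_points': 6, 'bmp': 5, 'supplementary': 1, 'utf8_bytes': 10, 'utf16_units': 7}
```

A decomposed é (a plain e followed by U+0301 COMBINING ACUTE ACCENT) would count as two code points here, even though it displays as one character.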

What is the difference between Unicode, UTF-8, and UTF-16?

Unicode is the standard that maps characters to code points (abstract numbers). UTF-8 and UTF-16 are encoding schemes that turn those code points into actual bytes for storage or transmission. UTF-8 uses 1 to 4 bytes per code point; it is variable-length and ASCII-compatible, which makes it well suited to the web and email. UTF-16 uses one 16-bit code unit (2 bytes) for code points up to U+FFFF and two code units (4 bytes, a surrogate pair) for anything above; it is used internally by JavaScript, Java, and Windows. The character details table shows both encodings for every character.
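The UTF-8 rules can be written out by hand in a few lines. This Python sketch (the `utf8_encode` name is ours) encodes one code point using the 1- to 4-byte bit patterns, checks the result against Python's built-in codec, and prints the UTF-16 form from the codec for comparison. Unlike a strict encoder, it does not reject lone surrogates (U+D800–U+DFFF).

```python
def utf8_encode(cp: int) -> bytes:
    """Hand-rolled UTF-8 encoder for one code point (no surrogate check)."""
    if cp < 0x80:      # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


for ch in "A\u00e9\u4e2d\U0001F600":  # A, é, 中, 😀
    cp = ord(ch)
    u8 = utf8_encode(cp)
    assert u8 == ch.encode("utf-8")   # cross-check with Python's codec
    u16 = ch.encode("utf-16-be")      # big-endian, no byte-order mark
    print(f"U+{cp:04X}  UTF-8: {u8.hex(' ').upper()}  UTF-16: {u16.hex(' ', 2).upper()}")

# U+0041  UTF-8: 41  UTF-16: 0041
# U+00E9  UTF-8: C3 A9  UTF-16: 00E9
# U+4E2D  UTF-8: E4 B8 AD  UTF-16: 4E2D
# U+1F600  UTF-8: F0 9F 98 80  UTF-16: D83D DE00
```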

How do I convert Unicode code points back to a string?

Use the reverse conversion section above. Paste your code points in any common format (U+XXXX, \uXXXX, &#XXXX;, 0xXXXX, decimal, or plain hex) and click "Convert to String". The tool auto-detects the format and reconstructs the original text. It also combines surrogate pairs (e.g., \uD83D\uDE00 → 😀) and accepts ES6-style \u{1F600} notation, which is useful for decoding Unicode escapes copied from JSON, HTML, or source code.
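One way to sketch the reverse direction in Python: a regular expression recognises the prefixed notations, and a second pass merges high/low surrogate pairs into single characters. The pattern and function names are illustrative assumptions, not how this tool is implemented. Bare hex or decimal tokens are deliberately left out, because a token like `65` is ambiguous without a format hint.

```python
import re

# One alternative per prefixed notation; order matters (\u{...} before \uXXXX).
TOKEN = re.compile(
    r"\\u\{([0-9A-Fa-f]{1,6})\}"   # ES6 \u{1F600}
    r"|\\u([0-9A-Fa-f]{4})"        # \uXXXX (may be half of a surrogate pair)
    r"|U\+([0-9A-Fa-f]{4,6})"      # U+XXXX
    r"|&#[xX]([0-9A-Fa-f]+);"      # &#xXXXX; (HTML hex entity)
    r"|&#([0-9]+);"                # &#DDDD;  (HTML decimal entity)
    r"|0[xX]([0-9A-Fa-f]+)"        # 0xXXXX   (hex literal)
)


def parse_code_points(text: str) -> list[int]:
    """Extract a code point value from every recognised token in text."""
    values = []
    for m in TOKEN.finditer(text):
        es6, u4, uplus, html_hex, html_dec, hex_lit = m.groups()
        if html_dec is not None:
            values.append(int(html_dec))
        else:
            values.append(int(es6 or u4 or uplus or html_hex or hex_lit, 16))
    return values


def code_points_to_string(text: str) -> str:
    """Decode tokens to text, merging high+low surrogate pairs."""
    cps = parse_code_points(text)
    out = []
    i = 0
    while i < len(cps):
        cp = cps[i]
        nxt = cps[i + 1] if i + 1 < len(cps) else None
        if 0xD800 <= cp <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            cp = 0x10000 + ((cp - 0xD800) << 10) + (nxt - 0xDC00)
            i += 1  # the low surrogate has been consumed
        out.append(chr(cp))  # raises ValueError above U+10FFFF
        i += 1
    return "".join(out)


print(code_points_to_string(r"\uD83D\uDE00 U+0041 &#65; &#x42; 0x43 \u{1F389}"))
# 😀AABC🎉
```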

What are surrogate pairs?

Surrogate pairs are the mechanism UTF-16 uses to represent code points above U+FFFF (supplementary characters such as emoji, rare CJK ideographs, and historic scripts). Each such character needs two 16-bit code units: a high surrogate (U+D800–U+DBFF) followed by a low surrogate (U+DC00–U+DFFF). For example, 😀 (U+1F600) is encoded as \uD83D\uDE00 in UTF-16. The tool handles surrogate pairs in both directions: forward conversion shows the actual code point, and reverse conversion combines the pair back into the original character.
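The surrogate arithmetic is short enough to write out. In this Python sketch (function names are illustrative), subtracting 0x10000 from the code point leaves a 20-bit value; its top 10 bits are added to 0xD800 to form the high surrogate, and its bottom 10 bits are added to 0xDC00 to form the low surrogate. The last line cross-checks the result against Python's own UTF-16 codec.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF needs a surrogate pair")
    v = cp - 0x10000             # 20-bit value
    high = 0xD800 + (v >> 10)    # top 10 bits    -> U+D800..U+DBFF
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> U+DC00..U+DFFF
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a high/low surrogate pair back into the code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1F600)
print(f"\\u{high:04X}\\u{low:04X}")         # \uD83D\uDE00
print(f"U+{from_surrogates(high, low):X}")  # U+1F600

# Cross-check against Python's UTF-16 encoder (big-endian, no BOM).
assert "\U0001F600".encode("utf-16-be") == bytes.fromhex(f"{high:04X}{low:04X}")
```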

How are emoji represented in Unicode?

Most emoji live in the Supplementary Multilingual Plane (SMP, Plane 1), largely within U+1F300 to U+1FAFF; some older ones, such as ☀ (U+2600), are in the Basic Multilingual Plane. For instance, 🐱 is U+1F431, 💖 is U+1F496, and 🎉 is U+1F389. Some emoji, like 👨‍👩‍👧‍👦, are sequences of several code points joined by Zero Width Joiners (ZWJ, U+200D). The tool lists each code point in such a sequence separately, so you can see exactly how the emoji is composed.
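To inspect a ZWJ sequence yourself, this Python sketch iterates the family emoji one code point at a time and looks up each name with the standard `unicodedata` module. Names come from the Unicode version bundled with your Python, so very recent emoji may fall back to the `<unnamed>` default. The plane is computed as `cp >> 16`.

```python
import unicodedata

# The family emoji spelled out: MAN, ZWJ, WOMAN, ZWJ, GIRL, ZWJ, BOY
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

print(len(family))  # 7 code points, usually rendered as a single emoji
for ch in family:   # iterating a str yields one code point at a time
    cp = ord(ch)
    name = unicodedata.name(ch, "<unnamed>")  # depends on Python's Unicode version
    print(f"U+{cp:04X}  plane {cp >> 16}  {name}")
# U+1F468  plane 1  MAN
# U+200D  plane 0  ZERO WIDTH JOINER
# U+1F469  plane 1  WOMAN
# U+200D  plane 0  ZERO WIDTH JOINER
# U+1F467  plane 1  GIRL
# U+200D  plane 0  ZERO WIDTH JOINER
# U+1F466  plane 1  BOY
```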

Which Unicode notation formats are commonly used?

Different programming environments use different notations:
U+0041: standard Unicode notation (used in documentation and character tables)
\u0041: JavaScript/JSON/Java escape (exactly 4 hex digits, so characters above U+FFFF need a surrogate pair such as \uD83D\uDE00)
\u{1F600}: ES6 JavaScript extended escape (supports all planes)
&#65;: HTML decimal entity
&#x41;: HTML hexadecimal entity
0x0041: hexadecimal literal (common in C, C++, and Rust)
Our tool supports all these formats for both encoding and decoding.
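For reference, this Python sketch renders one code point in each of the notations listed above. The `notations` helper and its key names are hypothetical. For supplementary characters it emits a surrogate pair in the `\uXXXX` form, which is what JSON and Java source expect.

```python
def notations(cp: int) -> dict[str, str]:
    """Render one code point in several common notations."""
    if cp <= 0xFFFF:
        js = f"\\u{cp:04X}"
    else:
        # \uXXXX holds only 4 hex digits, so split into a surrogate pair
        v = cp - 0x10000
        js = f"\\u{0xD800 + (v >> 10):04X}\\u{0xDC00 + (v & 0x3FF):04X}"
    return {
        "unicode": f"U+{cp:04X}",
        "js_escape": js,
        "es6_escape": f"\\u{{{cp:X}}}",
        "html_decimal": f"&#{cp};",
        "html_hex": f"&#x{cp:X};",
        "hex_literal": f"0x{cp:04X}",
    }


for cp in (0x41, 0x1F600):
    print("  ".join(notations(cp).values()))
# U+0041  \u0041  \u{41}  &#65;  &#x41;  0x0041
# U+1F600  \uD83D\uDE00  \u{1F600}  &#128512;  &#x1F600;  0x1F600
```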