Text to ASCII & Unicode Code Points - Online Full Converter

Text to ASCII & Unicode Code Points Converter

Instantly convert any text into decimal, hexadecimal, octal, binary, and U+ notation code points. Full bidirectional conversion with real-time updates.
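As a rough illustration of the forward conversion described above, the sketch below turns a string into decimal, hexadecimal, octal, and binary code points. The function name `toBases` and the shape of the returned rows are assumptions made for this example; they are not taken from the tool's source.

```javascript
// Illustrative sketch only; `toBases` is a hypothetical name, not the tool's code.
function toBases(text) {
  // Spreading a string walks it by code point, not by UTF-16 code unit,
  // so a supplementary character yields one entry rather than two.
  return [...text].map((ch) => {
    const cp = ch.codePointAt(0);
    return {
      char: ch,
      decimal: cp.toString(10),
      hex: cp.toString(16).toUpperCase(),
      octal: cp.toString(8),
      binary: cp.toString(2),
    };
  });
}

const rows = toBases("Hi");
console.log(rows.map((r) => r.decimal).join(" ")); // 72 105
console.log(rows.map((r) => r.hex).join(" "));     // 48 69
```

Each row keeps the character next to its numeric forms, which is one plausible way to feed both the per-base output boxes and a detail table.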

Full Unicode · Emoji Ready

Statistics bar: ASCII, Extended BMP, Supplementary, Total chars, UTF-8 bytes (enter text to see statistics)

Output formats:
Decimal: base-10 code points
Hexadecimal: base-16 code points
Octal: base-8 code points
Binary: base-2 code points
U+ notation: standard U+XXXX format (4 to 6 hex digits)

Character Detail Breakdown (table columns): Char, Unicode Name / Info, Decimal, Hex, Octal, Binary, U+ Notation, UTF-8 Bytes

Reverse Conversion: Code Points → Text. Paste code points to convert them back to text.

Frequently Asked Questions

A code point is a numerical identifier assigned to each character in a character set. ASCII (American Standard Code for Information Interchange) defines 128 characters (0–127), covering basic Latin letters, digits, punctuation, and control codes. Unicode extends this to a code space of 1,114,112 possible code points (U+0000 to U+10FFFF), only a fraction of which is currently assigned, encompassing virtually all writing systems, symbols, and emoji. ASCII is a subset of Unicode: every ASCII character has the same code point in both systems.

U+ notation is the standard way to write Unicode code points: the prefix "U+" followed by 4 to 6 hexadecimal digits (e.g., U+0041 for 'A', U+1F600 for 😀). Plain hexadecimal is the raw base-16 number (e.g., 41 or 0x41). U+ notation is used throughout Unicode documentation and is zero-padded to at least 4 digits; supplementary characters (code points above U+FFFF) need 5 or 6 digits.
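The padding rule can be sketched in a few lines. `formatUPlus` is a hypothetical helper written for this example, and the range check reflects the Unicode code space rather than anything known about the tool's internals.

```javascript
// Hypothetical helper: formats a code point in U+ notation.
function formatUPlus(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff) {
    throw new RangeError(`Not a Unicode code point: ${cp}`);
  }
  // Uppercase hex, zero-padded to at least 4 digits; longer values are left as-is.
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

console.log(formatUPlus(0x41));    // U+0041
console.log(formatUPlus(0x1f600)); // U+1F600
console.log((0x41).toString(16));  // 41 (plain hexadecimal, no prefix or padding)
```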

The reverse converter accepts code points in several formats: decimal (72 101 108), hexadecimal (0x48 0x65 or just 48 65), octal (0o110 0o145), binary (0b1001000 0b1100101), or U+ notation (U+0048 U+0065). The tool auto-detects the prefixes 0x, U+, 0o, and 0b; use the Input Interpretation dropdown to specify how bare numbers without a prefix should be interpreted. Valid code points range from 0 to 1,114,111 (0x10FFFF).
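A minimal sketch of how such a reverse conversion could work is shown below. It is not the tool's implementation: the function names are hypothetical, the `defaultBase` parameter stands in for the Input Interpretation dropdown, and splitting on whitespace or commas is an assumption.

```javascript
// Hypothetical parser: a recognised prefix decides the base; otherwise defaultBase applies.
function parseCodePoint(token, defaultBase = 10) {
  const prefixes = [[/^U\+/i, 16], [/^0x/i, 16], [/^0o/i, 8], [/^0b/i, 2]];
  let digits = token.trim();
  let base = defaultBase;
  for (const [pattern, prefixBase] of prefixes) {
    if (pattern.test(digits)) {
      digits = digits.replace(pattern, "");
      base = prefixBase;
      break;
    }
  }
  // Reject stray characters up front, because parseInt silently ignores trailing junk.
  const allowed = { 2: /^[01]+$/, 8: /^[0-7]+$/, 10: /^[0-9]+$/, 16: /^[0-9a-f]+$/i }[base];
  if (!allowed || !allowed.test(digits)) {
    throw new SyntaxError(`Cannot parse "${token}" as base ${base}`);
  }
  const cp = parseInt(digits, base);
  if (cp > 0x10ffff) {
    throw new RangeError(`${token} is above U+10FFFF`);
  }
  return cp;
}

// Assumption: tokens are separated by whitespace and/or commas.
function codePointsToText(input, defaultBase = 10) {
  const tokens = input.split(/[\s,]+/).filter(Boolean);
  return String.fromCodePoint(...tokens.map((t) => parseCodePoint(t, defaultBase)));
}

console.log(codePointsToText("72 101 108 108 111")); // Hello
console.log(codePointsToText("48 65", 16));          // He
console.log(codePointsToText("U+0048 0o145"));       // He
```

Checking prefixes before applying the default base matters when bare numbers are read as hexadecimal, since a token such as 0b1 would otherwise also be a valid hex number.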

UTF-8 is a variable-length encoding. Characters in the ASCII range (U+0000–U+007F) use 1 byte. Characters in U+0080–U+07FF (including most Latin extensions, Greek, Cyrillic) use 2 bytes. Characters in U+0800–U+FFFF (the most commonly used CJK ideographs, many symbols) use 3 bytes. Supplementary characters (U+10000–U+10FFFF), including most emoji, use 4 bytes; some older emoji such as ☺ (U+263A) sit in the BMP and use 3 bytes. This tool shows the exact UTF-8 byte count for each character in the detail table.
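The four ranges map directly onto a small lookup. The sketch below uses hypothetical function names and cross-checks the result against the standard `TextEncoder` API, which is available in browsers and in Node.js.

```javascript
// Number of UTF-8 bytes needed for one code point, following the ranges above.
function utf8Length(cp) {
  if (cp <= 0x7f) return 1;
  if (cp <= 0x7ff) return 2;
  if (cp <= 0xffff) return 3;
  return 4;
}

// Total UTF-8 size of a string, iterating by code point.
function utf8ByteCount(text) {
  let total = 0;
  for (const ch of text) {
    total += utf8Length(ch.codePointAt(0));
  }
  return total;
}

const sample = "A\u00E9\u4E2D\u{1F600}"; // A, é, 中, 😀
console.log(utf8ByteCount(sample));                    // 10 (1 + 2 + 3 + 4)
console.log(new TextEncoder().encode(sample).length);  // 10
```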

Converting text to code points is essential for programming, debugging, data encoding, and internationalization. Developers use code points to understand character encoding issues, generate escape sequences, debug text processing pipelines, and ensure proper Unicode handling in software. Web developers use them for HTML entities and CSS escapes. Linguists and researchers use code points to identify specific characters in large character sets. The tool is also valuable for learning about character sets and encoding.

The tool supports the full Unicode range. It uses JavaScript's native String.prototype.codePointAt() and String.fromCodePoint() methods, which cover every code point from U+0000 to U+10FFFF. This includes rare scripts, historic scripts, mathematical symbols, musical notation, and all emoji. Emoji built from several code points, such as skin tone variants and ZWJ sequences, are supported as well, but each component of the sequence is listed as a separate code point. The tool correctly handles supplementary plane characters, which require surrogate pairs in UTF-16.
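The following sketch shows why code-point-aware methods matter in JavaScript, whose strings are sequences of UTF-16 code units. `codePointsOf` is a hypothetical helper for this example; the built-in methods it calls are the ones named above.

```javascript
// Hypothetical helper: Array.from iterates a string by code point.
function codePointsOf(text) {
  return Array.from(text, (ch) => ch.codePointAt(0));
}

const grin = String.fromCodePoint(0x1f600); // 😀
console.log(grin.length);                   // 2 (two UTF-16 code units)
console.log(grin.charCodeAt(0).toString(16)); // d83d (high surrogate)
console.log(grin.charCodeAt(1).toString(16)); // de00 (low surrogate)
console.log(codePointsOf(grin).length);     // 1 (a single code point, 0x1F600)

// A ZWJ sequence: WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER renders as one emoji,
// but it is still three separate code points.
const technologist = "\u{1F469}\u200D\u{1F4BB}";
console.log(codePointsOf(technologist).map((cp) => cp.toString(16)).join(" ")); // 1f469 200d 1f4bb
```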

Control characters (U+0000–U+001F and U+007F–U+009F) have no printable glyph, so the detail table shows their standard names or abbreviations instead (e.g., LF for Line Feed U+000A, CR for Carriage Return U+000D, TAB for Horizontal Tabulation U+0009). This prevents layout issues and keeps the output readable. The code point output sequences still show their numerical values correctly.
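One way to produce such labels is a small lookup keyed by code point. Everything in the sketch below is an assumption for illustration: the table lists only a handful of abbreviations, and the U+ fallback for unlisted control codes is invented here, not taken from the tool.

```javascript
// Hypothetical label table covering only a few control codes.
const CONTROL_LABELS = new Map([
  [0x00, "NUL"],
  [0x09, "TAB"],
  [0x0a, "LF"],
  [0x0d, "CR"],
  [0x1b, "ESC"],
  [0x7f, "DEL"],
]);

// True for the C0 range, DEL, and the C1 range.
function isControl(cp) {
  return cp <= 0x1f || (cp >= 0x7f && cp <= 0x9f);
}

// Returns a printable label for a single character.
function displayLabel(ch) {
  const cp = ch.codePointAt(0);
  if (!isControl(cp)) return ch;
  // Assumed fallback: show U+ notation when no abbreviation is listed.
  return CONTROL_LABELS.get(cp) ?? "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

console.log([..."a\tb\n"].map(displayLabel).join(" ")); // a TAB b LF
```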

The statistics bar uses a color-coded system to help you quickly understand the character composition of your text: ● Green marks ASCII characters (U+0000–U+007F, the original 128 characters), ● Blue marks Extended BMP characters (U+0080–U+FFFF, covering most world scripts and symbols), and ● Purple marks Supplementary plane characters (U+10000–U+10FFFF, including emoji and rare historic scripts). This visual breakdown is helpful for developers assessing encoding requirements.
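The three categories reduce to two range comparisons per code point. The sketch below is an assumed reconstruction of that counting logic; the function names and the keys of the result object are hypothetical.

```javascript
// Classify a code point into the three ranges used by the statistics bar.
function classify(cp) {
  if (cp <= 0x7f) return "ascii";
  if (cp <= 0xffff) return "extendedBmp";
  return "supplementary";
}

// Count characters per category, iterating by code point.
function statistics(text) {
  const stats = { ascii: 0, extendedBmp: 0, supplementary: 0, total: 0 };
  for (const ch of text) {
    stats[classify(ch.codePointAt(0))] += 1;
    stats.total += 1;
  }
  return stats;
}

const counts = statistics("A\u00E9\u4E2D\u{1F600}"); // A, é, 中, 😀
console.log(counts.ascii, counts.extendedBmp, counts.supplementary, counts.total); // 1 2 1 4
```

A nonzero supplementary count signals that the text needs 4-byte UTF-8 sequences and surrogate pairs in UTF-16, which is the kind of encoding requirement the breakdown is meant to surface.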