Unicode Normalization Tester - Online See NFC vs NFD

Unicode Normalization Tester

Compare NFC, NFD, NFKC & NFKD — see how Unicode normalization transforms your text at the character level.

Frequently Asked Questions

Unicode normalization is a process that transforms Unicode text into a consistent, canonical form. Since the same visual character can often be represented in multiple ways in Unicode (e.g., "é" can be a single code point U+00E9 or a decomposed sequence U+0065 + U+0301), normalization ensures that equivalent strings compare and behave identically. This is crucial for search, indexing, password validation, file systems, and data interchange.
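The comparison problem described above can be reproduced in a few lines. This is a minimal sketch using the built-in `String.prototype.normalize()`; the variable names are illustrative only, and the characters are written as `\u` escapes so the example does not depend on the file's encoding.

```javascript
// Two encodings of the same visible character "é".
const precomposed = "\u00E9"; // single code point U+00E9
const decomposed = "e\u0301"; // U+0065 followed by combining acute accent U+0301

// A raw comparison works on code units, so the two strings differ.
const rawEqual = precomposed === decomposed;

// Normalizing both sides to the same form makes them compare equal.
const normalizedEqual =
  precomposed.normalize("NFC") === decomposed.normalize("NFC");

console.log(rawEqual, normalizedEqual); // false true
```

Any of the four forms would work for this comparison, as long as both sides use the same one.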

NFC (Normalization Form C) composes characters into their combined form whenever possible. For example, "e" + "◌́" (U+0065 + U+0301) becomes "é" (U+00E9). It usually results in the most compact representation, although a few characters are excluded from composition and stay decomposed.

NFD (Normalization Form D) decomposes combined characters into their base character plus combining marks. For example, "é" (U+00E9) becomes "e" + "◌́" (U+0065 + U+0301). This is useful for text processing tasks like sorting, searching, and stripping diacritics.

NFC is generally preferred for display and storage (it's more compact), while NFD is useful for internal processing and analysis.
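To make the NFC/NFD difference visible at the code-point level, the sketch below lists code points before and after normalization and uses NFD to strip diacritics. `codePoints` and `stripDiacritics` are hypothetical helper names introduced for illustration; the only built-ins relied on are `String.prototype.normalize()` and the Unicode property escape `\p{M}` (combining marks).

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Hypothetical helper: decompose with NFD, then drop every combining mark.
function stripDiacritics(text) {
  return text.normalize("NFD").replace(/\p{M}/gu, "");
}

console.log(codePoints("e\u0301".normalize("NFC"))); // [ 'U+00E9' ]
console.log(codePoints("\u00E9".normalize("NFD"))); // [ 'U+0065', 'U+0301' ]
console.log(stripDiacritics("caf\u00E9")); // cafe
```

Stripping marks only works after decomposition: in NFC, "é" is a single code point with no separate mark to remove.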

NFKC and NFKD go a step further than NFC/NFD by also applying compatibility decomposition. They convert "compatibility" characters into their compatibility equivalents:
• Ligatures like "ﬁ" (U+FB01) → "fi" (two characters)
• Circled numbers like "①" (U+2460) → "1" (U+0031)
• Superscripts like "²" → "2"
• Full-width Latin letters → regular ASCII

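Use NFKC/NFKD when you need lossy normalization for tasks like search indexing, duplicate detection, or when you want to treat compatibility variants of a character as identical. Be cautious: formatting information can be lost, and look-alike characters from different scripts (such as Latin "a" and Cyrillic "а") are not merged.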
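The compatibility mappings listed above can be checked directly. In this sketch, `compareForms` is a hypothetical helper that returns the NFC and NFKC results side by side, so it is easy to see that NFC leaves these characters alone while NFKC rewrites them.

```javascript
// Hypothetical helper: return the NFC and NFKC forms of the same input.
function compareForms(text) {
  return { nfc: text.normalize("NFC"), nfkc: text.normalize("NFKC") };
}

const samples = [
  "\uFB01", // LATIN SMALL LIGATURE FI
  "\u2460", // CIRCLED DIGIT ONE
  "\u00B2", // SUPERSCRIPT TWO
  "\uFF21", // FULLWIDTH LATIN CAPITAL LETTER A
];

for (const sample of samples) {
  const { nfc, nfkc } = compareForms(sample);
  // NFC keeps the original character; NFKC yields "fi", "1", "2", "A".
  console.log(JSON.stringify(nfc), "->", JSON.stringify(nfkc));
}

// The loss of formatting is real: "x²" becomes "x2" under NFKC.
const lossy = "x\u00B2".normalize("NFKC");
```

Because "x²" and "x2" become indistinguishable, it is usually safer to keep the original text and store the NFKC form only as a secondary key.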

Hangul (Korean script) has a unique Unicode structure. Each Hangul syllable block (like "한") can be represented either as a single precomposed code point (U+D55C in NFC) or as a sequence of individual jamo (consonant + vowel + consonant: ᄒ + ᅡ + ᆫ = U+1112 + U+1161 + U+11AB in NFD). NFC uses the compact precomposed form, while NFD breaks it into individual jamo components. This makes NFD useful for linguistic analysis and NFC better for storage efficiency.
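The Hangul example can be verified in code. The sketch below first decomposes the syllable with NFD, then recomposes it arithmetically. The formula and constants come from the Unicode Standard's conjoining jamo algorithm rather than from this page, and `composeHangul` is a hypothetical helper that assumes valid modern jamo and does no range checking.

```javascript
const syllable = "\uD55C"; // precomposed Hangul syllable U+D55C

// NFD splits the syllable into its three conjoining jamo.
const jamo = Array.from(syllable.normalize("NFD"), (ch) =>
  ch.codePointAt(0).toString(16).toUpperCase()
);
console.log(jamo); // [ '1112', '1161', '11AB' ]

// Hypothetical helper: compose a syllable from jamo code points.
// Assumes valid modern jamo; trail defaults to "no final consonant".
function composeHangul(lead, vowel, trail = 0x11a7) {
  const S_BASE = 0xac00, L_BASE = 0x1100, V_BASE = 0x1161, T_BASE = 0x11a7;
  const V_COUNT = 21, T_COUNT = 28;
  const lIndex = lead - L_BASE;
  const vIndex = vowel - V_BASE;
  const tIndex = trail - T_BASE;
  return String.fromCodePoint(
    S_BASE + (lIndex * V_COUNT + vIndex) * T_COUNT + tIndex
  );
}

console.log(composeHangul(0x1112, 0x1161, 0x11ab) === syllable); // true
```

In practice, `normalize("NFC")` performs this composition for you; the arithmetic is shown only to explain why no lookup table is needed for Hangul.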

Unicode normalization is essential in many web scenarios:
• Form input validation: Users may submit the same text in different Unicode forms — normalize before comparison.
• Password authentication: Normalize passwords (usually NFC) to ensure consistent hashing.
• URL slug generation: NFKD + strip diacritics for clean ASCII slugs.
• Search functionality: Normalize both query and indexed content for reliable matching.
• Database indexing: Use a consistent normalization form for unique constraints.
JavaScript provides String.prototype.normalize() which supports all four forms natively in modern browsers.
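Two of the scenarios above, slug generation and comparing submitted text, might look like the following sketch. `slugify` and `sameText` are hypothetical helper names, and the slug rules (lowercase, hyphen-separated, ASCII letters and digits only) are an assumption rather than a fixed convention.

```javascript
// Hypothetical helper: build an ASCII slug with NFKD plus mark stripping.
function slugify(title) {
  return title
    .normalize("NFKD") // split base letters from marks, fold compat forms
    .replace(/\p{M}/gu, "") // drop combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse everything else into hyphens
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

// Hypothetical helper: compare two user inputs regardless of Unicode form.
function sameText(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(slugify("Cr\u00E8me Br\u00FBl\u00E9e \u2460")); // creme-brulee-1
console.log(sameText("caf\u00E9", "cafe\u0301")); // true
```

Letters that have no decomposition, such as "ø" or "ß", are not transliterated by this approach and end up as hyphens, so a real slug generator would likely need an additional mapping table.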

Most emoji are unaffected by NFC/NFD normalization because they are already in a canonical form. Emoji sequences joined with ZWJ (Zero Width Joiner, U+200D), like the family emoji "👨‍👩‍👧‍👦", remain intact under all four normalization forms because neither ZWJ nor the emoji it joins have decomposition mappings. Variation selectors (U+FE0E for text style, U+FE0F for emoji style) are likewise preserved. In general, you can safely normalize text containing emoji with NFC or NFD. NFKC and NFKD can rewrite a few symbols that double as emoji, such as "™" (U+2122), which becomes "TM".
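The emoji behavior can be checked with a short sketch. The variable names are illustrative, and the trade mark sign is included as an example of a symbol with a compatibility decomposition, so it changes under NFKC even though the emoji sequences do not.

```javascript
const forms = ["NFC", "NFD", "NFKC", "NFKD"];

// Family emoji: man, ZWJ, woman, ZWJ, girl, ZWJ, boy.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
const familyUnchanged = forms.every((f) => family.normalize(f) === family);

// Heart with the emoji-style variation selector U+FE0F.
const heart = "\u2764\uFE0F";
const heartUnchanged = forms.every((f) => heart.normalize(f) === heart);

// Trade mark sign with U+FE0F: stable under NFC, rewritten by NFKC.
const tradeMark = "\u2122\uFE0F";
const tradeMarkNfc = tradeMark.normalize("NFC");
const tradeMarkNfkc = tradeMark.normalize("NFKC");

console.log(familyUnchanged, heartUnchanged); // true true
console.log(tradeMarkNfc === tradeMark); // true
console.log(tradeMarkNfkc.startsWith("TM")); // true
```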

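Pro tip: Most text on the web and most keyboard input is already in NFC, and NFC is the form the W3C recommends for web content. File systems differ: macOS has traditionally stored file names in a decomposed form (HFS+ uses a variant of NFD), while Windows and Linux file systems generally store names as given without normalizing them. When in doubt, normalize to NFC for web content and NFD for low-level text processing. Use NFKC when building search indexes to collapse compatibility variants like "ﬁ" → "fi" and "①" → "1".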