Unicode Normalizer
Convert text between NFC, NFD, NFKC and NFKD forms, compare byte counts, and reveal hidden combining marks.
Per-character breakdown
| # | Char | Code point | NFC | NFD | NFKC | NFKD |
|---|---|---|---|---|---|---|
| 0 | c | U+0063 | c | c | c | c |
| 1 | a | U+0061 | a | a | a | a |
| 2 | f | U+0066 | f | f | f | f |
| 3 | é | U+00E9 | é | é | é | é |
| 4 | ́ | U+0301 | ́ | ́ | ́ | ́ |
| 5 | (space) | U+0020 | (space) | (space) | (space) | (space) |
| 6 | — | U+2014 | — | — | — | — |
| 7 | (space) | U+0020 | (space) | (space) | (space) | (space) |
| 8 | ﬁ | U+FB01 | ﬁ | ﬁ | fi | fi |
| 9 | s | U+0073 | s | s | s | s |
| 10 | h | U+0068 | h | h | h | h |
| 11 | (space) | U+0020 | (space) | (space) | (space) | (space) |
| 12 | Ａ | U+FF21 | Ａ | Ａ | A | A |
| 13 | Ｂ | U+FF22 | Ｂ | Ｂ | B | B |
| 14 | Ｃ | U+FF23 | Ｃ | Ｃ | C | C |
| 15 | (space) | U+0020 | (space) | (space) | (space) | (space) |
| 16 | ① | U+2460 | ① | ① | 1 | 1 |
| 17 | ② | U+2461 | ② | ② | 2 | 2 |
| 18 | ③ | U+2462 | ③ | ③ | 3 | 3 |
| 19 | (space) | U+0020 | (space) | (space) | (space) | (space) |
| 20 | ﬃ | U+FB03 | ﬃ | ﬃ | ffi | ffi |
About Unicode Normalizer
Unicode Normalizer converts text into one of the four normalization forms defined by the Unicode standard: the canonical forms NFC and NFD, and the compatibility forms NFKC and NFKD. Two strings that look identical can have completely different byte representations; normalization rewrites them into a single, predictable form so they compare equal, hash equally, and round-trip safely through databases, file systems, and APIs.
It is built on the browser-native String.prototype.normalize method, so the output should match what your JavaScript, Node.js, Python (with unicodedata), Java, or Swift code produces, provided each runtime ships the same version of the Unicode data.
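As a minimal sketch of why this matters, the snippet below compares two spellings of "café" before and after normalization. The helper name equalNormalized is an assumption made for illustration; only String.prototype.normalize is part of the platform.

```javascript
// Two spellings of "café": precomposed U+00E9 vs. "e" + combining acute U+0301.
const composed = "caf\u00E9";
const decomposed = "cafe\u0301";

// Hypothetical helper: compare two strings after bringing both to the same form.
function equalNormalized(a, b, form = "NFC") {
  return a.normalize(form) === b.normalize(form);
}

console.log(composed === decomposed);               // false
console.log(equalNormalized(composed, decomposed)); // true
```

Either form works for the comparison as long as both sides use the same one.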
Side-by-side outputs show which characters changed under each form, alongside UTF-16 code-unit, Unicode code-point, and UTF-8 byte counts. A per-character table reveals how every input code point decomposes — useful for debugging hidden combining marks, ligatures, and lookalike characters.
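The three counters can be approximated in a few lines. This is a sketch rather than the tool's actual code; the function name measure is hypothetical, and TextEncoder is assumed to be available (it is in current browsers and Node.js).

```javascript
// Hypothetical helper: the three length measures for one string.
function measure(text) {
  return {
    codeUnits: text.length,                           // UTF-16 code units
    codePoints: Array.from(text).length,              // Unicode code points
    utf8Bytes: new TextEncoder().encode(text).length, // UTF-8 bytes
  };
}

const sample = "caf\u00E9"; // "café" with a precomposed é
console.log(measure(sample.normalize("NFC"))); // { codeUnits: 4, codePoints: 4, utf8Bytes: 5 }
console.log(measure(sample.normalize("NFD"))); // { codeUnits: 5, codePoints: 5, utf8Bytes: 6 }
```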
How to Use Unicode Normalizer
- Paste your text into the Input box. It can include any Unicode — accents, emoji, CJK characters, ligatures, fullwidth forms, or invisible combining marks.
- Compare the four outputs. NFC composes accents back to single characters; NFD splits them apart; NFKC and NFKD additionally fold compatibility characters (ligatures, fullwidth letters, circled digits) to plain equivalents.
- Read the modified vs unchanged badge at the top of each panel. Yellow highlighting marks the characters that differ from the input, so you can see exactly which ones got rewritten.
- Watch the counters (code units, code points, and UTF-8 bytes) to spot when a transformation expands or shrinks the text. NFD typically lengthens text that contains accented characters; NFC typically shortens it or leaves it unchanged.
- Inspect the per-character table to drill into each code point and confirm how it decomposes individually under each form.
- Copy the form you need with the Copy button. Use NFC for storage, NFKC plus case folding for case-insensitive search and deduplication, and NFD/NFKD when stripping diacritics.
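A per-character table like the one above could be produced roughly as follows. The names FORMS, toHex, and breakdown are assumptions for illustration, not the tool's own identifiers. Each code point is normalized on its own, so the result can differ from normalizing the whole string, where adjacent characters may compose.

```javascript
const FORMS = ["NFC", "NFD", "NFKC", "NFKD"];

// Format one code point as "U+XXXX".
function toHex(codePoint) {
  return "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical helper: one row per input code point, with each form's
// result listed as the code points it produces.
function breakdown(text) {
  return Array.from(text, (char, index) => {
    const row = { index, char, codePoint: toHex(char.codePointAt(0)) };
    for (const form of FORMS) {
      row[form] = Array.from(char.normalize(form), (c) => toHex(c.codePointAt(0))).join(" ");
    }
    return row;
  });
}

// é (U+00E9), the fi ligature (U+FB01), and circled digit one (U+2460).
console.table(breakdown("\u00E9\uFB01\u2460"));
```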
Common Use Cases
Database deduplication
Detect "duplicate" usernames or product titles that differ only by composition.
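One way to sketch this is to group names by their normalized form and report any group with more than one member. The function name findDuplicateGroups and the sample names are hypothetical.

```javascript
// Hypothetical helper: group names whose normalized forms collide.
function findDuplicateGroups(names, form = "NFC") {
  const groups = new Map();
  for (const name of names) {
    const key = name.normalize(form);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(name);
  }
  // Keep only the groups that actually contain duplicates.
  return [...groups.values()].filter((group) => group.length > 1);
}

// "José" spelled with a combining accent, then precomposed, plus an unrelated name.
const names = ["Jose\u0301", "Jos\u00E9", "Joseph"];
console.log(findDuplicateGroups(names).length); // 1
```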
Search indexing
Use NFKC + casefold so users searching "fish" find "ﬁsh" (spelled with the U+FB01 ligature).
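A rough sketch of such a search key follows. The name searchKey is hypothetical, and toLowerCase() is only an approximation: JavaScript has no built-in full case folding, so "ß" for example is not folded to "ss".

```javascript
// Hypothetical helper: build a comparison key for search.
// toLowerCase() stands in for full Unicode case folding here.
function searchKey(text) {
  return text.normalize("NFKC").toLowerCase();
}

const indexed = "\uFB01sh"; // "fish" spelled with the U+FB01 ligature
console.log(searchKey(indexed) === searchKey("Fish")); // true
```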
Diacritic stripping
Apply NFD then drop combining marks to slugify "résumé" into "resume".
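The two steps can be sketched as below, assuming a runtime that supports Unicode property escapes (\p{M}) in regular expressions. The names stripDiacritics and slugify are hypothetical. Letters with no decomposition, such as "ø" or "ł", pass through stripDiacritics unchanged.

```javascript
// Decompose, then remove every combining mark (general category M).
function stripDiacritics(text) {
  return text.normalize("NFD").replace(/\p{M}/gu, "");
}

// Hypothetical slug helper: lowercase ASCII letters and digits joined by hyphens.
function slugify(text) {
  return stripDiacritics(text)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse everything else into one hyphen
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
}

console.log(slugify("R\u00E9sum\u00E9")); // resume
```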
Filename safety
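macOS's older HFS+ file system stores filenames in a decomposed (NFD-like) form, while Linux file systems keep whatever bytes they are given, usually NFC, so normalize before syncing.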
Spoof detection
Spot fullwidth or circled lookalike characters in user input or domain names.
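A simple check, sketched under the assumption that "lookalike" means any character NFKC rewrites, is to normalize each code point and report the ones that change. The name findCompatibilityChars is hypothetical. This does not catch cross-script homoglyphs such as a Cyrillic "а", which normalization leaves alone.

```javascript
// Hypothetical helper: list the code points that NFKC would rewrite.
function findCompatibilityChars(text) {
  const hits = [];
  let index = 0; // position counted in code points
  for (const char of text) {
    const folded = char.normalize("NFKC");
    if (folded !== char) hits.push({ index, char, folded });
    index += 1;
  }
  return hits;
}

// "paypal" with a fullwidth "a" (U+FF41) in second position.
console.log(findCompatibilityChars("p\uFF41ypal"));
```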
Test data scrubbing
Reveal invisible combining marks copy-pasted from PDFs, docs, or rich text.
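Stray marks can be located with a short scan like this one, again assuming support for \p{M} in regular expressions. The name findCombiningMarks is hypothetical.

```javascript
// Hypothetical helper: report every combining mark and its UTF-16 offset.
function findCombiningMarks(text) {
  const marks = [];
  let offset = 0;
  for (const char of text) {
    if (/\p{M}/u.test(char)) {
      const hex = char.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
      marks.push({ offset, codePoint: "U+" + hex });
    }
    offset += char.length; // astral characters take two code units
  }
  return marks;
}

console.log(findCombiningMarks("cafe\u0301")); // [ { offset: 4, codePoint: 'U+0301' } ]
```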