Unicode Normalizer

Convert text between NFC, NFD, NFKC and NFKD forms, compare byte counts, and reveal hidden combining marks.

Try a sample:

Input: café́ — ﬁsh ＡＢＣ ①②③ ﬃ
UTF-16 code units: 21 · code points: 21 · UTF-8 bytes: 41

NFC (unchanged): café́ — ﬁsh ＡＢＣ ①②③ ﬃ
code units: 21 · code points: 21 · UTF-8 bytes: 41

NFD (modified): café́ — ﬁsh ＡＢＣ ①②③ ﬃ
code units: 22 · code points: 22 · UTF-8 bytes: 42

NFKC (modified): café́ — fish ABC 123 ffi
code units: 24 · code points: 24 · UTF-8 bytes: 28

NFKD (modified): café́ — fish ABC 123 ffi
code units: 25 · code points: 25 · UTF-8 bytes: 29

Per-character breakdown

| #  | Char    | Code point | NFC     | NFD        | NFKC    | NFKD       |
|----|---------|------------|---------|------------|---------|------------|
| 0  | c       | U+0063     | c       | c          | c       | c          |
| 1  | a       | U+0061     | a       | a          | a       | a          |
| 2  | f       | U+0066     | f       | f          | f       | f          |
| 3  | é       | U+00E9     | é       | e + U+0301 | é       | e + U+0301 |
| 4  | ◌́       | U+0301     | ◌́       | ◌́          | ◌́       | ◌́          |
| 5  | (space) | U+0020     | (space) | (space)    | (space) | (space)    |
| 6  | —       | U+2014     | —       | —          | —       | —          |
| 7  | (space) | U+0020     | (space) | (space)    | (space) | (space)    |
| 8  | ﬁ       | U+FB01     | ﬁ       | ﬁ          | fi      | fi         |
| 9  | s       | U+0073     | s       | s          | s       | s          |
| 10 | h       | U+0068     | h       | h          | h       | h          |
| 11 | (space) | U+0020     | (space) | (space)    | (space) | (space)    |
| 12 | Ａ      | U+FF21     | Ａ      | Ａ         | A       | A          |
| 13 | Ｂ      | U+FF22     | Ｂ      | Ｂ         | B       | B          |
| 14 | Ｃ      | U+FF23     | Ｃ      | Ｃ         | C       | C          |
| 15 | (space) | U+0020     | (space) | (space)    | (space) | (space)    |
| 16 | ①       | U+2460     | ①       | ①          | 1       | 1          |
| 17 | ②       | U+2461     | ②       | ②          | 2       | 2          |
| 18 | ③       | U+2462     | ③       | ③          | 3       | 3          |
| 19 | (space) | U+0020     | (space) | (space)    | (space) | (space)    |
| 20 | ﬃ       | U+FB03     | ﬃ       | ﬃ          | ffi     | ffi        |

About Unicode Normalizer

Unicode Normalizer converts text into one of the four normalization forms defined by the Unicode standard: NFC and NFD (canonical composition and decomposition) and NFKC and NFKD (their compatibility counterparts). Two strings that look identical can have completely different byte representations; normalization rewrites them into a single, predictable form so they compare equal, hash equally, and round-trip safely through databases, file systems, and APIs.
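A minimal JavaScript sketch of the problem (the variable names are illustrative):

```javascript
// Two visually identical strings built from different code points.
const composed = "caf\u00E9";    // "café": é as the single code point U+00E9
const decomposed = "cafe\u0301"; // "café": e followed by combining acute U+0301

console.log(composed === decomposed); // false: different code-point sequences
console.log(composed.normalize("NFC") === decomposed.normalize("NFC")); // true
console.log(composed.normalize("NFD") === decomposed.normalize("NFD")); // true
```

Either form works for comparison; what matters is that both sides are normalized to the same one.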

It is built on the browser-native String.prototype.normalize method, so the output matches exactly what your JavaScript or Node.js code will produce, and, Unicode-version differences aside, what Python (with unicodedata), Java, or Swift produce.
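Calling normalize directly shows the canonical/compatibility split on the sample's characters:

```javascript
const s = "\uFB01\u2460\uFF21"; // ﬁ (U+FB01), ① (U+2460), Ａ (U+FF21)

console.log(s.normalize("NFC") === s); // true: canonical forms leave compatibility characters alone
console.log(s.normalize("NFKC"));      // "fi1A": compatibility folding rewrites all three
```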

Side-by-side outputs show which characters changed under each form, alongside UTF-16 code-unit, Unicode code-point, and UTF-8 byte counts. A per-character table reveals how every input code point decomposes — useful for debugging hidden combining marks, ligatures, and lookalike characters.
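The three counters can be reproduced in a few lines of JavaScript (TextEncoder is available in browsers and Node.js):

```javascript
const s = "e\u0301\u{1F600}"; // decomposed é followed by 😀

console.log(s.length);                           // 4: UTF-16 code units (😀 is a surrogate pair)
console.log([...s].length);                      // 3: Unicode code points
console.log(new TextEncoder().encode(s).length); // 7: UTF-8 bytes (1 + 2 + 4)
```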

How to Use Unicode Normalizer

  1. Paste your text into the Input box. It can include any Unicode — accents, emoji, CJK characters, ligatures, fullwidth forms, or invisible combining marks.
  2. Compare the four outputs. NFC composes accents back to single characters; NFD splits them apart; NFKC and NFKD additionally fold compatibility characters (ligatures, fullwidth letters, circled digits) to plain equivalents.
  3. Read the modified vs unchanged badge at the top of each panel. Yellow highlighting marks the bytes that differ from the input, so you can see exactly which characters got rewritten.
  4. Watch the counters — code units, code points, and UTF-8 bytes — to spot when a transformation expands or shrinks the text. NFD typically grows; NFC typically shrinks.
  5. Inspect the per-character table to drill into each code point and confirm how it decomposes individually under each form.
  6. Copy the form you need with the Copy button. Use NFC for storage, NFKC combined with case folding for search and deduplication, and NFD/NFKD when stripping diacritics.
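The unchanged/modified badge in step 3 amounts to a comparison like the following sketch (isNormalized is an illustrative name, not the tool's actual code):

```javascript
// A string is already in a given form exactly when normalizing it is a no-op.
const isNormalized = (s, form = "NFC") => s === s.normalize(form);

console.log(isNormalized("caf\u00E9", "NFC")); // true
console.log(isNormalized("caf\u00E9", "NFD")); // false: NFD splits é into e + U+0301
```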

Common Use Cases

Database deduplication

Detect "duplicate" usernames or product titles that differ only by composition.
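A rough sketch of the idea (dedupeByNFC is a hypothetical helper, not a real API):

```javascript
// Collapse entries that are byte-different but canonically equivalent.
function dedupeByNFC(values) {
  const seen = new Set();
  return values.filter(v => {
    const key = v.normalize("NFC");
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

console.log(dedupeByNFC(["Jos\u00E9", "Jose\u0301"])); // one entry survives
```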

Search indexing

Use NFKC + casefold so users searching "fish" find "ﬁsh".
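A sketch of such a search key (searchKey is an illustrative name; JavaScript's toLowerCase only approximates full Unicode case folding):

```javascript
const searchKey = s => s.normalize("NFKC").toLowerCase();

console.log(searchKey("\uFB01sh") === searchKey("fish")); // true: "ﬁsh" and "fish" collide
```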

Diacritic stripping

Apply NFD then drop combining marks to slugify "résumé" into "resume".
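In JavaScript this two-step recipe might look like (stripDiacritics is a hypothetical helper):

```javascript
// Decompose with NFD, then delete combining marks (general category Mn).
const stripDiacritics = s => s.normalize("NFD").replace(/\p{Mn}/gu, "");

console.log(stripDiacritics("r\u00E9sum\u00E9")); // "resume"
```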

Filename safety

macOS (HFS+) stores filenames in a variant of NFD, while Linux preserves whatever bytes it is given, typically NFC; normalize before syncing.
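A sketch of the comparison (samePath is an illustrative helper, not a real API):

```javascript
// Compare names in a single normalization form before deciding two paths differ.
const samePath = (a, b) => a.normalize("NFC") === b.normalize("NFC");

console.log(samePath("caf\u00E9.txt", "cafe\u0301.txt")); // true: same file on both systems
```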

Spoof detection

Spot fullwidth or circled lookalike characters in user input or domain names.
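One rough heuristic, assuming compatibility lookalikes are the concern:

```javascript
// If NFKC rewrites a string, it contained compatibility characters
// (fullwidth letters, circled digits, ligatures). This catches only that
// class of lookalikes; full confusable detection (UTS #39) needs more.
const hasCompatibilityChars = s => s !== s.normalize("NFKC");

console.log(hasCompatibilityChars("\uFF50\uFF41\uFF59\uFF50\uFF41\uFF4C.com")); // true: fullwidth "paypal"
console.log(hasCompatibilityChars("paypal.com"));                               // false
```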

Test data scrubbing

Reveal invisible combining marks copy-pasted from PDFs, docs, or rich text.
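A small loop makes such marks visible by listing every code point:

```javascript
// Print each code point; invisible combining marks show up as their own rows.
for (const ch of "cafe\u0301") {
  const cp = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(`U+${cp} ${ch}`);
}
// The last line prints U+0301, the combining acute that rode along with the "e".
```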

FAQ