Unicode Normalizer

Convert text between NFC, NFD, NFKC and NFKD forms, compare byte counts, and reveal hidden combining marks.

Try a sample:

Input: café́ — ﬁsh ＡＢＣ ①②③ ﬃ
UTF-16 code units: 21 · code points: 21 · UTF-8 bytes: 41

NFC (unchanged): café́ — ﬁsh ＡＢＣ ①②③ ﬃ
code units: 21 · code points: 21 · UTF-8 bytes: 41

NFD (modified): café́ — ﬁsh ＡＢＣ ①②③ ﬃ
code units: 22 · code points: 22 · UTF-8 bytes: 42

NFKC (modified): café́ — fish ABC 123 ffi
code units: 24 · code points: 24 · UTF-8 bytes: 28

NFKD (modified): café́ — fish ABC 123 ffi
code units: 25 · code points: 25 · UTF-8 bytes: 29

Per-character breakdown

| #  | Char    | Code point | NFC     | NFD        | NFKC    | NFKD       |
|----|---------|------------|---------|------------|---------|------------|
| 0  | c       | U+0063     | c       | c          | c       | c          |
| 1  | a       | U+0061     | a       | a          | a       | a          |
| 2  | f       | U+0066     | f       | f          | f       | f          |
| 3  | é       | U+00E9     | é       | e + U+0301 | é       | e + U+0301 |
| 4  | ◌́       | U+0301     | ◌́       | ◌́          | ◌́       | ◌́          |
| 5  | (space) | U+0020     | (space) | (space)    | (space) | (space)    |
| 6  | —       | U+2014     | —       | —          | —       | —          |
| 7  | (space) | U+0020     | (space) | (space)    | (space) | (space)    |
| 8  | ﬁ       | U+FB01     | ﬁ       | ﬁ          | fi      | fi         |
| 9  | s       | U+0073     | s       | s          | s       | s          |
| 10 | h       | U+0068     | h       | h          | h       | h          |
| 11 | (space) | U+0020     | (space) | (space)    | (space) | (space)    |
| 12 | Ａ      | U+FF21     | Ａ      | Ａ         | A       | A          |
| 13 | Ｂ      | U+FF22     | Ｂ      | Ｂ         | B       | B          |
| 14 | Ｃ      | U+FF23     | Ｃ      | Ｃ         | C       | C          |
| 15 | (space) | U+0020     | (space) | (space)    | (space) | (space)    |
| 16 | ①       | U+2460     | ①       | ①          | 1       | 1          |
| 17 | ②       | U+2461     | ②       | ②          | 2       | 2          |
| 18 | ③       | U+2462     | ③       | ③          | 3       | 3          |
| 19 | (space) | U+0020     | (space) | (space)    | (space) | (space)    |
| 20 | ﬃ       | U+FB03     | ﬃ       | ﬃ          | ffi     | ffi        |

About Unicode Normalizer

Unicode Normalizer converts text into one of the four normalization forms defined by the Unicode standard: NFC and NFD (canonical composition and decomposition) and NFKC and NFKD (their compatibility counterparts). Two strings that look identical can have completely different byte representations; normalization rewrites them into a single, predictable form so they compare equal, hash equally, and round-trip safely through databases, file systems, and APIs.
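A minimal JavaScript sketch of the problem (the variable names are illustrative):

```javascript
// Two visually identical strings built from different code points.
const composed = "caf\u00E9";    // "café": é as the single code point U+00E9
const decomposed = "cafe\u0301"; // "café": e followed by combining acute U+0301

console.log(composed === decomposed); // false: different code-point sequences
console.log(composed.normalize("NFC") === decomposed.normalize("NFC")); // true
console.log(composed.normalize("NFD") === decomposed.normalize("NFD")); // true
```

Either form works for comparison; what matters is that both sides are normalized to the same one.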

It is built on the browser-native String.prototype.normalize method, so the output matches exactly what your JavaScript or Node.js code will produce, and, Unicode-version differences aside, what Python (with unicodedata), Java, or Swift produce.
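Calling normalize directly shows the canonical/compatibility split on the sample's characters:

```javascript
const s = "\uFB01\u2460\uFF21"; // ﬁ (U+FB01), ① (U+2460), Ａ (U+FF21)

console.log(s.normalize("NFC") === s); // true: canonical forms leave compatibility characters alone
console.log(s.normalize("NFKC"));      // "fi1A": compatibility folding rewrites all three
```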

Side-by-side outputs show which characters changed under each form, alongside UTF-16 code-unit, Unicode code-point, and UTF-8 byte counts. A per-character table reveals how every input code point decomposes — useful for debugging hidden combining marks, ligatures, and lookalike characters.
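The three counters can be reproduced in a few lines of JavaScript (TextEncoder is available in browsers and Node.js):

```javascript
const s = "e\u0301\u{1F600}"; // decomposed é followed by 😀

console.log(s.length);                           // 4: UTF-16 code units (😀 is a surrogate pair)
console.log([...s].length);                      // 3: Unicode code points
console.log(new TextEncoder().encode(s).length); // 7: UTF-8 bytes (1 + 2 + 4)
```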

How to Use Unicode Normalizer

  1. Paste your text into the Input box. It can include any Unicode — accents, emoji, CJK characters, ligatures, fullwidth forms, or invisible combining marks.
  2. Compare the four outputs. NFC composes accents back to single characters; NFD splits them apart; NFKC and NFKD additionally fold compatibility characters (ligatures, fullwidth letters, circled digits) to plain equivalents.
  3. Read the modified vs unchanged badge at the top of each panel. Yellow highlighting marks the bytes that differ from the input, so you can see exactly which characters got rewritten.
  4. Watch the counters — code units, code points, and UTF-8 bytes — to spot when a transformation expands or shrinks the text. NFD typically grows; NFC typically shrinks.
  5. Inspect the per-character table to drill into each code point and confirm how it decomposes individually under each form.
  6. Copy the form you need with the Copy button. Use NFC for storage, NFKC combined with case folding for search and deduplication, and NFD/NFKD when stripping diacritics.
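The unchanged/modified badge in step 3 amounts to a comparison like the following sketch (isNormalized is an illustrative name, not the tool's actual code):

```javascript
// A string is already in a given form exactly when normalizing it is a no-op.
const isNormalized = (s, form = "NFC") => s === s.normalize(form);

console.log(isNormalized("caf\u00E9", "NFC")); // true
console.log(isNormalized("caf\u00E9", "NFD")); // false: NFD splits é into e + U+0301
```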

Common Use Cases

Database deduplication

Detect "duplicate" usernames or product titles that differ only by composition.
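A rough sketch of the idea (dedupeByNFC is a hypothetical helper, not a real API):

```javascript
// Collapse entries that are byte-different but canonically equivalent.
function dedupeByNFC(values) {
  const seen = new Set();
  return values.filter(v => {
    const key = v.normalize("NFC");
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

console.log(dedupeByNFC(["Jos\u00E9", "Jose\u0301"])); // one entry survives
```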

Search indexing

Use NFKC + casefold so users searching "fish" find "ﬁsh".
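A sketch of such a search key (searchKey is an illustrative name; JavaScript's toLowerCase only approximates full Unicode case folding):

```javascript
const searchKey = s => s.normalize("NFKC").toLowerCase();

console.log(searchKey("\uFB01sh") === searchKey("fish")); // true: "ﬁsh" and "fish" collide
```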

Diacritic stripping

Apply NFD then drop combining marks to slugify "résumé" into "resume".
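In JavaScript this two-step recipe might look like (stripDiacritics is a hypothetical helper):

```javascript
// Decompose with NFD, then delete combining marks (general category Mn).
const stripDiacritics = s => s.normalize("NFD").replace(/\p{Mn}/gu, "");

console.log(stripDiacritics("r\u00E9sum\u00E9")); // "resume"
```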

Filename safety

macOS (HFS+) stores filenames in a variant of NFD, while Linux preserves whatever bytes it is given, typically NFC; normalize before syncing.
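A sketch of the comparison (samePath is an illustrative helper, not a real API):

```javascript
// Compare names in a single normalization form before deciding two paths differ.
const samePath = (a, b) => a.normalize("NFC") === b.normalize("NFC");

console.log(samePath("caf\u00E9.txt", "cafe\u0301.txt")); // true: same file on both systems
```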

Spoof detection

Spot fullwidth or circled lookalike characters in user input or domain names.
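One rough heuristic, assuming compatibility lookalikes are the concern:

```javascript
// If NFKC rewrites a string, it contained compatibility characters
// (fullwidth letters, circled digits, ligatures). This catches only that
// class of lookalikes; full confusable detection (UTS #39) needs more.
const hasCompatibilityChars = s => s !== s.normalize("NFKC");

console.log(hasCompatibilityChars("\uFF50\uFF41\uFF59\uFF50\uFF41\uFF4C.com")); // true: fullwidth "paypal"
console.log(hasCompatibilityChars("paypal.com"));                               // false
```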

Test data scrubbing

Reveal invisible combining marks copy-pasted from PDFs, docs, or rich text.
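A small loop makes such marks visible by listing every code point:

```javascript
// Print each code point; invisible combining marks show up as their own rows.
for (const ch of "cafe\u0301") {
  const cp = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(`U+${cp} ${ch}`);
}
// The last line prints U+0301, the combining acute that rode along with the "e".
```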

FAQ