Unicode Character Inspector

Inspect Unicode characters — code points, UTF-8/UTF-16 bytes, names, categories, and HTML entities.

Characters: 15 | Unique: 11 | Non-ASCII: 1
Click any character card below to see full details.
H U+0048
e U+0065
l U+006C
l U+006C
o U+006F
, U+002C
(space) U+0020
W U+0057
o U+006F
r U+0072
l U+006C
d U+0064
! U+0021
(space) U+0020
🌍 U+1F30D
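
The three counters in the sample above can be reproduced with a few lines of Python. This is only a sketch of one plausible way to compute them; the function name `text_stats` is hypothetical, and the tool itself runs in JavaScript and may count differently (for example, by grapheme cluster).

```python
def text_stats(text: str) -> dict:
    """Count code points, distinct code points, and non-ASCII code points."""
    return {
        # Python strings are sequences of code points, so len() counts them.
        "characters": len(text),
        "unique": len(set(text)),
        "non_ascii": sum(1 for ch in text if ord(ch) > 0x7F),
    }


stats = text_stats("Hello, World! \U0001F30D")
print(stats)
```

For the sample string, the only non-ASCII code point is the globe emoji at U+1F30D.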

About the Unicode Character Inspector

The Unicode Character Inspector lets you decode any text character-by-character, revealing the underlying Unicode data that most editors hide. Whether you're debugging encoding issues, building an internationalized app, or simply curious what a symbol really is, this tool surfaces the full technical profile of every character instantly.

  • View the Unicode code point (e.g. U+0041) for every character in your text
  • Inspect UTF-8 and UTF-16 byte sequences in hexadecimal
  • See the official Unicode character name and category (Letter, Symbol, Punctuation, etc.)
  • Get the correct HTML entity: named (e.g. &copy;) or numeric (e.g. &#169;)
  • Look up any code point directly by entering U+XXXX, a decimal number, or pasting a character
  • Identify ASCII vs. multi-byte Unicode characters at a glance
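
As a rough illustration of the fields listed above, the following Python sketch builds a similar profile for one character using the standard `unicodedata` module. The function name `inspect_char` and the dictionary keys are assumptions made for this example, not the tool's actual implementation.

```python
import unicodedata


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


def inspect_char(ch: str) -> dict:
    """Return a small technical profile for a single code point."""
    cp = ord(ch)  # raises TypeError if ch is not exactly one code point
    return {
        "code_point": f"U+{cp:04X}",
        "decimal": cp,
        "utf8": hex_bytes(ch.encode("utf-8")),
        "utf16": hex_bytes(ch.encode("utf-16-be")),  # big-endian, no BOM
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "is_ascii": cp <= 0x7F,
    }


for field, value in inspect_char("\U0001F30D").items():
    print(f"{field:>10}: {value}")
```

The names and categories returned depend on the Unicode version bundled with the Python interpreter, so very recent characters may show up as `<unnamed>`.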

All processing runs entirely in your browser — no text is sent to any server.

How to Use the Unicode Character Inspector

  1. Paste or type your text

    Enter any text in the "Inspect Text" tab. The tool immediately generates a character card for each code point in your input.

  2. Click a character card

    Select any card in the grid to expand its full details: code point, decimal, hex, UTF-8 bytes, UTF-16 bytes, HTML entity, name, and category.

  3. Copy any value with one click

    Each detail field has a copy button so you can grab the code point, entity, or byte sequence directly into your code or document.

  4. Look up a specific code point

    Switch to the "Lookup by Code Point" tab and enter U+XXXX, a hex value like 0x1F600, a decimal integer, or paste a single character to see its full profile.
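
    The accepted input forms could be parsed along these lines. This Python sketch is an assumption about reasonable behavior, not the tool's own parser: the name `parse_code_point` is hypothetical, and a single ASCII digit is treated here as a decimal number rather than as a pasted character.

```python
MAX_CODE_POINT = 0x10FFFF


def parse_code_point(query: str) -> int:
    """Parse 'U+XXXX', '0xXXXX', a decimal integer, or one pasted character."""
    # A single non-digit character is taken literally (this includes a space).
    if len(query) == 1 and query not in "0123456789":
        return ord(query)

    q = query.strip()
    if q.upper().startswith(("U+", "0X")):
        value = int(q[2:], 16)  # hex notation
    else:
        value = int(q, 10)      # plain decimal
    if not 0 <= value <= MAX_CODE_POINT:
        raise ValueError(f"{query!r} is outside U+0000..U+10FFFF")
    return value


for example in ("U+1F600", "0x1F600", "128512", "A"):
    cp = parse_code_point(example)
    print(f"{example!r} -> U+{cp:04X}")
```

    Anything that is neither a valid number nor a single character raises `ValueError`, which a caller can turn into an error message.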

Tip: Emoji characters are often multi-code-point sequences. Paste an emoji string in the Inspect Text tab to see exactly how many code points it uses and how each is encoded in UTF-8.

Common Use Cases

Debugging Encoding Issues

  • Identify unexpected non-ASCII characters in strings
  • Compare UTF-8 vs. UTF-16 byte representations
  • Spot invisible control characters causing parse errors
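
A quick way to script the same check is to scan a string for non-ASCII and control characters and report their positions. The sketch below is illustrative only; `find_suspects` is a hypothetical helper, and treating tab, newline, and carriage return as harmless is an assumption.

```python
import unicodedata


def find_suspects(text: str) -> list:
    """Return (index, code point, name) for non-ASCII and control characters."""
    hits = []
    for index, ch in enumerate(text):
        is_control = unicodedata.category(ch) == "Cc" and ch not in "\t\n\r"
        if ord(ch) > 0x7F or is_control:
            hits.append((index, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits


# A no-break space hiding inside otherwise plain JSON.
print(find_suspects('{"key":\u00a0"value"}'))
```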

Web & HTML Development

  • Look up the correct HTML entity for special characters
  • Verify that symbols render correctly across browsers
  • Find the named entity (&copy;, &mdash;) vs. numeric form

Internationalization (i18n)

  • Inspect CJK, Arabic, Cyrillic, and other scripts
  • Validate that multi-byte characters are stored correctly
  • Check that right-to-left text contains the expected code points

Security & Input Validation

  • Detect homograph characters that look like ASCII letters
  • Find zero-width joiners, directional overrides, or spoofing chars
  • Audit user input for unexpected Unicode categories
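
For automated input validation, a similar audit can be sketched in Python. The example below flags bidirectional control characters and other invisible format characters (General Category Cf); the function name `audit` and the labels are hypothetical, and the check does not cover homograph detection.

```python
import unicodedata

# Bidi classes of the explicit embedding, override, and isolate controls.
BIDI_CONTROLS = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def audit(text: str) -> list:
    """Return (index, code point, kind) for characters worth a second look."""
    findings = []
    for index, ch in enumerate(text):
        if unicodedata.bidirectional(ch) in BIDI_CONTROLS:
            kind = "bidi control"
        elif unicodedata.category(ch) == "Cf":
            kind = "invisible format character"
        else:
            continue
        findings.append((index, f"U+{ord(ch):04X}", kind))
    return findings


# A right-to-left override makes this file name display misleadingly.
print(audit("report\u202Efdp.exe"))
print(audit("pay\u200Dpal"))
```

Whether a flagged character is actually malicious depends on context: zero-width joiners, for instance, are legitimate inside emoji sequences and some scripts.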

Learning & Research

  • Understand how UTF-8 encodes multi-byte sequences
  • Explore Unicode categories and character properties
  • Look up official Unicode names for symbols and emoji
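
To see how UTF-8 builds multi-byte sequences, it can help to write the encoder by hand. The sketch below follows the standard UTF-8 bit layout; the function name `utf8_encode` is just for illustration, and in practice `str.encode("utf-8")` does the same job.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 by packing its bits manually."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp <= 0x7F:      # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


for cp in (0x41, 0xE9, 0x20AC, 0x1F30D):
    encoded = utf8_encode(cp)
    print(f"U+{cp:04X} -> {' '.join(f'{b:02X}' for b in encoded)}")
```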

Typography & Design

  • Find the code point for typographic quotes, dashes, and spaces
  • Distinguish en dash (U+2013) from em dash (U+2014)
  • Locate the exact symbol or diacritic you need

Frequently Asked Questions

What is a Unicode code point?

A code point is a number in the Unicode codespace, written as U+XXXX (e.g. U+0041 for "A"); every encoded character is assigned exactly one. The codespace holds 1,114,112 possible code points (U+0000 to U+10FFFF), of which roughly 150,000 are currently assigned to characters.

What is the difference between UTF-8 and UTF-16?

Both are encodings of Unicode. UTF-8 uses 1–4 bytes per code point and is dominant on the web. UTF-16 uses 2 or 4 bytes per code point and is common in Windows APIs and Java. ASCII characters (U+0000–U+007F) are 1 byte in UTF-8 but 2 bytes in UTF-16.
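
The size difference is easy to check in Python. This short sketch compares encoded lengths for a few sample characters; `encoded_sizes` is a hypothetical helper, and `utf-16-le` is used so that no byte order mark is counted.

```python
def encoded_sizes(ch: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in ("A", "\u00E9", "\u20AC", "\U0001F600"):
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-16 {utf16_len} byte(s)")
```

Note that for characters in the range U+0800 to U+FFFF, such as the euro sign, UTF-16 is the more compact of the two.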

Why do some emoji show as multiple characters?

Many emoji are composed of multiple Unicode code points joined by zero-width joiners (U+200D) or variation selectors. For example, a family emoji may be 7+ individual code points that the operating system renders as one glyph. This tool shows each code point separately.
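
As a concrete illustration, the Python sketch below breaks a four-person family emoji into its code points. The specific sequence (man, woman, girl, boy) is chosen for this example only.

```python
import unicodedata

# MAN + ZWJ + WOMAN + ZWJ + GIRL + ZWJ + BOY
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

for ch in family:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch, '<unnamed>')}")

print("code points:", len(family))
print("zero-width joiners:", family.count("\u200D"))
print("UTF-8 bytes:", len(family.encode("utf-8")))
```

Four emoji at 4 bytes each plus three joiners at 3 bytes each give 25 bytes of UTF-8 for what renders as a single glyph on platforms that support the sequence.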

How do I find the HTML entity for a character?

Paste or type the character in the Inspect Text tab, click its card, and the "HTML Entity" field shows the correct entity: named if one exists (like &copy; for ©), or numeric (like &#8364; for €) as a fallback.
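
The same named-then-numeric fallback can be sketched in Python with the standard library's `html.entities.codepoint2name` table. The function name `html_entity` is hypothetical, and the set of named entities in Python's table may differ from the one this tool uses.

```python
from html.entities import codepoint2name


def html_entity(ch: str) -> str:
    """Return a named HTML entity when one is known, else a decimal numeric one."""
    cp = ord(ch)
    name = codepoint2name.get(cp)
    return f"&{name};" if name else f"&#{cp};"


for ch in ("\u00A9", "\u2014", "\U0001F600"):
    print(f"U+{ord(ch):04X} -> {html_entity(ch)}")
```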

Is my data safe? Does this tool send text to a server?

No data leaves your browser. All Unicode analysis happens entirely client-side using JavaScript, so your text is never uploaded or stored anywhere.

Can I look up a character by its decimal value?

Yes. Switch to the "Lookup by Code Point" tab and enter the decimal number (e.g. 65 for "A", 128512 for 😀). The tool accepts U+XXXX hex notation, 0xXXXX, plain decimal integers, and single pasted characters.

What Unicode categories does this tool recognize?

The tool identifies uppercase and lowercase letters, decimal digits, punctuation, math symbols, currency symbols, space separators, control characters, and more, covering the major Unicode General Category values. It also labels script- and block-based groups such as CJK ideographs, Hangul syllables, and emoji, which are not General Category values in their own right.
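
For comparison, Python exposes the General Category as a two-letter code through `unicodedata.category`. The sketch below maps the first letter of that code to its major class; the `MAJOR_CLASS` table and `category_label` helper are written for this example and are not the tool's own labels.

```python
import unicodedata

# First letter of the two-letter General Category code -> major class.
MAJOR_CLASS = {
    "L": "Letter",
    "M": "Mark",
    "N": "Number",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
    "C": "Other",
}


def category_label(ch: str) -> str:
    """Return e.g. 'Lu (Letter)' for a single character."""
    code = unicodedata.category(ch)
    return f"{code} ({MAJOR_CLASS[code[0]]})"


for ch in ("A", "a", "7", "!", "+", "$", " ", "\n", "\u4E2D", "\uD55C"):
    print(f"U+{ord(ch):04X}: {category_label(ch)}")
```

CJK ideographs and Hangul syllables both come back as `Lo` (Letter, other), which is why a tool needs script or block data on top of the General Category to label them separately.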

Does this work for characters outside the Basic Multilingual Plane?

Yes. The tool supports the full Unicode range up to U+10FFFF, including emoji (most of which sit in U+1F300–U+1FAFF), historic scripts, and supplementary CJK characters. UTF-16 surrogate pairs are computed correctly for these supplementary code points.
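
The surrogate pair for a supplementary code point follows from the standard UTF-16 formula: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. A minimal Python sketch, with `surrogate_pair` as a hypothetical helper name:

```python
def surrogate_pair(cp: int) -> tuple:
    """Return the (high, low) UTF-16 surrogates for a supplementary code point."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


high, low = surrogate_pair(0x1F30D)
print(f"U+1F30D -> {high:04X} {low:04X}")
```

The result can be cross-checked against `"\U0001F30D".encode("utf-16-be")`, which yields the same four bytes.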