🦋
Bluesky Counter
🔬 Unicode Grapheme Debugger

Emoji & Grapheme Inspector

Paste any text to see a character-level breakdown of every Unicode grapheme cluster: byte weight, codepoints, and category. Understand exactly why 👨‍👩‍👧‍👦 counts as one character.

🧬 What is a Grapheme Cluster?

A grapheme cluster is what humans perceive as a single character. A family emoji like 👨‍👩‍👧‍👦 is actually four separate emoji codepoints joined by three Zero Width Joiners (ZWJ), seven codepoints in total, but Bluesky's AT Protocol counts it as exactly 1 grapheme, which is also what Intl.Segmenter reports.
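As a minimal sketch of that difference, the snippet below counts codepoints and grapheme clusters for the family emoji. The helper name countGraphemes is hypothetical, and the "en" locale is an assumption; this illustrates Intl.Segmenter in general, not Bluesky's own code.

```javascript
// Family emoji written with escapes so each codepoint is visible:
// man, ZWJ, woman, ZWJ, girl, ZWJ, boy.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

// countGraphemes is a hypothetical helper name, not part of any library.
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  let count = 0;
  for (const _segment of segmenter.segment(text)) count += 1;
  return count;
}

// Iterating a string with spread walks it codepoint by codepoint.
const codepointCount = [...family].length;
const graphemeCount = countGraphemes(family);

console.log(codepointCount); // 7
console.log(graphemeCount);  // 1
```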

Understanding Unicode Graphemes

Why the same emoji can be 1 grapheme but 8 or more UTF-16 code units in JavaScript

🔗 Zero Width Joiners (ZWJ)

Many complex emoji like 👨‍👩‍👧‍👦 and 🏳️‍🌈 use Unicode's Zero Width Joiner (U+200D) character to join multiple emoji into a single visual glyph. JavaScript's .length counts each UTF-16 code unit separately, but Intl.Segmenter correctly identifies the combined sequence as one grapheme.
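To make the joiner visible, the sketch below splits two ZWJ sequences on U+200D and compares .length with the number of component emoji. The variable names are illustrative only.

```javascript
const ZWJ = "\u200D";

// man ZWJ woman ZWJ girl ZWJ boy
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
// white flag, variation selector-16, ZWJ, rainbow
const rainbowFlag = "\u{1F3F3}\uFE0F\u200D\u{1F308}";

// Splitting on the joiner reveals the component emoji.
const familyParts = family.split(ZWJ);
const flagParts = rainbowFlag.split(ZWJ);

console.log(family.length);      // 11 UTF-16 code units
console.log(familyParts.length); // 4 component emoji
console.log(rainbowFlag.length); // 6 UTF-16 code units
console.log(flagParts.length);   // 2 component emoji
```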

🇺🇸 Regional Indicator Flags

Country flags like 🇺🇸 are encoded as pairs of Regional Indicator Symbols (e.g., U+1F1FA + U+1F1F8 for US). Each symbol occupies 4 UTF-8 bytes, making a single flag 8 bytes, yet it's still just 1 grapheme on Bluesky.
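The mapping from a country code to its flag can be sketched as follows. The helper flagFromCountryCode is hypothetical and assumes a two-letter ASCII code; it relies on the Regional Indicator Symbols running from U+1F1E6 (A) to U+1F1FF (Z).

```javascript
const REGIONAL_INDICATOR_A = 0x1f1e6; // U+1F1E6, regional indicator for "A"

// flagFromCountryCode is a hypothetical helper for illustration.
// It assumes `code` consists of ASCII letters only.
function flagFromCountryCode(code) {
  return [...code.toUpperCase()]
    .map((letter) =>
      String.fromCodePoint(REGIONAL_INDICATOR_A + letter.charCodeAt(0) - 65)
    )
    .join("");
}

const us = flagFromCountryCode("US");
const flagCodepoints = [...us].map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase()
);
const flagBytes = new TextEncoder().encode(us).length;

console.log(flagCodepoints); // [ 'U+1F1FA', 'U+1F1F8' ]
console.log(flagBytes);      // 8
```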

🎨 Skin Tone Modifiers

Emojis with skin tones (e.g., 👋🏽) use a base emoji followed by one of 5 Fitzpatrick skin tone modifier codepoints (U+1F3FB–U+1F3FF). The pair forms one grapheme cluster even though it comprises two Unicode codepoints.
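A small sketch of how the modifier range can be detected. The helper isSkinToneModifier is a hypothetical name; it only checks the U+1F3FB to U+1F3FF range given above.

```javascript
// Waving hand (U+1F44B) followed by the medium skin tone modifier (U+1F3FD).
const waving = "\u{1F44B}\u{1F3FD}";

// isSkinToneModifier is a hypothetical helper: true for U+1F3FB..U+1F3FF.
function isSkinToneModifier(codepoint) {
  return codepoint >= 0x1f3fb && codepoint <= 0x1f3ff;
}

const wavingCodepoints = [...waving].map((ch) => ch.codePointAt(0));
const modifiers = wavingCodepoints.filter(isSkinToneModifier);

// Dropping the modifier leaves the base emoji in its default appearance.
const baseEmoji = String.fromCodePoint(
  ...wavingCodepoints.filter((cp) => !isSkinToneModifier(cp))
);

console.log(wavingCodepoints.length); // 2
console.log(modifiers.length);        // 1
```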

é Combining Diacritical Marks

Accented characters like é can be represented two ways: as a precomposed character (U+00E9, 1 codepoint) or as an ASCII letter e + a combining acute accent (U+0301), which is two codepoints. Both render identically and count as 1 grapheme in Bluesky.
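The two representations can be compared directly. This sketch uses the standard String.prototype.normalize method to convert between them; the variable names are illustrative.

```javascript
const precomposed = "caf\u00E9"; // é as a single codepoint (U+00E9)
const decomposed = "cafe\u0301"; // e followed by combining acute accent (U+0301)

const utf8 = new TextEncoder();
const sameRaw = precomposed === decomposed;
const sameAfterNfc = precomposed === decomposed.normalize("NFC");
const precomposedBytes = utf8.encode(precomposed).length;
const decomposedBytes = utf8.encode(decomposed).length;

console.log(sameRaw);          // false: the strings differ codepoint by codepoint
console.log(sameAfterNfc);     // true: NFC composes e + U+0301 into U+00E9
console.log(precomposedBytes); // 5
console.log(decomposedBytes);  // 6
```

Both strings still segment into 4 graphemes, so the choice of form affects byte weight but not the grapheme count.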

Frequently Asked Questions

Why does JavaScript report a different length than this tool?

JavaScript's .length property counts UTF-16 code units, not visual characters. A single emoji can be 2, 4, 8, or more code units. This tool uses Intl.Segmenter to count actual grapheme clusters, the unit in which Bluesky's AT Protocol measures post length.

What does "byte weight" mean?

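Byte weight is the number of bytes a grapheme occupies when encoded as UTF-8, a variable-width encoding. ASCII characters (A–Z, 0–9) use 1 byte each. Precomposed accented Latin letters such as é typically use 2 bytes. Most emoji codepoints use 4 bytes each, so multi-codepoint emoji weigh more. Bluesky enforces a 3,000 UTF-8 byte limit alongside the 300 grapheme limit; whichever you hit first stops your post.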
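A rough sketch of checking both limits at once. The helper checkPostLimits is hypothetical and is not Bluesky's validation code; the two constants are the limits quoted above.

```javascript
// Limits quoted in the text above.
const MAX_GRAPHEMES = 300;
const MAX_BYTES = 3000;

// checkPostLimits is a hypothetical helper, not Bluesky's own validation code.
function checkPostLimits(text) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  const graphemes = [...segmenter.segment(text)].length;
  const bytes = new TextEncoder().encode(text).length;
  return {
    graphemes,
    bytes,
    withinLimits: graphemes <= MAX_GRAPHEMES && bytes <= MAX_BYTES,
  };
}

// Each family emoji is 4 emoji x 4 bytes + 3 ZWJ x 3 bytes = 25 bytes.
const familyEmoji = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
const heavyPost = checkPostLimits(familyEmoji.repeat(150));
const plainPost = checkPostLimits("a".repeat(300));

console.log(heavyPost); // { graphemes: 150, bytes: 3750, withinLimits: false }
console.log(plainPost); // { graphemes: 300, bytes: 300, withinLimits: true }
```

In this example the byte limit is reached at only 150 graphemes, which is why a counter needs to track both numbers.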

What is a multi-codepoint cluster?

Any grapheme composed of more than one Unicode codepoint, such as flag emoji (2 regional indicators), skin-toned emoji (base + modifier), family emoji (base emoji + ZWJ characters), or combining diacritics (letter + accent mark). This inspector tags these with the "multi" badge.
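One way such a per-cluster breakdown could be produced is sketched below. The helper inspectGraphemes and its field names are hypothetical and are not taken from this tool's source; the multi flag simply marks clusters with more than one codepoint.

```javascript
// inspectGraphemes is a hypothetical helper sketching a per-cluster breakdown.
function inspectGraphemes(text) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  const encoder = new TextEncoder();
  return [...segmenter.segment(text)].map(({ segment }) => {
    // Format each codepoint in the cluster as U+XXXX.
    const codepoints = [...segment].map(
      (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
    );
    return {
      grapheme: segment,
      codepoints,
      bytes: encoder.encode(segment).length, // UTF-8 byte weight
      multi: codepoints.length > 1,          // more than one codepoint
    };
  });
}

// "a", the US flag, then e + combining acute accent.
const report = inspectGraphemes("a\u{1F1FA}\u{1F1F8}e\u0301");
for (const row of report) {
  console.log(row.codepoints.join(" "), row.bytes, row.multi);
}
// U+0061 1 false
// U+1F1FA U+1F1F8 8 true
// U+0065 U+0301 3 true
```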