Bluesky Counter
Technical Reference

AT Protocol Deep Dive: How Bluesky Really Works

A technical reference for the Authenticated Transfer Protocol, covering decentralized identity, data repositories, rich-text facets, and why post length is counted in graphemes and enforced by the record schema.

What is AT Protocol?

The open standard powering Bluesky

The Authenticated Transfer Protocol (AT Protocol or atproto) is an open networking protocol for building decentralized social applications. It was developed by Bluesky PBC and is published as an open specification. Its core design goals are:

  • Account portability: Users can migrate between hosting providers without losing their identity, followers, or content.

  • Self-authenticating data: Every piece of data is cryptographically signed by the user's key, so any observer can verify authenticity without trusting a central server.

  • Interoperability: Any developer can build an AT Protocol client, server, or application. The protocol defines the interfaces, not the implementations.

  • Algorithmic choice: Users choose their own feed algorithms via open feed generators, rather than being locked into a platform's proprietary ranking.

Decentralized Identifiers (DIDs)

How your Bluesky identity is truly portable

Every Bluesky account is backed by a DID (a W3C Decentralized Identifier) that is independent of any server. Your DID persists even if you move your data to a different host or if Bluesky PBC ceases to exist.

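did:plc (Public Ledger of Credentials)

The current default for Bluesky accounts; "plc" originally stood for "placeholder". The full DID is 32 characters long: the did:plc: prefix followed by a 24-character base32 identifier derived from a hash of the account's first (genesis) operation. Example: did:plc:ewvi7nxzyoun6zhhandbv25p. DIDs are registered with the PLC Directory, a directory service that keeps a signed, auditable operation log for each identifier. Rotation keys allow secure key recovery.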
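
As a rough illustration of the identifier format, the sketch below checks only the syntax of a did:plc string. It assumes the method-specific part is exactly 24 lowercase base32 characters (a-z and 2-7); the function name isDidPlc is hypothetical, and the check says nothing about whether the DID is actually registered.

```javascript
// Sketch: syntactic check for a did:plc identifier (no directory lookup).
// Assumption: "did:plc:" followed by 24 lowercase base32 characters (a-z, 2-7).
const DID_PLC_PATTERN = /^did:plc:[a-z2-7]{24}$/;

// Hypothetical helper name, used only for this illustration.
function isDidPlc(did) {
  return DID_PLC_PATTERN.test(did);
}

console.log(isDidPlc("did:plc:ewvi7nxzyoun6zhhandbv25p")); // true
console.log(isDidPlc("did:plc:tooshort")); // false
console.log(isDidPlc("did:web:yourname.com")); // false
```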

did:web (Web DID)

A DID resolved over HTTPS from a domain you own. If you control yourname.com, your DID is did:web:yourname.com and your DID document lives at https://yourname.com/.well-known/did.json. With did:web, your domain is your identity, entirely independent of Bluesky's infrastructure.
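
The mapping from a did:web identifier to its document URL can be sketched as follows. This is a minimal illustration that handles only the hostname-level form described above; path-style did:web identifiers are deliberately rejected, and the function name didWebToDocumentUrl is hypothetical.

```javascript
// Sketch: derive the DID document URL for a hostname-level did:web identifier.
// Hypothetical helper name; path-style did:web forms are out of scope here.
function didWebToDocumentUrl(did) {
  const prefix = "did:web:";
  if (!did.startsWith(prefix)) {
    throw new Error(`not a did:web identifier: ${did}`);
  }
  const host = did.slice(prefix.length);
  // Extra ":" separators would indicate a path-style identifier.
  if (host.length === 0 || host.includes(":")) {
    throw new Error(`unsupported did:web form: ${did}`);
  }
  // A port, if present, is percent-encoded as %3A inside the DID.
  return `https://${decodeURIComponent(host)}/.well-known/did.json`;
}

console.log(didWebToDocumentUrl("did:web:yourname.com"));
// https://yourname.com/.well-known/did.json
```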

Why this matters for handles: When you use a custom domain as a Bluesky handle (e.g., @yourname.com), Bluesky resolves the domain to your DID via either a DNS TXT record at _atproto.yourname.com (with a value of the form did=<your DID>) or an HTTPS request to https://yourname.com/.well-known/atproto-did. Your DID is the authoritative identity; the handle is a human-readable alias.
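
To make the two lookup paths concrete, here is a small sketch that computes where a resolver would look for a given handle and parses a TXT value once it has been fetched. It performs no network requests; the helper names handleLookupTargets and parseAtprotoTxtValue are hypothetical, and the TXT name _atproto.<handle> with a did=<DID> value is the convention assumed here.

```javascript
// Sketch: where to look when resolving a handle to a DID (no network I/O).
// Both helper names are hypothetical and exist only for this illustration.
function handleLookupTargets(handle) {
  const normalized = handle.replace(/^@/, "").toLowerCase();
  return {
    // DNS method: TXT record on the _atproto subdomain of the handle.
    dnsTxtName: `_atproto.${normalized}`,
    // HTTPS method: plain-text DID served from a well-known path.
    httpsUrl: `https://${normalized}/.well-known/atproto-did`,
  };
}

// Extract the DID from a TXT value of the form "did=did:plc:...".
// Returns null when the value is not an atproto handle record.
function parseAtprotoTxtValue(txtValue) {
  const match = /^did=(did:[a-z]+:[A-Za-z0-9._:%-]+)$/.exec(txtValue.trim());
  return match ? match[1] : null;
}

console.log(handleLookupTargets("@yourname.com"));
console.log(parseAtprotoTxtValue("did=did:plc:ewvi7nxzyoun6zhhandbv25p"));
```

A real resolver would fetch the TXT record or the HTTPS path and then confirm that the resulting DID document lists the same handle, since the link must hold in both directions.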

Personal Data Servers (PDS)

Where your posts actually live

A PDS is an HTTPS server that stores your AT Protocol repository: all of your posts, likes, follows, and profile data. Unlike Mastodon, where your account and data are bound to one instance, your AT Protocol repository is cryptographically signed and fully portable.

The default PDS for Bluesky users is bsky.social, operated by Bluesky PBC. However, the AT Protocol is designed so that anyone can host their own PDS. If you migrate to another PDS, your repository (posts, likes, follows, and profile) migrates with you. Your followers carry over as well: their follow records point at your DID, which does not change, so nobody has to re-follow you or be re-approved.

Your PDS stores data in a Merkle Search Tree (MST), a verifiable, hash-linked data structure similar to a git commit graph. Every write produces a signed commit identified by a CID (Content Identifier), making your data tamper-evident.

Lexicons & Record Schemas

The type system for AT Protocol data

A Lexicon is a JSON schema document that defines the structure of a record type, XRPC method, or event type in the AT Protocol. Lexicons serve as the interoperability contract between AT Protocol services.

Key Bluesky Lexicons
app.bsky.feed.post

The post record type. Defines text, facets, embed, reply, and langs fields.

app.bsky.richtext.facet

Rich-text annotation: byte-range index + feature type (link, mention, tag).

app.bsky.feed.like

A like record with a subject reference (CID + URI of the liked post).

app.bsky.graph.follow

A follow record pointing to another DID.

app.bsky.actor.profile

Profile record: displayName, description, avatar, and banner.

com.atproto.sync.subscribeRepos

The WebSocket firehose endpoint for streaming all repository events.
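
As an illustration of what records of these types look like, the sketch below builds a like record and a follow record as plain objects. The helper names are hypothetical, and the URI and CID in the usage lines are placeholders rather than real references; a client would still need to write these records to a repository through a PDS.

```javascript
// Sketch: shape of a like record and a follow record as plain objects.
// Helper names are hypothetical; nothing here talks to a PDS.
function buildLikeRecord(postUri, postCid, now = new Date()) {
  return {
    $type: "app.bsky.feed.like",
    // The subject pins both the location (URI) and the exact version (CID).
    subject: { uri: postUri, cid: postCid },
    createdAt: now.toISOString(),
  };
}

function buildFollowRecord(targetDid, now = new Date()) {
  return {
    $type: "app.bsky.graph.follow",
    // A follow points at a DID, so it survives handle changes and PDS moves.
    subject: targetDid,
    createdAt: now.toISOString(),
  };
}

// Placeholder values for illustration only.
const exampleUri = "at://did:plc:ewvi7nxzyoun6zhhandbv25p/app.bsky.feed.post/example-rkey";
const exampleCid = "example-cid-placeholder";
console.log(buildLikeRecord(exampleUri, exampleCid));
console.log(buildFollowRecord("did:plc:ewvi7nxzyoun6zhhandbv25p"));
```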

Rich-Text Facets

How links, mentions, and hashtags are encoded in posts

Bluesky posts store plain text in the text field and annotate it with facets: structured objects that mark byte ranges of the text and attach a feature type. This separation of text from markup makes posts portable and parseable without custom HTML.

Example Facet Structure (JSON)

{
  "text": "Hello @atproto.com! Check https://bsky.app #bluesky",
  "facets": [
    {
      "index": { "byteStart": 6, "byteEnd": 18 },
      "features": [{ "$type": "app.bsky.richtext.facet#mention", "did": "did:plc:..." }]
    },
    {
      "index": { "byteStart": 26, "byteEnd": 42 },
      "features": [{ "$type": "app.bsky.richtext.facet#link", "uri": "https://bsky.app" }]
    },
    {
      "index": { "byteStart": 43, "byteEnd": 51 },
      "features": [{ "$type": "app.bsky.richtext.facet#tag", "tag": "bluesky" }]
    }
  ]
}

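Byte indices are UTF-8 byte offsets, not character indices: byteStart is inclusive and byteEnd is exclusive. JavaScript strings are indexed in UTF-16 code units, which is why Bluesky's rich-text library must use TextEncoder rather than string indexing.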
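
One way to compute such a byte range is sketched below: encode the text before the match to get byteStart, then add the encoded length of the match itself. The helper name byteRangeOf is hypothetical, and it only locates the first occurrence of a substring; real facet detection also has to decide what counts as a mention, link, or tag.

```javascript
// Sketch: UTF-8 byte range of the first occurrence of a substring.
// Hypothetical helper; byteStart is inclusive, byteEnd is exclusive.
const facetEncoder = new TextEncoder();

function byteRangeOf(text, substring) {
  const charStart = text.indexOf(substring); // UTF-16 index, not a byte offset
  if (charStart === -1) return null;
  const byteStart = facetEncoder.encode(text.slice(0, charStart)).length;
  const byteEnd = byteStart + facetEncoder.encode(substring).length;
  return { byteStart, byteEnd };
}

const post = "Hello @atproto.com! Check https://bsky.app #bluesky";
console.log(byteRangeOf(post, "@atproto.com")); // { byteStart: 6, byteEnd: 18 }
console.log(byteRangeOf(post, "https://bsky.app")); // { byteStart: 26, byteEnd: 42 }
console.log(byteRangeOf(post, "#bluesky")); // { byteStart: 43, byteEnd: 51 }

// With a leading emoji, the UTF-16 index (6) and the byte offset (8) diverge.
console.log(byteRangeOf("\u{1F98B} hi @atproto.com", "@atproto.com"));
// { byteStart: 8, byteEnd: 20 }
```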

Character Counting Deep Dive

Why graphemes, bytes, and JavaScript .length are all different numbers

The Three Layers

Graphemes (Bluesky post limit): [...new Intl.Segmenter().segment(text)].length

What humans perceive as one character. The app.bsky.feed.post Lexicon limits post text to 300 grapheme clusters (maxGraphemes). This is what this tool measures.

UTF-8 Bytes (Network limit): new TextEncoder().encode(text).length

The size of the text when encoded as UTF-8. The same Lexicon also sets a hard cap of 3,000 bytes (maxLength) on the text field, which bounds record size no matter how many code points a grapheme contains. ASCII characters take 1 byte each; most single-code-point emoji take 4 bytes, and multi-part emoji take more.

JavaScript .length (Not used by Bluesky): text.length

Counts UTF-16 code units. A family emoji like 👨‍👩‍👧‍👦 returns a .length of 11. This number is useless for Bluesky character validation.
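
The three measurements can be compared side by side with a short sketch. The function names measureText and fitsPostLimits are hypothetical, and the limits of 300 graphemes and 3,000 bytes are taken from the text above. Grapheme segmentation depends on the Unicode data shipped with the runtime, so counts for very new emoji may vary between environments.

```javascript
// Sketch: graphemes vs. UTF-8 bytes vs. UTF-16 code units for one string.
// Hypothetical helper names; limits are the 300 / 3,000 values described above.
function measureText(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    graphemes: [...segmenter.segment(text)].length,
    utf8Bytes: new TextEncoder().encode(text).length,
    utf16Units: text.length,
  };
}

function fitsPostLimits(text) {
  const { graphemes, utf8Bytes } = measureText(text);
  return graphemes <= 300 && utf8Bytes <= 3000;
}

// Family emoji: four people joined by three zero-width joiners.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(measureText(family)); // { graphemes: 1, utf8Bytes: 25, utf16Units: 11 }

console.log(fitsPostLimits("a".repeat(300))); // true
console.log(fitsPostLimits("a".repeat(301))); // false: too many graphemes
console.log(fitsPostLimits(family.repeat(200))); // false: 200 graphemes but 5,000 bytes
```

The last line shows why both limits exist: a post made of multi-part emoji can stay well under 300 graphemes and still exceed the byte cap.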

The Relay Network & Firehose

How all Bluesky posts get distributed globally

The AT Protocol uses a Relay: a server that aggregates repository events from all PDS hosts and re-broadcasts them as a unified event stream called the Firehose. Any service can subscribe to the Firehose over WebSocket at com.atproto.sync.subscribeRepos and receive every post, like, follow, and profile update on the entire network in real time.

This architecture is what enables global search, custom feed generators, and third-party analytics to work across the entire Bluesky network without needing to crawl individual servers. The main relay is operated by Bluesky PBC at bsky.network, but anyone can run a relay.
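
A subscriber's first step is building the WebSocket URL for the Firehose. The sketch below does only that, assuming the usual /xrpc/ path prefix for Lexicon methods and an optional cursor query parameter for resuming from a known sequence number; the function name firehoseUrl is hypothetical. Opening the socket and decoding the binary frames (which requires a DAG-CBOR decoder) are left out.

```javascript
// Sketch: build the Firehose subscription URL for a relay host.
// Hypothetical helper; it opens no connection and decodes no frames.
function firehoseUrl(relayHost, cursor) {
  const url = new URL(`wss://${relayHost}/xrpc/com.atproto.sync.subscribeRepos`);
  // Assumption: "cursor" resumes the stream from a previously seen sequence number.
  if (cursor !== undefined) {
    url.searchParams.set("cursor", String(cursor));
  }
  return url.toString();
}

console.log(firehoseUrl("bsky.network"));
// wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos
console.log(firehoseUrl("bsky.network", 12345));
// wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos?cursor=12345
```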

Feed Generators

The open algorithm marketplace

A Feed Generator is any web service that implements the app.bsky.feed.getFeedSkeleton method, which returns a ranked list of post URIs. Users subscribe to Feed Generators and see them alongside their Following feed.
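
The core of a Feed Generator can be sketched as a function that ranks candidate posts and returns one page of URIs. This is a minimal illustration, not a full service: the function name buildFeedSkeleton, the score field, and the offset-based cursor are all assumptions made for the example, and the response shape (a feed array of objects with a post URI, plus an optional cursor string) is the one assumed here for getFeedSkeleton.

```javascript
// Sketch: rank candidates and return one page of a feed skeleton.
// Hypothetical helper; "score" and the offset-style cursor are illustrative choices.
function buildFeedSkeleton(candidates, limit = 30, cursor) {
  // Highest score first; copy the array so the caller's data is not mutated.
  const ranked = [...candidates].sort((a, b) => b.score - a.score);
  const parsed = cursor === undefined ? 0 : Number.parseInt(cursor, 10);
  const offset = Number.isInteger(parsed) && parsed >= 0 ? parsed : 0;
  const page = ranked.slice(offset, offset + limit);
  const response = { feed: page.map((c) => ({ post: c.uri })) };
  // Only return a cursor when more results remain.
  const nextOffset = offset + page.length;
  if (nextOffset < ranked.length) response.cursor = String(nextOffset);
  return response;
}

// Placeholder URIs for illustration only.
const sample = [
  { uri: "at://did:example:a/app.bsky.feed.post/1", score: 1 },
  { uri: "at://did:example:b/app.bsky.feed.post/2", score: 5 },
  { uri: "at://did:example:c/app.bsky.feed.post/3", score: 3 },
];
console.log(buildFeedSkeleton(sample, 2));
console.log(buildFeedSkeleton(sample, 2, "2"));
```

The skeleton carries only URIs; the Bluesky app view hydrates them into full posts, which is what keeps a Feed Generator small enough for hobbyists to run.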

  • Photography feeds: Filter for posts with image embeds and the #photography hashtag

  • Developer feeds: Posts matching #buildinpublic, #coding, @tech handles

  • Location feeds: Community-built feeds for specific cities or regions

  • News aggregators: Curated feeds from verified journalists and news outlets

Because Feed Generators consume the Firehose and implement an open Lexicon, the community can build and publish feeds without any permission from Bluesky PBC. The Discover feed at bsky.app is itself just one of thousands of available feeds.