AT Protocol Deep Dive: How Bluesky Really Works
A complete technical reference for the Authenticated Transfer Protocol, from decentralized identity and data repositories to rich-text facets and why post length is limited in graphemes and bytes rather than JavaScript string length.
What is AT Protocol?
The open standard powering Bluesky
The Authenticated Transfer Protocol (AT Protocol or atproto) is an open networking protocol for building decentralized social applications. Developed by Bluesky PBC, it is now governed as an open specification. AT Protocol's core design goals are:
- Account portability: Users can migrate between hosting providers without losing their identity, followers, or content.
- Self-authenticating data: Every piece of data is cryptographically signed by the user's key, so any observer can verify authenticity without trusting a central server.
- Interoperability: Any developer can build an AT Protocol client, server, or application; the protocol defines the interfaces, not the implementations.
- Algorithmic choice: Users choose their own feed algorithms via open feed generators, rather than being locked into a platform's proprietary ranking.
Decentralized Identifiers (DIDs)
How your Bluesky identity is truly portable
Every Bluesky account is backed by a DID, a W3C Decentralized Identifier, that is independent of any server. Your DID persists even if you move your data to a different host or if Bluesky PBC ceases to exist.
did:plc (Placeholder DID)
The current default for Bluesky accounts. The identifier is 24 base32 characters and is registered with the PLC Directory, a directory service that keeps a public, auditable log of signed operations for each DID. Example: did:plc:ewvi7nxzyoun6zhhandbv25p. Rotation keys allow secure key recovery.
did:web (Web DID)
A DID resolved via HTTPS from a domain you own. If you control yourname.com, your DID document lives at https://yourname.com/.well-known/did.json. Using this makes your domain your identity, entirely independent of Bluesky's infrastructure.
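As a rough illustration of how the two DID methods map to a fetchable DID document, the sketch below turns a DID into a URL. The helper name didDocumentUrl is hypothetical, and the https://plc.directory host is an assumption about the public PLC Directory deployment rather than something stated above. No network request is made.

```javascript
// Hypothetical helper: map a DID to the URL where its DID document would be fetched.
// Assumption: the public PLC Directory is served at https://plc.directory.
function didDocumentUrl(did) {
  if (did.startsWith("did:plc:")) {
    return `https://plc.directory/${did}`;
  }
  if (did.startsWith("did:web:")) {
    const encodedHost = did.slice("did:web:".length);
    // A ':' after the method prefix would denote a path segment, which this sketch does not handle.
    if (encodedHost.length === 0 || encodedHost.includes(":")) {
      throw new Error(`Unsupported did:web identifier: ${did}`);
    }
    // Ports are percent-encoded in did:web (for example localhost%3A3000).
    const host = decodeURIComponent(encodedHost);
    return `https://${host}/.well-known/did.json`;
  }
  throw new Error(`Unsupported DID method: ${did}`);
}

console.log(didDocumentUrl("did:web:yourname.com"));
```

The returned URL can then be fetched with any HTTP client; the response is a JSON DID document listing the account's signing key and PDS endpoint.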
When you use a domain as your handle (for example @yourname.com), Bluesky resolves your domain to your DID via a DNS TXT record or the /.well-known path. Your DID is the authoritative identity; the handle is a human-readable alias.
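The two handle lookups can be sketched as follows. The record name _atproto.<domain>, the did=<DID> value format, and the /.well-known/atproto-did path are taken from the AT Protocol handle specification as I understand it, not from the text above, and both helper names are hypothetical. The code only builds lookup targets and parses a TXT value; it performs no DNS or HTTP requests.

```javascript
// Hypothetical helper: where a resolver would look for the DID behind a handle.
function handleLookupTargets(handle) {
  const domain = handle.replace(/^@/, "").toLowerCase();
  return {
    // DNS TXT record expected to contain a value like "did=did:plc:..."
    dnsTxtName: `_atproto.${domain}`,
    // HTTPS fallback expected to return the DID as plain text
    httpsUrl: `https://${domain}/.well-known/atproto-did`,
  };
}

// Hypothetical helper: extract the DID from a TXT record value, or null if it does not match.
function parseAtprotoTxt(value) {
  const match = /^did=(did:[a-z]+:[A-Za-z0-9._:%-]+)$/.exec(value.trim());
  return match ? match[1] : null;
}

console.log(handleLookupTargets("@yourname.com"));
console.log(parseAtprotoTxt("did=did:plc:ewvi7nxzyoun6zhhandbv25p"));
```

A careful resolver would also fetch the DID document afterwards and confirm that it lists the same handle, so that the link is verified in both directions.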
Personal Data Servers (PDS)
Where your posts actually live
A PDS is an HTTPS server that stores your AT Protocol repository: all of your posts, likes, follows, and profile data. Unlike Mastodon, where your account is bound to an instance, your AT Protocol repository is cryptographically signed and fully portable.
The default PDS for Bluesky users is bsky.social, operated by Bluesky PBC. However, the AT Protocol is designed so that anyone can host their own PDS. If you migrate your PDS, all of your data (posts, likes, follows) migrates with you, and because other users' follow records point at your DID rather than at your host, your follower list stays intact. No data loss, no follower request re-approval process.
Your PDS stores data in a Merkle Search Tree (MST), a verifiable, hash-linked data structure similar to a git commit graph. Every write is a signed commit with a CID (Content Identifier), making your data tamper-evident.
Lexicons & Record Schemas
The type system for AT Protocol data
A Lexicon is a JSON schema document that defines the structure of a record type, XRPC method, or event type in the AT Protocol. Lexicons serve as the interoperability contract between AT Protocol services.
- app.bsky.feed.post: The post record type. Defines text, facets, embed, reply, and langs fields.
- app.bsky.richtext.facet: Rich-text annotation: byte-range index + feature type (link, mention, tag).
- app.bsky.feed.like: A like record with a subject reference (CID + URI of the liked post).
- app.bsky.graph.follow: A follow record pointing to another DID.
- app.bsky.actor.profile: Profile record: displayName, description, avatar, and banner.
- com.atproto.sync.subscribeRepos: The WebSocket firehose endpoint for streaming all repository events.
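To make the record types above more concrete, here is a minimal sketch of what post, like, and follow records might look like as plain objects. The builder function names are hypothetical, the createdAt field is an assumption drawn from the published Lexicons rather than from the list above, and the URI and CID values in the usage lines are placeholders, not real records.

```javascript
// Hypothetical builders for three record types. Field names follow the Lexicons
// listed above; createdAt is assumed from the published schemas.
function buildPostRecord(text, now = new Date()) {
  return {
    $type: "app.bsky.feed.post",
    text,
    langs: ["en"],
    createdAt: now.toISOString(),
  };
}

function buildLikeRecord(subjectUri, subjectCid, now = new Date()) {
  return {
    $type: "app.bsky.feed.like",
    // The subject pins both the location (URI) and the exact content version (CID).
    subject: { uri: subjectUri, cid: subjectCid },
    createdAt: now.toISOString(),
  };
}

function buildFollowRecord(subjectDid, now = new Date()) {
  return {
    $type: "app.bsky.graph.follow",
    subject: subjectDid,
    createdAt: now.toISOString(),
  };
}

// Placeholder identifiers for illustration only.
const exampleLike = buildLikeRecord(
  "at://did:plc:ewvi7nxzyoun6zhhandbv25p/app.bsky.feed.post/placeholder-rkey",
  "placeholder-cid"
);
console.log(JSON.stringify(exampleLike, null, 2));
```

Because a follow's subject is a DID rather than a server address, the record stays valid when the followed account moves to another PDS.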
Rich-Text Facets
How links, mentions, and hashtags are encoded in posts
Bluesky posts store plain text in the text field and annotate it with facets: structured objects that mark byte ranges of the text and attach a feature type. This separation of text from markup makes posts portable and parseable without custom HTML.
Example Facet Structure (JSON)
{
  "text": "Hello @atproto.com! Check https://bsky.app #bluesky",
  "facets": [
    {
      "index": { "byteStart": 6, "byteEnd": 18 },
      "features": [{ "$type": "app.bsky.richtext.facet#mention", "did": "did:plc:..." }]
    },
    {
      "index": { "byteStart": 26, "byteEnd": 42 },
      "features": [{ "$type": "app.bsky.richtext.facet#link", "uri": "https://bsky.app" }]
    },
    {
      "index": { "byteStart": 43, "byteEnd": 51 },
      "features": [{ "$type": "app.bsky.richtext.facet#tag", "tag": "bluesky" }]
    }
  ]
}
Byte indices are UTF-8 byte offsets, not character indices; byteStart is inclusive and byteEnd is exclusive. This is why Bluesky's rich-text library must use TextEncoder rather than string indexing.
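A minimal sketch of computing a facet's byte range with TextEncoder is shown below. The helper names utf8Length and facetFor are hypothetical, and the substring search is a simplification: a real rich-text library detects links, mentions, and tags with proper parsing rather than indexOf.

```javascript
const facetEncoder = new TextEncoder();

// UTF-8 byte length of a string.
function utf8Length(s) {
  return facetEncoder.encode(s).length;
}

// Hypothetical helper: build a facet for the first occurrence of `needle` in `text`.
// byteStart is inclusive, byteEnd is exclusive.
function facetFor(text, needle, feature) {
  const charIndex = text.indexOf(needle); // UTF-16 index, NOT a byte offset
  if (charIndex === -1) return null;
  const byteStart = utf8Length(text.slice(0, charIndex));
  const byteEnd = byteStart + utf8Length(needle);
  return { index: { byteStart, byteEnd }, features: [feature] };
}

const linkFeature = { $type: "app.bsky.richtext.facet#link", uri: "https://bsky.app" };

// ASCII text: byte offsets and string indices happen to agree.
const asciiFacet = facetFor(
  "Hello @atproto.com! Check https://bsky.app #bluesky",
  "https://bsky.app",
  linkFeature
);
console.log(asciiFacet.index);

// Non-ASCII text: "\u00e1" (a with acute accent) is 1 UTF-16 unit but 2 UTF-8 bytes,
// so the link starts at string index 4 but at byte offset 5.
const accentedFacet = facetFor("Ol\u00e1 https://bsky.app", "https://bsky.app", linkFeature);
console.log(accentedFacet.index);
```

The second call shows the failure mode of string indexing: using the UTF-16 index directly would shift the link range by one byte and break rendering in clients.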
Character Counting Deep Dive
Why graphemes, bytes, and JavaScript .length are all different numbers
The Three Layers
- Graphemes, new Intl.Segmenter().segment(text): What humans perceive as one character. The app.bsky.feed.post Lexicon limits post text to 300 grapheme clusters.
- UTF-8 bytes, new TextEncoder().encode(text).length: The size of the text when encoded as UTF-8. The same Lexicon adds a 3,000-byte hard cap on the text field, which bounds the stored record size regardless of how many bytes each grapheme needs. ASCII = 1 byte, most emoji = 4 bytes.
- UTF-16 code units, text.length: Counts UTF-16 code units. A family emoji like 👨‍👩‍👧‍👦 returns a .length of 11. This number is useless for Bluesky character validation.
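The three measurements can be compared side by side with a short sketch. The function names measureText and isValidPostText are hypothetical, and the check assumes the 300-grapheme and 3,000-byte limits described above. It relies on an Intl.Segmenter implementation with full Unicode data, as shipped in current browsers and Node.js.

```javascript
const graphemeSegmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const utf8Encoder = new TextEncoder();

// Hypothetical helper: measure a string in all three units.
function measureText(text) {
  let graphemes = 0;
  for (const _segment of graphemeSegmenter.segment(text)) graphemes++;
  return {
    graphemes,                                  // user-perceived characters
    utf8Bytes: utf8Encoder.encode(text).length, // encoded size
    utf16Units: text.length,                    // JavaScript string length
  };
}

// Hypothetical validity check against both post limits.
function isValidPostText(text) {
  const m = measureText(text);
  return m.graphemes <= 300 && m.utf8Bytes <= 3000;
}

// Family emoji: four emoji joined by three zero-width joiners.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(measureText(family)); // 1 grapheme, 25 UTF-8 bytes, 11 UTF-16 units
console.log(isValidPostText(family.repeat(300))); // 300 graphemes but 7,500 bytes
```

The last line shows why both limits exist: 300 family emoji fit the grapheme limit yet exceed the byte cap, so the text is rejected.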
The Relay Network & Firehose
How all Bluesky posts get distributed globally
The AT Protocol uses a Relay, a server that aggregates repository events from all PDS hosts and re-broadcasts them as a unified event stream called the Firehose. Any service can subscribe to the Firehose at com.atproto.sync.subscribeRepos and receive every post, like, follow, and profile update on the entire network in real time.
This architecture is what enables global search, custom feed generators, and third-party analytics to work across the entire Bluesky network without needing to crawl individual servers. The main relay is operated by Bluesky PBC at bsky.network, but anyone can run a relay.
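Subscribing to the Firehose might look roughly like the sketch below. The /xrpc/ path prefix and the cursor query parameter are assumptions based on the usual XRPC conventions, the function names are hypothetical, and openFirehose relies on a global WebSocket (available in browsers and recent Node.js releases). Frames arrive as binary DAG-CBOR, and decoding them requires a CBOR library that is not shown here.

```javascript
// Hypothetical helper: build the subscription URL for a relay host.
// Assumption: XRPC methods are served under /xrpc/<method name>.
function firehoseUrl(relayHost, cursor) {
  const url = new URL(`wss://${relayHost}/xrpc/com.atproto.sync.subscribeRepos`);
  // An optional cursor asks the relay to replay events from a sequence number.
  if (cursor !== undefined) url.searchParams.set("cursor", String(cursor));
  return url.toString();
}

// Hypothetical helper: open the stream and hand raw binary frames to a callback.
// Not invoked here, so the sketch makes no network connection by itself.
function openFirehose(relayHost, onFrame) {
  const socket = new WebSocket(firehoseUrl(relayHost));
  socket.binaryType = "arraybuffer";
  socket.addEventListener("message", (event) => {
    onFrame(new Uint8Array(event.data)); // still DAG-CBOR encoded at this point
  });
  return socket;
}

console.log(firehoseUrl("bsky.network"));
```

A consumer would typically persist the sequence number of the last processed event and pass it back as the cursor after a reconnect, so that no events are missed.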
Feed Generators
The open algorithm marketplace
A Feed Generator is any web service that implements the app.bsky.feed.getFeedSkeleton Lexicon: it returns a list of post URIs in ranked order. Users subscribe to Feed Generators and see them alongside their Following feed.
- Photography feeds: Filter for posts with image embeds + #photography hashtag
- Developer feeds: Posts matching #buildinpublic, #coding, @tech handles
- Location feeds: Community-built feeds for specific cities or regions
- News aggregators: Curated feeds from verified journalists and news outlets
Because Feed Generators consume the Firehose and implement an open Lexicon, the community can build and publish feeds without any permission from Bluesky PBC. The Discover feed at bsky.app is itself just one of thousands of available feeds.
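The core of a hashtag-based Feed Generator can be sketched as a function that turns an index of posts into a feed skeleton. The indexedPosts shape (uri, tags, indexedAt) and the function name are hypothetical stand-ins for whatever a real service stores after consuming the Firehose; the returned { feed: [{ post }] } shape follows my understanding of the getFeedSkeleton response, and the URIs are placeholders.

```javascript
// Hypothetical ranking: newest posts carrying a given tag, as a feed skeleton.
// indexedPosts items are assumed to look like { uri, tags: string[], indexedAt: ISO string }.
function buildFeedSkeleton(indexedPosts, tag, limit = 50) {
  const wanted = tag.toLowerCase();
  const feed = indexedPosts
    .filter((p) => p.tags.some((t) => t.toLowerCase() === wanted))
    // ISO 8601 timestamps in the same format sort correctly as plain strings.
    .sort((a, b) => (a.indexedAt < b.indexedAt ? 1 : a.indexedAt > b.indexedAt ? -1 : 0))
    .slice(0, limit)
    .map((p) => ({ post: p.uri }));
  return { feed };
}

// Placeholder data for illustration only.
const indexed = [
  { uri: "at://did:example:a/app.bsky.feed.post/1", tags: ["photography"], indexedAt: "2024-01-01T10:00:00.000Z" },
  { uri: "at://did:example:b/app.bsky.feed.post/2", tags: ["coding"], indexedAt: "2024-01-01T11:00:00.000Z" },
  { uri: "at://did:example:c/app.bsky.feed.post/3", tags: ["Photography"], indexedAt: "2024-01-01T12:00:00.000Z" },
];
console.log(JSON.stringify(buildFeedSkeleton(indexed, "photography")));
```

The skeleton contains only URIs: the client's app server hydrates them into full posts, which is why a feed service can stay small and needs no copy of the post content.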