Skip to main content

Skygaze Hackathon

· 3 min read
Cooper Edmunds
Skygaze

This is a guest blog post by Skygaze, creators of the For You feed. You can check out For You, the custom feed that learns what you like, at https://skygaze.io/feed.

Last Sunday, 70 engineers came together at the YC office in San Francisco for the first Bluesky AI Hackathon. The teams took full advantage of Bluesky’s complete data openness to build 17 pretty spectacular projects, many of which genuinely surprised us. My favorites are below. Thank you to Replicate for donating $50 of LLM and image model credits to each participant and sponsoring a $1000 prize for the winning team!

The 17 projects covered a wide range of categories: location-based feeds, feeds with dynamic author lists, collaborative image generation, text moderation, NSFW image labeling, creator tools, and more. The top three stood out for their creativity, practicality, and completeness (despite having only ~6 hours to build), and we’ll share a bit about them below.

Convo Detox

@paritoshk.bsky.social and team came in first place with Convo Detox–a bot that predicts when a thread is at high risk of becoming toxic and interjects to diffuse tension. We were particularly impressed with the team’s use of a self-hosted model trained on Reddit data specifically to predict conversations that are likely to get heated. As a proof of concept they deployed it as a bot that can be summoned via mention, but in the near future this would make for a great third party moderation label.

SF IRL

This is a bot that detects and promotes tech events happening in SF. In addition to flagging events, it keeps track of the accounts posting about SF tech and serves a feed with all of the posts from those accounts. We think simple approaches to dynamic author lists is a very interesting 90/10 on customized feeds and (if designed reasonably) could be both easier for the feed maintainer and higher quality for the feed consumers.

NSFW Image Detection

On Bluesky, users can set whether they want adult content to show up in their app. Beyond this level of customization, whether or not an image is labeled as NSFW can be customized as well — people have a wide variety of preferences. This team trained a model to classify images into a large number of NSFW categories, which would theoretically fit nicely into the 3rd party moderation labeler interface. It’s neat that their choice of architecture extends naturally to processing text in tandem with images.

Other Projects

Other noteworthy projects included translation bots, deep fake detectors, a friend matchmaker, and an image generator tool that allowed people to build image generated prompts together in reply threads. It was genuinely incredibly impressive and exciting to see what folks with no previous AT Proto experience were able to put together (and often deploy !!!) in only a few hours.

Additional Resources

We prepared some starter templates for the hackathon, and want to share them below for anyone who couldn’t attend the event in person!

And if you’re interested in hosting your own bluesky hackathon but don’t know where to start, please feel free to copy all of our invite copy, starter repos, and datasets.

Early Access Federation for Self-Hosters

· 5 min read

For a high-level introduction to data federation, as well as a comparison to other federated social protocols, check out the Bluesky blog.

Update May 2024: we have removed the Discord registration requirement, and PDS instances can now connect to the network directly. You are still welcome to join the PDS Admins Discord for community support.

Today, we’re releasing an early-access version of federation intended for self-hosters and developers.

The atproto network is built upon a layer of self-authenticating data. This foundation is critical to guaranteeing the network’s long term integrity. But the protocol’s openness ultimately flows from a diverse set of hosts broadcasting this data across the network.

Up until now, every user on the network used a Bluesky PDS (Personal Data Server) to host their data. We’ve already federated our own data hosting on the backend, both to help operationally scale our service, and to prove out the technical underpinnings of an openly federated network. But today we’re opening up federation for anyone else to begin connecting with the network.

The PDS, in many ways, fulfills a simple role: it hosts your account and gives you the ability to log in, it holds the signing keys for your data, and it keeps your data online and highly available. Unlike a Mastodon instance, it does not need to function as a full-fledged social media service. We wanted to make atproto data hosting—like web hosting—into a fairly simple commoditized service. The PDS’s role has been limited in scope to achieve this goal. By limiting the scope, the role of a PDS in maintaining an open and fluid data network has become all the more powerful.

We’ve packaged the PDS into a friendly distribution with an installer script that handles much of the complexity of setting up a PDS. After you set up your PDS and join the PDS Admins Discord to submit a request for your PDS to be added to the network, your PDS’s data will get routed to other services in the network (like feed generators and the Bluesky Appview) through our Relay, the firehose provider. Check out our Federation Overview for more information on how data flows through the atproto network.

Early Access Limitations

As with many open systems, Relays will never be totally unconstrained in terms of what data they’re willing to crawl and rebroadcast. To prevent network and resource abuse, it will be necessary to place rate limits on the PDS hosts that they consume data from. As trust and reputation is established with PDS hosts, those rate limits will increase. We’ll gain capacity to increase the baseline rate limits we have in place for new PDSs in the network as we build better tools for detecting and mitigating abuse..

For a smooth transition into a federated network, we’re starting with some lower limits. Specifically, each PDS will be able to host 10 accounts and limited to 1500 evts/hr and 10,000 evts/day. After those limits are surpassed, we’ll stop crawling the PDS until the rate limit period resets. This is intended to keep the network and firehose running smoothly for everyone in the ecosystem.

These are early days, and we have some big changes still planned for the PDS distribution (including the introduction of OAuth!). The software will be updating frequently and things may break. We will not be breaking things indiscriminately. However, in this early period, in order to avoid cruft in the protocol and PDS distribution, we are not making promises of backwards compatibility. We will be supporting a migration path for each release, but if you do not keep your PDS distribution up to date, it may break and render the app unusable until you do so.

Because the PDS distribution is not totally settled, we want to have a line of communication with PDS admins in the network, so we’re asking any developer that plans to run a PDS to join the PDS Admins Discord. You’ll need to provide the hostname of your PDS and a contact email in order to get your PDS added to the Relay’s allowlist. This Discord will serve as a channel where we can announce updates about the PDS distribution, relay policy, and federation more generally. It will also serve as a community where PDS admins can experiment, chat, and help each other debug issues.

Account Migration

A major promise of the AT Protocol is the ability to migrate accounts between PDS hosts. This is an important check against potential abuse, and further safeguards the fluid open layer of data hosting that underpins the network.

The PDS distribution that we’re releasing has all of the facilities required to migrate accounts between servers. We’re also opening routes on our PDS that will allow you to migrate your account off our server. However — we do not yet provide the capability to migrate back to the Bluesky PDS, so for the time being, this is a one way street. Be warned: these migrations involve possibly destructive identity operations. While we have some guardrails in place, it may still be possible for you to break your account and we will not be able to help you recover it. So although it’s technically possible, we do not recommend migrating your main account between servers yet. We especially recommend against doing so if you do not have familiarity with how DID PLC operations work.

In the coming months we will be hardening this feature and making it safer and easier to do, including creating an in-app flow for moving between servers.

Getting Started

To get started, join the PDS Administrators Discord, and check out the bluesky-social/pds repo on Github. The README will provide all necessary information on getting your PDS setup and running.

Featured Community Project: Bridgy Fed

· 3 min read

Bridgy Fed is a bridge between decentralized social networks that currently supports the IndieWeb and the Fediverse, a portmanteau of “federated” and “universe” that refers to a collection of networks including Mastodon. It's a work-in-progress by Ryan Barrett (@snarfed.org), who has already added initial Bluesky support, and is planning on launching it publicly once Bluesky launches federation early next year.

Bridgy Fed is open source, and Ryan has a guide on how IDs and handles are translated between networks. He welcomes feedback!

Screenshot of Bridgy Fed


Can you share a bit about yourself and your background?

I'm a dad, San Francisco resident, and stereotypical Silicon Valley engineer who's always been interested in owning his presence online.

What is Bridgy Fed?

Bridgy Fed is a bridge between decentralized social networks. It currently supports the IndieWeb and the Fediverse, and I soon plan to add other protocols like Bluesky and Nostr.

It's fully bidirectional; from any supported network, you can follow anyone on any other network, see their posts, reply or like or repost them, and those interactions flow across to their network and vice versa. More details here.

Initial Bluesky support is complete! All interactions are working, in both directions. I'm looking forward to launching it publicly after Bluesky federation itself launches!

What inspired you to build Bridgy Fed?

The very first time I posted on Facebook, back over 20 years ago when it was just for college students, I immediately understood that I didn't control or own that space. I had no guarantees as to whether my profile and posts would stay there, who'd see them, etc. I started posting to my website/blog first, and only afterward copied those posts to social networks like Facebook.

I've been working on this stuff ever since, including tools like Granary and Bridgy classic and the IndieWeb community, historical decentralized social protocols like OpenSocial and OStatus, and most recently ActivityPub and Bluesky’s AT Protocol.

What tech stack is Bridgy Fed built on?

Bridgy Fed runs on Google's App Engine serverless platform. It's written in Python, uses libraries like Granary, and leverages standards like webmention and microformats2 in addition to ActivityPub and atproto. I'd eventually like to migrate it to asyncio, but otherwise its stack is serving it well.

What's in the future for Bridgy Fed?

I can't wait to launch Bluesky support! Nostr too. I'm also looking forward to extending the current IndieWeb support to any web site, using standard metadata like OGP and RSS and Atom feeds.


You can follow Ryan on Bluesky here, find the Bridgy Fed GitHub repo here, and keep an eye out for Bridgy Fed’s launch next year!

Note: Use an App Password when logging in to third-party tools for account security and read our disclaimer for third-party applications.

Download and Parse Repository Exports

· 9 min read

One of the core principles of the AT Protocol is simple access to public data, including posts, multimedia blobs, and social graph metadata. A user's data is stored in a repository, which can be efficiently exported all together as a CAR file (.car). This post will describe how to export and parse a data repository.

A user's data repository consists of individual records, each of which can be accessed in JSON format via HTTP API endpoints.

The example code in this post is in the Go programming language, and uses the atproto SDK packages from indigo. You can find the full source code in our example cookbook GitHub repository.

This post is written for a developer audience. We plan on adding a feature for users to easily export their own data from within the app in the future.

Privacy Notice

While atproto data is public, you should take care to respect the rights, intents, and expectations of others. The following examples work for downloading any account's public data.

This goes beyond following copyright law, and includes respecting content deletions and block relationships. Images and other media content does not come with any reuse rights, unless explicitly noted by the account holder.

Download a Repository

On Bluesky's Main PDS Instance

You can easily construct a URL to download a repository on Bluesky's main PDS instance. In this case, the PDS host is https://bsky.social, the Lexicon endpoint is com.atproto.sync.getRepo, and the account DID is passed as a query parameter.

As a result, the download URL for the @atproto.com account is:

https://bsky.social/xrpc/com.atproto.sync.getRepo?did=did:plc:ewvi7nxzyoun6zhxrhs64oiz

Note that this endpoint intentionally does not require authentication: content in a user's repository is public (much like a public website), and anybody can download it from the web. Such content includes posts and likes, but does not include content like mutes and list subscriptions.

If you navigate to that URL, you'll download the repository for the @atproto.com account on Bluesky. But if you try to open that file, it won't make sense yet. We'll show you how to parse the data later in this post.

On Another Instance

In the more general case, we start with any "AT Identifier" (handle or DID). We need to find account's PDS instance. This involves first resolving the handle or DID to the account's DID document, then parsing out the #atproto_pds service entry.

The github.com/bluesky-social/indigo/atproto/identity package handles all of this for us already:

import (
"fmt"
"context"

"github.com/bluesky-social/indigo/atproto/identity"
"github.com/bluesky-social/indigo/atproto/syntax"
"github.com/bluesky-social/indigo/xrpc"
)

func main() {
run()
}

func run() error {
ctx := context.Background()
atid, err := syntax.ParseAtIdentifier("atproto.com")
if err != nil {
return err
}

dir := identity.DefaultDirectory()
ident, err := dir.Lookup(ctx, *atid)
if err != nil {
return err
}

if ident.PDSEndpoint() == "" {
return fmt.Errorf("no PDS endpoint for identity")
}

fmt.Println(ident.PDSEndpoint())
}

Once we know the PDS endpoint, we can create an atproto API client, call the getRepo endpoint, and write the results out to a local file on disk:

carPath := ident.DID.String() + ".car"

xrpcc := xrpc.Client{
Host: ident.PDSEndpoint(),
}

repoBytes, err := comatproto.SyncGetRepo(ctx, &xrpcc, ident.DID.String(), "")
if err != nil {
return err
}

err = os.WriteFile(carPath, repoBytes, 0666)
if err != nil {
return err
}

The go-repo-export example from the cookbook repository implements this with the download-repo command:

> ./go-repo-export download-repo atproto.com
resolving identity: atproto.com
downloading from https://bsky.social to: did:plc:ewvi7nxzyoun6zhxrhs64oiz.car

Now you know that the @atproto.com account's PDS instance is at bsky.social, and we've downloaded the repository's CAR file.

Parse Records from CAR File as JSON

What's a CAR File?

CAR files are a standard file format from the IPLD ecosystem. They stand for "Content Addressable aRchives." They have a simple binary format, with a series of binary (CBOR) blocks concatenated together, not dissimilar to tar files or Git packfiles. They're well-suited for efficient data processing and archival storage, but they're not the most accessible to developers.

The Repository CAR File

The repository data structure is a key-value store, with the keys being a combination of a collection name (NSID) and "record key", separated by a slash (<collection>/<rkey>). The CAR file contains a reference (CID) pointing to a (signed) commit object, which then points to the top of the key-value tree. The commit object also has an atproto repo version number (at time of writing, currently 3), the account DID, and a revision string.

Let's load the repository tree structure out of the CAR file in to memory, and list all of the record paths (keys).


import (
"encoding/json"
"context"
"fmt"
"os"
"path/filepath"

"github.com/bluesky-social/indigo/repo"
)
func carList(carPath string) error {
ctx := context.Background()
fi, err := os.Open(carPath)
if err != nil {
return err
}

// read repository tree in to memory
r, err := repo.ReadRepoFromCar(ctx, fi)
if err != nil {
return err
}

// extract DID from repo commit
sc := r.SignedCommit()
did, err := syntax.ParseDID(sc.Did)
if err != nil {
return err
}
topDir := did.String()

// iterate over all of the records by key and CID
err = r.ForEach(ctx, "", func(k string, v cid.Cid) error {
fmt.Printf("%s\t%s\n", k, v.String())
return nil
})
if err != nil {
return err
}
return nil
}

Note that the ForEach iterator provides a record path string as a key, and a CID as the value, instead of the record data itself. If we want to get the record itself, we need to fetch the "block" (CBOR bytes) from the repository, using the CID reference.

Let's also convert the binary CBOR data in to a more accessible JSON format, and write out records to disk. The following code snippet could go in the ForEach function in the previous example:

// where to write out data on local disk
recPath := topDir + "/" + k
os.MkdirAll(filepath.Dir(recPath), os.ModePerm)
if err != nil {
return err
}


// fetch the record CBOR and convert to a golang struct
_, rec, err := r.GetRecord(ctx, k)
if err != nil {
return err
}

// serialize as JSON
recJson, err := json.MarshalIndent(rec, "", " ")
if err != nil {
return err
}

if err := os.WriteFile(recPath+".json", recJson, 0666); err != nil {
return err
}

return nil

In the cookbook repository, the go-repo-export example implements these as list-records and unpack-record:

> ./go-repo-export list-records did:plc:ewvi7nxzyoun6zhxrhs64oiz.car
=== did:plc:ewvi7nxzyoun6zhxrhs64oiz ===
key record_cid
app.bsky.actor.profile/self bafyreifbxwvk2ewuduowdjkkjgspiy5li2dzyycrnlbu27gn3hfgthez3u
app.bsky.feed.like/3jucagnrmn22x bafyreieohq4ngetnrpse22mynxpinzfnaw6m5xcsjj3s4oiidjlnnfo76a
app.bsky.feed.like/3jucahkymkk2e bafyreidqrmqvrnz52efgqfavvjdbwob3bc2g3vvgmhmexgx4xputjty754
app.bsky.feed.like/3jucaj3qgmk2h bafyreig5c2atahtzr2vo4v64aovgqbv6qwivfwf3ex5gn2537wwmtnkm3e
[...]

> ./go-repo-export unpack-records did:plc:ewvi7nxzyoun6zhxrhs64oiz.car
writing output to: did:plc:ewvi7nxzyoun6zhxrhs64oiz
did:plc:ewvi7nxzyoun6zhxrhs64oiz/app.bsky.actor.profile/self.json
did:plc:ewvi7nxzyoun6zhxrhs64oiz/app.bsky.feed.like/3jucagnrmn22x.json
did:plc:ewvi7nxzyoun6zhxrhs64oiz/app.bsky.feed.like/3jucahkymkk2e.json
did:plc:ewvi7nxzyoun6zhxrhs64oiz/app.bsky.feed.like/3jucaj3qgmk2h.json
[...]

If you were downloading and working with CAR in a higher-stakes situation than just running a one-off repository export, you would probably want to confirm the commit signature using the account's signing public key (included in the resolved identity metadata). Signing keys can change over time, meaning the signatures in old repo exports will no longer validate. It may be a good idea to keep a copy of the identity metadata along side the repository for long-term storage.

Downloading Blobs

An account's repository contains all the current (not deleted) records. These records include likes, posts, follows, etc. and may refer to images and other media "blobs" by hash (CID), but the blobs themselves aren't stored directly in the repository. So if you want a full public account data export, you also need to fetch the blobs.

It is possible to parse through all the records in a repository and extract all the blob references (tip: they all have $type: blob). But PDS instances also implement a helpful com.atproto.sync.listBlobs endpoint, which returns all the CIDs (blob hashes) for a specific account (DID).

The com.atproto.sync.getBlob endpoint is used to download the original blob itself.

Neither of these PDS endpoints require authentication, though they may be rate-limited by operators to prevent resource exhaustion or excessive bandwidth costs.

Note that the first part of the blob download function is very similar to the CAR download: resolving identity to find the account's PDS:

func blobDownloadAll(raw string) error {
ctx := context.Background()
atid, err := syntax.ParseAtIdentifier(raw)
if err != nil {
return err
}

// resolve the DID document and PDS for this account
dir := identity.DefaultDirectory()
ident, err := dir.Lookup(ctx, *atid)
if err != nil {
return err
}

// create a new API client to connect to the account's PDS
xrpcc := xrpc.Client{
Host: ident.PDSEndpoint(),
}
if xrpcc.Host == "" {
return fmt.Errorf("no PDS endpoint for identity")
}

topDir := ident.DID.String() + "/_blob"
os.MkdirAll(topDir, os.ModePerm)

// blob-specific part starts here!
cursor := ""
for {
// loop over batches of CIDs
resp, err := comatproto.SyncListBlobs(ctx, &xrpcc, cursor, ident.DID.String(), 500, "")
if err != nil {
return err
}
for _, cidStr := range resp.Cids {
// if the file already exists, skip
blobPath := topDir + "/" + cidStr
if _, err := os.Stat(blobPath); err == nil {
continue
}

// download the entire blob in to memory, then write to disk
blobBytes, err := comatproto.SyncGetBlob(ctx, &xrpcc, cidStr, ident.DID.String())
if err != nil {
return err
}
if err := os.WriteFile(blobPath, blobBytes, 0666); err != nil {
return err
}
}

// a cursor in the result means there are more CIDs to enumerate
if resp.Cursor != nil && *resp.Cursor != "" {
cursor = *resp.Cursor
} else {
break
}
}
return nil
}

In the cookbook repository, the go-repo-export example implements list-blobs and download-blobs commands:

> ./go-repo-export list-blobs atproto.com
bafkreiacrjijybmsgnq3mca6fvhtvtc7jdtjflomoenrh4ph77kghzkiii
bafkreib4xwiqhxbqidwwatoqj7mrx6mr7wlc5s6blicq5wq2qsq37ynx5y
bafkreibdnsisdacjv3fswjic4dp7tju7mywfdlcrpleisefvzf44c3p7wm
bafkreiebtvblnu4jwu66y57kakido7uhiigenznxdlh6r6wiswblv5m4py
[...]

> ./go-repo-export download-blobs atproto.com
writing blobs to: did:plc:ewvi7nxzyoun6zhxrhs64oiz/_blob
did:plc:ewvi7nxzyoun6zhxrhs64oiz/_blob/bafkreiacrjijybmsgnq3mca6fvhtvtc7jdtjflomoenrh4ph77kghzkiii downloaded
did:plc:ewvi7nxzyoun6zhxrhs64oiz/_blob/bafkreib4xwiqhxbqidwwatoqj7mrx6mr7wlc5s6blicq5wq2qsq37ynx5y downloaded
did:plc:ewvi7nxzyoun6zhxrhs64oiz/_blob/bafkreibdnsisdacjv3fswjic4dp7tju7mywfdlcrpleisefvzf44c3p7wm downloaded
[...]

This will download blobs for a repository.

A more rigorous implementation should verify the blob CID (by hashing the downloaded bytes), at a minimum to detect corruption and errors.

Building on the AT Protocol

· 2 min read

We believe that the future of social networking is open, and we want the AT Protocol, the open protocol that serves as the backbone to Bluesky’s open-source social network, to be your playground.

The AT Protocol is still in active development, but developers can already start building projects ranging from bots and clients to custom feeds, and eventually, entirely separate applications and experiences.

In this blog post, we’ll share what you can already build on atproto, and what you can expect soon.

Getting Started with the AT Protocol

In June of this year, we launched the federation sandbox environment for developers to begin exploring, as we prepare to launch federation in the production environment as well.

Currently, the AT Protocol is under active development, and the Bluesky beta app is a microblogging client built on top of it. We’re using the Bluesky beta to drive and inform protocol development.

While the production network isn’t federated quite yet, developers can already currently build:

  • Feed generators
    • Third-party developers have built algorithmic feeds, community-based feeds, topic-based feeds, and more that hundreds of thousands of users view every day.
    • Get started with our feed generator starter kit.
  • Bots
    • Over the last few months, developers have created MTA Alert bots, earthquake bots, chat bots, and other interactive experiences on the network.
    • Get started with a template for a Bluesky bot here.
  • Bluesky clients
    • Because the AT Protocol is open, third-party developers can create entirely separate frontends for the network too.
    • Here's a starter template for building a Bluesky client.

You can read about more third-party projects and check them out here. Let us know what you build!

What to Expect

We’ve still only scratched the surface of what’s possible on atproto. Currently, microblogging is the only content type that’s integrated, but we’re excited to for apps for forums, photo-centric platforms, long-form writing, and more to exist on atproto.

In the meantime, we’ve published a protocol roadmap here to give you a sense of our priorities and what’s coming in the future.

2023 Protocol Roadmap

· 10 min read

This post lays out the current AT Protocol (atproto) development plan, through to a "version one" release. This document is written for developers already familiar with atproto concepts and terminology. The scope here is features of the underlying protocol itself, not any application or Lexicons on top of the protocol. In particular, this post doesn't describe any product features specific to the Bluesky microblogging application (app.bsky.* Lexicons).

At a high level:

  • We are pushing towards federation on the production network early next year (2024), if development continues as planned.
  • A series of related infrastructure and operational changes to Bluesky services are underway to ensure we can handle a large influx of content from federated instances in a resource and cost-efficient way.
  • We plan to submit future governance and formalization of the core protocol to an independent standards body. These are consensus processes relying on interest and support of a community of volunteers, and we have already done some initial outreach, but we expect to start conversations in earnest around the same time as opening up federation.

Feel free to share your thoughts via Github Discussions here.

Getting to Federation

The AT Protocol is designed as a federated social web protocol. While we have an open federated sandbox network for developers to experiment with, the main Bluesky services are currently still not federated. Our current development priority is to resolve any final protocol issues or decisions that would impede federation with independent PDS instances. Some of these points are listed below.

Account Migration: One of the differentiating features of atproto is seamless migration of both identity and public content from one PDS instance to another. The foundations for this feature were set in place from the start, but there are some details and behaviors to decide on and document.

Auth refactor: We want to improve both third-party auth flows (eg, OAuth2), and to support verifiable inter-service requests (eg, with UCANs). These involve both authentication (“who is this”) and authorization (“what is allowed”). This work will hopefully be a matter of integrating and adapting existing standards. There is a chance that this will be ready in time for federation, but it is not a hard requirement.

Repo Event Stream (Firehose) Iteration: A healthy ecosystem of projects has subscribed to the full stream of repo commits that the firehose provides. We have a few improvements and options in mind to help with efficiency and developer experience, without changing the underlying semantics and power dynamic (aka, the ability to replicate all public content). These include "epochs" of sequence numbers; sharding; the option to strip MST nodes ("proof" chain) to reduce bandwidth; and changes to account lifecycle events.

Third-party Labeling: This is the protocol-level ability to distribute and subscribe to labels from many sources, including cryptographic signatures for verification. The lexicons are mostly designed, but are not included in the AppView reference implementation, and documentation is needed.

Account Status Propagation: Actions like account takedowns and (self) deletions don't currently get broadcast to other parties in the network, though they can be publicly observed (i.e., content becomes inaccessible). Some form of explicit signal would likely be helpful. Additionally, we’d like to allow temporary account deactivation as an option, instead of only account deletion. Temporary deactivation will have behavior similar to an account takedown, with effects propagating through the network.

Refactor Moderation Actions: The com.atproto.admin.* Lexicons have a concept of "actions" which "resolve" specific "reports," which has become rigid to work with. A refactor will have more flexible "events" (including both reports and private flags or annotations) which update "subjects." The moderation report submission process should not be impacted, but other admin Lexicons may have breaking changes.

Legacy Record Cleanup: We have a short remaining pre-federation window during which we could, in theory, rewrite any records to remove legacy or invalid data. Specifically, we could try to remove all instances of the legacy "blob" schema, and fix many invalid datetimes. This would break a large number of strong references between records, in a way that is difficult to clean up, and would be an aggressive intervention to existing repository content, so there is a good chance this won't actually happen and these crufty records will be in the network forever.

Operational Changes for Federation

A few infrastructure changes are planned for coming weeks and months that will impact other folks in the ecosystem. We will try to communicate these ahead of time, but are balancing disruption to the existing developer and service ecosystem with federating the network quickly.

Multiple PDS instances: The Bluesky PDS (bsky.social) is currently a monolithic PostgreSQL database with over a million hosted repositories. We will be splitting accounts across multiple instances, using the protocol itself to help with scaling. Depending on details, this will probably not be very visible to end users or client developers, but will impact firehose consumers and some developers.

BGS Firehose: Most folks currently subscribe to the firehose coming from the bsky.social PDS directly. We have had a BGS instance running in production for some time, but it has not been promoted and has had occasional downtime. With the move to multiple PDSes, it will still technically be possible to subscribe to multiple individual PDS feeds, but most folks will want to shift to the unified BGS feed instead. This will involve a change in hostname, and the sequence numbers will be different.

Possible bulk did:plc updates: We are not yet certain when, but there is a good chance that DID documents will need to be updated for all (or a large fraction) of bsky.social accounts. This identity churn will be in-spec, but could cause disruptions to other services if they are not prepared.

Search updates: The search.bsky.social service, which was never a formal part of the Bluesky API (Lexicons) will be deprecated soon, with both actor (profile) and post search possible through app.bsky.* Lexicons, served by the AppView (and proxied by the PDS).

Spam mitigation systems: Anti-spam efforts will be a crucial part of opening federation. It is not a protocol-level change per-say, but we expect to deploy automated systems to combat spam by labeling posts and accounts at scale.

Network Growth Control: We don't have a specific proposal for feedback from the ecosystem yet, but expect to have some form of resource use limits in place as we open federation to prevent the social graph, firehose consumers, and human and technical systems from being overwhelmed.

New Application Development

Atproto has had a clear system for third-party application development and application-layer protocol extension built in from the beginning: the Lexicon schema system. The current PDS reference implementation restricts record creation for unknown schemas, even when the "skip validation" query parameter is set, but we expect to relax that constraint in coming months.

There is a lot of supporting framework and documentation needed to make atproto a developer-friendly foundation for application development, and a few protocol pieces still need to be worked out:

Lexicon schema resolution: There should be a clear automated method for resolving new and unknown NSIDs (schema names) to a Lexicon JSON object over the web.

Clarify validation failure behaviors: What each piece of infrastructure is expected to do when records fail to validate against the Lexicon schema needs to be specified in detail, with test cases provided.

Lexicon schema evolution and extension: The backwards-compatibility rules for changes to application Lexicon schemas are mostly in place, but they’re not well documented. There is also wiggle room for third parties to inject extension fields into third-party records, and the norms and best practices for this type of extension need documentation.

Further Down the Road

It is worth mentioning a few parts of the protocol that we do not expect to work on in the near future. We think these are pretty important (which is why we are mentioning them!), but our current priority is federation.

Private content and messaging (DMs): We intend to eventually incorporate group-private E2EE (end-to-end encryption) content in the protocol, as well as a robust E2EE messaging solution. These are important and rightfully expected features. We expect this to be a major additional component of atproto, not a simple bolt-on to the current public repository data structure, even if we adopt an existing standard like MLS, Matrix, or the Signal protocol. As a result, this will be a large body of integration work, and will not be a focus until after federation.

Record versioning: The low-level protocol allows updates to existing records. It should be possible to store "old" versions of records in the repo, allowing (optional) transparency into history and edits. This may involve additions to repository paths, AT URIs, and core repo read and update endpoints.

Floating point numbers: The data model currently only supports integers, not floats. The concerns around floats in content-addressed systems are well documented, and there are alternatives and work-arounds (like string encoding). But if an elegant solution presents itself, it would be nice to support floats as a first-class data type.

Static CAR file repository hosting: For many use cases, it would be convenient if atproto repositories could be served as static files (probably CAR files), instead of requiring a full-featured PDS instance.

Protocol Governance

Development of atproto to date has been driven by a single company, Bluesky PBC. Once the network opens to federation, protocol changes and improvements will still be necessary, but will impact multiple organizations, communities, and individuals, each with separate priorities and development interests. If the protocol is successful, there certainly will be disagreements and competitive tensions at play. Having a single company controlling the protocol will not work long-term.

The plan is to bring development and governance of the protocol itself to an established standards body around the time the network opens to federation. Our current hope is to bring this work to the IETF, likely as a new working group, which would probably be a multi-year process. If the IETF does not work out as a home, we will try again with other bodies. While existing work can be proposed exactly “as-is", it is common to have some evolution and breaking changes come out of the standardization processes.

Which parts of the protocol would be in-scope for independent standardization? Like many network protocols, atproto is abstracted into multiple layers, with distinctions between "application-specific" bits and the core protocol. The identity, auth, data model, repositories, streams, and Lexicon schema language are all part of the core protocol, and in-scope for standardization. Some API endpoints under the com.atproto.* namespace are also relatively core, and that namespace specifically could fall under independent governance.

Other application-specific APIs and namespaces, including the app.bsky.* microblogging application, would not be in-scope for standardization as part of the core protocol. The authority for these APIs and Lexicons is encoded in the name itself, and Bluesky PBC intends to retain control over future development of the bsky.app application. The API schemas (Lexicons) will be open, and the expectations and rules for third-party interoperability and extensions for any Lexicon will be part of the core protocol specification.

Bluesky BGS and DID Document Formatting Changes

· 3 min read

We have a number of protocol and infrastructure changes rolling out in the next three months, and want to keep everybody in the loop.

This update was also emailed to the developer mailing list, which you can subscribe to here.

TL;DR

  • As of this week, the Bluesky AppView instance now consumes from a Bluesky BGS, instead of directly from the PDS. Devs can access the current streaming API at https://bsky.network/xrpc/com.atproto.sync.subscribeRepos or for WebSocket directly, wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos
    Your existing cursor for bsky.social will not be in sync with bsky.network, so check the live stream first to grab a recent seq before connecting!
  • We are updating the DID document public key syntax to “Multikey” format next week on the main network PLC directory (plc.directory). This change is already live on the sandbox PLC directory.

How will this affect me?

  • For today, if you're consuming the firehose, grab a new cursor from bsky.network and restart your firehose consumer pointed at bsky.network.

Bluesky BGS

The Bluesky services themselves are moving to a federated deployment, with multiple Bluesky (the company) PDS instances aggregated by a BGS, and the AppView downstream of that. As of yesterday, the Bluesky Appview instance (api.bsky.app) consumes from a Bluesky PBC BGS (bsky.network), which consumes from the Bluesky PDS (bsky.social). Until now, the AppView consumed directly from the PDS.

How close are we to federation?

Technically, the main network BGS could start consuming from independent PDS instances today, the same as the sandbox BGS does. We have configured it not to do so until we finish implementing some more details, and do our own round of security hardening. If you want to bang on the BGS implementation (written in Go, code in the indigo github repository), please do so in the sandbox environment, not the main network.

This change impacts devs in two ways:

  • In the next couple weeks, new Bluesky (company) PDS instances will appear in the main network. Our plan is to optionally abstract this away for most client developers, so they can continue to connect to bsky.social as a virtual PDS. But the actual PDS hostnames will be distinct and will show up in DID documents.
  • Firehose consumers (feed generators, mirrors, metrics dashboards, etc) will need to switch over and consume from the BGS instead of the PDS directly. If they do not, they will miss content from the new (Bluesky) PDS instances.

The firehose subscription endpoint, which works as of today, is https://bsky.network/xrpc/com.atproto.sync.subscribeRepos (or wss:// for WebSocket directly). Note that this endpoint has different sequence numbers. When switching over, we recommend folks consume from both the BGS and PDS for a period to ensure no events are lost, or to scroll back the BGS cursor to ensure there is reasonable overlap in streams.

We encourage developers and operators to switch to the BGS firehose sooner than later.

DID Document Formatting Changes

We also want to remind folks that we are planning to update the DID document public key syntax to “Multikey” format next week on the main network PLC directory (plc.directory). These changes are described here, with example documents for testing, and are live now on the sandbox PLC directory.

Rate Limits, PDS Distribution v3, and More

· 5 min read

To get future blog posts directly in your email, you can now subscribe to Bluesky’s Developer Mailing List here.

Adding Rate Limits

Now that we have a better sense of user activity on the network, we’re adding some application rate limits. This helps us keep the network secure — for example, by limiting the number of requests a user or bot can make in a given time period, it prevents bad actors from brute-forcing certain requests and helps us limit spammy behavior.

We’re adding a rate limit for the number of created actions per DID. These numbers shouldn’t affect typical Bluesky users, and won’t affect the majority of developers either, but it will affect prolific bots, such as the ones that follow every user or like every post on the network. The limit is 5,000 points per hour and 35,000 points per day, where:

Action TypeValue
CREATE3 points
UPDATE2 points
DELETE1 point

To reiterate, these limits should be high enough to affect no human users, but low enough to constrain abusive or spammy bots. We decided to release this new rate limit immediately instead of giving developers an advance notice to secure the network from abusive behavior as soon as possible, especially since bad actors might take this blog post as an open invite!

Per this system, an account may create at most 1,666 records per hour and 11,666 records per day. That means an account can like up to 1,666 records in one hour with no problem. We took the most active human users on the network into account when we set this threshold (you surpassed our expectations!).

In case you missed it, in August, we added some other rate limits as well.

  • Global limit (aggregated across all routes)
    • Rate limited by IP
    • 3000/5 min
  • updateHandle
    • Rate limited by DID
    • 10/5 min
    • 50/day
  • createAccount
    • Rate limited by IP
    • 100/5 min
  • createSession
    • Rate limited by handle
    • 30/5 min
    • 300/day
  • deleteAccount
    • Rate limited by IP
    • 50/5 min
  • resetPassword
    • Rate limited by IP
    • 50/5 min

We’ll also return rate limit headers on each response so developers can dynamically adapt to these standards.

In a future update (in about a week), we’re also lowering the applyWrites limit from 200 to 10. This function applies a batch transaction of creates, updates, and deletes. This is part of the PDS distribution upgrade to v3 (read more below) — now that repos are ahistorical, we no longer need a higher limit to account for batch writes. applyWrites is used for transactional writes, and logic that requires more than 10 transactional records is rare.

PDS Distribution v3

We’re rolling out v3 of the PDS distribution. This shouldn’t be a breaking change, though we will be wiping the PLC sandbox. PDSs in parallel networks should still continue to operate with the new distribution.

Reminder: The PDS distribution auto-updates via the Watchtower companion Docker container, unless you specifically disabled that option. We’re adding the admin upgradeRepoVersion endpoint to the upgraded PDS distribution, so PDS admins can also upgrade their repos by hand.

Handle Invalidations on App View

Last month, we began proxying requests to the App View. In our federation architecture, the App View is the piece of the stack that gives you all your views of data, such as profiles and threads. Initially, we started out by serving all of these requests from our bsky.social PDS, but proxying these to the App View is one way of scaling our infrastructure to handle many more users on the network. (Read our federation architecture overview blog post for more information.)

For some users, this caused an invalid handle error. If you have an invalid handle, the user-facing UI will display this instead of your handle:

Screenshot of a profile with an invalid handle

You can use our debugging tool to investigate this: https://bsky-debug.app/handle. Just type your handle in. If it shows no error, please try updating your handle to the same handle you currently have to resolve this issue.

If the debugging page shows an error for your handle, follow this guide to make sure you set up your handle properly.

If that still isn’t working for you, file a support ticket through the app (“Help” button in the left menu on mobile or right side on desktop) and a Bluesky team member will assist you.

Subscribe for Developer Updates

You can subscribe to Bluesky’s Developer Mailing List here to receive future updates in your email. If you received your invite code from the developer waitlist, you’re already subscribed. Each email will have the option to unsubscribe.

We’ll continue to publish updates to our technical blog as well as on the app from @atproto.com.

Updates to Repository Sync Semantics

· 4 min read

We’re excited to announce that we’re rolling out a new version of atproto repositories that removes history from the canonical structure of repositories, and replaces it with a logical clock. We’ll start rolling out this update next week (August 28, 2023).

For most developers with projects subscribed to the firehose, such as feed generators, this change shouldn’t affect you. These will only affect you if you’re doing commit-aware repo sync (a good rule of thumb is if you’ve ever passed earliest or latest to the com.atproto.sync.getRepo method) or are explicitly checking the repo version when processing commits.

Removing Repository History

Repositories on the AT Protocol are like Git repositories, but for structured records. Just like Git, each commit to an atproto repository currently includes a pointer to the previous commit. However, this approach has caused a couple of pain points:

  • Record deletions are difficult to process. If a user deletes a record, that commit needs to be erased from their repository to match their intent.
  • Increased storage cost. Maintaining repo history can cause anywhere from a 5-10x increase in repo size.

We attempted to resolve both of these in the current model through rebases (discrete moments when the history of a repository is deleted/mutated, like in Git). However, this is a tricky and sensitive operation that is expensive to conduct and complex to communicate across the network.

Using a Logical Clock for Repositories

To address the above issues, we’re replacing the prev pointer in commits with a logical clock. We originally published our intention to do so a few weeks ago. These are the changes we’re making to the way we handle repository history:

  • Incrementing the repo version to 3
  • Making the prev field on repo commits optional
  • Adding a new required rev (revision) field which is a logical clock
  • Removing or adjusting commit-aware repo sync mechanisms

Note: If you explicitly verify the version of a repo commit or do strict type checking on commit repo commits (which you shouldn’t — the spec allows unspecified fields!), you will need to make that check inclusive of version 3.

To facilitate backwards compatibility with software that is still running repo v2, we will continue setting the prev field on commits in the interim.

Even though we are setting the prev field, this can be considered a “hint” and the history is no longer considered a canonical part of the repository.


Repository Revisions

The new sync semantics for the repository rely on a logical clock included in each signed commit.

This “revision” takes the form of a TID and must be monotonically increasing.

The included revision serves a few functions:

Ordering

The clock provides a simple ordering mechanism for encountered repos or commits. If a consumer encounters the same repo from two different sources, each with a valid signature and structure, the revision gives a simple mechanism to determine which is the most recent repository.

Sync

When syncing a repository, revisions give a series of signposts that allow you to request everything from a given repo since a previously seen version. Because revisions are ordered and monotonically increasing, the provider does not necessarily need the exact revision that the consumer is asking for (as with a commit hash), rather they can provide all repo contents from the latest version of the repo that they remember that is before the requested revision.

The PDS for instance will track the revision at which each repo block or record was introduced into a repository. If a consumer asks for every block or record since a given revision, the PDS has a simple mechanism by which to give that information, without needing a complicated sync algorithm.

Stale Reads

Finally, a logical clock on the repo gives us a mechanism through which we can detect stale reads. (We actually already snuck this in with an optional revision field on v2 repos!)

Repo revisions may be returned in response headers to most requests. A client will know their own repo’s current revision and can compare that with the upstream service’s revision.

We use this today on the PDS to paper over some read-after-write concerns that are inherent in eventually consistent architectures. Some clients may use these headers to alert their users that their PDS is “out of sync” with other services in the network (for instance an AppView).

Available sync methods


If you have questions about these changes, join us on GitHub Discussions here.

Posting via the Bluesky API

· 11 min read

This blog post may become outdated as new features are added to atproto and the Bluesky application schemas.


First, you'll need a Bluesky account. We'll create a session with HTTPie (brew install httpie).

http post https://bsky.social/xrpc/com.atproto.server.createSession \
identifier="$BLUESKY_HANDLE" \
password="$BLUESKY_APP_PASSWORD"

Now you can create a post by sending a POST request to the createRecord endpoint.

http post https://bsky.social/xrpc/com.atproto.repo.createRecord \
Authorization:"Bearer $AUTH_TOKEN" \
repo="$BLUESKY_HANDLE" \
collection=app.bsky.feed.post \
record:="{\"text\": \"Hello world! I posted this via the API.\", \"createdAt\": \"`date -u +"%Y-%m-%dT%H:%M:%SZ"`\"}"

Posts can get a lot more complicated with replies, mentions, embedding images, and more. This guide will walk you through how to create these more complex posts in Python, but there are many API clients and SDKs for other programming languages and Bluesky PBC publishes atproto code in TypeScript and Go as well.

Skip the steps below and get the full script here. It was tested with Python 3.11, with the requests and bs4 (BeautifulSoup) packages installed.


Authentication

Posting on Bluesky requires account authentication. Have your Bluesky account handle and App Password handy.

import requests

BLUESKY_HANDLE = "example.bsky.social"
BLUESKY_APP_PASSWORD = "123-456-789"

resp = requests.post(
"https://bsky.social/xrpc/com.atproto.server.createSession",
json={"identifier": BLUESKY_HANDLE, "password": BLUESKY_APP_PASSWORD},
)
resp.raise_for_status()
session = resp.json()
print(session["accessJwt"])

The com.atproto.server.createSession API endpoint returns a session object containing two API tokens: an access token (accessJwt) which is used to authenticate requests but expires after a few minutes, and a refresh token (refreshJwt) which lasts longer and is used only to update the session with a new access token. Since we're just publishing a single post, we can get away with a single session and not bother with refreshing.

Post Record Structure

Here is what a basic post record should look like, as a JSON object:

{
"$type": "app.bsky.feed.post",
"text": "Hello World!",
"createdAt": "2023-08-07T05:31:12.156888Z"
}

Bluesky posts are repository records with the Lexicon type app.bsky.feed.post — this just defines the schema for what a post looks like.

Each post requires these fields: text and createdAt (a timestamp).

This script below will create a simple post with just a text field and a timestamp. You'll need the datetime package installed.

import json
from datetime import datetime, timezone

# Fetch the current time
# Using a trailing "Z" is preferred over the "+00:00" format
now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")

# Required fields that each post must include
post = {
"$type": "app.bsky.feed.post",
"text": "Hello World!",
"createdAt": now,
}

resp = requests.post(
"https://bsky.social/xrpc/com.atproto.repo.createRecord",
headers={"Authorization": "Bearer " + session["accessJwt"]},
json={
"repo": session["did"],
"collection": "app.bsky.feed.post",
"record": post,
},
)
print(json.dumps(resp.json(), indent=2))
resp.raise_for_status()

The full repository path (including the auto-generated rkey) will be returned as a response to the createRecord request. It looks like:

{
"uri": "at://did:plc:u5cwb2mwiv2bfq53cjufe6yn/app.bsky.feed.post/3k4duaz5vfs2b",
"cid": "bafyreibjifzpqj6o6wcq3hejh7y4z4z2vmiklkvykc57tw3pcbx3kxifpm"
}

Setting the Post's Language

Setting the post's language helps custom feeds or other services filter and parse posts.

This snippet sets the text and langs value of a post to be Thai and English.

# an example with Thai and English (US) languages
post["text"] = "สวัสดีชาวโลก!\nHello World!"
post["langs"] = ["th", "en-US"]

The resulting post record object looks like:

{
"$type": "app.bsky.feed.post",
"text": "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35\u0e0a\u0e32\u0e27\u0e42\u0e25\u0e01!\\nHello World!",
"createdAt": "2023-08-07T05:44:04.395087Z",
"langs": [ "th", "en-US" ]
}

The langs field indicates the post language, which can be an array of strings in BCP-47 format.

You can include multiple values in the array if there are multiple languages present in the post. The Bluesky Social client auto-detects the languages in each post and sets them as the default langs value, but a user can override the configuration on a per-post basis.

Mentions and links are annotations that point into the text of a post. They are actually part of a broader system for rich-text "facets." Facets only support links and mentions for now, but can be extended to support features like bold and italics in the future.

Suppose we have a post:

✨ example mentioning @atproto.com to share the URL 👨‍❤️‍👨 https://en.wikipedia.org/wiki/CBOR.

Our goal is to turn the handle (@atproto.com) into a mention and the URL (https://en.wikipedia.org/wiki/CBOR) into a link. To do that, we grab the starting and ending locations of each "facet".

✨ example mentioning @atproto.com to share the URL 👨‍❤️‍👨 https://en.wikipedia.org/wiki/CBOR.
start=23^ end=35^ start=74^ end=108^

We then identify them in the facets array, using the mention and link feature types. (You can view the schema of a facet object here.) The post record will then look like this:

{
"$type": "app.bsky.feed.post",
"text": "\u2728 example mentioning @atproto.com to share the URL \ud83d\udc68\u200d\u2764\ufe0f\u200d\ud83d\udc68 https://en.wikipedia.org/wiki/CBOR.",
"createdAt": "2023-08-08T01:03:41.157302Z",
"facets": [
{
"index": {
"byteStart": 23,
"byteEnd": 35
},
"features": [
{
"$type": "app.bsky.richtext.facet#mention",
"did": "did:plc:ewvi7nxzyoun6zhxrhs64oiz"
}
]
},
{
"index": {
"byteStart": 74,
"byteEnd": 108
},
"features": [
{
"$type": "app.bsky.richtext.facet#link",
"uri": "https://en.wikipedia.org/wiki/CBOR"
}
]
}
]
}

You can programmatically set the start and end points of a facet with regexes. Here's a script that parses mentions and links:

import re
from typing import List, Dict

def parse_mentions(text: str) -> List[Dict]:
spans = []
# regex based on: https://atproto.com/specs/handle#handle-identifier-syntax
mention_regex = rb"[$|\W](@([a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)"
text_bytes = text.encode("UTF-8")
for m in re.finditer(mention_regex, text_bytes):
spans.append({
"start": m.start(1),
"end": m.end(1),
"handle": m.group(1)[1:].decode("UTF-8")
})
return spans

def parse_urls(text: str) -> List[Dict]:
spans = []
# partial/naive URL regex based on: https://stackoverflow.com/a/3809435
# tweaked to disallow some training punctuation
url_regex = rb"[$|\W](https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*[-a-zA-Z0-9@%_\+~#//=])?)"
text_bytes = text.encode("UTF-8")
for m in re.finditer(url_regex, text_bytes):
spans.append({
"start": m.start(1),
"end": m.end(1),
"url": m.group(1).decode("UTF-8"),
})
return spans

Once the facet segments have been parsed out, we can then turn them into app.bsky.richtext.facet objects.

# Parse facets from text and resolve the handles to DIDs
def parse_facets(text: str) -> List[Dict]:
facets = []
for m in parse_mentions(text):
resp = requests.get(
"https://bsky.social/xrpc/com.atproto.identity.resolveHandle",
params={"handle": m["handle"]},
)
# If the handle can't be resolved, just skip it!
# It will be rendered as text in the post instead of a link
if resp.status_code == 400:
continue
did = resp.json()["did"]
facets.append({
"index": {
"byteStart": m["start"],
"byteEnd": m["end"],
},
"features": [{"$type": "app.bsky.richtext.facet#mention", "did": did}],
})
for u in parse_urls(text):
facets.append({
"index": {
"byteStart": u["start"],
"byteEnd": u["end"],
},
"features": [
{
"$type": "app.bsky.richtext.facet#link",
# NOTE: URI ("I") not URL ("L")
"uri": u["url"],
}
],
})
return facets

The list of facets gets attached to the facets field of the post record:

post["text"] = "✨ example mentioning @atproto.com to share the URL 👨‍❤️‍👨 https://en.wikipedia.org/wiki/CBOR."
post["facets"] = parse_facets(post["text"])

Replies, Quote Posts, and Embeds

Replies and quote posts contain strong references to other records. A strong reference is a combination of:

  • AT URI: indicates the repository DID, collection, and record key
  • CID: the hash of the record itself

Posts can have several types of embeds: record embeds, images and exernal embeds (like link/webpage cards, which is the preview that shows up when you post a URL).

Replies

A complete reply post record looks like:

{
"$type": "app.bsky.feed.post",
"text": "example of a reply",
"createdAt": "2023-08-07T05:49:40.501974Z",
"reply": {
"root": {
"uri": "at://did:plc:u5cwb2mwiv2bfq53cjufe6yn/app.bsky.feed.post/3k43tv4rft22g",
"cid": "bafyreig2fjxi3rptqdgylg7e5hmjl6mcke7rn2b6cugzlqq3i4zu6rq52q"
},
"parent": {
"uri": "at://did:plc:u5cwb2mwiv2bfq53cjufe6yn/app.bsky.feed.post/3k43tv4rft22g",
"cid": "bafyreig2fjxi3rptqdgylg7e5hmjl6mcke7rn2b6cugzlqq3i4zu6rq52q"
}
}
}

Since threads of replies can get pretty long, reply posts need to reference both the immediate "parent" post and the original "root" post of the thread.

Here's a Python script to find the parent and root values:

# Resolve the parent record and copy whatever the root reply reference there is
# If none exists, then the parent record was a top-level post, so that parent reference can be reused as the root value
def get_reply_refs(parent_uri: str) -> Dict:
uri_parts = parse_uri(parent_uri)

resp = requests.get(
"https://bsky.social/xrpc/com.atproto.repo.getRecord",
params=uri_parts,
)
resp.raise_for_status()
parent = resp.json()

parent_reply = parent["value"].get("reply")
if parent_reply is not None:
root_uri = parent_reply["root"]["uri"]
root_repo, root_collection, root_rkey = root_uri.split("/")[2:5]
resp = requests.get(
"https://bsky.social/xrpc/com.atproto.repo.getRecord",
params={
"repo": root_repo,
"collection": root_collection,
"rkey": root_rkey,
},
)
resp.raise_for_status()
root = resp.json()
else:
# The parent record is a top-level post, so it is also the root
root = parent

return {
"root": {
"uri": root["uri"],
"cid": root["cid"],
},
"parent": {
"uri": parent["uri"],
"cid": parent["cid"],
},
}

The root and parent refs are stored in the reply field of posts:

post["reply"] = get_reply_refs("at://atproto.com/app.bsky.feed.post/3k43tv4rft22g")

Quote Posts

A quote post embeds a reference to another post record. A complete quote post record would look like:

{
"$type": "app.bsky.feed.post",
"text": "example of a quote-post",
"createdAt": "2023-08-07T05:49:39.417839Z",
"embed": {
"$type": "app.bsky.embed.record",
"record": {
"uri": "at://did:plc:u5cwb2mwiv2bfq53cjufe6yn/app.bsky.feed.post/3k44deefqdk2g",
"cid": "bafyreiecx6dujwoeqpdzl27w67z4h46hyklk3an4i4cvvmioaqb2qbyo5u"
}
}
}

The record embedded here is the post that's getting quoted. The post record type is app.bsky.feed.post, but you can also embed other record types in a post, like lists (app.bsky.graph.list) and feed generators (app.bsky.feed.generator).

Images Embeds

Images are also embedded objects in a post. This example code demonstrates reading an image file from disk and uploading it, capturing a blob in the response:

IMAGE_PATH = "./example.png"
IMAGE_MIMETYPE = "image/png"
IMAGE_ALT_TEXT = "brief alt text description of the image"

with open(IMAGE_PATH, "rb") as f:
img_bytes = f.read()

# this size limit is specified in the app.bsky.embed.images lexicon
if len(img_bytes) > 1000000:
raise Exception(
f"image file size too large. 1000000 bytes maximum, got: {len(img_bytes)}"
)

# TODO: strip EXIF metadata here, if needed

resp = requests.post(
"https://bsky.social/xrpc/com.atproto.repo.uploadBlob",
headers={
"Content-Type": IMAGE_MIMETYPE,
"Authorization": "Bearer " + session["accessJwt"],
},
data=img_bytes,
)
resp.raise_for_status()
blob = resp.json()["blob"]

The blob object, as JSON, would look something like:

{
"$type": "blob",
"ref": {
"$link": "bafkreibabalobzn6cd366ukcsjycp4yymjymgfxcv6xczmlgpemzkz3cfa"
},
"mimeType": "image/png",
"size": 760898
}

The blob is then included in a app.bsky.embed.images array, along with an alt-text string. The alt field is required for each image. Pass an empty string if there is no alt text available.

post["embed"] = {
"$type": "app.bsky.embed.images",
"images": [{
"alt": IMAGE_ALT_TEXT,
"image": blob,
}],
}

A complete post record, containing two images, would look something like:

{
"$type": "app.bsky.feed.post",
"text": "example post with multiple images attached",
"createdAt": "2023-08-07T05:49:35.422015Z",
"embed": {
"$type": "app.bsky.embed.images",
"images": [
{
"alt": "brief alt text description of the first image",
"image": {
"$type": "blob",
"ref": {
"$link": "bafkreibabalobzn6cd366ukcsjycp4yymjymgfxcv6xczmlgpemzkz3cfa"
},
"mimeType": "image/webp",
"size": 760898
}
},
{
"alt": "brief alt text description of the second image",
"image": {
"$type": "blob",
"ref": {
"$link": "bafkreif3fouono2i3fmm5moqypwskh3yjtp7snd5hfq5pr453oggygyrte"
},
"mimeType": "image/png",
"size": 13208
}
}
]
}
}

Each post contains up to four images, and each image can have its own alt text and is limited to 1,000,000 bytes in size. Image files are referenced by posts, but are not actually included in the post (eg, using bytes with base64 encoding). The image files are first uploaded as "blobs" using com.atproto.repo.uploadBlob, which returns a blob metadata object, which is then embedded in the post record itself.

It's strongly recommended best practice to strip image metadata before uploading. The server (PDS) may be more strict about blocking upload of such metadata by default in the future, but it is currently the responsibility of clients (and apps) to sanitize files before upload today.

Website Card Embeds

A website card embed, often called a "social card," is the rendered preview of a website link. A complete post record with an external embed, including image thumbnail blob, looks like:

{
"$type": "app.bsky.feed.post",
"text": "post which embeds an external URL as a card",
"createdAt": "2023-08-07T05:46:14.423045Z",
"embed": {
"$type": "app.bsky.embed.external",
"external": {
"uri": "https://bsky.app",
"title": "Bluesky Social",
"description": "See what's next.",
"thumb": {
"$type": "blob",
"ref": {
"$link": "bafkreiash5eihfku2jg4skhyh5kes7j5d5fd6xxloaytdywcvb3r3zrzhu"
},
"mimeType": "image/png",
"size": 23527
}
}
}
}

Here's an example of embedding a website card:

from bs4 import BeautifulSoup

def fetch_embed_url_card(access_token: str, url: str) -> Dict:

# the required fields for every embed card
card = {
"uri": url,
"title": "",
"description": "",
}

# fetch the HTML
resp = requests.get(url)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# parse out the "og:title" and "og:description" HTML meta tags
title_tag = soup.find("meta", property="og:title")
if title_tag:
card["title"] = title_tag["content"]
description_tag = soup.find("meta", property="og:description")
if description_tag:
card["description"] = description_tag["content"]

# if there is an "og:image" HTML meta tag, fetch and upload that image
image_tag = soup.find("meta", property="og:image")
if image_tag:
img_url = image_tag["content"]
# naively turn a "relative" URL (just a path) into a full URL, if needed
if "://" not in img_url:
img_url = url + img_url
resp = requests.get(img_url)
resp.raise_for_status()

blob_resp = requests.post(
"https://bsky.social/xrpc/com.atproto.repo.uploadBlob",
headers={
"Content-Type": IMAGE_MIMETYPE,
"Authorization": "Bearer " + access_token,
},
data=resp.content,
)
blob_resp.raise_for_status()
card["thumb"] = blob_resp.json()["blob"]

return {
"$type": "app.bsky.embed.external",
"external": card,
}

An external embed is stored under embed like all the others:

post["embed"] = fetch_embed_url_card(session["accessJwt"], "https://bsky.app")

On Bluesky, each client fetches and embeds this card metadata, including blob upload if needed. Embedding the card content in the record ensures that it appears consistently to everyone and reduces waves of automated traffic being sent to the referenced website, but it does require some extra work by the client.

Putting It All Together

A complete script, with command-line argument parsing, is available from this Git repository.

As mentioned at the beginning, we expect most folks will use SDKs or libraries for their programming language of choice to help with most of the details described here. But sometimes it is helpful to see what is actually going on behind the abstractions.