Parakeet - A Year On
a blog post by Mia, which does contain some of her own personal views
Parakeet was announced on Bluesky a little over a year ago now. In this post, I’ll be going over where we are, the lessons learned, and some plans for the future.
I covered (some of) the impetus behind Parakeet, and some of the progress at a high level, in an article I wrote on my personal blog at the end of 2025. If you’re new here, it might be worth skimming that first.
Parakeet got started as a result of some of the cough turbulence cough that engulfed Bluesky at the end of 2024 (fun fact: the project name Parakeet is a portmanteau of ‘parachute’ and ‘skeet’, the bird names came after the fact). In some respects, the site still hasn’t yet recovered (and I don’t mean this in an eternal September way, although that too is true).
By now, it is the single largest project I’ve ever worked on, both in terms of scope (all of Bluesky) and pure lines of code (c.14k lines of Rust).
Having a project with such a large scope is daunting, and I’ve faltered on similar (and larger) projects in the past. Personally, what helped with Parakeet was having the target of parity with Bluesky: there was always an implementation or client to test against. Knowing exactly the list of things you need to do is, as it turns out, to the surprise of literally nobody, useful for the brain.
The scope has grown, of course: I wasn’t originally planning on any of the small-world items such as a repo filter or the control panel, and Bluesky have added many more features since I started that I’ve had to catch up on (thankfully some of them small, such as pronouns, which I merged less than an hour after Bluesky).
It has been, all that aside, one of the most fun and interesting projects I’ve spent time on, and the number of other ideas it’s given me, and the things I’ve learned, are seriously incredible.
Progress Update
There hasn’t been a Parakeet progress update in a while now (October 2025 was the most recent), so here’s where things stand:
Parakeet has the basic functionality you’d expect: browsing accounts, threads, timelines (new!), and feeds works, along with lists and starter packs. Moderation is nearly there, just needing a full ingest of labels for testing, and a few block and mute related fixes.
Recent additions also included some admin tools. The firehose consumer can filter down to only specific repos - currently a list inside the database, but options for Bluesky lists and follow relations are planned. Also: XRPC-based backfilling (no more manual redis rpush commands), runtime configurable lists of trusted verifiers, system admins, and consumer filter and query allowlists.
The backfiller was also rewritten to use Fig’s repo-stream crate, to cut down on how much code I was maintaining. The same applies to the did-resolver crate, which I removed in favour of Orual’s jacquard-identity crate.
As a side-note, future Parakeet updates will be posted on the project account, instead of my personal account.
What Did I Learn?
In no particular order:
1. Big databases are incredibly difficult to manage, more so when your usual definition of ‘big’ is a couple GB, and not double digit TB. And, relatedly...
2. Servers (especially those with loads of NVMe) are very expensive.
Usually when I do databases in projects, they’re measured in 10s of MBs, or a couple of GB if I’m pulling a big dataset in. My best estimate for Parakeet’s database size at full network would be 14TiB for 30 million users (probably well over 16TiB now), which loosely matches both Blacksky and futur’s numbers.
This is not cheap to get as NVMe (or even as SATA SSDs) on any cloud provider, nor is it cheap in the homelab.
(Doing this on consumer WD Black 4TB SSDs, where you’d need 8 in RAID10, would cost £3680 at time of writing, and you’d want a spare for the initial write load alone. It would be £6300 if you insisted on entry-level enterprise Kioxia drives.)
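As a rough sanity check on the drive maths above, here is a minimal sketch of the capacity arithmetic. It assumes RAID10 leaves half of the raw capacity usable and that the drives are sold in decimal terabytes; the function names are made up for illustration.

```rust
/// Usable capacity of a RAID10 array in decimal terabytes.
/// Drives are mirrored in pairs, so half of the raw capacity is usable.
fn raid10_usable_tb(drive_count: u32, drive_tb: f64) -> f64 {
    (drive_count as f64 * drive_tb) / 2.0
}

/// Convert decimal terabytes (10^12 bytes) to tebibytes (2^40 bytes).
fn tb_to_tib(tb: f64) -> f64 {
    tb * 1e12 / (1u64 << 40) as f64
}

fn main() {
    // Eight 4TB drives, as in the consumer SSD example.
    let usable_tb = raid10_usable_tb(8, 4.0);
    let usable_tib = tb_to_tib(usable_tb);

    // Per-drive price implied by the quoted £3680 total.
    let per_drive_gbp = 3680.0 / 8.0;

    println!("usable: {usable_tb} TB, roughly {usable_tib:.2} TiB");
    println!("implied price per drive: £{per_drive_gbp}");
}
```

Under those assumptions the array comes out at 16 TB usable, or roughly 14.55 TiB, which covers the 14TiB estimate with very little headroom.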
3. Code generation (as in, macros, or tools that take input and generate structs etc, not LLMs) is a huge time saver.
Everything in the lexica crate, along with all the XRPC call request/response types, is hand-written. I intend to move over to Jacquard eventually, but I spend a lot of time writing definitions when I should be generating them. Generating them also would’ve made sure I didn’t miss any fields (which I have done previously).
4. There’s a time and a place for an SQL builder, and a high-throughput firehose consumer isn’t one.
Parakeet uses diesel, and used to use it for everything: the consumer, backfiller, and XRPC server. As part of rewriting the consumer to be faster (using Postgres’ COPY FROM functionality), I cut diesel out and went with hand-written SQL queries and a standard postgres library.
The server part still uses diesel though, as a query builder is pretty useful there.
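To illustrate why COPY FROM suits bulk ingest, here is a minimal sketch of building a payload in PostgreSQL's COPY text format (tab-separated columns, one row per line, `\N` for NULL, backslash escapes). It is not Parakeet's actual code, which may well use a different format such as binary; the helper names, the `posts` table, and its columns are hypothetical.

```rust
/// Escape one field for PostgreSQL's COPY text format.
fn escape_copy_field(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for ch in value.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            '\t' => out.push_str("\\t"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            other => out.push(other),
        }
    }
    out
}

/// Encode one row: fields joined by tabs, NULL written as \N,
/// and the row terminated by a newline.
fn encode_copy_row(fields: &[Option<&str>]) -> String {
    let mut line = fields
        .iter()
        .map(|field| match field {
            Some(value) => escape_copy_field(value),
            None => "\\N".to_string(),
        })
        .collect::<Vec<_>>()
        .join("\t");
    line.push('\n');
    line
}

fn main() {
    // Hypothetical target: COPY posts (did, reply_to, text) FROM STDIN
    let mut payload = String::new();
    payload.push_str(&encode_copy_row(&[Some("did:plc:example"), None, Some("hello\tworld")]));
    payload.push_str(&encode_copy_row(&[Some("did:plc:other"), None, Some("line one\nline two")]));

    // A whole batch of firehose events becomes one stream handed to the
    // database driver, instead of one INSERT statement per row.
    print!("{payload}");
}
```

The point of the design is that the consumer can accumulate many events and write them in a single streamed operation, which is where a query builder adds little.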
5. Instrumentation such as opentelemetry traces and metrics is very useful, but difficult to apply properly.
Being able to see traces for queries is unbelievably useful for debugging performance issues, but the setup time using the Rust opentelemetry and tracing crates with Axum is way too high. I ship metrics to my existing Grafana server (although I used Cloud for a bit), and logs and traces to Honeycomb (until I get Grafana wired up for that.)
One issue with Parakeet’s OTel reporting is the use of a GraphQL-like dataloader to batch database queries. This should stop loads of small queries hitting the database under load, but it has the unfortunate side effect of breaking tracing as the database accesses bounce between threads.
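The batching idea can be sketched in a few lines. This is a simplified, synchronous stand-in rather than Parakeet's dataloader (a real one is async and usually dispatches after a short window); `BatchLoader` and `fake_profile_query` are hypothetical names.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical batching loader: callers queue keys, and one call to
/// `dispatch` resolves every queued key with a single batched fetch.
struct BatchLoader<F>
where
    F: FnMut(&[String]) -> HashMap<String, String>,
{
    pending: BTreeSet<String>,
    fetch_many: F,
    batches_sent: usize,
}

impl<F> BatchLoader<F>
where
    F: FnMut(&[String]) -> HashMap<String, String>,
{
    fn new(fetch_many: F) -> Self {
        Self { pending: BTreeSet::new(), fetch_many, batches_sent: 0 }
    }

    /// Record a key to load later; duplicates collapse into one entry.
    fn queue(&mut self, key: &str) {
        self.pending.insert(key.to_string());
    }

    /// Run one batched fetch for everything queued so far.
    fn dispatch(&mut self) -> HashMap<String, String> {
        if self.pending.is_empty() {
            return HashMap::new();
        }
        let keys: Vec<String> = std::mem::take(&mut self.pending).into_iter().collect();
        self.batches_sent += 1;
        (self.fetch_many)(&keys)
    }
}

/// Stand-in for a single `WHERE did = ANY($1)` style database query.
fn fake_profile_query(dids: &[String]) -> HashMap<String, String> {
    dids.iter()
        .map(|did| (did.clone(), format!("profile for {did}")))
        .collect()
}

fn main() {
    let mut loader = BatchLoader::new(fake_profile_query);
    // Three lookups from different parts of a request, two for the same DID.
    loader.queue("did:plc:alice");
    loader.queue("did:plc:bob");
    loader.queue("did:plc:alice");

    let profiles = loader.dispatch();
    println!("{} profiles from {} query", profiles.len(), loader.batches_sent);
}
```

The sketch also hints at the tracing problem: the code that queues a key is not the code that runs the query, so in an async loader the query's span can easily lose its link to the request that caused it.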
6. Cargo workspaces are both a gift and an absolute nightmare.
Having all your code in one repo and being able to easily vendor in other libraries or break common functions out into libraries? Great, incredibly useful.
Getting three different binaries to build, all of which need specific feature sets? Harder than it should be, especially in Docker, where the best solution is three Dockerfiles.
7. Nix fixes this (seriously)
Having a single file that sets up a whole dev environment reproducibly across any computer and that builds the same way is seriously incredible. I did the first year of development without Nix and, now I’ve started using it, I can’t go back. I just open the folder and nix develop, and everything I need is ready: diesel, protoc, and the Rust tooling.
The fact that the builds just work perfectly across macOS and Linux helps so much. The Docker builds were always such a nightmare because they didn’t cache between builds, and now that all works, and the dependencies only build once, before the binaries.
Nix builds even solve the cargo workspaces problem! (Or crane certainly does, at least.)
Plans for V1.0
I’ve started planning the roadmap for releasing the first major version, in the Skyboard, by tagging items with release/v1. These are the items that definitely need to be in Parakeet before I’m comfortable calling anything 1.0.
As it stands, the key things missing are:
- Notifications (active WIP)
- Search
- The nuclear block
- Recommendation features (for profiles, feeds, follows, starter packs)
There are also a few other little bits that need to be cleaned up: in the time I’ve been working on it (and the time not), there have been some schema or behavioural changes, and the occasional bit of tech debt. Items with the tag epic/mia are the things that I need to fix before I can start dogfooding a Parakeet instance (which I do intend to do before 1.0!)
Of course, everything needs to be subjected to some rigorous testing, the kind that can really only be achieved through proper usage. This is partially why I intend to dogfood. I should probably also port across Bluesky’s AppServer test suite from the atproto repo for proper coverage.
Moving closer to v1.0, performance is also an important goal. I’d like to do some deep database work to find out if there are any good tweaks or optimisations I can do to increase query and write performance and/or decrease storage space. Part of my plan for this is turning all the DIDs into i64, so we’re not storing (technically) variable length strings everywhere.
A far-off goal is to do string interning on AT URIs too - although this’ll need serious work on determining how to allocate and map all those strings, and also to move record storage (for those types like post that require it) out to parakeet-index, which can compress them.
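The DID-to-i64 idea is a form of interning, which can be sketched in memory as below. This is only an illustration under assumptions: in practice the mapping would presumably live in a database table with a sequence-backed key, and `DidInterner` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory interner mapping DID strings to compact i64 ids.
#[derive(Default)]
struct DidInterner {
    ids: HashMap<String, i64>,
    dids: Vec<String>,
}

impl DidInterner {
    /// Return the existing id for a DID, or allocate the next one.
    /// Ids start at 1, as a database sequence typically would.
    fn intern(&mut self, did: &str) -> i64 {
        if let Some(&id) = self.ids.get(did) {
            return id;
        }
        let id = self.dids.len() as i64 + 1;
        self.dids.push(did.to_string());
        self.ids.insert(did.to_string(), id);
        id
    }

    /// Map an id back to its DID, if that id has been allocated.
    fn resolve(&self, id: i64) -> Option<&str> {
        let index = usize::try_from(id.checked_sub(1)?).ok()?;
        self.dids.get(index).map(String::as_str)
    }
}

fn main() {
    let mut interner = DidInterner::default();
    let alice = interner.intern("did:plc:alice");
    let bob = interner.intern("did:web:bob.example");
    let alice_again = interner.intern("did:plc:alice");

    // The same DID always maps to the same fixed-width id.
    println!("alice={alice}, bob={bob}, alice again={alice_again}");
    println!("id {bob} resolves to {:?}", interner.resolve(bob));
}
```

Every row that references an account then stores an 8-byte integer instead of a variable-length string. Interning AT URIs would follow the same shape, with the harder question being how to allocate and map the much larger set of strings.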
Further Trajectory
I also have an epic for ‘small-world’ AppServers, for when you don’t want to index the whole network. This includes the allowlist and consumer filter changes outlined previously, and a few other bits like automatic purging of old events on a set timescale. These probably won’t be done until after 1.0, but they are on my radar as this is an important use case for smaller communities (or friend groups!)
My current goal, given Bluesky’s trajectory recently, is to get an instance of Parakeet up and running that indexes all the independent PDSes on the network. I’d like to use Parakeet to help further decentralisation of the network, and especially to reduce reliance on Bluesky.
The progressive increase in independent PDSes, and users on them, allows this Parakeet instance to scale alongside them (although it’s currently on a machine with plenty of capacity). Getting it to a point of being useful would require the independent PDS network to host a significant chunk of active users. I think this is a possible goal, and one I intend to discuss more in a future post on my personal blog.
Conclusion (and call to action)
Turns out, big applications that rely on big databases with lots of throughput are hard - who’d’a thunk it‽
I recognise I’m not always great at asking for help, but: if you’re someone who is interested in ATProto and/or Rust, and feel that you could help, then please do! I’d be super grateful for a hand on docs, CI/CD, performance optimisation, and general code improvements, and of course help on the implementation and testing work is also appreciated.