Coding Agents are Addictive #
Many lessons learned
Despite having used LLMs since before they could produce reasonable English paragraphs, and despite reading Simon Willison and Armin Ronacher wax rhapsodic about what they've been able to accomplish with AI agents, I'd been stuck in the occasionally-copy-from-chat routine.
Then Steve Yegge introduced beads, which seemed interesting until it turned out to be a bit of a nightmare. But there was something about how he tied agent work to the way humans work that made it click for me, and so a little over a week ago I decided to install Claude Code.
But what to try it on? Let's start with something I'd been procrastinating on: drawing process trees for ds. It did a bunch of research, wrote some code, and 24 minutes later it was done.
Ok, I think, I've had some success with code reviews. Let's try that. And then that was done.
Overall, here's how fixing the entire backlog of ds went. (Towards the end I used this session to also create docs for cosmofy.)
And then the entire backlog of cosmofy.
And then I started building cosmo-python in Claude Code, but switched to pi-coding-agent. Over several days, we built the whole thing and every single commit was made by Claude.
Part 1: From setup to first build (Claude Code)
Part 2: From uv + python-build-standalone to first release
Part 3: From GitHub actions to robust release
Ok, so then I wanted to write this post with links to transcripts. pi has a native /share command that generates a secret gist, which is cool, but I wanted more visualization of who was doing what.
And that burned a whole day.
Reflections #
Working with coding agents is extremely addictive. The agent works quickly, but it requires some amount of your attention. How much attention, though? Things get pretty thorny quickly.
> One reason vibe coding is so addictive is that you are always *almost* there but not 100% there. The agent implements an amazing feature and got maybe 10% of the thing wrong, and you are like "hey I can fix this if i just prompt it for 5 more mins"
>
> And that was 5 hrs ago
>
> — Yoko (@stuffyokodraws) January 19, 2026
- Objective criteria let you delegate. If the agent needs to wait for you to figure out whether things are working, you're still working on the problem and you haven't delegated it. Automated tests, syntax/type checks, smoke tests, and headless browsers all let the agent get information about whether things are working (see the sketch after this list).
- Iterate on specs first. This is true for humans too. Don't let the agent build the first rev just because it's easy; you'll end up iterating all day. Do lots of throwaway experiments to figure out what the criteria should be instead of doing a huge rewrite every time you want a new feature.
- Code reviews work. When I did extensive code reviews for cosmo-python, it ended up making the tools simpler for both humans and agents to understand.
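To make the first point concrete, here's a minimal sketch of the kind of objective check the agent can run on its own after every change: a plain pytest smoke test. (`mytool` is a hypothetical package name, not one of the projects above.)

```python
# Minimal sketch of an objective criterion an agent can check by itself.
# Assumes a hypothetical package "mytool" that exposes a CLI via
# `python -m mytool`; swap in your real entry point.
import subprocess
import sys


def test_cli_smoke():
    """The CLI should start, print usage, and exit cleanly."""
    result = subprocess.run(
        [sys.executable, "-m", "mytool", "--help"],
        capture_output=True,
        text=True,
        timeout=30,
    )
    assert result.returncode == 0
    assert "usage" in result.stdout.lower()
```

With even a check this small in place, the instruction to the agent becomes "run pytest (and a type check like mypy) after every change and don't stop until it's green," and the agent can self-correct without waiting on me.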
The biggest thing I internalized is that I can tackle much harder projects than before. There's still work to be done on producing "code you have proven to work". And while we're careful to manage the agent's context window, we should also remember to manage our own attention: it's too easy to get sucked into a rabbit hole of interesting but trivial work.