Coding Agents are Addictive #
Many lessons learned
Despite having used LLMs since before they could produce reasonable English paragraphs, and despite reading Simon Willison and Armin Ronacher wax rhapsodic about what they've been able to accomplish with AI agents, I'd been stuck in the occasionally-copy-from-chat routine.
Then Steve Yegge introduced beads, which seemed interesting until it turned out to be a bit of a nightmare. But there was something about how he tied agent work to the way humans work that made it click for me, and so a little over a week ago I decided to install Claude Code.
But what to try it on? Let's start with something I'd been procrastinating on: drawing process trees for ds. It did a bunch of research, wrote some code, and 24 minutes later it was done.
Ok, I think, I've had some success with code reviews. Let's try that. And then that was done.
Overall, here's how fixing the entire backlog of ds went. (Towards the end I used this session to also create docs for cosmofy.)
And then the entire backlog of cosmofy.
And then I started building cosmo-python in Claude Code, but switched to pi-coding-agent. Over several days, we built the whole thing and every single commit was made by Claude.
Part 1: From setup to first build (Claude Code)
Part 2: From uv + python-build-standalone to first release
Part 3: From GitHub actions to robust release
Ok, so then I wanted to write this post with links to transcripts. pi has a native /share command that generates a secret gist, which is cool, but I wanted more visualization of who was doing what.
And that burned a whole day.
Reflections #
Working with coding agents is extremely addictive. The agent works quickly, but it requires some amount of your attention. How much attention, though? Things get pretty thorny quickly.
> One reason vibe coding is so addictive is that you are always *almost* there but not 100% there. The agent implements an amazing feature and got maybe 10% of the thing wrong, and you are like "hey I can fix this if i just prompt it for 5 more mins"
>
> And that was 5 hrs ago
>
> — Yoko (@stuffyokodraws) January 19, 2026
- Objective criteria let you delegate. If the agent needs to wait for you to figure out whether things are working, you're still working on the problem and you haven't delegated it. Automated tests, syntax/type checks, smoke tests, and headless browsers all let the agent get information about whether things are working (see the sketch after this list).
- Iterate on specs first. This is true for humans too. Don't let the agent build the first rev just because it's easy; you'll end up iterating all day. Do lots of throwaway experiments to figure out what the criteria should be instead of doing a huge rewrite every time you want a new feature.
- Code reviews work. When I did extensive code reviews for cosmo-python, it ended up making the tools simpler for both humans and agents to understand.
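To make the first point concrete, here's a minimal sketch of the kind of objective check the agent can run on its own after every change: a plain pytest smoke test. (`mytool` is a hypothetical package name, not one of the projects above.)

```python
# Minimal sketch of an objective criterion an agent can check by itself.
# Assumes a hypothetical package "mytool" that exposes a CLI via
# `python -m mytool`; swap in your real entry point.
import subprocess
import sys


def test_cli_smoke():
    """The CLI should start, print usage, and exit cleanly."""
    result = subprocess.run(
        [sys.executable, "-m", "mytool", "--help"],
        capture_output=True,
        text=True,
        timeout=30,
    )
    assert result.returncode == 0
    assert "usage" in result.stdout.lower()
```

With even a check this small in place, the instruction to the agent becomes "run pytest (and a type check like mypy) after every change and don't stop until it's green," and the agent can self-correct without waiting on me.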
The biggest thing I internalized is that I can tackle much harder projects than before. There's still work to be done on producing "code you have proven to work". And while we're careful to manage the agent's context window, we should also remember to manage our own attention: it's too easy to get sucked into a rabbit hole of interesting but trivial work.