Despite having used LLMs since before they could produce reasonable English paragraphs, and despite reading Simon Willison and Armin Ronacher wax rhapsodic about what they've been able to accomplish with AI agents, I've been stuck in the occasionally-copy-from-chat routine.
But what to try it on? Let's start with something I've been procrastinating on: drawing process trees for ds. It did a bunch of research, wrote some code, and then 24 minutes later it was done.
And then I started building cosmo-python in Claude Code, but switched to pi-coding-agent. Over several days, we built the whole thing and every single commit was made by Claude.
Ok, so then I wanted to write this post with links to transcripts. pi has a native /share that generates a secret gist which is cool, but I wanted some more visualization of who was doing what.
Working with coding agents is extremely addictive. The agent works quickly, but it requires some amount of your attention. How much attention, though? Things get pretty thorny quickly.
One reason vibe coding is so addictive is that you are always *almost* there but not 100% there. The agent implements an amazing feature and got maybe 10% of the thing wrong, and you are like "hey I can fix this if i just prompt it for 5 more mins"
Objective criteria let you delegate. If the agent needs to wait for you to figure out if things are working, you're still working on the problem and you haven't delegated it. Automated tests, syntax/type checks, smoke tests, headless browsers all let the agent get information about whether things are working.
Iterate on specs first. This is true for humans too. Don't let the agent build the first rev because it's easy. You'll end up iterating all day. Do lots of throwaway experiments to figure out what the criteria should be instead of doing a huge rewrite every time you want a new feature.
Code reviews work. When I did extensive code reviews for cosmo-python, it ended up making the tools simpler for both humans and agents to understand.
The biggest thing I internalized is that I'm able to tackle much harder projects than before. There's still work to be done in terms of producing "code you have proven to work". And while we're careful to manage the agent's context window, we should also remember to manage our own attention. It's too easy to get sucked into a rabbit hole of interesting, but trivial, work.
📝 Why I Stopped Using nbdev (Hamel Husain). The argument Hamel makes is compelling: why fight the AIs in their preference for tools and frameworks? My counter is: I still want good taste. Also, his point that "Everyone is more polyglot" is why I think my ds task runner might still have a chance: it's built for polyglots.
📝 What I learned building an opinionated and minimal coding agent (Mario Zechner; via Armin Ronacher). Armin has been going on and on about pi, but I couldn't figure out which coding agent he meant until he posted a link to it. After a few days using Claude Code (more on this later), I switched to using pi-coding-agent and haven't looked back. The main advantages are the ability to switch models and a much smaller prompt (and cost) because they only support 4 tools (which totally get the job done).
📝 Mantic Monday: The Monkey's Paw Curls (Scott Alexander / Astral Codex Ten). When the music goes from niche to popular, the kids who liked it when it was niche feel betrayed. Compare with plastics (rare + high status => ubiquitous and dead common) and GPS (rare + military defense => driving from home to work). When prediction markets were weird and niche, they were high status. Now they're mostly sports gambling, so declasse.
📝 A Software Library with No Code (Drew Breunig; via Simon Willison). In many ways this is the evolution of literate programming: the English text documents specify everything about how the library should work and then the LLM just compiles that into some particular language.
VSCode has had a fancy terminal IntelliSense for some time now. For some reason, it only worked on my macOS laptop, but not on my Linux machine. So I started digging around and found an important caveat for the integrated terminal:
Note that the script injection may not work if you have custom arguments defined in the terminal profile, have enabled Editor: Accessibility Support, have a complex bash PROMPT_COMMAND, or other unsupported setup.
Turns out that my use of bash-preexec messed up the PROMPT_COMMAND enough that VSCode couldn't inject itself properly.
Now as I described in the previous post, I'm only really using bash-preexec to measure the run time of commands. So I used ChatGPT 5.2 and Claude Opus 4.5 to help me work through my .bashrc to remove that constraint.
First, we keep track of whether we're in the prompt (we don't want to time those commands) and we separately "arm" the timer after the prompt is drawn (so we can time things after the next command runs).
# at the top
__cmd_start_us=0
__cmd_timing_armed=0
__in_prompt=0

__timer_arm() { __cmd_timing_armed=1; }

__timer_debug_trap() {
  [[ $__in_prompt -eq 1 ]] && return 0
  [[ $__cmd_timing_armed -eq 1 ]] || return 0
  __cmd_timing_armed=0
  local s=${EPOCHREALTIME%.*} u=${EPOCHREALTIME#*.}
  __cmd_start_us="${s}${u:0:6}"
}
trap '__timer_debug_trap' DEBUG

__s=${EPOCHREALTIME%.*}
__u=${EPOCHREALTIME#*.}
__cmd_start_us="${__s}${__u:0:6}"
unset __s __u

# ...
PROMPT_COMMAND="__prompt_command; __timer_arm"
The trap bit is clever and does most of the heavy lifting.
Once I got this working with my PS1 (see below), I asked Claude for any other improvements it could think of. I did this 3 times and incorporated all of its suggestions.
The main things I changed were to lazy-load completions and other imports. This brought the shell startup time down from 600ms to 14ms which I definitely notice.
Then there were some quality-of-life improvements:
HISTCONTROL=ignoreboth:erasedups
shopt -s histappend histverify  # append and expand history file
HISTTIMEFORMAT="%F %T "         # timestamp entries
HISTSIZE=10000
HISTFILESIZE=20000
# ...
shopt -s globstar  # let '**' match 0 or more files and dirs
shopt -s cdspell   # autocorrect minor typos in cd
shopt -s autocd    # type directory name to cd into it
🐦 Matt Pocock on Ralph Wiggum (Matt Pocock). The technique is simple and matches my intuition for how work gets done in a sprint. Matt also has a nice video explainer.
📝 Logging Sucks - Your Logs Are Lying To You (Boris Tane). Argues for passing a context object around and logging that object (with all the details you could possibly need) when something goes wrong. Extends the concept of structured logging to "wide events".
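To make the "wide events" idea concrete, here is a minimal sketch of my reading of it, not code from Boris's post. The `handle_request` and `emit` functions and the field names are hypothetical; the point is that one context dict accumulates everything about a request and is emitted once, as a single structured line.

```python
import json
import time


def handle_request(user_id: str, cart_total: float) -> dict:
    """Handle one (hypothetical) checkout and return its single wide event."""
    event = {"name": "checkout", "user_id": user_id, "started_at": time.time()}
    try:
        event["cart_total"] = cart_total
        if cart_total <= 0:
            raise ValueError("empty cart")
        event["status"] = "ok"
    except ValueError as exc:
        # Record the failure on the same event instead of a separate log line.
        event["status"] = "error"
        event["error"] = str(exc)
    finally:
        event["duration_s"] = time.time() - event["started_at"]
    return event


def emit(event: dict) -> str:
    """Serialize the event as one JSON line, ready for a log pipeline."""
    line = json.dumps(event, sort_keys=True)
    print(line)
    return line
```

Calling `emit(handle_request("u1", 0))` prints one line that carries the user, the cart total, the error, and the duration together, which is what makes the event queryable later.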
📝 Why Stripe’s API Never Breaks: The Genius of Date-Based Versioning (Harsh Shukla). I got through most of this post before it was revealed that Stripe has a version-level description of which features were added to the API and adapters that convert inputs and outputs into the appropriate version level based on date. Very cool, but how do you handle security issues in versions? Your options (as far as I can tell) are:
Announce you can no longer use a particular version. (Breaks "we support every version".)
Change the behavior of the specific version and re-release with the same version number. (Breaks "this version has this particular behavior".)
Some kind of automatic translation that says "this published version maps to this internal version".
In any case, it's all very nice, but unlikely to impact how most people will design versioned artifacts in the future.
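For a feel of how date-keyed adapters might work, here is a small sketch. It is my own illustration, not Stripe's implementation: the dates, field names, and the `render` function are all made up, and it only covers outputs. The server always produces the latest response shape, then applies a "downgrade" for every change introduced after the caller's pinned version.

```python
from typing import Callable

Response = dict
Downgrade = Callable[[Response], Response]


def _merge_name(resp: Response) -> Response:
    # Hypothetical change on 2024-06-01: "name" was split into first/last.
    out = dict(resp)
    out["name"] = f"{out.pop('first_name')} {out.pop('last_name')}"
    return out


def _cents_to_dollars(resp: Response) -> Response:
    # Hypothetical change on 2023-01-15: amounts moved from dollars to cents.
    out = dict(resp)
    out["amount"] = out.pop("amount_cents") / 100
    return out


# Newest first: each entry undoes the change introduced on that date.
CHANGES: list[tuple[str, Downgrade]] = [
    ("2024-06-01", _merge_name),
    ("2023-01-15", _cents_to_dollars),
]


def render(latest: Response, pinned_version: str) -> Response:
    """Downgrade a latest-format response to the caller's pinned version."""
    resp = latest
    for introduced_on, downgrade in CHANGES:
        # ISO dates compare correctly as plain strings.
        if pinned_version < introduced_on:
            resp = downgrade(resp)
    return resp
```

In this shape, the third option above would amount to changing which internal adapter chain a published date maps to.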
📖 The Gene: An Intimate History by Siddhartha Mukherjee (2016; via Siraj Raval). The book makes many concepts in biology understandable. Interweaving the author's personal history with the science makes it heart-warming.
📝 On deathbed advice/regret (hazn; via Tyler Cowen). I agree with the main point of the post which is why I've usually taken deathbed regret and converted it into specific advice. For many years, I've had the following (lightly edited) list towards the top of my todo list:
📝 Introducing Beads: A coding agent memory system (Steve Yegge). The whole thing was vibecoded and is kinda crazy, but I've actually been looking for a way to track issues from within git. Apparently the agents really like it.
📝 Six New Tips for Better Coding With Agents (Steve Yegge). Programming by hand is artisanal. Programming by copy-pasting from a chatbot is obsolete. The future appears to be conducting an orchestra of bot swarms.
📝 Childhood and Education #16: Letting Kids Be Kids (Zvi Mowshowitz). I grew increasingly angry while reading this. Zvi documents so many rage-inducing examples of bad rules around letting children do things on their own.
cosmofy 0.2.0 is available. So many things came together for this release:
Three open source developers I follow (Will McGugan, Simon Willison, and Charlie Marsh) were all in a Twitter thread where the concept of something like cosmofy was mentioned.
This release represents a very large shift from bundling individual python files to using uv to bundle entire venv directories. The behavior of the CLI is now much more similar to uv in form and function.
When I was designing the low-level zip file manipulation tools for cosmofy, I wanted an easy way to see the contents of the bundle. We're so used to using ls for looking into directories that I thought it would be cool to emulate as much of ls as I could.
But then I realized this was insane. First, many of the options are just aliases for slightly more explicit options. Charlie Marsh would never have a -t that was an alias for --sort=time. Why should I?
In the end I decided to go with the most common options (sorting, list view), a couple that were easy to implement, and a few longer-form ones that cover most of the aliases.
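As a rough illustration of that trade-off (a sketch using the standard library, not cosmofy's actual code), an `ls`-like listing for a zip bundle only needs a long view and one explicit sort option. The `list_bundle` function and its parameter names are hypothetical.

```python
import zipfile


def list_bundle(bundle, long: bool = False, sort: str = "name",
                reverse: bool = False) -> list[str]:
    """List entries of a zip file, roughly like `ls` with an explicit sort."""
    keys = {
        "name": lambda info: info.filename,
        "size": lambda info: info.file_size,
        "time": lambda info: info.date_time,
    }
    with zipfile.ZipFile(bundle) as archive:
        infos = sorted(archive.infolist(), key=keys[sort], reverse=reverse)

    lines = []
    for info in infos:
        if long:
            # date_time is a (year, month, day, hour, minute, second) tuple.
            y, mo, d, h, mi, _ = info.date_time
            stamp = f"{y:04}-{mo:02}-{d:02} {h:02}:{mi:02}"
            lines.append(f"{info.file_size:>10} {stamp} {info.filename}")
        else:
            lines.append(info.filename)
    return lines
```

One explicit `sort` parameter stands in for the whole family of single-letter sort aliases.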
Imitation is the highest form of flattery, which is why, as part of the cosmofy 0.2.0 release, I decided to change everything about how the CLI behaves to make it work more like the tools from Astral.
I have a long-term plan for Astral to take over making Cosmopolitan Python apps. It's a long shot, but if they do, it'll be a huge win for cross-platform compatible executables. I also saw this popular issue that there should be a uv bundle command that bundles everything up.
To make it easier to adopt, I decided to make the interface follow Astral's style in four important ways:
Subcommand structure: It's gotta be cosmofy bundle and cosmofy self update
Colored output: Gotta auto-detect that stuff. Luckily, I had fun with brush years ago, so I know about terminal color codes.
Global flags: Some of those flags gotta be global.
Smart ENV defaults: smart defaults + pulling from environment variables to override.
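The color auto-detection and environment-variable defaults can be sketched together. This is an assumption-heavy illustration rather than cosmofy's code: `resolve_color` and the `EXAMPLE_COLOR` variable are hypothetical names, and I'm assuming a three-way `auto`/`always`/`never` choice. `NO_COLOR` is the widely used opt-out convention.

```python
import os
import sys
from typing import Mapping, Optional


def resolve_color(flag: Optional[str] = None,
                  environ: Optional[Mapping[str, str]] = None,
                  is_tty: Optional[bool] = None) -> bool:
    """Decide whether to emit color codes.

    Precedence: explicit CLI flag, then environment override, then "auto".
    """
    environ = os.environ if environ is None else environ
    if is_tty is None:
        is_tty = sys.stdout.isatty()

    # EXAMPLE_COLOR is a hypothetical variable name used for illustration.
    choice = flag or environ.get("EXAMPLE_COLOR") or "auto"
    if choice == "always":
        return True
    if choice == "never":
        return False
    # "auto": respect the NO_COLOR convention, otherwise color only a terminal.
    if environ.get("NO_COLOR"):
        return False
    return is_tty
```

Passing `environ` and `is_tty` explicitly keeps the decision easy to test without touching the real terminal.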
Now I didn't start out wanting to build my own argument parser (really, I promise I didn't!). I tried going the argparse route (I even tried my own attrbox / docopt solution), but I had a few constraints:
I really don't want 3rd party dependencies (even my own). cosmofy needs to stay tight and small.
I want argument parsing to go until it hits the subcommand and then delegate the rest of the args to the subcommand parser.
I want to pass global options from parent to child sub-parser as needed.
Together these pushed for a dedicated parser. This lets me write things like:
usage = f"""\
Print contents of a file within a Cosmopolitan bundle.

Usage: cosmofy fs cat <BUNDLE> <FILE>... [OPTIONS]

Arguments:
{common_args}
  <FILE>...       one or more file patterns to show

  tip: Use `--` to separate options from filenames that start with `-`
  Example: cosmofy fs cat bundle.zip -- -weird-filename.txt

Options:
  -p, --prompt    prompt for a decryption password
{global_options}
"""

@dataclass
class Args(CommonArgs):
    __doc__ = usage
    file: list[str] = arg(list, positional=True, required=True)
    prompt: bool = arg(False, short="-p")

...

def run(args: Args) -> int:
    ...

cmd = Command("cosmofy.fs.cat", Args, run)

if __name__ == "__main__":
    sys.exit(cmd.main())
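The "parse until the subcommand, then delegate" constraint is the part that pushed me away from the usual tools, so here is a stripped-down sketch of the idea. It is not the real cosmofy parser: `split_at_subcommand`, `dispatch`, and the handler signature are hypothetical, and it only handles boolean global flags.

```python
from typing import Callable

Handler = Callable[[dict, list[str]], int]


def split_at_subcommand(argv: list[str],
                        global_flags: set[str]) -> tuple[dict, str, list[str]]:
    """Consume global flags until the first non-flag token (the subcommand)."""
    options: dict = {}
    for index, token in enumerate(argv):
        if token in global_flags:
            options[token.lstrip("-")] = True
        elif token.startswith("-"):
            raise SystemExit(f"unknown global option: {token}")
        else:
            # Everything after the subcommand is left untouched for its parser.
            return options, token, argv[index + 1:]
    raise SystemExit("missing subcommand")


def dispatch(argv: list[str], global_flags: set[str],
             handlers: dict[str, Handler]) -> int:
    """Parse global flags, then hand the rest to the subcommand's handler."""
    options, name, rest = split_at_subcommand(argv, global_flags)
    if name not in handlers:
        raise SystemExit(f"unknown subcommand: {name}")
    # Global options travel from parent to child alongside the raw arguments.
    return handlers[name](options, rest)
```

Each handler receives the parent's global options plus its own unparsed arguments, so a subcommand can run its own parser without the parent knowing its flags.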
For the colored output, I took inspiration from Will McGugan's rich, which uses tag-like indicators to style text.
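A toy version of tag-style markup looks something like the following. This is neither rich's implementation nor cosmofy's; the `style` function and the small tag table are assumptions for illustration, and a closing tag simply resets all styling rather than restoring an outer style.

```python
import re

# Standard ANSI SGR codes for a handful of styles.
ANSI = {"bold": "1", "red": "31", "green": "32", "yellow": "33", "blue": "34"}
TAG = re.compile(r"\[(/?)([a-z]+)\]")


def style(markup: str, color: bool = True) -> str:
    """Turn [red]...[/red]-style tags into ANSI escapes, or strip them."""

    def replace(match: re.Match) -> str:
        closing, name = match.groups()
        if name not in ANSI:
            return match.group(0)  # leave unknown bracketed text untouched
        if not color:
            return ""  # plain output: drop the tag entirely
        return "\x1b[0m" if closing else f"\x1b[{ANSI[name]}m"

    return TAG.sub(replace, markup)
```

The `color` flag is where auto-detection plugs in: the same markup string renders with escapes on a terminal and as plain text when piped to a file.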
📝 How I Found Myself Running a Microschool (Kelsey Piper / Center for Educational Progress). Over the past 10 years I have migrated to essentially this view: you need direct instruction to get the basics and a foundation; you need to see people enact the values you want to transmit; and you need a strong motivating project to get you over the humps when the going gets tough.
📝 Ideas Aren’t Getting Harder to Find (Karthik Tadepalli / Asterisk). Knowing what is causing productivity growth to start to slow is critical to selecting appropriate policies for how to get it going again. Karthik makes a good case for why the idea that "ideas are getting harder to find" is wrong and why it's more of a failure of the market to weed out bad ideas and promote good ones.