tl;dr: Evolved my Ralph-style automation for the Rust Symex port from a one-prompt bash loop into a generic Go tool (ralph) driven by a clean/dirty + plan/implement/review state machine.
Last month, I wrote about using Claude to port Angr’s core symbolic execution engine to Rust. One of my conclusions was that I should have spent more time improving the automation so that I could let Claude iterate on the backlog across multiple sessions after in-depth planning sessions. This post covers the evolution of that automation and lessons learned.
Python Looper
The first looper was a bash script that simply called claude -p in a loop
with the same prompt each time. I had Claude review that implementation and
the logs from running it, along with the following additional
considerations:
- OOM kept killing the orchestration loop itself
- Should use different prompts if the repo is clean versus dirty
- Use structured logging for better observability
Claude suggested using cgroup-based OOM isolation via systemd-run to solve
the issue where my orchestration loop would inadvertently get killed. It
implemented this and the rest of the considerations without much supervision.
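To make the isolation idea concrete, here is a minimal sketch in Go (not the looper's actual code) of wrapping each agent invocation in a transient systemd scope with a memory ceiling. The helper name scopedCommand, the 8G limit, and the prompt text are my own placeholders; MemoryMax and MemorySwapMax are standard systemd resource-control properties, and I am assuming a cgroup v2 host where the user manager is allowed to set memory limits.

```go
package main

import (
	"fmt"
	"strings"
)

// scopedCommand wraps argv in a transient systemd scope so the child gets
// its own cgroup with a hard memory ceiling. When the child exceeds memMax,
// the OOM kill happens inside that cgroup rather than hitting the
// orchestrating process. (Hypothetical helper, for illustration only.)
func scopedCommand(memMax string, argv ...string) []string {
	wrapped := []string{
		"systemd-run",
		"--user",  // talk to the per-user systemd instance; no root needed
		"--scope", // run in the foreground and inherit stdio
		"--quiet", // suppress systemd-run's own status output
		"-p", "MemoryMax=" + memMax,
		"-p", "MemorySwapMax=0", // do not let the child spill into swap
		"--",
	}
	return append(wrapped, argv...)
}

func main() {
	// The limit and prompt are placeholders.
	args := scopedCommand("8G", "claude", "-p", "Pick the next task from the backlog.")
	fmt.Println(strings.Join(args, " "))
}
```

Passing the resulting slice to exec.Command(args[0], args[1:]...) would run the session inside the scope, so a runaway session dies alone and the loop can log the failure and move on.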
It did not get things perfect – it missed Claude’s --output-format=json,
which would have simplified the script’s parsing logic, and it still does not
handle Claude’s usage limits cleanly. Despite these imperfections, I ran
this looper (with periodic tweaking) on and off for three weeks, landing over
400 commits (excluding bookkeeping) on the Rust Symex. A Claude-generated
summary of what changed (between commits 7e755f4 and 964a092c):
- SimOption coverage — honored KEEP_IP_SYMBOLIC, NO_IP_CONCRETIZATION, ENABLE_NX, NO_SYMBOLIC_JUMP_RESOLUTION; raise on SYMBOL_FILL_UNCONSTRAINED_REGISTERS.
- New native SimProcedures — scanf/sscanf, getenv/setenv/putenv, putchar/fputc/putc, memcmp, strstr, strdup.
- Arch support — ARM cond codes + ARM/AArch64 return-addr fixes; x86-32 promoted to Supported; synthetic ARM/MIPS32/MIPS64/AArch64 smoke benches.
- VEX / interpreter — Triop/Qop float arith; constant folding + inline hints; per-arch register dispatch via Arch trait; broader fallback telemetry.
- Exploration — DFS strategy w/ auto-detect; set_max_active_states, get_exploration_summary(), set_progress_callback().
- Solver / Z3 — per-site solver.check() counters; model caching to skip half of check_branch_feasibility; RustSolverFallback class replaces monkey-patching; second wrong-answer satisfiable() site fixed.
- Correctness — symbolic syscall num → Python; flareon2015_5 imported-addrs fix; libz3 SONAME-mismatch segfault; max_active_states enforcement.
- Hot-path perf — SmallVec push stacks; Arc/CoW on assumed_constraints fork; consuming RustBV op variants; O(1) load fast-skip via byte-indexed pending stores.
- Tooling — run_optimization_loop.py → .ralph/ orchestrator (FSM, hooks, OOM detection, PID lockfile); set_rust_log_level().
Performance changes (existing benches):
- Wins: ekopartyctf2016_rev250 4.14→2.02s, csaw_wyvern 1.05→0.94s (16.9x), flareon2015_5 6.66→5.57s, fauxware 0.40→0.28s (flipped to 1.4x), strcpy_find 0.45→0.39s (flipped 0.2x→2.3x), securityfest_fairlight 23.7→22.0s.
- Slight regression: google2016_unbreakable_1 3.20→3.50s (high variance).
- Net: 17/22 faster than Python; three new known-slower benches added with root-cause notes (hackcon 0.7x, mma_howtouse 0.6x, sokohashv2 0.4x).
I plan to do another in-depth post about the Rust Symex in the next few weeks.
Go Looper
The Python looper is certainly capable but is closely tied to the Angr repo. I wanted a generic, standalone tool I could point at any project.
I also wanted more than a simple loop. I had already started using different prompts depending on whether the repo was clean or dirty, and my typical Claude workflow has three distinct phases: iterate on an implementation plan, let Claude autonomously implement it, then review the output until it’s acceptable. A finite state machine seemed like the natural way to encode that.
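As a rough illustration of that idea, the sketch below encodes the three phases plus the clean/dirty check as a tiny state machine in Go. It is not the real tool's code: the Outcome type, the transition rules, and the prompts/ file layout are all assumptions I made up for the example. The only external fact it relies on is that git status --porcelain prints nothing when the working tree is clean.

```go
package main

import (
	"fmt"
	"strings"
)

// Phase is one stage of the plan/implement/review workflow.
type Phase int

const (
	Plan Phase = iota
	Implement
	Review
	Done
)

func (p Phase) String() string {
	return [...]string{"plan", "implement", "review", "done"}[p]
}

// Outcome is what the loop learned from one agent session (hypothetical).
type Outcome int

const (
	Continue Outcome = iota // more work remains in this phase
	Accepted                // the phase's goal was met
	Rejected                // review found problems to fix
)

// isDirty interprets the output of `git status --porcelain`: any
// non-blank output means uncommitted or untracked changes exist.
func isDirty(porcelain string) bool {
	return strings.TrimSpace(porcelain) != ""
}

// next is the transition table. These rules are assumptions for the sketch.
func next(p Phase, o Outcome) Phase {
	switch {
	case p == Plan && o == Accepted:
		return Implement
	case p == Implement && o == Accepted:
		return Review
	case p == Review && o == Accepted:
		return Done
	case p == Review && o == Rejected:
		return Implement
	default:
		return p // stay in the current phase
	}
}

// promptFor picks a prompt file from the phase and the tree state.
// The prompts/ layout is a made-up convention.
func promptFor(p Phase, dirty bool) string {
	tree := "clean"
	if dirty {
		tree = "dirty"
	}
	return fmt.Sprintf("prompts/%s-%s.md", p, tree)
}

func main() {
	p := Plan
	for _, o := range []Outcome{Accepted, Continue, Accepted, Rejected, Accepted, Accepted} {
		fmt.Println(p, "uses", promptFor(p, isDirty("")))
		p = next(p, o)
	}
	fmt.Println("final phase:", p)
}
```

Keeping the transition table in one pure function makes it easy to test without ever launching an agent session.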
Finally, I wanted the tool to work like git: hooks and configuration live
in the repo (think .git), but the git binary itself is generic. That
keeps the tool reusable while letting each repo customize behavior.
After creating several Go CLI tools with Claude, I’ve concluded that “build me a Go CLI” is too open-ended to produce good results. Without guidance – usually a lot of hand-holding during planning – Claude haphazardly stitches code together to hit the stated features. One way I have tried to counteract this is to derive “design principles” from Go repos I think are well-architected, to shrink Claude’s solution space. This started as a single markdown file derived from GitHub CLI that I referenced in every planning session. It worked, but lacked principles for things the upstream repo does not do (e.g., storing data in a database).
Since I had already adopted beads, I decided to encode these principles as beads issues that Claude could consult during planning. I created a repo and populated it from the Go source, GitHub CLI, and Effective Go; there is a simple website for browsing the content. This nudges Claude toward code I can understand (when it actually follows the principles) and lets me “bring my own beads” to any Go CLI project. I can imagine maintaining different “flavors” for different scenarios.
To test this scaffolding and create a generic looper, I built a Go Ralph tool. Specifically, I created a new repo, started Claude, and had it plan a new tool based on the Python looper, the logs from the Python looper, my desired state machine, and my design beads. This started with the v0.2.0 beads and Claude quickly created a working tool that I could use to improve itself. Getting to a working tool took some supervision – most memorably, one detour where Claude went off trying to verify that every design bead was enforced in its own output. That burned a session or two before I noticed and course-corrected, and it also surfaced enough gaps in the principles to drive another byob-go-cli release.
I have let the new Ralph tool continue to improve the Rust Symex after several planning sessions. Here’s a Claude-generated summary of what changed between commits 964a092c and 76f53094:
- Parallel-exploration foundation (angr-v5a5) — SharedLineageSolver + ScopePath skeleton; SymContext lineage / scope_path fields; lineage telemetry through get_solver_stats; with_z3_solver dispatcher.
- state.inspect MVP — extended to reg_read/reg_write/instruction/irsb/exit.
- Symbolic-CC dispatch — amd64/x86/arm symbolic CC paths; unified amd64/x86 ccall dispatch.
- New native SimProcedures — fopen/fdopen/fclose/fseek/ftell/rewind, fwrite/fflush/setvbuf, __libc_start_main after_main continuation.
- Arch promotions — ARMEL and MIPS64 to Supported (inline-ELF synthetic benches).
- VEX — Newton-Raphson reciprocal/rsqrt FP; single-word CAS oldHi fix; macro-driven width-family / float-lane dispatch.
- Memory — Address newtype; spans-first containment; balanced Concat tree; ITE-arm collapse with matching loads; multiple-concretization Or(addr==aK) hoist (K≤8).
- Symbolic simplification — pre-Z3 Extract rewrite + multi-byte Reverse fix; commutative-op canonicalization; Cmp(ZeroExt(k,x), BVV) fold; Shl/Lshr/Ashr/Mul w/ concrete RHS → bit-slice.
- Solver — batch add_constraints under one lock; ANGR_Z3_TACTIC hook; mul2concat for fairlight; min/max binary search seeded from cached model; SMT-LIB2 cross-context round-trip spike.
- Instrumentation — VEX dispatch / memory volume / concretization fanout / AST emission counters; z3_ast cache hit/miss; native-proc fallback by error reason; orphan-BVS fallback paths; --dump-counters/--counters-json.
- Bug fixes — proxy recovery of Rust AST for symbolic registers; state-export ancestor-chain walk; recovered symbols through Rust solver (drop pre-pin); Reverse leaf Z3 emission.
- Cleanup — dropped dead pending_python_constraints, _cb_sync_constraints FFI chain, desc-string constraint reconstruction, _replace_with_rust_snapshot, redundant _attach_rust_solver_fallback.
Performance is roughly flat across those commits. As before, I plan to do a deeper post on the Rust Symex itself in the next few weeks.
Lessons Learned
It has been fun to continue to experiment with Ralph loops to see how to make use of them. Once you have done the work to figure out what to build and how to build it, these loops work fairly well. In the Rust Symex context, the agent benefits from having a reference implementation (i.e., the Python code) to inform the planning process. But it was also able to improve its own code with just the BYOB scaffolding, no reference implementation in sight. At some point it would be fun to build the same tool twice – once with the scaffolding, once without – and compare the results.
Next Steps
There is a bit more on the backlog for the Rust Symex – my plan is to have Ralph continue to tinker with it for a bit longer before trying to better quantify its utility.