Background

During the pandemic, I watched a fair amount of YouTube. One of the channels that I started watching was Not Just Bikes, starting with its Strong Towns series. The case studies were compelling, but do they apply to where I live? How solvent are our streets?

Planning

I started thinking about creating a tool to explore this question in February. The initial idea that I fed into Gemini Deep Research:

I want to write a tool that calculates the total area of pavement in my city of Livermore, California since I can’t find the data publicly available. Data should not be behind a paywall. I would write the backend in go with a web interface.

I nudged the research plan slightly before letting it start researching:

I’m mostly interested in roads but exploring other paved surfaces would be interesting too. Finding all the parking areas seems like another side quest

From the output, I asked Gemini to give me an implementation plan that I could hand to Claude. I sat on this plan for a few days before deciding that I wanted to expand the scope of the tool – I wanted to forecast the cost to maintain that pavement infrastructure:

i have the following idea. i want to use it as a starting point to explore city budgets with car-dependent design. my initial idea is to estimate the cost to maintain the current pavemented infrastructure but now i’m thinking about forecasting costs more generally. what should i add to my plan:

(full plan from before)

Gemini gave me a new plan that included forecasting the Pavement Condition Index with different maintenance strategies and budgets. With this new plan, I pivoted to Claude Code.

Implementation

The initial implementation went quickly given the extensive plan. After prototyping some features, my main concern was that I had generated a bunch of slop that would get increasingly unwieldy to maintain. This led to many cycles of generating some functionality and then refactoring it.

Early on, I instructed Claude to use the GitHub CLI repo as a reference for any design decisions. This informed the general shape of the tool, but Claude was not perfect about sticking to those decisions (and I was not disciplined enough to prompt it to). The code had already grown to around 4-5k LoC when I asked Claude to audit the implementation against the design decisions. The audit identified several gaps, which we fixed through refactoring, leading to proper dependency injection, shared mock storage, and iostreams usage.

Later, I added a ton of golangci-lint linters to the repo. These caught a lot of common coding issues, and adding them seemed like an easy way to drive Claude towards cleaner code. It would have been nice to have them enabled from the beginning, though, rather than spending a few sessions adding linters and cleaning up lint.

I worked on the project on and off for a few months and came to the conclusion that I needed to give Claude more structure to keep the repo aligned with my coding preferences. This led to the creation of byob-go-cli, a collection of preferences based on idiomatic Go (that I described previously). I ran multiple passes to identify gaps between the byob-go-cli preferences and the implementation, adding beads as needed. This took several weeks to coalesce.

Once my initial proof-of-concept was in good shape, it was time to test on a bunch of cities. I had mostly been testing on Livermore and decided to branch out to other cities and regions. Livermore is inland, and one of the early failures when branching out involved cities with a coastline. Specifically, when calculating the percentage paved for cities on the coast (e.g., San Francisco), the percentage would be tiny because the total area included a vast amount of water. I was curious to see if Claude could figure it out, and we went through many cycles of plan-implement-test without succeeding. Turns out “make no mistakes” does not automatically make Claude solve the problem. Eventually, Claude figured out how to stitch the coastline together properly (even when there were small gaps), but it struggled to figure out which side of the coastline was land (I did not want to pull in any additional data sources). It tried many different probing strategies, including using points inside the city boundary. We kept running into corner cases where the land and water sides were sometimes inverted, and eventually I suggested using the roads themselves as probes, since they should be on land. This finally succeeded and we were closing in on v0.1.0.
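The roads-as-probes idea can be sketched roughly as follows. This is an illustration under assumptions, not the tool's actual implementation: it assumes the stitched coastline has already split the city boundary into candidate polygons, and it picks the candidate containing the most road points as land. The Point and Polygon types and the contains and pickLand functions are hypothetical names, and the point-in-polygon test is a standard even-odd ray cast on planar coordinates.

```go
package main

import "fmt"

// Point is a planar x/y pair (hypothetical type for this sketch).
type Point struct{ X, Y float64 }

// Polygon is a ring of vertices; the last vertex connects back to the first.
type Polygon []Point

// contains reports whether p lies inside poly using even-odd ray casting.
func contains(poly Polygon, p Point) bool {
	inside := false
	for i, j := 0, len(poly)-1; i < len(poly); j, i = i, i+1 {
		a, b := poly[i], poly[j]
		// Only edges that straddle p's horizontal line can be crossed.
		if (a.Y > p.Y) != (b.Y > p.Y) {
			xCross := a.X + (p.Y-a.Y)*(b.X-a.X)/(b.Y-a.Y)
			if p.X < xCross {
				inside = !inside
			}
		}
	}
	return inside
}

// pickLand returns the index of the candidate polygon that contains the
// most road points, or -1 if no road point falls inside any candidate.
func pickLand(candidates []Polygon, roadPoints []Point) int {
	best, bestHits := -1, 0
	for i, c := range candidates {
		hits := 0
		for _, p := range roadPoints {
			if contains(c, p) {
				hits++
			}
		}
		if hits > bestHits {
			best, bestHits = i, hits
		}
	}
	return best
}

func main() {
	// A 10x10 boundary split by a vertical "coastline" at x=4 (toy data).
	west := Polygon{{0, 0}, {4, 0}, {4, 10}, {0, 10}}
	east := Polygon{{4, 0}, {10, 0}, {10, 10}, {4, 10}}
	roads := []Point{{6, 2}, {7, 5}, {8, 8}, {5, 5}}
	fmt.Println(pickLand([]Polygon{west, east}, roads)) // prints 1 (east is land)
}
```

Counting hits instead of trusting a single probe makes the choice tolerant of a stray road point, such as a bridge or pier segment that extends over water.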

I published v0.1.0 and with it the generated site. Since this is a personal project, there was no release party. I liked browsing the site and exploring the data for different cities. I made sure to add a banner to the site saying that this was a work in progress, as I was sure there were still bugs lurking.

On June 9th, Fable 5 was released and I wanted to see what bugs it could find in the repo. It did a comprehensive review across multiple dimensions, discovering nearly 80 bugs (correctness 37, perf 13, tests-docs 10, architecture 7, domain-math 6, security 5). This audit was conducted over 20+ hours spanning multiple sessions (totaling almost 150M tokens, including cache reads; almost 17M tokens excluding cache re-reads) using a workflow that fanned out to 144 agent sessions. I had Claude file beads for all the findings, and I then used Opus to fix them after we all lost access to Fable. Fortunately, there was enough detail in the beads that Opus was able to iterate through and address nearly all the bugs.

For brevity, I glossed over a few things in this implementation description including needing to iterate on the export format to generate smaller JSON objects and addressing performance issues in some of the stages.

Output

The output from the tool is published under my personal website due to a quirk with GitHub Pages. The site includes a number of different examples, including Livermore, CA, the initial motivator.

Livermore pavement area screenshot.

It also has several metro areas including the Bay Area, Denver, Boston, Los Angeles, and Portland. These multi-city views add Compare and Aggregate tabs. I made some effort (with Claude) to populate initial pavement condition indices and city budgets. There is still more work to do.

Drawing inspiration from City Nerd, I also included an example with the Top 50 US cities. It is fun to zoom around the US and look at how different cities compare.

Lessons Learned

The biggest lesson that I learned from implementing this project is the benefit of putting guardrails on Claude early on in the implementation. If you ask Claude to bring you a rock and know that you want a blue rock, telling it upfront that the rock should be blue will save a lot of time and energy. The BYOB scaffolding helps convey the kind of code that I want to read/review.

Second, this project further reinforced my sense that tools like Claude lack creative problem solving. Claude might have gotten there eventually, but “make no mistakes” is no substitute for creativity.

Finally, I plan to continue to iterate on the site and data before reaching out to some local advocacy organizations to see if it is useful to them. It would be nice to see this project make a difference in my city.