Assert on behavior, not implementation

byob-testing.3 testing

Problem: tests that assert on internal call counts ("expected Store.Get to be called exactly twice"), private field values, or the precise order of collaborator interactions break the moment the implementation refactors — even when the user-visible behavior is unchanged. The test becomes a snapshot of one specific implementation, not a contract about what the code does. Maintenance cost compounds: every internal refactor triggers test churn that adds no signal, and the suite slowly trains its authors to treat test failures as noise.

Idea: assert on the behavior the caller can observe — return values, errors (including their errors.Is/errors.As shape), and the side effects the code is responsible for. For commands, that means the bytes in the stdout/stderr buffers from iostreams.Test() (per byob-testing.1), the exit error, and any persisted state the test set up the storage layer to inspect. It does not mean the sequence of method calls on a fakePrompter, the internal cursor position of a paginator, or whether Store.Get was called before Store.List.

The fakes from byob-testing.1 make this easy to get wrong — once a fake records calls, the temptation is to assert on those records. Resist except where the call itself is the behavior under test (e.g. "the dry-run mode does not invoke Store.Write"). Treat recorded calls as a debugging aid, not the default assertion surface.

Tradeoffs: behavior-focused tests can miss regressions in internal state that don't surface through the public boundary. Mitigation: make the boundary wider when it matters — return the new state, expose an inspection method, emit a log line the test can assert on. If you find yourself wanting to test a private invariant, the invariant probably wants to be observable.

When not to use: characterization tests written specifically to lock in current implementation behavior before a refactor (rare, and delete them after the refactor). Tests that assert on call counts as a proxy for algorithmic complexity (uncommon in CLIs).

Design

// Behavior-focused: assert on what the user sees.
func TestListPrintsItems(t *testing.T) {
    io, _, stdout, _ := iostreams.Test()
    f := &Factory{IOStreams: io, Store: fakeStoreWithItems("alpha", "beta")}

    cmd := NewCmdList(f, nil)
    require.NoError(t, cmd.Execute())

    if diff := cmp.Diff("alpha\nbeta\n", stdout.String()); diff != "" {
        t.Errorf("stdout mismatch (-want +got):\n%s", diff)
    }
}

// Implementation-focused: brittle. The test now owns the call
// sequence, not the behavior. Refactor `List` to cache or batch, and
// this breaks without any user-visible change.
func TestListCallsStoreGetTwice(t *testing.T) {
    s := &recordingStore{}
    cmd := NewCmdList(&Factory{Store: s}, nil)
    _ = cmd.Execute()

    if len(s.getCalls) != 2 {
        t.Errorf("Store.Get calls = %d, want 2", len(s.getCalls))
    }
}

The exception — when the absence of a call is the contract:

func TestDryRunDoesNotWrite(t *testing.T) {
    s := &recordingStore{}
    cmd := NewCmdDelete(&Factory{Store: s}, nil)
    cmd.SetArgs([]string{"--dry-run", "item-1"})
    require.NoError(t, cmd.Execute())

    if len(s.writeCalls) != 0 {
        t.Errorf("Store.Write called %d times in dry-run, want 0",
            len(s.writeCalls))
    }
}

Here the no-op behavior is the user-visible contract of --dry-run, so asserting on the absence of the call is the same as asserting on behavior. The shape is the exception, not the default.