Kevin Arthur
© 2026 Kevin Arthur. Designed & Built with care.

Shipped Product · developer-tools

OpenCode Harness: Engineering a Deterministic AI Development Interface

An open-source VS Code client I built for the opencode AI agent — full chat, real diff review, and several sessions running in tabs. The hard part wasn't the features; it was hunting down 'Silent Staleness', a bug where the data was right under the hood but the UI quietly trusted stale, cached values. The fix was rebuilding the interface around live state that re-derives on every change and persists to ExtensionContext.globalState.

Lead Engineer & Product Designer
2026 – Present · Ongoing (beta)
TypeScript • VS Code Extension API • Webview UI • ...
[Figure: OpenCode Harness inside a full VS Code window, with a TypeScript editor showing the state-persistence code beside the docked OpenCode chat panel with model selection and live context.]

📋 TL;DR Summary

The Problem

I built this to live inside my editor all day, which meant the people using it — starting with me — would notice the instant the UI lied. And it did lie, in a way I started calling 'Silent Staleness'. The data underneath was correct, but the interface kept trusting cached values and start-up snapshots that never refreshed. Features looked wired; some were quietly dead.

The Solution

I rebuilt the state pipeline around one simple rule I now call Active Re-Derivation: the UI never trusts a cached local — it rebuilds itself from live state on every change, then writes that truth down to ExtensionContext.globalState. I paired that with strict CSS containment so the layout can't deform when fonts scale or text flips to RTL, and a wall of automated tests so these dead wires can't quietly creep back.

My Role

Lead Engineer & Product Designer — I designed and built all of it solo: the host/webview split, the IPC bridge, the state-persistence layer, the accessibility work, and the contract + visual testing framework.

Business Impact

Shipped a free, open-source VS Code client with multiple concurrent AI sessions, 75+ models, and a UI I can actually trust — backed by 300+ automated tests. Silent Staleness went from an invisible production risk to a loud, repeatable test failure.

OpenCode Harness is an open-source VS Code extension I built to wrap the opencode CLI agent in a real graphical client — a chat panel, a proper side-by-side diff viewer, and several sessions running in tabs at once. It starts the agent's server for you and talks to it over an HTTP SDK, so the whole agentic loop — tool calls, diffs, subagents, reasoning — lives inside the editor instead of scrolling past in a terminal you keep flipping back to.

Open source — read the code on GitHub or install it from the VS Code Marketplace. Everything described below is in the public repo.

I'm going to skip the feature tour. What's actually worth your time is the story of one specific, maddening class of bug I had to chase down — and the architecture change that finally put it to rest.


1. Why I Built It — and Why the Bar Was Higher Than I Expected

Honestly, this started as a tool for myself. I was living in the opencode CLI all day and kept hitting the same friction: scrolling back through older chat items was painful, re-reading or reviewing a long response in a terminal was clumsy, and I was forever bouncing between the CLI and my editor (VSCodium) just to actually look at the code the agent was changing. I wanted the diff review in the same window. I wanted to do all my work in one place instead of paying a context-switch tax all day long.

So I built the GUI I wished already existed — and then it kept growing. When people started asking for voice input, I added a fully local voice mode. Piece by piece it became something that gives me more granular control than the CLI exposes on its own, with plenty more I still want to add. It's a client over the same opencode server, so it can't do anything the agent itself can't — but it can make all of it far nicer to actually live in.

That origin is exactly why the quality bar turned out to be so unforgiving. Most web apps get to be a little bit wrong — a button that lags a frame, a counter that's stale until the next refresh — and people just shrug. Developer tools don't get that grace. The folks using them write software for a living; they're the most likely to notice, the most likely to distrust, and the most likely to uninstall the moment the interface tells them something untrue.

Architectural Principle

In a developer tool, the UI is not a view of the truth — it is the contract. A passing data layer is not a passing feature.

That standard raised the bar for OpenCode Harness in three ways:

  • It runs inside the IDE. The chat panel lives in a sandboxed webview docked next to a real editor. It has to survive window reloads, theme changes, font resizing, and RTL languages without flinching.
  • It's stateful and long-lived. Sessions run concurrently (default cap of 5), each with its own model and history. State has to persist across reloads, not reset to a hopeful default.
  • It's judged by experts. When the context-usage bar says 40% and the real number is 80%, an engineer notices. Trust, once broken, doesn't come back.

[Figure: OpenCode Harness docked inside VS Code, beside the very globalState persistence code this case study is about: per-tab model selection, session controls, and live context, all of which must reflect live state, not a cached snapshot.]


2. The Architecture of an Extension

Here's the thing most people don't realize about a VS Code extension: it isn't one program, it's two — separated by a hard process boundary. Once that clicked for me, everything else about this bug made sense.

  • The Extension Host runs in Node.js. It owns the file system, spawns the opencode serve process, holds the SDK client, and is the only side that can touch ExtensionContext.globalState (the durable key-value store that survives reloads).
  • The Webview Panel runs in an isolated DOM — effectively a sandboxed browser iframe. It owns rendering. It has no direct access to Node, the file system, or persisted storage.

The two halves can only talk through an asynchronous postMessage IPC bridge. Every state change, every render, every persisted byte crosses that wire as a serialized message.

Two processes, one wire

ExtensionContext.globalState · the durable source of truth
postMessage IPC · async · serialized
  • The only channel between the two processes
  • Every state change crosses as a serialized message
  • A truth boundary: anything cached past it can go stale
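
To make the bridge concrete, here is a minimal sketch of what a typed, validated message crossing could look like. It is not the extension's actual protocol: the message names, fields, and the `encodeForBridge` / `decodeFromBridge` helpers are all assumptions for illustration, and the wire is modelled as a plain JSON string rather than a real `postMessage` call.

```typescript
// Hypothetical message shapes; the real extension's protocol may differ.
type HostToWebview =
  | { type: "context/update"; tabId: string; usedTokens: number; limitTokens: number }
  | { type: "model/changed"; tabId: string; modelId: string };

// Everything that crosses the bridge has to survive serialization,
// so this sketch models the wire as a JSON string.
function encodeForBridge(message: HostToWebview): string {
  return JSON.stringify(message);
}

// The receiving side validates the payload instead of trusting its shape.
// Anything unrecognized is rejected rather than rendered.
function decodeFromBridge(wire: string): HostToWebview | undefined {
  const raw: unknown = JSON.parse(wire);
  if (typeof raw !== "object" || raw === null) return undefined;

  const fields = raw as Record<string, unknown>;
  const kind = fields.type;
  const tabId = fields.tabId;
  const usedTokens = fields.usedTokens;
  const limitTokens = fields.limitTokens;
  const modelId = fields.modelId;

  if (typeof tabId !== "string") return undefined;

  if (kind === "context/update") {
    if (typeof usedTokens !== "number" || typeof limitTokens !== "number") return undefined;
    return { type: "context/update", tabId, usedTokens, limitTokens };
  }
  if (kind === "model/changed") {
    if (typeof modelId !== "string") return undefined;
    return { type: "model/changed", tabId, modelId };
  }
  return undefined;
}
```

The point of the discriminated union is that neither side can quietly invent a field: a message either decodes into a known shape or is dropped loudly at the boundary.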

The backend follows a deliberate thin-orchestrator pattern: ChatProvider delegates to focused services (TabManager, StreamCoordinator, MessageRouter, DiffHandler) rather than accumulating logic. That separation matters for the bug hunt ahead — when state is owned by many small services and rendered across a process boundary, it's dangerously easy for one side to hold a value the other side has already moved past.

Architectural Principle

A process boundary is a truth boundary. Anything cached on the far side of an async wire is a stale value waiting to happen.


3. The Engineering Challenge: Exposing the Dead-Wire Bug

I started calling it Silent Staleness. The pattern is insidious precisely because nothing throws.

Picture the data plumbing as a circuit. The wire from the source to the UI is intact — values flow, the component renders, no error fires. But one of two things has quietly failed:

  1. The read path trusts a cached local. A value was captured once at initialization and then read forever after, even as the underlying system state moved on.
  2. The write path is dead. A user action mutated state in the UI, but the mutation never triggered its serialization — so on the next reload, or the next re-render from authoritative state, the change evaporates.

Either way the symptom is the same and maddening: it works once, looks correct, and silently drifts. No stack trace. No red console. Just a slow erosion of trust.

Here's where it actually bit, across real features:

The context-usage bar that froze at init

The context-fill indicator was computed once when a session opened and then cached in a local variable. As the conversation grew, the real fill climbed — but the bar kept reporting the initialization snapshot. The plumbing was perfect; the displayed number was a lie.
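
A minimal sketch of the difference, under stated assumptions: the `SessionUsage` shape, the `CachedUsageBar` class, and `renderUsage` are hypothetical names, and the 200,000-token limit is only inferred from the 34,000 tok = 17% readout in the demo. The first reader captures the percentage once; the second recomputes it from the session on every render.

```typescript
// Hypothetical session shape; field names are illustrative only.
interface SessionUsage {
  usedTokens: number;
  limitTokens: number;
}

// Shared arithmetic: fill percentage, clamped to 0..100.
function toPercent(usage: SessionUsage): number {
  if (usage.limitTokens <= 0) return 0; // avoid dividing by zero
  const percent = Math.round((usage.usedTokens / usage.limitTokens) * 100);
  return Math.min(100, Math.max(0, percent));
}

// Dead read: the value is captured at construction and never refreshed.
class CachedUsageBar {
  private readonly percentAtInit: number;

  constructor(session: SessionUsage) {
    this.percentAtInit = toPercent(session);
  }

  render(): string {
    return `${this.percentAtInit}%`;
  }
}

// Live read: nothing is stored, so there is nothing to go stale.
function renderUsage(session: SessionUsage): string {
  return `${toPercent(session)}%`;
}
```

With a session that grows from 34,000 to 160,000 tokens against a 200,000 limit, the cached bar keeps rendering 17% while the derived one renders 80%. Neither path throws, which is exactly why the stale one is hard to spot.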

Silent Staleness, live

[Interactive demo: send turns and reload the window to watch the cached readout drift from the truth without ever throwing an error.]

  • ❌ Cached local: 34,000 tok, 17% ("looks fine…")
  • ✅ Active re-derivation: 34,000 tok, 17% ("live truth")

Session opened: both readouts agree.

The model selector that didn't persist

Switching a tab's model updated the in-memory UI immediately, so it looked wired. But for some paths the selection never reached the write path down to globalState. Reload the window and the tab reverted to a stale default — the choice was real in the DOM and dead on disk.
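
One way to close that gap is to make the persisted write the only way to change a tab's model, and to read the model back from storage rather than from a UI-side variable. The sketch below is an illustration, not the extension's code: `MementoLike` is a hand-rolled stand-in for the `get`/`update` subset of VS Code's `Memento` (the type behind `globalState`), and `InMemoryMemento`, `MODEL_KEY`, `setTabModel`, and `getTabModel` are hypothetical names.

```typescript
// Stand-in for the get/update subset of VS Code's Memento interface.
interface MementoLike {
  get<T>(key: string): T | undefined;
  update(key: string, value: unknown): PromiseLike<void>;
}

// In-memory fake so the sketch runs outside the extension host.
// Values are stored as JSON text to mimic a serializing store.
class InMemoryMemento implements MementoLike {
  private readonly store = new Map<string, string>();

  get<T>(key: string): T | undefined {
    const raw = this.store.get(key);
    return raw === undefined ? undefined : (JSON.parse(raw) as T);
  }

  async update(key: string, value: unknown): Promise<void> {
    if (value === undefined) {
      this.store.delete(key);
      return;
    }
    this.store.set(key, JSON.stringify(value));
  }
}

const MODEL_KEY = "harness.tabModels"; // hypothetical storage key
type TabModels = Record<string, string>; // tabId -> modelId

// Write path: the selection is merged into storage; there is no
// separate "UI copy" that could diverge from what is persisted.
async function setTabModel(state: MementoLike, tabId: string, modelId: string): Promise<TabModels> {
  const next: TabModels = { ...(state.get<TabModels>(MODEL_KEY) ?? {}), [tabId]: modelId };
  await state.update(MODEL_KEY, next);
  return next;
}

// Read path: always derived from storage; the default applies only
// when nothing was ever persisted for this tab.
function getTabModel(state: MementoLike, tabId: string, fallback: string): string {
  return state.get<TabModels>(MODEL_KEY)?.[tabId] ?? fallback;
}
```

Because `getTabModel` reads through to storage, a simulated reload (a fresh caller holding the same store) sees the same answer as the tab that made the change.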

Per-tab model selection

Each tab is an independent worker: switching one tab's model leaves the others untouched, and the choice persists across a reload.

Providers: Anthropic, OpenAI, Google, opencode Zen

Active sessions:
  • Refactor auth: Claude Sonnet 4.6
  • Write tests: GPT-5 mini
  • Explore repo: Gemini 2.5 Flash

Chat-panel and RTL toggles that reset on reload

The RTL/LTR text-direction toggle and chat-panel layout preferences are pure UI state — exactly the kind of thing a user sets once and expects to stick. When the serialization step was missing, every reload re-derived from a default-LTR snapshot instead of the user's actual last choice.

The diagnosis: Every one of these had working data underneath. The defect was never in the plumbing — it was in trusting a cached value (dead read) or failing to fire the persist step (dead write). You cannot test your way out of this with data-layer tests alone, because the data layer passes. The lie lives at the UI/persistence seam.


4. The Structural Remediation

The fix had two fronts: stop the layout from deforming, and stop the state from lying.

4a. Layout containment: surviving font and RTL scaling shifts

The webview has to render correctly while the user changes the chat font size (8–32px), inherits arbitrary editor fonts, and flips the entire text direction to RTL. Without strict containment, long unbroken tokens — file paths, URLs, hashes, minified diff lines — blow out their containers and deform the layout the moment scaling changes.

The remedy was strict CSS encapsulation built on design tokens, with aggressive wrapping as a hard guarantee rather than a hope:

/* Tokenized, contained, and wrap-anywhere — so no single
   long token can deform the panel under font/RTL scaling. */
.message-body,
.tool-output,
.diff-line {
  min-width: 0;                 /* let flex/grid children actually shrink */
  overflow-wrap: anywhere;      /* break inside long unbreakable tokens */
  word-break: break-word;       /* deprecated legacy value; overflow-wrap above does the real work */
  font-size: var(--chat-font-size);
  font-family: var(--chat-font-family, var(--vscode-editor-font-family));
}

[dir="rtl"] .message-body { text-align: start; } /* logical, not physical */

The principle is logical layout — start/end instead of left/right, min-width: 0 so containers can shrink, and overflow-wrap: anywhere so a 200-character token never wins a fight with its container. This is guarded directly by visual tests (diff-wrapping.spec.ts, chat-context-usage.spec.ts).

4b. Active Re-Derivation: the state architecture that doesn't lie

This is the heart of the fix. I deleted the assumption that a cached local is ever trustworthy and replaced it with a one-directional discipline:

Architectural Principle — Active Re-Derivation

The UI never trusts a cached local. On every relevant signal it re-derives its components from live system state, then serializes that derived truth down to globalState. Storage is the single source of truth; the DOM is a projection of it.

[Figure: before and after. The Silent Staleness flow caches at init and lets the write path die, so the UI drifts silently; the Active Re-Derivation flow derives from live state, serializes to globalState, and renders from that derived truth, with mutations and reloads feeding back into live state.]

Concretely, this meant:

  • Reads re-derive. The context-usage bar, quota, and cost are recomputed from live values on each update and on resume — the status strip restores the last valid persisted fill rather than re-rendering an init snapshot.
  • Writes are contracts. Every user-mutable preference — model per tab, RTL direction, panel state, observed token/cost usage — has a guaranteed serialization step into globalState. If you can change it, it persists, full stop.
  • Persistence is hardened. State is serialized through a defined shape (a state contract) so a window reload reconstructs the exact session picture — quota, cost, context fill, and per-tab model — instead of resetting to defaults.
stateManager.ts (1 file changed)

  export function restoreTabs(ctx: ExtensionContext): Tab[] {
-   const saved = cachedTabs ?? [];
+   const saved = ctx.globalState.get<Tab[]>(STATE_KEY);
    // Storage is the single source of truth.
-   return saved;
+   return (saved ?? []).map(deriveTabState);
  }
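
The restore step leans on a derive function and a defined state shape. What those look like in the repo is not shown here, so the following is only a sketch under assumptions: `PersistedTab`, `TabView`, `DEFAULT_MODEL`, `deriveTabView`, and `restoreTabViews` are hypothetical names. It treats whatever comes out of storage as untrusted, rebuilds each tab through one function, and computes display values such as context fill instead of storing them.

```typescript
// Hypothetical persisted shape (the "state contract").
interface PersistedTab {
  id: string;
  modelId: string;
  usedTokens: number;
  limitTokens: number;
  costUsd: number;
}

// Derived view: contextPercent is computed, never persisted.
interface TabView extends PersistedTab {
  contextPercent: number;
}

const DEFAULT_MODEL = "default-model"; // placeholder, not a real model id

function finiteOr(value: unknown, fallback: number): number {
  return typeof value === "number" && Number.isFinite(value) ? value : fallback;
}

// Rebuilds one tab from an untrusted stored value.
// Entries without a usable id are dropped instead of guessed at.
function deriveTabView(raw: unknown): TabView | undefined {
  if (typeof raw !== "object" || raw === null) return undefined;
  const fields = raw as Record<string, unknown>;

  const id = fields.id;
  if (typeof id !== "string") return undefined;

  const modelId = typeof fields.modelId === "string" ? fields.modelId : DEFAULT_MODEL;
  const usedTokens = finiteOr(fields.usedTokens, 0);
  const limitTokens = finiteOr(fields.limitTokens, 0);
  const costUsd = finiteOr(fields.costUsd, 0);

  const percent = limitTokens > 0 ? Math.round((usedTokens / limitTokens) * 100) : 0;
  const contextPercent = Math.min(100, Math.max(0, percent));

  return { id, modelId, usedTokens, limitTokens, costUsd, contextPercent };
}

// Rebuilds the full tab list; a missing or malformed store yields [].
function restoreTabViews(saved: unknown): TabView[] {
  if (!Array.isArray(saved)) return [];
  const views: TabView[] = [];
  for (const entry of saved) {
    const view = deriveTabView(entry);
    if (view !== undefined) views.push(view);
  }
  return views;
}
```

Keeping the derived percentage out of the persisted shape is the design choice that matters: there is no stored copy of it to drift.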

[Figure: inside the editor, the OpenCode panel running a coding session next to the StreamCoordinator source that renders each chunk from live stream state. The agentic loop is on the left, the code that streams it on the right. Each tool call, edit, and result is a projection of live state, not a one-time cached render.]


5. Preventing Future Regressions

Killing the bug once is not the job. The job is making it impossible to silently reintroduce. Silent Staleness slips past humans because it never throws — so the test wall had to catch exactly the things humans miss.

Contract testing of package.json

Every command, keybinding, and setting an extension exposes is declared in package.json. A surface declared but not wired (or wired but not declared) is a dead wire by definition. Contract tests assert the manifest and the implementation agree across all 109 declared surfaces:

  • 46 commands — each declared command resolves to a real handler.
  • 20 keybindings — each binding points to a registered command.
  • 43 settings — each setting is read by the code that claims to honor it.
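
The shape of such a check can be sketched in a few lines. This is an illustration under assumptions, not the project's test suite: `ManifestLike` mirrors only the `contributes.commands` and `contributes.keybindings` parts of a package.json, and `checkManifestContract` plus the idea of passing in a set of registered handler ids are hypothetical. The settings check is left out because it depends on how the code reads its configuration.

```typescript
// The subset of package.json this sketch inspects.
interface ManifestLike {
  contributes: {
    commands: { command: string; title: string }[];
    keybindings: { command: string; key: string }[];
  };
}

interface ContractReport {
  declaredButUnwired: string[];  // in the manifest, no handler registered
  wiredButUndeclared: string[];  // handler registered, missing from the manifest
  danglingKeybindings: string[]; // keys bound to a command with no handler
}

// Compares what the manifest declares with what the code registered.
// A real suite would likely also allow keybindings that target
// built-in editor commands; this sketch treats them as dangling.
function checkManifestContract(
  manifest: ManifestLike,
  registeredHandlers: ReadonlySet<string>,
): ContractReport {
  const declared = new Set(manifest.contributes.commands.map((c) => c.command));
  return {
    declaredButUnwired: Array.from(declared).filter((id) => !registeredHandlers.has(id)),
    wiredButUndeclared: Array.from(registeredHandlers).filter((id) => !declared.has(id)),
    danglingKeybindings: manifest.contributes.keybindings
      .filter((binding) => !registeredHandlers.has(binding.command))
      .map((binding) => binding.key),
  };
}
```

A test then only has to assert that all three lists are empty; any non-empty list names the exact dead wire.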

Structural validation loops & message-contract tests

The IPC bridge is a serialization seam — the exact place stale or malformed state hides. A dedicated message-contract suite (message-contract.test.ts) and a round-trip test assert that every message shape that crosses the host↔webview wire serializes and deserializes losslessly, so the webview can never silently drop a field the host sent.
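
A round-trip check of this kind can be sketched as follows, assuming the bridge carries JSON-serializable payloads. `deepEqualPlain` and `survivesRoundTrip` are hypothetical helpers, not names from the repo. The check fails for anything JSON would silently alter: `undefined` fields, `NaN`, `Date`, `Map`, and similar.

```typescript
// Structural equality for plain JSON-like data. Values with different
// prototypes (e.g. a Map versus the {} it serializes to) are unequal.
function deepEqualPlain(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) return false;
  if (Object.getPrototypeOf(a) !== Object.getPrototypeOf(b)) return false;

  const left = a as Record<string, unknown>;
  const right = b as Record<string, unknown>;
  const leftKeys = Object.keys(left);
  if (leftKeys.length !== Object.keys(right).length) return false;

  return leftKeys.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(right, key) && deepEqualPlain(left[key], right[key]),
  );
}

// True only if serializing and parsing gives back an identical structure.
function survivesRoundTrip(message: unknown): boolean {
  const wire = JSON.stringify(message);
  if (wire === undefined) return false; // e.g. a bare undefined or a function
  return deepEqualPlain(message, JSON.parse(wire));
}
```

Running every message fixture through `survivesRoundTrip` turns a silently dropped field into an explicit failure at the serialization seam.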

UI state checkpoints (visual regression)

Eighteen Playwright snapshot specs photograph the webview in known states and fail the build on unexpected pixel drift — directly targeting the features Silent Staleness attacked:

  • chat-context-usage.spec.ts — the usage bar reflects live fill, not an init snapshot.
  • diff-wrapping.spec.ts — long tokens wrap; no container deformation under scaling.
  • revert.spec.ts, messages.spec.ts, subagent-panel.spec.ts, webview-contract.spec.ts, and more — each pins a UI state that must re-derive correctly.

[Figure: a real tool-execution session in the OpenCode panel beside the renderer that re-derives each tool card from its live call record. This is exactly the kind of UI state the visual snapshots pin: tool cards, statuses, and outputs that must re-render correctly from live state every time.]

The test wall

What turns a silent dead wire into a red build.

  • 300+ automated tests
  • Contract: the package.json manifest and the implementation agree on every surface.
  • 109 contract-verified surfaces: 46 commands, 20 keybindings, 43 settings

Every declared surface must resolve to a real handler; declared-but-unwired fails the build.

The outcome: Silent Staleness went from an invisible, untyped, untestable production risk to a loud, repeatable, automated failure. A dead wire now turns the build red.


Engineering Outcomes

Dimension | Result | Detail
State architecture | Active Re-Derivation | UI re-derives from live state, serializes to globalState; storage is the single source of truth
Layout robustness | Zero deformation under scaling | Tokenized CSS + overflow-wrap: anywhere + logical layout, guarded by visual tests
Test coverage | 300+ automated tests | Unit, integration, contract, round-trip, and 18 visual regression gates
Contract surface | 109 verified surfaces | 46 commands · 20 keybindings · 43 settings asserted against the manifest
Concurrency | 5 concurrent sessions | Independent tabs, each with its own model, mode, and persisted history
Reach | 75+ models | Per-tab selection across Claude, GPT, Gemini, and dozens more
Accessibility | WCAG-aligned | 24×24px touch targets (WCAG 2.2 SC 2.5.8), AAA high-contrast preset, full keyboard nav, focus traps, prefers-reduced-motion

Key Learnings

  1. A passing data layer is not a passing feature. The most dangerous bugs are the ones that never throw. Silent Staleness lived entirely at the UI/persistence seam, where data-layer tests are blind.
  2. Re-derive, don't cache. Caching a value across an async process boundary is a stale value waiting to happen. Treating durable storage as the single source of truth — and the DOM as a disposable projection — made the interface deterministic.
  3. If you can change it, it must persist. Every user-mutable affordance needs a guaranteed write path. A toggle that doesn't survive reload isn't a feature; it's a dead wire wearing a feature's clothes.
  4. Make the invisible loud. The real win wasn't the fix — it was the contract and visual test wall that converts this entire class of defect into a red build before it can ever ship.

The takeaway: If there's one thing I carried out of this, it's that building software people can actually trust is less about piling on features and more about refusing to let the interface lie — even quietly, even once. Swapping fragile cached locals for an architecture that re-derives from live state turned a tool that looked wired into one I can prove actually is.


Explore the project: the full source is on GitHub (MIT licensed), and the extension is live on the VS Code Marketplace.
