Observability in agentic UX

The first time I used Claude Code, I gave it a non-trivial task: tidy up my Downloads folder. I expected to see a spinner for 30 seconds and a “done” at the end. What I got was different.

Before touching anything, it showed me a plan: “I’ll group by file type, create folders, move, then confirm with you”. I confirmed. As it went, it kept narrating: “now reading the folder”, “now creating folders”, “now moving x to y”. At the end, a complete log of everything done. I could pause mid-flight. I could cancel. I could ask why a decision was made.

This is observability. And, in my experience, it’s the most underrated UX competence in agentic products. Without it, trust dies on the first non-trivial task.

The problem: the agent is a black box

In a traditional product, the user sees everything. Press a button, see the result. Feedback is immediate and visible.

With an agent, there are delays. Background work. Decisions the system makes without asking. If none of this is visible, three things happen, in order:

The user mistrusts.
The user starts asking for confirmation on every step, losing the value of automation.
The user gives up.

Nielsen’s classic heuristic, visibility of system status, isn’t negotiable. In agentic products, it’s more critical than ever, and harder to apply.

The five pillars of observability

1. Plan before execution. The agent exposes the plan and asks for confirmation. It doesn’t have to be long: “I’ll do A, B, C. Continue?” This gives the user a chance to override before the AI burns tokens (or changes files).

2. Real-time state. As it runs, the agent says what it’s doing. Not at the end. During. Lines like “reading file x” or “querying the database” are enough. They don’t need elegant design; they just need to exist.

3. Attribution in multi-agent. When there are multiple agents, say which one is talking or acting. It can be light (no need to swap avatars every turn), but some way of saying “this came from the billing agent” helps a lot when something breaks.

4. Pause and cancel. The user can stop mid-flight. Without losing context. Without restarting the conversation. Claude Code does this well: you can interrupt mid-command, say “wait, actually change this”, and the agent adjusts without losing prior reasoning.

5. Auditable history. Everything the agent did is logged. The user can scroll back and find the step where something went wrong. In products where actions have a cost (tokens, for instance), the history also shows the cost of each action.
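For readers who build these products, the five pillars can be sketched as a single run loop. This is a minimal Python sketch under loud assumptions: `Step`, `AuditEntry`, `AgentRun` and the `notify` and `confirm` callbacks are hypothetical names, each step’s real work is stubbed as a function that returns the tokens it spent, and cancellation is only checked between steps (interrupting mid-step, or pausing and resuming, would need more machinery). It is not how Claude Code or any particular product is built.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    description: str            # shown to the user, e.g. "create folders"
    agent: str                  # which agent acts (pillar 3: attribution)
    action: Callable[[], int]   # hypothetical stub: does the work, returns tokens spent


@dataclass
class AuditEntry:
    agent: str
    description: str
    tokens: int
    status: str                 # "done" or "skipped"


@dataclass
class AgentRun:
    steps: list[Step]
    notify: Callable[[str], None] = print     # pillar 2: surface state in the UI
    cancelled: bool = False                   # pillar 4: the UI may set this at any time
    history: list[AuditEntry] = field(default_factory=list)  # pillar 5: audit log

    def cancel(self) -> None:
        self.cancelled = True

    def run(self, confirm: Callable[[list[str]], bool]) -> list[AuditEntry]:
        # Pillar 1: show the numbered plan and wait for approval before any side effect.
        plan = [f"{i}. [{s.agent}] {s.description}"
                for i, s in enumerate(self.steps, 1)]
        if not confirm(plan):
            self.notify("Plan rejected; nothing was changed.")
            return self.history
        total = len(self.steps)
        for i, step in enumerate(self.steps, 1):
            if self.cancelled:
                # Checked between steps only; unrun steps stay visible in the log.
                self.history.append(
                    AuditEntry(step.agent, step.description, 0, "skipped"))
                continue
            self.notify(f"[{step.agent}] step {i}/{total}: {step.description}...")
            tokens = step.action()
            # Pillar 5: record what ran, who ran it and what it cost.
            self.history.append(
                AuditEntry(step.agent, step.description, tokens, "done"))
        return self.history


if __name__ == "__main__":
    run = AgentRun(steps=[
        Step("read the Downloads folder", "organiser", lambda: 120),
        Step("create folders by file type", "organiser", lambda: 40),
        Step("move files", "organiser", lambda: 300),
    ])
    # Print the plan, then approve it (print returns None, so `or True` approves).
    history = run.run(confirm=lambda plan: print("\n".join(plan)) or True)
    for entry in history:
        print(entry.status, entry.agent, entry.description, entry.tokens)
    print("total tokens:", sum(e.tokens for e in history))
```

The design choice worth noting: with `notify` wired to the interface rather than to a backend log, the same stream of events serves real-time state, and the `history` list doubles as the auditable record. Steps that never ran stay in the log as `skipped`, so a cancelled run still shows exactly where it stopped.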

How Claude Code applies the five

Worth studying as a piece of design.

Plan: on a non-trivial task, shows a numbered plan. Asks for confirmation or adjustments.
Real-time state: at every step, messages like “Reading file…”, “Editing line 42…”, “Running tests…”. Text, not animation.
Attribution: when using sub-agents, names them. “Spawning research agent…”
Pause: the input bar is never blocked. You can type mid-execution, and the agent adjusts. A detail that changes the feel completely.
History: every interaction stays visible, every executed command stays logged. Text, scrollable.

None of this is magic. It’s a deliberate decision to surface what’s normally hidden.

Where observability fails

Three anti-patterns I see regularly:

Spinner without text. A spinning circle for 40 seconds isn’t observability. It’s animated opacity. The user has no idea if they’re close, far, or stuck.

“Thinking” animations without content. The agent shows three oscillating dots. It doesn’t say what it’s thinking about. Fine for short tasks. On long ones, it starts to feel patronising.

Logs for developers only. The team has detailed logs in an internal terminal; the user gets a generic line. Observability has to reach the interface, not stay in backend logs.
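The difference between a spinner and an observable status can be small in code. A minimal sketch, assuming the agent tracks its current step and the time since its last progress update; `status_line` and the 30-second `stall_after` threshold are hypothetical choices, not a standard. The step count answers “close or far”, and the stall notice answers “stuck”.

```python
def status_line(description: str, index: int, total: int,
                seconds_since_update: float, stall_after: float = 30.0) -> str:
    """Render a text status instead of a bare spinner."""
    # Position in the plan tells the user whether they are close or far.
    line = f"Step {index} of {total}: {description}"
    # If nothing has changed for a while, say so instead of spinning silently.
    if seconds_since_update >= stall_after:
        line += f" (no progress for {int(seconds_since_update)}s)"
    return line


print(status_line("moving files", 3, 4, seconds_since_update=5))
# Step 3 of 4: moving files
print(status_line("moving files", 3, 4, seconds_since_update=42))
# Step 3 of 4: moving files (no progress for 42s)
```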

The psychological cost of poor observability

There’s a silent calculation the user runs every time they delegate to an AI: “if this goes wrong, will I know in time to stop it?” Without observability, the answer is “no”. Result: the user delegates less, asks for smaller tasks, loses the value of automation.

When observability is good, the user delegates more. They trust they can see what’s happening. They trust they can stop it. Whether the agent acts as a co-pilot or works autonomously, real productivity depends more on observability than on the model’s raw capability.

Combining with the other themes

Observability doesn’t stand alone. It pairs with capability discovery: one helps the user discover what’s possible, the other helps them trust the agent while it works. And in multi-agent systems, observability becomes even more critical because there are more moving parts. That’s covered in Multi-agent orchestration.

More background in the Design for AI guide. On the ethical principles that reinforce why transparency matters, see Ethical principles in AI design.