Designing the way humans and AI agents talk to each other

An end-to-end interaction pattern for an AI agent interface, covering the chat environment, in-chat elements, artifacts, and the agent's working layer. Built so people can see what the agent is doing, stay in control, and trust the output.

Role: Interaction Designer
Team: 2-person design team
Timeline: 2-week engagement
Tools: Figma
Agentic Experience agent chat interface on laptop mockup

AI agents can now take real actions and return real work, not just chat replies.

But that power creates a new design problem. When an agent interprets a request, makes decisions, and produces files on its own, the person on the other side is left out of the loop.

What did the user ask? What is the Agent thinking? What did the Agent do? What is the Agent referencing?

The Brief

I was brought in for a focused two-week engagement to design the agent chat experience for Impetus, an enterprise operations platform. The brief wasn't "make a chatbot." It was to define a coherent interaction pattern for working with an agent: requesting work, watching it reason, reviewing outputs, and steering it when it's wrong.

This case study maps that pattern across four layers, each one making the agent more legible, more steerable, and more trustworthy.

Human and robot fist bump — human-AI collaboration

An agent that works invisibly asks you to trust blindly.

Four layers of the pattern

Chat Environment

The structure you work inside

Chat Elements

How each message communicates state and logic

Artifacts

The work the agent produces, reviewable and extractable

Virtual Machine

The agent's working layer, made visible

What's out there?

Before designing anything, I audited how seven existing products handle the same problems: Shopify, Anthropic, Hatchcanvas, Manus, ChatGPT, Slack, and Bard. The question wasn't which UI looked best; it was which patterns made the agent's work legible and the output trustworthy. Six dimensions, seven products.

Competitive reference screenshots, Shopify, Hatchcanvas, Anthropic, ChatGPT, Manus, Slack, Bard

Shopify · Hatchcanvas · Anthropic · ChatGPT · Manus · Slack · Bard, reference captures across six interaction dimensions

Competitive UI analysis table, Chat Listing, Artifacts UI, View Artifacts, Repository, Agentic Mode, Input File UI across all products

The table surfaced the gap: most products handle chat listing and basic artifacts, but none showed the agent's live execution in a way a non-technical user could follow. The Agentic Mode dimension, the one that mattered most, was mostly blank. Closing that gap became the brief for the Virtual Machine layer.

The foundation is the workspace itself. A predictable structure is the first form of trust: people relax when the space behaves consistently.

Where conversation lives
How the Agent's actions surface
How every interactive element behaves across states
Chat environment default state

Default structure

The default state sets the contract: a focused composer, a clear model selector, and an honest disclaimer that the agent can make mistakes. The empty state invites a request without overwhelming; everything else stays out of the way until needed.

Expanded side panel

Expand actions & chat scroll

The side panel collapses to an icon strip by default, keeping the chat canvas uncluttered. Expanding it reveals three quick actions at the top — New Chat, Search Chat, Saved Memories — followed by a scrollable chat history grouped into Starred and Chats. The panel slides open inline without a layout jump, so the workspace adjusts without losing context.

Side panel element states matrix

Side panel element states

Trust at the system level is built from rigor at the component level. I documented every interactive element across Normal, Hover, Pressed, and Selected so behavior is consistent and legible everywhere. The states matrix isn't decoration; it's the spec that keeps the agent's workspace feeling reliable.
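One way to hand a states matrix like this to engineering is as a typed lookup, so that a missing cell fails at compile time rather than in review. The following is a minimal sketch, not the shipped spec: the element names, the style tokens, and the precedence rule (Selected over Pressed over Hover) are all assumptions made for illustration.

```typescript
// The four interaction states documented in the matrix.
type InteractionState = "normal" | "hover" | "pressed" | "selected";

// Hypothetical element names, for illustration only.
type PanelElement = "newChat" | "searchChat" | "savedMemories" | "chatRow";

// Hypothetical style tokens; the real values would come from the design file.
interface StateStyle {
  background: string;
  textWeight: number;
}

// Record<...> forces every element to define every state, so a gap in the
// matrix becomes a type error.
type StateMatrix = Record<PanelElement, Record<InteractionState, StateStyle>>;

const baseStates: Record<InteractionState, StateStyle> = {
  normal: { background: "transparent", textWeight: 400 },
  hover: { background: "surface-hover", textWeight: 400 },
  pressed: { background: "surface-pressed", textWeight: 400 },
  selected: { background: "surface-selected", textWeight: 600 },
};

const sidePanelMatrix: StateMatrix = {
  newChat: baseStates,
  searchChat: baseStates,
  savedMemories: baseStates,
  chatRow: baseStates,
};

interface InteractionFlags {
  hovered: boolean;
  pressed: boolean;
  selected: boolean;
}

// Assumed precedence: selected > pressed > hover > normal.
function resolveState(flags: InteractionFlags): InteractionState {
  if (flags.selected) return "selected";
  if (flags.pressed) return "pressed";
  if (flags.hovered) return "hover";
  return "normal";
}

// Look up the style for an element given its current interaction flags.
function styleFor(element: PanelElement, flags: InteractionFlags): StateStyle {
  return sidePanelMatrix[element][resolveState(flags)];
}
```

Keeping the precedence in one function means every element resolves conflicting flags, such as hovering over an already selected row, the same way.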

Inside the conversation, each message has to do more than display text — it has to communicate state. I built a small system of chat elements so every message carries that information legibly.

What the user asked
What the Agent is thinking
What the Agent did
What the Agent is referencing
Simplified chat showing three voice types
Chat with real content
Text styles annotation table

Text styles (in context + spec)

I defined distinct text treatments for the three voices in any agent conversation: the user's prompt, the agent's thinking, and the agent's response. Separating these visually is what lets a person scan a long exchange and instantly know who is "speaking." The "Thought for Xs" treatment makes the agent's reasoning a first-class, collapsible part of the message, visible when you want to verify, out of the way when you don't.

This is the decision-visibility layer. The thinking block reveals, in plain language, what the agent understood, the steps it planned, and the assumptions it made, so that a wrong interpretation gets caught before the user trusts the output.

Tags and VM action elements spec

Tags & agent action elements

Agents don't just talk; they cite sources, reference files, and run actions. I designed a set of in-chat tags for this: action badges (what the agent is doing), source/link tags (clickable provenance), and artifact/file tags (the files under review or in use). Inside the thinking block, these combine into a readable trace: action → details → sources → output → status (Done / Executing). Provenance you can click is provenance you can trust.
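The trace order above (action → details → sources → output → status) maps naturally onto a small data shape. The sketch below is one hypothetical way to model a single trace entry and render it as plain text; the field names, the `SourceTag` shape, and the sample values are assumptions for illustration, not the product's actual schema.

```typescript
// Status values named in the spec: an action is either running or finished.
type ActionStatus = "Executing" | "Done";

// Hypothetical shape for a clickable provenance tag.
interface SourceTag {
  label: string;
  href: string;
}

// One entry in the thinking block, in the documented order:
// action, details, sources, output, status.
interface TraceEntry {
  action: string;
  details: string[];
  sources: SourceTag[];
  output: string | null; // null until the action has produced something
  status: ActionStatus;
}

// Render an entry as indented plain text, one line per element.
function formatTraceEntry(entry: TraceEntry): string {
  const lines: string[] = [`[${entry.status}] ${entry.action}`];
  for (const detail of entry.details) lines.push(`  - ${detail}`);
  for (const source of entry.sources) {
    lines.push(`  source: ${source.label} <${source.href}>`);
  }
  if (entry.output !== null) lines.push(`  output: ${entry.output}`);
  return lines.join("\n");
}

// Illustrative sample data only.
const sampleEntry: TraceEntry = {
  action: "Search documents",
  details: ["Read the quarterly report"],
  sources: [{ label: "report.pdf", href: "https://example.com/report.pdf" }],
  output: "summary.md",
  status: "Done",
};

console.log(formatTraceEntry(sampleEntry));
```

Modeling `output` as nullable keeps an entry that is still executing honest: it can show its steps and sources without implying a result exists yet.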

Annotated chat showing user prompt, agent thinking, action name, links/files, response headings and body text

The agent's real value is the work it produces. Artifacts are how that work becomes something a person can review, verify, edit, extract, and hand off: the human-in-the-loop core of the system. An answer locked inside a chat bubble can't be trusted or used; an artifact can.

Artifacts panel open beside chat

Artifacts listing (opened)

Every file the agent generates collects in a dedicated Artifacts panel: typed, named, and individually downloadable, with a "Download All" action for the whole set. The user can collapse it to keep focus, or open it to see everything the agent has produced in one place. Output becomes inventory, not a scroll-back hunt.

User sending files into the conversation

Artifacts & folders sent by the user

The loop runs both ways. Users send files and folders into the conversation as inputs, with the same tag language used for the agent's outputs. There is one consistent file element, whether it's something you gave the agent or something it made, so the exchange reads as a single shared workspace.

Opened artifact document view
Opened artifact code view

Opened artifact: document view

Opening an artifact reveals a full preview beside the conversation: a formatted document the user can read, verify against their request, and publish or download. This is the review checkpoint: the agent's claim and the actual deliverable sit side by side for confirmation.

Opened artifact: code view

For structured output, the same artifact switches to a code/data view that shows the actual query definitions and schema the agent used. Toggling between human-readable preview and raw structure is the strongest trust move in the system: the user can verify exactly what the agent did before relying on it.

Artifact component patterns — listing states, variants, and artifact opened view

Artifact component patterns

Like the rest of the system, artifacts are a documented component set: list-item states (Normal / Hover / Pressed / Selected), artifact-as-response vs. artifact-as-input variants, and the preview-and-breakdown anatomy. This is what makes the pattern reusable beyond this one product.

The deepest layer makes the agent's work itself visible. Most chat interfaces hide execution behind a spinner: the user waits, unsure what's happening or whether to trust the result. I designed a "computer" layer that does the opposite: it shows the agent working, names the file or source it's touching, and lets the user replay the whole run. Visible work is trustable work.

VM working banner collapsed inline in chat

Working banner (collapsed, in chat)

While the agent runs, a compact banner sits inline above the composer ("Fynd is using xyz.gpt file") with a thumbnail peek of what it's operating on. Above it, the live action trace updates in real time: each action names itself, shows its steps, cites its sources, and reports its status (Done or Executing). The user never wonders "is it stuck?" because the work is narrated as it happens, without leaving the conversation.

VM working view opened in split screen

Working view (opened, split screen)

Expanding the banner opens the Fynd Computer: a dedicated panel beside the chat that shows the agent's environment, the action it's running, the source or file it's acting on, and a playback bar to step through the run. Pairing the conversation with a live view of the machine turns an opaque process into something a person can watch, pause, and verify. This is the human-in-the-loop principle pushed all the way down to execution.

The pattern: collapsed ↔ opened

Like every other layer, the VM is a documented, reusable pattern with two states: collapsed (the in-chat banner, for staying in flow) and opened (the full computer view, for close inspection). This is progressive disclosure again: transparency on demand, never forced. The same playback controls in both states let the user replay what the agent did, making the agent's work an auditable record rather than a one-time event.

This is the trust-through-transparency centerpiece of the system. An agent you can watch and replay is an agent you can trust with real work.
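The collapsed and opened states described above share one set of playback controls, which suggests keeping the playback position in state that both views read from. The following is a minimal sketch under that assumption; the type and function names are hypothetical and not taken from the product.

```typescript
// The two documented states of the VM pattern.
type VmView = "collapsed" | "opened";

// Hypothetical state shape: the view and the playback position live together,
// so switching views never loses the user's place in the run.
interface VmPanelState {
  view: VmView;
  stepIndex: number; // current position of the playback bar (0-based)
  totalSteps: number; // number of recorded steps in the run
}

// Switch between the in-chat banner and the full computer view.
function toggleView(state: VmPanelState): VmPanelState {
  const view: VmView = state.view === "collapsed" ? "opened" : "collapsed";
  return { ...state, view };
}

// Move the playback bar, clamped to the recorded range of steps.
function seek(state: VmPanelState, stepIndex: number): VmPanelState {
  const lastStep = Math.max(0, state.totalSteps - 1);
  const clamped = Math.min(Math.max(0, stepIndex), lastStep);
  return { ...state, stepIndex: clamped };
}
```

Because both functions return a new state object rather than mutating the old one, each step of a replay can be kept as a record, which fits the goal of an auditable run.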

End to end

Put together, the four layers form one loop: the user makes a request, sees the agent's interpretation and reasoning, watches the work run, reviews the artifact it produces, and either extracts it or steers the agent to try again. Visibility and control at every step: that's the pattern.

Agent loop diagram: request → interpretation → human checkpoint → execution → artifact output → extract or steer
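The loop in the diagram can be read as a small state machine. The sketch below is one hypothetical encoding of it; the stage and event names are illustrative, and two choices are assumptions rather than documented behavior: steering is allowed at both the human checkpoint and the artifact review, and steering returns the loop to the request stage.

```typescript
// Stages from the diagram, plus a terminal "extracted" stage.
type LoopStage =
  | "request"
  | "interpretation"
  | "checkpoint"
  | "execution"
  | "artifact"
  | "extracted";

// "advance" moves forward; "steer" and "extract" are the human's choices.
type LoopEvent = "advance" | "steer" | "extract";

// Return the next stage; events that do not apply leave the stage unchanged.
function nextStage(stage: LoopStage, event: LoopEvent): LoopStage {
  switch (stage) {
    case "request":
      return event === "advance" ? "interpretation" : stage;
    case "interpretation":
      return event === "advance" ? "checkpoint" : stage;
    case "checkpoint":
      // Assumption: the human can approve (advance) or redirect (steer) here.
      if (event === "advance") return "execution";
      if (event === "steer") return "request";
      return stage;
    case "execution":
      return event === "advance" ? "artifact" : stage;
    case "artifact":
      if (event === "extract") return "extracted";
      if (event === "steer") return "request";
      return stage;
    case "extracted":
      return stage;
  }
}
```

Ignoring events that do not apply to the current stage, instead of throwing, keeps the sketch short; a production version might prefer to surface them as errors.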

The hardest decision wasn't visual; it was how much of the agent's mind to show. Too little and the user can't trust it; too much and the interface drowns. The answer was progressive disclosure: collapsed by default, expandable on demand, with provenance always one click away.

Designing for a non-deterministic system also reframed what "done" means. A traditional UI is correct or broken; an agent interface has to be honest about uncertainty and always leave the human a way to intervene. That principle of being legible, steerable, and honest is what I'd carry into any agent product.