AffZero Docs

Chatbot UX Best Practices

How AffZero's AI assistant implements the same interaction patterns used by ChatGPT, Claude, and Gemini for reliable, safe, and pleasant chat experiences.

This document captures the principles behind AffZero's AI assistant design and explains how each one was implemented. It is intended for developers extending or maintaining the chat feature.


1. Never ask the model to author UI

The problem. If you instruct the LLM to print a JSON object that the client then parses, the model is acting as a template engine. It may paraphrase, omit fields, add extra text, or wrap the object in markdown code fences. Any of those deviations silently breaks the UI.

The pattern used by ChatGPT / Claude / Gemini. Tool calls are structured events in the streaming protocol — not text. The client reacts to the data, not to anything the model writes.

How AffZero implements it. The server returns result.toUIMessageStreamResponse() (AI SDK v6), which streams UIMessageChunk SSE events. When a write tool runs, the SDK emits a tool-output-available chunk containing the tool result as typed data. The client parses only text-delta chunks for the visible bubble and tool-output-available chunks (where output.__confirmation_required === true) for confirmation cards. The model's text never influences whether a card appears.
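The client-side split described above can be sketched as a pure reducer over already-decoded chunks. This is a minimal illustration, not the actual code in hooks/use-chat-engine.ts: the chunk types are simplified local stand-ins for the SDK's UIMessageChunk union, and reduceChunks and isConfirmation are hypothetical names.

```typescript
// Simplified stand-ins for the chunk shapes the client cares about.
// Assumption: the real UIMessageChunk union has more variants and fields.
type Chunk =
  | { type: "start" }
  | { type: "finish" }
  | { type: "text-delta"; id: string; delta: string }
  | { type: "tool-output-available"; toolCallId: string; output: unknown };

interface Confirmation {
  toolCallId: string;
  output: Record<string, unknown>;
}

// A tool output counts as a confirmation only when the typed flag is exactly true.
function isConfirmation(output: unknown): output is Record<string, unknown> {
  return (
    typeof output === "object" &&
    output !== null &&
    (output as Record<string, unknown>).__confirmation_required === true
  );
}

// Hypothetical reducer: text comes only from text-delta chunks, cards come
// only from tool-output-available chunks. Model text can never create a card.
function reduceChunks(chunks: Chunk[]): { text: string; confirmations: Confirmation[] } {
  let text = "";
  const confirmations: Confirmation[] = [];
  for (const chunk of chunks) {
    switch (chunk.type) {
      case "text-delta":
        text += chunk.delta;
        break;
      case "tool-output-available":
        if (isConfirmation(chunk.output)) {
          confirmations.push({ toolCallId: chunk.toolCallId, output: chunk.output });
        }
        break;
      default:
        break; // every other chunk type is ignored in this sketch
    }
  }
  return { text, confirmations };
}
```

Because the two channels never mix, a model that happens to write JSON-looking text into its reply still cannot cause a card to render.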


2. Halt generation after a pending action

The problem. When a model continues generating text after calling write tools, it tends to hallucinate the outcome ("Done!") before the user has even approved, or to invent UI limitations ("I can only show one card at a time").

The pattern. Stop the agentic loop the moment a confirmation is pending; resume after the user responds.

How AffZero implements it. A custom stopWhen predicate is passed alongside stepCountIs(8):

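import type { StopCondition } from "ai";

// Stop the loop as soon as the most recent step produced a tool result that
// is waiting for user confirmation.
const stopOnConfirmation: StopCondition<any> = ({ steps }) => {
  const last = steps[steps.length - 1];
  return !!last?.toolResults.some((r: any) => r?.output?.__confirmation_required);
};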

The loop halts immediately; the user sees the confirmation card(s) and nothing else until they act.
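To make the combined behavior concrete, here is a self-contained sketch of how the two stop conditions interact, assuming the loop stops as soon as any condition in the list holds. The Step and StopCondition types, the stepCountIs stand-in, and the shouldStop helper are local simplifications written for this example; they are not the SDK's own definitions.

```typescript
// Local stand-ins so the sketch runs without the SDK (assumptions, not the
// SDK's real types): a step carries the tool results produced in that step.
type ToolResult = { output?: { __confirmation_required?: boolean } };
type Step = { toolResults: ToolResult[] };
type StopCondition = (state: { steps: Step[] }) => boolean;

// Stand-in for the SDK helper of the same name: a hard cap on step count.
const stepCountIs =
  (max: number): StopCondition =>
  ({ steps }) =>
    steps.length >= max;

// Same idea as the predicate above: only the latest step is inspected.
const stopOnConfirmation: StopCondition = ({ steps }) => {
  const last = steps[steps.length - 1];
  return last?.toolResults.some((r) => r.output?.__confirmation_required === true) ?? false;
};

// Hypothetical helper modelling the loop's check after each step:
// stop when ANY of the supplied conditions is met.
const shouldStop = (conditions: StopCondition[], steps: Step[]): boolean =>
  conditions.some((condition) => condition({ steps }));
```

Checking only the last step matters: once the user has responded and the loop resumes, an older pending result in the history should not halt generation again.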


3. Parallel multi-action confirmations

The problem. Forcing a single confirmation per turn makes editing 10 conversions take 10 round-trips. Some earlier implementations also told the model that "only one card can be shown at a time", which was both wrong and user-hostile.

The pattern. Let the model call write tools multiple times in one turn (parallel tool calls). Each call produces one card. All cards render simultaneously.

How AffZero implements it. The stream parsing loop collects every tool-output-available confirmation chunk into a newConfirmations array, deduplicated by toolCallId (earlier versions used actionKey; see section 4). Once the stream ends, setPendingConfirmations(newConfirmations) fires a single time with the full set. The system prompt now explicitly says:

"To act on multiple records, call the write tool once per record in the SAME turn — each renders its own card. Never claim only one can be shown at a time."


4. Deferred direct execution — no model round-trip on Confirm

The old problem. Previously, "confirming" meant re-sending the conversation to the model and hoping it re-called the write tool with the exact same parameters. This had two failure modes:

  1. Infinite confirmation loop — if the model re-derived parameters slightly differently, a fresh confirmation card appeared instead of executing.
  2. Re-fetch slowness — the model re-called read tools on every turn because tool results were discarded between turns.

The pattern. Capture the full {action, params} payload from the pending confirmation card on the client. On Confirm, POST those payloads directly to a server endpoint that runs getAction(action).run(ctx, params) without invoking the LLM at all. Return confirmedResults as plain JSON. The client renders a deterministic summary.

How AffZero implements it.

  • hooks/use-chat-engine.ts: confirmActionsImpl POSTs { confirmedActions: [{ action, params }, ...] }.
  • app/api/ai/chat/route.ts: when confirmedActions is present and non-empty, the route short-circuits before streamText, executes each action through the registry, logs to chat_agent_runs, and returns { confirmedResults }. incrementAiCalls is NOT called — no LLM token was used.
  • buildChatTools no longer has a confirmedSet check — write tools ALWAYS return __confirmation_required. Execution only happens through the direct branch.
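The direct-execution branch can be sketched as follows. This is an illustration under stated assumptions rather than the route's actual code: the registry contents, the demo_edit action, the ActionResult shape, and the runConfirmedActions name are all hypothetical, and logging to chat_agent_runs is omitted. It also assumes actions run sequentially, which the source does not specify.

```typescript
type ActionContext = { orgId: string };
type ActionResult =
  | { success: true; data?: unknown }
  | { success: false; error: string };
type ConfirmedAction = { action: string; params: unknown };
type ConfirmedResult = { action: string } & ActionResult;

interface Action {
  run(ctx: ActionContext, params: unknown): Promise<ActionResult>;
}

// Hypothetical in-memory registry with a single demo action.
const registry = new Map<string, Action>([
  [
    "demo_edit",
    { run: async (_ctx, params): Promise<ActionResult> => ({ success: true, data: params }) },
  ],
]);

const getAction = (name: string): Action | undefined => registry.get(name);

// Runs each confirmed action with the exact params captured on the card.
// No model call happens here, so parameters cannot drift between turns.
async function runConfirmedActions(
  ctx: ActionContext,
  confirmedActions: ConfirmedAction[],
): Promise<{ confirmedResults: ConfirmedResult[] }> {
  const confirmedResults: ConfirmedResult[] = [];
  for (const { action, params } of confirmedActions) {
    const handler = getAction(action);
    if (!handler) {
      confirmedResults.push({ action, success: false, error: `Unknown action: ${action}` });
      continue;
    }
    try {
      confirmedResults.push({ action, ...(await handler.run(ctx, params)) });
    } catch (err) {
      // A thrown error becomes a per-action failure instead of failing the batch.
      const error = err instanceof Error ? err.message : String(err);
      confirmedResults.push({ action, success: false, error });
    }
  }
  return { confirmedResults };
}
```

Returning one result per action lets the client render a deterministic summary even when only some of the actions succeed.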

Deduplication. Pending confirmations are now deduplicated by toolCallId (unique per invocation, assigned by the AI SDK) rather than by actionKey. This ensures that 7 offer edits on the same connection produce 7 distinct cards even when the earlier computeActionKey hash was colliding.
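A minimal sketch of deduplication by toolCallId follows. The PendingConfirmation shape and the dedupeByToolCallId name are assumptions made for this example; keeping the first occurrence of each id is also a choice of the sketch, not something the source states.

```typescript
interface PendingConfirmation {
  toolCallId: string;
  action: string;
  params: unknown;
}

// Keeps one card per tool invocation. Two calls with identical action and
// params still yield two cards because their toolCallIds differ.
function dedupeByToolCallId(items: PendingConfirmation[]): PendingConfirmation[] {
  const byId = new Map<string, PendingConfirmation>();
  for (const item of items) {
    if (!byId.has(item.toolCallId)) {
      byId.set(item.toolCallId, item);
    }
  }
  // Map preserves insertion order, so cards keep their stream order.
  return Array.from(byId.values());
}
```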

computeActionKey fix. The function now uses a fully recursive stable JSON serialisation so data: { id: "195" } and data: { id: "196" } produce different keys. The key is used only for logging and display — never for execution matching.
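A recursive stable serialisation can be sketched as below. The stableStringify name and the Json type are hypothetical, and the actual computeActionKey may hash or format its output differently. One well-known way to get the collision described above, offered only as a guess at the earlier bug, is passing a sorted list of top-level keys as the JSON.stringify replacer: an array replacer acts as an allow-list at every nesting level, so nested fields such as id are silently dropped.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialises a JSON value with object keys sorted at every depth, so that
// key order never changes the result but every nested value still does.
function stableStringify(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  const keys = Object.keys(value).sort();
  const fields = keys.map((key) => `${JSON.stringify(key)}:${stableStringify(value[key] as Json)}`);
  return `{${fields.join(",")}}`;
}

// Hypothetical key format for illustration: action name plus serialised params.
function computeActionKey(action: string, params: Json): string {
  return `${action}:${stableStringify(params)}`;
}
```

With this approach, params that differ only in a nested id produce different keys, while the same params written in a different key order produce the same key.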


4a. Short-TTL read cache

The problem. Without caching, re-opening the chatbot or starting a follow-up turn (e.g. "also add aff_sub3 to those offers") forces the model to re-fetch all platform data — adding 2–5 s round-trips per turn.

The pattern. Cache expensive read-tool results in server memory for a short TTL (3 minutes). On the next turn, the same query returns instantly from memory.

How AffZero implements it. lib/ai/tool-cache.ts is a module-level Map<string, {value, expires}>. In buildChatTools, pull_stats and tracker_action_get calls first check getCached with a key of the form orgId::toolId::params. On a miss, the tool executes and setCached stores the result if the call succeeded; error results are never cached. The cache is bounded at 500 entries with LRU-style eviction.
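A bounded TTL cache along these lines might look like the sketch below. It is not the contents of lib/ai/tool-cache.ts: the cacheKey helper, the injectable now parameter (added so expiry can be tested without waiting), and the exact eviction policy are assumptions. The key uses plain JSON.stringify, so params with the same fields in a different order would miss the cache.

```typescript
interface CacheEntry {
  value: unknown;
  expires: number; // epoch milliseconds after which the entry is stale
}

const TTL_MS = 3 * 60 * 1000; // 3 minutes
const MAX_ENTRIES = 500;
const cache = new Map<string, CacheEntry>();

// Hypothetical key builder following the orgId::toolId::params pattern.
function cacheKey(orgId: string, toolId: string, params: unknown): string {
  return `${orgId}::${toolId}::${JSON.stringify(params)}`;
}

function getCached(key: string, now: number = Date.now()): unknown {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (entry.expires <= now) {
    cache.delete(key); // expired: drop it and report a miss
    return undefined;
  }
  // Re-insert so the entry becomes the most recently used one
  // (a Map iterates in insertion order).
  cache.delete(key);
  cache.set(key, entry);
  return entry.value;
}

function setCached(key: string, value: unknown, now: number = Date.now()): void {
  cache.delete(key);
  if (cache.size >= MAX_ENTRIES) {
    // Evict the least recently used entry, which is the first key in the Map.
    const oldest = cache.keys().next().value;
    if (oldest !== undefined) cache.delete(oldest);
  }
  cache.set(key, { value, expires: now + TTL_MS });
}
```

Callers are expected to skip setCached for error results, matching the rule that errors are never cached.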


4b. Compact tool-result memory (client → server)

The problem. Even with a server-side cache, the model still thinks it needs to re-fetch data because it has no memory of what was returned in previous turns (tool results are not included in the re-sent conversation history by default).

The pattern. The client accumulates a compact "recently fetched data" log (up to 5 entries, 800-char preview each) from tool-output-available stream chunks. On the next request, this is sent as toolMemory. The server injects it as a ## Recently fetched data section into the system context block.

How AffZero implements it. hooks/use-chat-engine.ts collects non-error read-tool outputs into toolMemoryRef.current and sends them as toolMemory on every subsequent request. app/api/ai/chat/route.ts appends the memory section to the context block when present. The system prompt instructs the model: "Do NOT call read tools again for data already in the 'Recently fetched data' section."
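The compact memory log can be sketched with two small helpers: one that appends a truncated preview on the client, and one that renders the section on the server. The ToolMemoryEntry shape, the function names, and the bullet format of the rendered section are assumptions for illustration. Filtering out error outputs is left to the caller because the source does not show how errors are represented.

```typescript
interface ToolMemoryEntry {
  tool: string;
  preview: string;
}

const MAX_MEMORY_ENTRIES = 5;
const MAX_PREVIEW_CHARS = 800;

// Client side: append a truncated preview and keep only the newest entries.
function rememberToolOutput(
  memory: ToolMemoryEntry[],
  tool: string,
  output: unknown,
): ToolMemoryEntry[] {
  // JSON.stringify returns undefined for some inputs (e.g. undefined itself).
  const serialised = JSON.stringify(output) ?? "";
  const preview = serialised.slice(0, MAX_PREVIEW_CHARS);
  return [...memory, { tool, preview }].slice(-MAX_MEMORY_ENTRIES);
}

// Server side: render the section appended to the system context block.
// Returns an empty string when there is nothing to inject.
function renderToolMemory(memory: ToolMemoryEntry[]): string {
  if (memory.length === 0) return "";
  const lines = memory.map((entry) => `- ${entry.tool}: ${entry.preview}`);
  return ["## Recently fetched data", ...lines].join("\n");
}
```

The hard caps keep the injected section small, so the memory adds a bounded number of tokens to each request no matter how long the conversation runs.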


5. Optimistic streaming text

The pattern. Show text as it arrives — don't buffer the full response before rendering. Users perceive streamed output as faster even at the same tokens-per-second.

How AffZero implements it. The streaming loop calls setMessages on every text-delta chunk, so the assistant bubble grows incrementally as each delta arrives, just like ChatGPT.


6. Never leak raw data into the chat bubble

The problem. If a JSON blob with an encrypted connection ID or internal key appears verbatim in the chat history, it confuses users and may expose internal IDs.

The patterns.

  • Structured streaming means tool result objects never travel through the text channel at all.
  • A defense-in-depth sanitizer in formatMessage (components/chat/chat-helpers.ts) strips any stray {..."__confirmation_required":true...} blob before rendering.
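One way to implement such a sanitizer is to scan for balanced {...} spans and remove only those that parse as JSON with __confirmation_required set to true. The sketch below borrows the stripConfirmationJson name from the summary table, but its body and the two helper functions are assumptions, not the code in chat-helpers.ts. It removes top-level blobs only and leaves surrounding whitespace untouched.

```typescript
// Returns the index just past the brace that closes the object starting at
// `start`, or -1 if the braces never balance. Braces inside double-quoted
// strings are not counted.
function findObjectEnd(text: string, start: number): number {
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}" && --depth === 0) {
      return i + 1;
    }
  }
  return -1;
}

// True only for valid JSON objects whose own flag is exactly true.
function isConfirmationBlob(candidate: string): boolean {
  if (!candidate.includes("__confirmation_required")) return false;
  try {
    const parsed = JSON.parse(candidate);
    return typeof parsed === "object" && parsed !== null && parsed.__confirmation_required === true;
  } catch {
    return false;
  }
}

// Copies the text through unchanged except for confirmation blobs.
function stripConfirmationJson(text: string): string {
  let out = "";
  let i = 0;
  while (i < text.length) {
    if (text[i] === "{") {
      const end = findObjectEnd(text, i);
      if (end !== -1 && isConfirmationBlob(text.slice(i, end))) {
        i = end; // skip the whole blob
        continue;
      }
    }
    out += text[i];
    i++;
  }
  return out;
}
```

Parsing each candidate rather than matching with a regular expression avoids cutting a blob short when its params contain nested objects.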

7. Human-readable confirmation cards

The problem. Showing Execute "edit_conversion" on S3afxrJwlh74Mw...? tells the user nothing actionable.

How AffZero implements it. Two fixes work together:

  • The confirmMessage for tracker_action_edit now looks up the connection name from ctx.connections instead of displaying the raw ID.
  • ConfirmationPreview in both chatbot-widget.tsx and mobile-chat.tsx renders key fields from params.data: conversion ID, human-readable status label (e.g. "Confirmed (Approved)"), and comment. This applies to tracker_action_edit, clone_offer, and the template/automation creation flows.

8. No double-confirmation

The rule. Never ask "Are you sure?" in text before calling the write tool. The confirmation card is the approval step. Asking twice is frustrating and erodes trust.

How AffZero enforces it. The system prompt explicitly states: "Do NOT ask 'Should I proceed?' or 'Are you sure?' before calling a write tool. The confirmation card IS the approval step."


9. Honest post-action reporting

The rule. After a confirmed tool runs, the model must report what the tool actually returned. Writing "Done!" when the tool returned an error is a lie.

How AffZero enforces it. The system prompt states: "Report what the tool ACTUALLY returned. If success: true → tell the user what was done. If error: ... → report the error honestly. Writing 'Done!' does NOTHING unless the tool returned success: true."


10. Clarify before acting, not after

The rule. If required information is missing, ask for it before calling the write tool. Never call a tool with guessed parameters and then apologise.

How AffZero enforces it. The system prompt provides explicit guidance: missing required fields → ask first, then call. If all information is present → call immediately. This keeps the conversation linear and predictable.


Summary table

Practice | Where implemented
Structured tool-call streaming | app/api/ai/chat/route.ts → toUIMessageStreamResponse()
Halt on pending confirmation | app/api/ai/chat/route.ts → custom stopOnConfirmation StopCondition
Parse confirmations from typed data | hooks/use-chat-engine.ts → tool-output-available SSE chunk loop
Parallel multi-card confirmations | hooks/use-chat-engine.ts + system prompt
Stable idempotent action keys | lib/ai/chat-tools.ts actionKey + confirmedActionIds
Human-readable card preview | components/chat/chatbot-widget.tsx, mobile-chat.tsx ConfirmationPreview
Connection name (not encrypted ID) | lib/ai/actions/registry.ts confirmMessage
Sanitize leaked JSON blobs | components/chat/chat-helpers.ts stripConfirmationJson
No double-confirmation | System prompt
Honest result reporting | System prompt