By Johnny Chan · UI/UX Designer, Hong Kong
How to Usability-Test AI Features Before You Ship
Task scripts, trust probes, and what to log when answers change every run. Structured testing still works on probabilistic UI.

Classic usability testing assumes the screen is the same for everyone. AI features can return different text for the same task. You can still run structured sessions, but you add questions about trust, understanding, and what people do when the answer looks wrong.
Test jobs, not prompt tricks
Give goals people already have: find why an order failed, draft a polite refund reply, compare two plans. Do not teach magic phrases. You are evaluating product UX, not participant skill at prompting.
Add probes standard tests skip
- Did you trust this answer? What made you trust or doubt it?
- If this were wrong, what would you do next?
- Did you notice sources, labels, or confidence cues?
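Because the answer changes every run, probe responses are only useful if you store them next to the exact output the participant saw. A minimal sketch of one log entry, assuming a JSON-lines notes file; the field names and sample values are invented placeholders, not a standard schema:

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class ProbeLog:
    """One participant's answers to the trust probes for one task."""
    participant: str
    task: str
    output_shown: str          # exact text the AI feature returned in this run
    trusted: bool              # "Did you trust this answer?"
    trust_reason: str          # "What made you trust or doubt it?"
    next_step_if_wrong: str    # "If this were wrong, what would you do next?"
    cues_noticed: list = field(default_factory=list)  # sources, labels, confidence cues


# Placeholder values for illustration only, not real session data.
entry = ProbeLog(
    participant="P03",
    task="find why an order failed",
    output_shown="Your card was declined by the issuing bank.",
    trusted=False,
    trust_reason="No source or order number shown",
    next_step_if_wrong="Contact support by phone",
    cues_noticed=["AI label"],
)

# One JSON object per line keeps sessions easy to append and compare later.
line = json.dumps(asdict(entry), ensure_ascii=False)
print(line)
```

Keeping `output_shown` verbatim lets you later tell whether two participants doubted the same answer or two different ones.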
Treat variance as signal
Run enough sessions to see when different outputs confuse people versus when variation does not matter. If inconsistency breaks comprehension, add structure: fixed templates, constrained choices, or UI that normalizes the answer shape.
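One rough way to separate harmless variation from confusing variation is to tally, per task, how many answer shapes appeared and how often participants misread the result. The sketch below assumes the facilitator labels each session by hand; the record fields, the thresholds, and the sample rows are all hypothetical and should be tuned to your own study:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SessionRecord:
    task_id: str        # hypothetical task name from the test script
    output_shape: str   # facilitator's label for the answer form, e.g. "bullets"
    confused: bool      # facilitator judged that the participant misread the answer


def variance_report(records, min_sessions=3, confusion_threshold=0.3):
    """Flag tasks where varied output shapes coincide with frequent confusion."""
    by_task = defaultdict(list)
    for record in records:
        by_task[record.task_id].append(record)

    report = {}
    for task_id, rows in by_task.items():
        shapes = {r.output_shape for r in rows}
        confusion_rate = sum(r.confused for r in rows) / len(rows)
        report[task_id] = {
            "sessions": len(rows),
            "distinct_shapes": len(shapes),
            "confusion_rate": round(confusion_rate, 2),
            # Only flag when there is enough data, real variation, and real confusion.
            "needs_structure": (
                len(rows) >= min_sessions
                and len(shapes) > 1
                and confusion_rate >= confusion_threshold
            ),
        }
    return report


# Placeholder rows for illustration only.
records = [
    SessionRecord("order_failed", "paragraph", True),
    SessionRecord("order_failed", "bullets", False),
    SessionRecord("order_failed", "table", True),
    SessionRecord("refund_reply", "paragraph", False),
    SessionRecord("refund_reply", "paragraph", False),
    SessionRecord("refund_reply", "bullets", False),
]
report = variance_report(records)
for task_id, row in report.items():
    print(task_id, row)
```

A task flagged with `needs_structure` is a candidate for a fixed template or constrained choices; a task that varies without confusing anyone can be left alone.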
Sort fixes by layer
Bucket findings into three layers: model policy, prompt or copy, and pure interface. Many failures are UI problems: weak empty states, no confirm step, no way to edit the output. Those fixes can often ship without retraining anything.
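The bucketing itself can be as simple as tagging each finding with a layer during synthesis. A small sketch, assuming three layers named after the ones above; the findings listed are made-up examples, not results from a real study:

```python
from enum import Enum


class Layer(Enum):
    MODEL_POLICY = "model policy"
    PROMPT_OR_COPY = "prompt or copy"
    INTERFACE = "interface"


def bucket_findings(findings):
    """Group (note, layer) pairs into a dict keyed by layer name."""
    buckets = {layer.value: [] for layer in Layer}
    for note, layer in findings:
        buckets[layer.value].append(note)
    return buckets


# Made-up findings for illustration only.
findings = [
    ("Empty state gives no hint of what to ask", Layer.INTERFACE),
    ("No confirm step before the drafted reply is sent", Layer.INTERFACE),
    ("Disclaimer copy read as an error message", Layer.PROMPT_OR_COPY),
    ("Answer refused a legitimate refund question", Layer.MODEL_POLICY),
]

buckets = bucket_findings(findings)
for layer_name, notes in buckets.items():
    print(f"{layer_name}: {len(notes)} finding(s)")
```

Every layer appears in the result even when it is empty, which makes it obvious when a round of testing surfaced no model-level issues at all.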