Illustrated flow

HAI researcher

User thread: Intent, Constraints, Tone
Agent thread: Plan, Validate, Repair
Output thread: Next step, Checklist, Action

Practitioner lane · ~6 min

Assistant Evaluation Basics (3-metric core)

Use this to get a fast, defensible quality baseline before optimization.

Metrics

Task success rate: percentage of tasks completed without escalation.
Time to correct outcome: median time from the first prompt to a usable result.
User trust rating: the user's confidence score, collected after each task.
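
As a rough illustration, the three metrics could be computed from per-task session records along these lines. The `TaskRun` record, its field names, and the 1 to 5 trust scale are assumptions made for this sketch, not part of the guide; adapt them to whatever you actually log.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional


@dataclass
class TaskRun:
    """One participant attempting one task (hypothetical record shape)."""
    task: str
    completed: bool                      # participant reached a usable result
    escalated: bool                      # needed a human or other outside help
    seconds_to_outcome: Optional[float]  # None when no usable result was reached
    trust: int                           # assumed 1-5 confidence rating


def core_metrics(runs):
    """Return the three core metrics for a list of TaskRun records."""
    if not runs:
        raise ValueError("need at least one run")
    # Success means completed AND not escalated.
    successes = [r for r in runs if r.completed and not r.escalated]
    # Time is only defined for runs that reached a usable result.
    times = [r.seconds_to_outcome for r in runs if r.seconds_to_outcome is not None]
    return {
        "task_success_rate": len(successes) / len(runs),
        "median_seconds_to_outcome": median(times) if times else None,
        "mean_trust": mean(r.trust for r in runs),
    }


# Made-up sample data, for illustration only.
sample_runs = [
    TaskRun("book meeting", True, False, 60.0, 4),
    TaskRun("book meeting", True, True, 120.0, 3),
    TaskRun("summarize doc", False, False, None, 2),
    TaskRun("summarize doc", True, False, 90.0, 5),
]
metrics = core_metrics(sample_runs)
```

Whether an escalated run's time should count toward the median is a judgment call; this sketch includes any run that produced a usable result.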

Protocol

Pick 3 representative user tasks.
Run 5–8 moderated sessions.
Capture handoff failures and correction loops.
Prioritize fixes by their impact on task success and user trust.
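
One possible way to turn the last step into a ranking is sketched below. The scoring rule (failed tasks plus the normalized trust shortfall), the `MAX_TRUST` scale, and the failure mode labels are all assumptions for illustration; the guide does not prescribe a formula.

```python
from collections import defaultdict

MAX_TRUST = 5  # assumed top of the trust scale


def rank_failure_modes(observations):
    """Rank failure modes by a simple impact score, highest first.

    observations: iterable of (failure_mode, task_succeeded, trust) tuples,
    one per session in which that failure mode was seen.
    """
    totals = defaultdict(lambda: {"failed": 0, "trust_gap": 0})
    for mode, succeeded, trust in observations:
        totals[mode]["failed"] += 0 if succeeded else 1
        totals[mode]["trust_gap"] += MAX_TRUST - trust

    def score(item):
        _, t = item
        # Hypothetical weighting: one failed task counts as much as
        # one full-scale loss of trust.
        return t["failed"] + t["trust_gap"] / MAX_TRUST

    return [mode for mode, _ in sorted(totals.items(), key=score, reverse=True)]


# Made-up observations, for illustration only.
observed = [
    ("handoff dropped context", False, 2),
    ("handoff dropped context", False, 1),
    ("correction loop", True, 3),
    ("unclear next step", True, 4),
]
ranking = rank_failure_modes(observed)
```

With only 5 to 8 sessions the scores are coarse, so treat the ranking as a starting point for discussion rather than a precise measurement.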

Output template

Task | Success | Time | Trust | Failure mode | Next fix
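
A minimal sketch of filling in the template from collected rows, assuming each row is a dict keyed by the column names above. The example row is invented for illustration.

```python
COLUMNS = ["Task", "Success", "Time", "Trust", "Failure mode", "Next fix"]


def format_table(rows):
    """Render rows (dicts keyed by COLUMNS) in the pipe-separated template."""
    lines = [" | ".join(COLUMNS)]
    for row in rows:
        # Missing cells are rendered as "-" so gaps stay visible.
        lines.append(" | ".join(str(row.get(col, "-")) for col in COLUMNS))
    return "\n".join(lines)


# Hypothetical example row.
example_rows = [
    {
        "Task": "book meeting",
        "Success": "4/6",
        "Time": "90 s",
        "Trust": "3.5",
        "Failure mode": "handoff dropped context",
        "Next fix": "carry constraints into handoff",
    }
]
table = format_table(example_rows)
```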

Next best step: Handoff Friction Mapping

Need plain-language onboarding content? Use Zero-to-useful AI setup.