Assistant Evaluation Basics (3-metric core)
Use this to get a fast, defensible quality baseline before optimization.
Metrics
- Task success rate — % completed without escalation.
- Time to correct outcome — median time from prompt to usable result.
- User trust rating — confidence score after each task.
Protocol
- Pick 3 representative user tasks.
- Run 5–8 moderated sessions.
- Capture handoff failures and correction loops.
- Prioritize fixes by impact on success + trust.
Output template
Task | Success | Time | Trust | Failure mode | Next fix
Next best step:
Handoff Friction Mapping
Need plain-language onboarding content? Use Zero-to-useful AI setup.