Evaluation Criteria
Table of Contents
- Understanding Evaluation Criteria
- Accessing Evaluation Settings
- How Evaluations Work
- Creating Evaluation Criteria
- Writing Effective Prompts
- Assigning to AI Employees
- Viewing Evaluation Results
- Managing Criteria
- Using Evaluations for Improvement
- Best Practices
Understanding Evaluation Criteria
Evaluation criteria are AI-powered metrics that automatically assess the quality of completed conversations.
What Evaluations Measure
You define what success looks like, and the system evaluates each conversation against it:
- Objective Achievement: Did the AI employee accomplish the call’s purpose?
- Script Adherence: Were required statements or disclosures made?
- Tone and Manner: Was the conversation professional and appropriate?
- Information Gathering: Was necessary information collected?
- Customer Experience: Did the interaction likely satisfy the caller?
Why Use Evaluations
- Scalable Quality Assurance: Review quality across hundreds or thousands of calls automatically.
- Consistent Standards: The same criteria are applied uniformly to all calls.
- Trend Identification: Spot patterns in quality over time.
- Training Signals: Identify where AI employees need improvement.
Accessing Evaluation Settings
Navigate to the evaluation criteria configuration.
Navigation Path
- Click Settings in the main navigation
- Select Evaluation from the settings menu
- The Evaluation Criteria page displays
Page Layout
The page shows:
- Header with title and create button
- Table listing all defined criteria
- Information about each criterion
- AI employees assigned to each
- Action menus for management
How Evaluations Work
Understanding the evaluation process helps you configure effective criteria.
Evaluation Process
The pipeline runs through these steps (a brief code sketch follows the list):
- Conversation completes: A call finishes with a transcript
- System queues evaluation: The conversation enters evaluation queue
- AI analyzes transcript: Each criterion is evaluated against the conversation
- Results recorded: Pass/fail or score is saved
- Results viewable: Evaluations appear on conversation detail pages
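Conceptually, steps 3 and 4 amount to a small judge loop: run each criterion’s prompt against the transcript and record the verdict. The sketch below illustrates that idea in Python; it is an assumption about how such a pipeline could look, not the platform’s actual implementation, and `complete()` stands in for whatever LLM client is used.

```python
# Hypothetical judge loop: every assigned criterion is run against the
# finished transcript and the verdict is recorded. All names and the
# JSON response format are illustrative, not the platform's real API.
import json

def complete(prompt: str) -> str:
    """Stand-in for an LLM call; returns a canned verdict for illustration."""
    return '{"result": "pass", "explanation": "Disclosure found in the opening exchange."}'

def evaluate_conversation(transcript: str, criteria: list[dict]) -> list[dict]:
    """Evaluate one completed conversation against each criterion."""
    results = []
    for criterion in criteria:
        prompt = (
            f"{criterion['prompt']}\n\n"
            f"Transcript:\n{transcript}\n\n"
            'Answer as JSON: {"result": "pass" or "fail", "explanation": "..."}'
        )
        verdict = json.loads(complete(prompt))
        results.append({
            "criterion": criterion["identifier"],   # appears in results
            "result": verdict["result"],            # pass/fail is saved
            "explanation": verdict["explanation"],  # shown on the detail page
        })
    return results
```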
Timing
Evaluations run after conversations complete:
- May take a few moments to process
- Results appear once evaluation completes
- Batch processing during high volume
Evaluation Scope
Criteria evaluate individual conversations:
- Each call is assessed independently
- Same criteria apply to all applicable conversations
- Results aggregate for analytics
Creating Evaluation Criteria
Define new criteria to assess conversation quality.
Accessing Creation
- Navigate to Settings > Evaluation
- Click the New Criterion button
- The criterion creation dialog opens
Required Fields
Identifier: A unique name for the criterion:
- Short, descriptive name
- Appears in evaluation results
- Used for filtering and reporting
Prompt: The instructions that tell the evaluator what to assess:
- Describes what to look for
- Defines success and failure
- May include examples
Optional: AI Employee Assignment
- All Employees: If no specific employees are selected, the criterion applies to all.
- Specific Employees: Select which AI employees this criterion evaluates.
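Taken together, a criterion is a small record: identifier, prompt, and an optional employee list. The dictionary below is a hypothetical illustration of that shape, not the platform’s storage format.

```python
# Hypothetical criterion record; field names are illustrative.
criterion = {
    "identifier": "appointment-scheduled",
    "prompt": (
        "Evaluate whether an appointment was successfully scheduled. "
        "Pass if the caller agreed to a specific date and time; "
        "fail if no appointment was made or the caller declined."
    ),
    "employees": [],  # empty list = applies to all AI employees
}
```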
Writing Effective Prompts
The prompt determines how accurately evaluations assess quality.
Prompt Structure
A good evaluation prompt includes the following parts (a sketch assembling them in code follows this list):
- What to evaluate: Clear description of the criterion
- How to assess: What indicates success or failure
- Context: Any relevant background information
- Examples (optional): Concrete instances of pass/fail scenarios
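If you maintain many criteria, assembling prompts from these four parts in a consistent way can help. The helper below is one possible convention, purely illustrative and not something the platform requires.

```python
# Illustrative helper that assembles an evaluation prompt from the four
# structural parts described above. An optional convention, nothing more.
def build_eval_prompt(what: str, how: str, context: str = "", examples: str = "") -> str:
    parts = [
        f"What to evaluate: {what}",
        f"How to assess: {how}",
    ]
    if context:
        parts.append(f"Context: {context}")
    if examples:
        parts.append(f"Examples: {examples}")
    return "\n".join(parts)

print(build_eval_prompt(
    what="whether the caller's primary issue was resolved",
    how="pass if the issue was resolved or clearly advanced; fail otherwise",
))
```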
Example Prompts
Appointment Scheduling Success: “Evaluate whether an appointment was successfully scheduled during this conversation. The criterion is met if the caller agreed to a specific date and time for a meeting or consultation. The criterion fails if no appointment was made, or if the caller declined to schedule.”
Required Disclosure: “Determine whether the AI employee made the required call recording disclosure at the beginning of the conversation. The disclosure must include notification that the call may be recorded. Check the first few exchanges of the transcript for this statement. Pass if disclosure was made, fail if it was not present.”
Issue Resolution: “Assess whether the caller’s primary issue was resolved during this conversation. Consider: Did the caller’s question get answered? Was their problem addressed? Did they express satisfaction with the resolution? The criterion passes if the issue appears resolved or significantly advanced, fails if the caller’s concern remains unaddressed.”
Prompt Characteristics
- Clear criteria: An unambiguous definition of pass/fail.
- Observable markers: What in the transcript indicates success?
- Reasonable scope: Focused on one specific aspect.
- Appropriate context: Enough information for accurate assessment.
Assigning to AI Employees
Control which AI employees are evaluated by each criterion.
All Employees (Default)
If no specific employees are selected:
- Criterion applies to all AI employees
- Any conversation can be evaluated
- Useful for universal quality standards
Specific Employee Assignment
Select specific AI employees:
- Only their conversations are evaluated
- Useful for role-specific criteria
- Reduces noise from irrelevant evaluations
Common scenarios for specific assignment (the selection rule is sketched in code after this list):
- Criteria specific to certain roles
- Testing criteria before broad rollout
- Different standards for different use cases
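The selection rule itself reduces to a one-line check: an empty selection means the criterion applies to every employee. A minimal sketch, with hypothetical names:

```python
# Minimal sketch of the applicability rule; names are hypothetical.
def criterion_applies(criterion: dict, employee_id: str) -> bool:
    assigned = criterion.get("employees", [])
    return not assigned or employee_id in assigned  # empty = all employees
```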
Managing Assignments
Assignments can be changed:
- Edit the criterion
- Modify the employee selection
- Save changes
- New evaluations use updated assignment
Viewing Evaluation Results
Access evaluation outcomes for individual conversations.
On Conversation Detail Pages
Navigate to any conversation detail page:
- Evaluation results section displays
- Each criterion shows pass/fail
- Explanations may be included
In Evaluation List
If available, a dedicated view may show:
- All evaluations across conversations
- Filtering by criterion
- Filtering by result (pass/fail)
- Aggregate statistics
Result Information
Each evaluation result includes:
- Criterion Name: Which criterion was evaluated.
- Result: Pass, Fail, or score.
- Explanation: AI-generated reasoning for the result.
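In other words, each result is a small, self-describing record. A hypothetical example of what one might contain:

```python
# Hypothetical evaluation result record; fields mirror the list above.
result = {
    "criterion": "required-disclosure",
    "result": "fail",
    "explanation": "No recording disclosure was found in the opening exchanges.",
}
```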
No Results Scenarios
Evaluations may not appear when:
- Conversation too short to evaluate meaningfully
- No criteria assigned to the AI employee
- Evaluation still processing
- Technical issue prevented evaluation
Managing Criteria
Maintain and modify evaluation criteria over time.
Viewing Criteria Details
The criteria table shows:
- Identifier (name)
- Prompt preview
- Assigned employees (or “All employees”)
- Creation date
Editing Criteria
Modify criterion configuration:
- Click the actions menu
- Select Edit
- Modify identifier, prompt, or assignments
- Save changes
Deleting Criteria
Remove criteria that are no longer needed:
- Click the actions menu
- Select Delete
- Confirm deletion
Deletion has consequences:
- Past evaluation results may be affected
- Reporting that uses this criterion will change
- Consider archiving rather than deleting if history matters
Using Evaluations for Improvement
Leverage evaluation data to enhance AI employee performance.
Identifying Patterns
Look for criteria that frequently fail:
- Indicates systematic issues
- Points to configuration improvements needed
- Highlights training gaps
Reviewing Failed Evaluations
When criteria fail repeatedly:
- Review sample conversations that failed
- Read transcripts to understand why
- Identify root causes
- Adjust AI employee configuration
Configuration Adjustments
Based on evaluation insights:
- Prompt improvements: Update AI employee system prompts to address gaps.
- Script additions: Add required elements missing from conversations.
- Training data: Provide examples of successful conversations.
Tracking Improvement
Monitor evaluation results over time (a pass-rate sketch follows this list):
- After making changes, watch for improvement
- Track pass rates by week or month
- Celebrate wins and continue iterating
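One simple way to track the trend is to bucket results by ISO week and compute the share of passes. The sketch below uses sample data and illustrative field names; it assumes each result record carries a date alongside its pass/fail outcome.

```python
# Illustrative weekly pass-rate tracking over sample evaluation results.
from collections import defaultdict
from datetime import date

results = [
    {"criterion": "appointment-scheduled", "result": "pass", "when": date(2024, 5, 6)},
    {"criterion": "appointment-scheduled", "result": "fail", "when": date(2024, 5, 7)},
    {"criterion": "appointment-scheduled", "result": "pass", "when": date(2024, 5, 14)},
]

by_week: dict[tuple[int, int], list[str]] = defaultdict(list)
for r in results:
    iso = r["when"].isocalendar()  # (year, week, weekday)
    by_week[(iso.year, iso.week)].append(r["result"])

for (year, week), outcomes in sorted(by_week.items()):
    rate = outcomes.count("pass") / len(outcomes)
    print(f"{year}-W{week:02d}: {rate:.0%} pass rate ({len(outcomes)} calls)")
```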
Best Practices
Start with Key Metrics
Begin with criteria that matter most:
- Core business objectives
- Compliance requirements
- Customer satisfaction indicators
Write Clear Prompts
- Ambiguity hurts accuracy: Vague prompts produce inconsistent results.
- Define success explicitly: State exactly what constitutes passing.
- Consider edge cases: How should unusual situations be handled?
Test Before Broad Deployment
Create criteria and observe initial results:
- Do evaluations match your manual assessment?
- Are pass/fail decisions sensible?
- Does the explanation make sense?
Iterate on Prompts
Evaluation prompts are rarely perfect at first:
- Review results periodically
- Refine prompts based on observations
- Improve accuracy over time
Balance Quantity and Quality
A few focused criteria beat many vague ones:
- Each criterion should provide actionable insight
- Too many criteria dilute attention
- Prioritize what you will actually act on
Use Results Actively
Evaluations are only valuable if used:
- Review regularly
- Take action on findings
- Close the loop from insight to improvement
Align with Business Goals
Criteria should connect to outcomes you care about:
- Revenue impact
- Customer satisfaction
- Compliance requirements
- Operational efficiency
Document Criteria Rationale
Keep records of:
- Why each criterion exists
- What successful outcomes look like
- How results should be interpreted