Evaluation Criteria

Table of Contents

  1. Understanding Evaluation Criteria
  2. Accessing Evaluation Settings
  3. How Evaluations Work
  4. Creating Evaluation Criteria
  5. Writing Effective Prompts
  6. Assigning to AI Employees
  7. Viewing Evaluation Results
  8. Managing Criteria
  9. Using Evaluations for Improvement
  10. Best Practices

Understanding Evaluation Criteria

Evaluation criteria are AI-powered quality checks that automatically assess each completed conversation.

What Evaluations Measure

You define what success looks like, and the system evaluates each conversation against it:
  • Objective Achievement: Did the AI employee accomplish the call’s purpose?
  • Script Adherence: Were required statements or disclosures made?
  • Tone and Manner: Was the conversation professional and appropriate?
  • Information Gathering: Was necessary information collected?
  • Customer Experience: Did the interaction likely satisfy the caller?
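
These dimensions typically map to concrete criteria. For instance (a sketch only: the identifiers and prompt wording below are illustrative, not built-in values):

```python
# Illustrative only: one example criterion per quality dimension.
# Identifiers and prompt wording are hypothetical, not built-in values.
EXAMPLE_CRITERIA = {
    "objective_achieved":   "Did the AI employee accomplish the call's purpose?",
    "script_adherence":     "Were all required statements or disclosures made?",
    "tone_and_manner":      "Was the conversation professional and appropriate?",
    "information_gathered": "Was all necessary information collected?",
    "customer_experience":  "Did the interaction likely satisfy the caller?",
}
```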

Why Use Evaluations

  • Scalable Quality Assurance: Review quality across hundreds or thousands of calls automatically.
  • Consistent Standards: Same criteria applied uniformly to all calls.
  • Trend Identification: Spot patterns in quality over time.
  • Training Signals: Identify where AI employees need improvement.

Accessing Evaluation Settings

Navigate to the evaluation criteria configuration.
  1. Click Settings in the main navigation
  2. Select Evaluation from the settings menu
  3. The Evaluation Criteria page displays

Page Layout

The page shows:
  • Header with title and create button
  • Table listing all defined criteria
  • Information about each criterion
  • AI employees assigned to each
  • Action menus for management

How Evaluations Work

Understanding the evaluation process helps you configure effective criteria.

Evaluation Process

  1. Conversation completes: A call finishes with a transcript
  2. System queues evaluation: The conversation enters the evaluation queue
  3. AI analyzes transcript: Each criterion is evaluated against the conversation
  4. Results recorded: Pass/fail or score is saved
  5. Results viewable: Evaluations appear on conversation detail pages
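
In code terms, steps 3–5 look roughly like the sketch below. This is not the platform's actual implementation: the data shapes are assumptions, and judge_transcript stands in for whatever LLM call performs the assessment.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    identifier: str   # short, unique name, e.g. "appointment_booked"
    prompt: str       # instructions for how to evaluate

@dataclass
class EvaluationResult:
    criterion: str    # which criterion was evaluated
    passed: bool      # pass/fail outcome
    explanation: str  # AI-generated reasoning for the result

def judge_transcript(prompt: str, transcript: str) -> tuple[bool, str]:
    """Stand-in for the platform's LLM judge, which reads the transcript
    against the criterion prompt and returns a verdict with reasoning."""
    raise NotImplementedError

def evaluate_conversation(transcript: str,
                          criteria: list[Criterion]) -> list[EvaluationResult]:
    """Steps 3-4: assess each criterion independently and record the result."""
    results = []
    for criterion in criteria:
        passed, explanation = judge_transcript(criterion.prompt, transcript)
        results.append(EvaluationResult(criterion.identifier, passed, explanation))
    return results  # step 5: surfaced on the conversation detail page
```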

Timing

Evaluations run after conversations complete:
  • May take a few moments to process
  • Results appear once evaluation completes
  • Processing may be batched during high volume

Evaluation Scope

Criteria evaluate individual conversations:
  • Each call is assessed independently
  • Same criteria apply to all applicable conversations
  • Results aggregate for analytics

Creating Evaluation Criteria

Define new criteria to assess conversation quality.

Accessing Creation

  1. Navigate to Settings > Evaluation
  2. Click the New Criterion button
  3. The criterion creation dialog opens

Required Fields

Identifier: A unique name for the criterion:
  • Short, descriptive name
  • Appears in evaluation results
  • Used for filtering and reporting
Example identifiers: “appointment_booked”, “disclosure_made”, “issue_resolved”

Prompt: Instructions for how to evaluate:
  • Describes what to look for
  • Defines success and failure
  • May include examples
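
Taken together, a criterion is essentially a named prompt plus an optional assignment. A hypothetical stored record might look like this (field names are illustrative, not the platform's schema):

```python
# Hypothetical shape of a stored criterion; field names are illustrative.
criterion = {
    "identifier": "disclosure_made",   # unique, filterable name
    "prompt": (
        "Determine whether the AI employee made the required call recording "
        "disclosure at the beginning of the conversation. Pass if the "
        "disclosure was made, fail if it was not."
    ),
    "assigned_employees": [],          # empty = applies to all AI employees
}
```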

Optional: AI Employee Assignment

All Employees: If no specific employees are selected, the criterion applies to all.
Specific Employees: Select which AI employees this criterion evaluates.

Writing Effective Prompts

The prompt determines how accurately evaluations assess quality.

Prompt Structure

A good evaluation prompt includes:
  1. What to evaluate: Clear description of the criterion
  2. How to assess: What indicates success or failure
  3. Context: Any relevant background information
  4. Examples (optional): Concrete instances of pass/fail scenarios
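
One way to keep prompts consistent is to assemble them from these parts. A minimal sketch (the template and function are my own, not a platform feature):

```python
def build_evaluation_prompt(what: str, how: str,
                            context: str = "", examples: str = "") -> str:
    """Compose an evaluation prompt from the four recommended parts.
    Illustrative template, not a platform feature."""
    parts = [f"What to evaluate: {what}", f"How to assess: {how}"]
    if context:
        parts.append(f"Context: {context}")
    if examples:
        parts.append(f"Examples: {examples}")
    return "\n".join(parts)

prompt = build_evaluation_prompt(
    what="Whether an appointment was successfully scheduled.",
    how="Pass if the caller agreed to a specific date and time; "
        "fail if no appointment was made or the caller declined.",
)
```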

Example Prompts

Appointment Scheduling Success: “Evaluate whether an appointment was successfully scheduled during this conversation. The criterion is met if the caller agreed to a specific date and time for a meeting or consultation. The criterion fails if no appointment was made, or if the caller declined to schedule.”

Required Disclosure: “Determine whether the AI employee made the required call recording disclosure at the beginning of the conversation. The disclosure must include notification that the call may be recorded. Check the first few exchanges of the transcript for this statement. Pass if disclosure was made, fail if it was not present.”

Issue Resolution: “Assess whether the caller’s primary issue was resolved during this conversation. Consider: Did the caller’s question get answered? Was their problem addressed? Did they express satisfaction with the resolution? The criterion passes if the issue appears resolved or significantly advanced, fails if the caller’s concern remains unaddressed.”

Prompt Characteristics

Strong prompts share these characteristics:
  • Clear criteria: Unambiguous definition of pass/fail.
  • Observable markers: What in the transcript indicates success?
  • Reasonable scope: Focused on one specific aspect.
  • Appropriate context: Enough information for accurate assessment.

Assigning to AI Employees

Control which AI employees are evaluated by each criterion.

All Employees (Default)

If no specific employees are selected:
  • Criterion applies to all AI employees
  • Any conversation can be evaluated
  • Useful for universal quality standards

Specific Employee Assignment

Select specific AI employees:
  • Only their conversations are evaluated
  • Useful for role-specific criteria
  • Reduces noise from irrelevant evaluations
When to use specific assignment:
  • Criteria specific to certain roles
  • Testing criteria before broad rollout
  • Different standards for different use cases
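
The assignment rules above reduce to a simple check. A sketch, assuming a criterion stores its assigned employees as a list that is empty by default:

```python
def applies_to(assigned_employees: list[str], employee_id: str) -> bool:
    """A criterion with no specific assignment applies to every AI employee;
    otherwise it applies only to the employees selected."""
    return not assigned_employees or employee_id in assigned_employees

applies_to([], "emp_42")                  # True: no selection means all employees
applies_to(["emp_7", "emp_9"], "emp_42")  # False: emp_42 is not assigned
```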

Managing Assignments

Assignments can be changed:
  1. Edit the criterion
  2. Modify the employee selection
  3. Save changes
  4. New evaluations use updated assignment

Viewing Evaluation Results

Access evaluation outcomes for individual conversations.

On Conversation Detail Pages

Navigate to any conversation detail page:
  • Evaluation results section displays
  • Each criterion shows pass/fail
  • Explanations may be included

In Evaluation List

If available, a dedicated view may show:
  • All evaluations across conversations
  • Filtering by criterion
  • Filtering by result (pass/fail)
  • Aggregate statistics

Result Information

Each evaluation result includes:
  • Criterion Name: Which criterion was evaluated.
  • Result: Pass, Fail, or a score.
  • Explanation: AI-generated reasoning for the result.
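
As a concrete example, a single result might carry these three fields (a hypothetical payload; names and values are illustrative):

```python
# Hypothetical result payload; field names and values are illustrative.
result = {
    "criterion": "appointment_booked",  # which criterion was evaluated
    "result": "pass",                   # pass, fail, or a score
    "explanation": "The caller agreed to Tuesday at 2pm for a consultation.",
}
```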

No Results Scenarios

Evaluations may not appear when:
  • Conversation too short to evaluate meaningfully
  • No criteria assigned to the AI employee
  • Evaluation still processing
  • Technical issue prevented evaluation
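
The first two conditions amount to guard checks before an evaluation runs. A sketch, with the length threshold invented for illustration:

```python
def should_evaluate(transcript: str, assigned_criteria: list[str]) -> bool:
    """Skip evaluation when there is nothing meaningful to assess.
    The 20-word minimum is an invented threshold, not a platform value."""
    if len(transcript.split()) < 20:  # conversation too short
        return False
    if not assigned_criteria:         # no criteria assigned to this employee
        return False
    return True
```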

Managing Criteria

Maintain and modify evaluation criteria over time.

Viewing Criteria Details

The criteria table shows:
  • Identifier (name)
  • Prompt preview
  • Assigned employees (or “All employees”)
  • Creation date

Editing Criteria

Modify criterion configuration:
  1. Click the actions menu
  2. Select Edit
  3. Modify identifier, prompt, or assignments
  4. Save changes
Note: Changes affect future evaluations only. Past results remain unchanged.

Deleting Criteria

Remove criteria no longer needed:
  1. Click the actions menu
  2. Select Delete
  3. Confirm deletion
Considerations:
  • Past evaluation results may be affected
  • Reporting using this criterion will change
  • Consider archiving vs. deleting if history matters

Using Evaluations for Improvement

Leverage evaluation data to enhance AI employee performance.

Identifying Patterns

Look for criteria that frequently fail:
  • Indicates systematic issues
  • Points to configuration improvements needed
  • Highlights training gaps
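
If you can export results (the export shape is assumed here), ranking criteria by failure rate surfaces these patterns quickly:

```python
from collections import Counter

# Assumed export shape: one (criterion_identifier, passed) pair per evaluation.
results = [("appointment_booked", False), ("disclosure_made", True),
           ("appointment_booked", False), ("disclosure_made", True)]

fails = Counter(c for c, passed in results if not passed)
totals = Counter(c for c, _ in results)
for criterion, total in totals.most_common():
    print(f"{criterion}: {fails[criterion] / total:.0%} fail rate "
          f"over {total} evaluations")
```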

Reviewing Failed Evaluations

When criteria fail repeatedly:
  1. Review sample conversations that failed
  2. Read transcripts to understand why
  3. Identify root causes
  4. Adjust AI employee configuration

Configuration Adjustments

Based on evaluation insights:
  • Prompt improvements: Update AI employee system prompts to address gaps.
  • Script additions: Add required elements missing in conversations.
  • Training data: Provide examples of successful conversations.

Tracking Improvement

Monitor evaluation results over time:
  • After making changes, watch for improvement
  • Track pass rates by week or month
  • Celebrate wins and continue iterating
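
With exported results (again, the shape is assumed), a weekly pass-rate series makes improvement visible, for example with pandas:

```python
import pandas as pd

# Assumed export shape: one row per evaluation with a timestamp and outcome.
df = pd.DataFrame({
    "completed_at": pd.to_datetime(["2024-05-06", "2024-05-07", "2024-05-14"]),
    "passed": [False, True, True],
})

# Pass rate per week; compare the series before and after configuration changes.
weekly = df.set_index("completed_at")["passed"].resample("W").mean()
print(weekly)
```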

Best Practices

Start with Key Metrics

Begin with criteria that matter most:
  • Core business objectives
  • Compliance requirements
  • Customer satisfaction indicators

Write Clear Prompts

  • Ambiguity hurts accuracy: Vague prompts produce inconsistent results.
  • Define success explicitly: State exactly what constitutes passing.
  • Consider edge cases: How should unusual situations be handled?

Test Before Broad Deployment

Create criteria and observe initial results:
  • Do evaluations match your manual assessment?
  • Are pass/fail decisions sensible?
  • Does the explanation make sense?

Iterate on Prompts

Evaluation prompts are rarely perfect on the first attempt:
  • Review results periodically
  • Refine prompts based on observations
  • Improve accuracy over time

Balance Quantity and Quality

A few focused criteria beat many vague ones:
  • Each criterion should provide actionable insight
  • Too many criteria dilute attention
  • Prioritize what you will actually act on

Use Results Actively

Evaluations are only valuable if used:
  • Review regularly
  • Take action on findings
  • Close the loop from insight to improvement

Align with Business Goals

Criteria should connect to outcomes you care about:
  • Revenue impact
  • Customer satisfaction
  • Compliance requirements
  • Operational efficiency

Document Criteria Rationale

Keep records of:
  • Why each criterion exists
  • What successful outcomes look like
  • How results should be interpreted

Summary

Evaluation criteria bring automated quality assurance to your AI employee conversations. By defining clear criteria with well-written prompts, you can assess quality at scale, identify improvement opportunities, and continuously enhance your AI employees’ performance. The key to successful evaluations is treating them as a cycle: define criteria, review results, improve configurations, and refine criteria based on learnings. This continuous improvement approach maximizes the value of your AI employee investments.