Lab 4

Quality Gate Design

45 min · 8 sections · 2 prerequisites

What You'll Learn

  • Design a multi-layer quality assurance system
  • Create validation prompts and criteria
  • Build feedback loops for continuous improvement

Prerequisites

  • 2.3-quality-and-iteration
  • 2.4-human-ai-handoffs

Lab Overview

AI without quality control is a liability. In this lab, you'll design a comprehensive quality assurance system for an AI workflow—the kind of system that lets you sleep at night knowing your AI isn't embarrassing you.

What you'll create:

  • A complete quality gate specification
  • Validation prompts and criteria
  • Handoff protocols
  • Feedback capture system


Part 1: Choose Your Scenario (5 minutes)

Select an AI workflow to protect:

Option A: Customer Email Response Generation
AI drafts responses to customer inquiries that agents can approve and send.

Option B: Content Summarization Pipeline
AI summarizes long documents into executive briefs for leadership.

Option C: Data Extraction and Categorization
AI extracts structured data from unstructured documents (invoices, contracts, etc.).

Option D: Personalized Marketing Copy
AI generates personalized marketing messages based on customer profiles.

Option E: Your Choice
An AI workflow you're building or planning.



Part 2: Define Quality Criteria (10 minutes)

Identify Quality Dimensions

For your chosen workflow, define what "quality" means:

Dimension 1: Accuracy
What needs to be factually correct?

  • [Specific accuracy requirement 1]
  • [Specific accuracy requirement 2]

Dimension 2: Completeness
What must be included?

  • [Required element 1]
  • [Required element 2]

Dimension 3: Format/Structure
What structure is required?

  • [Format requirement 1]
  • [Format requirement 2]

Dimension 4: Tone/Voice
What should it sound like?

  • [Tone requirement 1]
  • [Tone requirement 2]

Dimension 5: Constraints
What must be avoided?

  • [Constraint 1]
  • [Constraint 2]

Create Quality Rubric

Dimension    | 1 (Fail)   | 3 (Acceptable) | 5 (Excellent) | Weight
Accuracy     | [Describe] | [Describe]     | [Describe]    | X%
Completeness | [Describe] | [Describe]     | [Describe]    | X%
Format       | [Describe] | [Describe]     | [Describe]    | X%
Tone         | [Describe] | [Describe]     | [Describe]    | X%
Constraints  | [Describe] | [Describe]     | [Describe]    | X%

Minimum passing score: [e.g., 3.5 weighted average]
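
Encoding the rubric as data makes the pass/fail decision mechanical instead of ad hoc. A minimal Python sketch; the dimension weights and the 3.5 threshold are illustrative placeholders for the values in your rubric above.

PYTHON SKETCH: RUBRIC SCORING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Hypothetical weights -- replace with your own; they should sum to 1.0.
RUBRIC_WEIGHTS = {
    "accuracy": 0.30,
    "completeness": 0.25,
    "format": 0.15,
    "tone": 0.15,
    "constraints": 0.15,
}

MIN_PASSING_SCORE = 3.5  # weighted average on the 1-5 scale


def weighted_score(scores: dict[str, int]) -> float:
    """Weighted average of per-dimension scores (each 1-5)."""
    return sum(RUBRIC_WEIGHTS[dim] * s for dim, s in scores.items())


def passes_rubric(scores: dict[str, int]) -> bool:
    return weighted_score(scores) >= MIN_PASSING_SCORE


# Example: strong accuracy, weak tone.
# 0.30*5 + 0.25*4 + 0.15*4 + 0.15*2 + 0.15*5 = 4.15 -> passes
print(passes_rubric({"accuracy": 5, "completeness": 4,
                     "format": 4, "tone": 2, "constraints": 5}))
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━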



Part 3: Design Quality Gates (15 minutes)

Gate 1: Automated Validation

What can be checked automatically without AI judgment?

AUTOMATED CHECKS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
□ Length within range: [min] - [max] words
□ Required sections present: [list]
□ Prohibited terms absent: [list]
□ Format requirements met: [specifics]
□ Links/references valid: [if applicable]
□ [Additional automated check]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
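
These checks cost nothing per output, so run them on everything. A minimal sketch, assuming plain-text output; the word limits, section names, and banned terms below are illustrative stand-ins for your own lists.

PYTHON SKETCH: GATE 1 CHECKS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
import re

# Illustrative values -- substitute the entries from your checklist.
MIN_WORDS, MAX_WORDS = 50, 400
REQUIRED_SECTIONS = ["Summary", "Next Steps"]
PROHIBITED_TERMS = ["guarantee", "lawsuit"]


def run_automated_checks(text: str) -> list[str]:
    """Return failure messages; an empty list means Gate 1 passed."""
    failures = []

    words = len(text.split())
    if not MIN_WORDS <= words <= MAX_WORDS:
        failures.append(f"length {words} outside {MIN_WORDS}-{MAX_WORDS} words")

    for section in REQUIRED_SECTIONS:
        if section.lower() not in text.lower():
            failures.append(f"missing required section: {section}")

    for term in PROHIBITED_TERMS:
        if re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE):
            failures.append(f"prohibited term present: {term}")

    return failures
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━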

Gate 2: AI Cross-Validation

Design a prompt that has AI evaluate AI output:

QUALITY EVALUATION PROMPT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
You are a quality reviewer. Evaluate this [content type]
against the following criteria:

CONTENT TO REVIEW:
"""
{ai_output}
"""

ORIGINAL REQUEST:
"""
{original_input}
"""

Evaluate each criterion (1-5) with explanation:

1. ACCURACY
   - Are all facts verifiable?
   - Any potential hallucinations?
   Score: [1-5]
   Issues: [List any concerns]

2. COMPLETENESS
   - All required elements present?
   - Any gaps in coverage?
   Score: [1-5]
   Issues: [List any missing items]

3. FORMAT
   - Matches expected structure?
   - Professional presentation?
   Score: [1-5]
   Issues: [List any format problems]

4. TONE
   - Appropriate for audience?
   - Consistent throughout?
   Score: [1-5]
   Issues: [List any tone problems]

5. CONSTRAINTS
   - All restrictions followed?
   - Any policy violations?
   Score: [1-5]
   Issues: [List any violations]

OVERALL ASSESSMENT:
- Pass/Fail: [Based on minimum threshold]
- Confidence: [Low/Medium/High]
- Recommended action: [Approve/Revise/Escalate]
- Specific fixes needed: [If applicable]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
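
To run this gate programmatically, fill the template, send it to your reviewer model, and parse the scores back out. A sketch under two assumptions: call_model() is a placeholder for whatever LLM client you use, and the prompt above is saved to a local file (the filename here is hypothetical) so str.format() can fill the {ai_output} and {original_input} slots.

PYTHON SKETCH: GATE 2 DRIVER
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
import re

# Assumes the evaluation prompt above is saved to this (hypothetical) file.
EVAL_TEMPLATE = open("quality_eval_prompt.txt").read()

DIMENSIONS = ["accuracy", "completeness", "format", "tone", "constraints"]


def call_model(prompt: str) -> str:
    """Placeholder -- swap in your actual LLM client call."""
    raise NotImplementedError


def evaluate_output(ai_output: str, original_input: str) -> dict[str, int]:
    prompt = EVAL_TEMPLATE.format(ai_output=ai_output,
                                  original_input=original_input)
    response = call_model(prompt)

    # The prompt asks for one "Score: [1-5]" line per criterion, in order.
    scores = [int(s) for s in re.findall(r"Score:\s*(\d)", response)]
    if len(scores) != len(DIMENSIONS):
        raise ValueError(f"expected {len(DIMENSIONS)} scores, got {len(scores)}")
    return dict(zip(DIMENSIONS, scores))
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━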

Gate 3: Human Review

Define when and how humans review:

HUMAN REVIEW TRIGGERS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Always review when:
□ AI confidence < [threshold]%
□ [Specific trigger 1]
□ [Specific trigger 2]

Sample review:
□ [X]% of outputs randomly selected
□ First [N] outputs of each type
□ [Other sampling strategy]

Escalation required when:
□ [Critical condition 1]
□ [Critical condition 2]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
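
These triggers translate directly into routing logic. A sketch with illustrative values; the 80% confidence bar, 5% sampling rate, and first-20 rule stand in for whatever numbers you filled in above.

PYTHON SKETCH: GATE 3 ROUTING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
import random

CONFIDENCE_THRESHOLD = 0.80   # illustrative
SAMPLE_RATE = 0.05            # review 5% of passing outputs at random
FIRST_N_PER_TYPE = 20         # review early outputs of each new type

seen_counts: dict[str, int] = {}  # outputs seen so far, per type


def needs_human_review(output_type: str, ai_confidence: float,
                       gate2_passed: bool) -> bool:
    seen_counts[output_type] = seen_counts.get(output_type, 0) + 1

    if not gate2_passed:
        return True                       # failed AI evaluation
    if ai_confidence < CONFIDENCE_THRESHOLD:
        return True                       # low confidence
    if seen_counts[output_type] <= FIRST_N_PER_TYPE:
        return True                       # still building trust
    return random.random() < SAMPLE_RATE  # ongoing spot checks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━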

Human Review Interface

What should reviewers see?

REVIEW PACKAGE CONTENTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Original input/request
2. AI-generated output
3. Automated check results
4. AI evaluation scores
5. Flagged concerns
6. Suggested actions

REVIEWER OPTIONS:
□ Approve as-is
□ Approve with minor edits
□ Request revision (with feedback)
□ Reject (with reason)
□ Escalate to [role]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
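
Under the hood, the review package is just a structured record handed to the reviewer interface. A minimal sketch as a dataclass, with fields named after the list above.

PYTHON SKETCH: REVIEW PACKAGE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
from dataclasses import dataclass, field


@dataclass
class ReviewPackage:
    original_input: str
    ai_output: str
    automated_check_results: list[str] = field(default_factory=list)
    ai_evaluation_scores: dict[str, int] = field(default_factory=dict)
    flagged_concerns: list[str] = field(default_factory=list)
    suggested_action: str = "approve"  # approve / revise / escalate
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━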


Part 4: Design Feedback Loop (10 minutes)

Capture Quality Data

What should be logged for every output?

QUALITY LOG ENTRY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Timestamp: [datetime]
Request ID: [unique identifier]
Input summary: [brief description]
Model used: [model name/version]

Automated checks: [pass/fail summary]
AI evaluation score: [weighted average]
Human review: [Approve/Edit/Reject/Escalate]
Human edits: [what was changed]
Final outcome: [what was shipped]

Processing time: [duration]
Token usage: [count]
Cost: [amount]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
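
One convenient capture format is a single JSON line per output, which keeps the log easy to query and aggregate later. A sketch with illustrative values; the field names mirror the template above and the file path is arbitrary.

PYTHON SKETCH: QUALITY LOGGING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
import json
import time
import uuid


def write_quality_log(path: str, record: dict) -> None:
    """Append one JSON line per output."""
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")


write_quality_log("quality_log.jsonl", {
    "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    "request_id": str(uuid.uuid4()),
    "input_summary": "refund request for delayed order",  # illustrative
    "model": "model-name-v1",                             # placeholder
    "automated_checks": "pass",
    "ai_evaluation_score": 4.15,
    "human_review": "approve_with_edits",
    "human_edits": "softened closing line",
    "final_outcome": "shipped",
    "processing_seconds": 3.2,
    "token_usage": 812,
    "cost_usd": 0.004,
})
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━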

Define Improvement Triggers

When should you adjust the system?

Metric                   | Threshold | Action
First-pass approval rate | < 70%     | Review and improve prompts
Average human edit time  | > [X] min | Improve AI output quality
Escalation rate          | > 10%     | Expand AI capability or scope
Customer complaints      | Any       | Root cause analysis
Cost per output          | > [X]     | Optimize model selection
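
Run as a scheduled job over the quality log, these triggers reduce to a few comparisons. A sketch that assumes the metrics have already been aggregated (say, weekly); the numeric thresholds marked as placeholders are the ones you set in the table.

PYTHON SKETCH: IMPROVEMENT TRIGGERS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
def improvement_actions(metrics: dict[str, float]) -> list[str]:
    """Map aggregated metrics to the actions in the table above."""
    actions = []
    if metrics["first_pass_approval_rate"] < 0.70:
        actions.append("Review and improve prompts")
    if metrics["avg_human_edit_minutes"] > 5:        # placeholder threshold
        actions.append("Improve AI output quality")
    if metrics["escalation_rate"] > 0.10:
        actions.append("Expand AI capability or scope")
    if metrics["customer_complaints"] > 0:
        actions.append("Root cause analysis")
    if metrics["cost_per_output"] > 0.05:            # placeholder threshold
        actions.append("Optimize model selection")
    return actions
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━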

Feedback Integration Process

WEEKLY QUALITY REVIEW
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Pull quality metrics dashboard
2. Identify lowest-scoring areas
3. Sample rejected/edited outputs
4. Categorize common issues
5. Prioritize improvements
6. Update prompts/criteria
7. Monitor impact
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━


Part 5: Visualize the System (5 minutes)

Create Quality Gate Flow Diagram

[AI Output Generated]
        │
        ▼
┌─────────────────┐
│ GATE 1: AUTO    │──FAIL──┐
│ Format, length, │        │
│ prohibited terms│        │
└────────┬────────┘        │
         │ PASS            │
         ▼                 │
┌─────────────────┐        │
│ GATE 2: AI EVAL │──FAIL──┤
│ Quality scoring │        │
│ Issue flagging  │        │
└────────┬────────┘        │
         │ PASS            │
         ▼                 │
┌─────────────────┐        │
│ GATE 3: HUMAN   │──FAIL──┤
│ (if triggered)  │        │
└────────┬────────┘        │
         │ PASS            │
         ▼                 ▼
    [APPROVED]        [REVISION]
                          │
                          ▼
                    [Back to AI]


Deliverable

Create a Quality Gate Specification Document containing:

  1. Quality Criteria Definition

    • All quality dimensions
    • Detailed rubric with examples
    • Minimum thresholds
  2. Gate Specifications

    • Gate 1: Automated checks (complete list)
    • Gate 2: AI evaluation prompt (ready to use)
    • Gate 3: Human review protocol
  3. Handoff Design

    • Review package contents
    • Reviewer options and actions
    • Escalation paths
  4. Feedback System

    • Data capture schema
    • Improvement triggers
    • Review process
  5. System Diagram

    • Visual flow of quality gates
    • Decision points and paths


Extension Challenge

Build a Working Prototype

  1. Create a prompt that generates outputs for your chosen scenario
  2. Create the AI evaluation prompt
  3. Run 10 test cases through both
  4. Calculate your first-pass approval rate
  5. Identify the most common quality issues
  6. Refine your generation prompt based on findings
  7. Re-run and measure improvement

Document your iteration process and results.
