Validated Output (VO) Walkthrough Guide

A comprehensive guide to AI21 Maestro’s Validated Output capabilities.

AI21 Maestro is an intelligent agentic system designed to handle complex AI workflows.
This guide focuses specifically on Validated Output, with practical examples ranging from basic usage to advanced scenarios.

Understanding the Problem

Traditional LLM interactions often look like this:

# Traditional approach - unreliable
response = client.complete(
    prompt="""Write a Python function that:
    - Calculates fibonacci numbers
    - Is under 10 lines
    - Has proper docstrings
    - Uses descriptive variable names"""
)
# Sometimes works perfectly, sometimes doesn't follow all constraints

Common issues:

  • Inconsistent adherence to multiple constraints
  • No way to know which requirements were missed
  • Manual trial-and-error to get desired output

How AI21 Maestro's Validated Output Works

AI21 Maestro's Validated Output uses a Generate → Validate → Fix cycle:

  1. Generate: Creates initial response following your requirements
  2. Validate: Evaluates and scores each requirement (0.0 to 1.0)
  3. Fix: Refines output for requirements that scored < 1.0
  4. Repeat: Continues until all requirements are met or budget is exhausted

This systematic approach to instruction following is part of AI21 Maestro's broader agentic architecture, designed to handle complex workflows with reliability and precision.

Input + Requirements → Generate → Validate → Fix → Final Output + Report
                                    ↑         ↓
                                    ← ← ← ← ← ←
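
To make the cycle concrete, here is a minimal local sketch of a Generate → Validate → Fix loop. It is not AI21 Maestro's implementation: the function `validated_output_loop`, its `max_rounds` parameter (standing in for the budget), and the toy callables are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple


def validated_output_loop(
    generate: Callable[[], str],
    validate: Callable[[str], Dict[str, float]],
    fix: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, Dict[str, float], str]:
    """Illustrative loop: generate once, then validate and fix until done.

    max_rounds is a hypothetical stand-in for the budget.
    """
    output = generate()
    scores = validate(output)
    for _ in range(max_rounds):
        # Only requirements that scored below 1.0 are sent to the fix step.
        failing = [name for name, score in scores.items() if score < 1.0]
        if not failing:
            return output, scores, "all requirements met"
        output = fix(output, failing)
        scores = validate(output)
    if all(score >= 1.0 for score in scores.values()):
        return output, scores, "all requirements met"
    return output, scores, "budget exhausted"


# Toy callables standing in for model calls (assumptions, not real API calls).
def toy_generate() -> str:
    return "hello world"


def toy_validate(text: str) -> Dict[str, float]:
    return {
        "capitalized": 1.0 if text[:1].isupper() else 0.0,
        "ends_with_period": 1.0 if text.endswith(".") else 0.0,
    }


def toy_fix(text: str, failing: List[str]) -> str:
    if "capitalized" in failing:
        text = text[:1].upper() + text[1:]
    if "ends_with_period" in failing:
        text += "."
    return text


final_text, final_scores, reason = validated_output_loop(
    toy_generate, toy_validate, toy_fix
)
print(final_text, final_scores, reason)
```

The sketch returns both the final text and the per-requirement scores, which mirrors the "Final Output + Report" pair in the diagram above.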

Using the API

The Input parameter

You can pass a string to AI21 Maestro as the input; it is treated as a single user message.

from ai21 import AI21Client

client = AI21Client(api_key="your-api-key")

# create_and_poll blocks until the run finishes or the default timeout is reached
client.beta.maestro.runs.create_and_poll(
    input="Explain quantum computing to a 10-year-old",
    requirements=[
        {
            "name": "reading_level",
            "description": "Use simple words appropriate for a 10-year-old"
        },
        {
            "name": "length",
            "description": "Keep explanation under 100 words"
        }
    ]
)

Alternatively, you can pass the input as an array of messages to support multi-turn conversations.

input=[
        {
            "role": "user",
            "content": "Explain quantum computing to a 10-year-old",
        },
        {
            "role": "assistant",
            "content": 'Quantum computing is like a super-smart computer that uses tiny things called "qubits" instead of regular bits. While regular bits are like tiny switches that can be off (0) or on (1), qubits can be both at the same time! This helps quantum computers solve really hard problems much faster than normal computers by trying many possibilities at once',
        },
        {
            "role": "user",
            "content": "Translate this to Spanish",
        },
    ],
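
The two input forms can be thought of as equivalent after normalization. The helper below is a local sketch of that idea, assuming only the message shape shown in this guide (a `role` and a `content` string per message); `normalize_input` is a hypothetical name and not part of the AI21 SDK.

```python
from typing import Dict, List, Union

Message = Dict[str, str]


def normalize_input(value: Union[str, List[Message]]) -> List[Message]:
    """Return the input as a list of messages (hypothetical helper).

    A bare string becomes a single user message; a list is checked for the
    'role' and 'content' keys used in this guide and returned as a copy.
    """
    if isinstance(value, str):
        return [{"role": "user", "content": value}]
    for index, message in enumerate(value):
        if not isinstance(message.get("role"), str):
            raise ValueError(f"message {index} is missing a 'role' string")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"message {index} is missing a 'content' string")
    return list(value)


single_turn = normalize_input("Explain quantum computing to a 10-year-old")
print(single_turn)
```

Running a check like this before submitting a run can surface a malformed conversation history locally instead of after a network round trip.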

Working with Requirements

Writing Effective Requirements

Good Requirements:

requirements = [
    {
        "name": "word_count",
        "description": "Response must be between 150 and 200 words"
    },
    {
        "name": "json_format",
        "description": "Output must be valid JSON with 'title' and 'content' fields"
    },
    {
        "name": "no_technical_jargon",
        "description": "Avoid technical terms; explain concepts in plain English"
    }
]
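
One informal test of whether a requirement is measurable is whether you could also check it locally. The sketch below does that for the `word_count` and `json_format` examples above; the function names and the whitespace-based word count are assumptions for illustration, and Maestro's own validation may count or parse differently.

```python
import json
from typing import Sequence


def check_word_count(text: str, low: int = 150, high: int = 200) -> bool:
    """True if the whitespace-separated word count is within [low, high]."""
    return low <= len(text.split()) <= high


def check_json_fields(text: str, fields: Sequence[str] = ("title", "content")) -> bool:
    """True if text parses as a JSON object containing every listed field."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(field in data for field in fields)


print(check_word_count("word " * 160))
print(check_json_fields('{"title": "Intro", "content": "Plain text"}'))
```

A requirement such as `no_technical_jargon` has no equally simple local check, which is where a model-based validator is most useful.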

Requirements to Avoid:

# Too vague
{"name": "good_quality", "description": "Make it good"}

# Contradictory
{"name": "short_and_detailed", "description": "Be brief but very detailed"}

# Unmeasurable
{"name": "creative", "description": "Be creative and original"}
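
Some of these problems can be caught before a run is submitted. The following is a rough lint sketch; `lint_requirements` and its `min_words` threshold are hypothetical, and a word-count heuristic only catches very short descriptions. It cannot detect contradictory or unmeasurable requirements, which still need human review.

```python
from typing import Dict, List


def lint_requirements(
    requirements: List[Dict[str, str]], min_words: int = 4
) -> List[str]:
    """Return human-readable problems found in a requirements list."""
    problems: List[str] = []
    seen_names = set()
    for index, requirement in enumerate(requirements):
        name = requirement.get("name", "")
        description = requirement.get("description", "")
        label = name or f"requirement #{index}"
        if not name:
            problems.append(f"{label}: missing name")
        elif name in seen_names:
            problems.append(f"{label}: duplicate name")
        seen_names.add(name)
        if not description.strip():
            problems.append(f"{label}: missing description")
        elif len(description.split()) < min_words:
            # Heuristic only: very short descriptions tend to be vague.
            problems.append(f"{label}: description may be too vague")
    return problems


issues = lint_requirements(
    [
        {"name": "good_quality", "description": "Make it good"},
        {"name": "length", "description": "Keep explanation under 100 words"},
    ]
)
print(issues)
```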

Requirement Categories

Format Requirements:

{
    "name": "markdown_format",
    "description": "Use proper markdown with headers, bullet points, and code blocks"
}

Content Requirements:

{
    "name": "include_examples",
    "description": "Provide at least 2 concrete examples for each concept"
}

Style Requirements:

{
    "name": "professional_tone",
    "description": "Use formal business language, avoid contractions and slang"
}

Technical Requirements:

{
    "name": "python_best_practices",
    "description": "Follow PEP 8 style guidelines and use type hints"
}

Understanding the Requirements Report

Enable detailed reporting by passing include=["requirements_result"]:

run = client.beta.maestro.runs.create_and_poll(
    input="Write a product review for a smartphone",
    requirements=[
        {"name": "word_count", "description": "Use 200-250 words"},
        {"name": "pros_and_cons", "description": "Include both pros and cons sections"},
        {"name": "rating", "description": "End with a 1-5 star rating. For example: (★★★★☆)"}
    ],
    include=["requirements_result"],
    budget="low"
)

print(f"Result: {run.result}")

# Analyze the results
# Single quotes inside the f-strings keep this valid on Python versions before 3.12
print(f"Overall Score: {run.requirements_result['score']}")
print(f"Completion Reason: {run.requirements_result['finish_reason']}")

print("Requirements Results:")
for req in run.requirements_result["requirements"]:
    print(f"  {req['name']}: {req['score']}")
    print(f"   Issue: {req['reason']}")

Sample Output Analysis

# Example output
Overall Score: 0.67
Completion Reason: Budget exhausted

word_count: 1.0
pros_and_cons: 1.0
rating: 0.6
  Issue: Rating format is '4 out of 5' instead of star format (★★★★☆)

This tells you:

  • 2 out of 3 requirements were perfectly met
  • The rating requirement needs refinement
  • You might need a higher budget or a clearer requirement
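
The same reading of the report can be automated. This sketch assumes the report is a dictionary with the keys used in the code above (`score`, `finish_reason`, and a `requirements` list of `name`/`score`/`reason` entries); the helper names are hypothetical, and the empty `reason` strings for the passing requirements are placeholders, since the sample output does not show them.

```python
from typing import Any, Dict, List


def unmet_requirements(
    report: Dict[str, Any], threshold: float = 1.0
) -> List[Dict[str, Any]]:
    """Return the requirement entries that scored below the threshold."""
    return [r for r in report.get("requirements", []) if r["score"] < threshold]


def summarize(report: Dict[str, Any]) -> str:
    """Build a one-line summary of which requirements still need work."""
    requirements = report.get("requirements", [])
    failing = unmet_requirements(report)
    met = len(requirements) - len(failing)
    line = f"{met}/{len(requirements)} requirements fully met"
    if failing:
        details = ", ".join(f"{r['name']} ({r['score']})" for r in failing)
        line += f"; needs work: {details}"
    return line


# Mirrors the sample output above; empty reasons are placeholders.
sample_report = {
    "score": 0.67,
    "finish_reason": "Budget exhausted",
    "requirements": [
        {"name": "word_count", "score": 1.0, "reason": ""},
        {"name": "pros_and_cons", "score": 1.0, "reason": ""},
        {
            "name": "rating",
            "score": 0.6,
            "reason": "Rating format is '4 out of 5' instead of star format",
        },
    ],
}

print(summarize(sample_report))
```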

Budget Control and Performance

Budget Levels Explained

# High Budget - Maximum reliability (~100 seconds for complex tasks)
run = client.beta.maestro.runs.create_and_poll(
    input=task,
    requirements=requirements,
    budget="high"
)

# Medium Budget - Balanced approach (~60 seconds)
run = client.beta.maestro.runs.create_and_poll(
    input=task,
    requirements=requirements,
    budget="medium"
)

# Low Budget - Improved reliability while favoring low latency (~20 seconds)
run = client.beta.maestro.runs.create_and_poll(
    input=task,
    requirements=requirements,
    budget="low"
)
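
One possible way to balance latency and reliability is to start with a low budget and escalate only when the report shows unmet requirements. The sketch below is an assumption about how you might structure that, not a documented Maestro feature: `run_with_escalation` is a hypothetical helper, and `run_fn` is any callable you supply that takes a budget name and returns a report dictionary with a `score` key.

```python
from typing import Any, Callable, Dict, Sequence, Tuple


def run_with_escalation(
    run_fn: Callable[[str], Dict[str, Any]],
    budgets: Sequence[str] = ("low", "medium", "high"),
    target: float = 1.0,
) -> Tuple[str, Dict[str, Any]]:
    """Try each budget in order and stop at the first report meeting target.

    Returns the last budget tried and its report, even if target was missed.
    """
    if not budgets:
        raise ValueError("budgets must not be empty")
    for budget in budgets:
        report = run_fn(budget)
        if report["score"] >= target:
            break
    return budget, report


# In real use, run_fn might wrap the call shown in this guide, for example:
#   run = client.beta.maestro.runs.create_and_poll(
#       input=task, requirements=requirements,
#       include=["requirements_result"], budget=budget)
#   return run.requirements_result
# The fake below stands in for that call so the sketch runs offline.
calls = []


def fake_run(budget: str) -> Dict[str, Any]:
    calls.append(budget)
    scores = {"low": 0.67, "medium": 1.0, "high": 1.0}
    return {"score": scores[budget]}


used_budget, final_report = run_with_escalation(fake_run)
print(used_budget, final_report)
```

Each escalation step is a separate run, so total latency and cost add up across attempts; starting directly at a higher budget may be preferable when you already expect the task to be difficult.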

Using Third-Party Models

run = client.beta.maestro.runs.create_and_poll(
    input=task,
    requirements=requirements,
    models=["gpt-4o"],  # Specify preferred model
    budget="high"
)