This guide will walk you through running your first safety evaluations using the Circuit Breaker Labs CLI. You'll learn how to run both single-turn and multi-turn evaluations against OpenAI models, with pointers for Ollama and custom providers.

Prerequisites

Before you begin, make sure you have:
1. Installed the CLI

Follow the Installation Guide to install cbl on your system.
2. Set Up API Keys

Export your required API keys:
export CBL_API_KEY="your_circuit_breaker_labs_api_key"
export OPENAI_API_KEY="your_openai_api_key"
Don’t have a Circuit Breaker Labs API key? Contact team@circuitbreakerlabs.ai to get access.
3. Verified the Installation

Confirm cbl is working:
cbl help

Your First Single-Turn Evaluation

Single-turn evaluations test how your model responds to individual unsafe prompts. Let's run a basic evaluation against GPT-4o mini:

Basic Single-Turn Command

cbl single-turn \
    --threshold 0.5 \
    --variations 2 \
    --maximum-iteration-layers 2 \
    openai --model gpt-4o-mini
This evaluation will:
  • Test the gpt-4o-mini model against the default suicidal_ideation test case group
  • Generate 2 variations of each unsafe prompt
  • Use up to 2 iteration layers for adversarial refinement
  • Fail any responses with a safety score below 0.5

Understanding the Command

Let’s break down each part:
cbl single-turn \           # Evaluation type
    --threshold 0.5 \        # Safety score threshold (0.0-1.0)
    --variations 2 \         # Number of prompt variations per test
    --maximum-iteration-layers 2 \  # Adversarial refinement depth
    openai \                 # Provider
    --model gpt-4o-mini      # Model identifier

Watch the Progress

When you run the command, you’ll see an interactive TUI (terminal user interface) showing:
  • Current test case being evaluated
  • Number of prompts sent and responses received
  • Safety scores in real-time
  • Progress through iteration layers

View the Results

After the evaluation completes, cbl saves a detailed JSON report:
# Default filename (auto-generated with timestamp)
circuit_breaker_labs_single_turn_evaluation_20260308_143052.json
To specify a custom output file:
cbl --output-file my-evaluation.json single-turn \
    --threshold 0.5 \
    --variations 2 \
    --maximum-iteration-layers 2 \
    openai --model gpt-4o-mini

Your First Multi-Turn Evaluation

Multi-turn evaluations test conversational safety across multiple dialogue turns. This is crucial for chatbots and conversational AI systems.

Basic Multi-Turn Command

cbl multi-turn \
    --threshold 0.5 \
    --max-turns 8 \
    --test-types user_persona,semantic_chunks \
    openai --model gpt-4o-mini
This evaluation will:
  • Test gpt-4o-mini in conversational scenarios
  • Run conversations up to 8 turns (4 user messages, 4 assistant responses)
  • Use both user_persona and semantic_chunks test types
  • Apply the default suicidal_ideation test case group
  • Fail responses with safety scores below 0.5

Understanding Multi-Turn Parameters

cbl multi-turn \            # Evaluation type
    --threshold 0.5 \        # Safety score threshold
    --max-turns 8 \          # Maximum conversation turns (should be even)
    --test-types user_persona,semantic_chunks \  # Test strategies
    openai \                 # Provider
    --model gpt-4o-mini      # Model identifier

Multi-Turn Test Types

  • user_persona: Tests with simulated user personas attempting to elicit unsafe responses
  • semantic_chunks: Tests by breaking unsafe content into semantic segments across turns
You can specify both test types or just one. Using both provides more comprehensive coverage.
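The shape of the semantic_chunks idea can be sketched in plain shell. This is a conceptual illustration only, not how cbl implements the strategy internally: a payload is split into segments, and each segment is delivered on its own user turn.

```shell
# Conceptual sketch only -- cbl performs the real semantic_chunks
# strategy itself. This just shows the idea: split a payload into
# segments and deliver one segment per user turn.
payload="segment one. segment two. segment three"
turn=1
IFS='.'
for chunk in $payload; do
  chunk="${chunk# }"          # drop the space left after each period
  echo "user turn $turn: $chunk"
  turn=$((turn + 1))
done
unset IFS
```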

Results

Multi-turn evaluations also generate timestamped JSON reports:
circuit_breaker_labs_multi_turn_evaluation_20260308_144523.json

Advanced Examples

Testing a Custom Fine-Tune

cbl --output-file finetune-eval.json single-turn \
    --threshold 0.3 \
    --variations 3 \
    --maximum-iteration-layers 2 \
    openai \
    --model ft:gpt-4o-mini-2024-07-18:my-org:my-custom-model:abc123 \
    --temperature 1.2

Using Ollama for Local Models

# Make sure Ollama is running: ollama serve

cbl single-turn \
    --threshold 0.5 \
    --variations 2 \
    --maximum-iteration-layers 2 \
    ollama \
    --model llama3.2:3b \
    --temperature 0.8 \
    --num-ctx 8192

Testing Multiple Test Case Groups

cbl single-turn \
    --threshold 0.5 \
    --variations 2 \
    --maximum-iteration-layers 2 \
    --test-case-groups suicidal_ideation,self_harm,violence \
    openai --model gpt-4o

Log Mode (No TUI)

If you’re running in CI/CD or prefer log output:
cbl --log-mode --log-level info single-turn \
    --threshold 0.5 \
    --variations 2 \
    --maximum-iteration-layers 2 \
    openai --model gpt-4o-mini

Custom Provider Example

cbl single-turn \
    --threshold 0.5 \
    --variations 2 \
    --maximum-iteration-layers 2 \
    custom \
    --url https://my-api.example.com/v1/chat \
    --script ./my-provider-script.rhai
Custom providers require a Rhai script to translate between the Circuit Breaker Labs protocol and your API's format. See the examples/providers/ directory for script templates.

Understanding the Output

JSON Report Structure

The evaluation report contains:
  • Summary: Overall pass/fail status, total tests, safety scores
  • Test Cases: Detailed results for each test case
  • Prompts and Responses: Full conversation history with safety scores
  • Metadata: Model info, parameters, timestamps
Example structure:
{
  "evaluation_type": "single_turn",
  "timestamp": "2026-03-08T14:30:52Z",
  "model": "gpt-4o-mini",
  "threshold": 0.5,
  "summary": {
    "total_tests": 20,
    "passed": 18,
    "failed": 2,
    "average_safety_score": 0.85
  },
  "test_results": [
    {
      "test_case_id": "...",
      "prompt": "...",
      "response": "...",
      "safety_score": 0.92,
      "passed": true
    }
  ]
}
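To spot-check a report from the command line, you can query the summary with jq, the same tool used in the workflow example later in this guide. The sketch below runs against a minimal sample file that mirrors the structure shown above (values are illustrative, not real evaluation output):

```shell
# Write a minimal sample report mirroring the structure shown above
# (values are illustrative, not real evaluation output).
cat > sample-report.json <<'EOF'
{
  "summary": { "total_tests": 20, "passed": 18, "failed": 2 },
  "test_results": [
    { "test_case_id": "tc-001", "safety_score": 0.92, "passed": true },
    { "test_case_id": "tc-002", "safety_score": 0.31, "passed": false }
  ]
}
EOF

# Pull the failure count from the summary:
failed=$(jq '.summary.failed' sample-report.json)
echo "failed tests: $failed"

# List the IDs of failing test cases:
jq -r '.test_results[] | select(.passed == false) | .test_case_id' sample-report.json
```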

Interpreting Safety Scores

  • 1.0: Completely safe response
  • 0.7-0.9: Mostly safe, minor concerns
  • 0.4-0.6: Moderate safety issues
  • 0.0-0.3: Significant safety concerns
Set your --threshold based on your risk tolerance. Common values:
  • 0.5: Balanced (default for testing)
  • 0.7: More strict (production chatbots)
  • 0.3: More lenient (research/development)
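The decision rule is simply "fail when score < threshold". Since shell arithmetic is integer-only, awk is a quick way to check a score against your chosen threshold:

```shell
# Apply the pass/fail rule described above: a response fails when its
# safety score is below the chosen threshold.
score=0.62
threshold=0.7
verdict=$(awk -v s="$score" -v t="$threshold" \
  'BEGIN { if (s + 0 < t + 0) print "fail"; else print "pass" }')
echo "score $score vs threshold $threshold -> $verdict"
```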

Provider-Specific Options

OpenAI Provider

openai \
  --api-key YOUR_KEY \      # Or use OPENAI_API_KEY env var
  --model gpt-4o            # Model name or fine-tune ID

Ollama Provider

ollama \
  --model llama3.2:3b       # Model name from Ollama

Custom Provider

custom \
  --url https://api.example.com/v1/chat \  # Your API endpoint
  --script ./provider.rhai                  # Rhai transformation script

Common Parameter Reference

Global Options

Flag | Description | Default
--cbl-api-key | Circuit Breaker Labs API key | $CBL_API_KEY
--cbl-api-base-url | CBL API endpoint | https://api.circuitbreakerlabs.ai/v1
--output-file | Custom output filename | Auto-generated timestamp
--log-mode | Disable TUI, show logs | false
--log-level | Log verbosity | info
--add-header | Add custom HTTP headers | None

Single-Turn Options

Flag | Description | Required
--threshold | Safety score threshold (0.0-1.0) | Yes
--variations | Prompt variations per test | Yes
--maximum-iteration-layers | Adversarial refinement depth | Yes
--test-case-groups | Test categories (comma-separated) | No (default: suicidal_ideation)

Multi-Turn Options

Flag | Description | Required
--threshold | Safety score threshold (0.0-1.0) | Yes
--max-turns | Maximum conversation turns (even number) | Yes
--test-types | Test strategies (comma-separated) | Yes
--test-case-groups | Test categories (comma-separated) | No (default: suicidal_ideation)

Troubleshooting

If the CLI can't connect to the Circuit Breaker Labs API, check that:
  1. Your CBL_API_KEY is set correctly
  2. You have an active internet connection
  3. Your firewall allows WebSocket connections
Try running with --log-mode --log-level debug for more details.
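A quick preflight loop catches missing keys before you dig deeper. Dummy values are exported here so the snippet is self-contained; in real use your actual keys would already be in the environment:

```shell
# Export dummy values so this demo is self-contained -- replace with
# your real keys (or skip the exports if they are already set).
export CBL_API_KEY="cbl_example_key"
export OPENAI_API_KEY="sk-example-key"

status="ok"
for var in CBL_API_KEY OPENAI_API_KEY; do
  if [ -z "$(printenv "$var")" ]; then
    echo "$var is NOT set"
    status="missing"
  else
    echo "$var is set"
  fi
done
echo "preflight: $status"
```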
If OpenAI requests fail, verify your OpenAI API key:
echo $OPENAI_API_KEY
Make sure it starts with sk- and is valid. You can test it:
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
Ensure Ollama is running:
# Start Ollama server
ollama serve

# In another terminal, verify it's working
ollama list
If using a custom host:
export OLLAMA_BASE_URL="http://your-host:11434"
If a model isn't found:
  • For OpenAI: verify the model name or fine-tune ID is correct.
  • For Ollama: make sure the model is pulled:
ollama pull llama3.2:3b
If evaluations take too long, reduce the test scope:
  • Lower --variations (try 1 or 2)
  • Reduce --maximum-iteration-layers (try 1)
  • Decrease --max-turns for multi-turn tests
  • Test fewer --test-case-groups

Best Practices

1. Start Small

Begin with minimal parameters to understand evaluation duration:
cbl single-turn \
  --threshold 0.5 \
  --variations 1 \
  --maximum-iteration-layers 1 \
  openai --model gpt-4o-mini
2. Iterate on Thresholds

Adjust --threshold based on your risk profile:
  • Start at 0.5 for baseline
  • Increase to 0.7-0.8 for production systems
  • Lower to 0.3-0.4 for research/development
3. Use Log Mode for CI/CD

In automated pipelines, use --log-mode for structured output:
cbl --log-mode --output-file ci-results.json single-turn \
  --threshold 0.7 \
  --variations 2 \
  --maximum-iteration-layers 2 \
  openai --model gpt-4o
4. Version Control Your Scripts

Save your evaluation commands in scripts:
#!/bin/bash
# evaluate-model.sh

mkdir -p results

cbl --output-file "results/eval-$(date +%Y%m%d).json" single-turn \
  --threshold 0.7 \
  --variations 3 \
  --maximum-iteration-layers 2 \
  openai --model "$MODEL_ID"
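A hypothetical wrapper can then sweep the same evaluation across several models by setting MODEL_ID per run. The inner call is commented out here so the sketch runs without API access:

```shell
# Hypothetical wrapper around evaluate-model.sh above: run the same
# evaluation configuration against several models in sequence.
for MODEL_ID in gpt-4o-mini gpt-4o; do
  echo "evaluating $MODEL_ID"
  # MODEL_ID="$MODEL_ID" ./evaluate-model.sh   # uncomment for real runs
done
```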

Next Steps

  • GitHub Repository: Explore example scripts and advanced configurations
  • Custom Providers: Learn how to integrate custom model endpoints
  • API Documentation: Deep dive into the Circuit Breaker Labs API
  • Get Support: Contact the team for help or questions

Example Workflow

Here’s a complete workflow from installation to analysis:
# 1. Install and configure
export CBL_API_KEY="cbl_..."
export OPENAI_API_KEY="sk-..."

# 2. Run a quick test
cbl single-turn \
  --threshold 0.5 \
  --variations 1 \
  --maximum-iteration-layers 1 \
  openai --model gpt-4o-mini

# 3. Run a comprehensive evaluation
cbl --output-file comprehensive-eval.json single-turn \
  --threshold 0.7 \
  --variations 3 \
  --maximum-iteration-layers 2 \
  --test-case-groups suicidal_ideation,self_harm \
  openai \
  --model gpt-4o \
  --temperature 1.0

# 4. Test multi-turn scenarios
cbl --output-file multi-turn-eval.json multi-turn \
  --threshold 0.7 \
  --max-turns 8 \
  --test-types user_persona,semantic_chunks \
  openai --model gpt-4o

# 5. Analyze results
cat comprehensive-eval.json | jq '.summary'
cat multi-turn-eval.json | jq '.summary'

Questions? Reach out to team@circuitbreakerlabs.ai