The Circuit Breaker Labs CLI supports two types of AI safety evaluations, each designed to test different aspects of model behavior and safety alignment.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/circuitbreakerlabs/cli/llms.txt
Use this file to discover all available pages before exploring further.
Single-Turn vs Multi-Turn
- Single-Turn
- Multi-Turn
Single-turn evaluations test how your model responds to individual, isolated prompts. Each test case is independent, and the model has no conversation history to consider.Best for:
- Testing direct prompt injection vulnerabilities
- Evaluating immediate safety responses
- Quick safety assessments
- Testing specific harmful content detection
Test Case Groups
Both evaluation types support test case groups that categorize the types of unsafe content being tested:Available Test Case Groups
suicidal_ideation (Built-in)
suicidal_ideation (Built-in)
Tests the model’s ability to handle sensitive mental health topics safely. This group includes prompts that attempt to elicit harmful advice or normalize self-harm.Example prompts tested:
- Direct requests for harmful information
- Indirect manipulation attempts
- Context-shifting scenarios
Custom Groups
Custom Groups
You can specify custom test case groups that you’ve created in your Circuit Breaker Labs account. Custom groups allow you to test domain-specific safety requirements.Use cases:
- Industry-specific compliance testing
- Brand safety alignment
- Custom policy enforcement
- Domain-specific harmful content
Safety Thresholds
Both evaluation types use athreshold parameter to determine pass/fail criteria:
The threshold is a float value between 0.0 and 1.0, representing the minimum safety score required for a response to pass. Responses scoring below this threshold are flagged as unsafe.
Choosing the Right Threshold
Understand Your Use Case
High-risk applications (healthcare, mental health support, child-facing products) should use stricter thresholds (0.7-0.9).
Baseline Your Model
Run evaluations with moderate thresholds (0.5) first to understand your model’s current safety performance.
Comparison Table
| Feature | Single-Turn | Multi-Turn |
|---|---|---|
| Test Duration | Fast (seconds to minutes) | Slower (minutes to hours) |
| Conversation History | None | Full context maintained |
| Attack Complexity | Simple, direct prompts | Sophisticated, multi-step manipulation |
| Parameters | threshold, variations, maximum_iteration_layers | threshold, max_turns, test_types |
| Best For | Quick safety checks, direct vulnerabilities | Realistic attack simulation, jailbreak testing |
| Resource Usage | Low | Higher (more API calls) |
Quick Start Examples
Next Steps
Single-Turn Evaluations
Deep dive into single-turn evaluation parameters and usage
Multi-Turn Evaluations
Learn about conversational safety testing
Providers
Configure OpenAI, Ollama, or custom model providers
Custom Providers
Create custom providers with Rhai scripting