The OpenAI provider enables you to run evaluations against OpenAI's models, including GPT-4o, GPT-4 Turbo, and GPT-3.5 Turbo.

Prerequisites

Before using the OpenAI provider, you need:
  1. An OpenAI API key
  2. The OPENAI_API_KEY environment variable set:
export OPENAI_API_KEY="sk-..."

Basic Usage

cbl single-turn \
    --threshold 0.5 \
    --variations 2 \
    --maximum-iteration-layers 2 \
    openai --model gpt-4o

Configuration Options

Required Options

--model
string
required
OpenAI model name to use for evaluations. Examples: gpt-4o, gpt-4-turbo, gpt-3.5-turbo, gpt-4o-mini
--api-key
string
required
OpenAI API key for authentication. Environment variable: OPENAI_API_KEY
The API key can be provided via the OPENAI_API_KEY environment variable instead of passing it as a flag.

Optional Options

--base-url
string
default:"https://api.openai.com/v1"
OpenAI API base URL for compatible endpoints. Use this to connect to OpenAI-compatible services or custom deployments. Environment variable: OPENAI_BASE_URL
--org-id
string
OpenAI organization ID for API requests. Environment variable: OPENAI_ORG_ID
--temperature
float
Sampling temperature between 0 and 2. Higher values make output more random; lower values make it more deterministic. Range: 0.0 to 2.0
--top-p
float
Nucleus sampling parameter. An alternative to sampling with temperature. Range: 0.0 to 1.0
--max-completion-tokens
integer
Upper bound for the number of tokens that can be generated for a completion.
--n
integer
Number of chat completion choices to generate for each input message.
--frequency-penalty
float
Number between -2.0 and 2.0 to penalize new tokens based on their existing frequency in the text. Range: -2.0 to 2.0
--presence-penalty
float
Number between -2.0 and 2.0 to penalize new tokens based on whether they appear in the text so far. Range: -2.0 to 2.0
--logprobs
boolean
Whether to return log probabilities of the output tokens.
--top-logprobs
integer
Number of most likely tokens to return at each token position, each with an associated log probability. Range: 0 to 20
--stop
string
Up to 4 sequences where the API will stop generating further tokens. Use comma-separated values for multiple sequences. Example: --stop "\n,END,STOP"
--logit-bias
string
Modify the likelihood of specified tokens appearing in the completion. Format: token_id:bias_value,token_id:bias_value. Range: bias values must be between -100 and 100. Example: --logit-bias "1234:50,5678:-30"
--store
boolean
Whether to store the output of this chat completion request for model distillation or evaluation purposes.
--service-tier
string
Specifies the processing type used for serving the request. Options: auto, default, flex, scale, priority
--reasoning-effort
string
Constrains effort on reasoning for reasoning models like o1. Options: none, minimal, low, medium, high, xhigh

Examples

Basic Single-Turn Evaluation

cbl single-turn \
    --threshold 0.5 \
    --variations 2 \
    --maximum-iteration-layers 2 \
    openai --model gpt-4o

Multi-Turn Evaluation with Custom Temperature

cbl multi-turn \
    --threshold 0.5 \
    --max-turns 8 \
    --test-types user_persona,semantic_chunks \
    openai \
    --model gpt-4-turbo \
    --temperature 0.7

Using a Custom Fine-Tuned Model

export MY_FINETUNE_ID="ft:gpt-3.5-turbo:my-org:custom_suffix:id"

cbl single-turn \
    --threshold 0.3 \
    --variations 3 \
    openai \
    --model $MY_FINETUNE_ID \
    --temperature 1.2
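
Inspecting Token Log Probabilities

When debugging model confidence, the --logprobs and --top-logprobs options described above can be combined. A hedged sketch, assuming the boolean --logprobs flag is passed bare and using illustrative threshold and variation values:

```shell
# Request log probabilities for the 5 most likely tokens at each position.
# Assumes OPENAI_API_KEY is already exported; values are illustrative.
cbl single-turn \
    --threshold 0.5 \
    --variations 2 \
    openai \
    --model gpt-4o-mini \
    --logprobs \
    --top-logprobs 5
```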

Using an OpenAI-Compatible Endpoint

export OPENAI_BASE_URL="https://my-custom-endpoint.com/v1"

cbl single-turn \
    --threshold 0.5 \
    openai \
    --model my-custom-model \
    --base-url $OPENAI_BASE_URL

Advanced Configuration with Multiple Parameters

cbl multi-turn \
    --threshold 0.4 \
    --max-turns 10 \
    openai \
    --model gpt-4o \
    --temperature 0.8 \
    --top-p 0.95 \
    --max-completion-tokens 2000 \
    --frequency-penalty 0.5 \
    --presence-penalty 0.3 \
    --stop "END,STOP"
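
Reasoning Model with Reasoning Effort

For reasoning models such as o1, the --reasoning-effort option controls how much reasoning the model performs; note that reasoning models generally do not accept sampling parameters like --temperature. A hedged sketch (the model name and effort level are illustrative):

```shell
# Evaluate a reasoning model; --reasoning-effort applies only to
# reasoning models like o1. Assumes OPENAI_API_KEY is exported.
cbl single-turn \
    --threshold 0.5 \
    --variations 2 \
    openai \
    --model o1 \
    --reasoning-effort medium
```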

Supported Models

The OpenAI provider supports all OpenAI chat completion models, including:
  • GPT-4o: Latest multimodal flagship model
  • GPT-4o-mini: Smaller, faster GPT-4o variant
  • GPT-4 Turbo: High-performance GPT-4 variant
  • GPT-4: Original GPT-4 model
  • GPT-3.5 Turbo: Fast and cost-effective model
  • o1: Reasoning model series (use with --reasoning-effort)
  • Custom fine-tuned models: Any fine-tuned model based on supported base models
For the most up-to-date list of available models and their capabilities, see the OpenAI Models documentation.

Environment Variables

The following environment variables are supported:
Variable          Description                   Required
OPENAI_API_KEY    Your OpenAI API key           Yes
OPENAI_BASE_URL   Custom API endpoint URL       No
OPENAI_ORG_ID     Your OpenAI organization ID   No
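
In a shell session, these variables can be set like this (the base URL and organization ID values are placeholders):

```shell
export OPENAI_API_KEY="sk-..."                            # required
export OPENAI_BASE_URL="https://my-proxy.example.com/v1"  # optional
export OPENAI_ORG_ID="org-..."                            # optional
```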

Tips

  • Rate Limits: Be aware of your OpenAI account's rate limits when running evaluations with many variations or iterations.
  • Temperature Selection: For consistent evaluation results, use lower temperature values (0.0-0.3). For more creative or diverse outputs, use higher values (0.7-1.0).
  • Cost Optimization: Use gpt-4o-mini or gpt-3.5-turbo for faster, more cost-effective evaluations during development, then switch to gpt-4o or gpt-4-turbo for final validation.