Create Chat Completion

POST /v1/chat/completions

Creates a new chat completion using the specified model and messages. The Pegasi API automatically handles:

Intelligent Model Routing: Dynamically routes requests to the optimal AI model based on the query type, cost constraints, and quality requirements.
Built-in Guardrails: Enforces content safety, factual accuracy, and compliance with usage policies.
Quality Autocorrection: Automatically detects and corrects hallucinations and quality issues in model outputs.

Authentication

Type: Bearer Token

Request Body

Field	Type	Required	Description
model	string	Yes	The model to use
messages	array	Yes	Array of message objects
quality_risk_threshold	number	No	Hallucination risk threshold (0-1). Responses whose detected risk exceeds this value may be flagged or autocorrected. Default: 0.7
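
The fields in the table can be assembled with a small payload builder. This is a minimal sketch using only the standard library: build_chat_body is a hypothetical helper name, not part of any Pegasi SDK, and the range check simply mirrors the documented 0-1 bounds. Treating an empty messages list as missing is an assumption.

```python
import json


def build_chat_body(model, messages, quality_risk_threshold=None):
    """Assemble a JSON-serializable request body from the documented fields.

    Hypothetical helper for illustration; not part of any Pegasi SDK.
    """
    if not model:
        raise ValueError("model is required")
    # Assumption: an empty list is treated the same as a missing field.
    if not messages:
        raise ValueError("messages is required")
    body = {"model": model, "messages": list(messages)}
    if quality_risk_threshold is not None:
        # The documented range is 0-1; reject anything outside it early.
        if not 0 <= quality_risk_threshold <= 1:
            raise ValueError("quality_risk_threshold must be between 0 and 1")
        body["quality_risk_threshold"] = quality_risk_threshold
    return body


body = build_chat_body(
    "pegasi-1",
    [{"role": "user", "content": "Hello!"}],
    quality_risk_threshold=0.5,
)
print(json.dumps(body, indent=2))
```

Leaving quality_risk_threshold unset omits the key entirely, so the server-side default (0.7 according to the table) would apply.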

Example Request

curl -X POST https://sandbox-api.pegasi.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"pegasi-1","messages":[{"role":"user","content":"Hello!"}]}'
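
For readers who prefer Python without extra dependencies, the curl call above can be approximated with the standard library. This is a sketch, not an official client: build_request is a hypothetical helper, and the request is only constructed here, not sent, so the snippet runs without a token or network access.

```python
import json
import urllib.request

API_URL = "https://sandbox-api.pegasi.ai/v1/chat/completions"


def build_request(token, body):
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # Bearer token authentication
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "YOUR_TOKEN",
    {"model": "pegasi-1", "messages": [{"role": "user", "content": "Hello!"}]},
)

# To actually send it (requires a valid token and network access):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```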

Using OpenAI's Python SDK

You can use the OpenAI Python SDK to call this endpoint with minimal changes. Simply set the base_url to Pegasi's endpoint and use your Pegasi API key.

Basic Usage

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PEGASI_KEY",  # Use your Pegasi API key
    base_url="https://sandbox-api.pegasi.ai/v1",  # Pegasi API base URL
)

chat_completion = client.chat.completions.create(
    model="router",
    messages=[
        {"role": "user", "content": "Net income of JPM 1H25?"},
    ],
)
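
The call returns a completion object. Assuming the endpoint returns an OpenAI-compatible response (which the SDK compatibility suggests, though this page does not spell out the response schema), the reply text sits at choices[0].message.content. The sketch below uses a stand-in object with made-up content instead of a live call so that it runs offline; first_reply_text is a hypothetical helper.

```python
from types import SimpleNamespace


def first_reply_text(completion):
    """Return the text of the first choice, or None if there are no choices."""
    if not completion.choices:
        return None
    return completion.choices[0].message.content


# Stand-in mimicking the shape of the SDK's completion object (made-up content).
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(role="assistant", content="Hi there!")
        )
    ]
)
print(first_reply_text(fake_completion))
```

With a real client, pass the chat_completion returned by client.chat.completions.create(...) instead of the stand-in.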

Routing Across Models

To route across a subset of models, list the allowed models in the models array:

chat_completion = client.chat.completions.create(
    model="router",
    messages=[{"role": "user", "content": "Net income of JPM 1H25?"}],
    extra_body={
        "models": ["gpt-4.1-mini", "claude-4"]
    }
)

Cost and Willingness to Pay

Set a maximum cost and the amount you are willing to pay for a 10% improvement in model quality:

chat_completion = client.chat.completions.create(
    model="router",
    messages=[{"role": "user", "content": "Hello world!"}],
    extra_body={
        "models": ["gpt-4.1-mini", "claude-4"],
        "max_cost": 0.02,
        "willingness_to_pay": 0.01
    },
)

Attaching Metadata

Attach metadata (e.g., user, region) to a request under the extra key:

chat_completion = client.chat.completions.create(
    model="router",
    messages=[{"role": "user", "content": "Summarize earnings transcripts"}],
    extra_body={
        "models": ["gpt-4.1-mini", "claude-4"],
        "max_cost": 0.02,
        "willingness_to_pay": 0.01,
        "quality_risk_threshold": 0.7,
        "extra": {
            "ip": "123.123.123.123",
            "Timezone": "UTC+0",
            "Country": "US",
            "City": "New York",
        }
    },
)

quality_risk_threshold: Set this parameter (0–1) to control hallucination detection and autocorrection. Responses with a detected risk above this threshold may be flagged or autocorrected automatically.
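
The routing options used in the examples above can be collected in one place. This is a minimal sketch, assuming that unset options should simply be left out of the request; build_extra_body is a hypothetical helper, and its parameter names are the keys used in the examples above.

```python
def build_extra_body(models=None, max_cost=None, willingness_to_pay=None,
                     quality_risk_threshold=None, extra=None):
    """Collect the Pegasi-specific options, skipping any left unset.

    Hypothetical helper for illustration; not part of the OpenAI or Pegasi SDKs.
    """
    options = {
        "models": models,
        "max_cost": max_cost,
        "willingness_to_pay": willingness_to_pay,
        "quality_risk_threshold": quality_risk_threshold,
        "extra": extra,
    }
    # Keep only the options the caller actually provided.
    return {key: value for key, value in options.items() if value is not None}


extra_body = build_extra_body(
    models=["gpt-4.1-mini", "claude-4"],
    max_cost=0.02,
    extra={"Country": "US"},
)
print(extra_body)
```

The result can be passed as extra_body=extra_body to client.chat.completions.create(...). The OpenAI Python SDK merges extra_body keys into the top level of the JSON request body, which is how these fields reach the endpoint alongside model and messages.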
