Create Chat Completion
POST /v1/chat/completions
Creates a new chat completion using the specified model and messages. The Pegasi API automatically handles:
- Intelligent Model Routing: Dynamically routes requests to the optimal AI model based on the query type, cost constraints, and quality requirements.
- Built-in Guardrails: Enforces content safety, factual accuracy, and compliance with usage policies.
- Quality Autocorrection: Automatically detects and corrects hallucinations and quality issues in model outputs.
Authentication
- Type: Bearer Token
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The model to use |
| messages | array | Yes | Array of message objects |
| quality_risk_threshold | number | No | Hallucination risk threshold (0-1). Responses scoring above this threshold may be flagged or autocorrected. Default: 0.7 |
Example Request
```bash
curl -X POST https://sandbox-api.pegasi.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"pegasi-1","messages":[{"role":"user","content":"Hello!"}]}'
```
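For reference, the same request body can be assembled in Python before sending. This is a minimal sketch that only builds the JSON payload and headers shown in the curl example above; actually sending it (e.g. with `urllib.request` or `requests`) is left to the caller:

```python
import json

# Same payload as the curl example above.
payload = {
    "model": "pegasi-1",
    "messages": [{"role": "user", "content": "Hello!"}],
}

headers = {
    "Authorization": "Bearer YOUR_TOKEN",  # your Pegasi API key
    "Content-Type": "application/json",
}

# The request body as sent on the wire.
body = json.dumps(payload)
```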
Using OpenAI's Python SDK
You can use the OpenAI Python SDK to call this endpoint with minimal changes: set `base_url` to Pegasi's endpoint and pass your Pegasi API key.
Basic Usage
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PEGASI_KEY",  # Use your Pegasi API key
    base_url="https://sandbox-api.pegasi.ai/v1",  # Pegasi API base URL
)

chat_completion = client.chat.completions.create(
    model="router",
    messages=[
        {"role": "user", "content": "Net income of JPM 1H25?"},
    ],
)
```
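The response follows the OpenAI chat-completions shape, so the reply text lives at `choices[0].message.content`. A small helper for pulling it out (the `extract_reply` name is ours, not part of either SDK):

```python
def extract_reply(completion) -> str:
    """Return the assistant's text from an OpenAI-style chat completion."""
    return completion.choices[0].message.content

# With the `chat_completion` object from above:
# print(extract_reply(chat_completion))
```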
Routing Across Models
To route across a subset of models, pass the array of allowed models in `extra_body`:
```python
chat_completion = client.chat.completions.create(
    model="router",
    messages=[{"role": "user", "content": "Net income of JPM 1H25?"}],
    extra_body={
        "models": ["gpt-4.1-mini", "claude-4"]
    },
)
```
Cost and Willingness to Pay
Specify a maximum cost per request and how much extra you are willing to pay for a 10% improvement in model quality:
```python
chat_completion = client.chat.completions.create(
    model="router",
    messages=[{"role": "user", "content": "Hello world!"}],
    extra_body={
        "models": ["gpt-4.1-mini", "claude-4"],
        "max_cost": 0.02,
        "willingness_to_pay": 0.01,
    },
)
```
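If you build these routing options programmatically, a small validation helper can catch bad values before the request is sent. This is our own sketch (`make_routing_options` is not part of the API) and assumes costs are positive dollar amounts:

```python
def make_routing_options(models, max_cost, willingness_to_pay):
    """Build the extra_body dict for router requests, with basic sanity checks."""
    if not models:
        raise ValueError("models must be a non-empty list")
    if max_cost <= 0:
        raise ValueError("max_cost must be positive")
    if willingness_to_pay < 0:
        raise ValueError("willingness_to_pay cannot be negative")
    return {
        "models": list(models),
        "max_cost": max_cost,
        "willingness_to_pay": willingness_to_pay,
    }

# extra_body for the request above:
opts = make_routing_options(["gpt-4.1-mini", "claude-4"], 0.02, 0.01)
```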
Attaching Metadata
Attach metadata (e.g., user, region) to each request:
```python
chat_completion = client.chat.completions.create(
    model="router",
    messages=[{"role": "user", "content": "Summarize earnings transcripts"}],
    extra_body={
        "models": ["gpt-4.1-mini", "claude-4"],
        "max_cost": 0.02,
        "willingness_to_pay": 0.01,
        "quality_risk_threshold": 0.7,
        "extra": {
            "ip": "123.123.123.123",
            "timezone": "UTC+0",
            "country": "US",
            "city": "New York",
        },
    },
)
```
`quality_risk_threshold`: set this parameter (0-1) to control hallucination detection and autocorrection. Responses whose detected risk exceeds the threshold may be flagged or autocorrected automatically.
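Since out-of-range thresholds are easy to introduce from config files, a tiny guard can validate the value before use. This is a local helper of our own, not an SDK function:

```python
def check_risk_threshold(value: float) -> float:
    """Validate a quality_risk_threshold: must be a number in [0, 1]."""
    v = float(value)
    if not 0.0 <= v <= 1.0:
        raise ValueError(f"quality_risk_threshold must be in [0, 1], got {v}")
    return v
```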