Create Chat Completion

POST /v1/chat/completions

Creates a new chat completion using the specified model and messages. The Pegasi API automatically handles:

  • Intelligent Model Routing: Dynamically routes requests to the optimal AI model based on the query type, cost constraints, and quality requirements.
  • Built-in Guardrails: Enforces content safety, factual accuracy, and compliance with usage policies.
  • Quality Autocorrection: Automatically detects and corrects hallucinations and quality issues in model outputs.

Authentication

  • Type: Bearer Token

Request Body

Field | Type | Required | Description
model | string | Yes | The model to use
messages | array | Yes | Array of message objects
quality_risk_threshold | number | No | Hallucination risk threshold (0-1). Responses above this may be flagged or autocorrected. Default: 0.7
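
The fields in the table can be assembled and checked on the client before a request is sent. The following is a minimal sketch using only the Python standard library. `build_chat_request` is a hypothetical helper name, not part of any Pegasi SDK, and the client-side range check is an assumption based on the documented 0-1 range rather than on known server behavior.

```python
import json
import urllib.request

# Endpoint taken from the example request in this page.
API_URL = "https://sandbox-api.pegasi.ai/v1/chat/completions"


def build_chat_request(token, model, messages, quality_risk_threshold=None):
    """Build (but do not send) a POST request for the chat completions endpoint.

    Hypothetical helper for illustration only.
    """
    body = {"model": model, "messages": messages}
    if quality_risk_threshold is not None:
        # Assumption: reject values outside the documented 0-1 range locally.
        if not 0 <= quality_risk_threshold <= 1:
            raise ValueError("quality_risk_threshold must be between 0 and 1")
        body["quality_risk_threshold"] = quality_risk_threshold
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "YOUR_TOKEN",
    "pegasi-1",
    [{"role": "user", "content": "Hello!"}],
    quality_risk_threshold=0.5,
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Building the request separately from sending it keeps the payload easy to inspect and test without network access.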

Example Request

curl -X POST https://sandbox-api.pegasi.ai/v1/chat/completions \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"model":"pegasi-1","messages":[{"role":"user","content":"Hello!"}]}'

Using OpenAI's Python SDK

You can use the OpenAI Python SDK to call this endpoint with minimal changes. Simply set the base_url to Pegasi's endpoint and use your Pegasi API key.

Basic Usage

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PEGASI_KEY",  # Use your Pegasi API key
    base_url="https://sandbox-api.pegasi.ai/v1",  # Pegasi API base URL
)

chat_completion = client.chat.completions.create(
    model="router",
    messages=[
        {"role": "user", "content": "Net income of JPM 1H25?"},
    ],
)
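
To read the reply, the usual OpenAI SDK access pattern should apply. This page does not describe the response body, so the sketch below assumes it mirrors the OpenAI chat completion object (`choices[0].message.content`). `first_reply_text` is a hypothetical helper, and the stand-in object only imitates the attribute layout so the example runs without a network call.

```python
from types import SimpleNamespace


def first_reply_text(chat_completion):
    """Return the text of the first choice, or None if there are no choices."""
    choices = getattr(chat_completion, "choices", None) or []
    if not choices:
        return None
    return choices[0].message.content


# Stand-in with the same attribute layout as an SDK response object (assumed shape).
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(message=SimpleNamespace(role="assistant", content="Hi there"))
    ]
)

print(first_reply_text(fake_completion))  # prints "Hi there"
```

With a real response, the same helper would be called as `first_reply_text(chat_completion)`.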

Routing Across Models

To restrict routing to a subset of models, pass the allowed models as a list under the "models" key of extra_body:

chat_completion = client.chat.completions.create(
    model="router",
    messages=[{"role": "user", "content": "Net income of JPM 1H25?"}],
    extra_body={
        "models": ["gpt-4.1-mini", "claude-4"],
    },
)

Cost and Willingness to Pay

Set a maximum cost and the amount you are willing to pay for a 10% improvement in model quality:

chat_completion = client.chat.completions.create(
    model="router",
    messages=[{"role": "user", "content": "Hello world!"}],
    extra_body={
        "models": ["gpt-4.1-mini", "claude-4"],
        "max_cost": 0.02,
        "willingness_to_pay": 0.01,
    },
)

Attaching Metadata

Attach metadata (e.g., user, region) to each request:

chat_completion = client.chat.completions.create(
    model="router",
    messages=[{"role": "user", "content": "Summarize earnings transcripts"}],
    extra_body={
        "models": ["gpt-4.1-mini", "claude-4"],
        "max_cost": 0.02,
        "willingness_to_pay": 0.01,
        "quality_risk_threshold": 0.7,
        # Free-form request metadata
        "extra": {
            "ip": "123.123.123.123",
            "Timezone": "UTC+0",
            "Country": "US",
            "City": "New York",
        },
    },
)

quality_risk_threshold: Set this parameter (0–1) to control hallucination detection and autocorrection. Responses whose detected risk exceeds the threshold may be flagged or autocorrected. Default: 0.7.
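
The routing options used in the examples on this page can be gathered in one place so that unset options are simply left out of the request. This is a sketch under stated assumptions: `build_routing_options` is a hypothetical helper, the key names are copied from the examples, and the range check is a client-side convenience based on the documented 0–1 range.

```python
def build_routing_options(
    models=None,
    max_cost=None,
    willingness_to_pay=None,
    quality_risk_threshold=None,
    metadata=None,
):
    """Assemble an extra_body dict, omitting any option left as None.

    Hypothetical helper; key names follow the examples in this page.
    """
    if quality_risk_threshold is not None and not 0 <= quality_risk_threshold <= 1:
        # Assumption: validate locally against the documented 0-1 range.
        raise ValueError("quality_risk_threshold must be between 0 and 1")
    options = {
        "models": list(models) if models is not None else None,
        "max_cost": max_cost,
        "willingness_to_pay": willingness_to_pay,
        "quality_risk_threshold": quality_risk_threshold,
        "extra": dict(metadata) if metadata is not None else None,
    }
    # Drop unset options so the request carries only explicit choices.
    return {key: value for key, value in options.items() if value is not None}


extra_body = build_routing_options(
    models=["gpt-4.1-mini", "claude-4"],
    max_cost=0.02,
    quality_risk_threshold=0.7,
    metadata={"Country": "US"},
)
```

The resulting dict can be passed as `extra_body=extra_body` to `client.chat.completions.create(...)`, as in the examples on this page.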