## Create Evaluation

`POST /v1/evaluate`

Creates a new evaluation job.
### Authentication
- Type: Bearer Token
### Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| data | string | Yes | Evaluation input, formatted as: `<CONTEXT>...</CONTEXT>\n<USER INPUT>...</USER INPUT>\n<MODEL OUTPUT>...</MODEL OUTPUT>` |
| pass_criteria | string | Yes | The criteria that determine whether the output passes. |
| rubric | string | Yes | The rubric or scoring guidelines for the evaluation. |
### Example Request

```bash
curl -X POST https://sandbox-api.pegasi.ai/v1/evaluate \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "data": "<CONTEXT>\nAcme Q2 earnings: $1.2B\n</CONTEXT>\n<USER INPUT>\nWhat were Acme’s Q2 earnings?\n</USER INPUT>\n<MODEL OUTPUT>\nAcme reported $1.2 billion in Q2 earnings.\n</MODEL OUTPUT>",
    "pass_criteria": "Must mention the correct Q2 earnings figure.",
    "rubric": "The answer should be factually correct and reference the provided context."
  }'
```
### Example (Python, OpenAI SDK style)

```python
from openai import OpenAI

# Point the OpenAI-style client at the Pegasi sandbox API
client = OpenAI(
    api_key="YOUR_PEGASI_API_KEY",
    base_url="https://sandbox-api.pegasi.ai/v1"
)

data = """<CONTEXT>
Acme Q2 earnings: $1.2B
</CONTEXT>
<USER INPUT>
What were Acme's Q2 earnings?
</USER INPUT>
<MODEL OUTPUT>
Acme reported $1.2 billion in Q2 earnings.
</MODEL OUTPUT>
"""

resp = client.evaluations.create(
    data=data,
    pass_criteria="Must mention the correct Q2 earnings figure.",
    rubric="The answer should be factually correct and reference the provided context."
)
```
### Example `pass_criteria`

The `pass_criteria` parameter defines the conditions that determine whether an evaluation passes or fails. Criteria can range from a single requirement to complex, conditional rules.
#### Simple Criteria
Must mention the correct Q2 earnings figure of $1.2B.
#### Multi-point Criteria
- The answer must reference the exact earnings figure from the context ($1.2B).
- The answer must be directly responsive to the user's question.
- No additional, unsupported claims should be present.
- The response must use professional, clear language.
#### Conditional Criteria
The response passes if it:
1. Correctly states Acme's Q2 earnings as $1.2B
2. AND either:
a) Provides context about how this compares to previous quarters, OR
b) Explains what factors contributed to this result (if present in context)
3. AND does not include any factual errors
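Multi-point and conditional criteria like those above are passed to the API as a single string. Below is a minimal sketch of assembling such a request body in Python; it only constructs the JSON payload (no request is sent), and the field names come from the Request Body table above:

```python
import json

# Each bullet of a multi-point criterion becomes one line of the
# pass_criteria string.
criteria = "\n".join([
    "- The answer must reference the exact earnings figure from the context ($1.2B).",
    "- The answer must be directly responsive to the user's question.",
    "- No additional, unsupported claims should be present.",
])

payload = {
    "data": "<CONTEXT>\nAcme Q2 earnings: $1.2B\n</CONTEXT>\n"
            "<USER INPUT>\nWhat were Acme's Q2 earnings?\n</USER INPUT>\n"
            "<MODEL OUTPUT>\nAcme reported $1.2 billion in Q2 earnings.\n</MODEL OUTPUT>",
    "pass_criteria": criteria,
    "rubric": "The answer should be factually correct and reference the provided context.",
}

# Serialize for sending with any HTTP client, e.g. requests.post(...)
body = json.dumps(payload)
```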
### Example `rubric`

The `rubric` parameter provides scoring guidelines for evaluating responses. A well-structured rubric helps ensure consistent evaluation.
#### Basic Rubric
- **Factual Accuracy (Required):** The answer must be factually correct and reference the provided context.
- **Relevance (Required):** The answer must directly address the user's question.
- **Completeness:** The answer should provide all relevant details from the context.
- **No Hallucination:** The answer should not invent facts not present in the context.
#### Detailed Scoring Rubric
Score responses on a scale of 1-5 for each criterion:
1. **Factual Accuracy** (40% weight)
- 5: Perfect accuracy, with precise figures from context
- 3: Mostly accurate with minor imprecisions
- 1: Contains significant factual errors
2. **Relevance** (30% weight)
- 5: Directly addresses the specific question asked
- 3: Somewhat relevant but includes tangential information
- 1: Fails to address the question
3. **Completeness** (20% weight)
- 5: Includes all relevant information from context
- 3: Includes most key points but misses some details
- 1: Missing critical information
4. **Clarity** (10% weight)
- 5: Clear, concise, well-structured response
- 3: Somewhat clear but could be better organized
- 1: Confusing or poorly structured
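The weighted criteria above combine into one overall score by weighted average; a small sketch of the arithmetic (the per-criterion scores here are hypothetical, chosen only for illustration):

```python
# Weights from the detailed scoring rubric above.
weights = {"factual_accuracy": 0.40, "relevance": 0.30,
           "completeness": 0.20, "clarity": 0.10}

# Hypothetical 1-5 scores for one response.
scores = {"factual_accuracy": 5, "relevance": 5,
          "completeness": 3, "clarity": 5}

# Weighted average: 0.4*5 + 0.3*5 + 0.2*3 + 0.1*5 = 4.6
overall = sum(weights[c] * scores[c] for c in weights)
```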
#### Domain-Specific Example: Financial Reports
For earnings report questions:
- **Numerical Accuracy:** Must state the exact figures as presented in context
- **Temporal Precision:** Must correctly identify the time period (Q2, 1H25, etc.)
- **Company Identification:** Must correctly attribute figures to the right entity
- **Contextual Comparison:** Should include YoY or QoQ comparisons if present in context
- **Jargon Usage:** Should use appropriate financial terminology
## Create Slopsquatting Evaluation

`POST /v1/evaluate/security/slopsquatting`

Creates a security evaluation to detect hallucinated dependencies and slopsquatting vulnerabilities in AI-generated code.
**Security Alert:** Approximately 20% of AI-generated code contains hallucinated dependencies that don't exist in official repositories. Attackers are now registering these phantom packages on PyPI, npm, and other repositories to exploit organizations using AI coding assistants. This creates attack surfaces that bypass traditional security scanning tools.
### Authentication
- Type: Bearer Token
### Implementation Options

#### 1. MCP Integration (Recommended)

For code generation tools like Windsurf and Cursor, use the Model Context Protocol (MCP):

```json
{
  "mcp_version": "1.0",
  "components": [
    {
      "name": "pegasi_slopsquatting_detection",
      "type": "security_check",
      "endpoint": "https://sandbox-api.pegasi.ai/v1/evaluate/security/slopsquatting",
      "trigger": "on_code_generation",
      "settings": {
        "sensitivity": "high",
        "inline_feedback": true,
        "block_high_risk": true
      },
      "auth": {
        "type": "bearer",
        "env_var": "PEGASI_API_KEY"
      }
    }
  ]
}
```
#### 2. Direct API Call

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PEGASI_API_KEY",
    base_url="https://sandbox-api.pegasi.ai/v1"
)

response = client.evaluations.create(
    type="security/slopsquatting",
    content={
        # "tenserflow" and "matpotlib" are deliberate typos of
        # tensorflow and matplotlib
        "code": "import tenserflow as tf\nimport matpotlib.pyplot as plt",
        "language": "python"
    },
    settings={"sensitivity": "high"}
)
```
For complete documentation and additional integration options, see Supply Chain Security Evaluations.
### Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| content.code | string | Yes | The code snippet to analyze for slopsquatting vulnerabilities |
| content.language | string | Yes | Programming language of the code (e.g., "python", "javascript", "java") |
| content.context | string | No | Additional context about the code's purpose |
| settings.sensitivity | string | No | Detection sensitivity: "low", "medium", "high" (default: "medium") |
| settings.check_imports | boolean | No | Whether to check import statements (default: true) |
| settings.check_package_versions | boolean | No | Whether to check package versions (default: true) |
| settings.known_packages_db | string | No | Package database to use: "standard", "extended", "custom" (default: "standard") |
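Putting the fields above together, a request body might be constructed as follows. This sketch only builds the JSON (defaults are spelled out explicitly for clarity); it assumes you then POST it to the endpoint with any HTTP client and a Bearer token:

```python
import json

payload = {
    "content": {
        "code": "import tenserflow as tf",   # snippet to analyze
        "language": "python",
        "context": "data pipeline script",   # optional field
    },
    "settings": {
        "sensitivity": "high",               # "low" | "medium" | "high"
        "check_imports": True,               # default: true
        "check_package_versions": True,      # default: true
        "known_packages_db": "standard",     # "standard" | "extended" | "custom"
    },
}

# POST this body to /v1/evaluate/security/slopsquatting with an
# "Authorization: Bearer ..." header.
body = json.dumps(payload)
```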
### Example Response

```json
{
  "id": "eval_sec_7a9b3c2d1e",
  "created": 1686571085,
  "results": {
    "issues": [
      {
        "type": "slopsquatting",
        "package": "tenserflow",
        "suggestion": "tensorflow",
        "confidence": 0.98,
        "risk_level": "high",
        "description": "Package 'tenserflow' appears to be a typosquat of 'tensorflow', a popular machine learning library",
        "line": 1,
        "column": 8
      },
      {
        "type": "slopsquatting",
        "package": "matpotlib",
        "suggestion": "matplotlib",
        "confidence": 0.96,
        "risk_level": "high",
        "description": "Package 'matpotlib' appears to be a typosquat of 'matplotlib', a popular plotting library",
        "line": 2,
        "column": 8
      }
    ],
    "summary": {
      "total_issues": 2,
      "high_risk": 2,
      "medium_risk": 0,
      "low_risk": 0
    }
  }
}
```
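A caller will typically gate code generation on the returned issues. Below is a minimal sketch that walks a response shaped like the example above; the dict abbreviates the example response to just the fields used:

```python
# Abbreviated copy of the example response above.
response = {
    "results": {
        "issues": [
            {"package": "tenserflow", "suggestion": "tensorflow",
             "risk_level": "high", "confidence": 0.98},
            {"package": "matpotlib", "suggestion": "matplotlib",
             "risk_level": "high", "confidence": 0.96},
        ],
        "summary": {"total_issues": 2, "high_risk": 2,
                    "medium_risk": 0, "low_risk": 0},
    },
}

# Map each high-risk typosquat to the package it imitates.
fixes = {
    issue["package"]: issue["suggestion"]
    for issue in response["results"]["issues"]
    if issue["risk_level"] == "high"
}

# Refuse the generated code if anything high-risk was found,
# mirroring the MCP "block_high_risk" setting.
block = response["results"]["summary"]["high_risk"] > 0
```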
For more detailed information about slopsquatting detection and MCP integration for IDE tools, see Supply Chain Security Evaluations.