Create Evaluation

POST /v1/evaluate

Creates a new evaluation job.

Authentication

  • Type: Bearer Token

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `data` | string | Yes | Evaluation input, formatted as: `<CONTEXT>...</CONTEXT>\n<USER INPUT>...</USER INPUT>\n<MODEL OUTPUT>...</MODEL OUTPUT>` |
| `pass_criteria` | string | Yes | The criteria that determine whether the output passes. |
| `rubric` | string | Yes | The rubric or scoring guidelines for the evaluation. |
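Since the `data` field must follow the tagged format above, a small helper can assemble it. This is a hypothetical convenience function, not part of any SDK; it simply concatenates the three sections in the documented order:

```python
def build_eval_data(context: str, user_input: str, model_output: str) -> str:
    """Format the three evaluation sections into the `data` string the
    /v1/evaluate endpoint expects."""
    return (
        f"<CONTEXT>\n{context}\n</CONTEXT>\n"
        f"<USER INPUT>\n{user_input}\n</USER INPUT>\n"
        f"<MODEL OUTPUT>\n{model_output}\n</MODEL OUTPUT>"
    )

data = build_eval_data(
    "Acme Q2 earnings: $1.2B",
    "What were Acme's Q2 earnings?",
    "Acme reported $1.2 billion in Q2 earnings.",
)
```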

Example Request

```bash
curl -X POST https://sandbox-api.pegasi.ai/v1/evaluate \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "data": "<CONTEXT>\nAcme Q2 earnings: $1.2B\n</CONTEXT>\n<USER INPUT>\nWhat were Acme’s Q2 earnings?\n</USER INPUT>\n<MODEL OUTPUT>\nAcme reported $1.2 billion in Q2 earnings.\n</MODEL OUTPUT>",
    "pass_criteria": "Must mention the correct Q2 earnings figure.",
    "rubric": "The answer should be factually correct and reference the provided context."
  }'
```

Example (Python, OpenAI SDK style)

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PEGASI_API_KEY",
    base_url="https://sandbox-api.pegasi.ai/v1",
)

data = """<CONTEXT>
Acme Q2 earnings: $1.2B
</CONTEXT>

<USER INPUT>
What were Acme's Q2 earnings?
</USER INPUT>

<MODEL OUTPUT>
Acme reported $1.2 billion in Q2 earnings.
</MODEL OUTPUT>
"""

resp = client.evaluations.create(
    data=data,
    pass_criteria="Must mention the correct Q2 earnings figure.",
    rubric="The answer should be factually correct and reference the provided context.",
)
```

Example pass_criteria

The pass_criteria parameter defines the conditions that determine whether an evaluation passes or fails. This can range from simple to complex requirements.

Simple Criteria

Must mention the correct Q2 earnings figure of $1.2B.

Multi-point Criteria

- The answer must reference the exact earnings figure from the context ($1.2B).
- The answer must be directly responsive to the user's question.
- No additional, unsupported claims should be present.
- The response must use professional, clear language.

Conditional Criteria

The response passes if it:
1. Correctly states Acme's Q2 earnings as $1.2B
2. AND either:
   a) Provides context about how this compares to previous quarters, OR
   b) Explains what factors contributed to this result (if present in context)
3. AND does not include any factual errors
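Whatever their complexity, criteria are sent as a single string. A sketch of the conditional example above as a triple-quoted Python string, ready to pass as the `pass_criteria` argument:

```python
# Multi-line, conditional pass criteria are still just one string;
# a triple-quoted literal preserves the numbered structure.
pass_criteria = """The response passes if it:
1. Correctly states Acme's Q2 earnings as $1.2B
2. AND either:
   a) Provides context about how this compares to previous quarters, OR
   b) Explains what factors contributed to this result (if present in context)
3. AND does not include any factual errors"""
```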

Example rubric

The rubric parameter provides scoring guidelines for evaluating responses. A well-structured rubric helps ensure consistent evaluation.

Basic Rubric

- **Factual Accuracy (Required):** The answer must be factually correct and reference the provided context.
- **Relevance (Required):** The answer must directly address the user's question.
- **Completeness:** The answer should provide all relevant details from the context.
- **No Hallucination:** The answer should not invent facts not present in the context.

Detailed Scoring Rubric

Score responses on a scale of 1-5 for each criterion:

1. **Factual Accuracy** (40% weight)
   - 5: Perfect accuracy, with precise figures from context
   - 3: Mostly accurate with minor imprecisions
   - 1: Contains significant factual errors

2. **Relevance** (30% weight)
   - 5: Directly addresses the specific question asked
   - 3: Somewhat relevant but includes tangential information
   - 1: Fails to address the question

3. **Completeness** (20% weight)
   - 5: Includes all relevant information from context
   - 3: Includes most key points but misses some details
   - 1: Missing critical information

4. **Clarity** (10% weight)
   - 5: Clear, concise, well-structured response
   - 3: Somewhat clear but could be better organized
   - 1: Confusing or poorly structured
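To make the weighting concrete, here is a local illustration of how the four weights combine per-criterion scores (1-5) into an overall score. This is only a sketch; how the evaluation service actually aggregates rubric scores is not documented here:

```python
# Rubric weights from the detailed scoring rubric above.
WEIGHTS = {
    "factual_accuracy": 0.40,
    "relevance": 0.30,
    "completeness": 0.20,
    "clarity": 0.10,
}

def weighted_score(scores: dict) -> float:
    """Combine per-criterion scores (1-5) into a weighted overall score."""
    return sum(WEIGHTS[name] * score for name, score in scores.items())

overall = weighted_score({
    "factual_accuracy": 5,
    "relevance": 5,
    "completeness": 3,
    "clarity": 4,
})  # 0.4*5 + 0.3*5 + 0.2*3 + 0.1*4 = 4.5
```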

Domain-Specific Example: Financial Reports

For earnings report questions:

- **Numerical Accuracy:** Must state the exact figures as presented in context
- **Temporal Precision:** Must correctly identify the time period (Q2, 1H25, etc.)
- **Company Identification:** Must correctly attribute figures to the right entity
- **Contextual Comparison:** Should include YoY or QoQ comparisons if present in context
- **Jargon Usage:** Should use appropriate financial terminology

Create Slopsquatting Evaluation

POST /v1/evaluate/security/slopsquatting

Creates a security evaluation to detect hallucinated dependencies and slopsquatting vulnerabilities in AI-generated code.

**Security Alert:** Approximately 20% of AI-generated code contains hallucinated dependencies that don't exist in official repositories. Attackers are now registering these phantom packages on PyPI, npm, and other repositories to exploit organizations using AI coding assistants. This creates attack surfaces that bypass traditional security scanning tools.

Authentication

  • Type: Bearer Token

Implementation Options

1. MCP Integration

For code generation tools like Windsurf and Cursor, use the Model Context Protocol (MCP). Example MCP configuration:

```json
{
  "mcp_version": "1.0",
  "components": [
    {
      "name": "pegasi_slopsquatting_detection",
      "type": "security_check",
      "endpoint": "https://sandbox-api.pegasi.ai/v1/evaluate/security/slopsquatting",
      "trigger": "on_code_generation",
      "settings": {
        "sensitivity": "high",
        "inline_feedback": true,
        "block_high_risk": true
      },
      "auth": {
        "type": "bearer",
        "env_var": "PEGASI_API_KEY"
      }
    }
  ]
}
```

2. Direct API Call

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PEGASI_API_KEY",
    base_url="https://sandbox-api.pegasi.ai/v1",
)

response = client.evaluations.create(
    type="security/slopsquatting",
    content={
        "code": "import tenserflow as tf",  # misspelled "tensorflow"
        "language": "python",
    },
    settings={"sensitivity": "high"},
)
```

For complete documentation and additional integration options, see Supply Chain Security Evaluations.

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `content.code` | string | Yes | The code snippet to analyze for slopsquatting vulnerabilities |
| `content.language` | string | Yes | Programming language of the code (e.g., "python", "javascript", "java") |
| `content.context` | string | No | Additional context about the code's purpose |
| `settings.sensitivity` | string | No | Detection sensitivity: "low", "medium", "high" (default: "medium") |
| `settings.check_imports` | boolean | No | Whether to check import statements (default: true) |
| `settings.check_package_versions` | boolean | No | Whether to check package versions (default: true) |
| `settings.known_packages_db` | string | No | Package database to use: "standard", "extended", "custom" (default: "standard") |

Example Response

```json
{
  "id": "eval_sec_7a9b3c2d1e",
  "created": 1686571085,
  "results": {
    "issues": [
      {
        "type": "slopsquatting",
        "package": "tenserflow",
        "suggestion": "tensorflow",
        "confidence": 0.98,
        "risk_level": "high",
        "description": "Package 'tenserflow' appears to be a typosquat of 'tensorflow', a popular machine learning library",
        "line": 4,
        "column": 8
      },
      {
        "type": "slopsquatting",
        "package": "matpotlib",
        "suggestion": "matplotlib",
        "confidence": 0.96,
        "risk_level": "high",
        "description": "Package 'matpotlib' appears to be a typosquat of 'matplotlib', a popular plotting library",
        "line": 5,
        "column": 8
      }
    ],
    "summary": {
      "total_issues": 2,
      "high_risk": 2,
      "medium_risk": 0,
      "low_risk": 0
    }
  }
}
```
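A response like the one above can gate a build or CI pipeline on high-risk findings. This is a hypothetical downstream sketch, not part of the API: the field names follow the example response, and `gate` plus its blocking policy are illustrative only:

```python
def gate(response: dict) -> int:
    """Print each high-risk slopsquatting issue from a parsed evaluation
    response and return a nonzero exit code if any were found."""
    issues = response["results"]["issues"]
    high_risk = [i for i in issues if i["risk_level"] == "high"]
    for issue in high_risk:
        print(
            f"{issue['type']}: '{issue['package']}' looks like a typosquat "
            f"of '{issue['suggestion']}' (confidence {issue['confidence']:.0%})"
        )
    return 1 if high_risk else 0
```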

For more detailed information about slopsquatting detection and MCP integration for IDE tools, see Supply Chain Security Evaluations.