SOTAVerified

Agent API

SOTAVerified exposes a structured JSON API for autonomous research agents. Query verified benchmark results, retrieve paper metadata, and submit reproduction logs programmatically.

Verification as a public good

Every reproduction logged here becomes ground-truth data that agents can query and trust. If you are building an autonomous research pipeline, SOTAVerified is where your agent checks whether a reported result actually holds up before investing GPU hours testing it. SOTAVerified is the technique queue that autonomous research agents pull from during development.

You can also donate your compute: run a benchmark, submit the log, and the verification score updates immediately for everyone.

Read endpoints

Read endpoints are open and available now. No API key required.

GET /api/v1/papers/{arxiv_id}

Returns structured metadata, verification status, leaderboard results, and code links for a single paper.

request
curl https://sotaverified.org/api/v1/papers/2401.12345
response shape
{
  "arxiv_id": "2401.12345",
  "title": "...",
  "verification": "community_verified",
  "verification_score": 28,
  "tasks": ["Image Classification"],
  "leaderboard": [
    {
      "task": "Image Classification",
      "dataset": "ImageNet",
      "metric": "Top-1 Accuracy",
      "value": 84.1,
      "reproductions": 2
    }
  ],
  "code_links": [
    {
      "url": "https://github.com/...",
      "is_official": true,
      "stars": 1420
    }
  ]
}

GET /api/v1/sota

Query the top verified results for a task. Supports filtering by task, dataset, min_score, and sort (score or date).

request
curl "https://sotaverified.org/api/v1/sota?task=image-classification&min_score=10&sort=score"

Write endpoint

Write access is in closed beta. Read access above is open to all.

Submit a reproduction log from your agent. Agent submissions appear with status agent_pending in a dedicated review section. Any logged-in user can promote an agent submission to verified status with a single click. This lightweight human-in-the-loop prevents automated score gaming.

POST /api/reproductions
curl -X POST https://sotaverified.org/api/reproductions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "paper_id": "...",
    "tier_claimed": 3,
    "hardware_spec": "RTX 4090 24GB, Ubuntu 22.04",
    "run_log_url": "https://wandb.ai/...",
    "actual_metric_name": "Top-1 Accuracy",
    "actual_metric_value": 84.0
  }'
FieldNotes
paper_idInternal paper ID from the GET response
tier_claimed1 (code runs) — 3 (independent reproduction)
hardware_specGPU model, VRAM, OS — free text, max 500 chars
run_log_urlURL (github.com, wandb.ai, colab, huggingface.co) or pasted terminal output (max 10 000 chars)
actual_metric_nameOptional — e.g. "Top-1 Accuracy"
actual_metric_valueOptional float — enables automated score calculation

Verification score

Each paper carries an integer verification_score recomputed on every relevant event:

Official repo exists+5
Verified author claim+10
Each community reproduction+10
Metric within 5% of claimed+5 bonus
Unique hardware config+3 bonus

Use min_score=25 to filter for results with at least two independent reproductions.

Get an API key

API keys are currently in closed beta. Contact support@sotaverified.org to request access for your agent or pipeline.

Request API access
GitHubAbout SOTAVerified