Tools - UniversalBench

UniversalBench exposes a single MCP tool, execute. It accepts more than 20 optional input fields organised into seven domains. Set one or more per call.

Compute

code

Run Python in a sandboxed environment. Returns stdout, stderr, and execution time.

{
  "name": "execute",
  "arguments": {
    "code": "import math; print(math.pi * 2)"
  }
}

The sandbox includes the Python standard library, common third-party packages (requests, pandas, numpy, etc.), and any packages you install via install_packages. The execution timeout is 60 seconds.
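Multi-line Python has to travel inside the code field as a single JSON string, so it is usually easier to let a JSON library handle the escaping. A minimal sketch, assuming you assemble the tool call yourself on the client side; build_execute_call is a hypothetical helper and not part of UniversalBench:

```python
import json
import textwrap


def build_execute_call(**arguments):
    """Wrap input fields in the {"name": "execute", "arguments": {...}} shape.

    Hypothetical client-side helper; the tool itself only sees the JSON.
    """
    return {"name": "execute", "arguments": arguments}


# A multi-line script; json.dumps escapes the newlines and quotes for us.
script = textwrap.dedent(
    """\
    import math

    radius = 2
    print(f"area = {math.pi * radius ** 2:.2f}")
    """
)

call = build_execute_call(code=script)
print(json.dumps(call, indent=2))
```

The same helper works for any other input field, for example build_execute_call(bash="ls -la /tmp").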

bash

Run a shell command. Returns stdout and stderr.

{ "name": "execute", "arguments": { "bash": "ls -la /tmp" } }

The working directory is /tmp. Most standard Linux utilities are available. The execution timeout is 60 seconds.

parallel_blocks

Run up to eight code blocks concurrently in one call. Returns a list of results in input order.

{
  "name": "execute",
  "arguments": {
    "parallel_blocks": [
      { "code": "import time; time.sleep(1); print('one')" },
      { "code": "import time; time.sleep(1); print('two')" },
      { "code": "import time; time.sleep(1); print('three')" }
    ]
  }
}

Three blocks that each sleep one second take about one second in total, not three. Useful for independent fetches, parallel data analysis, or fan-out queries.
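Because one call accepts at most eight blocks, a larger workload has to be split across several calls. A minimal sketch of that batching, assuming the snippets are independent of each other; batch_parallel_calls and the sample workload are hypothetical:

```python
MAX_PARALLEL_BLOCKS = 8  # per-call limit stated above


def batch_parallel_calls(snippets, max_blocks=MAX_PARALLEL_BLOCKS):
    """Split code snippets into execute calls of at most max_blocks each."""
    calls = []
    for start in range(0, len(snippets), max_blocks):
        chunk = snippets[start:start + max_blocks]
        calls.append({
            "name": "execute",
            "arguments": {"parallel_blocks": [{"code": s} for s in chunk]},
        })
    return calls


# Hypothetical workload: 19 independent snippets become 3 calls.
snippets = [f"print({i} ** 2)" for i in range(19)]
calls = batch_parallel_calls(snippets)
print([len(c["arguments"]["parallel_blocks"]) for c in calls])  # [8, 8, 3]
```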

install_packages

Install Python packages before the rest of the call runs. Returns the install log.

{
  "name": "execute",
  "arguments": {
    "install_packages": ["beautifulsoup4", "lxml"],
    "code": "from bs4 import BeautifulSoup; print('installed')"
  }
}

Installed packages persist for the duration of the session. Common packages are preinstalled and do not need to be listed.

Web and outbound HTTP

web_search

Search the web for live results. Returns the top five results with title, snippet, and URL.

{ "name": "execute", "arguments": { "web_search": "latest GPT model releases" } }

Results are real time and include source URLs.

invoke_llm

Call any major LLM. A prompt string goes in, the completion string comes out.

{ "name": "execute", "arguments": { "invoke_llm": "Write a haiku about distributed systems" } }

The default is a low-cost model. To route to a different model, prefix the prompt with model=<name>; (for example, model=claude-3-5-sonnet; Write a haiku...). See the OpenRouter model list for available options.
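The model prefix is plain string formatting, so it can be built programmatically. A small sketch, assuming the prefix and the prompt are separated by a semicolon and a space as in the example above; with_model is a hypothetical helper, and rejecting semicolons in the model name is an added precaution rather than a documented rule:

```python
def with_model(prompt, model=None):
    """Return the invoke_llm string, adding the model=<name>; prefix if given."""
    if model is None:
        return prompt  # no prefix: the default model is used
    if ";" in model:
        # Assumption: a ';' in the name would make the prefix ambiguous.
        raise ValueError("model name must not contain ';'")
    return f"model={model}; {prompt}"


call = {
    "name": "execute",
    "arguments": {
        "invoke_llm": with_model(
            "Write a haiku about distributed systems",
            model="claude-3-5-sonnet",
        )
    },
}
print(call["arguments"]["invoke_llm"])
```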

proxy_http

Make an outbound HTTP call from the workbench. Returns status, headers, body, and bytes received.

{
  "name": "execute",
  "arguments": {
    "proxy_http": {
      "method": "GET",
      "url": "https://api.github.com/repos/octocat/Hello-World",
      "headers": { "Accept": "application/vnd.github+json" },
      "timeout": 15
    }
  }
}

Only the http and https schemes are accepted. Internal network addresses and cloud metadata endpoints are blocked at the request layer to prevent server-side request forgery (SSRF). Response bodies above a fixed cap are truncated and flagged in the response.
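To avoid spending a call on a URL that will be refused, the restrictions above can be approximated on the client before sending. This is only an illustrative pre-check, not the workbench's actual filter, whose rules are not documented here; looks_allowed and the METADATA_HOSTS set are assumptions:

```python
import ipaddress
from urllib.parse import urlsplit

# Hosts commonly used by cloud metadata services (illustrative, not exhaustive).
METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}


def looks_allowed(url):
    """Client-side approximation of the restrictions described above.

    Only literal IP hosts are checked; hostnames that resolve to private
    addresses are not detected because no DNS lookup is performed.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname
    if not host or host in METADATA_HOSTS or host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a regular hostname; the server makes the final decision
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)


for url in (
    "https://api.github.com/repos/octocat/Hello-World",
    "ftp://example.com/file",
    "http://10.0.0.5/admin",
    "http://169.254.169.254/latest/meta-data/",
):
    print(url, looks_allowed(url))
```

A passing pre-check does not guarantee the request will be accepted; the server-side check remains authoritative.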

Files

file_read

Read a file from the workbench filesystem. Returns the file content as text.

{ "name": "execute", "arguments": { "file_read": "/tmp/data.csv" } }

Files must be under /tmp and no larger than a few hundred MB. Files written by code calls persist for the rest of that session.

file_write

Write a file to the workbench filesystem.

{
  "name": "execute",
  "arguments": {
    "file_write": {
      "path": "/tmp/output.json",
      "content": "{\"status\": \"ok\"}"
    }
  }
}

Paths must be under /tmp. Combine with session_id to keep written files available for the next call in the same session.
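When the file body is itself JSON, as in the example above, the content ends up as JSON nested inside a JSON string. Serialising twice avoids writing the backslash escapes by hand. A minimal sketch, assuming the call is assembled on the client; the record is the same illustrative value used above:

```python
import json

record = {"status": "ok"}

call = {
    "name": "execute",
    "arguments": {
        "file_write": {
            "path": "/tmp/output.json",
            # First dumps: the file body itself, as a JSON string.
            "content": json.dumps(record),
        }
    },
}

# Second dumps: the whole tool call; inner quotes are escaped automatically.
wire = json.dumps(call)
print(wire)
```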

Database

UniversalBench can connect to any PostgreSQL-compatible database. Provide SUPABASE_URL and SUPABASE_KEY via the secrets vault and the database inputs auto-inject them.

db_select

Structured query with filters, ordering, and limit. The idiomatic read API.

{
  "name": "execute",
  "arguments": {
    "db_select": {
      "table": "orders",
      "filters": [["status", "eq", "shipped"], ["created_at", "gte", "2026-01-01"]],
      "order": "-created_at",
      "limit": 50,
      "select": "id,total,customer_id"
    }
  }
}

Filter operators include eq, neq, gt, gte, lt, lte, like, ilike, and in_. Prefix the order column with - for descending order. Use select to pick specific columns, or * for all.
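A typo in an operator name is easier to catch before the call is sent. The sketch below assembles a db_select argument and rejects operators outside the list above; build_db_select and its descending flag are hypothetical conveniences, and the real tool may accept operators that are not listed here:

```python
# Operators listed above; the real tool may accept more.
KNOWN_OPERATORS = {"eq", "neq", "gt", "gte", "lt", "lte", "like", "ilike", "in_"}


def build_db_select(table, filters=(), order=None, descending=False,
                    limit=None, select="*"):
    """Assemble a db_select argument, rejecting unknown filter operators."""
    checked = []
    for column, op, value in filters:
        if op not in KNOWN_OPERATORS:
            raise ValueError(f"unknown operator {op!r} for column {column!r}")
        checked.append([column, op, value])
    query = {"table": table, "filters": checked, "select": select}
    if order is not None:
        # A leading "-" requests descending order.
        query["order"] = f"-{order}" if descending else order
    if limit is not None:
        query["limit"] = limit
    return query


query = build_db_select(
    "orders",
    filters=[("status", "eq", "shipped"), ("created_at", "gte", "2026-01-01")],
    order="created_at",
    descending=True,
    limit=50,
    select="id,total,customer_id",
)
print({"name": "execute", "arguments": {"db_select": query}})
```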

db_query

Best-effort SQL parse for SELECT ... FROM tbl [WHERE ...] [ORDER BY ...] [LIMIT N]. Convenient when your AI has already written the SQL.

{
  "name": "execute",
  "arguments": {
    "db_query": "SELECT id, email FROM customers WHERE tier = 'pro' LIMIT 20"
  }
}

For anything beyond a plain SELECT, use db_select for filtering and code for joins or aggregations.

db_search

Case-insensitive keyword search across one or more text columns.

{
  "name": "execute",
  "arguments": {
    "db_search": {
      "table": "knowledge_base",
      "columns": ["title", "body"],
      "query": "refund policy"
    }
  }
}

db_write

Insert one row, or update rows matching a filter.

{
  "name": "execute",
  "arguments": {
    "db_write": {
      "table": "events",
      "row": { "type": "signup", "email": "new@example.com" }
    }
  }
}

{
  "name": "execute",
  "arguments": {
    "db_write": {
      "table": "customers",
      "filter": [["id", "eq", "cust_123"]],
      "update": { "tier": "pro" }
    }
  }
}

db_upsert

Upsert one row using one or more conflict columns.

{
  "name": "execute",
  "arguments": {
    "db_upsert": {
      "table": "customer_state",
      "row": { "customer_id": "cust_123", "last_seen_at": "2026-05-17T10:00:00Z" },
      "on_conflict": "customer_id"
    }
  }
}

GitHub

Provide GITHUB_TOKEN via the secrets vault and the git inputs auto-inject it.

git_read

Read a file from a GitHub repository.

{
  "name": "execute",
  "arguments": {
    "git_read": {
      "owner": "your-org",
      "repo": "your-repo",
      "path": "src/index.js",
      "ref": "main"
    }
  }
}

Returns content and the file SHA. Works on private repositories.

git_push

Push a file to GitHub. Returns the commit SHA on success.

{
  "name": "execute",
  "arguments": {
    "git_push": {
      "owner": "your-org",
      "repo": "your-repo",
      "path": "src/index.js",
      "content": "// new file content",
      "message": "feat: add new module",
      "branch": "main",
      "sha": "abc123def456"
    }
  }
}

For .py files, UniversalBench validates the source before the push lands. Files that would crash at runtime are rejected before they reach your repository. Your AI cannot ship broken Python through UniversalBench.

Code authoring

A small set of authoring helpers that make AI-driven code edits safer and more predictable. Used together, they catch regressions that AI assistants commonly introduce.

validate_file

Static check on Python source. Returns an issues list and, where possible, a fixed version.

{ "name": "execute", "arguments": { "validate_file": "import os\ndef foo():\n  return os.path.exsts('/tmp')" } }
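The example above is syntactically valid, so a syntax check alone would not flag the misspelled os.path.exsts. One way a static check could catch it is to parse the source and resolve attribute chains on imported modules. The following is a rough sketch of that idea only; it is not UniversalBench's validator, check_source is a hypothetical name, and it ignores cases such as a local variable shadowing a module name:

```python
import ast
import importlib


def check_source(source):
    """Return a list of issue strings for a Python source string."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"line {exc.lineno}: syntax error: {exc.msg}"]

    # Map local names to modules for plain `import x` / `import x as y`.
    modules = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                local = alias.asname or alias.name.split(".")[0]
                target = alias.name if alias.asname else alias.name.split(".")[0]
                try:
                    modules[local] = importlib.import_module(target)
                except ImportError:
                    pass  # not importable here; skip attribute checks

    issues = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Attribute):
            continue
        # Collect the dotted chain: os.path.exsts -> ["exsts", "path"].
        chain, cur = [], node
        while isinstance(cur, ast.Attribute):
            chain.append(cur.attr)
            cur = cur.value
        if not isinstance(cur, ast.Name) or cur.id not in modules:
            continue
        obj, name = modules[cur.id], cur.id
        for attr in reversed(chain):
            if not hasattr(obj, attr):
                issue = f"line {node.lineno}: {name!r} has no attribute {attr!r}"
                if issue not in issues:  # nested nodes can repeat a finding
                    issues.append(issue)
                break
            obj, name = getattr(obj, attr), f"{name}.{attr}"
    return issues


print(check_source("import os\ndef foo():\n  return os.path.exsts('/tmp')"))
```

Note that this sketch imports the modules named in the source in order to inspect them, which is only reasonable for trusted or sandboxed input.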

code_diff

Run two code blocks and compare their outputs. Useful for verifying that a refactor did not change behaviour.

{
  "name": "execute",
  "arguments": {
    "code_diff": {
      "old_code": "def f(x): return x*2",
      "new_code": "def f(x): return x+x"
    }
  }
}

Returns the output diff if the two diverge on the same inputs.
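The comparison can be pictured as running each block, capturing what it prints, and diffing the two captures. The sketch below shows that idea locally with the standard library; it is an illustration under the assumption that "output" means standard output, and it says nothing about how UniversalBench chooses inputs or isolates the runs. run_and_capture and output_diff are hypothetical names:

```python
import contextlib
import difflib
import io


def run_and_capture(source):
    """Execute source in a fresh namespace and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {"__name__": "__main__"})
    return buffer.getvalue()


def output_diff(old_code, new_code):
    """Return a unified diff of the two outputs; empty list if identical."""
    old_out = run_and_capture(old_code).splitlines()
    new_out = run_and_capture(new_code).splitlines()
    return list(difflib.unified_diff(old_out, new_out, "old", "new", lineterm=""))


# Behaviour-preserving refactor: both versions print the same value.
same = output_diff(
    "def f(x): return x*2\nprint(f(3))",
    "def f(x): return x+x\nprint(f(3))",
)
# Behaviour change: the printed values differ.
changed = output_diff("print(2 * 3)", "print(2 ** 3)")

print(same)
print("\n".join(changed))
```

In this sketch each block has to print something for a difference to be visible, which is why the snippets call f explicitly.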

code_edit

Single-anchor find-and-replace on a file. Pass the old text and the new text. Ambiguous matches are rejected.

{
  "name": "execute",
  "arguments": {
    "code_edit": {
      "path": "/tmp/script.py",
      "old": "TIMEOUT = 30",
      "new": "TIMEOUT = 60"
    }
  }
}

When the anchor appears more than once, the edit is rejected with the count rather than silently editing the wrong place.
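The rule described above is simple to express: count the occurrences of the anchor and refuse to edit unless there is exactly one. A minimal sketch of that rule operating on a string rather than a file; apply_single_anchor_edit and its error messages are hypothetical, not the tool's actual responses:

```python
def apply_single_anchor_edit(text, old, new):
    """Replace old with new only when old occurs exactly once in text."""
    if not old:
        raise ValueError("anchor must not be empty")
    count = text.count(old)  # non-overlapping occurrences
    if count == 0:
        raise ValueError("anchor not found")
    if count > 1:
        raise ValueError(f"anchor is ambiguous: found {count} matches")
    return text.replace(old, new, 1)


source = "TIMEOUT = 30\nRETRIES = 3\n"
print(apply_single_anchor_edit(source, "TIMEOUT = 30", "TIMEOUT = 60"))
```

If an anchor is rejected as ambiguous, widening it with a line of surrounding context usually makes it unique.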

safe_deploy

Push a file and run a smoke test against a URL after the push lands. If the smoke test fails, the push is rolled back automatically.

{
  "name": "execute",
  "arguments": {
    "safe_deploy": {
      "owner": "your-org",
      "repo": "your-repo",
      "path": "src/index.js",
      "content": "// new content",
      "message": "feat: ship new feature",
      "smoke_test_url": "https://your-app.com/health"
    }
  }
}
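The push, smoke test, and rollback sequence can be sketched as a small control flow. The version below takes the three steps as caller-supplied functions so it runs without a network; the function names, the return shape, and the choice to treat an exception during the smoke test as a failure are all assumptions, not documented behaviour:

```python
def safe_deploy(push, smoke_test, rollback):
    """Push, then smoke test; roll back and report failure if the test fails.

    push() returns an identifier for the new commit, smoke_test() returns
    True when the health check passes, and rollback(commit) undoes the push.
    All three are hypothetical callables supplied by the caller.
    """
    commit = push()
    try:
        healthy = smoke_test()
    except Exception:
        healthy = False  # assumption: an unreachable endpoint counts as failure
    if not healthy:
        rollback(commit)
        return {"deployed": False, "rolled_back": commit}
    return {"deployed": True, "commit": commit}


# Fake callables so the control flow can be exercised offline.
log = []
result = safe_deploy(
    push=lambda: log.append("push") or "commit-1",
    smoke_test=lambda: False,
    rollback=lambda commit: log.append(f"rollback {commit}"),
)
print(result, log)
```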

State and secrets

session_id

Keep Python variables, imports, and connections warm across calls.

{
  "name": "execute",
  "arguments": {
    "session_id": "my_pipeline",
    "code": "import pandas as pd; df = pd.read_csv('/tmp/data.csv')"
  }
}

Subsequent calls with the same session_id can reference df and the imported pd without redoing the work. Sessions expire after a period of inactivity.

clear_session

Wipe all state for the given session_id before processing this call. Use when starting over.

{
  "name": "execute",
  "arguments": {
    "session_id": "my_pipeline",
    "clear_session": true,
    "code": "print('fresh session')"
  }
}
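A useful mental model for session_id and clear_session is a dictionary of namespaces keyed by session, where each call executes in the namespace for its session. The sketch below is only that model, run locally; it is not how UniversalBench is implemented, and sessions and run_in_session are hypothetical names:

```python
sessions = {}  # session_id -> namespace dict; a stand-in for server-side state


def run_in_session(session_id, code, clear_session=False):
    """Execute code in the namespace kept for session_id."""
    if clear_session:
        sessions.pop(session_id, None)  # wipe state before this call runs
    namespace = sessions.setdefault(session_id, {})
    exec(code, namespace)
    return namespace


# First call defines state; the second call reuses the import and the variable.
run_in_session("my_pipeline", "import math; radius = 2")
ns = run_in_session("my_pipeline", "area = math.pi * radius ** 2")
print(round(ns["area"], 2))  # 12.57

# clear_session starts from an empty namespace, so radius is gone.
fresh = run_in_session("my_pipeline", "x = 1", clear_session=True)
print("radius" in fresh)  # False
```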

secrets_vault

Encrypted, per-customer secret storage. Values are encrypted at rest with AES-256-GCM. Your AI references secrets by name, never by value. Save a secret:

{
  "name": "execute",
  "arguments": {
    "secrets_vault": {
      "action": "save",
      "secret_name": "STRIPE_KEY",
      "secret_value": "sk_live_xxx"
    }
  }
}

List secret names (the list action never returns values):

{ "name": "execute", "arguments": { "secrets_vault": { "action": "list" } } }

Retrieve a secret value:

{ "name": "execute", "arguments": { "secrets_vault": { "action": "get", "secret_name": "STRIPE_KEY" } } }

Delete a secret permanently:

{ "name": "execute", "arguments": { "secrets_vault": { "action": "delete", "secret_name": "OLD_KEY" } } }

Secrets are scoped to your account. Other customers cannot read them even with their own valid URLs. Common credentials like SUPABASE_URL, SUPABASE_KEY, GITHUB_TOKEN, and OPENROUTER_API_KEY are auto-injected into matching tools so your AI never needs to read the value.

Combining inputs

You can set multiple input fields in one call. The workbench evaluates them in a sensible order: for example, install_packages runs before code, and db_select results are available to subsequent code.

{
  "name": "execute",
  "arguments": {
    "install_packages": ["yfinance"],
    "code": "import yfinance as yf; print(yf.Ticker('AAPL').info['currentPrice'])"
  }
}

This installs yfinance and runs the code in a single billed call.

task

Optional. A natural-language description of what this call is doing. Used for telemetry and adaptive timeout tuning only. Has no effect on execution.

{
  "name": "execute",
  "arguments": {
    "task": "build month-over-month revenue chart",
    "code": "..."
  }
}
