Integration
Integration guide
Every SDK endpoint, cross-SDK routing, model deployments, parameter quirks, virtual-key day-2 operations, fail-safe semantics.
14 min read
Every SDK endpoint Verosek exposes, what the gateway rewrites, and how admin operations on your virtual keys work once you have a tenant.
TL;DR
- Three SDKs land natively: OpenAI at
/v1/*, Anthropic at/v1/messages, Gemini at/v1beta/models/{model}:*. - Every endpoint accepts your virtual key. You can run any configured model through any SDK — the gateway translates request and response shapes.
- Per-model quirks (token-param renames, sampling-param stripping, role conversion) are applied on your behalf.
- Virtual keys have day-2 operations: rotate, block, unblock, update allowed models / budget / rate limits.
- Fail-safe defaults to FAIL_CLOSED — an unreachable Verosek blocks tool calls rather than pass them through.
Overview
The gateway uses the OpenAI Chat Completions shape as its internal canonical format. Clients that hit the OpenAI endpoints pass through; clients that hit the Anthropic or Gemini endpoints are translated on the way in and on the way out.
"Cross-SDK translation" means, concretely: you can send an Anthropic SDK request whose model resolves to an OpenAI deployment on our side, and the response comes back in Anthropic shape. The reverse works too.
OpenAI SDK endpoints
All endpoints accept Authorization: Bearer vsk_.... The router prefix is /v1.
| Method | Path | Request body shape | What the gateway does |
|---|---|---|---|
| POST | /v1/chat/completions | OpenAI Chat Completions | Shield pre-scan → translate to provider → forward → internal tool loop if tool_calls returned → Shield post-scan → audit. |
| POST | /v1/completions | OpenAI legacy completion | Routes to a chat-compatible deployment, adapts to a completion response. |
| POST | /v1/responses | OpenAI Responses API | Scans input, runs against a chat deployment, returns Responses-shaped JSON. |
| POST | /v1/embeddings | OpenAI embeddings | Forwards to the embedding deployment. Shield pre-scans the input string. |
| POST | /v1/images/generations | OpenAI image generation | Shield pre-scans the prompt, then forwards. |
| POST | /v1/images/edits | Multipart image edit | Proxies multipart upload to the selected image deployment. |
| POST | /v1/images/variations | Multipart image variation | Proxies multipart upload. |
| POST | /v1/audio/speech | TTS | Shield pre-scans the input text. |
| POST | /v1/audio/transcriptions | Multipart audio-in | Proxies to the transcription deployment, Shield post-scans the transcript. |
| POST | /v1/audio/translations | Multipart audio-in | Same shape as transcriptions, different upstream operation. |
| POST | /v1/moderations | OpenAI moderations | Shield pre-scans the input; upstream moderation is forwarded. |
| GET | /v1/models | — | Returns the list of deployments the virtual key is allowed to call. |
All Pydantic request shapes live in gateway/models/provider.py (virtual key models) and inside the handlers themselves for endpoint-specific bodies. The OpenAI shapes are mirrored 1:1 from the upstream API.
Streaming is not supported today. Requests with stream: true receive a clear error response.
Anthropic SDK endpoint
| Method | Path | Auth | What the gateway does |
|---|---|---|---|
| POST | /v1/messages | x-api-key: vsk_... | Accepts native Anthropic request body (system top-level, messages[] with content blocks, tools with input_schema). Translates to OpenAI canonical, runs the internal pipeline, translates the response back to Anthropic shape. |
What the gateway translates on input:
- Top-level
system(string or text-block list) → OpenAIsystemmessage. imagecontent blocks withsource.type = base64 | url→ OpenAIimage_urlcontent blocks.tool_use/tool_resultblocks → OpenAItool_callsandrole=toolmessages.
What the gateway translates on output:
- OpenAI
content+tool_calls→ Anthropiccontentblocks (type=text,type=tool_use). - OpenAI
finish_reason→ Anthropicstop_reason(stop→end_turn,length→max_tokens,tool_calls→tool_use).
Anthropic prompt caching (cache_control) is not forwarded today. Audio content blocks sent to Claude are dropped by the translator (Claude does not accept them).
Gemini SDK endpoints
| Method | Path | Auth | What the gateway does |
|---|---|---|---|
| POST | /v1beta/models/{model}:generateContent | x-goog-api-key: vsk_... header or ?key=vsk_... query | Accepts native Gemini request body, translates, runs the pipeline, returns Gemini shape. |
| POST | /v1beta/models/{model}:embedContent | Same | Single embedding. |
| POST | /v1beta/models/{model}:batchEmbedContents | Same | Batch embeddings. |
:streamGenerateContent and :countTokens are not supported today. The gateway returns a clear error for both.
Cross-SDK routing
You can call any configured model through any SDK. The gateway resolves model to a deployment and then translates shapes as needed.
| Client SDK ↓ / Provider → | OpenAI backend | Anthropic backend | Gemini backend |
|---|---|---|---|
OpenAI SDK (POST /v1/chat/completions) | Direct passthrough | Translated to Anthropic /v1/messages, response translated back | Translated to Gemini OpenAI-compatible endpoint, response translated back |
Anthropic SDK (POST /v1/messages) | Inbound translation to canonical, passthrough to OpenAI, response re-shaped to Anthropic | Direct passthrough after inbound translation | Inbound translation to canonical, then Gemini, response re-shaped to Anthropic |
Gemini SDK (POST /v1beta/models/*) | Inbound translation to canonical, passthrough to OpenAI, response re-shaped to Gemini | Inbound → canonical → Anthropic → re-shaped to Gemini | Direct passthrough |
Model routing is driven by the model_name on each deployment, not by the SDK the request came from. See the next section.
Model deployments
A model deployment is a mapping from a friendly model_name (what your client sends) to a concrete provider_model on a provider account we hold for you.
{
"id": "mdl_...",
"provider_id": "prov_...",
"model_name": "gpt-4o-fast",
"provider_model": "gpt-4o-2024-08-06",
"rpm_limit": 10000,
"tpm_limit": 1000000,
"input_cost_per_token": 2.5e-6,
"output_cost_per_token": 1.0e-5,
"priority": 1,
"cooldown_seconds": 5,
"status": "ACTIVE",
"model_type": "chat",
"system_prompt": null,
"default_temperature": null,
"default_max_tokens": null
}
Admin endpoints for deployments (create/list/get/update/delete):
POST /api/v1/models— body isDeploymentCreate.GET /api/v1/models,GET /api/v1/models/{deployment_id},PATCH /api/v1/models/{deployment_id},DELETE /api/v1/models/{deployment_id}.POST /api/v1/models/reclassifyre-runs the automaticmodel_typeclassifier on every deployment.
Runtime playgrounds for each model_type exist at POST /api/v1/models/{deployment_id}/{chat,embed,generate-image,speak,transcribe,moderate,complete}.
Routing strategy
When a client sends model_name, the gateway picks one deployment and forwards the request:
- Look up all
ACTIVEdeployments with thatmodel_name, ordered bypriorityascending (lower number = higher priority). - Filter out deployments currently in cooldown. A deployment enters cooldown after
_ALLOWED_FAILS = 3consecutive failures forcooldown_seconds(default 5). - Filter out deployments whose RPM counter in Redis has already reached
rpm_limitthis minute. - Apply weighted random selection across the surviving set. Lower-priority deployments get more weight, so priority acts as "pick me first unless I'm sick".
- Decrypt the chosen provider's API key and forward.
If all deployments for the model are cooled down, the highest-priority one is tried anyway (warned in logs).
Parameter translation quirks
The gateway normalises per-model differences so clients can always send the same OpenAI-shaped body. Every translation is logged to the audit trace as a modifications note.
- o-series reasoning models (pattern matches
o1,o1-mini,o1-preview,o2,o3,o3-mini, …): renamemax_tokens → max_completion_tokens; striptemperature,top_p,frequency_penalty,presence_penalty; convertsystemrole →developerrole. - gpt-5 family (pattern matches
gpt-5,gpt-5-mini,gpt-5-nano,gpt-5.1,gpt-5.4-nano, …): samemax_completion_tokensrename; striptemperature/top_p/penalties; keepsystemrole. - Anthropic (backend): strip
frequency_penaltyandpresence_penalty; extractsystemmessage to top-level; translateimage_url→imagewithsourceobject; translate tool definitions intoinput_schemashape; backfillmax_tokensif missing (default1024). - Non-reasoning OpenAI models that receive a stray
max_completion_tokens: rename back tomax_tokens.
The capability object per model lives at gateway/core/model_translator.py:34-61 (ModelCapabilities) and is selected via get_model_capabilities(provider_type, provider_model).
Virtual keys — day-2 operations
Admin endpoints all live under /api/v1/keys. Paths use the non-secret reference ID vkr_... — the secret vsk_... value is only ever returned in the POST creation response and is never echoed back by any other endpoint.
| Method | Path | Purpose |
|---|---|---|
| POST | /api/v1/keys | Create a new virtual key. Response includes the plaintext vsk_... once. |
| GET | /api/v1/keys | List keys for this tenant — secret value is never returned. |
| PATCH | /api/v1/keys/{key_ref} | Update name, allowed_models, max_budget, budget_period, rpm_limit, tpm_limit, expires_at, security_profile. |
| POST | /api/v1/keys/{key_ref}/block | Block the key. Takes effect on the next request once the Redis cache is invalidated. |
| POST | /api/v1/keys/{key_ref}/unblock | Unblock. |
| POST | /api/v1/keys/{key_ref}/rotate | Generate a new key with the same policy, block the old one. Response contains the new plaintext vsk_... once. |
| GET | /api/v1/keys/{key_ref}/spend | Current-period spend + budget status. |
| GET | /api/v1/keys/{key_ref}/analytics?days=30 | Aggregate spend, token totals, by-model and by-day breakdowns, recent traces. |
| GET | /api/v1/keys/{key_ref}/tools | Every MCP tool this key has access to (namespaced). |
Budget model:
max_budgetin USD.budget_period = daily | weekly | monthly | null. Reset timestamps are computed at period boundaries.rpm_limit/tpm_limitare counted in Redis per key. Whenmax_budgetis set andcurrent_spend >= max_budgetthe key returns429.
TTL (expires_at) is enforced on every request — an expired key returns 401.
Allowed models are a list. ["*"] means all; [] means MCP-only (no LLM endpoints); otherwise a concrete list like ["gpt-4o-fast", "claude-3-5-sonnet-fast"].
Fail-safe behaviour
- FAIL_CLOSED is the default. If the gateway is unreachable from your application, your SDK gets a network error and your application code sees a failure. Tool calls do not pass through unguarded.
- Per-request decision failures inside the gateway default to BLOCK. A crash in the tool-access enforcer returns a block, not an allow.
- Shield fail behaviour is per check.
fail_behavior: fail_closedreturns a 503;fail_openlets the request through with a warning verdict. Default varies per check — see Shield configuration.
To verify
TODO: unverified — The SDK-side circuit breaker (3-fail-open / 2-success-close semantics) is described in the internal design doc but I could not locate an explicit circuit-breaker implementation in the
verosek/Python SDK source when writing this doc. Confirm with Vaibhav whether the breaker ships in the current SDK release and, if so, add averosek/file-path reference here.
Observability hooks
- Structured logs, JSON-formatted, emitted via
structlog. Every log entry includesagent_idandtrace_idwhen present. - Trace events of interest include
virtual_key_generated,virtual_key_rotated,mcp_connection_restored,mcp_connection_restore_failed,tool_access_blocked,tool_args_modified,audit_session_started,audit_drain_cycle,audit_drain_entry_error,audit_drain_batch_commit_failed,spend_db_update_failed,gateway_started,gateway_shutdown,postgres_connected,redis_connected,shield_startup_failed. - Trace API — the same data is available as queryable JSON via
/api/v1/traces. See Audit API.
Onboarding-only
Handled during onboarding — not public. Metric scrape endpoints, internal service addresses, and centralized-log shipping targets are wired up during the onboarding engagement based on your stack (Datadog / Splunk / ELK / CloudWatch / etc.). We do not publish those specifics here.
What's next
Read the MCP connector catalog to see which tools you can wire up and how the access-rule schema controls what each key can do with them.