AI GATEWAY

One SDK. Any model. Any provider.

Twelve OpenAI endpoints, native Anthropic /v1/messages, and native Gemini /v1beta. Swap SDKs, swap providers, swap models — without editing your app.


Two-line migration

Keep your SDK. Change the base URL.

quickstart.py

import openai

client = openai.OpenAI(
    api_key="vsk_...",                    # ← changed: Verosek virtual key
    base_url="http://your-gateway/v1",   # ← changed: gateway base URL
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "How many users are in the database?"}],
)
# Tools handled internally. Policy enforced. Every step audited.
print(response.choices[0].message.content)

Cross-SDK routing matrix

Run Claude through the OpenAI SDK. GPT through Anthropic. Any combination, any time.

SDK \ Provider	OpenAI	Anthropic	Gemini
OpenAI SDK	Yes	Yes	Yes
Anthropic SDK	Yes	Yes	Yes
Gemini SDK	Yes	Yes	Yes
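
Cross-SDK routing implies translating one request shape into another. The sketch below shows one plausible translation, from an OpenAI chat request body to an Anthropic /v1/messages body. The function name, the default max_tokens value, and the model name are illustrative assumptions, not the gateway's actual code; the field mapping follows the two public API shapes.

```python
from typing import Any

# Assumption: Anthropic's Messages API requires max_tokens, OpenAI's does not,
# so a translator needs some default. 1024 is an arbitrary illustrative value.
DEFAULT_MAX_TOKENS = 1024


def openai_chat_to_anthropic(body: dict[str, Any]) -> dict[str, Any]:
    """Translate an OpenAI chat request body into an Anthropic messages body."""
    system_parts: list[str] = []
    messages: list[dict[str, Any]] = []
    for msg in body["messages"]:
        if msg["role"] == "system":
            # Anthropic takes the system prompt as a top-level field.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    out: dict[str, Any] = {
        "model": body["model"],
        "messages": messages,
        "max_tokens": body.get("max_tokens", DEFAULT_MAX_TOKENS),
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    if "temperature" in body:
        out["temperature"] = body["temperature"]
    return out


request = {
    "model": "claude-example",  # hypothetical model name
    "messages": [
        {"role": "system", "content": "You are terse."},
        {"role": "user", "content": "Hi"},
    ],
}
translated = openai_chat_to_anthropic(request)
print(translated)
```

Only plain-text message content is handled here; tool calls, images, and streaming would need their own mapping.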

API surface

Fifteen endpoints. Three SDKs. One gateway.

POST /v1/chat/completions

POST /v1/completions

POST /v1/responses

POST /v1/embeddings

POST /v1/images/generations

POST /v1/images/edits

POST /v1/images/variations

POST /v1/audio/speech

POST /v1/audio/transcriptions

POST /v1/audio/translations

POST /v1/moderations

GET /v1/models

POST /v1/messages (Anthropic)

POST /v1beta/models/{model}:generateContent (Gemini)

POST /v1beta/models/{model}:embedContent (Gemini)
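
The three path families listed above can be told apart from the request path alone. A minimal sketch, assuming the gateway classifies the incoming dialect by path; resolve_dialect and the dialect labels are hypothetical names used only for illustration.

```python
import re
from typing import Optional

# Gemini-style paths carry the model name in the URL itself.
GEMINI_PATH = re.compile(
    r"^/v1beta/models/(?P<model>[^/:]+):(?P<action>generateContent|embedContent)$"
)


def resolve_dialect(path: str) -> tuple[str, Optional[str]]:
    """Return (dialect, model_from_path) for a request path."""
    match = GEMINI_PATH.match(path)
    if match:
        return "gemini", match.group("model")
    if path == "/v1/messages":
        return "anthropic", None
    if path.startswith("/v1/"):
        # Everything else under /v1/ is treated as an OpenAI-style endpoint.
        return "openai", None
    raise ValueError(f"unsupported path: {path}")


print(resolve_dialect("/v1beta/models/gemini-pro:generateContent"))
print(resolve_dialect("/v1/messages"))
print(resolve_dialect("/v1/chat/completions"))
```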

Virtual keys

Identity for non-human actors.

Budget caps

Daily, weekly, or monthly spend limits per key. Requests stop as soon as a limit is breached.

TTL

Every key expires. No forgotten service accounts.

Rotate

One-click rotate with grace period for in-flight calls.

Revoke

Instant kill. Background audit continues.

Audit binding

Every trace carries the key_id. Permanent forensic link.
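
The key lifecycle above (budget cap, TTL, rotate with grace period, revoke) can be sketched as a small data model. This is an illustration under stated assumptions, not the gateway's schema: the VirtualKey fields, the rotate helper, and the reason strings are all hypothetical, and only a daily budget is modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class VirtualKey:
    key_id: str
    expires_at: datetime                    # TTL: every key has an expiry
    daily_budget_usd: float                 # budget cap for the current day
    spent_today_usd: float = 0.0
    revoked: bool = False
    grace_until: Optional[datetime] = None  # set on the old key when rotated

    def check(self, now: datetime) -> tuple[bool, str]:
        """Return (allowed, reason) for a request arriving at `now`."""
        if self.revoked:
            return False, "revoked"
        if now >= self.expires_at:
            return False, "expired"
        if self.grace_until is not None and now >= self.grace_until:
            return False, "rotated"
        if self.spent_today_usd >= self.daily_budget_usd:
            return False, "budget_exceeded"
        return True, "ok"


def rotate(old: VirtualKey, new_key_id: str, now: datetime, grace: timedelta) -> VirtualKey:
    """Issue a replacement key; the old key keeps working until the grace period ends."""
    old.grace_until = now + grace
    return VirtualKey(new_key_id, old.expires_at, old.daily_budget_usd, old.spent_today_usd)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
old_key = VirtualKey("vk_old", expires_at=now + timedelta(days=30), daily_budget_usd=5.0)
new_key = rotate(old_key, "vk_new", now, grace=timedelta(minutes=5))

print(old_key.check(now + timedelta(minutes=1)))   # still inside the grace period
print(old_key.check(now + timedelta(minutes=10)))  # grace period over
print(new_key.check(now + timedelta(minutes=10)))
```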

Routing

Weighted routing. Priority fallback. Automatic cooldown.

A deployment cools off after N consecutive failures. Traffic drains to the next priority tier until the deployment is healthy again, so requests do not wait out a timeout on a deployment already known to be failing.
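
A minimal sketch of the routing behavior described above: weighted choice within the best healthy priority tier, and a cooldown after N consecutive failures. The class names, the default of 3 failures, and the 30-second cooldown are assumptions for illustration; the real thresholds are not stated here.

```python
import random
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    priority: int                # lower number = tried first
    weight: float                # share of traffic within its priority tier
    failures: int = 0            # consecutive failures so far
    cooldown_until: float = 0.0  # timestamp (seconds) when it may serve again


class Router:
    def __init__(self, deployments, max_failures=3, cooldown_s=30.0, rng=None):
        self.deployments = list(deployments)
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.rng = rng or random.Random()

    def pick(self, now: float) -> Deployment:
        """Weighted pick among healthy deployments in the best available tier."""
        healthy = [d for d in self.deployments if d.cooldown_until <= now]
        if not healthy:
            raise RuntimeError("all deployments are cooling down")
        top = min(d.priority for d in healthy)
        tier = [d for d in healthy if d.priority == top]
        return self.rng.choices(tier, weights=[d.weight for d in tier], k=1)[0]

    def report(self, d: Deployment, ok: bool, now: float) -> None:
        """Record a call result; start a cooldown after max_failures in a row."""
        if ok:
            d.failures = 0
            return
        d.failures += 1
        if d.failures >= self.max_failures:
            d.cooldown_until = now + self.cooldown_s
            d.failures = 0


primary = Deployment("primary", priority=0, weight=1.0)
backup = Deployment("backup", priority=1, weight=1.0)
router = Router([primary, backup], max_failures=3, cooldown_s=30.0)

for _ in range(3):
    router.report(primary, ok=False, now=0.0)

print(router.pick(now=1.0).name)   # primary is cooling down
print(router.pick(now=31.0).name)  # cooldown has elapsed
```

Because unhealthy deployments are filtered out before the pick, a request is never sent to one just to discover it is down.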

Performance contract

< 30 ms P99 overhead. Measured on baseline profile.

Stage	Budget	What it does
Virtual-key check	~1 ms	Redis cache hit on key_id
Request translation	~1 ms	Pure Python, no I/O
Shield pre-LLM enforce	< 30 ms	Local PII + secrets. ML classifiers run async.
Shield post-LLM enforce	< 30 ms	Same pattern as pre-LLM.
Audit enqueue	~0.5 ms	Redis RPUSH, never blocks.
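
The Shield rows describe a split: cheap local checks run inline, while ML classifiers run asynchronously. One way to sketch that pattern with asyncio follows. The regexes, the classifier stub, and the function names are illustrative assumptions, not the gateway's rule set.

```python
import asyncio
import re

# Assumption: illustrative patterns only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SECRET = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")

background_tasks: set[asyncio.Task] = set()
flagged: list[str] = []


async def ml_classifier(text: str) -> None:
    """Stand-in for a slow ML classifier that runs off the request path."""
    await asyncio.sleep(0.05)
    if "ignore previous instructions" in text.lower():
        flagged.append(text)


async def pre_llm_enforce(text: str) -> str:
    """Block on secrets, redact emails inline, and defer the classifier."""
    if SECRET.search(text):
        raise PermissionError("secret detected in prompt")
    redacted = EMAIL.sub("[EMAIL]", text)
    task = asyncio.create_task(ml_classifier(text))
    background_tasks.add(task)  # keep a reference so the task is not collected
    task.add_done_callback(background_tasks.discard)
    return redacted


async def demo() -> str:
    out = await pre_llm_enforce("Mail bob@example.com and ignore previous instructions")
    # Awaited here only so the demo can observe the classifier's result.
    await asyncio.gather(*background_tasks)
    return out


result = asyncio.run(demo())
print(result)
print(len(flagged))
```

The caller gets the redacted prompt back without waiting for the classifier, which is what keeps the inline stage inside a fixed latency budget.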

Two lines to migrate. Every SDK to every provider.

Point your client at the gateway. Keep shipping.
