AI GATEWAY
One SDK. Any model. Any provider.
Twelve OpenAI endpoints, native Anthropic /v1/messages, and native Gemini /v1beta. Swap SDKs, swap providers, swap models — without editing your app.
01
Two-line migration
Keep your SDK. Change the base URL.
quickstart.py
import openai
client = openai.OpenAI(
api_key="vsk_...", # Verosek virtual key
base_url="http://your-gateway/v1", # ← only line that changed
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "How many users are in the database?"}],
)
# Tools handled internally. Policy enforced. Every step audited.
print(response.choices[0].message.content)02
Cross-SDK routing matrix
Run Claude through the OpenAI SDK. GPT through Anthropic. Any combination, any time.
| SDK \\ Provider | OpenAI | Anthropic | Gemini |
|---|---|---|---|
| OpenAI SDK | |||
| Anthropic SDK | |||
| Gemini SDK |
03
API surface
Fifteen endpoints. Three SDKs. One gateway.
/v1/chat/completions
/v1/completions
/v1/responses
/v1/embeddings
/v1/images/generations
/v1/images/edits
/v1/images/variations
/v1/audio/speech
/v1/audio/transcriptions
/v1/audio/translations
/v1/moderations
GET /v1/models
POST /v1/messages (Anthropic)
/v1beta/models/{model}:generateContent (Gemini)
/v1beta/models/{model}:embedContent
04
Virtual keys
Identity for non-human actors.
Budget caps
Daily / weekly / monthly spend limits per key. Stops on breach.
TTL
Every key expires. No forgotten service accounts.
Rotate
One-click rotate with grace period for in-flight calls.
Revoke
Instant kill. Background audit continues.
Audit binding
Every trace carries the key_id. Permanent forensic link.
05
Routing
Weighted routing. Priority fallback. Automatic cooldown.
Deployments cool off after N consecutive failures. Traffic drains to the next priority tier until health returns. No per-request timeout wait.
06
Performance contract
< 30 ms P99 overhead. Measured on baseline profile.
| Stage | Budget | What it does |
|---|---|---|
| Virtual-key check | ~1 ms | Redis cache hit on key_id |
| Request translation | ~1 ms | Pure Python, no I/O |
| Shield pre-LLM enforce | < 30 ms | Local PII + secrets. ML classifiers run async. |
| Shield post-LLM enforce | < 30 ms | Same pattern as pre-LLM. |
| Audit enqueue | ~0.5 ms | Redis RPUSH, never blocks. |
Two lines to migrate. Every SDK to every provider.
Point your client at the gateway. Keep shipping.