NicoSoft AI is a unified gateway in front of every major model provider. Send one request shape and reach OpenAI, Anthropic, Google, DeepSeek, Azure, and our own hosted models without rewriting clients.
The gateway exposes three universal text endpoints — pick whichever SDK you already use and you get every model. Native pass-throughs (Gemini, image, video) live alongside the universal entries when you need provider-exact shapes.
Make your first request in under a minute. The OpenAI Python SDK works as-is — only the base_url and api_key change.
Base URL: https://api.nicosoft.ai/v1

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.nicosoft.ai/v1",
        api_key="sk-nsai-...",
    )

    # Stream a chat completion and print tokens as they arrive.
    resp = client.chat.completions.create(
        model="anthropic/claude-sonnet-4-5",
        messages=[{"role": "user", "content": "Hi"}],
        stream=True,
    )
    for chunk in resp:
        # A chunk may arrive with no choices or no text; skip those safely.
        if chunk.choices:
            print(chunk.choices[0].delta.content or "", end="")

All requests require a bearer token. Keys are scoped to a single account and can carry independent credit caps and per-minute limits.
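For clients that don't use an SDK, the bearer token travels in the `Authorization` header. The following is a minimal sketch using only the Python standard library. The helper name `build_request` is illustrative and not part of any NicoSoft SDK, and the request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.nicosoft.ai/v1"  # base URL used in the quickstart


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # the bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "/chat/completions",
    {"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hi"}]},
    api_key="sk-nsai-...",  # placeholder; substitute a real key
)
# To send it, pass `req` to urllib.request.urlopen().
```

Keeping the key out of source code (for example, reading it from an environment variable) is usually the safer choice in real deployments.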
Three universal entries accept any text model. Native pass-throughs preserve provider-exact shapes for clients that need them.
| Endpoint | Format | SDK | Accepted models |
|---|---|---|---|
| /v1/chat/completions | OpenAI Chat | OpenAI SDK | All text models |
| /v1/messages | Anthropic Messages | Anthropic SDK | All text models |
| /v1/responses | OpenAI Responses | OpenAI SDK | All text models |
| /v1beta/models/{model}:{method} | Gemini native | Google Gen AI SDK | Gemini only |
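To make the table concrete, here is a rough sketch of the smallest request body each universal endpoint expects for the same prompt. The field names follow the public OpenAI and Anthropic request formats; `build_payload` is a hypothetical helper written for this illustration, and the `max_tokens` value is an arbitrary choice.

```python
def build_payload(endpoint: str, model: str, prompt: str) -> dict:
    """Return a minimal request body for one of the three universal endpoints."""
    user_message = {"role": "user", "content": prompt}
    if endpoint == "/v1/chat/completions":
        # OpenAI Chat format: a list of role/content messages.
        return {"model": model, "messages": [user_message]}
    if endpoint == "/v1/messages":
        # Anthropic Messages format: same messages, plus a required max_tokens.
        return {"model": model, "max_tokens": 1024, "messages": [user_message]}
    if endpoint == "/v1/responses":
        # OpenAI Responses format: the prompt goes in `input`.
        return {"model": model, "input": prompt}
    raise ValueError(f"not a universal text endpoint: {endpoint}")


for ep in ("/v1/chat/completions", "/v1/messages", "/v1/responses"):
    print(ep, build_payload(ep, "anthropic/claude-sonnet-4-5", "Hi"))
```

The model slug stays the same in all three cases; only the envelope around the prompt changes.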
response_format: json_object all work transparently; the gateway translates between shapes without dropping features.
https://api.nicosoft.ai/v1/chat/completions
Drop-in replacement for the official OpenAI API. Accepts every text model (OpenAI, Anthropic, Gemini, DeepSeek, Azure) by switching the model slug.

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.nicosoft.ai/v1",
        api_key="sk-nsai-...",
    )

    r = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(r.choices[0].message.content)

https://api.nicosoft.ai/v1/messages
Drop-in for the Anthropic SDK. Set the SDK's base_url to https://api.nicosoft.ai and call messages.create. Non-Anthropic models are bridged: request, response, and streaming are all translated.

    from anthropic import Anthropic

    client = Anthropic(
        base_url="https://api.nicosoft.ai",
        api_key="sk-nsai-...",
    )

    msg = client.messages.create(
        model="anthropic/claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hi"}],
    )
    print(msg.content[0].text)

https://api.nicosoft.ai/v1beta/models/{model}:generateContent
Gemini-native endpoints are a strict 1:1 pass-through: point Google's Gen AI SDK at the base URL and the request, response, and stream all preserve Google's original schema. If you'd rather treat Gemini as a regular OpenAI / Anthropic model, send it through /v1/chat/completions or /v1/messages instead.
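As a rough illustration of the Gemini-native shape without the SDK, the sketch below builds the URL and a minimal request body by hand. The `contents` / `parts` layout is Google's public `generateContent` schema; the helper names and the model name `gemini-2.5-flash` are placeholders chosen for this example, so check the catalog for the slugs the gateway actually serves.

```python
import json

GATEWAY = "https://api.nicosoft.ai"


def gemini_url(model: str, stream: bool = False) -> str:
    """Build the native endpoint URL for a Gemini model."""
    method = "streamGenerateContent" if stream else "generateContent"
    url = f"{GATEWAY}/v1beta/models/{model}:{method}"
    # Streaming uses :streamGenerateContent together with ?alt=sse.
    return f"{url}?alt=sse" if stream else url


def gemini_body(prompt: str) -> bytes:
    """Encode a single-turn prompt in Google's `contents` schema."""
    body = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    return json.dumps(body).encode("utf-8")


print(gemini_url("gemini-2.5-flash"))               # placeholder model name
print(gemini_url("gemini-2.5-flash", stream=True))
```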
Every endpoint supports SSE streaming. Set "stream": true (OpenAI / Anthropic families) or use :streamGenerateContent (Gemini).
| Endpoint | SSE format | How to enable |
|---|---|---|
| /v1/chat/completions | OpenAI delta lines | "stream": true |
| /v1/responses | OpenAI event-typed | "stream": true |
| /v1/messages | Anthropic event-typed | "stream": true |
| /v1beta/models/* | Gemini JSON chunks | :streamGenerateContent + ?alt=sse |
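If you read the /v1/chat/completions stream without an SDK, the OpenAI delta-line format can be consumed with a small parser. This is a sketch under two assumptions: each event arrives as a `data: {...}` line, and the stream ends with the `data: [DONE]` sentinel used by OpenAI's own API. The function name `iter_deltas` is illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # end-of-stream sentinel (OpenAI convention, assumed)
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # prints "Hello"
```

The same function works on a live response by passing an iterator of decoded lines instead of the `sample` list.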
/v1/chat/completions always returns OpenAI delta lines regardless of the selected model; /v1/messages always returns Anthropic event frames. Gemini native preserves Google's raw stream shape.
Pass tools on the request and the gateway forwards the schema. Tool-call output is normalised into the request's protocol shape, so your existing OpenAI / Anthropic tool-handling loop works regardless of which model actually runs.
Concurrent / parallel tool calls and tool_choice constraints (auto, required, named) pass through unchanged.
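A tool-handling loop in the OpenAI Chat shape might look like the sketch below. The schema and the `tool_calls` / `tool_call_id` fields follow OpenAI's public format; the `get_weather` tool, the `REGISTRY` mapping, and `run_tool_calls` are hypothetical names invented for this example.

```python
import json

# Tool schema in the OpenAI Chat Completions format; pass it as `tools=TOOLS`.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub; a real tool would call a weather service


REGISTRY = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list:
    """Execute each requested tool call and return the `tool` messages to send back."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to the originating call
            "content": fn(**args),
        })
    return results
```

Because parallel tool calls arrive as several entries in `tool_calls`, the loop answers each one separately; append the returned messages to the conversation and call the model again.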
https://api.nicosoft.ai/v1/images/generations
OpenAI-compatible synchronous image generation. The OpenAI SDK works directly: set base_url to https://api.nicosoft.ai/v1 and call client.images.generate(...). Billing is per-image; failed requests aren't charged.
Google Imagen is exposed via the Gemini native endpoint at /v1beta/models/imagen-*:generate.
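For the OpenAI-compatible /v1/images/generations endpoint, a response can be unpacked roughly as follows. This sketch assumes the gateway returns the same `data[].url` / `data[].b64_json` layout as OpenAI's Images API, which "OpenAI-compatible" suggests but this page does not spell out. `extract_images` is a hypothetical helper.

```python
import base64


def extract_images(response: dict) -> list:
    """Return ("bytes", data) or ("url", link) tuples, one per generated image."""
    images = []
    for item in response.get("data", []):
        if item.get("b64_json"):
            # Inline image: decode the base64 payload to raw bytes.
            images.append(("bytes", base64.b64decode(item["b64_json"])))
        elif item.get("url"):
            # Hosted image: keep the link for a later download.
            images.append(("url", item["url"]))
    return images
```

With the OpenAI SDK the same fields are available as attributes on each entry of `result.data`.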
Model slugs are namespaced by upstream family so you can tell at a glance where a request is going.
- openai/*: OpenAI native (GPT-4o, GPT-5, etc.)
- anthropic/*: Anthropic native (Claude Sonnet / Opus / Haiku)
- gemini-* / google/*: Google Gemini family
- deepseek/*: DeepSeek
- azure/*: Azure-hosted OpenAI models
- nicosoft/*: first-party hosted variants

Browse the full catalog at /models.
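Client code that needs to branch on the upstream family (for logging or cost tracking, say) can read it straight from the slug. A minimal sketch, assuming the prefixes listed here are the complete set; `family_of` is an illustrative name, and the `gemini-2.5-flash` slug in the usage line is a placeholder.

```python
FAMILIES = {"openai", "anthropic", "google", "deepseek", "azure", "nicosoft"}


def family_of(slug: str) -> str:
    """Map a model slug to its upstream family."""
    if slug.startswith("gemini-"):
        return "google"  # bare gemini-* slugs belong to the Google family
    prefix, sep, _ = slug.partition("/")
    if not sep or prefix not in FAMILIES:
        raise ValueError(f"unrecognised model slug: {slug}")
    return prefix


print(family_of("anthropic/claude-sonnet-4-5"))  # prints "anthropic"
print(family_of("gemini-2.5-flash"))             # prints "google"
```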
On error the gateway returns a JSON envelope with success: false and a machine-readable code. Model provider errors are normalised so provider internals aren't exposed to clients.
| Code | HTTP | When |
|---|---|---|
| BAD_REQUEST | 400 | Wrong endpoint for model family, missing field, non-text model on text endpoint |
| UNAUTHORIZED | 401 | Missing or invalid API key |
| INSUFFICIENT_CREDITS | 402 | Balance too low for a paid model |
| FORBIDDEN | 403 | Key disabled or expired |
| NOT_FOUND | 404 | Model slug doesn't exist |
| RATE_LIMITED | 429 | Per-key rate limit exceeded; honour Retry-After |
| PROVIDER_ERROR | 502 | Upstream returned an error; retry with backoff |
| SERVICE_UNAVAILABLE | 503 | All channels for that model are exhausted |
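One way to turn the envelope into something a client can act on is sketched below. The exact field layout is an assumption: the code expects a top-level `code` next to `success`, plus an optional `message` field that this page does not document. Only the two codes whose table entries mention retrying are treated as retryable; whether to retry SERVICE_UNAVAILABLE is left to the caller.

```python
import json
from typing import Optional

# Codes whose table entries recommend retrying.
RETRYABLE_CODES = {"RATE_LIMITED", "PROVIDER_ERROR"}


class GatewayError(Exception):
    """An error decoded from the gateway's JSON envelope."""

    def __init__(self, status: int, code: str, message: str = ""):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def parse_error(status: int, body: str) -> Optional[GatewayError]:
    """Return a GatewayError if the body is an error envelope, else None."""
    try:
        envelope = json.loads(body)
    except json.JSONDecodeError:
        # Non-JSON body: treat as an unknown error only on a failing status.
        return GatewayError(status, "UNKNOWN", body[:200]) if status >= 400 else None
    if not isinstance(envelope, dict) or envelope.get("success") is not False:
        return None  # no `success: false` marker, so not an error envelope
    return GatewayError(
        status,
        envelope.get("code", "UNKNOWN"),
        envelope.get("message", ""),  # `message` is an assumed field name
    )
```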
Rate limits are enforced per API key using a sliding window. When you exceed the limit the gateway returns 429 RATE_LIMITED with a Retry-After header (seconds until the window resets).
| Limit | Scope | Notes |
|---|---|---|
| Requests per minute | Per API key | Applies to every proxy endpoint |
| Tokens per minute | Per API key | Counted after the model response |
| Free-tier daily cap | Per model | Pooled across users; paid models stay live when free runs out |
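A client can honour Retry-After with a small delay helper like the one sketched here. It assumes the header carries a number of seconds, as described above, and falls back to capped exponential backoff when the header is missing or unparsable. The function name and the `base` and `cap` defaults are arbitrary choices for illustration.

```python
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # the server's hint wins
        except ValueError:
            pass  # not a plain number of seconds; fall back to backoff
    return min(cap, base * 2 ** attempt)  # 1s, 2s, 4s, ... capped at `cap`


print(retry_delay(0, "7"))  # prints 7.0
print(retry_delay(3))       # prints 8.0
```

In practice you would call `time.sleep(retry_delay(attempt, response_headers.get("Retry-After")))` between attempts, and many clients also add random jitter to avoid synchronised retries.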
Defaults are generous for typical usage. For high-throughput or batch workloads, get in touch about dedicated quotas.
Questions or issues — support@nicosoft.dev. Status page: /status. Model catalog: /models.