
NicoSoft AI documentation

NicoSoft AI is a unified gateway in front of every major model provider. Send one request shape and reach OpenAI, Anthropic, Google, DeepSeek, Azure, and our own hosted models without rewriting clients.

Overview

The gateway exposes three universal text endpoints — pick whichever SDK you already use and you get every model. Native pass-throughs (Gemini, image, video) live alongside the universal entries when you need provider-exact shapes.

Drop-in for the OpenAI, Anthropic, and Responses SDKs.
Pay-as-you-go on the same balance, regardless of upstream provider.
Automatic channel failover when an upstream returns 401 / 403 / 5xx.
Free-tier models routed through pooled keys so you can try them with zero balance.

Quickstart

Make your first request in under a minute. The OpenAI Python SDK works as-is — only the base_url and api_key change.

1. Create an API key from the dashboard.

2. Set the base URL to https://api.nicosoft.ai/v1.

3. Send a request:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.nicosoft.ai/v1",
    api_key="sk-nsai-...",
)

resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hi"}],
    stream=True,
)

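# Some chunks can arrive with no choices or an empty delta, so guard both.
for chunk in resp:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
print()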

Authentication

All requests require a bearer token. Keys are scoped to a single account and can carry independent credit caps and per-minute limits.

Authorization: Bearer sk-nsai-***

Keep secrets server-side

API keys can read your usage and spend credits. Never ship them in client-side bundles — proxy through your own backend.
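A minimal sketch of building an authenticated request on your own backend with only the standard library, so the key never leaves the server. The environment variable name `NICOSOFT_API_KEY` and the helper `build_request` are illustrative assumptions; the request is constructed here but not sent.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.nicosoft.ai/v1"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST to the gateway.

    The key comes from a hypothetical NICOSOFT_API_KEY environment variable on
    the server, so it never appears in source code or client-side bundles.
    """
    api_key = os.environ["NICOSOFT_API_KEY"]
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Your own backend handler would accept the browser's input, call `build_request`, and send it with `urllib.request.urlopen(req)`, returning only the model output to the client.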

Endpoint routing

Three universal entries accept any text model. Native pass-throughs preserve provider-exact shapes for clients that need them.

Endpoint	Format	SDK	Accepted models
`/v1/chat/completions`	OpenAI Chat	OpenAI SDK	All text models
`/v1/messages`	Anthropic Messages	Anthropic SDK	All text models
`/v1/responses`	OpenAI Responses	OpenAI SDK	All text models
`/v1beta/models/{model}:{method}`	Gemini native	Google Gen AI SDK	Gemini only

Features preserved across bridges

Tool calling, streaming, system prompts, reasoning traces, and response_format: json_object all work transparently — the gateway translates between shapes without dropping features.
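For example, JSON mode is requested with the same body whichever family the slug points at, and the reply is read back from the OpenAI completion shape. A sketch over plain dicts; the prompt, the `parse_json_reply` helper, and the sample completion are illustrative, not real gateway output.

```python
import json

# Chat Completions body requesting JSON mode; swap the slug to target another family.
payload = {
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
        # OpenAI's JSON mode expects the prompt itself to ask for JSON.
        {"role": "system", "content": "Reply with a JSON object with keys city and country."},
        {"role": "user", "content": "Where is the Eiffel Tower?"},
    ],
    "response_format": {"type": "json_object"},
}


def parse_json_reply(completion: dict) -> dict:
    """Decode the first choice's message content of a chat completion as JSON."""
    return json.loads(completion["choices"][0]["message"]["content"])


# Hand-written stand-in for a completion, not captured gateway output.
sample = {"choices": [{"message": {"role": "assistant",
                                   "content": '{"city": "Paris", "country": "France"}'}}]}
print(parse_json_reply(sample)["city"])  # Paris
```

With the OpenAI client from the quickstart, the same body can be sent as `client.chat.completions.create(**payload)`.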

OpenAI Chat Completions

POST https://api.nicosoft.ai/v1/chat/completions

Drop-in replacement for the official OpenAI API. Accepts every text model — OpenAI, Anthropic, Gemini, DeepSeek, Azure — by switching the model slug.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.nicosoft.ai/v1",
    api_key="sk-nsai-...",
)

r = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(r.choices[0].message.content)

Anthropic Messages

POST https://api.nicosoft.ai/v1/messages

Drop-in for the Anthropic SDK. Set the SDK's base_url to https://api.nicosoft.ai and call messages.create. Non-Anthropic models are bridged — request, response, and streaming are all translated.

from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.nicosoft.ai",
    api_key="sk-nsai-...",
)

msg = client.messages.create(
    model="anthropic/claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hi"}],
)
print(msg.content[0].text)
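`msg.content[0].text` assumes the first content block is text. Messages responses carry a list of typed blocks, and a reply can also contain `tool_use` blocks, or a `thinking` block ahead of the text when reasoning traces are returned. A more defensive sketch joins every text block; it works on plain dicts in the Messages JSON shape, and the helper name `response_text` is illustrative.

```python
def response_text(message: dict) -> str:
    """Concatenate the text of every `text` block in a Messages response.

    Non-text blocks (for example `tool_use` or `thinking`) are skipped.
    """
    return "".join(
        block.get("text", "")
        for block in message.get("content", [])
        if block.get("type") == "text"
    )


# Hand-written example response, not captured gateway output.
example = {
    "role": "assistant",
    "content": [
        {"type": "thinking", "thinking": "..."},
        {"type": "text", "text": "Hi there"},
        {"type": "tool_use", "id": "toolu_1", "name": "lookup", "input": {}},
    ],
}
print(response_text(example))  # Hi there
```

With the SDK object the equivalent is `"".join(b.text for b in msg.content if b.type == "text")`.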

Gemini

POST https://api.nicosoft.ai/v1beta/models/{model}:generateContent

Gemini-native endpoints are a strict 1:1 pass-through — point Google's Gen AI SDK at the base URL and the request, response, and stream all preserve Google's original schema. If you'd rather treat Gemini as a regular OpenAI / Anthropic model, send it through /v1/chat/completions or /v1/messages instead.
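To show what "Google's original schema" means on the wire, here is a sketch that builds the native URL and a single-turn body in the `generateContent` shape (`contents` containing `parts`). The model slug `gemini-2.5-flash` is an example assumption (check /models), and authentication is deliberately left out because this page does not state which header the native path expects.

```python
BASE_URL = "https://api.nicosoft.ai/v1beta"


def gemini_url(model: str, stream: bool = False) -> str:
    """Native Gemini URL for `model`; streaming uses :streamGenerateContent?alt=sse."""
    if stream:
        return f"{BASE_URL}/models/{model}:streamGenerateContent?alt=sse"
    return f"{BASE_URL}/models/{model}:generateContent"


def gemini_body(prompt: str) -> dict:
    """A single-turn request in Google's generateContent schema."""
    return {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}


print(gemini_url("gemini-2.5-flash"))
print(gemini_url("gemini-2.5-flash", stream=True))
```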

Streaming (SSE)

Every endpoint supports SSE streaming. Set "stream": true (OpenAI / Anthropic families) or use :streamGenerateContent (Gemini).

Endpoint	SSE format	How to enable
`/v1/chat/completions`	OpenAI delta lines	`"stream": true`
`/v1/responses`	OpenAI event-typed	`"stream": true`
`/v1/messages`	Anthropic event-typed	`"stream": true`
`/v1beta/models/*`	Gemini JSON chunks	`:streamGenerateContent` + `?alt=sse`

Universal entries translate SSE too

/v1/chat/completions always returns OpenAI delta lines regardless of the selected model; /v1/messages always returns Anthropic event frames. Gemini native preserves Google's raw stream shape.
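Without an SDK, the OpenAI delta stream from `/v1/chat/completions` can be read line by line: each event is a `data: {json}` line, and the stream ends with `data: [DONE]`. A minimal sketch over already-decoded lines (for example from an HTTP client's line iterator), assuming a single choice per request; the function name and sample lines are illustrative, not captured gateway output.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style chat-completion SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank event separators, ": comment" lines, other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # Hello
```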

Tool calling

Pass tools on the request and the gateway forwards the schema. Tool-call output is normalised into the request's protocol shape, so your existing OpenAI / Anthropic tool-handling loop works regardless of which model actually runs.

Concurrent / parallel tool calls and tool_choice constraints (auto, required, named) pass through unchanged.
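On `/v1/chat/completions` the assistant's tool calls come back in the OpenAI shape, with `arguments` as a JSON string. A minimal sketch of the client side of the loop over plain dicts; the `get_weather` tool, its stub result, and the helper names are hypothetical. `TOOLS` would be passed as `tools=` on the request, and the returned messages appended after the assistant message before the follow-up call.

```python
import json
from typing import Callable, Dict, List

# OpenAI-format tool schema; `get_weather` is a hypothetical tool.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21}  # stub data for illustration


HANDLERS: Dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> List[dict]:
    """Run every tool call in an assistant message, including parallel calls,
    and return the `role: "tool"` messages to send back on the next request."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"] or "{}")  # arguments arrive as a JSON string
        output = HANDLERS[fn["name"]](**args)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results


# Hand-written assistant turn with two parallel calls, not captured output.
assistant = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}},
        {"id": "call_2", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}},
    ],
}
tool_messages = run_tool_calls(assistant)
```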

Image generation

POST https://api.nicosoft.ai/v1/images/generations

OpenAI-compatible synchronous image generation. The OpenAI SDK works directly — set base_url to https://api.nicosoft.ai/v1 and call client.images.generate(...). Billing is per-image; failed requests aren't charged.

Google Imagen is exposed via the Gemini native endpoint at /v1beta/models/imagen-*:generate.
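OpenAI-compatible image responses list results under `data`, each entry carrying either base64 data (`b64_json`) or a `url`, depending on the model and the requested format. Which of the two the gateway returns for a given model is not stated here, so this sketch handles both over a plain dict; the helper name is illustrative. With the SDK object, the same fields are attributes (`resp.data[0].b64_json`, `resp.data[0].url`).

```python
import base64
from typing import List, Union


def decode_images(response: dict) -> List[Union[bytes, str]]:
    """Collect results from an OpenAI-style image response.

    `b64_json` entries are decoded to raw bytes; `url` entries are returned as-is.
    """
    images: List[Union[bytes, str]] = []
    for item in response.get("data", []):
        if item.get("b64_json"):
            images.append(base64.b64decode(item["b64_json"]))
        elif item.get("url"):
            images.append(item["url"])
    return images
```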

Model naming

Model slugs are namespaced by upstream family so you can tell at a glance where a request is going.

openai/* — OpenAI native (GPT-4o, GPT-5, etc.)
anthropic/* — Anthropic native (Claude Sonnet / Opus / Haiku)
gemini-* / google/* — Google Gemini family
deepseek/* — DeepSeek
azure/* — Azure-hosted OpenAI models
nicosoft/* — first-party hosted variants

Browse the full catalog at /models.
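Because the family is encoded in the slug, client code can branch on it, for example to decide whether the Gemini-native endpoint is an option. A sketch of the prefix rules listed above; the family labels are illustrative, and the catalog at /models remains the authoritative list of slugs.

```python
PREFIX_FAMILIES = {
    "openai": "OpenAI",
    "anthropic": "Anthropic",
    "google": "Google Gemini",
    "deepseek": "DeepSeek",
    "azure": "Azure OpenAI",
    "nicosoft": "NicoSoft hosted",
}


def model_family(slug: str) -> str:
    """Map a model slug to its upstream family using the documented prefixes."""
    if slug.startswith("gemini-"):
        return "Google Gemini"  # Gemini slugs may appear without a namespace
    prefix, sep, _ = slug.partition("/")
    if sep and prefix in PREFIX_FAMILIES:
        return PREFIX_FAMILIES[prefix]
    raise ValueError(f"unrecognised model slug: {slug!r}")
```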

Error handling

On error the gateway returns a JSON envelope with success: false and a machine-readable code. Model provider errors are normalised so provider internals aren't exposed to clients.

Code	HTTP	When
`BAD_REQUEST`	400	Wrong endpoint for model family, missing field, non-text model on text endpoint
`UNAUTHORIZED`	401	Missing or invalid API key
`INSUFFICIENT_CREDITS`	402	Balance too low for a paid model
`FORBIDDEN`	403	Key disabled or expired
`NOT_FOUND`	404	Model slug doesn't exist
`RATE_LIMITED`	429	Per-key rate limit exceeded — honour `Retry-After`
`PROVIDER_ERROR`	502	Upstream returned an error; retry with backoff
`SERVICE_UNAVAILABLE`	503	All channels for that model are exhausted
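A sketch of turning the envelope into a typed exception so callers can branch on `code` rather than on message text. It assumes `code` sits at the top level of the envelope; the class and helper names are hypothetical, and treating 503 as retryable is a judgment call rather than documented behaviour.

```python
import json

# Codes from the table that are usually worth retrying. Including 503 assumes
# exhausted channels can recover after a wait, which is not documented.
RETRYABLE_CODES = {"RATE_LIMITED", "PROVIDER_ERROR", "SERVICE_UNAVAILABLE"}


class GatewayError(Exception):
    """Raised for a non-2xx gateway response, carrying the envelope's code."""

    def __init__(self, status: int, code: str, body: dict):
        super().__init__(f"HTTP {status}: {code}")
        self.status = status
        self.code = code
        self.body = body

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def check_response(status: int, raw_body: str) -> dict:
    """Return the decoded body on success; raise GatewayError otherwise."""
    if status < 400:
        return json.loads(raw_body)
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        body = {}  # e.g. an HTML error page from an intermediate proxy
    if not isinstance(body, dict):
        body = {}
    raise GatewayError(status, body.get("code", "UNKNOWN"), body)
```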

Automatic failover

When an upstream returns 401, 403, or 5xx, the gateway auto-tries the next available channel. You only see an error if every channel for that model is exhausted.
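To make the failover rule concrete, here is a conceptual model of it, not the gateway's actual implementation: channels are tried in order, an upstream 401, 403, or 5xx moves on to the next one, and only exhaustion surfaces as an error (the `SERVICE_UNAVAILABLE` case in the table above). Channel names and the `send` callable are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def should_fail_over(status: int) -> bool:
    """Upstream statuses that move the request to the next channel: 401, 403, 5xx."""
    return status in (401, 403) or 500 <= status <= 599


def call_with_failover(channels: Sequence[str],
                       send: Callable[[str], int]) -> Tuple[str, int]:
    """Conceptual model only: try each channel in order.

    Returns the first (channel, status) whose status does not trigger failover;
    raises once every channel has been tried.
    """
    last = None
    for channel in channels:
        status = send(channel)
        if not should_fail_over(status):
            return channel, status
        last = (channel, status)
    raise RuntimeError(f"all channels exhausted; last attempt: {last}")


# Hypothetical channel statuses for illustration.
statuses = {"primary": 503, "secondary": 401, "tertiary": 200}
print(call_with_failover(["primary", "secondary", "tertiary"], statuses.__getitem__))
# ('tertiary', 200)
```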

Rate limits

Rate limits are enforced per API key using a sliding window. When you exceed the limit the gateway returns 429 RATE_LIMITED with a Retry-After header (seconds until the window resets).

Limit	Scope	Notes
Requests per minute	Per API key	Applies to every proxy endpoint
Tokens per minute	Per API key	Counted after the model response
Free-tier daily cap	Per model	Pooled across users; paid models stay live when free runs out
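A minimal client-side retry sketch for the 429 path: honour `Retry-After` when the gateway sends it, otherwise back off exponentially with jitter. `send` is a hypothetical zero-argument callable returning `(status, headers, body)` with a plain, case-sensitive header dict; retrying 502 follows the error table's advice, while retrying 503 is an added assumption. Injecting `sleep` and `jitter` keeps the logic testable.

```python
import random
import time
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body)


def retry_delay(attempt: int, retry_after: Optional[str], base: float = 0.5,
                cap: float = 30.0,
                jitter: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry `attempt` (0-based).

    Uses Retry-After (seconds) when present; otherwise capped exponential
    backoff scaled by a random factor in [0, 1) ("full jitter").
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled here
    return min(cap, base * 2 ** attempt) * jitter()


def with_retries(send: Callable[[], Response], max_attempts: int = 5,
                 sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until the status is not retryable or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in (429, 502, 503) or attempt == max_attempts - 1:
            return status, headers, body
        sleep(retry_delay(attempt, headers.get("Retry-After")))
    raise ValueError("max_attempts must be at least 1")
```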

Defaults are generous for typical usage. For high-throughput or batch workloads, get in touch about dedicated quotas.

Support

Questions or issues — support@nicosoft.dev. Status page: /status. Model catalog: /models.