NicoSoft AI documentation

NicoSoft AI is a unified gateway in front of every major model provider. Send one request shape and reach OpenAI, Anthropic, Google, DeepSeek, Azure, and our own hosted models without rewriting clients.

Overview

The gateway exposes three universal text endpoints — pick whichever SDK you already use and you get every model. Native pass-throughs (Gemini, image, video) live alongside the universal entries when you need provider-exact shapes.

  • Drop-in for the OpenAI, Anthropic, and Responses SDKs.
  • Pay-as-you-go on the same balance, regardless of upstream provider.
  • Automatic channel failover when an upstream returns 401 / 403 / 5xx.
  • Free-tier models routed through pooled keys so you can try them with zero balance.

Quickstart

Make your first request in under a minute. The OpenAI Python SDK works as-is — only the base_url and api_key change.

1. Create an API key from the dashboard.
2. Set the base URL to https://api.nicosoft.ai/v1.
3. Send a request:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.nicosoft.ai/v1",
    api_key="sk-nsai-...",
)

resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hi"}],
    stream=True,
)

for chunk in resp:
    print(chunk.choices[0].delta.content or "", end="")

Authentication

All requests require a bearer token. Keys are scoped to a single account and can carry independent credit caps and per-minute limits.

Authorization: Bearer sk-nsai-***

Keep secrets server-side
API keys can read your usage and spend credits. Never ship them in client-side bundles; proxy through your own backend instead.
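
As a minimal sketch of the header above, the snippet below reads the key from the server environment and attaches it to a request built with the standard library. The variable name NICOSOFT_API_KEY and the helper names are assumptions for illustration, not something the gateway defines, and the request is only constructed, not sent.

```python
import os
import urllib.request

BASE_URL = "https://api.nicosoft.ai/v1"


def auth_headers(api_key: str) -> dict[str, str]:
    """Return the headers an authenticated JSON request needs."""
    if not api_key:
        raise ValueError("API key is empty; set it in the server environment")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def build_request(path: str, body: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=body,
        headers=auth_headers(api_key),
        method="POST",
    )


# NICOSOFT_API_KEY is a hypothetical variable name chosen for this sketch.
API_KEY = os.environ.get("NICOSOFT_API_KEY", "")
```

Keeping the key in an environment variable on your backend means it never reaches a browser bundle or a mobile binary.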

Endpoint routing

Three universal entries accept any text model. Native pass-throughs preserve provider-exact shapes for clients that need them.

| Endpoint | Format | SDK | Accepted models |
|---|---|---|---|
| /v1/chat/completions | OpenAI Chat | OpenAI SDK | All text models |
| /v1/messages | Anthropic Messages | Anthropic SDK | All text models |
| /v1/responses | OpenAI Responses | OpenAI SDK | All text models |
| /v1beta/models/{model}:{method} | Gemini native | Google Gen AI SDK | Gemini only |

Features preserved across bridges
Tool calling, streaming, system prompts, reasoning traces, and response_format: json_object all work transparently: the gateway translates between shapes without dropping features.
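
To make the three universal entries concrete, the sketch below builds the same one-turn prompt in each request shape. The body layouts follow the public OpenAI Chat, Anthropic Messages, and OpenAI Responses formats; the helper names are illustrative, and nothing is sent over the network.

```python
import json

PROMPT = "Hi"
MODEL = "anthropic/claude-sonnet-4-5"  # slug reused from the quickstart


def chat_completions_body(model: str, prompt: str) -> dict:
    """OpenAI Chat shape: a list of role/content messages."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def messages_body(model: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Anthropic Messages shape: same messages list, max_tokens is required."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }


def responses_body(model: str, prompt: str) -> dict:
    """OpenAI Responses shape: a single `input` field."""
    return {"model": model, "input": prompt}


REQUESTS = {
    "/v1/chat/completions": chat_completions_body(MODEL, PROMPT),
    "/v1/messages": messages_body(MODEL, PROMPT),
    "/v1/responses": responses_body(MODEL, PROMPT),
}

for path, body in REQUESTS.items():
    print(path, json.dumps(body))
```

The model slug is identical in all three bodies; only the envelope around the prompt changes.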

OpenAI Chat Completions

POST https://api.nicosoft.ai/v1/chat/completions

Drop-in replacement for the official OpenAI API. Accepts every text model — OpenAI, Anthropic, Gemini, DeepSeek, Azure — by switching the model slug.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.nicosoft.ai/v1",
    api_key="sk-nsai-...",
)

r = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(r.choices[0].message.content)

Anthropic Messages

POST https://api.nicosoft.ai/v1/messages

Drop-in for the Anthropic SDK. Set the SDK's base_url to https://api.nicosoft.ai and call messages.create. Non-Anthropic models are bridged — request, response, and streaming are all translated.

from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.nicosoft.ai",
    api_key="sk-nsai-...",
)

msg = client.messages.create(
    model="anthropic/claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hi"}],
)
print(msg.content[0].text)

Gemini

POST https://api.nicosoft.ai/v1beta/models/{model}:generateContent

Gemini-native endpoints are a strict 1:1 pass-through — point Google's Gen AI SDK at the base URL and the request, response, and stream all preserve Google's original schema. If you'd rather treat Gemini as a regular OpenAI / Anthropic model, send it through /v1/chat/completions or /v1/messages instead.
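
For readers calling the native endpoint without an SDK, here is a rough sketch of a raw request. The body uses Google's public contents/parts schema; the bearer header is assumed from the Authentication section, the model name gemini-example is a placeholder, and the request is built but not sent.

```python
import json
import urllib.request

GATEWAY = "https://api.nicosoft.ai"


def gemini_url(model: str, stream: bool = False) -> str:
    """Build the native URL; streaming switches the method and adds ?alt=sse."""
    method = "streamGenerateContent" if stream else "generateContent"
    url = f"{GATEWAY}/v1beta/models/{model}:{method}"
    return f"{url}?alt=sse" if stream else url


def gemini_request(model: str, prompt: str, api_key: str,
                   stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a Gemini-native generateContent request."""
    body = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        gemini_url(model, stream),
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: the gateway's bearer scheme applies here as well.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "gemini-example" is a placeholder, not a real catalog entry.
print(gemini_url("gemini-example"))
print(gemini_url("gemini-example", stream=True))
```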

Streaming (SSE)

Every endpoint supports SSE streaming. Set "stream": true (OpenAI / Anthropic families) or use :streamGenerateContent (Gemini).

| Endpoint | SSE format | How to enable |
|---|---|---|
| /v1/chat/completions | OpenAI delta lines | "stream": true |
| /v1/responses | OpenAI event-typed | "stream": true |
| /v1/messages | Anthropic event-typed | "stream": true |
| /v1beta/models/* | Gemini JSON chunks | :streamGenerateContent + ?alt=sse |

Universal entries translate SSE too
/v1/chat/completions always returns OpenAI delta lines regardless of the selected model; /v1/messages always returns Anthropic event frames. Gemini native preserves Google's raw stream shape.
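
If you consume the /v1/chat/completions stream without an SDK, the delta lines can be parsed along these lines. This is a sketch that assumes the stream follows OpenAI's usual framing (data: lines carrying JSON chunks, terminated by a data: [DONE] sentinel); the function name and sample lines are illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_openai_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and event: lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Hand-written sample lines standing in for a real stream.
SAMPLE = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]

print("".join(iter_openai_deltas(SAMPLE)))  # prints "Hello"
```

Because the universal entry always emits this shape, the same parser should work whichever model serves the request.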

Tool calling

Pass tools on the request and the gateway forwards the schema. Tool-call output is normalised into the request's protocol shape, so your existing OpenAI / Anthropic tool-handling loop works regardless of which model actually runs.

Concurrent / parallel tool calls and tool_choice constraints (auto, required, named) pass through unchanged.
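
A tool-handling loop in the OpenAI Chat shape might look like the sketch below. The get_weather tool, its handler, and the sample assistant message are all hypothetical; only the tools / tool_calls / tool message layout comes from the OpenAI Chat format.

```python
import json

# Tool schema to pass as tools=[WEATHER_TOOL] on the request (hypothetical tool).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return a short weather summary for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # placeholder implementation


HANDLERS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute every tool call and return `tool` messages to append."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"] or "{}")  # arguments arrive as a JSON string
        output = HANDLERS[fn["name"]](**args)
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": output}
        )
    return results


# Hand-written example of an assistant turn with two parallel tool calls.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'}},
        {"id": "call_2", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Lima"}'}},
    ],
}

for message in run_tool_calls(assistant_message):
    print(message)
```

Append the assistant message and the returned tool messages to the conversation, then call the endpoint again to get the final answer.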

Image generation

POST https://api.nicosoft.ai/v1/images/generations

OpenAI-compatible synchronous image generation. The OpenAI SDK works directly — set base_url to https://api.nicosoft.ai/v1 and call client.images.generate(...). Billing is per-image; failed requests aren't charged.

Google Imagen is exposed via the Gemini native endpoint at /v1beta/models/imagen-*:generate.
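
As a sketch of the synchronous call through the OpenAI SDK, the helper below wraps client.images.generate and returns whichever field the response carries. The helper name is illustrative, and the image model slug is left as a parameter because this page does not list one.

```python
def generate_image(client, prompt: str, model: str) -> str:
    """Return the URL (or base64 payload) of the first generated image.

    `client` is expected to be an openai.OpenAI instance whose base_url
    points at https://api.nicosoft.ai/v1.
    """
    result = client.images.generate(model=model, prompt=prompt, n=1)
    image = result.data[0]
    # Depending on the model, the image arrives as a URL or as base64 data.
    return image.url or image.b64_json


# Usage (requires the `openai` package and a real key):
#
#   from openai import OpenAI
#   client = OpenAI(base_url="https://api.nicosoft.ai/v1", api_key="sk-nsai-...")
#   print(generate_image(client, "A lighthouse at dusk", model="<slug from /models>"))
```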

Model naming

Model slugs are namespaced by upstream family so you can tell at a glance where a request is going.

  • openai/* — OpenAI native (GPT-4o, GPT-5, etc.)
  • anthropic/* — Anthropic native (Claude Sonnet / Opus / Haiku)
  • gemini-* / google/* — Google Gemini family
  • deepseek/* — DeepSeek
  • azure/* — Azure-hosted OpenAI models
  • nicosoft/* — first-party hosted variants

Browse the full catalog at /models.
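
Client code that needs to know the upstream family (for logging or cost attribution, say) could derive it from the slug as sketched here. The function name is an assumption, the prefix list simply mirrors the bullets above, and mapping bare gemini-* slugs to "google" is a choice made for this example.

```python
KNOWN_PREFIXES = ("openai", "anthropic", "google", "deepseek", "azure", "nicosoft")


def upstream_family(slug: str) -> str:
    """Return the family prefix of a model slug, e.g. 'openai/gpt-4o' -> 'openai'."""
    if slug.startswith("gemini-"):
        return "google"  # bare gemini-* slugs belong to the Google Gemini family
    prefix, sep, name = slug.partition("/")
    if sep and name and prefix in KNOWN_PREFIXES:
        return prefix
    raise ValueError(f"unrecognised model slug: {slug!r}")


print(upstream_family("openai/gpt-4o"))
print(upstream_family("anthropic/claude-sonnet-4-5"))
```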

Error handling

On error the gateway returns a JSON envelope with success: false and a machine-readable code. Model provider errors are normalised so provider internals aren't exposed to clients.

| Code | HTTP | When |
|---|---|---|
| BAD_REQUEST | 400 | Wrong endpoint for model family, missing field, non-text model on text endpoint |
| UNAUTHORIZED | 401 | Missing or invalid API key |
| INSUFFICIENT_CREDITS | 402 | Balance too low for a paid model |
| FORBIDDEN | 403 | Key disabled or expired |
| NOT_FOUND | 404 | Model slug doesn't exist |
| RATE_LIMITED | 429 | Per-key rate limit exceeded; honour Retry-After |
| PROVIDER_ERROR | 502 | Upstream returned an error; retry with backoff |
| SERVICE_UNAVAILABLE | 503 | All channels for that model are exhausted |

Automatic failover
When an upstream returns 401, 403, or 5xx, the gateway auto-tries the next available channel. You only see an error if every channel for that model is exhausted.
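
One way to turn the envelope into an exception is sketched below. It assumes the code sits at the top level of the JSON body next to success, which this page implies but does not spell out; the class and function names are illustrative. Only the two codes the table explicitly tells you to retry are marked retryable.

```python
import json

# Codes the table above says to retry (after Retry-After or with backoff).
RETRYABLE_CODES = {"RATE_LIMITED", "PROVIDER_ERROR"}


class GatewayError(Exception):
    """Raised when the gateway answers with an error envelope."""

    def __init__(self, status: int, code: str):
        super().__init__(f"{status} {code}")
        self.status = status
        self.code = code

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def raise_for_envelope(status: int, body: str) -> dict:
    """Return the parsed body, or raise GatewayError for an error envelope."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    if status >= 400 or data.get("success") is False:
        # Assumption: "code" is a top-level field of the envelope.
        raise GatewayError(status, data.get("code", "UNKNOWN"))
    return data


try:
    raise_for_envelope(402, '{"success": false, "code": "INSUFFICIENT_CREDITS"}')
except GatewayError as err:
    print(err.code, err.retryable)  # prints "INSUFFICIENT_CREDITS False"
```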

Rate limits

Rate limits are enforced per API key using a sliding window. When you exceed the limit the gateway returns 429 RATE_LIMITED with a Retry-After header (seconds until the window resets).

| Limit | Scope | Notes |
|---|---|---|
| Requests per minute | Per API key | Applies to every proxy endpoint |
| Tokens per minute | Per API key | Counted after the model response |
| Free-tier daily cap | Per model | Pooled across users; paid models stay live when free runs out |

Defaults are generous for typical usage. For high-throughput or batch workloads, get in touch about dedicated quotas.
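
A client can honour Retry-After with a small wrapper such as the sketch below. The send callable, the (status, headers, body) tuple, and the one-second fallback are assumptions made for illustration; plug in whatever HTTP client you already use.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
    fallback_delay: float = 1.0,
) -> Response:
    """Call `send` until it stops returning 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        try:
            # Retry-After is given in seconds until the window resets.
            delay = float(headers.get("Retry-After", ""))
        except ValueError:
            delay = fallback_delay  # header missing or not numeric
        sleep(max(delay, 0.0))
    raise AssertionError("unreachable")
```

Header lookup here is case-sensitive, so normalise header names first if your HTTP client does not already do so.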

Support

Questions or issues — support@nicosoft.dev. Status page: /status. Model catalog: /models.