Proxied vs non-proxied

This page explains how API traffic can reach your backend: through the platform gateway (proxied) or directly from the client (non-proxied). The distinction concerns the request path, not the service type.

Proxied

In proxied mode, the client calls the platform gateway. The gateway validates the service key, checks the subscription and rate limits, then forwards the request to your backend. The platform records usage automatically from the invocation and, for async invocations, from completion callbacks. The client never talks to your backend directly.

Use proxied when you want the platform to handle authentication, rate limiting, and usage metering for you.

Non-proxied

In non-proxied mode, the client calls your backend directly (e.g. your own domain). The platform does not sit in the request path. You are responsible for validating the client and for reporting usage to the platform via the Usage API (e.g. /api/usage/report plus progress/completion endpoints for async).
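The division of responsibility described above can be sketched as a request handler that validates the caller first and reports usage only after the work succeeds. This is a minimal sketch, not the platform's implementation: handle_direct_call, the three injected callables, and the {"units": ...} report shape are all hypothetical names chosen for illustration. In a real backend, report_usage would send its payload to the Usage API (e.g. /api/usage/report).

```python
from typing import Callable, Dict


def handle_direct_call(
    credentials: Dict[str, str],
    payload: dict,
    validate_caller: Callable[[Dict[str, str]], bool],
    do_work: Callable[[dict], dict],
    report_usage: Callable[[dict], None],
) -> dict:
    """Validate the caller, do the work, then report usage for it.

    All names here are hypothetical. do_work is assumed to return a dict
    that includes a "units" entry describing what the call consumed.
    """
    if not validate_caller(credentials):
        # Reject before doing any work, so nothing is reported.
        return {"status": 401, "error": "invalid caller"}

    result = do_work(payload)

    # Hypothetical report shape; the real Usage API payload may differ.
    report_usage({"units": result["units"]})
    return {"status": 200, "result": result}
```

Passing the validation and reporting steps in as callables keeps the sketch independent of any particular HTTP client or of the actual request format the platform expects.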

Use non-proxied when you need the client to hit your infrastructure directly (e.g. WebSockets, long-lived connections, or your own auth).

Proxied streaming and per-token products

For invocations sent with Accept: text/event-stream, usage on per-token plans depends on the product's streaming usage setting (stored on the product as streaming_units_source):

  • Platform counts tokens (GATEWAY_COUNT): tollara.ai derives usage with a proprietary token-counting algorithm, using your product's tokenizer encoding setting.
  • Service reports usage (AGENT_REPORTS): the gateway takes units from the service response, checking these sources in order: the X-Units-Used response header (matched case-insensitively); then a final SSE data: line carrying the tollara.ai JSON { "agentVend": { "totalUnits": <n> } }; then OpenAI-style usage.prompt_tokens / completion_tokens (or total_tokens) in the tail of the stream. If none match, billing falls back to 1 unit.
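The AGENT_REPORTS lookup order can be illustrated with a short sketch. It is not the gateway's code: resolve_units and _as_units are hypothetical names, and several details are assumptions rather than documented behaviour. In particular, the sketch skips data: lines that are not JSON objects (such as an OpenAI-style [DONE] sentinel), adds prompt_tokens and completion_tokens when both are present and otherwise uses total_tokens, and ignores values that are not non-negative integers.

```python
import json
from typing import Mapping, Optional


def _as_units(value) -> Optional[int]:
    """Return value as a non-negative int, or None if it is not one."""
    if isinstance(value, bool):
        return None
    try:
        units = int(value)
    except (TypeError, ValueError):
        return None
    return units if units >= 0 else None


def resolve_units(headers: Mapping[str, str], sse_tail: str) -> int:
    """Resolve units in the documented order, falling back to 1."""
    # 1. X-Units-Used response header, matched case-insensitively.
    for name, value in headers.items():
        if name.lower() == "x-units-used":
            units = _as_units(value)
            if units is not None:
                return units

    # Collect JSON objects from data: lines in the tail of the stream.
    payloads = []
    for line in sse_tail.splitlines():
        if not line.startswith("data:"):
            continue
        try:
            parsed = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # assumption: non-JSON lines such as [DONE] are skipped
        if isinstance(parsed, dict):
            payloads.append(parsed)

    # 2. Final data: line carrying {"agentVend": {"totalUnits": <n>}}.
    if payloads:
        agent_vend = payloads[-1].get("agentVend")
        if isinstance(agent_vend, dict):
            units = _as_units(agent_vend.get("totalUnits"))
            if units is not None:
                return units

    # 3. OpenAI-style usage object, searching from the end of the tail.
    for payload in reversed(payloads):
        usage = payload.get("usage")
        if not isinstance(usage, dict):
            continue
        prompt = _as_units(usage.get("prompt_tokens"))
        completion = _as_units(usage.get("completion_tokens"))
        if prompt is not None and completion is not None:
            return prompt + completion  # assumption: the two counts are summed
        total = _as_units(usage.get("total_tokens"))
        if total is not None:
            return total

    # None of the sources matched.
    return 1
```

A service that wants predictable billing under this setting would set X-Units-Used explicitly, since the header is checked before anything in the stream body.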

Choosing

Whether traffic is proxied or non-proxied depends on how subscribers integrate. If they use the gateway invoke URL (path-based or branded API domain), traffic is proxied. If they call your backend URL directly and you report usage via the Usage API, it is non-proxied. The same service can be used in both ways depending on the client.

Non-proxied: long jobs and pre-flight

For non-proxied backends, validate the caller (e.g. with service key validation) and report usage to the Usage API. Before starting expensive or long-running work, request a core usage estimate for the units you plan to consume (POST /billing/usage/estimate with a user JWT, or POST /service-keys/estimate-usage with key + secret). You may send incremental usage reports for token-style or time-style meters as your product requires; how refunds and partial failures are handled is policy-specific.
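As a rough illustration of the pre-flight step, the sketch below picks between the two estimate endpoints according to the credentials available. Only the two paths come from this page. The EstimateRequest type, the build_estimate_request function, the Bearer authorization header, and the body field names (plannedUnits, key, secret) are assumptions made for the example, so check the API reference for the real request shape.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EstimateRequest:
    """Hypothetical description of an HTTP request; nothing is sent."""
    method: str
    path: str
    headers: Dict[str, str] = field(default_factory=dict)
    body: Dict[str, object] = field(default_factory=dict)


def build_estimate_request(
    planned_units: int,
    *,
    user_jwt: Optional[str] = None,
    service_key: Optional[str] = None,
    service_secret: Optional[str] = None,
) -> EstimateRequest:
    """Choose the estimate endpoint that matches the available credentials."""
    if planned_units <= 0:
        raise ValueError("planned_units must be positive")

    if user_jwt is not None:
        # Assumption: the user JWT is sent as a Bearer token.
        return EstimateRequest(
            method="POST",
            path="/billing/usage/estimate",
            headers={"Authorization": f"Bearer {user_jwt}"},
            body={"plannedUnits": planned_units},  # hypothetical field name
        )

    if service_key is not None and service_secret is not None:
        # Assumption: key and secret travel in the request body.
        return EstimateRequest(
            method="POST",
            path="/service-keys/estimate-usage",
            body={
                "key": service_key,
                "secret": service_secret,
                "plannedUnits": planned_units,  # hypothetical field name
            },
        )

    raise ValueError("provide a user JWT or a service key and secret")
```

Building the request as plain data, separate from sending it, makes it easy to run the estimate before committing to a long job and to decline the job if the estimate is rejected.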