LLM API glossary

Plain-language definitions of the terms you'll meet when calling large language models through an API gateway like KeepRouter. Each entry is a short, standalone answer.

LLM API gateway

An LLM API gateway is a single service that routes your requests to many large language models — from different makers — behind one API and one key, so you choose the model by name instead of holding a separate account and SDK per provider.

OpenAI-compatible API

An OpenAI-compatible API accepts the same requests as OpenAI's API (POST /v1/chat/completions with an Authorization: Bearer key), so any OpenAI SDK works against it by changing only the base URL. KeepRouter's is at https://keeprouter.com/v1.

Anthropic-compatible API

An Anthropic-compatible API accepts the same requests as Anthropic's Claude API (POST /v1/messages with x-api-key and anthropic-version headers) — the same endpoint Claude Code and the Anthropic SDK use.

Token

A token is the unit language models read and write — roughly a word-piece, a few characters on average. Prompt length and output are measured in tokens, and API pricing is quoted per token (usually per million).

Context window

A context window is the maximum number of tokens a model can consider at once — your prompt plus its reply. A larger window lets the model work over longer documents or conversations in one request.

Input, output, and cached pricing

Text models are billed per token at three rates: input (tokens you send), output (tokens generated), and cached input (tokens re-read from a previous prompt at a discount). Output usually costs more than input.

Prompt caching

Prompt caching stores the processed form of a repeated prompt prefix so it isn't re-computed next time, billed at a reduced cached rate — lowering cost and latency for workloads that resend a large, stable prompt.

Per-image pricing

Image-generation models are billed per image produced, not per token — a flat price for each image. KeepRouter shows image models with a per-image price instead of token rates.

Model maker

The model maker is the organization that trained and released a model — for example OpenAI (GPT), Anthropic (Claude), Google DeepMind (Gemini), Zhipu AI (GLM), DeepSeek, ByteDance (Doubao), Moonshot AI (Kimi), Alibaba (Qwen), or MiniMax.

Modality

Modality describes the kinds of input and output a model handles: text models take and return text; vision (multimodal) models also accept images as input; text-to-image models generate images from a text prompt.

Streaming

Streaming returns a model's reply incrementally, token by token, as it is generated, instead of waiting for the whole response — letting an app show text as it arrives.

Tool calling

Tool calling (or function calling) lets a model ask your application to run a named function with structured arguments, then continue using the result — how models trigger actions like searching or calling an API.

Pay-as-you-go / prepaid credits

Pay-as-you-go means you are billed only for what you use, with no fixed monthly fee. KeepRouter uses prepaid credits: you top up a USD balance in advance and each request draws from it.

Billed at cost (0% markup)

Billed at cost means the price you pay per token equals the model's underlying cost, with no percentage added. KeepRouter sells every model at cost — a 0% token markup — and makes no money on the tokens themselves.

Merchant of Record

A Merchant of Record (MoR) is the company that legally sells to you and handles payment, tax, and compliance. KeepRouter's card top-ups are processed by Paddle as Merchant of Record.

Models & pricing · Quickstart