29.05.2026

How to monitor GPT API usage in the Serverspace dashboard

When working with the GPT API in the Serverspace panel, it is important to understand that the cost of requests directly depends on the number of processed tokens. Tokens are units of text that make up the model’s request and response.

One token is approximately equal to:

The longer the request and response, the higher the cost.

Token pricing

Serverspace uses separate pricing — each model has its own rates for input and output tokens, for example for the model GPT-5.3 Codex:

What this means in practice:
Input tokens are the text you send to the model (prompt, instructions, context).
Output tokens are the response generated by the model.

Important: output tokens are usually more expensive because they require more computational resources.

Why token control is important

Controlling token usage helps to:

Without token limits, the model may generate overly long responses, increasing the cost of each request.

How token limits work

The Serverspace panel provides a setting:

  1. Maximum number of tokens
  2. This setting defines the upper limit of the model’s response length.

If a limit is set: the model cannot exceed the specified number of tokens in its response; the output will be automatically cut off when the limit is reached; you get a predictable cost for each request.

Benefits of using token limits

Using token limits allows you to:

Control your budget — prevents overly long and expensive responses
Increase cost predictability — easier to plan expenses
Optimize performance — faster responses with less text
Flexibly manage model behavior — balance between brevity and detail

Usage recommendations

For chatbots: 300–800 tokens
For short answers / FAQs: 100–300 tokens
For article generation: 1000+ tokens (with budget awareness)