When working with the GPT API in the Serverspace panel, it is important to understand that the cost of requests directly depends on the number of processed tokens. Tokens are units of text that make up the model’s request and response.
One token is approximately equal to:
- about three-quarters of a word (roughly 4 characters) in English,
- part of a word or a few characters in other languages.
The longer the request and response, the higher the cost.
Token pricing
Serverspace prices input and output tokens separately, and each model has its own rates. For example, for the GPT-5.3 Codex model:
- Input token cost: 2.3 € / 1M tokens
- Output token cost: 18.42 € / 1M tokens

What this means in practice:
Input tokens are the text you send to the model (prompt, instructions, context).
Output tokens are the response generated by the model.
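To make the arithmetic concrete, here is a minimal sketch of a per-request cost estimate. It uses the example GPT-5.3 Codex rates quoted above; the function name and the token counts in the usage line are illustrative assumptions, not part of the Serverspace API.

```python
# Example rates from the article (GPT-5.3 Codex), in euros per 1M tokens.
INPUT_RATE_EUR_PER_M = 2.3
OUTPUT_RATE_EUR_PER_M = 18.42


def estimate_cost_eur(
    input_tokens: int,
    output_tokens: int,
    input_rate: float = INPUT_RATE_EUR_PER_M,
    output_rate: float = OUTPUT_RATE_EUR_PER_M,
) -> float:
    """Return the estimated cost of one request in euros.

    Hypothetical helper: input and output tokens are billed at
    separate per-million rates and then summed.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Illustrative request: a 1,200-token prompt and a 500-token response.
cost = estimate_cost_eur(1200, 500)
print(f"{cost:.6f} EUR")  # prints "0.011970 EUR"
```

In this example the 500 output tokens account for most of the cost, because the output rate is about eight times the input rate.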
Why token control is important
Controlling token usage helps to:
- avoid unexpected costs,
- predict API budget,
- optimize response quality/length,
- manage load in applications.
Without token limits, the model may generate overly long responses, increasing the cost of each request.
How token limits work
The Serverspace panel provides a setting, Maximum number of tokens, which defines the upper limit of the model’s response length.
If a limit is set:
- the model cannot exceed the specified number of tokens in its response;
- the output is automatically cut off when the limit is reached;
- the maximum cost of each request becomes predictable.
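Because the limit caps the number of output tokens, it also caps the cost of a request. The sketch below shows both directions of that calculation: the worst-case cost for a given limit, and the largest limit that fits a given budget. The function names are hypothetical and the default rates are the example GPT-5.3 Codex rates from this article; substitute the rates of the model you actually use.

```python
def max_cost_eur(
    input_tokens: int,
    max_output_tokens: int,
    input_rate: float = 2.3,    # € per 1M input tokens (example rate)
    output_rate: float = 18.42,  # € per 1M output tokens (example rate)
) -> float:
    """Upper bound on the cost of one request when the response
    is capped at max_output_tokens."""
    return (input_tokens * input_rate + max_output_tokens * output_rate) / 1_000_000


def max_tokens_for_budget(
    budget_eur: float,
    input_tokens: int,
    input_rate: float = 2.3,
    output_rate: float = 18.42,
) -> int:
    """Largest response limit that keeps one request within budget_eur.

    Returns 0 if the prompt alone already uses up the budget.
    """
    remaining = budget_eur * 1_000_000 - input_tokens * input_rate
    return max(0, int(remaining // output_rate))


# Worst case for a 1,000-token prompt with an 800-token limit.
print(f"{max_cost_eur(1000, 800):.6f} EUR")  # prints "0.017036 EUR"

# Limit that keeps the same prompt under 0.01 € per request.
print(max_tokens_for_budget(0.01, 1000))  # prints 418
```

The actual cost is usually lower than this bound, since the model often finishes its response before reaching the limit.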
Benefits of using token limits
Using token limits allows you to:
- Control your budget: prevents overly long and expensive responses
- Increase cost predictability: makes expenses easier to plan
- Optimize performance: shorter responses are generated faster
- Flexibly manage model behavior: balance brevity and detail
Usage recommendations
- For chatbots: 300–800 tokens
- For short answers / FAQs: 100–300 tokens
- For article generation: 1000+ tokens (with budget awareness)
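In an application, these recommendations can be kept as presets so that a requested limit never drifts outside the suggested range. The following is only a sketch: the preset keys and the helper name are assumptions made for illustration, and the ranges are the ones listed above.

```python
from typing import Optional

# Recommended response limits from the list above.
# None means the article gives no upper bound for that use case.
RECOMMENDED_LIMITS: dict[str, tuple[int, Optional[int]]] = {
    "faq": (100, 300),
    "chatbot": (300, 800),
    "article": (1000, None),
}


def clamp_limit(use_case: str, requested: int) -> int:
    """Return the requested token limit, adjusted to stay inside
    the recommended range for the given use case."""
    low, high = RECOMMENDED_LIMITS[use_case]
    if requested < low:
        return low
    if high is not None and requested > high:
        return high
    return requested


print(clamp_limit("chatbot", 2000))  # prints 800
print(clamp_limit("faq", 50))        # prints 100
print(clamp_limit("article", 5000))  # prints 5000
```

Since article generation has no upper bound here, it is worth pairing that preset with a separate budget check.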