Reasoning Tokens

TheAgentic's models include reasoning tokens in every response. These tokens represent the logical steps the model takes to generate its output.


See reasoning tokens in action

To see the reasoning tokens in each response, pass return_reasoning=True in the extra_body parameter of the chat completion request.

from openai import OpenAI

# Any OpenAI-compatible client works; configure it for TheAgentic's API
client = OpenAI()

response = client.chat.completions.create(
    model="agentic-large",
    messages=[...],
    extra_body={"return_reasoning": True},
)

The response includes the reasoning tokens in the reasoning_content field of the ChatCompletionMessage object.

print(response.choices[0].message.reasoning_content)

When streaming the response (stream=True in the request), the reasoning tokens are streamed in the reasoning_content field of each chunk's delta.

for chunk in response:
    delta = chunk.choices[0].delta
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="")
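To reconstruct the full reasoning text from a stream, the chunks can be accumulated into one string. A minimal sketch, using SimpleNamespace objects as stand-ins for real stream chunks (the attribute layout mirrors the chunk shape shown above):

```python
from types import SimpleNamespace

# Stand-in chunks; a real stream would yield objects with the same shape
chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(reasoning_content=text))])
    for text in ["First ", "step."]
]

reasoning = ""
for chunk in chunks:
    delta = chunk.choices[0].delta
    # Skip chunks that carry no reasoning (e.g. regular content deltas)
    if getattr(delta, "reasoning_content", None):
        reasoning += delta.reasoning_content

print(reasoning)  # First step.
```

The getattr guard matters because chunks carrying regular output tokens may omit the reasoning_content field entirely.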

Reasoning Tokens in Usage

The following is an example chat response in which the number of reasoning tokens for the request can be seen in the usage section.


ChatCompletion(
    id='179863581b5af5185fa08f87387ab013',
    choices=[...],
    created=1744236734,
    model='agentic-large',
    object='chat.completion',
    service_tier=None,
    system_fingerprint=None,
    usage=CompletionUsage(
        completion_tokens=408,
        prompt_tokens=1772,
        total_tokens=2180,
        completion_tokens_details=CompletionTokensDetails(
            accepted_prediction_tokens=0,
            audio_tokens=0,
            reasoning_tokens=94,
            rejected_prediction_tokens=0
        ),
        prompt_tokens_details=PromptTokensDetails(
            audio_tokens=0,
            cached_tokens=0
        )
    ),
    metadata={}
)

In this example response, the usage field contains the token counts.

  • prompt_tokens includes both the messages sent by the user and the internal system prompt
  • completion_tokens_details.reasoning_tokens is the number of reasoning tokens generated for this request
  • completion_tokens covers the entire response from the LLM (reasoning tokens + output tokens)
  • total_tokens is the sum of prompt_tokens and completion_tokens
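As a sanity check on these relationships, the visible (non-reasoning) output tokens can be derived from the counts in the example usage above; a minimal sketch with the numbers hard-coded:

```python
# Counts taken from the example usage block above
prompt_tokens = 1772
completion_tokens = 408   # reasoning tokens + visible output tokens
reasoning_tokens = 94

# Visible output tokens are what remains after removing reasoning tokens
output_tokens = completion_tokens - reasoning_tokens
print(output_tokens)  # 314

# total_tokens is simply prompt + completion
total_tokens = prompt_tokens + completion_tokens
print(total_tokens)  # 2180
```

Billing-wise this means reasoning tokens are counted inside completion_tokens, not as a separate line item on top of them.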