Disable thinking part in response

How can I disable receiving the thinking part of the response when connecting to Ollama?

Even when using a model whose Modelfile in Ollama disables thinking, I still receive the thinking output when using TTMSMCPCloudAI.

Thank you in advance.

It is not implemented yet, and some models produce extensive thinking that delays the response.

Changing the following in TMS.MCP.CloudAI disables the thinking, at line 1040:

aiOllama:
  begin
    if FFiles.HasBinaryFiles then
    begin
      APostDataOllama := '{"model":"%s","prompt":"%s","stream":false,"think":false,"images":[%s]}';
      AMessages := '';
    end;

And at line 866:

cDataOllama = '{' +
  '"model": "%s",' +
  '"messages": [ %s ],' +
  '"tools": [ %s ],' +
  '"stream": false,' +
  '"think": false,' +
  '"options":{"temperature":%s,"max_tokens":%s}' +
  '}';
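For reference, the chat request this template produces after the %s placeholders are filled in can be sketched as follows (a Python sketch; the model name, message, and option values are placeholder examples, not taken from the component):

```python
import json

# Sketch of the JSON body the cDataOllama template yields for Ollama's
# /api/chat endpoint; "think": false asks Ollama to suppress the
# model's thinking output.
payload = {
    "model": "qwen3",  # example model name
    "messages": [{"role": "user", "content": "Hello"}],
    "tools": [],
    "stream": False,
    "think": False,  # disables the thinking part of the response
    "options": {"temperature": 0.7, "max_tokens": 1024},
}

body = json.dumps(payload)
print(body)
```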

Please consider adding it as an option.

Thank you

Another useful option for Ollama is setting the context window.

We will evaluate if/how we could unify these additional capabilities across the many cloud LLMs we support and add it when feasible.

Thank you very much.

For context size, "num_ctx" is just an extra option (it could be exposed as an OllamaContext setting) in the "options" JSON object, alongside temperature and max_tokens. Setting the context size is crucial for memory usage and for how much work the LLM can do.
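As a sketch of what that "options" object would look like with the context size added (Python; the model name and the 8192 value are illustrative assumptions):

```python
import json

# "num_ctx" sits next to temperature/max_tokens in Ollama's "options"
# object; 8192 is just an example context-window size.
request = {
    "model": "llama3",  # example model name
    "messages": [],
    "stream": False,
    "options": {"temperature": 0.7, "max_tokens": 1024, "num_ctx": 8192},
}
print(json.dumps(request))
```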

For "thinking", as you can see in the source I posted above, it is just a boolean value that could be set via an OllamaThink property in the component settings.

As I understand it, the same option has its own name/value for each LLM you support (except temperature and max_tokens), just as each service has its own variable for the API key.

Thanks again

For now, a possibility is to set this via CloudAI.Settings.CustomOptions as "think:false", and possibly also the context size the same way.
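How such name:value entries end up in the request could be sketched like this (a hypothetical Python sketch; the actual CloudAI.Settings.CustomOptions behavior is defined by the component, and apply_custom_options is an illustrative helper, not part of the library):

```python
import json

def apply_custom_options(request: dict, custom_options: list) -> dict:
    """Hypothetical sketch: merge "name:value" entries into the request.

    Each value is parsed as JSON, so "think:false" becomes a boolean
    and "num_ctx:8192" becomes an integer.
    """
    for entry in custom_options:
        name, _, value = entry.partition(":")
        request[name.strip()] = json.loads(value)
    return request

req = apply_custom_options(
    {"model": "llama3", "stream": False},  # example base request
    ["think:false", "num_ctx:8192"],
)
print(json.dumps(req))
```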
