How can I disable receiving the thinking part of the response when connecting to Ollama?
Even when using a model that has thinking disabled in its Ollama Modelfile, I still receive thinking output when using TTMSMCPCloudAI.
Thank you in advance.
Disabling thinking is not implemented, and some models produce extensive thinking output that delays the response.
Changing the following in TMS.MCP.CloudAI disables the thinking:
line 1040:

    aiOllama:
    begin
      if FFiles.HasBinaryFiles then
      begin
        APostDataOllama := '{"model":"%s","prompt":"%s","stream":false,"think":false,"images":[%s]}';
        AMessages := '';
      end;
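For reference, the modified format string above expands to a request body for Ollama's /api/generate endpoint shaped like the following. This is a minimal sketch in Python, independent of the Delphi component; the model name and prompt are illustrative placeholders:

```python
import json

# Same shape as the APostDataOllama format string above:
# model, prompt, stream, think, and base64-encoded images.
payload = {
    "model": "llama3.2",   # hypothetical model name
    "prompt": "Hello",
    "stream": False,
    "think": False,        # suppresses the thinking part of the response
    "images": [],
}
body = json.dumps(payload)
print(body)
```

With "think": false in the body, a thinking-capable model returns only the final answer instead of streaming its reasoning first.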
line 866:

    cDataOllama = '{' +
      '"model": "%s",' +
      '"messages": [ %s ],' +
      '"tools": [ %s ],' +
      '"stream": false,' +
      '"think": false,' +
      '"options":{"temperature":%s,"max_tokens":%s}' +
      '}';
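The cDataOllama template above corresponds to a chat-style request body like the one below (a Python sketch with illustrative values; the field names mirror the template, including its "max_tokens" key, rather than claiming to be the canonical Ollama options):

```python
import json

# JSON body matching the cDataOllama template above
# (messages, tools, stream, think, options).
payload = {
    "model": "llama3.2",  # hypothetical model name
    "messages": [{"role": "user", "content": "Hello"}],
    "tools": [],
    "stream": False,
    "think": False,       # disable thinking output
    "options": {"temperature": 0.7, "max_tokens": 1024},
}
body = json.dumps(payload, indent=2)
print(body)
```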
Please consider adding it as an option.
Thank you
Another useful option for Ollama would be the ability to set the context window.
We will evaluate if/how we can unify these additional capabilities across the many cloud LLMs we support and add them when feasible.
Thank you very much.
For context size, "num_ctx" is just an extra entry (e.g. an OllamaContext setting) in the "options" JSON object, alongside temperature and max_tokens. Setting the context size is crucial for memory usage and for how well the LLM can work.
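The suggestion above amounts to adding one more key to the "options" object the component already builds. A sketch, with illustrative values (the 8192 context size and the "max_tokens" key simply mirror the template quoted earlier):

```python
import json

# "num_ctx" sits in "options" next to the existing entries.
options = {
    "temperature": 0.7,
    "max_tokens": 1024,  # as used in the component's template
    "num_ctx": 8192,     # context window size in tokens
}
doc = json.dumps({"options": options})
print(doc)
```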
For "thinking", as you can see in the source I posted above, it is just a boolean value that could be exposed as an OllamaThink setting on the component.
As I understand it, the same option has its own name/value for each LLM you support (except temperature and max_tokens), just as each service already has its own variable for its API key.
Thanks again
A possibility for now is to set this via CloudAI.Settings.CustomOptions as "think": false, and possibly also the context size the same way.