Ollama

Categories: Artificial Intelligence

Type: ollama/v1

Connections

Version: 1

Bearer Token

Properties

Name	Label	Type	Description	Required
url	Url	STRING	URL to your Ollama server	false

Actions

Ask

Name: ask

Ask anything you want.

Properties

Name	Label	Type	Description	Required
model	Model	STRING Options codellama, dolphin-phi, gemma, llama2, llama2-uncensored, llama3, llama3.1, llama3.2, llama3.2-vision, llama3.2-vision:90b, llama3.2:1b, llama3.2:3b, llava, mistral, mistral-nemo, moondream, mxbai-embed-large, neural-chat, nomic-embed-text, orca-mini, phi, phi3, qwen2.5, qwq, starling-lm	ID of the model to use.	true
messages	Messages	ARRAY Items [{STRING(role), STRING(content), [FILE_ENTRY](attachments)}]	A list of messages comprising the conversation so far.	true
response	Response	OBJECT Properties {STRING(responseFormat), STRING(responseSchema)}	The response from the API.	false
keepAlive	Keep alive for	STRING	Controls how long the model will stay loaded into memory following the request	false
maxTokens	Num predict	INTEGER	Maximum number of tokens to predict when generating text. (-1 = infinite generation, -2 = fill context)	false
temperature	Temperature	NUMBER	Controls randomness: Higher values will make the output more random, while lower values like will make it more focused and deterministic.	false
topP	Top P	NUMBER	An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.	false
topK	Top K	INTEGER	Specify the number of token choices the generative uses to generate the next token.	false
frequencyPenalty	Frequency Penalty	NUMBER	Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.	false
presencePenalty	Presence Penalty	NUMBER	Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.	false
stop	Stop	ARRAY Items [STRING]	Up to 4 sequences where the API will stop generating further tokens.	false
seed	Seed	INTEGER	Keeping the same seed would output the same response.	false
useNuma	Use NUMA	BOOLEAN Options true, false	Whether to use NUMA.	false
numCtx	Num CTX	INTEGER	Sets the size of the context window used to generate the next token.	false
numBatch	Num batch	INTEGER	Prompt processing maximum batch size.	false
numGpu	Num GPU	INTEGER	The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable metal support, 0 to disable. 1 here indicates that NumGPU should be set dynamically	false
mainGpu	Main GPU	INTEGER	When using multiple GPUs this option controls which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question will use slightly more VRAM to store a scratch buffer for temporary results.	false
lowVram	Low VRAM	BOOLEAN Options true, false		null
f16kv	F16 KV	BOOLEAN Options true, false		null
logitsAll	Logits all	BOOLEAN Options true, false	Return logits for all the tokens, not just the last one. To enable completions to return logprobs, this must be true.	false
vocabOnly	Vocab only	BOOLEAN Options true, false	Load only the vocabulary, not the weights.	false
useMmap	Use MMap	BOOLEAN Options true, false	By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed. However, if the model is larger than your total amount of RAM or if your system is low on available memory, using mmap might increase the risk of pageouts, negatively impacting performance. Disabling mmap results in slower load times but may reduce pageouts if you’re not using mlock. Note that if the model is larger than the total amount of RAM, turning off mmap would prevent the model from loading at all.	false
useMlock	Use MLock	BOOLEAN Options true, false	Lock the model in memory, preventing it from being swapped out when memory-mapped. This can improve performance but trades away some of the advantages of memory-mapping by requiring more RAM to run and potentially slowing down load times as the model loads into RAM.	false
numThread	Num thread	INTEGER	Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). 0 = let the runtime decide	false
numKeep	Nul keep	INTEGER		null
tfsz	Tfs Z	NUMBER	Tail-free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.	false
typicalP	Typical P	NUMBER		null
repeatLastN	Repeat last N	INTEGER	Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)	false
repeatPenalty	Repeat penalty	NUMBER	Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient.	false
mirostat	Mirostat	INTEGER	Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)	false
mirostatTau	Mirostat Tau	NUMBER	Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text.	false
mirostatEta	Mirostat Eta	NUMBER	Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive.	false
penalizeNewLine	Penalize new line	BOOLEAN Options true, false		null
truncate	Truncate	BOOLEAN Options true, false		null

Example JSON Structure

{
  "label" : "Ask",
  "name" : "ask",
  "parameters" : {
    "model" : "",
    "messages" : [ {
      "role" : "",
      "content" : "",
      "attachments" : [ {
        "extension" : "",
        "mimeType" : "",
        "name" : "",
        "url" : ""
      } ]
    } ],
    "response" : {
      "responseFormat" : "",
      "responseSchema" : ""
    },
    "keepAlive" : "",
    "maxTokens" : 1,
    "temperature" : 0.0,
    "topP" : 0.0,
    "topK" : 1,
    "frequencyPenalty" : 0.0,
    "presencePenalty" : 0.0,
    "stop" : [ "" ],
    "seed" : 1,
    "useNuma" : false,
    "numCtx" : 1,
    "numBatch" : 1,
    "numGpu" : 1,
    "mainGpu" : 1,
    "lowVram" : false,
    "f16kv" : false,
    "logitsAll" : false,
    "vocabOnly" : false,
    "useMmap" : false,
    "useMlock" : false,
    "numThread" : 1,
    "numKeep" : 1,
    "tfsz" : 0.0,
    "typicalP" : 0.0,
    "repeatLastN" : 1,
    "repeatPenalty" : 0.0,
    "mirostat" : 1,
    "mirostatTau" : 0.0,
    "mirostatEta" : 0.0,
    "penalizeNewLine" : false,
    "truncate" : false
  },
  "type" : "ollama/v1/ask"
}

Output

The output for this action is dynamic and may vary depending on the input parameters. To determine the exact structure of the output, you need to execute the action.

On this page