
Ollama

Get up and running with large language models.

Categories: artificial-intelligence

Type: ollama/v1


Connections

Version: 1

Bearer Token

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| url | Url | STRING | URL to your Ollama server | null |
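
For a default local installation, only the server URL is needed; Ollama listens on port 11434 out of the box. A minimal connection sketch (the value below is illustrative, assuming a stock local install):

{
  "url" : "http://localhost:11434"
}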

Actions

Ask

Name: ask

Ask anything you want.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| model | Model | STRING | ID of the model to use. Options: codellama, dolphin-phi, gemma, llama2, llama2-uncensored, llama3, llama3.1, llama3.2, llama3.2-vision, llama3.2-vision:90b, llama3.2:1b, llava, mistral, mistral-nemo, moondream, mxbai-embed-large, neural-chat, nomic-embed-text, orca-mini, phi, phi3, qwen2.5, starling-lm | true |
| messages | Messages | ARRAY | A list of messages comprising the conversation so far. Items: [{STRING(role), STRING(content), [FILE_ENTRY](attachments)}] | true |
| response | Response | OBJECT | The response from the API. Properties: {STRING(responseFormat), STRING(responseSchema)} | false |
| keepAlive | Keep alive for | STRING | Controls how long the model will stay loaded in memory following the request. | null |
| maxTokens | Num predict | INTEGER | Maximum number of tokens to predict when generating text (-1 = infinite generation, -2 = fill context). | null |
| temperature | Temperature | NUMBER | Controls randomness: higher values make the output more random, while lower values make it more focused and deterministic. | null |
| topP | Top P | NUMBER | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. A value of 0.1 means only the tokens comprising the top 10% probability mass are considered. | null |
| topK | Top K | INTEGER | Specifies the number of token choices the model considers when generating the next token. | null |
| frequencyPenalty | Frequency Penalty | NUMBER | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. | null |
| presencePenalty | Presence Penalty | NUMBER | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. | null |
| stop | Stop | ARRAY | Up to 4 sequences where the API will stop generating further tokens. Items: [STRING] | null |
| seed | Seed | INTEGER | Using the same seed produces the same response for the same request. | null |
| useNuma | Use NUMA | BOOLEAN | Whether to use NUMA. Options: true, false | null |
| numCtx | Num CTX | INTEGER | Sets the size of the context window used to generate the next token. | null |
| numBatch | Num batch | INTEGER | Prompt processing maximum batch size. | null |
| numGpu | Num GPU | INTEGER | The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable Metal support, 0 to disable; -1 indicates that NumGPU should be set dynamically. | null |
| mainGpu | Main GPU | INTEGER | When using multiple GPUs, this option controls which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question will use slightly more VRAM to store a scratch buffer for temporary results. | null |
| lowVram | Low VRAM | BOOLEAN | Options: true, false | null |
| f16kv | F16 KV | BOOLEAN | Options: true, false | null |
| logitsAll | Logits all | BOOLEAN | Return logits for all the tokens, not just the last one. To enable completions to return logprobs, this must be true. Options: true, false | null |
| vocabOnly | Vocab only | BOOLEAN | Load only the vocabulary, not the weights. Options: true, false | null |
| useMmap | Use MMap | BOOLEAN | By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed. However, if the model is larger than your total amount of RAM, or if your system is low on available memory, using mmap might increase the risk of pageouts, negatively impacting performance. Disabling mmap results in slower load times but may reduce pageouts if you're not using mlock. Note that if the model is larger than the total amount of RAM, turning off mmap would prevent the model from loading at all. Options: true, false | null |
| useMlock | Use MLock | BOOLEAN | Lock the model in memory, preventing it from being swapped out when memory-mapped. This can improve performance but trades away some of the advantages of memory-mapping by requiring more RAM to run and potentially slowing down load times as the model loads into RAM. Options: true, false | null |
| numThread | Num thread | INTEGER | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). 0 = let the runtime decide. | null |
| numKeep | Num keep | INTEGER | | null |
| tfsz | Tfs Z | NUMBER | Tail-free sampling is used to reduce the impact of less probable tokens on the output. A higher value (e.g., 2.0) reduces the impact more, while a value of 1.0 disables this setting. | null |
| typicalP | Typical P | NUMBER | | null |
| repeatLastN | Repeat last N | INTEGER | Sets how far back the model looks to prevent repetition (default: 64, 0 = disabled, -1 = num_ctx). | null |
| repeatPenalty | Repeat penalty | NUMBER | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) penalizes repetitions more strongly, while a lower value (e.g., 0.9) is more lenient. | null |
| mirostat | Mirostat | INTEGER | Enable Mirostat sampling for controlling perplexity (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0). | null |
| mirostatTau | Mirostat Tau | NUMBER | Controls the balance between coherence and diversity of the output. A lower value results in more focused and coherent text. | null |
| mirostatEta | Mirostat Eta | NUMBER | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate results in slower adjustments, while a higher learning rate makes the algorithm more responsive. | null |
| penalizeNewLine | Penalize new line | BOOLEAN | Options: true, false | null |
| truncate | Truncate | BOOLEAN | Options: true, false | null |
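
Each entry in the messages array follows the {role, content, attachments} shape described above. A hypothetical populated conversation, with illustrative values only (the role names follow common chat conventions, and the file entry fields mirror the FILE_ENTRY structure shown in the example below):

"messages" : [ {
  "role" : "system",
  "content" : "You are a helpful assistant.",
  "attachments" : [ ]
}, {
  "role" : "user",
  "content" : "Describe this image.",
  "attachments" : [ {
    "extension" : "png",
    "mimeType" : "image/png",
    "name" : "diagram.png",
    "url" : "https://example.com/diagram.png"
  } ]
} ]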

Example JSON Structure

{
  "label" : "Ask",
  "name" : "ask",
  "parameters" : {
    "model" : "",
    "messages" : [ {
      "role" : "",
      "content" : "",
      "attachments" : [ {
        "extension" : "",
        "mimeType" : "",
        "name" : "",
        "url" : ""
      } ]
    } ],
    "response" : {
      "responseFormat" : "",
      "responseSchema" : ""
    },
    "keepAlive" : "",
    "maxTokens" : 1,
    "temperature" : 0.0,
    "topP" : 0.0,
    "topK" : 1,
    "frequencyPenalty" : 0.0,
    "presencePenalty" : 0.0,
    "stop" : [ "" ],
    "seed" : 1,
    "useNuma" : false,
    "numCtx" : 1,
    "numBatch" : 1,
    "numGpu" : 1,
    "mainGpu" : 1,
    "lowVram" : false,
    "f16kv" : false,
    "logitsAll" : false,
    "vocabOnly" : false,
    "useMmap" : false,
    "useMlock" : false,
    "numThread" : 1,
    "numKeep" : 1,
    "tfsz" : 0.0,
    "typicalP" : 0.0,
    "repeatLastN" : 1,
    "repeatPenalty" : 0.0,
    "mirostat" : 1,
    "mirostatTau" : 0.0,
    "mirostatEta" : 0.0,
    "penalizeNewLine" : false,
    "truncate" : false
  },
  "type" : "ollama/v1/ask"
}
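
For orientation, here is a hypothetical filled-in configuration. The model name is taken from the options list above; all other values are illustrative, and the optional tuning parameters are omitted on the assumption that non-required properties may be left out:

{
  "label" : "Ask",
  "name" : "ask",
  "parameters" : {
    "model" : "llama3.2",
    "messages" : [ {
      "role" : "user",
      "content" : "Summarize the plot of Hamlet in two sentences.",
      "attachments" : [ ]
    } ],
    "temperature" : 0.7,
    "maxTokens" : 256,
    "seed" : 42
  },
  "type" : "ollama/v1/ask"
}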

Output

The output for this action is dynamic and may vary depending on the input parameters. To determine the exact structure of the output, you need to execute the action.