
Ollama

Get up and running with large language models.

Categories: artificial-intelligence

Type: ollama/v1


Connections

Version: 1

Bearer Token

Properties

| Name | Label | Type | Control Type | Description | Required |
|------|-------|------|--------------|-------------|----------|
| url | Url | STRING | TEXT | URL to your Ollama server | null |
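For example, a connection pointing at a locally running Ollama server could be configured like this (a minimal sketch; http://localhost:11434 is Ollama's default listen address, and the surrounding connection JSON shape is illustrative, not taken from this page):

{
  "url" : "http://localhost:11434"
}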

Actions

Ask

Name: ask

Ask anything you want.

Properties

| Name | Label | Type | Control Type | Description | Required |
|------|-------|------|--------------|-------------|----------|
| model | Model | STRING | SELECT | ID of the model to use. Options: codellama, dolphin-phi, gemma, llama2, llama2-uncensored, llama3, llama3.1, llama3.2, llama3.2-vision, llama3.2-vision:90b, llama3.2:1b, llava, mistral, mistral-nemo, moondream, mxbai-embed-large, neural-chat, nomic-embed-text, orca-mini, phi, phi3, qwen2.5, starling-lm | true |
| messages | Messages | ARRAY | ARRAY_BUILDER | A list of messages comprising the conversation so far. Items: [{STRING(role), STRING(content), [FILE_ENTRY](attachments)}] | true |
| response | Response | OBJECT | OBJECT_BUILDER | The response from the API. Properties: {STRING(responseFormat), STRING(responseSchema)} | false |
| keepAlive | Keep alive for | STRING | TEXT | Controls how long the model stays loaded in memory following the request. | null |
| maxTokens | Num predict | INTEGER | INTEGER | Maximum number of tokens to predict when generating text (-1 = infinite generation, -2 = fill context). | null |
| temperature | Temperature | NUMBER | NUMBER | Controls randomness: higher values make the output more random, while lower values make it more focused and deterministic. | null |
| topP | Top P | NUMBER | NUMBER | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. | null |
| topK | Top K | INTEGER | INTEGER | The number of token choices the model considers when generating the next token. | null |
| frequencyPenalty | Frequency Penalty | NUMBER | NUMBER | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. | null |
| presencePenalty | Presence Penalty | NUMBER | NUMBER | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. | null |
| stop | Stop | ARRAY | ARRAY_BUILDER | Up to 4 sequences where the API will stop generating further tokens. Items: [STRING] | null |
| seed | Seed | INTEGER | INTEGER | Setting the same seed produces the same response for the same request. | null |
| useNuma | Use NUMA | BOOLEAN | SELECT | Whether to use NUMA. Options: true, false | null |
| numCtx | Num CTX | INTEGER | INTEGER | Sets the size of the context window used to generate the next token. | null |
| numBatch | Num batch | INTEGER | INTEGER | Maximum batch size for prompt processing. | null |
| numGpu | Num GPU | INTEGER | INTEGER | The number of layers to send to the GPU(s). On macOS it defaults to 1 to enable Metal support, 0 to disable. -1 indicates that the number of GPU layers should be set dynamically. | null |
| mainGpu | Main GPU | INTEGER | INTEGER | When using multiple GPUs, this option controls which GPU is used for small tensors, for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question will use slightly more VRAM to store a scratch buffer for temporary results. | null |
| lowVram | Low VRAM | BOOLEAN | SELECT | Options: true, false | null |
| f16kv | F16 KV | BOOLEAN | SELECT | Options: true, false | null |
| logitsAll | Logits all | BOOLEAN | SELECT | Return logits for all the tokens, not just the last one. To enable completions to return logprobs, this must be true. Options: true, false | null |
| vocabOnly | Vocab only | BOOLEAN | SELECT | Load only the vocabulary, not the weights. Options: true, false | null |
| useMmap | Use MMap | BOOLEAN | SELECT | By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed. However, if the model is larger than your total amount of RAM or if your system is low on available memory, using mmap might increase the risk of pageouts, negatively impacting performance. Disabling mmap results in slower load times but may reduce pageouts if you're not using mlock. Note that if the model is larger than the total amount of RAM, turning off mmap would prevent the model from loading at all. Options: true, false | null |
| useMlock | Use MLock | BOOLEAN | SELECT | Lock the model in memory, preventing it from being swapped out when memory-mapped. This can improve performance but trades away some of the advantages of memory-mapping by requiring more RAM to run and potentially slowing down load times as the model loads into RAM. Options: true, false | null |
| numThread | Num thread | INTEGER | INTEGER | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). 0 = let the runtime decide. | null |
| numKeep | Num keep | INTEGER | INTEGER |  | null |
| tfsz | Tfs Z | NUMBER | NUMBER | Tail-free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. | null |
| typicalP | Typical P | NUMBER | NUMBER |  | null |
| repeatLastN | Repeat last N | INTEGER | INTEGER | Sets how far back the model looks to prevent repetition (default: 64, 0 = disabled, -1 = num_ctx). | null |
| repeatPenalty | Repeat penalty | NUMBER | NUMBER | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. | null |
| mirostat | Mirostat | INTEGER | INTEGER | Enable Mirostat sampling for controlling perplexity (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0). | null |
| mirostatTau | Mirostat Tau | NUMBER | NUMBER | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. | null |
| mirostatEta | Mirostat Eta | NUMBER | NUMBER | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. | null |
| penalizeNewLine | Penalize new line | BOOLEAN | SELECT | Options: true, false | null |
| truncate | Truncate | BOOLEAN | SELECT | Options: true, false | null |
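To make the messages shape concrete, a single user message carrying one attachment might look like the following sketch (the role, content, and file values are hypothetical; the FILE_ENTRY fields mirror the JSON example below):

{
  "role" : "user",
  "content" : "What is shown in this image?",
  "attachments" : [ {
    "extension" : "png",
    "mimeType" : "image/png",
    "name" : "diagram.png",
    "url" : "https://example.com/diagram.png"
  } ]
}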

JSON Example

{
  "label" : "Ask",
  "name" : "ask",
  "parameters" : {
    "model" : "",
    "messages" : [ {
      "role" : "",
      "content" : "",
      "attachments" : [ {
        "extension" : "",
        "mimeType" : "",
        "name" : "",
        "url" : ""
      } ]
    } ],
    "response" : {
      "responseFormat" : "",
      "responseSchema" : ""
    },
    "keepAlive" : "",
    "maxTokens" : 1,
    "temperature" : 0.0,
    "topP" : 0.0,
    "topK" : 1,
    "frequencyPenalty" : 0.0,
    "presencePenalty" : 0.0,
    "stop" : [ "" ],
    "seed" : 1,
    "useNuma" : false,
    "numCtx" : 1,
    "numBatch" : 1,
    "numGpu" : 1,
    "mainGpu" : 1,
    "lowVram" : false,
    "f16kv" : false,
    "logitsAll" : false,
    "vocabOnly" : false,
    "useMmap" : false,
    "useMlock" : false,
    "numThread" : 1,
    "numKeep" : 1,
    "tfsz" : 0.0,
    "typicalP" : 0.0,
    "repeatLastN" : 1,
    "repeatPenalty" : 0.0,
    "mirostat" : 1,
    "mirostatTau" : 0.0,
    "mirostatEta" : 0.0,
    "penalizeNewLine" : false,
    "truncate" : false
  },
  "type" : "ollama/v1/ask"
}
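For illustration, a hypothetical filled-in configuration follows. The model name, message, and sampling values are example choices rather than defaults, and any property omitted here falls back to the server's own defaults:

{
  "label" : "Ask",
  "name" : "ask",
  "parameters" : {
    "model" : "llama3.2",
    "messages" : [ {
      "role" : "user",
      "content" : "Summarize the benefits of local LLM inference in three sentences."
    } ],
    "keepAlive" : "5m",
    "maxTokens" : 256,
    "temperature" : 0.2,
    "topP" : 0.9,
    "seed" : 42
  },
  "type" : "ollama/v1/ask"
}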