Skip to content

Ollama

Reference


Get up and running with large language models.

Categories: [artificial-intelligence]

Version: 1


Connections

Version: 1

Bearer Token

Properties

NameTypeControl TypeDescription
UrlSTRINGTEXTURL to your Ollama server

Actions

Ask

Ask anything you want.

Properties

NameTypeControl TypeDescription
ModelSTRINGSELECTID of the model to use.
Messages[{STRING(content), STRING(role)}]ARRAY_BUILDERA list of messages comprising the conversation so far.
Response formatINTEGERSELECTIn which format do you want the response to be in?
Keep alive forSTRINGTEXTControls how long the model will stay loaded into memory following the request
Num predictINTEGERINTEGERMaximum number of tokens to predict when generating text. (-1 = infinite generation, -2 = fill context)
TemperatureNUMBERNUMBERControls randomness: Higher values will make the output more random, while lower values like will make it more focused and deterministic.
Top PNUMBERNUMBERAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
Top KINTEGERINTEGERSpecify the number of token choices the generative uses to generate the next token.
Frequency penaltyNUMBERNUMBERNumber between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
Presence penaltyNUMBERNUMBERNumber between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.
Stop[STRING]ARRAY_BUILDERUp to 4 sequences where the API will stop generating further tokens.
Functions[STRING]ARRAY_BUILDEREnter the names of functions you want to use.
SeedINTEGERINTEGERKeeping the same seed would output the same response.
Use NUMABOOLEANSELECTWhether to use NUMA.
Num CTXINTEGERINTEGERSets the size of the context window used to generate the next token.
Num batchINTEGERINTEGERPrompt processing maximum batch size.
Num GPUINTEGERINTEGERThe number of layers to send to the GPU(s). On macOS it defaults to 1 to enable metal support, 0 to disable. 1 here indicates that NumGPU should be set dynamically
Main GPUINTEGERINTEGERWhen using multiple GPUs this option controls which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question will use slightly more VRAM to store a scratch buffer for temporary results.
Low VRAMBOOLEANSELECT
F16 KVBOOLEANSELECT
Logits allBOOLEANSELECTReturn logits for all the tokens, not just the last one. To enable completions to return logprobs, this must be true.
Vocab onlyBOOLEANSELECTLoad only the vocabulary, not the weights.
Use MMapBOOLEANSELECTBy default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed. However, if the model is larger than your total amount of RAM or if your system is low on available memory, using mmap might increase the risk of pageouts, negatively impacting performance. Disabling mmap results in slower load times but may reduce pageouts if you’re not using mlock. Note that if the model is larger than the total amount of RAM, turning off mmap would prevent the model from loading at all.
Use MLockBOOLEANSELECTLock the model in memory, preventing it from being swapped out when memory-mapped. This can improve performance but trades away some of the advantages of memory-mapping by requiring more RAM to run and potentially slowing down load times as the model loads into RAM.
Num threadINTEGERINTEGERSets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). 0 = let the runtime decide
Nul keepINTEGERINTEGER
Tfs ZNUMBERNUMBERTail-free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.
Typical PNUMBERNUMBER
Repeat last NINTEGERINTEGERSets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
Repeat penaltyNUMBERNUMBERSets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient.
MirostatINTEGERINTEGEREnable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
Mirostat TauNUMBERNUMBERControls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text.
Mirostat EtaNUMBERNUMBERInfluences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive.
Penalize new lineBOOLEANSELECT
TruncateBOOLEANSELECT