ByteChef LogoByteChef
Components

Ollama

Get up and running with large language models.

Categories: Artificial Intelligence

Type: ollama/v1


Connections

Version: 1

Bearer Token

Properties

NameLabelTypeDescriptionRequired
urlUrlSTRINGURL to your Ollama serverfalse

Actions

Ask

Name: ask

Ask anything you want.

Properties

NameLabelTypeDescriptionRequired
modelModelSTRING
Options codellama, dolphin-phi, gemma, llama2, llama2-uncensored, llama3, llama3.1, llama3.2, llama3.2-vision, llama3.2-vision:90b, llama3.2:1b, llama3.2:3b, llava, mistral, mistral-nemo, moondream, mxbai-embed-large, neural-chat, nomic-embed-text, orca-mini, phi, phi3, qwen2.5, qwq, starling-lm
ID of the model to use.true
messagesMessagesARRAY
Items [{STRING(role), STRING(content), [FILE_ENTRY](attachments)}]
A list of messages comprising the conversation so far.true
responseResponseOBJECT
Properties {STRING(responseFormat), STRING(responseSchema)}
The response from the API.false
keepAliveKeep alive forSTRINGControls how long the model will stay loaded into memory following the requestfalse
maxTokensNum predictINTEGERMaximum number of tokens to predict when generating text. (-1 = infinite generation, -2 = fill context)false
temperatureTemperatureNUMBERControls randomness: Higher values will make the output more random, while lower values like will make it more focused and deterministic.false
topPTop PNUMBERAn alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.false
topKTop KINTEGERSpecify the number of token choices the generative uses to generate the next token.false
frequencyPenaltyFrequency PenaltyNUMBERNumber between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.false
presencePenaltyPresence PenaltyNUMBERNumber between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.false
stopStopARRAY
Items [STRING]
Up to 4 sequences where the API will stop generating further tokens.false
seedSeedINTEGERKeeping the same seed would output the same response.false
useNumaUse NUMABOOLEAN
Options true, false
Whether to use NUMA.false
numCtxNum CTXINTEGERSets the size of the context window used to generate the next token.false
numBatchNum batchINTEGERPrompt processing maximum batch size.false
numGpuNum GPUINTEGERThe number of layers to send to the GPU(s). On macOS it defaults to 1 to enable metal support, 0 to disable. 1 here indicates that NumGPU should be set dynamicallyfalse
mainGpuMain GPUINTEGERWhen using multiple GPUs this option controls which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question will use slightly more VRAM to store a scratch buffer for temporary results.false
lowVramLow VRAMBOOLEAN
Options true, false
null
f16kvF16 KVBOOLEAN
Options true, false
null
logitsAllLogits allBOOLEAN
Options true, false
Return logits for all the tokens, not just the last one. To enable completions to return logprobs, this must be true.false
vocabOnlyVocab onlyBOOLEAN
Options true, false
Load only the vocabulary, not the weights.false
useMmapUse MMapBOOLEAN
Options true, false
By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed. However, if the model is larger than your total amount of RAM or if your system is low on available memory, using mmap might increase the risk of pageouts, negatively impacting performance. Disabling mmap results in slower load times but may reduce pageouts if you’re not using mlock. Note that if the model is larger than the total amount of RAM, turning off mmap would prevent the model from loading at all.false
useMlockUse MLockBOOLEAN
Options true, false
Lock the model in memory, preventing it from being swapped out when memory-mapped. This can improve performance but trades away some of the advantages of memory-mapping by requiring more RAM to run and potentially slowing down load times as the model loads into RAM.false
numThreadNum threadINTEGERSets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). 0 = let the runtime decidefalse
numKeepNul keepINTEGERnull
tfszTfs ZNUMBERTail-free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.false
typicalPTypical PNUMBERnull
repeatLastNRepeat last NINTEGERSets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)false
repeatPenaltyRepeat penaltyNUMBERSets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient.false
mirostatMirostatINTEGEREnable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)false
mirostatTauMirostat TauNUMBERControls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text.false
mirostatEtaMirostat EtaNUMBERInfluences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive.false
penalizeNewLinePenalize new lineBOOLEAN
Options true, false
null
truncateTruncateBOOLEAN
Options true, false
null

Example JSON Structure

{
  "label" : "Ask",
  "name" : "ask",
  "parameters" : {
    "model" : "",
    "messages" : [ {
      "role" : "",
      "content" : "",
      "attachments" : [ {
        "extension" : "",
        "mimeType" : "",
        "name" : "",
        "url" : ""
      } ]
    } ],
    "response" : {
      "responseFormat" : "",
      "responseSchema" : ""
    },
    "keepAlive" : "",
    "maxTokens" : 1,
    "temperature" : 0.0,
    "topP" : 0.0,
    "topK" : 1,
    "frequencyPenalty" : 0.0,
    "presencePenalty" : 0.0,
    "stop" : [ "" ],
    "seed" : 1,
    "useNuma" : false,
    "numCtx" : 1,
    "numBatch" : 1,
    "numGpu" : 1,
    "mainGpu" : 1,
    "lowVram" : false,
    "f16kv" : false,
    "logitsAll" : false,
    "vocabOnly" : false,
    "useMmap" : false,
    "useMlock" : false,
    "numThread" : 1,
    "numKeep" : 1,
    "tfsz" : 0.0,
    "typicalP" : 0.0,
    "repeatLastN" : 1,
    "repeatPenalty" : 0.0,
    "mirostat" : 1,
    "mirostatTau" : 0.0,
    "mirostatEta" : 0.0,
    "penalizeNewLine" : false,
    "truncate" : false
  },
  "type" : "ollama/v1/ask"
}

Output

The output for this action is dynamic and may vary depending on the input parameters. To determine the exact structure of the output, you need to execute the action.