
Firecrawl

Firecrawl turns entire websites into LLM-ready markdown.

Categories: Helpers, Artificial Intelligence

Type: firecrawl/v1


Connections

Version: 1

Bearer Token

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| token | API Token | STRING | | true |

Connection Setup

  1. Go to https://www.firecrawl.dev/app/api-keys
  2. Log in to your account.
  3. Copy the API key and use it to create a connection in ByteChef.

Actions

Crawl

Name: crawl

Crawl multiple URLs starting from a base URL and extract content.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| url | URL | STRING | The base URL to start crawling from. | true |
| formats | Formats | ARRAY [STRING] | Output formats to include in the response for each crawled page. | false |
| prompt | Prompt | STRING | A natural language prompt used to generate crawler options. Explicitly set parameters override the generated equivalents. | false |
| excludePaths | Exclude Paths | ARRAY [STRING] | URL pathname regex patterns that exclude matching URLs from the crawl. | false |
| includePaths | Include Paths | ARRAY [STRING] | URL pathname regex patterns that include matching URLs in the crawl. Only paths matching the specified patterns are included. | false |
| maxDiscoveryDepth | Max Discovery Depth | INTEGER | Maximum depth to crawl, based on discovery order. The root site and sitemapped pages have a discovery depth of 0. | false |
| sitemap | Sitemap | STRING | Sitemap mode: 'include' (default) uses the sitemap together with other discovery methods, 'skip' ignores the sitemap, 'only' crawls only sitemap URLs. | false |
| limit | Limit | INTEGER | Maximum number of pages to crawl. Default is 10000. | false |
| scrapeOptions | Scrape Options | OBJECT | Options for scraping each page during the crawl. Accepts the same fields as the Scrape URL action: onlyMainContent, includeTags, excludeTags, maxAge, headers, waitFor, mobile, skipTlsVerification, timeout, removeBase64Images, blockAds, proxy, location, parsers, storeInCache. | false |
| ignoreQueryParameters | Ignore Query Parameters | BOOLEAN | Do not re-scrape the same path with different (or no) query parameters. | false |
| regexOnFullURL | Regex on Full URL | BOOLEAN | When true, includePaths and excludePaths patterns are matched against the full URL, including query parameters. | false |
| crawlEntireDomain | Crawl Entire Domain | BOOLEAN | Allows the crawler to follow internal links to sibling or parent URLs, not just child paths. | false |
| allowExternalLinks | Allow External Links | BOOLEAN | Allows the crawler to follow links to external websites. | false |
| allowSubdomains | Allow Subdomains | BOOLEAN | Allows the crawler to follow links to subdomains of the main domain. | false |
| delay | Delay | INTEGER | Delay in seconds between scrapes. Helps respect website rate limits. | false |
| maxConcurrency | Max Concurrency | INTEGER | Maximum number of concurrent scrapes. If not specified, adheres to your team's concurrency limit. | false |
| webhook | Webhook | OBJECT {STRING(url), {}(headers), {}(metadata), [STRING](events)} | Webhook configuration for receiving crawl status updates. | false |
| zeroDataRetention | Zero Data Retention | BOOLEAN | Enable zero data retention for this crawl. Contact help@firecrawl.dev to enable this feature. | false |

Example JSON Structure

{
  "label" : "Crawl",
  "name" : "crawl",
  "parameters" : {
    "url" : "",
    "formats" : [ "" ],
    "prompt" : "",
    "excludePaths" : [ "" ],
    "includePaths" : [ "" ],
    "maxDiscoveryDepth" : 1,
    "sitemap" : "",
    "limit" : 1,
    "scrapeOptions" : {
      "onlyMainContent" : false,
      "includeTags" : [ "" ],
      "excludeTags" : [ "" ],
      "maxAge" : 1,
      "headers" : { },
      "waitFor" : 1,
      "mobile" : false,
      "skipTlsVerification" : false,
      "timeout" : 1,
      "removeBase64Images" : false,
      "blockAds" : false,
      "proxy" : "",
      "location" : {
        "country" : "",
        "languages" : [ "" ]
      },
      "parsers" : [ {
        "type" : "",
        "maxPages" : 1
      } ],
      "storeInCache" : false
    },
    "ignoreQueryParameters" : false,
    "regexOnFullURL" : false,
    "crawlEntireDomain" : false,
    "allowExternalLinks" : false,
    "allowSubdomains" : false,
    "delay" : 1,
    "maxConcurrency" : 1,
    "webhook" : {
      "url" : "",
      "headers" : { },
      "metadata" : { },
      "events" : [ "" ]
    },
    "zeroDataRetention" : false
  },
  "type" : "firecrawl/v1/crawl"
}
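For orientation, here is the same structure with illustrative values filled in. The URL, path pattern, and limits below are hypothetical placeholders, not defaults:

```json
{
  "label" : "Crawl",
  "name" : "crawl",
  "parameters" : {
    "url" : "https://example.com/docs",
    "formats" : [ "markdown" ],
    "includePaths" : [ "^/docs/.*" ],
    "maxDiscoveryDepth" : 2,
    "sitemap" : "include",
    "limit" : 50,
    "scrapeOptions" : {
      "onlyMainContent" : true,
      "blockAds" : true
    },
    "delay" : 1
  },
  "type" : "firecrawl/v1/crawl"
}
```

Because crawling runs asynchronously, the action returns a job `id` rather than page content; pass that `id` to the Get Crawl Status action to retrieve results.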

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| success | BOOLEAN | |
| id | STRING | |
| url | STRING | |

Output Example

{
  "success" : false,
  "id" : "",
  "url" : ""
}

Get Crawl Status

Name: getCrawlStatus

Get the status and results of a crawl job.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| id | Crawl ID | STRING | The ID of the crawl job to retrieve status for. | true |

Example JSON Structure

{
  "label" : "Get Crawl Status",
  "name" : "getCrawlStatus",
  "parameters" : {
    "id" : ""
  },
  "type" : "firecrawl/v1/getCrawlStatus"
}

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| status | STRING | The current status of the crawl: scraping, completed, or failed. |
| total | INTEGER | The total number of pages that were attempted to be crawled. |
| completed | INTEGER | The number of pages that have been successfully crawled. |
| creditsUsed | INTEGER | The number of credits used for the crawl. |
| expiresAt | STRING | The date and time when the crawl results will expire. |
| next | STRING | URL for retrieving the next batch of data. Returned if the crawl is not completed or if the response exceeds 10 MB. |
| data | ARRAY [{STRING(markdown), STRING(html), STRING(rawHtml), [STRING](links), STRING(screenshot), {STRING(title), STRING(description), STRING(language), STRING(sourceURL), STRING(keywords), [STRING](ogLocaleAlternate), INTEGER(statusCode), STRING(error)}(metadata)}] | The scraped data from each crawled page. |

Output Example

{
  "status" : "",
  "total" : 1,
  "completed" : 1,
  "creditsUsed" : 1,
  "expiresAt" : "",
  "next" : "",
  "data" : [ {
    "markdown" : "",
    "html" : "",
    "rawHtml" : "",
    "links" : [ "" ],
    "screenshot" : "",
    "metadata" : {
      "title" : "",
      "description" : "",
      "language" : "",
      "sourceURL" : "",
      "keywords" : "",
      "ogLocaleAlternate" : [ "" ],
      "statusCode" : 1,
      "error" : ""
    }
  } ]
}

Map

Name: map

Map multiple URLs from a website based on specified options.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| url | URL | STRING | The base URL to start mapping from. | true |
| search | Search | STRING | A search query used to order results by relevance. For example, 'blog' returns URLs containing the word 'blog', ordered by relevance. | false |
| sitemap | Sitemap | STRING | Sitemap mode when mapping: 'include' (default) uses the sitemap together with other discovery methods, 'skip' does not use the sitemap to find URLs, 'only' returns only URLs found in the sitemap. | false |
| includeSubdomains | Include Subdomains | BOOLEAN | Include subdomains of the website. | false |
| ignoreQueryParameters | Ignore Query Parameters | BOOLEAN | Do not return URLs with query parameters. | false |
| ignoreCache | Ignore Cache | BOOLEAN | Bypass the sitemap cache to retrieve fresh URLs. Sitemap data is cached for up to 7 days; use this parameter when your sitemap has been recently updated. | false |
| limit | Limit | INTEGER | Maximum number of links to return (1-100000). | false |
| timeout | Timeout | INTEGER | Timeout in milliseconds. There is no timeout by default. | false |
| location | Location | OBJECT {STRING(country), [STRING](languages)} | Location settings for the request. When specified, an appropriate proxy is used if available and the corresponding language and timezone settings are emulated. Defaults to 'US'. | false |

Example JSON Structure

{
  "label" : "Map",
  "name" : "map",
  "parameters" : {
    "url" : "",
    "search" : "",
    "sitemap" : "",
    "includeSubdomains" : false,
    "ignoreQueryParameters" : false,
    "ignoreCache" : false,
    "limit" : 1,
    "timeout" : 1,
    "location" : {
      "country" : "",
      "languages" : [ "" ]
    }
  },
  "type" : "firecrawl/v1/map"
}
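As a concrete sketch, the configuration below maps a site and ranks the returned links by how closely they match a search term. The URL and values are hypothetical placeholders:

```json
{
  "label" : "Map",
  "name" : "map",
  "parameters" : {
    "url" : "https://example.com",
    "search" : "blog",
    "sitemap" : "include",
    "includeSubdomains" : false,
    "limit" : 100
  },
  "type" : "firecrawl/v1/map"
}
```

Map only discovers URLs; it does not scrape page content, which makes it a lightweight first step before a targeted Crawl or Scrape URL action.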

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| success | BOOLEAN | |
| links | ARRAY [{STRING(url), STRING(title), STRING(description)}] | |

Output Example

{
  "success" : false,
  "links" : [ {
    "url" : "",
    "title" : "",
    "description" : ""
  } ]
}

Scrape URL

Name: scrape

Scrape a single URL and extract content in various formats.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| url | URL | STRING | The URL to scrape. | true |
| formats | Formats | ARRAY [STRING] | Output formats to include in the response (e.g., markdown, html, json). | false |
| formatsSchema | JSON Schema | OBJECT | The schema to use for the JSON output. Must conform to JSON Schema. | false |
| formatsPrompt | JSON Prompt | OBJECT | The prompt to use for the JSON output. | false |
| onlyMainContent | Only Main Content | BOOLEAN | Only return the main content, excluding headers, navs, footers, etc. | false |
| includeTags | Include Tags | ARRAY [STRING] | HTML tags to include in the output. | false |
| excludeTags | Exclude Tags | ARRAY [STRING] | HTML tags to exclude from the output. | false |
| maxAge | Max Age | INTEGER | Returns a cached version if it is younger than this age in milliseconds. Speeds up scrapes by up to 500%. Default is 2 days (172800000 ms). | false |
| headers | Headers | OBJECT | Custom headers to send with the request (e.g., cookies, user-agent). | false |
| waitFor | Wait For | INTEGER | Delay in milliseconds before fetching content, allowing the page to load. This is in addition to Firecrawl's smart wait feature. | false |
| mobile | Mobile | BOOLEAN | Emulate scraping from a mobile device. Useful for responsive pages and mobile screenshots. | false |
| skipTlsVerification | Skip TLS Verification | BOOLEAN | Skip TLS certificate verification when making requests. | false |
| timeout | Timeout | INTEGER | Timeout in milliseconds for the request. Default is 30000 (30 seconds); maximum is 300000 (5 minutes). | false |
| removeBase64Images | Remove Base64 Images | BOOLEAN | Removes all base64 images from the output. Image alt text is kept, but the URL is replaced with a placeholder. | false |
| blockAds | Block Ads | BOOLEAN | Enables ad blocking and cookie-popup blocking. | false |
| proxy | Proxy | STRING | Proxy type: 'basic' (fast, basic anti-bot), 'enhanced' (slower, advanced anti-bot, costs up to 5 credits), 'auto' (retries with enhanced if basic fails). | false |
| location | Location | OBJECT {STRING(country), [STRING](languages)} | Location settings for the request. Uses an appropriate proxy and emulates the corresponding language and timezone. | false |
| parsers | Parsers | ARRAY [{STRING(type), INTEGER(maxPages)}] | Controls how files are processed. When 'pdf' is included (the default), PDF content is extracted and converted to markdown (1 credit per page). An empty array returns the PDF in base64 (1 credit flat). | false |
| storeInCache | Store in Cache | BOOLEAN | If true, the page is stored in the Firecrawl index and cache. Set to false if you have data-protection concerns. | false |
| zeroDataRetention | Zero Data Retention | BOOLEAN | Enable zero data retention for this scrape. Contact help@firecrawl.dev to enable this feature. | false |

Example JSON Structure

{
  "label" : "Scrape URL",
  "name" : "scrape",
  "parameters" : {
    "url" : "",
    "formats" : [ "" ],
    "formatsSchema" : { },
    "formatsPrompt" : { },
    "onlyMainContent" : false,
    "includeTags" : [ "" ],
    "excludeTags" : [ "" ],
    "maxAge" : 1,
    "headers" : { },
    "waitFor" : 1,
    "mobile" : false,
    "skipTlsVerification" : false,
    "timeout" : 1,
    "removeBase64Images" : false,
    "blockAds" : false,
    "proxy" : "",
    "location" : {
      "country" : "",
      "languages" : [ "" ]
    },
    "parsers" : [ {
      "type" : "",
      "maxPages" : 1
    } ],
    "storeInCache" : false,
    "zeroDataRetention" : false
  },
  "type" : "firecrawl/v1/scrape"
}
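As an illustrative sketch, the configuration below requests both markdown and structured JSON output, using formatsSchema to describe the fields to extract. The URL and schema fields (planName, monthlyPrice) are hypothetical placeholders, not part of the component:

```json
{
  "label" : "Scrape URL",
  "name" : "scrape",
  "parameters" : {
    "url" : "https://example.com/pricing",
    "formats" : [ "markdown", "json" ],
    "formatsSchema" : {
      "type" : "object",
      "properties" : {
        "planName" : { "type" : "string" },
        "monthlyPrice" : { "type" : "number" }
      }
    },
    "onlyMainContent" : true,
    "timeout" : 30000
  },
  "type" : "firecrawl/v1/scrape"
}
```

The schema only applies when 'json' is among the requested formats; for plain markdown scrapes it can be omitted entirely.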

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| success | BOOLEAN | |
| data | OBJECT {STRING(markdown), STRING(summary), STRING(html), STRING(rawHtml), STRING(screenshot), [STRING](links), {STRING(title), STRING(description), STRING(language), STRING(sourceURL), STRING(keywords), INTEGER(statusCode), STRING(error)}(metadata), STRING(warning)} | |

Output Example

{
  "success" : false,
  "data" : {
    "markdown" : "",
    "summary" : "",
    "html" : "",
    "rawHtml" : "",
    "screenshot" : "",
    "links" : [ "" ],
    "metadata" : {
      "title" : "",
      "description" : "",
      "language" : "",
      "sourceURL" : "",
      "keywords" : "",
      "statusCode" : 1,
      "error" : ""
    },
    "warning" : ""
  }
}

Search

Name: search

Search the web and optionally scrape search results using Firecrawl.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| query | Search Query | STRING | The search query string. | true |
| limit | Limit | INTEGER | Maximum number of results to return (1-100). | false |
| sources | Sources | ARRAY [{STRING(type)}] | Sources to search. Determines which arrays are available in the response. | false |
| categories | Categories | ARRAY [{STRING(type)}] | Categories to filter results by (github, research, pdf). | false |
| tbs | Time-Based Search | STRING | Filter results by time period. Options: qdr:h, qdr:d, qdr:w, qdr:m, qdr:y. | false |
| location | Location | STRING | Location parameter for geo-targeted search results (e.g., 'San Francisco,California,United States'). | false |
| country | Country | STRING | ISO country code for geo-targeting search results (e.g., 'US', 'DE', 'FR', 'JP'). | false |
| timeout | Timeout | INTEGER | Timeout in milliseconds. | false |
| ignoreInvalidURLs | Ignore Invalid URLs | BOOLEAN | Excludes URLs from search results that are invalid for other Firecrawl endpoints. Useful when piping data to other Firecrawl API endpoints. | false |

Example JSON Structure

{
  "label" : "Search",
  "name" : "search",
  "parameters" : {
    "query" : "",
    "limit" : 1,
    "sources" : [ {
      "type" : ""
    } ],
    "categories" : [ {
      "type" : ""
    } ],
    "tbs" : "",
    "location" : "",
    "country" : "",
    "timeout" : 1,
    "ignoreInvalidURLs" : false
  },
  "type" : "firecrawl/v1/search"
}
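For orientation, here is a filled-in configuration that restricts results to the web source and to the past week. The query and values are hypothetical placeholders:

```json
{
  "label" : "Search",
  "name" : "search",
  "parameters" : {
    "query" : "firecrawl web scraping",
    "limit" : 5,
    "sources" : [ { "type" : "web" } ],
    "tbs" : "qdr:w",
    "country" : "US"
  },
  "type" : "firecrawl/v1/search"
}
```

The sources chosen here determine which arrays appear in the output: with only the web source requested, the images and news arrays would not be populated.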

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| success | BOOLEAN | |
| data | OBJECT {[{STRING(title), STRING(description), STRING(url), STRING(markdown), STRING(html), STRING(rawHtml), [STRING](links), STRING(screenshot), {STRING(title), STRING(description), STRING(sourceURL), INTEGER(statusCode), STRING(error)}(metadata)}](web), [{STRING(title), STRING(imageUrl), INTEGER(imageWidth), INTEGER(imageHeight), STRING(url), INTEGER(position)}](images), [{STRING(title), STRING(snippet), STRING(url), STRING(date), STRING(imageUrl), INTEGER(position), STRING(markdown), STRING(html), STRING(rawHtml), [STRING](links), STRING(screenshot), {STRING(title), STRING(description), STRING(sourceURL), INTEGER(statusCode), STRING(error)}(metadata)}](news)} | |
| warning | STRING | |
| id | STRING | |
| creditsUsed | INTEGER | |

Output Example

{
  "success" : false,
  "data" : {
    "web" : [ {
      "title" : "",
      "description" : "",
      "url" : "",
      "markdown" : "",
      "html" : "",
      "rawHtml" : "",
      "links" : [ "" ],
      "screenshot" : "",
      "metadata" : {
        "title" : "",
        "description" : "",
        "sourceURL" : "",
        "statusCode" : 1,
        "error" : ""
      }
    } ],
    "images" : [ {
      "title" : "",
      "imageUrl" : "",
      "imageWidth" : 1,
      "imageHeight" : 1,
      "url" : "",
      "position" : 1
    } ],
    "news" : [ {
      "title" : "",
      "snippet" : "",
      "url" : "",
      "date" : "",
      "imageUrl" : "",
      "position" : 1,
      "markdown" : "",
      "html" : "",
      "rawHtml" : "",
      "links" : [ "" ],
      "screenshot" : "",
      "metadata" : {
        "title" : "",
        "description" : "",
        "sourceURL" : "",
        "statusCode" : 1,
        "error" : ""
      }
    } ]
  },
  "warning" : "",
  "id" : "",
  "creditsUsed" : 1
}

What to do if your action is not listed here?

If this component doesn't have the action you need, you can use a Custom Action to create your own. Custom Actions let you define HTTP requests tailored to your specific requirements, giving you greater flexibility when integrating with external services or APIs.

To create a Custom Action, specify the desired HTTP method, path, and any necessary parameters. This way, you can extend the component's functionality beyond its predefined actions and meet all your integration needs.
