
Firecrawl

Firecrawl turns entire websites into LLM-ready markdown.

Categories: Helpers, Artificial Intelligence

Type: firecrawl/v1


Connections

Version: 1

Bearer Token

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| token | API Token | STRING | | true |

Connection Setup

  1. Go to https://www.firecrawl.dev/app/api-keys
  2. Log in to your account.
  3. Copy the API key and use it to create a connection in ByteChef.

Actions

Crawl

Name: crawl

Crawl multiple URLs starting from a base URL and extract content.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| url | URL | STRING | The base URL to start crawling from. | true |
| formats | Formats | ARRAY [STRING] | Output formats to include in the response for each crawled page. | false |
| prompt | Prompt | STRING | A natural language prompt used to generate crawler options. Explicitly set parameters override the generated equivalents. | false |
| excludePaths | Exclude Paths | ARRAY [STRING] | URL pathname regex patterns that exclude matching URLs from the crawl. | false |
| includePaths | Include Paths | ARRAY [STRING] | URL pathname regex patterns that include matching URLs in the crawl. Only paths matching the specified patterns are included. | false |
| maxDiscoveryDepth | Max Discovery Depth | INTEGER | Maximum depth to crawl, based on discovery order. The root site and sitemapped pages have a discovery depth of 0. | false |
| sitemap | Sitemap | STRING | Sitemap mode: 'include' (default) uses the sitemap together with other discovery methods, 'skip' ignores the sitemap, 'only' crawls only sitemap URLs. | false |
| limit | Limit | INTEGER | Maximum number of pages to crawl. Default is 10000. | false |
| scrapeOptions | Scrape Options | OBJECT | Options for scraping each page during the crawl. Accepts the same fields as the Scrape URL action: onlyMainContent, includeTags, excludeTags, maxAge, headers, waitFor, mobile, skipTlsVerification, timeout, removeBase64Images, blockAds, proxy, location, parsers, storeInCache. | false |
| ignoreQueryParameters | Ignore Query Parameters | BOOLEAN | Do not re-scrape the same path with different (or no) query parameters. | false |
| regexOnFullURL | Regex on Full URL | BOOLEAN | When true, includePaths and excludePaths patterns are matched against the full URL, including query parameters. | false |
| crawlEntireDomain | Crawl Entire Domain | BOOLEAN | Allows the crawler to follow internal links to sibling or parent URLs, not just child paths. | false |
| allowExternalLinks | Allow External Links | BOOLEAN | Allows the crawler to follow links to external websites. | false |
| allowSubdomains | Allow Subdomains | BOOLEAN | Allows the crawler to follow links to subdomains of the main domain. | false |
| delay | Delay | INTEGER | Delay in seconds between scrapes. Helps respect website rate limits. | false |
| maxConcurrency | Max Concurrency | INTEGER | Maximum number of concurrent scrapes. If not specified, adheres to your team's concurrency limit. | false |
| webhook | Webhook | OBJECT {STRING(url), {}(headers), {}(metadata), [STRING](events)} | Webhook configuration for receiving crawl status updates. | false |
| zeroDataRetention | Zero Data Retention | BOOLEAN | Enable zero data retention for this crawl. Contact help@firecrawl.dev to enable this feature. | false |

Example JSON Structure

{
  "label" : "Crawl",
  "name" : "crawl",
  "parameters" : {
    "url" : "",
    "formats" : [ "" ],
    "prompt" : "",
    "excludePaths" : [ "" ],
    "includePaths" : [ "" ],
    "maxDiscoveryDepth" : 1,
    "sitemap" : "",
    "limit" : 1,
    "scrapeOptions" : {
      "onlyMainContent" : false,
      "includeTags" : [ "" ],
      "excludeTags" : [ "" ],
      "maxAge" : 1,
      "headers" : { },
      "waitFor" : 1,
      "mobile" : false,
      "skipTlsVerification" : false,
      "timeout" : 1,
      "removeBase64Images" : false,
      "blockAds" : false,
      "proxy" : "",
      "location" : {
        "country" : "",
        "languages" : [ "" ]
      },
      "parsers" : [ {
        "type" : "",
        "maxPages" : 1
      } ],
      "storeInCache" : false
    },
    "ignoreQueryParameters" : false,
    "regexOnFullURL" : false,
    "crawlEntireDomain" : false,
    "allowExternalLinks" : false,
    "allowSubdomains" : false,
    "delay" : 1,
    "maxConcurrency" : 1,
    "webhook" : {
      "url" : "",
      "headers" : { },
      "metadata" : { },
      "events" : [ "" ]
    },
    "zeroDataRetention" : false
  },
  "type" : "firecrawl/v1/crawl"
}
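For orientation, here is the same structure with illustrative values filled in. The URL, path pattern, and limits below are hypothetical placeholders, not defaults:

```json
{
  "label" : "Crawl",
  "name" : "crawl",
  "parameters" : {
    "url" : "https://example.com/docs",
    "formats" : [ "markdown" ],
    "includePaths" : [ "^/docs/.*" ],
    "maxDiscoveryDepth" : 2,
    "sitemap" : "include",
    "limit" : 50,
    "scrapeOptions" : {
      "onlyMainContent" : true,
      "blockAds" : true
    },
    "delay" : 1
  },
  "type" : "firecrawl/v1/crawl"
}
```

Because crawling runs asynchronously, the action returns a job `id` rather than page content; pass that `id` to the Get Crawl Status action to retrieve results.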

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| success | BOOLEAN | |
| id | STRING | |
| url | STRING | |

Output Example

{
  "success" : false,
  "id" : "",
  "url" : ""
}

Get Crawl Status

Name: getCrawlStatus

Get the status and results of a crawl job.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| id | Crawl ID | STRING | The ID of the crawl job to retrieve status for. | true |

Example JSON Structure

{
  "label" : "Get Crawl Status",
  "name" : "getCrawlStatus",
  "parameters" : {
    "id" : ""
  },
  "type" : "firecrawl/v1/getCrawlStatus"
}

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| status | STRING | The current status of the crawl: scraping, completed, or failed. |
| total | INTEGER | The total number of pages that were attempted to be crawled. |
| completed | INTEGER | The number of pages that have been successfully crawled. |
| creditsUsed | INTEGER | The number of credits used for the crawl. |
| expiresAt | STRING | The date and time when the crawl results will expire. |
| next | STRING | URL for retrieving the next batch of data. Returned if the crawl is not completed or if the response exceeds 10 MB. |
| data | ARRAY [{STRING(markdown), STRING(html), STRING(rawHtml), [STRING](links), STRING(screenshot), {STRING(title), STRING(description), STRING(language), STRING(sourceURL), STRING(keywords), [STRING](ogLocaleAlternate), INTEGER(statusCode), STRING(error)}(metadata)}] | The scraped data from each crawled page. |

Output Example

{
  "status" : "",
  "total" : 1,
  "completed" : 1,
  "creditsUsed" : 1,
  "expiresAt" : "",
  "next" : "",
  "data" : [ {
    "markdown" : "",
    "html" : "",
    "rawHtml" : "",
    "links" : [ "" ],
    "screenshot" : "",
    "metadata" : {
      "title" : "",
      "description" : "",
      "language" : "",
      "sourceURL" : "",
      "keywords" : "",
      "ogLocaleAlternate" : [ "" ],
      "statusCode" : 1,
      "error" : ""
    }
  } ]
}

Map

Name: map

Map multiple URLs from a website based on specified options.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| url | URL | STRING | The base URL to start mapping from. | true |
| search | Search | STRING | A search query used to order results by relevance. For example, 'blog' returns URLs containing the word 'blog', ordered by relevance. | false |
| sitemap | Sitemap | STRING | Sitemap mode when mapping: 'include' (default) uses the sitemap together with other discovery methods, 'skip' does not use the sitemap to find URLs, 'only' returns only URLs found in the sitemap. | false |
| includeSubdomains | Include Subdomains | BOOLEAN | Include subdomains of the website. | false |
| ignoreQueryParameters | Ignore Query Parameters | BOOLEAN | Do not return URLs with query parameters. | false |
| ignoreCache | Ignore Cache | BOOLEAN | Bypass the sitemap cache to retrieve fresh URLs. Sitemap data is cached for up to 7 days; use this parameter when your sitemap has been recently updated. | false |
| limit | Limit | INTEGER | Maximum number of links to return (1-100000). | false |
| timeout | Timeout | INTEGER | Timeout in milliseconds. There is no timeout by default. | false |
| location | Location | OBJECT {STRING(country), [STRING](languages)} | Location settings for the request. When specified, an appropriate proxy is used if available and the corresponding language and timezone settings are emulated. Defaults to 'US'. | false |

Example JSON Structure

{
  "label" : "Map",
  "name" : "map",
  "parameters" : {
    "url" : "",
    "search" : "",
    "sitemap" : "",
    "includeSubdomains" : false,
    "ignoreQueryParameters" : false,
    "ignoreCache" : false,
    "limit" : 1,
    "timeout" : 1,
    "location" : {
      "country" : "",
      "languages" : [ "" ]
    }
  },
  "type" : "firecrawl/v1/map"
}
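As a concrete sketch, the configuration below maps a site and ranks the returned links by how closely they match a search term. The URL and values are hypothetical placeholders:

```json
{
  "label" : "Map",
  "name" : "map",
  "parameters" : {
    "url" : "https://example.com",
    "search" : "blog",
    "sitemap" : "include",
    "includeSubdomains" : false,
    "limit" : 100
  },
  "type" : "firecrawl/v1/map"
}
```

Map only discovers URLs; it does not scrape page content, which makes it a lightweight first step before a targeted Crawl or Scrape URL action.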

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| success | BOOLEAN | |
| links | ARRAY [{STRING(url), STRING(title), STRING(description)}] | |

Output Example

{
  "success" : false,
  "links" : [ {
    "url" : "",
    "title" : "",
    "description" : ""
  } ]
}

Scrape URL

Name: scrape

Scrape a single URL and extract content in various formats.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| url | URL | STRING | The URL to scrape. | true |
| formats | Formats | ARRAY [STRING] | Output formats to include in the response (e.g., markdown, html, json). | false |
| formatsSchema | JSON Schema | OBJECT | The schema to use for the JSON output. Must conform to JSON Schema. | false |
| formatsPrompt | JSON Prompt | OBJECT | The prompt to use for the JSON output. | false |
| onlyMainContent | Only Main Content | BOOLEAN | Only return the main content, excluding headers, navs, footers, etc. | false |
| includeTags | Include Tags | ARRAY [STRING] | HTML tags to include in the output. | false |
| excludeTags | Exclude Tags | ARRAY [STRING] | HTML tags to exclude from the output. | false |
| maxAge | Max Age | INTEGER | Returns a cached version if it is younger than this age in milliseconds. Speeds up scrapes by up to 500%. Default is 2 days (172800000 ms). | false |
| headers | Headers | OBJECT | Custom headers to send with the request (e.g., cookies, user-agent). | false |
| waitFor | Wait For | INTEGER | Delay in milliseconds before fetching content, allowing the page to load. This is in addition to Firecrawl's smart wait feature. | false |
| mobile | Mobile | BOOLEAN | Emulate scraping from a mobile device. Useful for responsive pages and mobile screenshots. | false |
| skipTlsVerification | Skip TLS Verification | BOOLEAN | Skip TLS certificate verification when making requests. | false |
| timeout | Timeout | INTEGER | Timeout in milliseconds for the request. Default is 30000 (30 seconds); maximum is 300000 (5 minutes). | false |
| removeBase64Images | Remove Base64 Images | BOOLEAN | Removes all base64 images from the output. Image alt text is kept, but the URL is replaced with a placeholder. | false |
| blockAds | Block Ads | BOOLEAN | Enables ad blocking and cookie-popup blocking. | false |
| proxy | Proxy | STRING | Proxy type: 'basic' (fast, basic anti-bot), 'enhanced' (slower, advanced anti-bot, costs up to 5 credits), 'auto' (retries with enhanced if basic fails). | false |
| location | Location | OBJECT {STRING(country), [STRING](languages)} | Location settings for the request. Uses an appropriate proxy and emulates the corresponding language and timezone. | false |
| parsers | Parsers | ARRAY [{STRING(type), INTEGER(maxPages)}] | Controls how files are processed. When 'pdf' is included (the default), PDF content is extracted and converted to markdown (1 credit per page). An empty array returns the PDF in base64 (1 credit flat). | false |
| storeInCache | Store in Cache | BOOLEAN | If true, the page is stored in the Firecrawl index and cache. Set to false if you have data-protection concerns. | false |
| zeroDataRetention | Zero Data Retention | BOOLEAN | Enable zero data retention for this scrape. Contact help@firecrawl.dev to enable this feature. | false |

Example JSON Structure

{
  "label" : "Scrape URL",
  "name" : "scrape",
  "parameters" : {
    "url" : "",
    "formats" : [ "" ],
    "formatsSchema" : { },
    "formatsPrompt" : { },
    "onlyMainContent" : false,
    "includeTags" : [ "" ],
    "excludeTags" : [ "" ],
    "maxAge" : 1,
    "headers" : { },
    "waitFor" : 1,
    "mobile" : false,
    "skipTlsVerification" : false,
    "timeout" : 1,
    "removeBase64Images" : false,
    "blockAds" : false,
    "proxy" : "",
    "location" : {
      "country" : "",
      "languages" : [ "" ]
    },
    "parsers" : [ {
      "type" : "",
      "maxPages" : 1
    } ],
    "storeInCache" : false,
    "zeroDataRetention" : false
  },
  "type" : "firecrawl/v1/scrape"
}
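As an illustrative sketch, the configuration below requests both markdown and structured JSON output, using formatsSchema to describe the fields to extract. The URL and schema fields (planName, monthlyPrice) are hypothetical placeholders, not part of the component:

```json
{
  "label" : "Scrape URL",
  "name" : "scrape",
  "parameters" : {
    "url" : "https://example.com/pricing",
    "formats" : [ "markdown", "json" ],
    "formatsSchema" : {
      "type" : "object",
      "properties" : {
        "planName" : { "type" : "string" },
        "monthlyPrice" : { "type" : "number" }
      }
    },
    "onlyMainContent" : true,
    "timeout" : 30000
  },
  "type" : "firecrawl/v1/scrape"
}
```

The schema only applies when 'json' is among the requested formats; for plain markdown scrapes it can be omitted entirely.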

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| success | BOOLEAN | |
| data | OBJECT {STRING(markdown), STRING(summary), STRING(html), STRING(rawHtml), STRING(screenshot), [STRING](links), {STRING(title), STRING(description), STRING(language), STRING(sourceURL), STRING(keywords), INTEGER(statusCode), STRING(error)}(metadata), STRING(warning)} | |

Output Example

{
  "success" : false,
  "data" : {
    "markdown" : "",
    "summary" : "",
    "html" : "",
    "rawHtml" : "",
    "screenshot" : "",
    "links" : [ "" ],
    "metadata" : {
      "title" : "",
      "description" : "",
      "language" : "",
      "sourceURL" : "",
      "keywords" : "",
      "statusCode" : 1,
      "error" : ""
    },
    "warning" : ""
  }
}

Search

Name: search

Search the web and optionally scrape search results using Firecrawl.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| query | Search Query | STRING | The search query string. | true |
| limit | Limit | INTEGER | Maximum number of results to return (1-100). | false |
| sources | Sources | ARRAY [{STRING(type)}] | Sources to search. Determines which arrays are available in the response. | false |
| categories | Categories | ARRAY [{STRING(type)}] | Categories to filter results by (github, research, pdf). | false |
| tbs | Time-Based Search | STRING | Filter results by time period. Options: qdr:h, qdr:d, qdr:w, qdr:m, qdr:y. | false |
| location | Location | STRING | Location parameter for geo-targeted search results (e.g., 'San Francisco,California,United States'). | false |
| country | Country | STRING | ISO country code for geo-targeting search results (e.g., 'US', 'DE', 'FR', 'JP'). | false |
| timeout | Timeout | INTEGER | Timeout in milliseconds. | false |
| ignoreInvalidURLs | Ignore Invalid URLs | BOOLEAN | Excludes URLs from search results that are invalid for other Firecrawl endpoints. Useful when piping data to other Firecrawl API endpoints. | false |

Example JSON Structure

{
  "label" : "Search",
  "name" : "search",
  "parameters" : {
    "query" : "",
    "limit" : 1,
    "sources" : [ {
      "type" : ""
    } ],
    "categories" : [ {
      "type" : ""
    } ],
    "tbs" : "",
    "location" : "",
    "country" : "",
    "timeout" : 1,
    "ignoreInvalidURLs" : false
  },
  "type" : "firecrawl/v1/search"
}
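For orientation, here is a filled-in configuration that restricts results to the web source and to the past week. The query and values are hypothetical placeholders:

```json
{
  "label" : "Search",
  "name" : "search",
  "parameters" : {
    "query" : "firecrawl web scraping",
    "limit" : 5,
    "sources" : [ { "type" : "web" } ],
    "tbs" : "qdr:w",
    "country" : "US"
  },
  "type" : "firecrawl/v1/search"
}
```

The sources chosen here determine which arrays appear in the output: with only the web source requested, the images and news arrays would not be populated.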

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| success | BOOLEAN | |
| data | OBJECT {[{STRING(title), STRING(description), STRING(url), STRING(markdown), STRING(html), STRING(rawHtml), [STRING](links), STRING(screenshot), {STRING(title), STRING(description), STRING(sourceURL), INTEGER(statusCode), STRING(error)}(metadata)}](web), [{STRING(title), STRING(imageUrl), INTEGER(imageWidth), INTEGER(imageHeight), STRING(url), INTEGER(position)}](images), [{STRING(title), STRING(snippet), STRING(url), STRING(date), STRING(imageUrl), INTEGER(position), STRING(markdown), STRING(html), STRING(rawHtml), [STRING](links), STRING(screenshot), {STRING(title), STRING(description), STRING(sourceURL), INTEGER(statusCode), STRING(error)}(metadata)}](news)} | |
| warning | STRING | |
| id | STRING | |
| creditsUsed | INTEGER | |

Output Example

{
  "success" : false,
  "data" : {
    "web" : [ {
      "title" : "",
      "description" : "",
      "url" : "",
      "markdown" : "",
      "html" : "",
      "rawHtml" : "",
      "links" : [ "" ],
      "screenshot" : "",
      "metadata" : {
        "title" : "",
        "description" : "",
        "sourceURL" : "",
        "statusCode" : 1,
        "error" : ""
      }
    } ],
    "images" : [ {
      "title" : "",
      "imageUrl" : "",
      "imageWidth" : 1,
      "imageHeight" : 1,
      "url" : "",
      "position" : 1
    } ],
    "news" : [ {
      "title" : "",
      "snippet" : "",
      "url" : "",
      "date" : "",
      "imageUrl" : "",
      "position" : 1,
      "markdown" : "",
      "html" : "",
      "rawHtml" : "",
      "links" : [ "" ],
      "screenshot" : "",
      "metadata" : {
        "title" : "",
        "description" : "",
        "sourceURL" : "",
        "statusCode" : 1,
        "error" : ""
      }
    } ]
  },
  "warning" : "",
  "id" : "",
  "creditsUsed" : 1
}

What to do if your action is not listed here?

If this component doesn't have the action you need, you can use a Custom Action to create your own. Custom Actions let you define HTTP requests tailored to your specific requirements, giving you greater flexibility when integrating with external services or APIs.

To create a Custom Action, specify the desired HTTP method, path, and any necessary parameters. This way, you can extend the component's functionality beyond its predefined actions and meet all your integration needs.
