Firecrawl
Firecrawl allows you to turn entire websites into LLM-ready markdown.
Categories: Helpers, Artificial Intelligence
Type: firecrawl/v1
Connections
Version: 1
Bearer Token
Properties
| Name | Label | Type | Description | Required |
|---|---|---|---|---|
| token | API Token | STRING | Firecrawl API key used to authenticate requests. | true |
Connection Setup
- Go to https://www.firecrawl.dev/app/api-keys
- Log in to your account.
- Copy the API key and use it to create a connection in ByteChef.
Actions
Crawl
Name: crawl
Crawl multiple URLs starting from a base URL and extract content.
Properties
| Name | Label | Type | Description | Required |
|---|---|---|---|---|
| url | URL | STRING | The base URL to start crawling from. | true |
| formats | Formats | ARRAY Items[STRING] | Output formats to include in the response for each crawled page. | false |
| prompt | Prompt | STRING | A natural language prompt to generate crawler options. Explicitly set parameters will override the generated equivalents. | false |
| excludePaths | Exclude Paths | ARRAY Items[STRING] | URL pathname regex patterns that exclude matching URLs from the crawl. | false |
| includePaths | Include Paths | ARRAY Items[STRING] | URL pathname regex patterns that include matching URLs in the crawl. Only paths matching the specified patterns will be included. | false |
| maxDiscoveryDepth | Max Discovery Depth | INTEGER | Maximum depth to crawl based on discovery order. The root site and sitemapped pages have a discovery depth of 0. | false |
| sitemap | Sitemap | STRING Options: include, skip, only | Sitemap mode: 'include' uses the sitemap and other methods (default), 'skip' ignores the sitemap, 'only' crawls only sitemap URLs. | false |
| limit | Limit | INTEGER | Maximum number of pages to crawl. Default limit is 10000. | false |
| scrapeOptions | Scrape Options | OBJECT Properties{BOOLEAN(onlyMainContent), [STRING](includeTags), [STRING](excludeTags), INTEGER(maxAge), {}(headers), INTEGER(waitFor), BOOLEAN(mobile), BOOLEAN(skipTlsVerification), INTEGER(timeout), BOOLEAN(removeBase64Images), BOOLEAN(blockAds), STRING(proxy), {STRING(country), [STRING](languages)}(location), [{STRING(type), INTEGER(maxPages)}](parsers), BOOLEAN(storeInCache)} | Options for scraping each page during the crawl. | false |
| ignoreQueryParameters | Ignore Query Parameters | BOOLEAN Options: true, false | Do not re-scrape the same path with different (or no) query parameters. | false |
| regexOnFullURL | Regex on Full URL | BOOLEAN Options: true, false | When true, includePaths and excludePaths patterns are matched against the full URL including query parameters. | false |
| crawlEntireDomain | Crawl Entire Domain | BOOLEAN Options: true, false | Allows the crawler to follow internal links to sibling or parent URLs, not just child paths. | false |
| allowExternalLinks | Allow External Links | BOOLEAN Options: true, false | Allows the crawler to follow links to external websites. | false |
| allowSubdomains | Allow Subdomains | BOOLEAN Options: true, false | Allows the crawler to follow links to subdomains of the main domain. | false |
| delay | Delay | INTEGER | Delay in seconds between scrapes. Helps respect website rate limits. | false |
| maxConcurrency | Max Concurrency | INTEGER | Maximum number of concurrent scrapes. If not specified, adheres to your team's concurrency limit. | false |
| webhook | Webhook | OBJECT Properties{STRING(url), {}(headers), {}(metadata), [STRING](events)} | Webhook configuration to receive crawl status updates. | false |
| zeroDataRetention | Zero Data Retention | BOOLEAN Options: true, false | Enable zero data retention for this crawl. Contact help@firecrawl.dev to enable this feature. | false |
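To illustrate how the parameters above fit together, here is a minimal sketch of assembling a crawl request body. The `build_crawl_payload` helper is hypothetical, not part of ByteChef or Firecrawl; the field names follow the table above.

```python
import json

# Hypothetical helper: assembles a crawl request body from the parameters above.
def build_crawl_payload(url, limit=100, formats=None, exclude_paths=None, max_depth=None):
    payload = {"url": url, "limit": limit}
    if formats:
        payload["formats"] = formats
    if exclude_paths:
        payload["excludePaths"] = exclude_paths
    if max_depth is not None:
        payload["maxDiscoveryDepth"] = max_depth
    return payload

payload = build_crawl_payload(
    "https://example.com",
    limit=50,
    formats=["markdown"],
    exclude_paths=["^/admin/.*"],  # regex against the URL pathname, per the table
)
print(json.dumps(payload, indent=2))
```

Optional parameters that are left unset are simply omitted from the body, so Firecrawl's defaults apply.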
Example JSON Structure
{
"label" : "Crawl",
"name" : "crawl",
"parameters" : {
"url" : "",
"formats" : [ "" ],
"prompt" : "",
"excludePaths" : [ "" ],
"includePaths" : [ "" ],
"maxDiscoveryDepth" : 1,
"sitemap" : "",
"limit" : 1,
"scrapeOptions" : {
"onlyMainContent" : false,
"includeTags" : [ "" ],
"excludeTags" : [ "" ],
"maxAge" : 1,
"headers" : { },
"waitFor" : 1,
"mobile" : false,
"skipTlsVerification" : false,
"timeout" : 1,
"removeBase64Images" : false,
"blockAds" : false,
"proxy" : "",
"location" : {
"country" : "",
"languages" : [ "" ]
},
"parsers" : [ {
"type" : "",
"maxPages" : 1
} ],
"storeInCache" : false
},
"ignoreQueryParameters" : false,
"regexOnFullURL" : false,
"crawlEntireDomain" : false,
"allowExternalLinks" : false,
"allowSubdomains" : false,
"delay" : 1,
"maxConcurrency" : 1,
"webhook" : {
"url" : "",
"headers" : { },
"metadata" : { },
"events" : [ "" ]
},
"zeroDataRetention" : false
},
"type" : "firecrawl/v1/crawl"
}
Output
Type: OBJECT
Properties
| Name | Type | Description |
|---|---|---|
| success | BOOLEAN Options: true, false | |
| id | STRING | |
| url | STRING | |
Output Example
{
"success" : false,
"id" : "",
"url" : ""
}
Get Crawl Status
Name: getCrawlStatus
Get the status and results of a crawl job.
Properties
| Name | Label | Type | Description | Required |
|---|---|---|---|---|
| id | Crawl ID | STRING | The ID of the crawl job to retrieve status for. | true |
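Since a crawl runs asynchronously, a common pattern is to poll this action until the job finishes. The sketch below shows one way to interpret a status response; the `crawl_progress` helper is hypothetical, and the field names come from the output table below.

```python
# Hypothetical helper: interprets a Get Crawl Status response.
def crawl_progress(status_response):
    """Return (finished, fraction_complete) for a status payload."""
    finished = status_response.get("status") in ("completed", "failed")
    total = status_response.get("total") or 0
    done = status_response.get("completed") or 0
    return finished, (done / total if total else 0.0)

# Example status payload shaped like the output described below.
sample = {"status": "scraping", "total": 40, "completed": 10}
finished, fraction = crawl_progress(sample)
```

When `next` is present in the response, the crawl either is still running or has more than one batch of results, so keep fetching until it disappears.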
Example JSON Structure
{
"label" : "Get Crawl Status",
"name" : "getCrawlStatus",
"parameters" : {
"id" : ""
},
"type" : "firecrawl/v1/getCrawlStatus"
}
Output
Type: OBJECT
Properties
| Name | Type | Description |
|---|---|---|
| status | STRING | The current status of the crawl: scraping, completed, or failed. |
| total | INTEGER | The total number of pages that were attempted to be crawled. |
| completed | INTEGER | The number of pages that have been successfully crawled. |
| creditsUsed | INTEGER | The number of credits used for the crawl. |
| expiresAt | STRING | The date and time when the crawl results will expire. |
| next | STRING | URL to retrieve the next batch of data. Returned if the crawl is not completed or if the response exceeds 10MB. |
| data | ARRAY Items[{STRING(markdown), STRING(html), STRING(rawHtml), [STRING](links), STRING(screenshot), {STRING(title), STRING(description), STRING(language), STRING(sourceURL), STRING(keywords), [STRING](ogLocaleAlternate), INTEGER(statusCode), STRING(error)}(metadata)}] | The scraped data from each crawled page. |
Output Example
{
"status" : "",
"total" : 1,
"completed" : 1,
"creditsUsed" : 1,
"expiresAt" : "",
"next" : "",
"data" : [ {
"markdown" : "",
"html" : "",
"rawHtml" : "",
"links" : [ "" ],
"screenshot" : "",
"metadata" : {
"title" : "",
"description" : "",
"language" : "",
"sourceURL" : "",
"keywords" : "",
"ogLocaleAlternate" : [ "" ],
"statusCode" : 1,
"error" : ""
}
} ]
}
Map
Name: map
Map multiple URLs from a website based on specified options.
Properties
| Name | Label | Type | Description | Required |
|---|---|---|---|---|
| url | URL | STRING | The base URL to start mapping from. | true |
| search | Search | STRING | Specify a search query to order the results by relevance. Example: 'blog' will return URLs that contain the word 'blog' in the URL ordered by relevance. | false |
| sitemap | Sitemap | STRING Options: include, skip, only | Sitemap mode when mapping: 'skip' ignores the sitemap, 'only' returns only URLs found in the sitemap, and the default 'include' combines the sitemap with other discovery methods. | false |
| includeSubdomains | Include Subdomains | BOOLEAN Options: true, false | Include subdomains of the website. | false |
| ignoreQueryParameters | Ignore Query Parameters | BOOLEAN Options: true, false | Do not return URLs with query parameters. | false |
| ignoreCache | Ignore Cache | BOOLEAN Options: true, false | Bypass the sitemap cache to retrieve fresh URLs. Sitemap data is cached for up to 7 days; use this parameter when your sitemap has been recently updated. | false |
| limit | Limit | INTEGER | Maximum number of links to return (1-100000). | false |
| timeout | Timeout | INTEGER | Timeout in milliseconds. There is no timeout by default. | false |
| location | Location | OBJECT Properties{STRING(country), [STRING](languages)} | Location settings for the request. When specified, this will use an appropriate proxy if available and emulate the corresponding language and timezone settings. Defaults to 'US' if not specified. | false |
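The Map action returns link objects rather than plain strings, so a downstream step usually needs to pull the URLs out. A minimal sketch, assuming the response shape from the output table below (the `extract_urls` helper is hypothetical):

```python
# Hypothetical helper: pulls discovered URLs out of a Map response.
def extract_urls(map_response):
    if not map_response.get("success"):
        return []
    return [link["url"] for link in map_response.get("links", [])]

# Example response shaped like the output described below.
sample = {
    "success": True,
    "links": [
        {"url": "https://example.com/blog", "title": "Blog", "description": ""},
        {"url": "https://example.com/docs", "title": "Docs", "description": ""},
    ],
}
urls = extract_urls(sample)
```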
Example JSON Structure
{
"label" : "Map",
"name" : "map",
"parameters" : {
"url" : "",
"search" : "",
"sitemap" : "",
"includeSubdomains" : false,
"ignoreQueryParameters" : false,
"ignoreCache" : false,
"limit" : 1,
"timeout" : 1,
"location" : {
"country" : "",
"languages" : [ "" ]
}
},
"type" : "firecrawl/v1/map"
}
Output
Type: OBJECT
Properties
| Name | Type | Description |
|---|---|---|
| success | BOOLEAN Options: true, false | |
| links | ARRAY Items[{STRING(url), STRING(title), STRING(description)}] | |
Output Example
{
"success" : false,
"links" : [ {
"url" : "",
"title" : "",
"description" : ""
} ]
}
Scrape URL
Name: scrape
Scrape a single URL and extract content in various formats.
Properties
| Name | Label | Type | Description | Required |
|---|---|---|---|---|
| url | URL | STRING | The URL to scrape. | true |
| formats | Formats | ARRAY Items[STRING] | Output formats to include in the response (e.g., markdown, html, json). | false |
| formatsSchema | JSON Schema | OBJECT Properties{} | The schema to use for the JSON output. Must conform to JSON Schema. | false |
| formatsPrompt | JSON Prompt | OBJECT Properties{} | The prompt to use for the JSON output. | false |
| onlyMainContent | Only Main Content | BOOLEAN Options: true, false | Only return the main content, excluding headers, navs, footers, etc. | false |
| includeTags | Include Tags | ARRAY Items[STRING] | HTML tags to include in the output. | false |
| excludeTags | Exclude Tags | ARRAY Items[STRING] | HTML tags to exclude from the output. | false |
| maxAge | Max Age | INTEGER | Returns a cached version if younger than this age in milliseconds. Speeds up scrapes by up to 500%. Default is 2 days (172800000ms). | false |
| headers | Headers | OBJECT Properties{} | Custom headers to send with the request (e.g., cookies, user-agent). | false |
| waitFor | Wait For | INTEGER | Delay in milliseconds before fetching content, allowing the page to load. This is in addition to Firecrawl's smart wait feature. | false |
| mobile | Mobile | BOOLEAN Options: true, false | Emulate scraping from a mobile device. Useful for responsive pages and mobile screenshots. | false |
| skipTlsVerification | Skip TLS Verification | BOOLEAN Options: true, false | Skip TLS certificate verification when making requests. | false |
| timeout | Timeout | INTEGER | Timeout in milliseconds for the request. Default is 30000 (30 seconds). Maximum is 300000 (5 minutes). | false |
| removeBase64Images | Remove Base64 Images | BOOLEAN Options: true, false | Removes all base64 images from the output. Image alt text remains, but the URL is replaced with a placeholder. | false |
| blockAds | Block Ads | BOOLEAN Options: true, false | Enables ad blocking and cookie-popup blocking. | false |
| proxy | Proxy | STRING Options: auto, basic, enhanced | Proxy type: 'basic' (fast, basic anti-bot), 'enhanced' (slower, advanced anti-bot, costs up to 5 credits), 'auto' (retries with 'enhanced' if 'basic' fails). | false |
| location | Location | OBJECT Properties{STRING(country), [STRING](languages)} | Location settings for the request. Uses appropriate proxy and emulates language/timezone. | false |
| parsers | Parsers | ARRAY Items[{STRING(type), INTEGER(maxPages)}] | Controls how files are processed. When 'pdf' is included (default), PDF content is extracted and converted to markdown (1 credit per page). Empty array returns PDF in base64 (1 credit flat). | false |
| storeInCache | Store in Cache | BOOLEAN Options: true, false | If true, the page will be stored in the Firecrawl index and cache. Set to false if you have data protection concerns. | false |
| zeroDataRetention | Zero Data Retention | BOOLEAN Options: true, false | Enable zero data retention for this scrape. Contact help@firecrawl.dev to enable this feature. | false |
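As with Crawl, the scrape parameters map directly onto a JSON request body. A minimal sketch (the `build_scrape_payload` helper is hypothetical; field names and units follow the table above):

```python
# Hypothetical helper: builds a scrape request body from the parameters above.
def build_scrape_payload(url, formats=("markdown",), only_main_content=True, timeout_ms=None):
    payload = {
        "url": url,
        "formats": list(formats),
        "onlyMainContent": only_main_content,
    }
    if timeout_ms is not None:
        payload["timeout"] = timeout_ms  # milliseconds, per the table above
    return payload

payload = build_scrape_payload(
    "https://example.com/pricing",
    formats=("markdown", "html"),
    timeout_ms=60000,  # 60 s, below the 300000 ms maximum
)
```

Note that `waitFor`, `timeout`, and `maxAge` are all expressed in milliseconds, while the Crawl action's `delay` is in seconds.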
Example JSON Structure
{
"label" : "Scrape URL",
"name" : "scrape",
"parameters" : {
"url" : "",
"formats" : [ "" ],
"formatsSchema" : { },
"formatsPrompt" : { },
"onlyMainContent" : false,
"includeTags" : [ "" ],
"excludeTags" : [ "" ],
"maxAge" : 1,
"headers" : { },
"waitFor" : 1,
"mobile" : false,
"skipTlsVerification" : false,
"timeout" : 1,
"removeBase64Images" : false,
"blockAds" : false,
"proxy" : "",
"location" : {
"country" : "",
"languages" : [ "" ]
},
"parsers" : [ {
"type" : "",
"maxPages" : 1
} ],
"storeInCache" : false,
"zeroDataRetention" : false
},
"type" : "firecrawl/v1/scrape"
}
Output
Type: OBJECT
Properties
| Name | Type | Description |
|---|---|---|
| success | BOOLEAN Options: true, false | |
| data | OBJECT Properties{STRING(markdown), STRING(summary), STRING(html), STRING(rawHtml), STRING(screenshot), [STRING](links), {STRING(title), STRING(description), STRING(language), STRING(sourceURL), STRING(keywords), INTEGER(statusCode), STRING(error)}(metadata), STRING(warning)} | |
Output Example
{
"success" : false,
"data" : {
"markdown" : "",
"summary" : "",
"html" : "",
"rawHtml" : "",
"screenshot" : "",
"links" : [ "" ],
"metadata" : {
"title" : "",
"description" : "",
"language" : "",
"sourceURL" : "",
"keywords" : "",
"statusCode" : 1,
"error" : ""
},
"warning" : ""
}
}
Search
Name: search
Search the web and optionally scrape search results using Firecrawl.
Properties
| Name | Label | Type | Description | Required |
|---|---|---|---|---|
| query | Search Query | STRING | The search query string. | true |
| limit | Limit | INTEGER | Maximum number of results to return (1-100). | false |
| sources | Sources | ARRAY Items[{STRING(type)}] | Sources to search. Determines the arrays available in the response. | false |
| categories | Categories | ARRAY Items[{STRING(type)}] | Categories to filter results by (github, research, pdf). | false |
| tbs | Time-Based Search | STRING Options: qdr:h, qdr:d, qdr:w, qdr:m, qdr:y | Filter results by time period (past hour, day, week, month, or year). | false |
| location | Location | STRING | Location parameter for geo-targeted search results (e.g., 'San Francisco,California,United States'). | false |
| country | Country | STRING | ISO country code for geo-targeting search results (e.g., 'US', 'DE', 'FR', 'JP'). | false |
| timeout | Timeout | INTEGER | Timeout in milliseconds. | false |
| ignoreInvalidURLs | Ignore Invalid URLs | BOOLEAN Options: true, false | Excludes URLs from search results that are invalid for other Firecrawl endpoints. Useful when piping data to other Firecrawl API endpoints. | false |
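The Search output groups results into separate arrays per source, so combining them is a common follow-up step. A minimal sketch, assuming the response shape from the output table below (the `collect_result_urls` helper is hypothetical):

```python
# Hypothetical helper: flattens a Search response's web and news arrays into one URL list.
def collect_result_urls(search_response):
    data = search_response.get("data", {})
    urls = []
    for source in ("web", "news"):
        urls.extend(item["url"] for item in data.get(source, []))
    return urls

# Example response shaped like the output described below.
sample = {
    "success": True,
    "data": {
        "web": [{"url": "https://example.com/a"}],
        "news": [{"url": "https://example.com/b"}],
    },
}
urls = collect_result_urls(sample)
```

Which arrays appear in `data` depends on the `sources` parameter, so code consuming the output should tolerate missing keys, as above.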
Example JSON Structure
{
"label" : "Search",
"name" : "search",
"parameters" : {
"query" : "",
"limit" : 1,
"sources" : [ {
"type" : ""
} ],
"categories" : [ {
"type" : ""
} ],
"tbs" : "",
"location" : "",
"country" : "",
"timeout" : 1,
"ignoreInvalidURLs" : false
},
"type" : "firecrawl/v1/search"
}
Output
Type: OBJECT
Properties
| Name | Type | Description |
|---|---|---|
| success | BOOLEAN Options: true, false | |
| data | OBJECT Properties{[{STRING(title), STRING(description), STRING(url), STRING(markdown), STRING(html), STRING(rawHtml), [STRING](links), STRING(screenshot), {STRING(title), STRING(description), STRING(sourceURL), INTEGER(statusCode), STRING(error)}(metadata)}](web), [{STRING(title), STRING(imageUrl), INTEGER(imageWidth), INTEGER(imageHeight), STRING(url), INTEGER(position)}](images), [{STRING(title), STRING(snippet), STRING(url), STRING(date), STRING(imageUrl), INTEGER(position), STRING(markdown), STRING(html), STRING(rawHtml), [STRING](links), STRING(screenshot), {STRING(title), STRING(description), STRING(sourceURL), INTEGER(statusCode), STRING(error)}(metadata)}](news)} | |
| warning | STRING | |
| id | STRING | |
| creditsUsed | INTEGER | |
Output Example
{
"success" : false,
"data" : {
"web" : [ {
"title" : "",
"description" : "",
"url" : "",
"markdown" : "",
"html" : "",
"rawHtml" : "",
"links" : [ "" ],
"screenshot" : "",
"metadata" : {
"title" : "",
"description" : "",
"sourceURL" : "",
"statusCode" : 1,
"error" : ""
}
} ],
"images" : [ {
"title" : "",
"imageUrl" : "",
"imageWidth" : 1,
"imageHeight" : 1,
"url" : "",
"position" : 1
} ],
"news" : [ {
"title" : "",
"snippet" : "",
"url" : "",
"date" : "",
"imageUrl" : "",
"position" : 1,
"markdown" : "",
"html" : "",
"rawHtml" : "",
"links" : [ "" ],
"screenshot" : "",
"metadata" : {
"title" : "",
"description" : "",
"sourceURL" : "",
"statusCode" : 1,
"error" : ""
}
} ]
},
"warning" : "",
"id" : "",
"creditsUsed" : 1
}
What to do if your action is not listed here?
If this component doesn't have the action you need, you can use Custom Action to create your own. Custom Actions empower you to define HTTP requests tailored to your specific requirements, allowing for greater flexibility in integrating with external services or APIs.
To create a Custom Action, simply specify the desired HTTP method, path, and any necessary parameters. This way, you can extend the functionality of your component beyond the predefined actions, ensuring that you can meet all your integration needs effectively.
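To make the idea concrete, here is a sketch of the pieces a Custom Action needs: an HTTP method, a path, and authenticated headers. The `custom_action_request` helper is hypothetical, and the `https://api.firecrawl.dev` base URL is an assumption for illustration; the Bearer-token header follows the connection setup described above.

```python
# Hypothetical helper: assembles the pieces of a custom HTTP request.
# The base URL below is an assumption for illustration.
def custom_action_request(method, path, token, body=None):
    return {
        "method": method.upper(),
        "url": "https://api.firecrawl.dev" + path,
        "headers": {
            "Authorization": f"Bearer {token}",  # Bearer token from the connection
            "Content-Type": "application/json",
        },
        "body": body or {},
    }

req = custom_action_request("post", "/v1/scrape", "fc-your-token", {"url": "https://example.com"})
```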