ScrapeGraphAI

ScrapeGraphAI is a Python web scraping library that uses LLMs and direct graph logic to build scraping pipelines for websites and local documents.

Categories: Artificial Intelligence

Type: scrapeGraphAi/v1


Connections

Version: 1

API Key

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| key | Key | STRING | | true |
| value | Value | STRING | | true |

Connection Setup

  1. Log in to the dashboard at https://dashboard.scrapegraphai.com/login.
  2. Copy the API key and use it to create a connection in ByteChef.

Actions

Get SmartCrawler Status

Name: getCrawlStatus

Get the status and results of a previous smartcrawl request.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| task_id | Task Id | STRING | The ID of the crawl job task. | true |

Example JSON Structure

{
  "label" : "Get SmartCrawler Status",
  "name" : "getCrawlStatus",
  "parameters" : {
    "task_id" : ""
  },
  "type" : "scrapeGraphAi/v1/getCrawlStatus"
}

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| status | STRING | Overall status of the request. |
| result | OBJECT | The crawl job result. Properties: status (STRING), llm_result (OBJECT), crawled_urls ([STRING]), pages ([{url (STRING), markdown (STRING)}]). |

Output Example

{
  "status" : "",
  "result" : {
    "status" : "",
    "llm_result" : { },
    "crawled_urls" : [ "" ],
    "pages" : [ {
      "url" : "",
      "markdown" : ""
    } ]
  }
}
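
As a minimal sketch of how a downstream step might consume this output, the helper below maps each crawled page URL to its Markdown. The function name `collect_markdown` and the sample status values are hypothetical; only the field names come from the output structure above.

```python
from typing import Any, Dict


def collect_markdown(status_output: Dict[str, Any]) -> Dict[str, str]:
    """Map each crawled page URL to its Markdown content.

    Returns an empty dict when the result or its pages are missing,
    for example while the crawl is still running.
    """
    result = status_output.get("result") or {}
    pages = result.get("pages") or []
    return {
        page["url"]: page["markdown"]
        for page in pages
        if "url" in page and "markdown" in page
    }


# Sample data shaped like the Output Example; the status strings are
# illustrative placeholders, not documented values.
sample = {
    "status": "done",
    "result": {
        "status": "done",
        "llm_result": {},
        "crawled_urls": ["https://example.com"],
        "pages": [{"url": "https://example.com", "markdown": "# Example"}],
    },
}

print(collect_markdown(sample))
```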

Markdownify

Name: markdownify

Convert any webpage into clean, readable Markdown format.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| website_url | Website URL | STRING | Website URL. | true |

Example JSON Structure

{
  "label" : "Markdownify",
  "name" : "markdownify",
  "parameters" : {
    "website_url" : ""
  },
  "type" : "scrapeGraphAi/v1/markdownify"
}

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| request_id | STRING | Unique identifier for the request. |
| status | STRING | Status of the request. One of: “queued”, “processing”, “completed”, “failed”. |
| website_url | STRING | The original website URL that was submitted. |
| result | STRING | The webpage content converted to Markdown. |
| error | STRING | Error message if the request failed. Empty string if successful. |

Output Example

{
  "request_id" : "",
  "status" : "",
  "website_url" : "",
  "result" : "",
  "error" : ""
}
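
The four status values listed above suggest a simple way to branch on the output. The sketch below is one possible handling, not part of ByteChef or ScrapeGraphAI; `markdown_or_raise` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional


def markdown_or_raise(output: Dict[str, Any]) -> Optional[str]:
    """Return the Markdown when the request completed.

    Returns None while the request is queued or processing, and raises
    RuntimeError with the reported error message when it failed.
    """
    status = output.get("status")
    if status == "failed":
        raise RuntimeError(output.get("error") or "markdownify request failed")
    if status == "completed":
        return output.get("result", "")
    return None  # "queued" or "processing": nothing to return yet


done = {
    "request_id": "req-1",  # placeholder identifier
    "status": "completed",
    "website_url": "https://example.com",
    "result": "# Example",
    "error": "",
}
print(markdown_or_raise(done))
```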

Search Scraper

Name: searchScraper

Start an AI-powered web search request.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| user_prompt | User Prompt | STRING | The search query or question you want to ask. | true |

Example JSON Structure

{
  "label" : "Search Scraper",
  "name" : "searchScraper",
  "parameters" : {
    "user_prompt" : ""
  },
  "type" : "scrapeGraphAi/v1/searchScraper"
}

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| request_id | STRING | Unique identifier for the search request. |
| status | STRING | Status of the request. One of: “queued”, “processing”, “completed”, “failed”. |
| user_prompt | STRING | The original search query that was submitted. |
| result | OBJECT | The search results. Properties: {} |
| reference_urls | ARRAY | List of URLs that were used as references for the answer. Items: [STRING] |
| error | STRING | Error message if the request failed. Empty string if successful. |

Output Example

{
  "request_id" : "",
  "status" : "",
  "user_prompt" : "",
  "result" : { },
  "reference_urls" : [ "" ],
  "error" : ""
}
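
Because `result` is a free-form object and `reference_urls` is a list of strings, one way to present the output is to print the result as JSON followed by numbered references. This is only an illustrative sketch; `format_answer` and the sample values are hypothetical.

```python
import json
from typing import Any, Dict


def format_answer(output: Dict[str, Any]) -> str:
    """Render the result object as JSON, followed by numbered reference URLs."""
    lines = [json.dumps(output.get("result") or {}, indent=2)]
    references = output.get("reference_urls") or []
    if references:
        lines.append("References:")
        lines.extend(f"[{i}] {url}" for i, url in enumerate(references, start=1))
    return "\n".join(lines)


sample_output = {
    "request_id": "req-2",  # placeholder identifier
    "status": "completed",
    "user_prompt": "What is ByteChef?",
    "result": {"answer": "An automation platform."},  # illustrative content
    "reference_urls": ["https://example.com/a", "https://example.com/b"],
    "error": "",
}
print(format_answer(sample_output))
```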

Smart Scraper

Name: smartScraper

Extract content from a webpage using AI by providing a natural language prompt and a URL.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| user_prompt | User Prompt | STRING | The search query or question you want to ask. | true |
| website_url | Website URL | STRING | Website URL. | true |

Example JSON Structure

{
  "label" : "Smart Scraper",
  "name" : "smartScraper",
  "parameters" : {
    "user_prompt" : "",
    "website_url" : ""
  },
  "type" : "scrapeGraphAi/v1/smartScraper"
}

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| request_id | STRING | Unique identifier for the search request. |
| status | STRING | Status of the request. One of: “queued”, “processing”, “completed”, “failed”. |
| website_url | STRING | The original website URL that was submitted. |
| user_prompt | STRING | The original search query that was submitted. |
| result | OBJECT | The search results. Properties: {} |
| error | STRING | Error message if the request failed. Empty string if successful. |

Output Example

{
  "request_id" : "",
  "status" : "",
  "website_url" : "",
  "user_prompt" : "",
  "result" : { },
  "error" : ""
}

Start SmartCrawler

Name: startCrawl

Start a new web crawl request with AI extraction or markdown conversion.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| url | URL | STRING | The starting URL for the crawl. | true |
| prompt | Prompt | STRING | Instructions for data extraction. Required when extraction_mode is true. | false |
| extraction_mode | Extraction Mode | BOOLEAN | Options: true, false. When false, enables markdown conversion mode (2 credits per page). Default is true. | false |
| cache_website | Cache Website | BOOLEAN | Options: true, false. Whether to cache the website content. | false |
| depth | Depth | INTEGER | Maximum crawl depth. | false |
| max_pages | Max Pages | INTEGER | Maximum number of pages to crawl. | false |
| same_domain_only | Same Domain Only | BOOLEAN | Options: true, false. Whether to crawl only the same domain. | false |
| batch_size | Batch Size | INTEGER | Number of pages to process in each batch. | false |
| schema | Schema | OBJECT | JSON Schema object for structured output. Properties: {} | false |
| rules | Rules | OBJECT | Crawl rules for filtering URLs. Properties: exclude ([STRING]), include_paths ([STRING]), exclude_paths ([STRING]), same_domain (BOOLEAN). | false |
| sitemap | Sitemap | BOOLEAN | Options: true, false. Use sitemap.xml for discovery. | false |
| render_heavy_js | Render Heavy JS | BOOLEAN | Options: true, false. Enable heavy JavaScript rendering. | false |
| stealth | Stealth | BOOLEAN | Options: true, false. Enable stealth mode to bypass bot protection using advanced anti-detection techniques. Adds +4 credits to the request cost. | false |

Example JSON Structure

{
  "label" : "Start SmartCrawler",
  "name" : "startCrawl",
  "parameters" : {
    "url" : "",
    "prompt" : "",
    "extraction_mode" : false,
    "cache_website" : false,
    "depth" : 1,
    "max_pages" : 1,
    "same_domain_only" : false,
    "batch_size" : 1,
    "schema" : { },
    "rules" : {
      "exclude" : [ "" ],
      "include_paths" : [ "" ],
      "exclude_paths" : [ "" ],
      "same_domain" : false
    },
    "sitemap" : false,
    "render_heavy_js" : false,
    "stealth" : false
  },
  "type" : "scrapeGraphAi/v1/startCrawl"
}
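
The properties above carry one cross-field rule: `prompt` is required when `extraction_mode` is true, which is also the default. The sketch below shows one way to enforce that rule while assembling the action definition. `build_start_crawl` is a hypothetical helper, not a ByteChef API; the keys it emits mirror the example structure above.

```python
from typing import Any, Dict, Optional


def build_start_crawl(
    url: str,
    prompt: Optional[str] = None,
    extraction_mode: bool = True,
    **options: Any,
) -> Dict[str, Any]:
    """Build a startCrawl action definition, validating the prompt rule.

    Extra keyword arguments (depth, max_pages, stealth, ...) are passed
    through to "parameters" unchanged and are not validated here.
    """
    if not url:
        raise ValueError("url is required")
    if extraction_mode and not prompt:
        raise ValueError("prompt is required when extraction_mode is true")

    parameters: Dict[str, Any] = {"url": url, "extraction_mode": extraction_mode}
    if prompt:
        parameters["prompt"] = prompt
    parameters.update(options)

    return {
        "label": "Start SmartCrawler",
        "name": "startCrawl",
        "parameters": parameters,
        "type": "scrapeGraphAi/v1/startCrawl",
    }


# Markdown conversion mode needs no prompt.
action = build_start_crawl(
    "https://example.com", extraction_mode=False, depth=2, max_pages=10
)
print(action["parameters"])
```

Validating before the workflow runs avoids spending credits on a request that would be rejected for a missing prompt.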

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| task_id | STRING | Unique identifier for the crawl task. Use this task_id to retrieve the crawl result. |

Output Example

{
  "task_id" : ""
}
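
Since the crawl result is retrieved separately with this `task_id`, a caller typically checks the status repeatedly until the job finishes. The sketch below illustrates that polling pattern with an injected `get_status` callable standing in for the Get SmartCrawler Status action. The helper name, the callable, and the default `done_statuses` are all assumptions: this page does not list the status values a crawl can report, so adjust them to what the service actually returns.

```python
import time
from typing import Any, Callable, Dict, Iterable


def wait_for_crawl(
    task_id: str,
    get_status: Callable[[str], Dict[str, Any]],
    done_statuses: Iterable[str] = ("completed", "failed"),  # assumed values
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_status(task_id) until its status is in done_statuses.

    Raises TimeoutError if no terminal status is seen in max_attempts checks.
    """
    terminal = set(done_statuses)
    for attempt in range(max_attempts):
        output = get_status(task_id)
        if output.get("status") in terminal:
            return output
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"crawl {task_id} did not finish after {max_attempts} checks")


# Demo with a fake status source instead of a real request.
fake_responses = iter([
    {"status": "processing"},
    {"status": "completed", "result": {"pages": []}},
])
final = wait_for_crawl(
    "task-123", lambda _task_id: next(fake_responses), sleep=lambda _s: None
)
print(final["status"])
```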
