
ScrapeGraphAI

ScrapeGraphAI is a Python web scraping library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents.

Categories: Artificial Intelligence

Type: scrapeGraphAi/v1


Connections

Version: 1

API Key

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| key | Key | STRING | | true |
| value | Value | STRING | | true |

Connection Setup

  1. Log in to the dashboard at https://dashboard.scrapegraphai.com/login.
  2. Copy the API key and use it to create a connection in ByteChef.
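As a rough sketch of how the connection's key/value pair is typically used, the pair can be mapped onto an HTTP header for API requests. The header name `SGAI-APIKEY` below is an assumption, not taken from this page; check the ScrapeGraphAI dashboard for the exact header your account expects.

```python
# Hypothetical sketch: map the connection's key/value properties onto
# request headers. "SGAI-APIKEY" is an assumed header name, not documented
# on this page.
def build_auth_headers(key: str, value: str) -> dict:
    """Return HTTP headers carrying the API key from the connection."""
    return {key: value, "Content-Type": "application/json"}

headers = build_auth_headers("SGAI-APIKEY", "sgai-xxxxxxxx")
```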

Actions

Get SmartCrawler Status

Name: getCrawlStatus

Get the status and results of a previous smartcrawl request.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| task_id | Task Id | STRING | The ID of the crawl job task. | true |

Example JSON Structure

{
  "label" : "Get SmartCrawler Status",
  "name" : "getCrawlStatus",
  "parameters" : {
    "task_id" : ""
  },
  "type" : "scrapeGraphAi/v1/getCrawlStatus"
}

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| status | STRING | Overall status of the request. |
| result | OBJECT | The crawl job result. Contains `status` (STRING), `llm_result` (OBJECT), `crawled_urls` (ARRAY of STRING), and `pages` (ARRAY of objects, each with `url` and `markdown` STRING fields). |

Output Example

{
  "status" : "",
  "result" : {
    "status" : "",
    "llm_result" : { },
    "crawled_urls" : [ "" ],
    "pages" : [ {
      "url" : "",
      "markdown" : ""
    } ]
  }
}
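Since a crawl job runs asynchronously, a workflow will typically poll this action until the job finishes. The sketch below assumes the status values mirror the other actions on this page ("queued", "processing", "completed", "failed"); `fetch_status` is a stand-in for the actual action call.

```python
import time

# Sketch of polling Get SmartCrawler Status until the crawl finishes.
# fetch_status stands in for the real action call; the status values are
# assumed to match the other actions documented on this page.
def poll_until_done(fetch_status, task_id, interval=0.0, max_tries=10):
    for _ in range(max_tries):
        response = fetch_status(task_id)
        if response["status"] in ("completed", "failed"):
            return response
        time.sleep(interval)
    raise TimeoutError(f"crawl {task_id} did not finish in time")

# Stubbed fetcher that "completes" on the third call.
statuses = iter(["queued", "processing", "completed"])
stub = lambda task_id: {"status": next(statuses), "result": {}}
final = poll_until_done(stub, "task-123")
```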

Find Task ID

The Task ID is returned in the output of the Start SmartCrawler action (see Additional Instructions below).

Markdownify

Name: markdownify

Convert any webpage into clean, readable Markdown format.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| website_url | Website URL | STRING | Website URL. | true |

Example JSON Structure

{
  "label" : "Markdownify",
  "name" : "markdownify",
  "parameters" : {
    "website_url" : ""
  },
  "type" : "scrapeGraphAi/v1/markdownify"
}

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| request_id | STRING | Unique identifier for the request. |
| status | STRING | Status of the request. One of: "queued", "processing", "completed", "failed". |
| website_url | STRING | The original website URL that was submitted. |
| result | STRING | The resulting Markdown content. |
| error | STRING | Error message if the request failed. Empty string if successful. |

Output Example

{
  "request_id" : "",
  "status" : "",
  "website_url" : "",
  "result" : "",
  "error" : ""
}
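A minimal sketch of consuming this output in a downstream step: surface the `error` field when the request failed, otherwise return the Markdown string from `result`. Field names follow the output table above; the sample response is illustrative.

```python
# Sketch of handling a Markdownify response: raise on failure using the
# "error" field, otherwise return the Markdown string from "result".
def extract_markdown(response: dict) -> str:
    if response["status"] == "failed":
        raise RuntimeError(response["error"] or "markdownify request failed")
    return response["result"]

ok = extract_markdown({
    "request_id": "req-1",
    "status": "completed",
    "website_url": "https://example.com",
    "result": "# Example\n\nSome content.",
    "error": "",
})
```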

Search Scraper

Name: searchScraper

Start an AI-powered web search request.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| user_prompt | User Prompt | STRING | The search query or question you want to ask. | true |

Example JSON Structure

{
  "label" : "Search Scraper",
  "name" : "searchScraper",
  "parameters" : {
    "user_prompt" : ""
  },
  "type" : "scrapeGraphAi/v1/searchScraper"
}

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| request_id | STRING | Unique identifier for the search request. |
| status | STRING | Status of the request. One of: "queued", "processing", "completed", "failed". |
| user_prompt | STRING | The original search query that was submitted. |
| result | OBJECT | The search results. |
| reference_urls | ARRAY of STRING | List of URLs that were used as references for the answer. |
| error | STRING | Error message if the request failed. Empty string if successful. |

Output Example

{
  "request_id" : "",
  "status" : "",
  "user_prompt" : "",
  "result" : { },
  "reference_urls" : [ "" ],
  "error" : ""
}
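A common follow-up step is pairing the answer with its `reference_urls` for citation. The sketch below formats a short summary from a response shaped like the output table above; the sample values are illustrative only.

```python
# Sketch: combine a Search Scraper response with its reference URLs into a
# small citation summary. Field names follow the output table above.
def summarize_search(response: dict) -> str:
    refs = "\n".join(f"- {url}" for url in response["reference_urls"])
    return f"Answer for: {response['user_prompt']}\nSources:\n{refs}"

summary = summarize_search({
    "request_id": "req-2",
    "status": "completed",
    "user_prompt": "Who maintains ScrapeGraphAI?",
    "result": {},
    "reference_urls": ["https://example.com/a", "https://example.com/b"],
    "error": "",
})
```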

Smart Scraper

Name: smartScraper

Extract content from a webpage using AI by providing a natural language prompt and a URL.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| user_prompt | User Prompt | STRING | The search query or question you want to ask. | true |
| website_url | Website URL | STRING | Website URL. | true |

Example JSON Structure

{
  "label" : "Smart Scraper",
  "name" : "smartScraper",
  "parameters" : {
    "user_prompt" : "",
    "website_url" : ""
  },
  "type" : "scrapeGraphAi/v1/smartScraper"
}

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| request_id | STRING | Unique identifier for the request. |
| status | STRING | Status of the request. One of: "queued", "processing", "completed", "failed". |
| website_url | STRING | The original website URL that was submitted. |
| user_prompt | STRING | The original prompt that was submitted. |
| result | OBJECT | The extraction results. |
| error | STRING | Error message if the request failed. Empty string if successful. |

Output Example

{
  "request_id" : "",
  "status" : "",
  "website_url" : "",
  "user_prompt" : "",
  "result" : { },
  "error" : ""
}
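Since both parameters are required, a workflow may want to validate them before dispatching the action. This is a minimal sketch against the required fields in the Properties table above, not part of the connector itself.

```python
# Sketch: check the Smart Scraper parameters against the required fields
# listed in the Properties table before dispatching the action.
REQUIRED = ("user_prompt", "website_url")

def validate_smart_scraper(parameters: dict) -> list:
    """Return the required parameters that are missing or empty."""
    return [name for name in REQUIRED if not parameters.get(name)]

missing = validate_smart_scraper({
    "user_prompt": "List product prices",
    "website_url": "",
})
```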

Start SmartCrawler

Name: startCrawl

Start a new web crawl request with AI extraction or markdown conversion.

Properties

| Name | Label | Type | Description | Required |
|------|-------|------|-------------|----------|
| url | URL | STRING | The starting URL for the crawl. | true |
| prompt | Prompt | STRING | Instructions for data extraction. Required when extraction_mode is true. | false |
| extraction_mode | Extraction Mode | BOOLEAN | When false, enables markdown conversion mode (2 credits per page). Default is true. | false |
| cache_website | Cache Website | BOOLEAN | Whether to cache the website content. | false |
| depth | Depth | INTEGER | Maximum crawl depth. | false |
| max_pages | Max Pages | INTEGER | Maximum number of pages to crawl. | false |
| same_domain_only | Same Domain Only | BOOLEAN | Whether to crawl only the same domain. | false |
| batch_size | Batch Size | INTEGER | Number of pages to process in each batch. | false |
| schema | Schema | OBJECT | JSON Schema object for structured output. | false |
| rules | Rules | OBJECT | Crawl rules for filtering URLs. Contains `exclude` (ARRAY of STRING), `include_paths` (ARRAY of STRING), `exclude_paths` (ARRAY of STRING), and `same_domain` (BOOLEAN). | false |
| sitemap | Sitemap | BOOLEAN | Use sitemap.xml for discovery. | false |
| render_heavy_js | Render Heavy JS | BOOLEAN | Enable heavy JavaScript rendering. | false |
| stealth | Stealth | BOOLEAN | Enable stealth mode to bypass bot protection using advanced anti-detection techniques. Adds +4 credits to the request cost. | false |

Example JSON Structure

{
  "label" : "Start SmartCrawler",
  "name" : "startCrawl",
  "parameters" : {
    "url" : "",
    "prompt" : "",
    "extraction_mode" : false,
    "cache_website" : false,
    "depth" : 1,
    "max_pages" : 1,
    "same_domain_only" : false,
    "batch_size" : 1,
    "schema" : { },
    "rules" : {
      "exclude" : [ "" ],
      "include_paths" : [ "" ],
      "exclude_paths" : [ "" ],
      "same_domain" : false
    },
    "sitemap" : false,
    "render_heavy_js" : false,
    "stealth" : false
  },
  "type" : "scrapeGraphAi/v1/startCrawl"
}

Output

Type: OBJECT

Properties

| Name | Type | Description |
|------|------|-------------|
| task_id | STRING | Unique identifier for the crawl task. Use this task_id to retrieve the crawl result. |

Output Example

{
  "task_id" : ""
}
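To make the parameter interactions concrete, here is a sketch of building parameters for markdown conversion mode: with `extraction_mode` set to false no `prompt` is needed, each page costs 2 credits, and enabling `stealth` adds 4 credits to the request (per the Properties table above). The credit estimate covers only the costs stated on this page; extraction-mode pricing is not documented here.

```python
# Sketch: Start SmartCrawler parameters for markdown conversion mode.
# With extraction_mode False no prompt is required; each page costs
# 2 credits, and stealth adds 4 credits to the request.
def markdown_crawl_params(url: str, max_pages: int = 5, stealth: bool = False) -> dict:
    return {
        "url": url,
        "extraction_mode": False,  # markdown conversion mode
        "max_pages": max_pages,
        "same_domain_only": True,
        "stealth": stealth,
    }

def estimated_credits(params: dict) -> int:
    # 2 credits per page in markdown mode, +4 if stealth is enabled.
    return 2 * params["max_pages"] + (4 if params["stealth"] else 0)

params = markdown_crawl_params("https://example.com", max_pages=3, stealth=True)
```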

What to do if your action is not listed here?

If this component doesn't have the action you need, you can use Custom Action to create your own. Custom Actions empower you to define HTTP requests tailored to your specific requirements, allowing for greater flexibility in integrating with external services or APIs.

To create a Custom Action, simply specify the desired HTTP method, path, and any necessary parameters. This way, you can extend the functionality of your component beyond the predefined actions, ensuring that you can meet all your integration needs effectively.
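As a purely hypothetical illustration of the shape such a request might take, the sketch below constructs (but never sends) an HTTP request. The base URL, path, and header name are assumptions, not taken from this page; consult the ScrapeGraphAI API reference for the real endpoints.

```python
import json
import urllib.request

# Hypothetical sketch of a Custom Action's HTTP request. The endpoint path
# "/v1/markdownify", the base URL, and the "SGAI-APIKEY" header are
# assumptions; the request is only constructed here, never sent.
def build_custom_request(base_url: str, path: str, api_key: str, payload: dict):
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"SGAI-APIKEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_custom_request(
    "https://api.scrapegraphai.com",  # assumed base URL
    "/v1/markdownify",                # assumed path
    "sgai-xxxxxxxx",
    {"website_url": "https://example.com"},
)
```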


Additional Instructions

How to find Task ID

Task ID can be found in the output of the following actions:

  • Start SmartCrawler
