ScrapeGraphAI
ScrapeGraphAI is a Python web scraping library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents.
Categories: Artificial Intelligence
Type: scrapeGraphAi/v1
Connections
Version: 1
API Key
Properties
| Name | Label | Type | Description | Required |
|---|---|---|---|---|
| key | Key | STRING |  | true |
| value | Value | STRING |  | true |
Connection Setup
- Log in to the dashboard at https://dashboard.scrapegraphai.com/login.
- Copy the API key. Use these credentials to create a connection in ByteChef.
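ByteChef sends the API key for you once the connection is configured. As a rough sketch of what that amounts to, the snippet below builds (but does not send) an authenticated request; the base URL and the `SGAI-APIKEY` header name are assumptions based on common ScrapeGraphAI client usage, not taken from this page.

```python
import urllib.request

API_KEY = "your-api-key"  # copied from the ScrapeGraphAI dashboard

def build_request(path: str, payload: bytes) -> urllib.request.Request:
    # Attach the key as a header; the request is only constructed here, not sent.
    return urllib.request.Request(
        "https://api.scrapegraphai.com" + path,  # hypothetical base URL
        data=payload,
        headers={"SGAI-APIKEY": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )

req = build_request("/v1/markdownify", b'{"website_url": "https://example.com"}')
```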
Actions
Get SmartCrawler Status
Name: getCrawlStatus
Get the status and results of a previous smartcrawl request.
Properties
| Name | Label | Type | Description | Required |
|---|---|---|---|---|
| task_id | Task Id | STRING | The ID of the crawl job task. | true |
Example JSON Structure
{
"label" : "Get SmartCrawler Status",
"name" : "getCrawlStatus",
"parameters" : {
"task_id" : ""
},
"type" : "scrapeGraphAi/v1/getCrawlStatus"
}
Output
Type: OBJECT
Properties
| Name | Type | Description |
|---|---|---|
| status | STRING | Overall status of the request. |
| result | OBJECT | The crawl job result. Contains `status` (STRING), `llm_result` (OBJECT), `crawled_urls` (ARRAY of STRING), and `pages` (ARRAY of objects with `url` and `markdown` STRING fields). |
Output Example
{
"status" : "",
"result" : {
"status" : "",
"llm_result" : { },
"crawled_urls" : [ "" ],
"pages" : [ {
"url" : "",
"markdown" : ""
} ]
}
}
Find Task ID
The Task ID is returned in the output of the Start SmartCrawler action; see How to find Task ID under Additional Instructions below.
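Because a crawl runs asynchronously, a workflow typically polls this action until the job finishes. A minimal polling sketch, where `fetch_status` stands in for the Get SmartCrawler Status call (the terminal states are an assumption based on the status values documented for the other actions):

```python
import time

def wait_for_crawl(task_id, fetch_status, poll_seconds=5, max_polls=60):
    """Poll until the crawl reaches a terminal status, then return the response."""
    for _ in range(max_polls):
        response = fetch_status(task_id)
        if response.get("status") in ("completed", "failed"):
            return response
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawl {task_id!r} did not finish in time")
```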
Markdownify
Name: markdownify
Convert any webpage into clean, readable Markdown format.
Properties
| Name | Label | Type | Description | Required |
|---|---|---|---|---|
| website_url | Website URL | STRING | Website URL. | true |
Example JSON Structure
{
"label" : "Markdownify",
"name" : "markdownify",
"parameters" : {
"website_url" : ""
},
"type" : "scrapeGraphAi/v1/markdownify"
}
Output
Type: OBJECT
Properties
| Name | Type | Description |
|---|---|---|
| request_id | STRING | Unique identifier for the request. |
| status | STRING | Status of the request. One of: “queued”, “processing”, “completed”, “failed”. |
| website_url | STRING | The original website URL that was submitted. |
| result | STRING | The webpage content converted to Markdown. |
| error | STRING | Error message if the request failed. Empty string if successful. |
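Given the fields above, consuming a Markdownify response reduces to branching on `status`. A small sketch (the choice to treat "queued" and "processing" as retryable is an assumption):

```python
def extract_markdown(response: dict) -> str:
    """Return the converted Markdown, or raise if the request did not succeed."""
    status = response["status"]
    if status == "completed":
        return response["result"]
    if status == "failed":
        raise RuntimeError("markdownify failed: " + response.get("error", ""))
    # "queued" or "processing": the caller should poll again later.
    raise RuntimeError(f"request {response['request_id']} is still {status}")
```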
Output Example
{
"request_id" : "",
"status" : "",
"website_url" : "",
"result" : "",
"error" : ""
}
Search Scraper
Name: searchScraper
Start an AI-powered web search request.
Properties
| Name | Label | Type | Description | Required |
|---|---|---|---|---|
| user_prompt | User Prompt | STRING | The search query or question you want to ask. | true |
Example JSON Structure
{
"label" : "Search Scraper",
"name" : "searchScraper",
"parameters" : {
"user_prompt" : ""
},
"type" : "scrapeGraphAi/v1/searchScraper"
}
Output
Type: OBJECT
Properties
| Name | Type | Description |
|---|---|---|
| request_id | STRING | Unique identifier for the search request. |
| status | STRING | Status of the request. One of: “queued”, “processing”, “completed”, “failed”. |
| user_prompt | STRING | The original search query that was submitted. |
| result | OBJECT | The search results. |
| reference_urls | ARRAY of STRING | List of URLs that were used as references for the answer. |
| error | STRING | Error message if the request failed. Empty string if successful. |
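A common follow-up step is to pair the `result` object with the `reference_urls` it was built from, e.g. when posting the answer to another system. A sketch, using only the field names from the table above:

```python
def summarize_search(response: dict) -> str:
    """Render a completed Search Scraper response with its source URLs."""
    sources = "\n".join("- " + url for url in response.get("reference_urls", []))
    return f"{response['result']}\n\nSources:\n{sources}"
```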
Output Example
{
"request_id" : "",
"status" : "",
"user_prompt" : "",
"result" : { },
"reference_urls" : [ "" ],
"error" : ""
}
Smart Scraper
Name: smartScraper
Extract content from a webpage using AI by providing a natural language prompt and a URL.
Properties
| Name | Label | Type | Description | Required |
|---|---|---|---|---|
| user_prompt | User Prompt | STRING | The search query or question you want to ask. | true |
| website_url | Website URL | STRING | Website URL. | true |
Example JSON Structure
{
"label" : "Smart Scraper",
"name" : "smartScraper",
"parameters" : {
"user_prompt" : "",
"website_url" : ""
},
"type" : "scrapeGraphAi/v1/smartScraper"
}
Output
Type: OBJECT
Properties
| Name | Type | Description |
|---|---|---|
| request_id | STRING | Unique identifier for the search request. |
| status | STRING | Status of the request. One of: “queued”, “processing”, “completed”, “failed”. |
| website_url | STRING | The original website URL that was submitted. |
| user_prompt | STRING | The original search query that was submitted. |
| result | OBJECT | The extracted data. |
| error | STRING | Error message if the request failed. Empty string if successful. |
Output Example
{
"request_id" : "",
"status" : "",
"website_url" : "",
"user_prompt" : "",
"result" : { },
"error" : ""
}
Start SmartCrawler
Name: startCrawl
Start a new web crawl request with AI extraction or markdown conversion.
Properties
| Name | Label | Type | Description | Required |
|---|---|---|---|---|
| url | URL | STRING | The starting URL for the crawl. | true |
| prompt | Prompt | STRING | Instructions for data extraction. Required when extraction_mode is true. | false |
| extraction_mode | Extraction Mode | BOOLEAN | When false, enables markdown conversion mode (2 credits per page). Default is true. | false |
| cache_website | Cache Website | BOOLEAN | Whether to cache the website content. | false |
| depth | Depth | INTEGER | Maximum crawl depth. | false |
| max_pages | Max Pages | INTEGER | Maximum number of pages to crawl. | false |
| same_domain_only | Same Domain Only | BOOLEAN | Whether to crawl only the same domain. | false |
| batch_size | Batch Size | INTEGER | Number of pages to process in each batch. | false |
| schema | Schema | OBJECT | JSON Schema object for structured output. | false |
| rules | Rules | OBJECT | Crawl rules for filtering URLs. Contains `exclude`, `include_paths`, and `exclude_paths` (each ARRAY of STRING) and `same_domain` (BOOLEAN). | false |
| sitemap | Sitemap | BOOLEAN | Use sitemap.xml for URL discovery. | false |
| render_heavy_js | Render Heavy JS | BOOLEAN | Enable heavy JavaScript rendering. | false |
| stealth | Stealth | BOOLEAN | Enable stealth mode to bypass bot protection using advanced anti-detection techniques. Adds 4 credits to the request cost. | false |
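The one cross-field rule in this table is that `prompt` is required when `extraction_mode` is true (the default). A sketch of assembling the parameters with that check enforced; the helper name is hypothetical and field names mirror the table:

```python
def build_crawl_params(url, prompt=None, extraction_mode=True, **options):
    """Assemble a Start SmartCrawler parameter dict, checking the prompt rule."""
    if extraction_mode and not prompt:
        raise ValueError("prompt is required when extraction_mode is true")
    params = {"url": url, "extraction_mode": extraction_mode, **options}
    if prompt is not None:
        params["prompt"] = prompt
    return params
```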
Example JSON Structure
{
"label" : "Start SmartCrawler",
"name" : "startCrawl",
"parameters" : {
"url" : "",
"prompt" : "",
"extraction_mode" : false,
"cache_website" : false,
"depth" : 1,
"max_pages" : 1,
"same_domain_only" : false,
"batch_size" : 1,
"schema" : { },
"rules" : {
"exclude" : [ "" ],
"include_paths" : [ "" ],
"exclude_paths" : [ "" ],
"same_domain" : false
},
"sitemap" : false,
"render_heavy_js" : false,
"stealth" : false
},
"type" : "scrapeGraphAi/v1/startCrawl"
}
Output
Type: OBJECT
Properties
| Name | Type | Description |
|---|---|---|
| task_id | STRING | Unique identifier for the crawl task. Use this task_id to retrieve the crawl result. |
Output Example
{
"task_id" : ""
}
What to do if your action is not listed here?
If this component doesn't have the action you need, you can use Custom Action to create your own. Custom Actions empower you to define HTTP requests tailored to your specific requirements, allowing for greater flexibility in integrating with external services or APIs.
To create a Custom Action, simply specify the desired HTTP method, path, and any necessary parameters. This way, you can extend the functionality of your component beyond the predefined actions, ensuring that you can meet all your integration needs effectively.
Additional Instructions
How to find Task ID
Task ID can be found in the output of the following actions:
- Start SmartCrawler