Scrape URL Tool

What it does

The Scrape URL tool extracts text content from web pages and PDF documents. Use it to gather information from websites, analyze web content, or process documents that your agents need to work with.

Key features

Extract content from any web page or PDF URL
Choose between full content or tables-only extraction
Get either clean text or raw HTML
Render JavaScript-heavy pages with Playwright
Limit tokens automatically to prevent oversized responses

Parameters

Parameter	Type	Required	Description
`url`	string	Yes	The URL of the webpage or PDF to scrape
`tables_only`	boolean	No	Extract only tables from the page (default: false)
`raw_html`	boolean	No	Return raw HTML instead of parsed text (default: false)
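
As a minimal sketch of how the three parameters fit together, the helper below builds a parameter set with the documented defaults and rejects obviously invalid input. The function name build_scrape_params and the validation rules are assumptions made for illustration; they are not part of the tool itself.

```python
from urllib.parse import urlparse


def build_scrape_params(url, tables_only=False, raw_html=False):
    """Return a parameter dict for the Scrape URL tool (hypothetical helper)."""
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs are worth sending to the tool.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL, got {url!r}")
    if not isinstance(tables_only, bool) or not isinstance(raw_html, bool):
        raise TypeError("tables_only and raw_html must be booleans")
    # Both optional flags default to false, as in the table above.
    return {"url": url, "tables_only": tables_only, "raw_html": raw_html}
```

Calling build_scrape_params("https://example.com/article") yields the same values as the first use case below, with both flags left at their defaults.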

Common use cases

Extract article content

url: "https://example.com/article"
tables_only: false
raw_html: false

Use this to get clean text from news articles, blog posts, or documentation.

Get data from tables

url: "https://example.com/data-page"
tables_only: true
raw_html: false

Extract structured data from HTML tables for analysis.
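
The format in which the tool returns tables is not specified here. To illustrate the general idea of tables-only extraction, the following sketch pulls rows and cells out of an HTML string using Python's standard html.parser module. The names TableExtractor and extract_tables are hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def extract_tables(html):
    """Return all tables found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Each table comes back as a list of rows, which is a convenient shape for writing to CSV or loading into an analysis library.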

Process PDF documents

url: "https://example.com/document.pdf"
tables_only: false
raw_html: false

Extract text content from PDF files for document analysis.

Get raw HTML for parsing

url: "https://example.com/page"
tables_only: false
raw_html: true

Useful when you need the full HTML structure for custom processing.
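
As one example of custom processing, the sketch below collects every link from a raw HTML string and resolves relative links against the page URL. It uses only Python's standard library; LinkCollector and collect_links are hypothetical names, and the HTML is assumed to be the string returned when raw_html is true.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gather the href of every <a> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links such as "/about" against the page URL.
            self.links.append(urljoin(self.base_url, href))


def collect_links(raw_html, base_url):
    """Return all links in raw_html, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(raw_html)
    parser.close()
    return parser.links
```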

Limitations

Content is limited to 30,000 tokens by default
PDF extraction doesn’t handle images or complex formatting
Some dynamic content requiring user interaction may not be captured
Documents that exceed the token limit are truncated
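
To make the effect of a token limit concrete, here is a rough sketch of truncation. The tool's actual tokenizer and truncation strategy are not documented here, so this example assumes, purely for illustration, that one whitespace-separated word counts as one token. The name truncate_to_limit is hypothetical.

```python
DEFAULT_TOKEN_LIMIT = 30_000  # default limit stated in this document


def truncate_to_limit(text, limit=DEFAULT_TOKEN_LIMIT):
    """Return (text cut to at most `limit` tokens, whether anything was cut)."""
    # Assumption: whitespace-separated words stand in for real tokens.
    tokens = text.split()
    if len(tokens) <= limit:
        return text, False
    return " ".join(tokens[:limit]), True
```

A real tokenizer usually produces more tokens than words, so a page can hit the limit at a lower word count than this sketch suggests.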

Troubleshooting

“Failed to load page”

Check that the URL is accessible and valid
Verify the website doesn’t block automated access
Try the URL in a browser to confirm it works
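
The checks above can also be scripted. The sketch below sends a HEAD request with Python's standard urllib and reports the HTTP status or the reason the request failed. The function name check_url and the User-Agent value are assumptions; some sites reject HEAD requests or unfamiliar clients, so a failure here does not prove the tool will fail, and a success does not prove it will succeed.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10.0):
    """Return (status_code, error_message); exactly one of the two is None on success or total failure."""
    try:
        request = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "url-check/0.1"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, None
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 403 or 404.
        return exc.code, str(exc.reason)
    except (OSError, ValueError) as exc:
        # Malformed URL, DNS failure, refused connection, or timeout.
        return None, str(exc)
```

A 403 or 429 status often indicates that the site blocks automated access, while a missing status points to a malformed URL or a network problem.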

“Content truncated”

The page content exceeded the token limit
Consider using tables_only: true for data extraction
Break large documents into smaller sections

“PDF extraction failed”

Ensure the URL points to a valid PDF file
Some password-protected PDFs cannot be processed
Try downloading and hosting the PDF elsewhere
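
A URL that ends in .pdf can still serve an HTML error or login page. PDF files begin with the header %PDF-, so one quick sanity check is to look at the first bytes of the response. This sketch assumes you have already fetched those bytes yourself; looks_like_pdf is a hypothetical helper and is not how the tool validates input.

```python
PDF_MAGIC = b"%PDF-"  # every PDF file header starts with these bytes


def looks_like_pdf(first_bytes):
    """Return True if the given bytes begin with the PDF file header."""
    return first_bytes.startswith(PDF_MAGIC)
```

If the check fails, inspect the response in a browser to see what the server is returning instead of the document.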

Related tools

Ask Web - Ask questions about web content using an LLM
Call API - Make API calls to web services
