Introduction

Welcome to the Scraping Pros API documentation.

Features

Universal scraping — Retrieve data from any website, including those with CAPTCHAs, dynamic JavaScript, or blocking mechanisms.
Synchronous and asynchronous modes — Scrape synchronously (immediate result) or asynchronously (URL collections processed in the background, up to tens of thousands of URLs per run).
Browser or plain HTTP — Use a headless browser (browser=true) for dynamic or anti-bot protected sites, or direct HTTP for maximum speed. The system automatically picks the best internal engine for each target, so clients never configure engines.
Markdown output — format=markdown returns clean text without scripts, styles, or navigation. Ideal for AI/LLM consumption and RAG pipelines.
Automatic retries — Retry system with proxy rotation on failures.
Retry on block — retry_on_block=true automatically retries up to 3 times with a different IP/fingerprint when a CAPTCHA or 403 is detected. Only charges credits for the successful attempt.
Early CAPTCHA detection — If the site presents a CAPTCHA or block, the response is returned in ~5 seconds (instead of 60-85s). Applies to all plans automatically.
Smart proxies — Automatic proxy rotation with per-country support (200+ countries).
Webhooks — callback_url on async collections to receive a signed POST notification (HMAC-SHA256) when a run completes. Includes run_id, status, counters, and job_ids.
Per-job traceability — Optional custom_id on every request is echoed back in listings, results, and webhook payloads. Map results to your own domain objects without relying on insertion order.
Cursor pagination + incremental polling — GET /runs/{id}/jobs supports cursor, limit, status_filter (CSV), since_completed_at, and order_by=completed_at. Iterate 50k-URL batches with ~1 API call per poll instead of paginating the whole run.
POST to a different URL — http_method.url lets the POST target an API endpoint (e.g. /graphql, /api/login) while the scrape "context" is a separate page.
Client-supplied cookies — Inject session cookies for authenticated flows, paywalls, or feature-flag testing (max 50 entries, 8 KB per value).
Browser actions — Interact with the page: clicks, inputs, selects, key presses, waits, conditional loops, and JavaScript execution (evaluate).
Data extraction — Extract specific data with CSS/XPath selectors directly from the API, without needing to parse HTML.
JavaScript execution (evaluate) — Execute arbitrary JS code in the page context to access data, manipulate the DOM, or trigger AJAX forms.
Network capture (network_capture) — Capture XHR/fetch requests made by the page to discover internal APIs and data endpoints.
File download — Download PDFs, images, and other files, returning their content in base64.
Screenshots — Capture screenshots of scraped pages.
Custom headers and cookies — Send custom HTTP headers and cookies with requests.
Block detection — potentiallyBlockedByCaptcha (legacy) plus richer async fields block_reason, protection_stack, and rule_hits populated by the HTML validator.
Feasibility test — Analyze URLs before scraping them to determine the recommended scraping strategy.
Credit system — You pay only for usable content: 1 simple request = 1 credit, 1 browser request = 5 credits. Credits are refunded automatically and immediately when is_success=false, that is, for any response that doesn't deliver usable HTML (HTTP 4xx/5xx from the target, CAPTCHA pages, worker failures, timeouts). This applies identically to sync and async requests. Anti-bot handling and proxies are included.
Timings always present — Every response includes timings (even on errors) for performance diagnostics.
Metrics and billing — Per-client usage metrics and monthly billing endpoints with breakdown by domain and credits.
Plans — GET /v1/plans (no auth) shows all plans with pricing, credits, and features.
MCP Server — Model Context Protocol server for AI agents (Claude, GPT, Cursor) with 6 tools and anti-injection protection.
Health check — Monitoring endpoint that verifies the status of all API components (no authentication required).
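
The signed webhook notification listed above can be checked on the receiving side roughly as follows. This is a minimal sketch under stated assumptions: this section only says the POST is signed with HMAC-SHA256, so the shared secret, the hex encoding of the signature, and the idea that the signature covers the raw request body are all assumptions to confirm against the webhook reference. The payload field names (run_id, status, job_ids) come from the feature list; the values are made up.

```python
import hashlib
import hmac
import json


def sign_body(secret: bytes, raw_body: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw body (hex encoding is an assumption)."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, raw_body: bytes, received_signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_body(secret, raw_body)
    return hmac.compare_digest(expected, received_signature)


# Hypothetical secret and payload, for illustration only.
SECRET = b"example-webhook-secret"
body = json.dumps(
    {"run_id": "run_123", "status": "completed", "job_ids": ["job_1"]}
).encode()
signature = sign_body(SECRET, body)

print(verify_webhook(SECRET, body, signature))         # True
print(verify_webhook(SECRET, body + b" ", signature))  # False
```

Verifying against the raw bytes, rather than a re-serialized JSON object, avoids false mismatches caused by key ordering or whitespace differences.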

Authentication

All endpoints require an authentication token sent in the Authorization header:

Authorization: Bearer <API-KEY>

The exceptions are GET /v1/health, GET /v1/plans, GET /llms.txt, and GET /, which do not require authentication.
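
As an illustration of the rule above, the following sketch builds (without sending) GET requests and attaches the Bearer header only where it is required. The host in BASE_URL is a placeholder, since this section does not state the real base URL, and build_get is a hypothetical helper rather than part of any official client. The query parameters limit, status_filter, and order_by are the ones named in the feature list for GET /runs/{id}/jobs; the run ID and the filter values are invented.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder host: the real base URL is not given in this section.
BASE_URL = "https://api.example.com"
# Endpoints the text lists as not requiring authentication.
PUBLIC_ENDPOINTS = {"/v1/health", "/v1/plans", "/llms.txt", "/"}


def build_get(path: str, api_key: Optional[str] = None, **params) -> Request:
    """Build a GET request, adding the Authorization header when the path needs it."""
    query = f"?{urlencode(params)}" if params else ""
    request = Request(f"{BASE_URL}{path}{query}", method="GET")
    if path not in PUBLIC_ENDPOINTS:
        if api_key is None:
            raise ValueError(f"{path} requires an API key")
        request.add_header("Authorization", f"Bearer {api_key}")
    return request


# Public endpoint: no key needed.
plans = build_get("/v1/plans")

# Authenticated endpoint with incremental-polling parameters (values are hypothetical).
jobs = build_get(
    "/runs/run_123/jobs",
    api_key="YOUR_API_KEY",
    limit=100,
    status_filter="completed,failed",
    order_by="completed_at",
)
print(jobs.full_url)
print(jobs.get_header("Authorization"))
```

The requests can then be sent with urllib.request.urlopen or translated to any other HTTP client; only the header format and the list of public endpoints are taken from the text.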

Demo token (no registration required): demo_6x595maoA6GdOdVb — 5,000 credits/month, 30 req/min. All features enabled except per-country proxies.

Free plan: 1,000 credits/month (200 browser requests or 1,000 simple requests). For production use, contact the Scraping Pros team to get an API key with higher limits on the Starter plan ($29/month) or above.
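
The credit figures quoted in this document (1 credit per simple request, 5 per browser request, refunds when is_success=false) make plan capacity easy to estimate. The sketch below is only arithmetic on those published numbers; the function names are illustrative and not part of the API.

```python
# Credit costs as stated in the feature list.
SIMPLE_CREDITS = 1
BROWSER_CREDITS = 5


def credits_charged(browser: bool, is_success: bool) -> int:
    """Credits billed for one request; unsuccessful requests are refunded."""
    if not is_success:
        return 0
    return BROWSER_CREDITS if browser else SIMPLE_CREDITS


def max_requests(monthly_credits: int, browser: bool) -> int:
    """Number of successful requests a monthly credit allowance covers."""
    return monthly_credits // (BROWSER_CREDITS if browser else SIMPLE_CREDITS)


print(max_requests(1000, browser=True))    # 200  (free plan, browser requests)
print(max_requests(1000, browser=False))   # 1000 (free plan, simple requests)
print(max_requests(5000, browser=True))    # 1000 (demo token allowance)
print(credits_charged(browser=True, is_success=False))  # 0
```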
