📄️ Quickstart
Extract data from any website at scale — without managing browsers, proxies, retries, or rate limits. Submit a list of URLs, stream results as they finish, and let the server handle the hard parts.
📄️ Batch API (recommended)
This is how you should scrape in production. Submit 10,000 URLs, walk away, and come back to completed results, with progress tracking, failure visibility, and zero infrastructure to manage.
📄️ Single scrape
Use the Batch API — it's the recommended path for anything more than a handful of URLs. scrape() is for one-off requests: debugging, webhooks that need an immediate answer, or single-URL health checks.
📄️ Anti-Bot Bypass
retry_on_block: server-side retry
📄️ Multiple URLs (scrape_many)
For anything over ~100 URLs, use the Batch API. scrape_many() is fine for small lists where you want a simple list-in / list-out contract — but it runs sync requests client-side with threads, which eats your sync rate limit and can't tolerate crashes.
📄️ Data Extraction
Extract structured data from pages using CSS or XPath selectors.
📄️ JavaScript Execution
Run arbitrary JavaScript in the browser page context. Use this to call internal APIs, read JS variables, or extract data that isn't in the DOM.
📄️ Collections (low-level)
For most use cases prefer the higher-level Batch API (submit_batch()) — it handles collection, run, jobs, and result fetching in one object with streaming results, progress tracking, and callbacks.
📄️ Error Handling
Exception hierarchy
📄️ Viability Test
Analyze sites before scraping to find the best strategy. The test runs each URL through multiple scraping modes and reports what works.
📄️ Downloads & Proxy
Downloading files (PDF, images, ZIPs, …)
📄️ Release notes
Curated changelog for the scrapingpros Python SDK. Each release lists what changed from the user's perspective — bugs you'd actually hit, features you can use, things that may need attention.