Python SDK | ScrapingPros

📄️ Quickstart

Extract data from any website at scale — without managing browsers, proxies, retries, or rate limits. Submit a list of URLs, stream results as they finish, and let the server handle the hard parts.

📄️ Batch API (recommended)

This is how you should scrape in production. Submit 10,000 URLs, walk away, and come back to finished results, with progress tracking, visibility into failures, and no infrastructure to manage.

Use the Batch API — it's the recommended path for anything more than a handful of URLs. scrape() is for one-off requests: debugging, webhooks that need an immediate answer, or single-URL health checks.

📄️ Anti-Bot Bypass

retryonblock — Server-side retry

📄️ Multiple URLs (scrape_many)

For anything over ~100 URLs, use the Batch API. scrape_many() is fine for small lists where you want a simple list-in / list-out contract, but it runs synchronous requests client-side with threads, so it consumes your sync rate limit and cannot recover if your process crashes.
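To make that trade-off concrete, here is a minimal sketch of what a generic client-side, thread-based list-in / list-out helper looks like. It is not the SDK's implementation: scrape_many_sketch and fake_fetch are hypothetical names, and fetch stands in for any synchronous single-URL call. Every URL becomes a separate request from your own process, and the results exist only in that process's memory until the whole list completes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def scrape_many_sketch(
    urls: List[str], fetch: Callable[[str], str], max_workers: int = 8
) -> List[str]:
    """Hypothetical client-side fan-out: one synchronous fetch per URL, run on threads.

    Every call to `fetch` is a separate request from this process, and results
    live only in this process's memory until the whole list finishes.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order: list in, list out.
        # If any fetch raises, list() re-raises here and partial results are lost.
        return list(pool.map(fetch, urls))


def fake_fetch(url: str) -> str:
    # Stand-in for a real synchronous single-URL scrape (hypothetical).
    return f"<html>{url}</html>"


pages = scrape_many_sketch(["https://example.com/a", "https://example.com/b"], fake_fetch)
print(pages)
# ['<html>https://example.com/a</html>', '<html>https://example.com/b</html>']
```

In this sketch, a single failing fetch aborts the whole call and discards the results already gathered, and a crash of the process loses everything in flight. That fragility grows with list size, which is why larger jobs belong in the Batch API.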

📄️ Data Extraction

Extract structured data from pages using CSS or XPath selectors.

📄️ JavaScript Execution

Run arbitrary JavaScript in the browser page context. Use this to call internal APIs, read JS variables, or extract data that isn't in the DOM.

📄️ Collections (low-level)

For most use cases, prefer the higher-level Batch API (submit_batch()). It manages the collection, run, jobs, and result fetching in a single object, with streaming results, progress tracking, and callbacks.

📄️ Error Handling

Exception hierarchy

📄️ Viability Test

Analyze sites before scraping to find the best strategy. Tests each URL with multiple scraping modes and reports what works.
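The idea behind a viability test can be sketched locally: try each URL with every candidate mode and record which ones succeed. This is a hypothetical illustration, not the SDK's viability API. viability_report is an illustrative name, each "mode" is assumed to be any callable that returns True when it gets usable content, and the mode names plain_http and headless_browser are placeholders.

```python
from typing import Callable, Dict, List

# A hypothetical "mode": tries one scraping strategy on a URL and
# returns True if it produced usable content.
Mode = Callable[[str], bool]


def viability_report(urls: List[str], modes: Dict[str, Mode]) -> Dict[str, List[str]]:
    """Return, for each URL, the names of the modes that worked, in the order tried."""
    report: Dict[str, List[str]] = {}
    for url in urls:
        working: List[str] = []
        for name, attempt in modes.items():
            try:
                if attempt(url):
                    working.append(name)
            except Exception:
                # A mode that errors out is treated as not working for this URL.
                pass
        report[url] = working
    return report


# Placeholder modes: pretend only "static" pages work without a browser.
modes = {
    "plain_http": lambda u: "static" in u,
    "headless_browser": lambda u: True,
}
print(viability_report(["https://example.com/static", "https://example.com/spa"], modes))
# {'https://example.com/static': ['plain_http', 'headless_browser'], 'https://example.com/spa': ['headless_browser']}
```

Ordering the modes from cheapest to most expensive means the first entry in each list is a natural candidate strategy for that site.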

📄️ Downloads & Proxy

Downloading files (PDF, images, ZIPs, …)

📄️ Release notes

Curated changelog for the scrapingpros Python SDK. Each release lists what changed from the user's perspective: bugs you'd actually hit, features you can use, and changes that may need your attention.