Scraping Pros Python SDK
Extract data from any website at scale — without managing browsers, proxies, retries, or rate limits. Submit a list of URLs, stream results as they finish, and let the server handle the hard parts.
Install
Requires Python 3.10+.
pip install scrapingpros
Use the demo token demo_6x595maoA6GdOdVb for testing — 5,000 credits/month, 30 req/min. Swap it for your own token when you're ready to ship.
The one example you need
This is the pattern for 90% of production use cases. Submit many URLs, stream results back as each one completes, handle failures individually:
from scrapingpros import ScrapingPros
client = ScrapingPros("demo_6x595maoA6GdOdVb")
batch = client.submit_batch("daily-scrape", [
    {"url": "https://example.com/1", "custom_id": "tour-1", "browser": True},
    {"url": "https://example.com/2", "custom_id": "tour-2", "browser": True},
    {"url": "https://example.com/3", "custom_id": "tour-3", "browser": True},
    # ... works with hundreds or tens of thousands of URLs
])
for result in batch.iter_results():
    if result.guidance.success:
        save_to_db(my_id=result.custom_id, content=result.content)
    else:
        log_failure(my_id=result.custom_id, reason=result.guidance.error_type)
    # Progress is always available on the batch object
    if batch.completed_count % 100 == 0:
        print(f"{batch.pct:.1f}% — ETA {batch.eta_seconds:.0f}s")
print(f"Done: {batch.success_count} OK, {batch.failed_count} failed")
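One way to act on per-result failures is to tally them by error type before deciding what to retry. The sketch below relies only on the fields used in the loop above (custom_id, guidance.success, guidance.error_type). The helper names group_failures and fake_result, and the "timeout" error value, are hypothetical and not part of the SDK.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_failures(results):
    """Map each error type to the custom_ids that failed with it.

    `results` is any iterable of objects exposing `custom_id`,
    `guidance.success` and `guidance.error_type`. Illustrative helper,
    not an SDK function.
    """
    failures = defaultdict(list)
    for result in results:
        if not result.guidance.success:
            failures[result.guidance.error_type].append(result.custom_id)
    return dict(failures)


def fake_result(custom_id, success, error_type=None):
    """Build a stand-in result so the sketch runs without network access."""
    guidance = SimpleNamespace(success=success, error_type=error_type)
    return SimpleNamespace(custom_id=custom_id, guidance=guidance)


# "timeout" is a made-up error type used only for this demonstration.
sample = [
    fake_result("tour-1", True),
    fake_result("tour-2", False, "timeout"),
    fake_result("tour-3", False, "timeout"),
]
failures_by_type = group_failures(sample)
print(failures_by_type)
```

In a real run, the same function could be fed batch.iter_results() directly, at the cost of holding the failed IDs in memory.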
What this handles for you:
- ✅ Scaling — submit 10 URLs or 50,000. The server manages workers. Your code stays the same.
- ✅ Anti-bot — CAPTCHA detection, proxy rotation, IP rotation, browser fingerprinting. Automatic.
- ✅ Streaming — results yield as soon as each job completes, not after everything is done. Memory stays flat.
- ✅ Progress tracking — batch.pct, batch.eta_seconds, batch.success_count, batch.failed_count update live.
- ✅ Failure visibility — every yielded result tells you if it worked and why it didn't. No silent drops.
- ✅ Resume on crash — persist batch.collection_id + batch.run_id, reattach later with client.get_batch().
- ✅ Rate-limit math — the SDK polls the server efficiently (1 API call per tick, regardless of batch size).
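Resuming after a crash comes down to writing the two identifiers somewhere durable as soon as the batch is submitted. A minimal sketch, assuming a local JSON file is an acceptable store: save_checkpoint, load_checkpoint, the file name, and the placeholder IDs are all hypothetical, and the exact arguments of client.get_batch() are not shown here, so the reattach call is left as a comment.

```python
import json
import tempfile
from pathlib import Path


def save_checkpoint(path, collection_id, run_id):
    """Write the two identifiers needed to reattach to a batch."""
    payload = {"collection_id": collection_id, "run_id": run_id}
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def load_checkpoint(path):
    """Return (collection_id, run_id), or None if no checkpoint exists."""
    checkpoint = Path(path)
    if not checkpoint.exists():
        return None
    payload = json.loads(checkpoint.read_text(encoding="utf-8"))
    return payload["collection_id"], payload["run_id"]


# Round trip with placeholder identifiers; real values would come from
# batch.collection_id and batch.run_id right after submit_batch().
checkpoint_path = Path(tempfile.mkdtemp()) / "batch_checkpoint.json"
save_checkpoint(checkpoint_path, "col-123", "run-456")
restored = load_checkpoint(checkpoint_path)
print(restored)

# After a restart, the restored pair would be handed to client.get_batch().
# Its argument names and order are an open question here; see the Batch API docs.
```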
Need just one URL?
Sometimes you need a single scrape for a webhook handler or a quick check:
result = client.scrape("https://example.com", format="markdown", browser=True)
print(result.content)
→ Single scrape docs — use this for one-off requests, debugging, or anything where you need the answer synchronously.
Async / asyncio?
Every method has an async twin. Same surface, same endpoints — only the local I/O loop changes:
import asyncio
from scrapingpros import AsyncClient  # alias for AsyncScrapingPros (since v0.5.1)
async def main():
    async with AsyncClient("your_token") as client:
        batch = await client.submit_batch("daily", items)
        async for result in batch.iter_results():
            await save_async(result)
asyncio.run(main())
AsyncScrapingPros (and its alias AsyncClient) does not make scraping faster on its own — it only changes the I/O loop. To scale to many URLs, the right tool is submit_batch() on either client.
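Because the async client only changes the local I/O loop, the consuming code can be exercised offline with any async iterator standing in for batch.iter_results(). The sketch below does that: drain, fake_results, and the in-memory save_async sink are hypothetical names for illustration, not SDK functions.

```python
import asyncio


async def drain(results, save):
    """Await `save` for each item of an async iterator; return the count."""
    saved = 0
    async for result in results:
        await save(result)
        saved += 1
    return saved


async def fake_results():
    """Stand-in for batch.iter_results() so the sketch runs offline."""
    for custom_id in ("tour-1", "tour-2", "tour-3"):
        await asyncio.sleep(0)  # yield control, as a real network wait would
        yield custom_id


stored = []


async def save_async(result):
    """Hypothetical async sink; a real one might write to a database."""
    stored.append(result)


saved_count = asyncio.run(drain(fake_results(), save_async))
print(saved_count, stored)
```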
Want a list back, not a stream?
If you'd rather block until every URL is done and get a flat list[ScrapeResponse] (drop-in replacement for scrape_many), use batch_scrape:
results = client.batch_scrape([
    {"url": u, "custom_id": pid, "browser": True}
    for pid, u in catalog.items()
])
for r in results:
    if r.guidance.success:
        save(r.custom_id, r.content)
Same server-side scaling as submit_batch, simpler signature. Available on both SyncClient and AsyncClient since v0.5.1.
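Whether the returned list preserves input order is not stated here, so one cautious approach is to key results by custom_id rather than by position. A small sketch under that assumption: build_items and index_by_custom_id are hypothetical helpers, and the results are stand-in objects rather than real ScrapeResponse instances.

```python
from types import SimpleNamespace


def build_items(catalog, browser=True):
    """Turn {product_id: url} into the list-of-dicts shape used above."""
    return [
        {"url": url, "custom_id": product_id, "browser": browser}
        for product_id, url in catalog.items()
    ]


def index_by_custom_id(results):
    """Map custom_id to result so lookups do not depend on list order."""
    return {result.custom_id: result for result in results}


catalog = {"sku-1": "https://example.com/1", "sku-2": "https://example.com/2"}
items = build_items(catalog)

# Stand-in results; a real run would get these from client.batch_scrape(items).
fake_results = [
    SimpleNamespace(custom_id="sku-2", content="second page"),
    SimpleNamespace(custom_id="sku-1", content="first page"),
]
by_id = index_by_custom_id(fake_results)
print(by_id["sku-1"].content)
```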
Authentication
# Explicit token
client = ScrapingPros("your_token")
# From environment variable
# export SP_TOKEN=your_token
client = ScrapingPros()
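For local experiments it can be handy to fall back to the demo token when SP_TOKEN is unset. The following is only a sketch of that lookup: resolve_token is a hypothetical helper, not part of the SDK, and the fallback is probably not something to ship to production.

```python
import os

DEMO_TOKEN = "demo_6x595maoA6GdOdVb"  # demo token quoted in the Install section


def resolve_token(env=None):
    """Return SP_TOKEN if set and non-empty, otherwise the demo token."""
    env = os.environ if env is None else env
    return env.get("SP_TOKEN") or DEMO_TOKEN


# Explicit dicts stand in for the process environment to keep the demo deterministic.
token = resolve_token({"SP_TOKEN": "your_token"})
fallback = resolve_token({})
print(token, fallback)
```

The resolved value would then be passed explicitly, as in ScrapingPros(resolve_token()).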
Credits
| Type | Credits |
|---|---|
| Simple HTTP scrape | 1 |
| Browser scrape | 5 |
- Refunded automatically on infrastructure errors.
- Track usage: client.credits_charged, client.quota_remaining.
- client.billing() for monthly summaries, client.plans() for pricing.
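The table above makes it straightforward to estimate a batch's cost before submitting it. A minimal sketch, assuming items without a "browser" key count as simple HTTP scrapes and ignoring refunds; estimate_credits is a hypothetical helper, not an SDK method.

```python
CREDITS_SIMPLE = 1   # simple HTTP scrape, per the table above
CREDITS_BROWSER = 5  # browser scrape, per the table above


def estimate_credits(items):
    """Sum the expected credits for a list of batch item dicts.

    Items without a "browser" key are counted as simple HTTP scrapes;
    that default is an assumption of this sketch.
    """
    return sum(
        CREDITS_BROWSER if item.get("browser") else CREDITS_SIMPLE
        for item in items
    )


items = [
    {"url": "https://example.com/1", "browser": True},
    {"url": "https://example.com/2", "browser": True},
    {"url": "https://example.com/3"},
]
estimated = estimate_credits(items)
print(estimated)  # 5 + 5 + 1 = 11
```

At 5 credits each, the demo allowance of 5,000 credits covers at most 1,000 browser scrapes per month.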
Where next
- Batch API — the full story: progress, callbacks, resume, tuning for 50k+ URL batches.
- Anti-Bot Bypass — strategies for CAPTCHA, Cloudflare, DataDome, etc.
- Data Extraction — pull structured data directly, no parsing.
- JavaScript Execution — run JS in the page to call internal APIs.