Scraping Pros Python SDK

Extract data from any website at scale — without managing browsers, proxies, retries, or rate limits. Submit a list of URLs, stream results as they finish, and let the server handle the hard parts.

Install

Requires Python 3.10+.

pip install scrapingpros

No signup needed

Use the demo token demo_6x595maoA6GdOdVb for testing — 5,000 credits/month, 30 req/min. Swap it for your own token when you're ready to ship.

The one example you need

This pattern covers most production use cases: submit many URLs, stream results back as each one completes, and handle failures individually.

from scrapingpros import ScrapingPros

client = ScrapingPros("demo_6x595maoA6GdOdVb")

batch = client.submit_batch("daily-scrape", [
    {"url": "https://example.com/1", "custom_id": "tour-1", "browser": True},
    {"url": "https://example.com/2", "custom_id": "tour-2", "browser": True},
    {"url": "https://example.com/3", "custom_id": "tour-3", "browser": True},
    # ... works with hundreds or tens of thousands of URLs
])

# save_to_db and log_failure stand in for your own persistence and logging functions
for result in batch.iter_results():
    if result.guidance.success:
        save_to_db(my_id=result.custom_id, content=result.content)
    else:
        log_failure(my_id=result.custom_id, reason=result.guidance.error_type)

    # Progress is always available on the batch object
    if batch.completed_count % 100 == 0:
        print(f"{batch.pct:.1f}% — ETA {batch.eta_seconds:.0f}s")

print(f"Done: {batch.success_count} OK, {batch.failed_count} failed")
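
Once failures are being logged one by one, it often helps to see them grouped. The following is a minimal sketch that tallies failed results by guidance.error_type. It relies only on the two fields used in the example above (guidance.success and guidance.error_type); the fake_result helper and the error type strings are invented stand-ins so the sketch runs without the SDK or network access.

```python
from collections import Counter
from types import SimpleNamespace


def tally_failures(results):
    """Count failed results by guidance.error_type.

    Accepts any iterable of objects exposing .guidance.success and
    .guidance.error_type, the two fields used in the example above.
    """
    counts = Counter()
    for result in results:
        if not result.guidance.success:
            counts[result.guidance.error_type] += 1
    return counts


def fake_result(success, error_type=None):
    # Hypothetical stand-in for an SDK result object.
    return SimpleNamespace(
        guidance=SimpleNamespace(success=success, error_type=error_type)
    )


# The error_type values below are invented for illustration only.
sample = [
    fake_result(True),
    fake_result(False, "timeout"),
    fake_result(False, "blocked"),
    fake_result(False, "timeout"),
]
failure_counts = tally_failures(sample)
for error_type, count in failure_counts.most_common():
    print(f"{error_type}: {count}")
```

In real use you would probably update the counter inside the iter_results() loop rather than collecting results first, so memory stays flat.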

What this handles for you:

✅ Scaling — submit 10 URLs or 50,000. The server manages workers. Your code stays the same.
✅ Anti-bot — CAPTCHA detection, proxy rotation, IP rotation, browser fingerprinting. Automatic.
✅ Streaming — results yield as soon as each job completes, not after everything is done. Memory stays flat.
✅ Progress tracking — batch.pct, batch.eta_seconds, batch.success_count, batch.failed_count update live.
✅ Failure visibility — every yielded result tells you if it worked and why it didn't. No silent drops.
✅ Resume on crash — persist batch.collection_id + batch.run_id, reattach later with client.get_batch().
✅ Rate-limit math — the SDK polls the server efficiently (1 API call per tick, regardless of batch size).
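
The resume-on-crash item above depends on your code persisting batch.collection_id and batch.run_id somewhere durable. Here is one possible sketch using a small JSON file. The helper names save_checkpoint and load_checkpoint, the file name, and the placeholder IDs are all hypothetical, and the exact arguments of client.get_batch() are an assumption to verify against the Batch API docs.

```python
import json
import tempfile
from pathlib import Path


def save_checkpoint(path, collection_id, run_id):
    """Write the two batch identifiers to disk so a restart can find them."""
    path = Path(path)
    payload = {"collection_id": collection_id, "run_id": run_id}
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(payload))
    tmp.replace(path)  # rename over the target so a crash never leaves a half-written file


def load_checkpoint(path):
    """Return (collection_id, run_id), or None if no checkpoint exists."""
    path = Path(path)
    if not path.exists():
        return None
    data = json.loads(path.read_text())
    return data["collection_id"], data["run_id"]


# Demo with placeholder IDs in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint = Path(tmp_dir) / "batch.json"
    before = load_checkpoint(checkpoint)               # nothing saved yet
    save_checkpoint(checkpoint, "col-123", "run-456")  # placeholder IDs
    restored = load_checkpoint(checkpoint)

print(before, restored)

# With the SDK (argument order of get_batch is an assumption, check the docs):
#     save_checkpoint("batch.json", batch.collection_id, batch.run_id)
#     ids = load_checkpoint("batch.json")
#     if ids is not None:
#         batch = client.get_batch(*ids)
```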

→ Full Batch API docs

Need just one URL?

Sometimes you need a single scrape for a webhook handler or a quick check:

result = client.scrape("https://example.com", format="markdown", browser=True)
print(result.content)

→ Single scrape docs — use this for one-off requests, debugging, or anything where you need the answer synchronously.

Async / asyncio?

Every method has an async twin. Same surface, same endpoints — only the local I/O loop changes:

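import asyncio
from scrapingpros import AsyncClient   # alias for AsyncScrapingPros (since v0.5.1)

async def main(items):
    # async with / async for must run inside a coroutine
    async with AsyncClient("your_token") as client:
        batch = await client.submit_batch("daily", items)
        async for result in batch.iter_results():
            await save_async(result)  # save_async: your own coroutine

asyncio.run(main(items))  # items: the same list of dicts as in the sync example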

Quick note on naming

AsyncScrapingPros (and its alias AsyncClient) does not make scraping faster on its own — it only changes the I/O loop. To scale to many URLs, the right tool is submit_batch() on either client.

Want a list back, not a stream?

If you'd rather block until every URL is done and get a flat list[ScrapeResponse] (drop-in replacement for scrape_many), use batch_scrape:

# catalog: a dict mapping your own IDs to URLs; save: your own storage function
results = client.batch_scrape([
    {"url": u, "custom_id": pid, "browser": True}
    for pid, u in catalog.items()
])
for r in results:
    if r.guidance.success:
        save(r.custom_id, r.content)

Same server-side scaling as submit_batch, with a simpler signature. Available on both the synchronous and asynchronous clients since v0.5.1.
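
Because every result carries the custom_id you submitted, a flat list can be mapped back to your own records. The sketch below builds item dicts from a catalog and splits results by outcome. The helpers build_items, split_by_outcome, and fake_result are hypothetical names, and the fake results only mimic the custom_id and guidance.success fields shown above.

```python
from types import SimpleNamespace


def build_items(catalog, browser=True):
    """Turn {custom_id: url} into the item dicts used in the examples above."""
    return [
        {"url": url, "custom_id": custom_id, "browser": browser}
        for custom_id, url in catalog.items()
    ]


def split_by_outcome(results):
    """Map custom_id -> result for successes, and collect failed custom_ids."""
    succeeded, failed_ids = {}, []
    for r in results:
        if r.guidance.success:
            succeeded[r.custom_id] = r
        else:
            failed_ids.append(r.custom_id)
    return succeeded, failed_ids


def fake_result(custom_id, success):
    # Hypothetical stand-in for an SDK result object.
    return SimpleNamespace(
        custom_id=custom_id, guidance=SimpleNamespace(success=success)
    )


catalog = {"sku-1": "https://example.com/1", "sku-2": "https://example.com/2"}
items = build_items(catalog)
results = [fake_result("sku-1", True), fake_result("sku-2", False)]

succeeded, failed_ids = split_by_outcome(results)
# Items for a possible second pass over the URLs that failed.
retry_items = build_items({cid: catalog[cid] for cid in failed_ids})
print(sorted(succeeded), failed_ids)
```

Whether a second pass is worthwhile depends on why each URL failed, so it is worth checking guidance.error_type before resubmitting.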

Authentication

# Explicit token
client = ScrapingPros("your_token")

# From environment variable
#   export SP_TOKEN=your_token
client = ScrapingPros()
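
If you want a deployment to fail fast when no token is configured, you can resolve the token yourself before constructing the client. This is a small sketch, not part of the SDK: resolve_token is a hypothetical helper, and its precedence (explicit argument first, then SP_TOKEN) is a choice made for this example rather than a statement about the SDK's internals.

```python
import os


def resolve_token(explicit=None, env=os.environ):
    """Return the explicit token if given, else SP_TOKEN from the environment."""
    token = explicit or env.get("SP_TOKEN")
    if not token:
        raise RuntimeError("No token: pass one explicitly or set SP_TOKEN")
    return token


# Demo with plain dicts standing in for the process environment.
from_arg = resolve_token("your_token", env={})
from_env = resolve_token(env={"SP_TOKEN": "token_from_env"})

try:
    resolve_token(env={})
    missing_raised = False
except RuntimeError:
    missing_raised = True

print(from_arg, from_env, missing_raised)

# With the SDK:
#     client = ScrapingPros(resolve_token())
```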

Credits

Type	Credits
Simple HTTP scrape	1
Browser scrape	5

Credits are refunded automatically when a request fails because of an infrastructure error.
Track usage with client.credits_charged and client.quota_remaining.
Call client.billing() for monthly summaries and client.plans() for pricing.
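
To budget a batch before submitting it, the table above can be turned into a quick estimate. The sketch below is an upper bound that ignores refunds. estimate_credits is a hypothetical helper, and it assumes that an item without a "browser" key is billed as a simple HTTP scrape.

```python
HTTP_CREDITS = 1     # simple HTTP scrape, from the table above
BROWSER_CREDITS = 5  # browser scrape, from the table above


def estimate_credits(items):
    """Upper-bound credit cost for a list of item dicts (no refunds assumed)."""
    return sum(
        BROWSER_CREDITS if item.get("browser") else HTTP_CREDITS
        for item in items
    )


items = [
    {"url": "https://example.com/1", "browser": True},
    {"url": "https://example.com/2", "browser": True},
    {"url": "https://example.com/3"},  # assumed to be a simple HTTP scrape
]
cost = estimate_credits(items)
print(cost)  # 5 + 5 + 1 = 11

# The demo token allows 5,000 credits per month.
DEMO_MONTHLY_CREDITS = 5_000
max_browser_scrapes = DEMO_MONTHLY_CREDITS // BROWSER_CREDITS
print(max_browser_scrapes)  # 1000
```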

Where next

Batch API — the full story: progress, callbacks, resume, tuning for 50k+ URL batches.
Anti-Bot Bypass — strategies for CAPTCHA, Cloudflare, DataDome, etc.
Data Extraction — pull structured data directly, no parsing.
JavaScript Execution — run JS in the page to call internal APIs.
