Run
A collection can be executed multiple times. A Run is a single execution of a collection.
Endpoints
- GET /v1/async/collections/{collection_id}/runs/{run_id}/jobs
- GET /v1/async/collections/{collection_id}/runs/{run_id}/jobs/{job_id}/result
POST /v1/async/collections/{collection_id}/run
This endpoint triggers a Run of a collection.
A collection_id is required to make this request.
Response Example:
{
"run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
"status": "in_progress",
"total_requests": 2,
"success_requests": 0,
"failed_requests": 0,
"timeout_requests": 0,
"collection_id": "9634997b-6431-4b11-a4cb-fc00e941ba8d",
"job_ids": ["job-uuid-1", "job-uuid-2"],
"callback_url": "https://your-server.com/webhook",
"callback_status": "pending"
}
Details about the returned fields can be found in Reference.
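As a rough illustration, the request could be issued from Python as sketched here. Several things are assumptions rather than documented behavior: the host and the Bearer header are borrowed from the curl examples elsewhere on this page, the request is sent without a body, and BASE_URL, build_run_request, and trigger_run are illustrative names, not part of an official client.

```python
import json
import urllib.request

# Host taken from the curl examples on this page (assumed to apply to POST /run too).
BASE_URL = "https://api.scrapingpros.com"


def build_run_request(collection_id: str, api_key: str) -> urllib.request.Request:
    """Build the POST request that triggers a run (assumption: no request body)."""
    url = f"{BASE_URL}/v1/async/collections/{collection_id}/run"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def trigger_run(collection_id: str, api_key: str, opener=urllib.request.urlopen) -> dict:
    """Send the request and return the decoded JSON body of the new run."""
    request = build_run_request(collection_id, api_key)
    with opener(request, timeout=30) as response:
        return json.load(response)
```

Persisting the returned run_id together with the collection_id makes it possible to look the run up again later.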
GET /v1/async/collections/{collection_id}/runs
Lists every run of a given collection, newest first.
Useful for two things:
- Audit / dashboards: see all the times a collection has been executed.
- Recovery after a submit timeout: you persisted the collection_id, but your POST /run request lost its response. Re-attach to the live run with ?status_filter=in_progress instead of triggering a duplicate.
# All runs
curl 'https://api.scrapingpros.com/v1/async/collections/{collection_id}/runs' \
-H 'Authorization: Bearer <API-KEY>'
# Just the live run
curl 'https://api.scrapingpros.com/v1/async/collections/{collection_id}/runs?status_filter=in_progress' \
-H 'Authorization: Bearer <API-KEY>'
Response Example:
{
"items": [
{
"run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
"status": "in_progress",
"total_requests": 100,
"success_requests": 73,
"failed_requests": 5,
"timeout_requests": 0,
"collection_id": "9634997b-6431-4b11-a4cb-fc00e941ba8d",
"callback_url": null,
"callback_status": null,
"created_at": 1777853217.82
}
],
"total": 1
}
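The recovery case can be sketched as a small helper that inspects a listing response and returns the run to re-attach to. This is a minimal sketch, not an official client: find_live_run is a hypothetical name, and it assumes the response has the shape shown in the example above.

```python
from typing import Optional


def find_live_run(listing: dict) -> Optional[str]:
    """Return the run_id of the first in-progress run in a runs listing, or None."""
    for run in listing.get("items", []):
        if run.get("status") == "in_progress":
            return run["run_id"]
    return None
```

If the helper returns None, no run is live and it should be safe to trigger a new one; otherwise the returned run_id can be polled as usual.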
GET /v1/async/collections/{collection_id}/runs/{run_id}
This endpoint returns the current status of a Run, including the webhook delivery status.
Response Example
{
"run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
"status": "completed",
"total_requests": 2,
"success_requests": 2,
"failed_requests": 0,
"timeout_requests": 0,
"collection_id": "9634997b-6431-4b11-a4cb-fc00e941ba8d",
"job_ids": ["job-uuid-1", "job-uuid-2"],
"callback_url": "https://your-server.com/webhook",
"callback_status": "sent"
}
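When no webhook is configured, a client would typically poll this endpoint until the run reports completed. The loop below is a sketch under assumptions: wait_for_run and its parameters are hypothetical, the polling interval and timeout are arbitrary choices, and fetch_status stands for whatever function performs the GET and returns the decoded JSON.

```python
import time
from typing import Callable


def wait_for_run(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch_status() until the run status is "completed" or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        run = fetch_status()
        if run["status"] == "completed":
            return run
        if clock() >= deadline:
            raise TimeoutError("run did not complete within the allotted time")
        sleep(interval_s)
```

Injecting sleep and clock keeps the loop easy to test without real waiting.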
GET /v1/async/collections/{collection_id}/runs/{run_id}/jobs
Lists all jobs of a run with cursor-based pagination. Returns metadata (URL, status, timings, custom_id, validator fields) without the HTML body — use the /result endpoint below to download content.
Query parameters:
| Param | Type | Default | Description |
|---|---|---|---|
| cursor | string | (none) | Opaque cursor returned by the previous page. Omit on first call. Encoding depends on order_by; mixing them returns 400. |
| limit | integer | 100 | Page size. Min 1, max 1000. |
| status_filter | string / CSV | (none) | Single value or CSV: completed, failed, timeout, processing. Example: status_filter=completed,failed,timeout. |
| since_completed_at | ISO 8601 string | (none) | Returns only rows with completed_at strictly greater than this value. Accepts Z, +00:00, or naive (UTC). Rows with NULL completed_at are excluded. |
| order_by | id or completed_at | id | Sort order. Use completed_at for streaming completions as they finish. |
| order_dir | asc or desc | asc | Honored only for order_by=completed_at. |
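To stream completions as they finish, one possible approach combines order_by=completed_at with since_completed_at as a high-water mark. The helpers below are a sketch: completion_stream_params and advance_watermark are hypothetical names, and the watermark comparison assumes all completed_at values share one ISO 8601 format so that string order matches time order.

```python
from typing import Optional


def completion_stream_params(since_completed_at: Optional[str] = None, limit: int = 100) -> dict:
    """Build query parameters for fetching jobs in completion order."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    params = {"order_by": "completed_at", "order_dir": "asc", "limit": limit}
    if since_completed_at is not None:
        # Only rows completed strictly after this timestamp are returned.
        params["since_completed_at"] = since_completed_at
    return params


def advance_watermark(items: list, current: Optional[str]) -> Optional[str]:
    """Return the latest completed_at seen so far (assumes uniform ISO formatting)."""
    stamps = [job["completed_at"] for job in items if job.get("completed_at")]
    if current is not None:
        stamps.append(current)
    return max(stamps, default=None)
```

Because the cursor encoding depends on order_by, a cursor obtained under order_by=id should not be reused with these parameters.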
Response example:
{
"items": [
{
"job_public_id": "e3a1b2c4-...",
"run_public_id": "9b64941a-...",
"collection_id": "9634997b-...",
"status": "completed",
"url": "https://example.com/tours/123",
"custom_id": "tour_12345",
"url_truncated": false,
"status_code": 200,
"message": null,
"queued_at": "2026-04-23T12:00:00.123",
"started_at": "2026-04-23T12:00:02.267",
"completed_at": "2026-04-23T12:00:03.637",
"execution_time_ms": 1370,
"retries_attempted": 0,
"block_reason": null,
"protection_stack": ["cloudflare"],
"rule_hits": []
}
],
"cursor_next": "MzQ=",
"has_more": true
}
Timing: jobs appear in this listing roughly 5 seconds after completion (internal metadata flusher tick). The ordered sequence of queued_at → started_at → completed_at lets you compute queue wait time and execution latency per job.
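The two durations can be derived from the three timestamps as sketched below. job_latencies_ms is a hypothetical helper; it assumes the timestamps use the ISO format shown in the response example and that all three are present (jobs still in flight would need extra handling).

```python
from datetime import datetime, timedelta

ONE_MS = timedelta(milliseconds=1)


def job_latencies_ms(job: dict) -> dict:
    """Compute queue wait and execution latency, in whole milliseconds, for one job."""
    queued = datetime.fromisoformat(job["queued_at"])
    started = datetime.fromisoformat(job["started_at"])
    completed = datetime.fromisoformat(job["completed_at"])
    return {
        "queue_wait_ms": (started - queued) // ONE_MS,
        "execution_ms": (completed - started) // ONE_MS,
    }
```

For the job in the response example this yields a queue wait of 2144 ms and an execution latency of 1370 ms, the latter matching the reported execution_time_ms.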
Retention: listing metadata is retained for 90 days after the run (MySQL partitioned tables). HTML bodies are retained for 48 hours — beyond that window, the /result endpoint returns 404 but the listing above is still available.
Pagination pattern:
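import requests

# jobs_url, H (auth headers), and handle() are assumed to be defined by the caller.
cursor = None
while True:
    params = {"limit": 500}
    if cursor:
        params["cursor"] = cursor  # omit the cursor on the first call
    resp = requests.get(jobs_url, headers=H, params=params)
    resp.raise_for_status()  # surface 4xx/5xx, e.g. 400 for a mismatched cursor
    page = resp.json()
    for job in page["items"]:
        handle(job)
    if not page["has_more"]:
        break
    cursor = page["cursor_next"]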
See full reference at apiReference/scrapeo_asincronico.
GET /v1/async/collections/{collection_id}/runs/{run_id}/jobs/{job_id}/result
Retrieves the full result of a specific job (HTML body, extracted data, timings). Results are available for 48 hours after job completion.
Response Example
{
"url": "https://example.com/tours/123",
"custom_id": "tour_12345",
"html": "<!doctype html>...",
"statusCode": 200,
"timings": {"queue_wait_ms": 45, "proxy_ms": 120},
"potentiallyBlockedByCaptcha": false,
"extracted_data": null
}
The response includes url and custom_id so you can correlate each result back to your original request without relying on insertion order.
If the result is unavailable, the API responds with 404 and a structured detail that tells you which kind of unavailable it is:
HTTP 404
{
"detail": {
"error_code": "result_lost",
"message": "Job result is unavailable due to a service incident during the completion window. Contact support if the data is critical — it may qualify for refund.",
"completed_at": "2026-04-30T12:34:56Z",
"age_hours": 0.4
}
}
| error_code | Meaning | Suggested action |
|---|---|---|
| result_pending | Job is still in flight, or the worker has not stored a result yet. | Retry shortly. |
| result_expired | The result retention window since completion has passed and the body has been pruned. | Re-run the collection if you still need the data. |
| result_lost | Body unavailable even though the job is still inside the retention window. | Contact support; may qualify for a refund. |
| job_id_invalid | We have no record of this job. | Verify the IDs in your client. |
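A client might translate these codes into its own next step, roughly as sketched here. The action labels and the name classify_result_response are illustrative choices, not values defined by the API; only the four error_code strings come from the table above.

```python
# Maps each documented error_code to a client-side action label (labels are illustrative).
ACTIONS = {
    "result_pending": "retry_later",
    "result_expired": "rerun_collection",
    "result_lost": "contact_support",
    "job_id_invalid": "check_ids",
}


def classify_result_response(status_code: int, payload: dict) -> str:
    """Decide what to do with a response from the job result endpoint."""
    if status_code == 200:
        return "ok"
    detail = payload.get("detail") if isinstance(payload, dict) else None
    if status_code == 404 and isinstance(detail, dict):
        return ACTIONS.get(detail.get("error_code"), "unknown_error")
    return "unknown_error"
```

Only retry_later is worth retrying automatically; the other outcomes call for a re-run or manual follow-up.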
Webhooks
If the collection has a callback_url configured, a signed HTTP POST is automatically sent upon run completion:
{
"event": "run.completed",
"run_id": "uuid",
"collection_id": "uuid",
"status": "completed",
"total_requests": 2,
"success_requests": 2,
"failed_requests": 0,
"job_ids": ["job-uuid-1", "job-uuid-2"],
"results_url": "https://api.scrapingpros.com/v1/async/collections/{cid}/runs/{rid}",
"timestamp": "2026-04-06T20:30:00Z"
}
Security: The webhook includes an HMAC-SHA256 signature in the headers:
- X-SP-Signature: sha256=<hex> (signature of {timestamp}.{body})
- X-SP-Timestamp: <unix_epoch>
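On the receiving side, the signature could be checked along the following lines. This is a sketch under assumptions the text above does not confirm: that the HMAC key is a secret shared with your account, that the signed message is the X-SP-Timestamp value, a literal dot, and the raw request body, and that the digest is lowercase hex. expected_signature and verify_webhook are hypothetical names.

```python
import hashlib
import hmac


def expected_signature(secret: bytes, timestamp: str, body: bytes) -> str:
    """Compute the header value expected for this timestamp and raw body."""
    signed = timestamp.encode("ascii") + b"." + body
    return "sha256=" + hmac.new(secret, signed, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, timestamp: str, body: bytes, signature_header: str) -> bool:
    """Compare the received X-SP-Signature against the expected value in constant time."""
    expected = expected_signature(secret, timestamp, body)
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

Verification should run against the raw bytes of the request body, before any JSON parsing or re-serialization, since re-encoding can change the bytes that were signed.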
Retries: If delivery fails (timeout, 5xx), it is automatically retried up to 5 times with backoff: 1min, 5min, 30min, 2h, 12h. The callback_status field reflects the current status.
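To estimate how long delivery may keep being retried, the schedule can be expanded into concrete times. The sketch below assumes each delay is measured from the previous attempt, which the text above does not state; BACKOFF and retry_times are illustrative names.

```python
from datetime import datetime, timedelta

# Backoff steps listed above: 1min, 5min, 30min, 2h, 12h.
BACKOFF = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=12),
]


def retry_times(first_failure: datetime) -> list:
    """Return the time of each retry, assuming delays accumulate attempt to attempt."""
    times = []
    current = first_failure
    for delay in BACKOFF:
        current += delay
        times.append(current)
    return times
```

Under that assumption the last retry happens 14 hours and 36 minutes after the first failed delivery, so a client that has heard nothing by then may want to fall back to polling the run status.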
Reference
- run_id: Generated UUID of the run. This value is recommended for run tracking using GET /v1/async/collections/{collection_id}/runs/{run_id}.
- status: The current status of the Run. It can take 2 values: in_progress or completed.
- total_requests: Number of requests in the collection.
- success_requests: Number of requests that delivered usable content (HTTP 2xx + no block signal). A job whose worker completed but whose target returned 4xx/5xx or a captcha page is counted under failed_requests, not here.
- failed_requests: Number of requests that failed.
- timeout_requests: Number of requests that timed out.
- collection_id: UUID of the collection.
- job_ids: List of UUIDs of the individual jobs. Use these to retrieve results with the job result endpoint. Available for the lifetime of the run, regardless of status: you can always enumerate the jobs of a run, even after status=completed and after the result bodies have expired (the listing metadata is kept for 90 days).
- callback_url: Configured webhook URL (if set).
- callback_status: Webhook delivery status: pending (in progress), sent (delivered), failed (delivery failed), retrying (retrying delivery).
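The counters can be combined into a simple progress figure. This sketch assumes the three outcome counters are disjoint and together cover every finished request, which the field descriptions suggest but do not state outright; remaining_requests is a hypothetical helper.

```python
def remaining_requests(run: dict) -> int:
    """Return how many requests of a run have not yet reached a final outcome."""
    finished = run["success_requests"] + run["failed_requests"] + run["timeout_requests"]
    return run["total_requests"] - finished
```

For a run with 100 total, 73 successful, 5 failed, and 0 timed-out requests, this returns 22.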