Classic Asynchronous Workflow Example
This section describes the most common usage of the asynchronous endpoints.
The purpose of this flow is:
- Create a collection with multiple requests.
- Execute a Run for this collection.
- Query the run status until it completes.
1. Create a collection
First, we create a collection that groups the requests we want to run asynchronously.
Endpoint
POST /v1/async/collections
Request Body Example
{
  "name": "new collection",
  "requests": [
    {
      "url": "www.google.com",
      "browser": true,
      "screenshot": false,
      "actions": [
        {
          "type": "wait-for-timeout",
          "time": 5000
        }
      ]
    },
    {
      "url": "www.example.com",
      "browser": true,
      "screenshot": false,
      "actions": [
        {
          "type": "wait-for-timeout",
          "time": 5000
        }
      ]
    }
  ]
}
Expected Response
{
  "id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9",
  "name": "new collection",
  "message": "Collection created successfully."
}
At this point, the collection is ready to be executed. Save the returned id: it is the collection_id required in the following steps.
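As a rough illustration of this step, the sketch below builds the request body shown above and submits it. It is a minimal sketch under stated assumptions, not an official client: the helper names build_collection_payload and create_collection are hypothetical, session is assumed to be any object with a requests.Session-style post method, and base_url and headers (including whatever authentication the API requires) must be supplied by the caller.

```python
from typing import Any, Dict, List


def build_collection_payload(name: str, urls: List[str], wait_ms: int = 5000) -> Dict[str, Any]:
    """Build a body like the example above: one browser request per URL."""
    return {
        "name": name,
        "requests": [
            {
                "url": url,
                "browser": True,
                "screenshot": False,
                "actions": [{"type": "wait-for-timeout", "time": wait_ms}],
            }
            for url in urls
        ],
    }


def create_collection(session: Any, base_url: str, headers: Dict[str, str],
                      payload: Dict[str, Any]) -> str:
    """POST the payload and return the collection ID (the `id` field of the response)."""
    resp = session.post(
        f"{base_url}/v1/async/collections",
        headers=headers,
        json=payload,
        timeout=30,  # assumption: a 30 s client-side timeout is acceptable
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json()["id"]
```

With the requests library, a call might look like create_collection(requests.Session(), BASE_URL, HEADERS, build_collection_payload("new collection", ["www.google.com", "www.example.com"])).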
2. Create a Run for the Collection
Once the collection has been created, we can start the Run execution.
A Run represents a single execution of the requests placed in the collection.
Endpoint
POST /v1/async/collections/{collection_id}/run
Parameters
No request body is needed; the collection_id path parameter is the only required input.
Response Example
{
  "run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
  "status": "in_progress",
  "total_requests": 2,
  "success_requests": 0,
  "failed_requests": 0,
  "timeout_requests": 0,
  "collection_id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9"
}
The run is created and begins executing asynchronously.
- The initial status always starts as in_progress.
- The run_id uniquely identifies the execution and should be saved to track the Run.
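Starting a Run could be sketched as follows. The function name start_run is hypothetical, and session is again assumed to be an object with a requests.Session-style post method; only the endpoint path and the run_id response field come from the description above.

```python
from typing import Any, Dict


def start_run(session: Any, base_url: str, headers: Dict[str, str], collection_id: str) -> str:
    """Start a Run for an existing collection and return its run_id."""
    resp = session.post(
        f"{base_url}/v1/async/collections/{collection_id}/run",
        headers=headers,  # no request body: the collection_id in the path is enough
        timeout=30,       # assumption: a 30 s client-side timeout is acceptable
    )
    resp.raise_for_status()
    return resp.json()["run_id"]
```

Persisting the returned run_id together with the collection_id right away makes it possible to resume polling after a client restart.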
3. Query the Run Status
Since the run is asynchronous, the execution takes a variable amount of time depending on the number of requests and their complexity.
After a short wait, you can query the run status using the run_id.
Endpoint
GET /v1/async/collections/{collection_id}/runs/{run_id}
Response Example (Still In Progress)
{
  "run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
  "status": "in_progress",
  "total_requests": 2,
  "success_requests": 1,
  "failed_requests": 0,
  "timeout_requests": 0,
  "collection_id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9"
}
This response indicates that the Run has started but has not finished.
4. Run Completed
After waiting long enough, the query will return the completed run.
Response Example (Completed)
{
  "run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
  "status": "completed",
  "total_requests": 2,
  "success_requests": 2,
  "failed_requests": 0,
  "timeout_requests": 0,
  "collection_id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9"
}
At this point:
- status is completed.
- All requests defined in the collection have been processed.
success_requests counts jobs that returned usable content (HTTP 2xx and no captcha/block signal). failed_requests includes worker failures and jobs that completed but whose target returned 4xx/5xx or a block page. timeout_requests covers jobs that exceeded the worker-level timeout. The invariant total_requests = success_requests + failed_requests + timeout_requests always holds on a completed run.
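Steps 3 and 4 amount to a polling loop. The sketch below is one possible way to write it, assuming a requests.Session-style session object; the names wait_for_run and counters_add_up, the 5-second interval, and the cap of 120 polls are illustrative choices, not values prescribed by the API. Only the in_progress and completed statuses are documented here, so any status other than in_progress is handed back to the caller.

```python
import time
from typing import Any, Callable, Dict


def wait_for_run(session: Any, base_url: str, headers: Dict[str, str],
                 collection_id: str, run_id: str,
                 poll_seconds: float = 5.0, max_polls: int = 120,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll the run-status endpoint until the run leaves `in_progress`."""
    url = f"{base_url}/v1/async/collections/{collection_id}/runs/{run_id}"
    for _ in range(max_polls):
        resp = session.get(url, headers=headers, timeout=30)
        resp.raise_for_status()
        run = resp.json()
        if run["status"] != "in_progress":
            return run  # `completed`, or any other status the API may report
        sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} still in progress after {max_polls} polls")


def counters_add_up(run: Dict[str, Any]) -> bool:
    """Check the invariant stated above for a completed run."""
    return run["total_requests"] == (
        run["success_requests"] + run["failed_requests"] + run["timeout_requests"]
    )
```

The sleep parameter exists only so the loop can be exercised in tests without real waiting; in normal use the default time.sleep applies.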
Retrieving per-job results
The run-status endpoint above returns only a summary. To iterate over each job's URL, custom_id, timings, and HTML, use the jobs listing endpoint with cursor pagination:
import requests

# BASE_URL, COLLECTION_ID, HEADERS, run_id and handle() are assumed to be defined by the caller.
cursor = None
while True:
    params = {"limit": 500, "order_by": "completed_at", "status_filter": "completed,failed,timeout"}
    if cursor:
        params["cursor"] = cursor  # resume from where the previous page ended
    page = requests.get(
        f"{BASE_URL}/v1/async/collections/{COLLECTION_ID}/runs/{run_id}/jobs",
        headers=HEADERS, params=params,
    ).json()
    for job in page["items"]:
        handle(job["custom_id"], job["url"], job["status"], job["status_code"])
    if not page.get("has_more"):
        break  # last page reached
    cursor = page["cursor_next"]
With order_by=completed_at + since_completed_at you can stream completions incrementally without re-paginating the whole run on each poll. See the API reference for the full semantics.
For the full HTML or extracted data of a specific job: GET /v1/async/collections/{cid}/runs/{run_id}/jobs/{job_id}/result. HTML bodies are retained 48 hours after completion; metadata (status, timings, URL, custom_id) is retained 90 days in the listing endpoint.
Summary
The workflow follows a simple three-part pattern:
- Create a Collection: with one or more requests (each may include an optional custom_id for traceability).
- Start the Execution: run the collection asynchronously.
- Query the Run Status: using the run ID until it completes, then iterate the jobs listing to process each result.
Resilient submits (recommended for production)
Two small additions make the workflow safe against transient failures:
Generate an Idempotency-Key per submit
Pass a UUID in the Idempotency-Key header on POST /v1/async/collections. If the response is lost to a network timeout, a retry with the same key within 24 h returns the original collection without creating a duplicate (and without a second charge):
import uuid
import requests

key = str(uuid.uuid4())
resp = requests.post(
    f"{BASE_URL}/v1/async/collections",
    headers={**HEADERS, "Idempotency-Key": key},
    json={"name": "daily-2026-04-30", "requests": [...]},
)
# Safe to retry this POST on timeout: the same key and the same body return the same collection.
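The retry itself might be wrapped as in the following sketch. It assumes a requests.Session-style session object; the name submit_with_retry, the attempt count, and the default retry_on tuple are illustrative. Note that the exceptions raised by the requests library do not derive from the built-in TimeoutError or ConnectionError, so a caller using requests would pass retry_on=(requests.exceptions.Timeout, requests.exceptions.ConnectionError).

```python
import uuid
from typing import Any, Dict, Optional, Tuple, Type


def submit_with_retry(session: Any, base_url: str, headers: Dict[str, str],
                      payload: Dict[str, Any], attempts: int = 3,
                      retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
                      ) -> str:
    """Create a collection, retrying transport failures with one shared Idempotency-Key."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Generated once and reused on every attempt, so a retry cannot create a duplicate.
    request_headers = {**headers, "Idempotency-Key": str(uuid.uuid4())}
    last_error: Optional[BaseException] = None
    for _ in range(attempts):
        try:
            resp = session.post(
                f"{base_url}/v1/async/collections",
                headers=request_headers,
                json=payload,  # the body must stay identical across retries
                timeout=30,
            )
        except retry_on as exc:
            last_error = exc  # response lost or never sent: try again with the same key
            continue
        resp.raise_for_status()
        return resp.json()["id"]
    assert last_error is not None
    raise last_error
```

This sketch retries immediately; a production client would likely add a backoff delay between attempts and keep the total retry window inside the 24 h key lifetime.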
Reattach to a live run via GET /collections/{cid}/runs
If the response to POST /run is lost, you don't need to retry the run (which would queue a duplicate). Look it up by collection — the run_id is already created server-side:
runs = requests.get(
    f"{BASE_URL}/v1/async/collections/{collection_id}/runs?status_filter=in_progress",
    headers=HEADERS,
).json()
if runs["total"] > 0:
    run_id = runs["items"][0]["run_id"]  # reattach
else:
    run_id = requests.post(
        f"{BASE_URL}/v1/async/collections/{collection_id}/run",
        headers=HEADERS,
    ).json()["run_id"]
These two patterns combined remove the most common production failure mode: doubled batches caused by client-side retry of an already-successful request.