Asynchronous Endpoints

The asynchronous endpoints let you group multiple requests into a collection and execute them in the background. They are ideal for scraping large volumes of URLs.

POST `/v1/async/collections`

Creates a new request collection.

Request

curl -X POST \
  'https://api.scrapingpros.com/v1/async/collections' \
  -H 'Authorization: Bearer <API-KEY>' \
  -H 'Idempotency-Key: 7f3a2b1e-4c5d-4f1a-8b2c-9d4e5f6a7b8c' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "My collection",
    "requests": [
        {
            "url": "https://example.com",
            "custom_id": "tour_12345",
            "browser": true
        },
        {
            "url": "https://example.org",
            "custom_id": "tour_12346",
            "use_proxy": "any"
        }
    ]
  }'

Headers

Header	Required	Description
`Authorization`	Yes	`Bearer <API-KEY>`
`Idempotency-Key`	No	Client-generated unique key (UUID recommended). Lets you safely retry the request after a network timeout without creating a duplicate collection. See Idempotency below.

Body

Field	Type	Required	Description
`name`	string	Yes	Name of the collection
`requests`	array	No	List of requests. Same format as the `/v1/sync/scrape` body — each request may include `custom_id` for traceability

`custom_id` (optional, per request)

Client-supplied identifier (max 255 chars) that is echoed back in job listings, result payloads, and webhooks. Lets you correlate jobs to your own domain objects without matching by URL.

Deduplication by (url, custom_id): two requests with the same URL but different custom_id are considered distinct jobs (useful when the same URL feeds multiple pipelines — e.g. {url, "english"} and {url, "spanish"}). Two requests with the same URL and the same custom_id (or both without custom_id) are deduplicated — only the first one is kept and duplicates_skipped is incremented in the response.
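The deduplication rule can be mirrored on the client to estimate duplicates_skipped before submitting. This is a minimal sketch, assuming requests are plain dicts and URLs are compared as exact strings (the server may normalize them differently); the name dedupe_requests is illustrative and not part of the API.

```python
def dedupe_requests(requests_list):
    """Keep the first request for each (url, custom_id) pair.

    A missing custom_id is treated as None, so two requests with the same
    URL and no custom_id are considered duplicates of each other.
    """
    seen = set()
    kept = []
    skipped = 0
    for req in requests_list:
        key = (req["url"], req.get("custom_id"))
        if key in seen:
            skipped += 1
            continue
        seen.add(key)
        kept.append(req)
    return kept, skipped


batch = [
    {"url": "https://example.com", "custom_id": "english"},
    {"url": "https://example.com", "custom_id": "spanish"},  # distinct job
    {"url": "https://example.com", "custom_id": "english"},  # duplicate
    {"url": "https://example.org"},
    {"url": "https://example.org"},                          # duplicate
]
kept, skipped = dedupe_requests(batch)
print(len(kept), skipped)
```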

Response (201)

{
  "id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9",
  "name": "My collection",
  "message": "Collection created successfully.",
  "duplicates_skipped": 0,
  "blocked_urls": []
}

Response when some URLs were rejected

If one or more URLs fail validation (private/internal IPs, unsupported protocols, malformed input), the collection is still created with the URLs that passed, and the rejected ones come back in blocked_urls. You can use that list to fix the inputs and re-submit only the failures, without having to parse the message.

{
  "id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9",
  "name": "My collection",
  "message": "Collection created successfully. 2 URL(s) blocked (SSRF protection).",
  "duplicates_skipped": 0,
  "blocked_urls": [
    {
      "index": 1,
      "url": "http://192.168.1.1/admin",
      "reason": "private_ip",
      "message": "URL resolved to a private or internal IP."
    },
    {
      "index": 2,
      "url": "ftp://example.com/file",
      "reason": "invalid_protocol",
      "message": "Only http and https URLs are accepted."
    }
  ]
}

reason is one of: private_ip, dns_failed, blocked_hostname, invalid_protocol, invalid_port, malformed_url, or blocked (generic fallback). index matches the position of the URL in your original requests array.
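One way to use blocked_urls is to map each entry back to the request you originally sent. The sketch below assumes index is zero-based; the text above does not state the base, so index_base is exposed as a parameter. The function name collect_blocked is hypothetical.

```python
def collect_blocked(original_requests, blocked_urls, index_base=0):
    """Return (original_request, reason) pairs for every blocked entry.

    index_base is an assumption: pass 1 if the API turns out to count from 1.
    Entries whose index falls outside the original list are ignored.
    """
    pairs = []
    for entry in blocked_urls:
        position = entry["index"] - index_base
        if 0 <= position < len(original_requests):
            pairs.append((original_requests[position], entry["reason"]))
    return pairs


original = [
    {"url": "https://example.com", "custom_id": "a"},
    {"url": "http://192.168.1.1/admin", "custom_id": "b"},
    {"url": "ftp://example.com/file", "custom_id": "c"},
]
blocked = [
    {"index": 1, "url": "http://192.168.1.1/admin", "reason": "private_ip"},
    {"index": 2, "url": "ftp://example.com/file", "reason": "invalid_protocol"},
]
for request, reason in collect_blocked(original, blocked):
    print(request["custom_id"], reason)
```

After fixing the inputs, only the returned requests need to be submitted again.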

Idempotency

Network timeouts during submit are common with large batches. Without protection, a client retry can create a second collection and double the cost. Send an Idempotency-Key header — any UUID per logical operation — to make the retry safe:

Scenario	Result
First request with the key	Collection is created normally. Response header `Idempotency-Replayed: false`.
Retry with same key + same body within 24 h	Returns the original `id` and response, without re-processing or re-charging. Header `Idempotency-Replayed: true`.
Same key but different body	`422` — `"Idempotency-Key reused with a different payload."` Use a new key for a different operation.
Two requests with the same key arriving in parallel	The second one waits (up to 30 s) and replays the first one's response.

The key is stored for 24 h, scoped to your client. It must be ≤ 200 chars and contain no whitespace or :. UUIDs (e.g. generated with uuid.uuid4()) are recommended.

If you don't send the header, behavior is unchanged — each POST creates a new collection.
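The retry pattern described above could look roughly like this. It is a sketch, not a reference client: send is a hypothetical caller-supplied function that performs the POST and raises TimeoutError on a network timeout, and validate_idempotency_key only checks the constraints listed above (length, whitespace, colon).

```python
import uuid


def validate_idempotency_key(key):
    """Check the documented constraints: 1 to 200 chars, no whitespace, no ':'."""
    if not key or len(key) > 200:
        return False
    return not any(ch.isspace() or ch == ":" for ch in key)


def submit_with_retry(send, body, max_attempts=3):
    """Call send(headers, body), reusing ONE Idempotency-Key across retries."""
    key = str(uuid.uuid4())  # one key per logical operation
    if not validate_idempotency_key(key):
        raise ValueError("invalid Idempotency-Key")
    headers = {"Idempotency-Key": key, "Content-Type": "application/json"}
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(headers, body)
        except TimeoutError as exc:
            last_error = exc  # retry with the same key and the same body
    raise last_error
```

Generating the key once, outside the retry loop, is the important part: a fresh key per attempt would defeat the replay protection.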

GET `/v1/async/collections`

Lists collections, optionally filtered by name and creation time.

Request

# All collections
curl 'https://api.scrapingpros.com/v1/async/collections' \
  -H 'Authorization: Bearer <API-KEY>'

# Exact-name match (e.g. recover a collection after a timeout)
curl 'https://api.scrapingpros.com/v1/async/collections?name=daily-2026-04-30' \
  -H 'Authorization: Bearer <API-KEY>'

# All collections that start with `daily-`
curl 'https://api.scrapingpros.com/v1/async/collections?name_prefix=daily-' \
  -H 'Authorization: Bearer <API-KEY>'

# All collections created in the last hour
curl 'https://api.scrapingpros.com/v1/async/collections?since=2026-04-30T11:00:00Z' \
  -H 'Authorization: Bearer <API-KEY>'

Query parameters

Param	Type	Description
`name`	string	Exact match on the collection name.
`name_prefix`	string	Returns collections whose name starts with this prefix.
`since`	ISO 8601	Returns collections created at or after this timestamp. Collections created before this field was tracked (legacy) are excluded when `since` is set.

All three can be combined; they apply with AND semantics.
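As an illustration of combining the filters, the helper below builds the list URL using only the standard library. The function name collections_url is an assumption for this example; the parameter names come from the table above.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

BASE = "https://api.scrapingpros.com"


def collections_url(name=None, name_prefix=None, since=None):
    """Build the list URL; only the filters that are set are sent."""
    params = {}
    if name is not None:
        params["name"] = name
    if name_prefix is not None:
        params["name_prefix"] = name_prefix
    if since is not None:
        # Expects a timezone-aware datetime; formatted as UTC with a trailing Z.
        params["since"] = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = urlencode(params)
    return f"{BASE}/v1/async/collections" + (f"?{query}" if query else "")


# Collections named daily-* that were created in the last hour.
one_hour_ago = datetime.now(timezone.utc) - timedelta(hours=1)
print(collections_url(name_prefix="daily-", since=one_hour_ago))
```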

Response (200)

[
    {
        "id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9",
        "name": "My collection",
        "created_at": 1777853200.5,
        "updated_at": 1777853200.5
    },
    {
        "id": "11d6f8af-9a54-4b6c-b793-e12b77c86159",
        "name": "Another collection",
        "created_at": 1777851000.1,
        "updated_at": 1777851500.3
    }
]

created_at and updated_at are epoch seconds (UTC). They are null for collections created before this field was tracked — clients should tolerate the null.
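A client could convert these fields while tolerating null along the following lines. The helper name epoch_to_datetime is illustrative only.

```python
from datetime import datetime, timezone


def epoch_to_datetime(value):
    """Convert epoch seconds to a timezone-aware UTC datetime; None passes through."""
    if value is None:
        return None
    return datetime.fromtimestamp(value, tz=timezone.utc)


collection = {
    "id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9",
    "created_at": 1777853200.5,
    "updated_at": None,  # legacy collection: field not tracked
}
print(epoch_to_datetime(collection["created_at"]))
print(epoch_to_datetime(collection["updated_at"]))
```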

GET `/v1/async/collections/{collection_id}`

Gets a specific collection by its ID.

Request

curl 'https://api.scrapingpros.com/v1/async/collections/c38b0bcf-cb7c-4728-8704-2c2e267dcff9' \
  -H 'Authorization: Bearer <API-KEY>'

Response (200)

{
  "id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9",
  "name": "My collection",
  "created_at": 1777853200.5,
  "updated_at": 1777853200.5
}

PUT `/v1/async/collections/{collection_id}`

Updates an existing collection. Both the name and the request list can be modified. If a new request list is sent, it replaces the previous one.

Request

curl -X PUT \
  'https://api.scrapingpros.com/v1/async/collections/c38b0bcf-cb7c-4728-8704-2c2e267dcff9' \
  -H 'Authorization: Bearer <API-KEY>' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Updated collection",
    "requests": [
        {
            "url": "https://new-example.com",
            "browser": true
        }
    ]
  }'

Response (200)

{
  "id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9",
  "name": "Updated collection",
  "message": "Collection updated successfully."
}

POST `/v1/async/collections/{collection_id}/run`

Executes all requests in a collection asynchronously. A collection can be executed multiple times.

Request

curl -X POST \
  'https://api.scrapingpros.com/v1/async/collections/c38b0bcf-cb7c-4728-8704-2c2e267dcff9/run' \
  -H 'Authorization: Bearer <API-KEY>'

No body required.

Response (201)

{
  "run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
  "status": "in_progress",
  "total_requests": 2,
  "success_requests": 0,
  "failed_requests": 0,
  "timeout_requests": 0,
  "collection_id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9"
}

GET `/v1/async/collections/{collection_id}/runs`

Lists every run that has been executed against a given collection, newest first. Useful when you persist a collection_id and need to enumerate its history (current run, previous re-runs, audit trail), or when you want to reattach to a live run after the original POST /run request lost its response (network timeout, etc.).

Request

# All runs of this collection
curl 'https://api.scrapingpros.com/v1/async/collections/c38b0bcf-.../runs' \
  -H 'Authorization: Bearer <API-KEY>'

# Just the live run (helpful after a submit timeout — you keep `collection_id`,
# you fetch the run that's already executing)
curl 'https://api.scrapingpros.com/v1/async/collections/c38b0bcf-.../runs?status_filter=in_progress' \
  -H 'Authorization: Bearer <API-KEY>'

Query parameters

Param	Type	Description
`status_filter`	`in_progress` \| `completed`	Filter by run status. Omit for all runs.

Response (200)

{
  "items": [
    {
      "run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
      "status": "in_progress",
      "total_requests": 100,
      "success_requests": 73,
      "failed_requests": 5,
      "timeout_requests": 0,
      "collection_id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9",
      "callback_url": null,
      "callback_status": null,
      "created_at": 1777853217.82
    },
    {
      "run_id": "8c9bafe2-...",
      "status": "completed",
      "total_requests": 100,
      "success_requests": 99,
      "failed_requests": 1,
      "timeout_requests": 0,
      "collection_id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9",
      "callback_url": "https://example.com/webhook",
      "callback_status": "sent",
      "created_at": 1777840000.10
    }
  ],
  "total": 2
}

Order: newest first (created_at desc). Runs created before this field was tracked sort to the bottom with created_at: null.
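To reattach to a live run after a lost POST /run response, a client can scan the listing for the first in_progress entry. This sketch operates on an already-decoded response body; find_live_run is a hypothetical helper name.

```python
def find_live_run(runs_payload):
    """Return the newest run with status 'in_progress', or None if there is none."""
    for run in runs_payload.get("items", []):
        if run.get("status") == "in_progress":
            # The listing is newest first, so the first hit is the newest live run.
            return run
    return None


payload = {
    "items": [
        {"run_id": "9b64941a-4545-4c57-9174-c70e781d9192", "status": "in_progress"},
        {"run_id": "older-run", "status": "completed"},
    ],
    "total": 2,
}
live = find_live_run(payload)
print(live["run_id"] if live else "no live run")
```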

GET `/v1/async/collections/{collection_id}/runs/{run_id}`

Queries the status and result of an execution. Call periodically until status is completed.

Request

curl 'https://api.scrapingpros.com/v1/async/collections/c38b0bcf-cb7c-4728-8704-2c2e267dcff9/runs/9b64941a-4545-4c57-9174-c70e781d9192' \
  -H 'Authorization: Bearer <API-KEY>'

Response -- in progress (200)

{
  "run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
  "status": "in_progress",
  "total_requests": 2,
  "success_requests": 1,
  "failed_requests": 0,
  "timeout_requests": 0,
  "collection_id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9"
}

Response -- completed without errors (200)

{
  "run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
  "status": "completed",
  "total_requests": 2,
  "success_requests": 2,
  "failed_requests": 0,
  "timeout_requests": 0,
  "collection_id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9",
  "failed_jobs": []
}

Response -- completed with errors (200)

{
  "run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
  "status": "completed",
  "total_requests": 3,
  "success_requests": 2,
  "failed_requests": 1,
  "timeout_requests": 0,
  "collection_id": "c38b0bcf-cb7c-4728-8704-2c2e267dcff9",
  "failed_jobs": [
    {
      "job_id": "e3a1b2c4-...",
      "url": "https://example.com/page-that-failed",
      "custom_id": "tour_12346",
      "status": "failed",
      "error": "Connection timeout"
    }
  ]
}

Response Fields

Field	Type	Description
`run_id`	string (UUID)	Unique identifier of the execution
`status`	string	Status: `in_progress` or `completed`
`total_requests`	integer	Total requests in the collection
`success_requests`	integer	Requests that delivered usable content: worker completed, target responded with HTTP 2xx, and no block signal (captcha, softblock, etc.) was detected.
`failed_requests`	integer	Requests that did not deliver usable content. Includes: worker failures, 4xx/5xx from the target, captcha pages, and any response flagged by the HTML validator.
`timeout_requests`	integer	Requests that timed out at the worker level (never got a response from the target).
`success_criterion`	object \| null	Declarative description of the active classification rule. Fields: `version` (e.g. `content_success_v1`) and `rules` (human-readable predicates). Lets clients pin the rule version in their SDK and detect silent policy changes.

Counters measure content success

success_requests counts requests that returned usable content to the client, not worker-level completions. A job that finished but whose target returned a 500 or a captcha page counts as failed. This aligns the counter with what you're paying for: HTML you can use.

For worker-level health (how many jobs finished without infrastructure failure, regardless of content), use the per-job listing with status_filter=completed and count the items yourself.
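The difference between the two views can be sketched on the client using the per-job listing fields. The fallback rule below is only an approximation of the description above (worker completed, HTTP 2xx, no block signal); when the listing provides is_success, that server verdict is used instead. Function names are illustrative.

```python
def is_content_success(job):
    """Prefer the server's is_success; otherwise approximate the documented rule."""
    verdict = job.get("is_success")
    if verdict is not None:
        return verdict
    status_code = job.get("status_code")
    return (
        job.get("status") == "completed"
        and status_code is not None
        and 200 <= status_code < 300
        and not job.get("block_reason")
    )


def summarize(jobs):
    """Count worker-level completions and content-level successes separately."""
    return {
        "worker_completed": sum(1 for j in jobs if j.get("status") == "completed"),
        "content_success": sum(1 for j in jobs if is_content_success(j)),
    }


jobs = [
    {"status": "completed", "status_code": 200, "block_reason": None},
    {"status": "completed", "status_code": 500, "block_reason": None},
    {"status": "completed", "status_code": 200, "block_reason": "captcha"},
    {"status": "failed", "status_code": None, "block_reason": None},
]
print(summarize(jobs))
```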

Where are the scraping results?

The run status endpoint returns a summary (total / success / failed counters) but does not include the scraped content inline. To retrieve per-job data use the two endpoints below:

List all jobs of a run (cursor-paginated, with URL and timings): GET /v1/async/collections/{collection_id}/runs/{run_id}/jobs
Full result of a specific job (HTML/JSON body): GET /v1/async/collections/{collection_id}/runs/{run_id}/jobs/{job_id}/result

Result bodies are retained for 48 hours after job completion. Metadata in the listing endpoint is retained for 90 days. The list of job_ids on the run itself is available for the lifetime of the run (it falls back to the durable record if the cache misses), so you can always enumerate the jobs of a run regardless of how long ago it completed.

GET `/v1/async/collections/{collection_id}/runs/{run_id}/jobs`

Lists all jobs belonging to a run, with cursor-based pagination. Returns URL, status, timings, custom_id, and validator metadata for every job — without the (potentially large) HTML body.

Typical use: iterate over completed jobs to choose which full results to download, or build a custom dashboard.

Request

curl 'https://api.scrapingpros.com/v1/async/collections/c38b0bcf-.../runs/9b64941a-.../jobs?limit=100' \
  -H 'Authorization: Bearer <API-KEY>'

Query parameters

Param	Type	Default	Description
`cursor`	string	(none)	Opaque cursor returned by the previous page. Omit on the first call. Encoding depends on `order_by` (see below) — passing a cursor generated with a different `order_by` responds with 400.
`limit`	integer	100	Page size. Min 1, max 1000.
`status_filter`	string / CSV	(none)	Filter by status. Accepts a single value or a CSV for multiple: `completed`, `failed`, `timeout`, `processing`. Example: `status_filter=completed,failed,timeout`. Omit to return all.
`since_completed_at`	ISO 8601 string	(none)	When set, returns only rows with `completed_at` strictly greater. Accepts `Z`, `+00:00`, or naive (treated as UTC). Rows with NULL `completed_at` (e.g. still `processing`) are excluded. Useful for incremental polling to avoid re-fetching already-seen completions.
`order_by`	`id` \| `completed_at`	`id`	Sort order. `id` preserves legacy behaviour (insertion order). `completed_at` sorts by when the job finished — ideal for streaming completions as they happen.
`order_dir`	`asc` \| `desc`	`asc`	Direction (only honored for `order_by=completed_at`; `order_by=id` is always ASC).

Cursor encoding

order_by=id: base64(str(auto_increment_id)) — back-compat. Stable across deploys.
order_by=completed_at: base64("<iso_ts>|<job_public_id>"). The tuple (completed_at, job_public_id) is a stable position in the ordering; the job_public_id breaks ties when two jobs complete at the same millisecond.

Cursors have no TTL. A cursor stays valid for as long as the partition containing newer rows exists (90-day retention). Once the run's partition is dropped, the cursor returns an empty page without error.
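For debugging, the two encodings described above can be decoded with the standard library. This is a sketch based solely on that description; production clients should keep treating cursors as opaque and pass them back unchanged. The name decode_cursor is hypothetical.

```python
import base64


def decode_cursor(cursor, order_by="id"):
    """Decode a cursor for inspection.

    order_by="id"           -> int (the auto-increment id)
    order_by="completed_at" -> (iso_timestamp, job_public_id)
    """
    text = base64.b64decode(cursor).decode("utf-8")
    if order_by == "id":
        return int(text)
    completed_at, job_public_id = text.split("|", 1)
    return completed_at, job_public_id


print(decode_cursor("MzQ="))  # the cursor_next value used in the sample response
```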

Response (200)

{
  "items": [
    {
      "job_public_id": "e3a1b2c4-...",
      "run_public_id": "9b64941a-...",
      "collection_id": "c38b0bcf-...",
      "status": "completed",
      "url": "https://example.com/tours/123",
      "custom_id": "tour_12345",
      "url_truncated": false,
      "status_code": 200,
      "message": null,
      "queued_at": "2026-04-23T12:00:00.123",
      "started_at": "2026-04-23T12:00:02.267",
      "completed_at": "2026-04-23T12:00:03.637",
      "execution_time_ms": 1370,
      "retries_attempted": 0,
      "block_reason": null,
      "protection_stack": ["cloudflare"],
      "rule_hits": []
    }
  ],
  "cursor_next": "MzQ=",
  "has_more": true
}

Response fields

Field	Type	Description
`job_public_id`	UUID	Job identifier
`url`	string	Target URL of the scrape. Truncated to 2048 chars — see `url_truncated`
`url_truncated`	bool	`true` if the original URL exceeded the storage column size and was cut. Compare URLs with care if this is `true`
`custom_id`	string \| null	Client-supplied identifier, echoed from the collection request
`status`	string	`completed` / `failed` / `timeout` / `processing`
`status_code`	integer \| null	HTTP status returned by the target site
`queued_at` / `started_at` / `completed_at`	ISO 8601	Lifecycle timestamps (UTC, millisecond precision)
`execution_time_ms`	integer	End-to-end duration in ms
`retries_attempted`	integer	0 when the first attempt succeeded
`block_reason`	string \| null	Populated by the HTML validator when content is flagged (captcha, softblock, shell, hard_block, etc.)
`protection_stack`	array of strings	Anti-bot providers detected on the target (e.g. `["cloudflare", "datadome"]`)
`rule_hits`	array of strings	Validator rules that fired (debug/diagnostic; may be empty in success cases)
`is_success`	bool \| null	Pre-computed content-success verdict. `true` = the client received usable HTML (2xx + no captcha/block). `false` = worker completed but content is not usable (4xx/5xx/captcha) OR worker failed/timed out. `null` = verdict not computed (pre-migration rows). Sum of rows with `is_success=true` equals `run.success_requests` by construction — classify jobs on the client without replicating the rule.
`cursor_next`	string \| null	Cursor to request the next page. `null` when there are no more items
`has_more`	bool	`true` when additional pages exist

Paginating the full run

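import requests

BASE = "https://api.scrapingpros.com"
H = {"Authorization": "Bearer <API-KEY>"}
cid = "<COLLECTION-ID>"  # collection to read from
rid = "<RUN-ID>"         # run_id returned by POST /run
url = f"{BASE}/v1/async/collections/{cid}/runs/{rid}/jobs"


def process(job):
    # Placeholder: store in your DB, queue HTML download, etc.
    print(job["job_public_id"], job["status"])


cursor = None
while True:
    params = {"limit": 500}
    if cursor:
        params["cursor"] = cursor
    resp = requests.get(url, headers=H, params=params)
    resp.raise_for_status()  # surface 4xx/5xx instead of failing on a missing key
    page = resp.json()
    for job in page["items"]:
        process(job)
    if not page["has_more"]:
        break
    cursor = page["cursor_next"]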

There is a ~5-second lag between a job completing and appearing in this listing (the metadata flusher runs on a 5s tick). For strict real-time notification, use webhooks (coming in a future release).

Efficient incremental polling

When iterating a large batch (thousands of URLs), combine order_by=completed_at, since_completed_at, and status_filter to fetch only new completions since the last poll:

import requests

BASE = "https://api.scrapingpros.com"
H = {"Authorization": "Bearer <API-KEY>"}
cid = "<COLLECTION-ID>"
rid = "<RUN-ID>"


def handle(job):
    # Placeholder: replace with your own processing.
    print(job["job_public_id"], job["status"], job["completed_at"])


# Track the newest completed_at we have already consumed.
last_seen = None

while True:
    params = {
        "order_by": "completed_at",
        "status_filter": "completed,failed,timeout",
        "limit": 1000,
    }
    if last_seen:
        params["since_completed_at"] = last_seen

    r = requests.get(
        f"{BASE}/v1/async/collections/{cid}/runs/{rid}/jobs",
        headers=H, params=params,
    )
    r.raise_for_status()
    page = r.json()
    for job in page["items"]:
        handle(job)
        # ISO 8601 strings in the same format compare correctly as text.
        if job["completed_at"] and (not last_seen or job["completed_at"] > last_seen):
            last_seen = job["completed_at"]
    if not page.get("has_more"):
        break  # caught up; poll again after a short sleep

For a batch of 50 000 URLs this reduces polling cost from ~50 API calls per tick (full pagination + client-side dedup) to ~1 call per tick.
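The figures above follow from the maximum page size of 1000. A small worked calculation, with illustrative function names, assuming each tick brings fewer new completions than one page can hold:

```python
import math


def calls_for_full_scan(total_jobs, page_size=1000):
    """Pages needed to walk the whole run once."""
    return math.ceil(total_jobs / page_size)


def calls_for_incremental_tick(new_completions, page_size=1000):
    """Pages needed to fetch only the completions since the last poll (at least 1)."""
    return max(1, math.ceil(new_completions / page_size))


print(calls_for_full_scan(50_000))      # full pagination on every tick
print(calls_for_incremental_tick(300))  # e.g. 300 new completions since the last tick
```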

GET `/v1/async/collections/{collection_id}/runs/{run_id}/jobs/{job_id}/result`

Returns the full scraping result of a single job — same shape as the response of POST /v1/sync/scrape, plus url and custom_id for traceability.

Request

curl 'https://api.scrapingpros.com/v1/async/collections/c38b0bcf-.../runs/9b64941a-.../jobs/e3a1b2c4-.../result' \
  -H 'Authorization: Bearer <API-KEY>'

Response (200)

{
  "url": "https://example.com/tours/123",
  "custom_id": "tour_12345",
  "status": "completed",
  "html": "<!doctype html> ...",
  "statusCode": 200,
  "extracted_data": null,
  "timings": { "total_ms": 1370, "navigation_ms": 890 },
  "executionTime": 1.37,
  "potentiallyBlockedByCaptcha": false,
  "guidance": {
    "success": true,
    "error_type": null,
    "next_steps": [],
    "suggested_request": null,
    "stop_reason": null
  }
}

The response shape matches POST /v1/sync/scrape — guidance in particular is now populated in the async /result endpoint so clients have the same post-hoc diagnostics they already get in sync mode (why a request failed, which parameters to adjust, whether to retry).

Retention

HTML bodies are retained for 48 hours after job completion. After that window, the listing metadata remains for 90 days but this endpoint returns 404. If you need longer retention, download the result once the job completes (or subscribe to the run callback_url) and persist on your side.
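Before requesting a body, a client can check locally whether the job is still inside the 48-hour window, using completed_at from the listing endpoint. This is a sketch with hypothetical helper names; the server remains the source of truth, so a 404 must still be handled.

```python
from datetime import datetime, timedelta, timezone

RESULT_RETENTION = timedelta(hours=48)  # retention window stated above


def parse_utc(ts):
    """Parse an ISO 8601 timestamp; naive values are treated as UTC."""
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def body_should_be_available(completed_at, now=None):
    """True while the job's completion time is within the retention window."""
    now = now or datetime.now(timezone.utc)
    return now - parse_utc(completed_at) < RESULT_RETENTION


print(body_should_be_available("2026-04-23T12:00:03.637"))
```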

When the result is not available (404)

If the job completed and its body is still stored, you receive 200 with the result body. If the body is unavailable, the response is 404 with a structured detail whose error_code tells you why, so you can react accordingly:

HTTP 404
{
  "detail": {
    "error_code": "result_lost",
    "message": "Job result is unavailable due to a service incident during the completion window. Contact support if the data is critical — it may qualify for refund.",
    "completed_at": "2026-04-30T12:34:56Z",
    "age_hours": 0.4
  }
}

`error_code`	What it means	Suggested action
`result_pending`	The job is still in flight (or the worker did not store a result yet).	Retry shortly — typical jobs complete in seconds.
`result_expired`	The job completed longer ago than the result retention window (see Retention). The metadata is still in the listing endpoint, but the body has been pruned.	Re-run the collection if you still need the data.
`result_lost`	The job completed within the result retention window, but the body is unavailable.	Contact support; this may qualify for a refund.
`job_id_invalid`	We have no record of this `job_id` for the given `run_id`.	Verify the IDs you're using; this typically points to a client bug.

The detail field was a plain string in older versions of the API and may still be a string on some edge paths (e.g. when the upstream lookup itself times out). Robust clients should accept both shapes.
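Accepting both shapes can be as simple as the sketch below, which normalizes a decoded 404 body into an (error_code, message) pair. The helper names are illustrative, and treating only result_pending as retryable is an assumption drawn from the suggested actions in the table above.

```python
def parse_unavailable(payload):
    """Normalize a 404 body; error_code is None for the legacy string shape."""
    detail = payload.get("detail")
    if isinstance(detail, dict):
        return detail.get("error_code"), detail.get("message")
    if isinstance(detail, str):
        return None, detail
    return None, None


def should_retry(payload):
    """Retry only when the job is still in flight."""
    code, _ = parse_unavailable(payload)
    return code == "result_pending"


structured = {"detail": {"error_code": "result_pending", "message": "Job still running."}}
legacy = {"detail": "Result not found"}
print(parse_unavailable(structured), should_retry(structured))
print(parse_unavailable(legacy), should_retry(legacy))
```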

Example: polling until completion and downloading results

import time

import requests

BASE = "https://api.scrapingpros.com"
H = {"Authorization": "Bearer <API-KEY>"}
COLLECTION_ID = "c38b0bcf-cb7c-4728-8704-2c2e267dcff9"

# 1. Start the run
run = requests.post(
    f"{BASE}/v1/async/collections/{COLLECTION_ID}/run",
    headers=H
).json()
run_id = run["run_id"]

# 2. Poll the run status until completed
while True:
    status = requests.get(
        f"{BASE}/v1/async/collections/{COLLECTION_ID}/runs/{run_id}",
        headers=H
    ).json()
    print(f"Status: {status['status']} — "
          f"{status['success_requests']}/{status['total_requests']} successful")
    if status["status"] == "completed":
        break
    time.sleep(5)

# 3. Iterate jobs via cursor, download HTML, match by custom_id
cursor, results = None, {}
while True:
    params = {"limit": 500, "status_filter": "completed"}
    if cursor:
        params["cursor"] = cursor
    page = requests.get(
        f"{BASE}/v1/async/collections/{COLLECTION_ID}/runs/{run_id}/jobs",
        headers=H, params=params
    ).json()
    for job in page["items"]:
        # Download full body for jobs you care about
        r = requests.get(
            f"{BASE}/v1/async/collections/{COLLECTION_ID}/runs/{run_id}/jobs/{job['job_public_id']}/result",
            headers=H
        )
        if r.status_code != 200:
            continue  # body unavailable (e.g. result_expired); see the 404 table
        results[job["custom_id"] or job["url"]] = r.json()["html"]
    if not page["has_more"]:
        break
    cursor = page["cursor_next"]

# 4. (optional) Check failed jobs
if status.get("failed_jobs"):
    print("Failed jobs:")
    for job in status["failed_jobs"]:
        print(f"  - custom_id={job.get('custom_id')} url={job['url']}: {job['error']}")

Python SDK

The SDK (pip install "scrapingpros>=0.3.0") offers client.iter_run_jobs(collection_id, run_id) as a generator that handles cursor pagination internally, plus typed models with datetime parsing for the timestamp fields.
