Scraping Pros API
Reference documentation for the Scraping Pros API.
Sync Endpoints
POST /v1/sync/scrape
This endpoint extracts the HTML from a page and returns it as soon as it is obtained. The request body supports the following structure (only url is required):
{
"url": "https://example.com",
"browser": true,
"language": "en-us",
"screenshot": true,
"use_proxy": {
"proxy": "any",
"max_retries": 3,
"delay_seconds": 1,
"backoff_factor": 2
},
"headers": {
"X-Custom-Header": "value"
},
"actions": [
{
"type": "input",
"selector": "//input[@name='search']",
"text": "search query"
},
{
"type": "click",
"selector": "css:button[type='submit']",
"wait_for_navigation": true
}
],
"extract": {
"title": "css:h1",
"prices": {
"selector": "css:.price",
"multiple": true
}
},
"cookies": {
"key_1": "value_1",
"key_2": "value_2"
}
}
Response
{
"html": "<html>...</html>",
"markdown": null,
"statusCode": 200,
"message": "Request completed successfully",
"executionTime": 3.45,
"screenshot": "base64...",
"extracted_data": {
"title": "Example Domain",
"prices": ["$10.99", "$24.99", "$5.00"]
},
"evaluate_results": null,
"network_requests": null,
"potentiallyBlockedByCaptcha": false,
"timings": {
"queue_wait_ms": 45,
"proxy_ms": 120,
"browser_launch_ms": 2300,
"navigation_ms": 8500,
"extraction_ms": 150
}
}
- html: The HTML of the scraped page (null when format=markdown).
- markdown: Clean text in markdown format (null when format=html).
- statusCode: HTTP status code of the page.
- message: Message indicating the result of the operation.
- executionTime: Execution time in seconds.
- screenshot: Base64-encoded screenshot (only if screenshot: true).
- extracted_data: Extracted data (only if extract is used).
- evaluate_results: Array with the results of each evaluate action executed, in order (only if evaluate actions were used).
- network_requests: List of captured network requests (only if network_capture was used).
- potentiallyBlockedByCaptcha: Boolean indicating whether the response appears to be a blocking or captcha page.
- timings: Object with detailed timing metrics. Always present, even on errors (with partial values). Useful for performance diagnostics.
- guidance: AI-friendly guidance object, present in every response:
  - success: true only if real content was returned without blocks
  - error_type: classified error (captcha, timeout, proxy_error, ssl_error, dns_error, empty_content, rate_limited, site_error)
  - error_provider: specific CAPTCHA provider if detected (cloudflare, amazon, datadome, perimeterx, akamai, recaptcha, hcaptcha)
  - next_steps: ordered list of what to try next (empty on success)
  - suggested_request: ready-to-use request body for the next attempt (null on success or when retrying won't help)
  - stop_reason: if present, do NOT retry; the issue is permanent (SSL error, DNS failure, 404, all bypass methods exhausted)
- credits_charged / credits_refunded: transparent credit tracking per request
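A minimal Python sketch of acting on the guidance object. The decision rules follow the field descriptions above; the function name and return convention are illustrative, not part of any SDK:

```python
def next_attempt(response_json):
    """Decide what to do after a scrape based on the guidance object.

    Returns ("done", None), ("stop", reason), or ("retry", body), where
    body is the server-suggested request for the next attempt.
    """
    guidance = response_json.get("guidance", {})
    if guidance.get("success"):
        return ("done", None)
    if guidance.get("stop_reason"):
        # Permanent failure (SSL/DNS error, 404, bypass exhausted): never retry.
        return ("stop", guidance["stop_reason"])
    if guidance.get("suggested_request"):
        # The API hands back a ready-to-use body for the next attempt.
        return ("retry", guidance["suggested_request"])
    return ("stop", guidance.get("error_type", "unknown"))


# Example guidance payloads shaped like the response documented above
ok = {"guidance": {"success": True, "next_steps": []}}
blocked = {"guidance": {"success": False, "error_type": "captcha",
                        "suggested_request": {"url": "https://example.com",
                                              "browser": True}}}
```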
Credit headers in each response:
X-Credits-Charged: 5 # credits charged for this request (1 simple, 5 browser)
X-Quota-Used: 1523 # credits used this month
X-Quota-Remaining: 48477 # credits remaining
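For example, the three headers can be read into integers from a requests-style headers dict. parse_credit_headers is a hypothetical helper, not part of any SDK:

```python
def parse_credit_headers(headers):
    """Extract credit accounting from the response headers documented above."""
    return {
        "charged": int(headers.get("X-Credits-Charged", 0)),
        "used": int(headers.get("X-Quota-Used", 0)),
        "remaining": int(headers.get("X-Quota-Remaining", 0)),
    }


# e.g. response.headers from the requests library
usage = parse_credit_headers({"X-Credits-Charged": "5",
                              "X-Quota-Used": "1523",
                              "X-Quota-Remaining": "48477"})
```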
Request Parameters
url (required)
The URL of the site to scrape.
"url": "https://example.com"
browser
Optional field, default value: false. If set to true,
a browser will be used for scraping, which is slower but more reliable
for dynamic sites or sites with JavaScript.
format
Optional field, default value: "html". Possible values: "html" or "markdown".
When "markdown" is used, the response returns clean text in the markdown field (instead of html). Scripts, styles, navigation, footers, and boilerplate are removed. Ideal for AI/LLM consumption and RAG pipelines.
"format": "markdown"
When format=markdown: the markdown field contains the text, html is null.
When format=html (default): the html field contains the HTML, markdown is null.
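A small helper can hide the format switch when consuming responses (illustrative only):

```python
def page_content(data):
    """Return whichever content field is populated: markdown when
    format=markdown was requested, html otherwise (the other is null)."""
    if data.get("markdown") is not None:
        return data["markdown"]
    return data["html"]
```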
retry_on_block
Optional field, default value: false. When enabled (true), the server automatically retries up to 3 times with a different IP/fingerprint when it detects a CAPTCHA or 403 response.
"retry_on_block": true
Only credits for the successful attempt are charged. If all attempts fail, 1 set of credits is charged and the result of the last attempt is returned (with potentiallyBlockedByCaptcha: true).
Early CAPTCHA detection: when the browser detects a CAPTCHA or block on the page, it returns immediately in ~5 seconds (instead of waiting 60-85s until timeout). This applies both with and without retry_on_block enabled. With retry_on_block=true, each retry also benefits from early detection, achieving up to 3 attempts in ~15 seconds total.
screenshot
Optional field, default value: false. When set to true, the scraper will take a screenshot of the page and return it as a base64 encoded string.
use_proxy
This field is optional and can have 2 formats:
- string (Chooses a random proxy)
- Object (Advanced proxy configuration with retry system)
string
If use_proxy is set to "any", the scraper will use a proxy intelligently chosen by the system. If the value is "<country_code>", a proxy from a specific country can be selected.
"use_proxy": "any"
"use_proxy": "MX"
Object with retries
With this format, a retry system can be configured in case a request fails due to a problem with the selected proxy.
- proxy: Same format as the string ("any" or <country_code>).
- max_retries: Maximum number of retries.
- delay_seconds: Initial delay before retrying the scrape.
- backoff_factor: Multiplier applied after each retry.
Example:
"use_proxy": {
"proxy": "US",
"max_retries": 3,
"delay_seconds": 1,
"backoff_factor": 2
}
With the example above, the delay is applied as follows:
| Attempt | Delay |
|---|---|
| 1 | 1s |
| 2 | 2s |
| 3 | 4s |
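The schedule follows delay_seconds * backoff_factor^(attempt - 1), which can be sketched as:

```python
def retry_delays(max_retries, delay_seconds, backoff_factor):
    """Delay (in seconds) applied before each retry attempt,
    reproducing the documented exponential backoff table."""
    return [delay_seconds * backoff_factor ** i for i in range(max_retries)]
```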
Country proxy
The use_proxy parameter accepts an ISO 3166-1 alpha-2 country code (e.g., "US", "MX", "GB") to use a proxy from a specific country. This is useful when a site returns different content based on the visitor's geographic location.
"use_proxy": "US"
Possible values for use_proxy:
| Value | Behavior |
|---|---|
| "any" | Proxy automatically chosen by the system, no country restriction |
| "US", "MX", etc. | Proxy from a specific country (requires prior approval) |
| Field not sent | No proxy, direct request from the server |
Approval flow
To use proxies from a specific country, your account needs prior approval. The flow is:
- Check available countries: GET /v1/proxy/countries to see which countries are available.
- Request access: POST /v1/proxy/request-country with the desired country_code. This creates a pending request.
- Wait for approval: An administrator reviews and approves the request.
- Check status: GET /v1/proxy/status to see your approved and pending countries.
- Use the proxy: Once approved, you can use "use_proxy": "US" (or the approved country code) in your requests.
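Before sending a request with a country proxy, the GET /v1/proxy/status response can be checked locally. A sketch; approved_for is a hypothetical helper:

```python
def approved_for(status_json, country_code):
    """True if the country proxy is already approved, per the
    approved_countries list in the /v1/proxy/status response."""
    return country_code in status_json.get("approved_countries", [])


# Sample /v1/proxy/status response from the documentation
status = {"client_id": "my-client",
          "approved_countries": ["US", "GB"],
          "pending_countries": ["MX"]}
```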
Error if not approved
If you try to use a proxy from a country without having approval, the response will include:
{
"html": "",
"statusCode": 403,
"message": "Country proxy 'US' is not approved for your account.",
"error_type": "country_proxy_not_approved",
"country_code": "US"
}
The error_type field with value "country_proxy_not_approved" allows detecting this case programmatically.
Country Proxy Endpoints
GET /v1/proxy/countries
Returns the list of available countries for proxies.
{
"countries": ["US", "GB", "MX", "BR", "AR", "DE", "FR", "ES"]
}
POST /v1/proxy/request-country
Requests access to proxies from a country. Creates a request that an administrator must approve.
Body:
{
"country_code": "US",
"reason": "We need to scrape prices on Amazon US"
}
Possible responses:
If the request was created (pending approval):
{
"status": "pending",
"country_code": "US",
"message": "Request submitted. An admin will review and approve your access."
}
If the country is already approved for your account:
{
"status": "already_approved",
"country_code": "US",
"message": "Country proxy 'US' is already approved for your account."
}
GET /v1/proxy/status
Shows the status of country proxy approvals for your account.
{
"client_id": "my-client",
"approved_countries": ["US", "GB"],
"pending_countries": ["MX"]
}
headers
Optional field. Allows sending custom HTTP headers with the scraping request. It is a key-value dictionary.
"headers": {
"Accept-Language": "en-US",
"X-Custom-Header": "my-value",
"ocp-apim-subscription-key": "abc123"
}
The headers are applied to the HTTP request made by the worker. Useful when a site requires specific headers to respond correctly (API keys, subscription tokens, etc.).
Note: This field only applies when browser is false (simple HTTP scraping). In browser mode, use the language field to control the language.
cookies
Optional field. Key-value dictionary with cookies to add before scraping. Can be applied both with and without browser.
"cookies": {
"session_id": "abc123",
"consent": "accepted"
}
language
Optional field. String that allows requesting a specific language from the browser when scraping. It is recommended to match the country of the proxy being used.
The format follows the language-region structure (e.g., en-us, es-ar).
In cases where the language is specified in the URL itself, it is recommended to modify the URL instead of using this field. E.g., https://en.wikipedia.org/wiki/Main_Page.
window_size
Optional field. Allows configuring the browser window size. Format: "width,height".
"window_size": "1920,1080"
network_capture
Optional field. Only available when browser is true. Allows capturing network requests made by the page during scraping. Useful for discovering internal APIs, XHR/fetch endpoints, or understanding what resources a site loads.
- resource_types (optional): Array of resource types to capture. Valid values: document, stylesheet, image, media, font, script, xhr, fetch, eventsource, websocket, manifest, other. If null or omitted, captures everything.
- listen_after_load_ms (optional): Extra milliseconds to keep listening for requests after the page finishes loading. Maximum 10000 ms. Useful for capturing requests that fire in the background.
"network_capture": {
"resource_types": ["xhr", "fetch"],
"listen_after_load_ms": 3000
}
The captured requests are returned in the network_requests field of the response. Each entry contains:
- url: URL of the request.
- method: HTTP method (GET, POST, etc.).
- resource_type: Resource type (xhr, fetch, document, etc.).
- status: HTTP status code of the response (can be null if the request did not complete).
- content_type: Content-Type of the response (without parameters, e.g., application/json).
Example response with network_requests:
{
"html": "...",
"statusCode": 200,
"network_requests": [
{
"url": "https://api.example.com/v2/products?page=1",
"method": "GET",
"resource_type": "fetch",
"status": 200,
"content_type": "application/json"
}
]
}
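A common follow-up is to filter the captured requests down to likely internal JSON APIs. A sketch, assuming the response shape shown above:

```python
def json_api_calls(response_json):
    """Return URLs of XHR/fetch requests that answered with JSON;
    these are the most likely internal API endpoints."""
    return [r["url"] for r in response_json.get("network_requests") or []
            if r["resource_type"] in ("xhr", "fetch")
            and r.get("content_type") == "application/json"]


# The sample response from the documentation above
sample = {"html": "...", "statusCode": 200,
          "network_requests": [{"url": "https://api.example.com/v2/products?page=1",
                                "method": "GET", "resource_type": "fetch",
                                "status": 200, "content_type": "application/json"}]}
```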
Actions
actions is an optional field that allows interacting with the page before performing the scraping.
It accepts an array of objects where each one must have a type field.
Can only be used when browser is set to true.
click
Click on an element.
- selector: XPath selector or CSS selector (with css: prefix)
- wait_for_navigation (optional): If true, waits for navigation to complete after the click. Useful when the page changes URL.
{
"type": "click",
"selector": "css:button[type='submit']",
"wait_for_navigation": true
}
input
Fill a text input field.
- selector: XPath selector or CSS selector
- text: The text to enter
{
"type": "input",
"selector": "//input[@name='search']",
"text": "search query"
}
select
Select an option from a dropdown menu.
- selector: XPath selector or CSS selector
- value: The value to select
{
"type": "select",
"selector": "css:select#country",
"value": "AR"
}
key-press
Press a key.
- key: The key to press. Combinations are accepted. E.g., "Shift+O", "Enter", "Tab".
{
"type": "key-press",
"key": "Enter"
}
wait-for-selector
Wait for an element to appear on the page.
- selector: XPath selector or CSS selector
- time: Maximum wait time in milliseconds
{
"type": "wait-for-selector",
"selector": "css:.results-loaded",
"time": 5000
}
wait-for-timeout
Wait a fixed amount of time before continuing to the next action.
- time: Wait time in milliseconds
{
"type": "wait-for-timeout",
"time": 3000
}
collect
Extract data from the current page and accumulate it in memory. Especially useful inside while loops to collect data as you paginate or load more results.
- extract: Dictionary of selectors. Same format as the extract parameter of the main request.
{
"type": "collect",
"extract": {
"product_names": {
"selector": "css:.product-title",
"multiple": true
},
"prices": {
"selector": "css:.price",
"multiple": true
}
}
}
When using multiple: true, results accumulate between loop iterations (the list is extended). Without multiple, the value is overwritten on each iteration.
evaluate
Execute arbitrary JavaScript code in the browser page context. Results accumulate in the evaluate_results array of the response (one entry per evaluate action).
- script: JavaScript code to execute. Can be a simple expression or an async function.
- timeout (optional): Maximum wait time in milliseconds for JS execution (default: 30000).
Simple expression
{
"type": "evaluate",
"script": "document.title"
}
Async function (internal fetch)
Useful for triggering AJAX forms or calling internal endpoints with the page's session cookies:
{
"type": "evaluate",
"script": "(async () => { const r = await fetch('/api/data'); return await r.json(); })()",
"timeout": 15000
}
Full example — AJAX form with hidden fields
Typical sequence for a form that POSTs via JavaScript:
{
"url": "https://example.com/booking",
"browser": true,
"actions": [
{
"type": "wait-for-selector",
"selector": "css:#bookingForm",
"time": 5000
},
{
"type": "input",
"selector": "css:#destination",
"text": "Buenos Aires"
},
{
"type": "evaluate",
"script": "document.querySelector('#checkin').value = '2026-05-01'"
},
{
"type": "evaluate",
"script": "(async () => { const form = document.querySelector('#bookingForm'); const data = new URLSearchParams(new FormData(form)); const r = await fetch('/search', {method: 'POST', headers: {'Content-Type': 'application/x-www-form-urlencoded'}, body: data.toString()}); return r.status; })()"
},
{
"type": "wait-for-timeout",
"time": 2000
}
]
}
The response will include:
{
"evaluate_results": ["2026-05-01", 200],
...
}
Each evaluate execution adds an element to the evaluate_results array in the order they were executed.
If the script throws an error, the result will be an object {"error": "error message"} instead of the returned value.
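When mixing several evaluate actions, it can help to split return values from errors. A sketch; note that it treats any dict with an "error" key as a failure, which is a heuristic since a script could legitimately return such a dict:

```python
def split_evaluate_results(results):
    """Separate successful evaluate return values from per-script errors.

    Heuristic: a dict containing an "error" key is treated as a failed
    script, per the error format documented above.
    """
    values, errors = [], []
    for r in results or []:
        if isinstance(r, dict) and "error" in r:
            errors.append(r["error"])
        else:
            values.append(r)
    return values, errors
```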
Selectors
The selector is XPath by default, but can be changed to CSS by using the css: prefix before the selector.
"selector": "//div[@class='product']"
"selector": "css:div.product"
Loops
while
Control structure that repeats a sequence of actions while a condition is true, or until a maximum number of iterations is reached.
- condition: Condition to continue iterating.
- actions: List of actions to execute on each iteration (see Actions).
- max_iterations: Maximum number of allowed iterations.
Accepted conditions:
- selector-visible: Iterations continue while the selector is visible on the page.
- selector-invisible: Iterations continue while the selector is NOT visible on the page.
{
"type": "selector-visible",
"selector": "css:.load-more"
}
Full example — click "Load more" until the button disappears:
{
"url": "https://example.com/products",
"browser": true,
"actions": [
{
"type": "while",
"condition": {
"type": "selector-visible",
"selector": "css:.load-more-button"
},
"actions": [
{
"type": "click",
"selector": "css:.load-more-button"
},
{
"type": "wait-for-timeout",
"time": 2000
}
],
"max_iterations": 10
}
]
}
Example with collect — paginate and accumulate data:
{
"url": "https://example.com/products",
"browser": true,
"actions": [
{
"type": "while",
"condition": {
"type": "selector-visible",
"selector": "css:button.next-page"
},
"actions": [
{
"type": "collect",
"extract": {
"titles": {
"selector": "css:h3.product-name",
"multiple": true
}
}
},
{
"type": "click",
"selector": "css:button.next-page"
},
{
"type": "wait-for-timeout",
"time": 1500
}
],
"max_iterations": 20
}
]
}
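Request bodies like the one above can also be generated programmatically. A hypothetical builder; the function name, parameters, and defaults are illustrative, not part of the API:

```python
def paginate_request(url, item_selectors, next_selector,
                     max_pages=20, wait_ms=1500):
    """Build a /v1/sync/scrape body that clicks next_selector until it
    disappears, collecting item_selectors on every page."""
    return {
        "url": url,
        "browser": True,  # while loops and collect require browser mode
        "actions": [{
            "type": "while",
            "condition": {"type": "selector-visible", "selector": next_selector},
            "actions": [
                # collect with multiple: true accumulates across iterations
                {"type": "collect",
                 "extract": {name: {"selector": sel, "multiple": True}
                             for name, sel in item_selectors.items()}},
                {"type": "click", "selector": next_selector},
                {"type": "wait-for-timeout", "time": wait_ms},
            ],
            "max_iterations": max_pages,
        }],
    }


body = paginate_request("https://example.com/products",
                        {"titles": "css:h3.product-name"},
                        "css:button.next-page")
```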
Extract
extract is an optional field of the main request that allows extracting specific data from the page using CSS selectors. Only works with browser: true.
Supports two formats:
Simple format (string)
"extract": {
"title": "css:h1",
"description": "css:meta[name='description']"
}
Advanced format (object)
"extract": {
"all_prices": {
"selector": "css:.price",
"multiple": true
},
"main_image_src": {
"selector": "css:img.hero",
"attribute": "src"
},
"product_classes": {
"selector": "css:h3.product-title",
"multiple": true,
"attribute": "class"
}
}
- selector: CSS selector (with css: prefix)
- multiple (optional): If true, returns all elements matching the selector as an array. If false (default), returns only the first one.
- attribute (optional): Extracts the value of an element's attribute instead of its text. E.g., "href", "src", "class".
The extracted data is returned in the extracted_data field of the response.
http_method
Optional field, only available when browser is false (simple scraping).
- GET: Performs a standard GET request. Same behavior as omitting this field. Does not accept a payload.
- POST: Performs a POST request. The payload is optional.
"http_method": {
"method": "get"
}
"http_method": {
"method": "post",
"payload": {"category": "dogs", "page": 1}
}
Response
The successful endpoint response has the following format:
{
"html": "<html>...</html>",
"statusCode": 200,
"message": "OK",
"screenshot": null,
"executionTime": 1.23,
"extracted_data": null,
"potentiallyBlockedByCaptcha": false
}
potentiallyBlockedByCaptcha
Boolean field that indicates whether the received response appears to be a blocking or captcha page. Useful for easily detecting when a site is blocking scraper access without needing to manually analyze the HTML.
It is marked as true in the following cases:
- The server responds with a 403, 429, or 503 status code.
- The response HTML contains typical blocking signals, such as:
  - Phrases like "Are you a human?", "I'm not a robot", "Verify you are human"
  - Presence of captcha services: captcha, reCAPTCHA, hCAPTCHA
  - Cloudflare pages: "Just a moment...", "Checking your browser"
  - Unusual traffic or suspicious activity messages
Usage example:
import requests

response = requests.post(
    "https://api.scrapingpros.com/v1/sync/scrape",
    headers={"Authorization": "Bearer <API-KEY>"},
    json={"url": "https://example.com", "browser": False},
)
data = response.json()
if data["potentiallyBlockedByCaptcha"]:
    print("The site may be blocking access.")
This field is a heuristic, not a guaranteed detection. A false does not ensure the page is not blocked, and a true does not guarantee it is — it only indicates that common blocking signals were detected.
POST /v1/sync/download
This endpoint downloads a file directly from a URL (PDF, JPG image, PNG, etc.) and returns its content encoded in base64 along with the detected content type.
{
"url": "https://example.com/document.pdf",
"use_proxy": "any"
}
Response
{
"content": "JVBERi0xLjQK...",
"contentType": "application/pdf",
"statusCode": 200,
"message": "OK",
"executionTime": 0.312
}
- content: The file content encoded in base64.
- contentType: MIME type of the file returned by the server (e.g., application/pdf, image/png, image/jpeg).
- statusCode: HTTP status code of the file server response.
- message: Result message or error description.
- executionTime: Execution time in seconds.
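Decoding the response in Python is a one-step base64 round trip, sketched here with a fake payload instead of a live call:

```python
import base64


def save_download(response_json, path):
    """Decode the base64 `content` field and write the file to disk.
    Returns the number of bytes written."""
    raw = base64.b64decode(response_json["content"])
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)


# Fake /v1/sync/download response used for the round-trip demo
fake = {"content": base64.b64encode(b"%PDF-1.4 demo").decode(),
        "contentType": "application/pdf", "statusCode": 200}
```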
Parameters
url (required)
The direct URL to the file to download.
"url": "https://example.com/report.pdf"
use_proxy
Optional field. Same format as in /v1/sync/scrape. Useful when the file server restricts access by source IP.
"use_proxy": "any"
GET /v1/sync/metrics
This endpoint retrieves global metrics about the API's operation. It accepts two optional parameters and returns metrics in JSON format:
{
"date": "2026-02-11",
"scrape_type": {
"browser": {
"total": 21,
"success": 21,
"failed": 0,
"success_rate": 100,
"percentage_of_total": 52.5
},
"simple": {
"total": 19,
"success": 18,
"failed": 1,
"success_rate": 94.74,
"percentage_of_total": 47.5
},
"total_requests": 40
}
}
Parameters
- date: Date or date range. Accepted formats:
  - YYYY-MM-DD for a specific day
  - YYYY-MM-DD:YYYY-MM-DD for a date range
- metric: Allows requesting data about a specific type. Accepted values: url, proxy, api_codes, page_codes, exe_time, scrape_type
GET /v1/sync/client-metrics
Endpoint to retrieve per-client usage metrics. Each authenticated client sees only their own metrics. Administrators can see metrics for all clients.
Parameters
- date: Date or range. Accepted formats:
  - YYYY-MM-DD for a specific day (default: today)
  - YYYY-MM-DD:YYYY-MM-DD for a date range
  - YYYY-MM for a full month
- client (admin only): Filter by a specific client_id
- hourly: If true, includes an hourly breakdown (single day only)
- detail: If "urls", includes a per-domain breakdown with fields browser_success, browser_failed, simple_success, simple_failed
Example
curl 'https://api.scrapingpros.com/v1/sync/client-metrics?date=2026-03-25&hourly=true' \
-H 'Authorization: Bearer <API-KEY>'
Example with per-domain breakdown
curl 'https://api.scrapingpros.com/v1/sync/client-metrics?date=2026-03&detail=urls' \
-H 'Authorization: Bearer <API-KEY>'
GET /v1/sync/billing
Billing endpoint that returns a precise per-client usage summary, calculated from MySQL (not a Redis approximation). Ideal for generating monthly consumption reports.
Parameters
- month: Month in
YYYY-MMformat (default: current month) - client (admin only): Filter by a specific client_id
- detail: If
"urls", includes a per-domain breakdown (by_url)
Example
curl 'https://api.scrapingpros.com/v1/sync/billing?month=2026-03' \
-H 'Authorization: Bearer <API-KEY>'
Response
{
"month": "2026-03",
"clients": {
"my-client": {
"simple_success": 15000,
"simple_failed": 200,
"simple_total": 15200,
"browser_success": 8000,
"browser_failed": 150,
"browser_total": 8150,
"total_requests": 23350,
"total_success": 23000,
"total_failed": 350
}
}
}
Response with detail=urls
When detail=urls is passed, each client includes a by_url field with the per-domain breakdown:
{
"month": "2026-03",
"clients": {
"my-client": {
"simple_success": 15000,
"simple_failed": 200,
"simple_total": 15200,
"browser_success": 8000,
"browser_failed": 150,
"browser_total": 8150,
"total_requests": 23350,
"total_success": 23000,
"total_failed": 350,
"by_url": {
"example.com": {
"simple_success": 10000,
"simple_failed": 100,
"browser_success": 5000,
"browser_failed": 50,
"total": 15150
},
"other-site.com": {
"simple_success": 5000,
"simple_failed": 100,
"browser_success": 3000,
"browser_failed": 100,
"total": 8200
}
}
}
}
}
- simple_success / simple_failed: Successful/failed requests without browser.
- browser_success / browser_failed: Successful/failed requests with browser.
- simple_total / browser_total: Totals by type.
- total_requests: Total of all requests.
- total_success / total_failed: Success/failure totals.
- by_url: Per-domain breakdown (only if detail=urls).
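For example, a monthly report might derive per-client success rates from this response. An illustrative, client-side helper:

```python
def success_rates(billing_json):
    """Per-client success rate (%) from a /v1/sync/billing response."""
    rates = {}
    for client, counts in billing_json["clients"].items():
        total = counts["total_requests"]
        rates[client] = round(100 * counts["total_success"] / total, 2) if total else None
    return rates


# Trimmed version of the sample billing response above
sample = {"month": "2026-03",
          "clients": {"my-client": {"total_requests": 23350,
                                    "total_success": 23000,
                                    "total_failed": 350}}}
```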
GET /v1/health
Health check endpoint for monitoring and observability. Does not require authentication.
Checks the status of all critical API components: Redis, MySQL, proxies (internal and external), workers, and queues.
Example
curl 'https://api.scrapingpros.com/v1/health'
Response
{
"status": "healthy",
"checks": {
"redis": {
"status": "ok",
"latency_ms": 1.2
},
"mysql": {
"status": "ok",
"latency_ms": 3.5
},
"proxies_api": {
"internal": {
"status": "ok",
"latency_ms": 15.0
},
"external": {
"status": "ok",
"latency_ms": 120.0
}
},
"workers": {
"sync": {"up": 50, "expected": 50},
"async": {"up": 6, "expected": 6}
},
"queues": {
"pending_jobs": 0,
"async_scheduler": 0
}
},
"uptime_seconds": 86400
}
Possible statuses
| Status | Meaning |
|---|---|
| healthy | All components functioning correctly |
| degraded | Some non-critical component has issues (external proxies, workers below 90%, queue with >500 jobs) |
| unhealthy | A critical component is down (Redis, MySQL, or internal proxies) |
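The three statuses map naturally onto alerting levels. A minimal sketch for a monitoring hook; the mapping is a suggestion, not prescribed by the API:

```python
def should_page(health_json):
    """Map /v1/health status to an alerting action:
    page on-call for unhealthy, warn on degraded, otherwise ok."""
    status = health_json.get("status")
    if status == "unhealthy":
        return "page"
    if status == "degraded":
        return "warn"
    return "ok"
```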
Interactive Examples
The interactive playground covers the following common use cases:

| Example | Endpoint | Description |
|---|---|---|
| Simple scraping (without browser) | /v1/sync/scrape | Get the HTML of example.com without browser |
| Markdown output (for AI/LLM) | /v1/sync/scrape | Get clean text in markdown format |
| With browser | /v1/sync/scrape | Scrape with headless browser |
| Retry on block (anti-CAPTCHA) | /v1/sync/scrape | Auto-retry with a different IP if blocked |
| With browser + proxy + screenshot | /v1/sync/scrape | Browser with proxy and screenshot capture |
| With browser actions | /v1/sync/scrape | Click on a link and wait |
| Data extraction | /v1/sync/scrape | Extract title and links from a page |
| POST request with payload | /v1/sync/scrape | HTTP POST with JSON body (without browser) |
| Global metrics | /v1/sync/metrics?metric=scrape_type | View scrape_type metrics for today |
| Client metrics | /v1/sync/client-metrics | View metrics for the authenticated client |
| Download a PDF | /v1/sync/download | Download a PDF and get its content in base64 |
| Download an image | /v1/sync/download | Download a PNG image |
| Execute JavaScript on the page (evaluate) | /v1/sync/scrape | Execute JS and get the page title |
| Network request capture | /v1/sync/scrape | Capture XHR and fetch requests from a page |
| Current month billing | /v1/sync/billing | View billing summary for the current month |
| Billing with per-domain breakdown | /v1/sync/billing?detail=urls | View billing with per-domain breakdown |
| List available proxy countries | /v1/proxy/countries | See which countries have available proxies |
| Request access to a country proxy | /v1/proxy/request-country | Request approval to use proxies from a country |
| View proxy approval status | /v1/proxy/status | View approved and pending countries for your account |
| Health check | /v1/health | Check the status of all API components (no authentication required) |