
POST /v1/sync/scrape

Scrapes a URL and returns the result synchronously.

Request

curl -X POST \
'https://api.scrapingpros.com/v1/sync/scrape' \
-H 'Authorization: Bearer <API-KEY>' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com",
"browser": true
}'

Body Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | URL to scrape |
| browser | boolean | No | Use headless browser (default: false) |
| browser_type | string | No | Browser type when browser: true: "heavy" (camoufox, anti-detection, default) or "light" (pure Playwright, lighter) |
| use_proxy | string \| object | No | Proxy configuration ("any", "US", or object with retries) |
| screenshot | boolean | No | Capture screenshot (default: false) |
| language | string | No | Browser language (e.g. "en-us") |
| headers | object | No | Custom HTTP headers (only without browser) |
| cookies | object | No | Cookies to add before scraping |
| actions | array | No | Actions to execute in the browser (requires browser: true) |
| extract | object | No | Selectors for data extraction (requires browser: true) |
| http_method | object | No | HTTP method: GET or POST (only without browser) |
| window_size | string | No | Browser window size (e.g. "1920,1080") |
| network_capture | object | No | Capture network requests during scraping (requires browser: true) |
| count_for_tracking | boolean | No | Count this request toward project usage tracking (default: false) |
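The curl call above maps directly to any HTTP client. Below is a minimal Python sketch; the build_scrape_payload helper is our own illustration (not part of any SDK), and the live call is shown commented out because it needs a real API key.

```python
def build_scrape_payload(url, **options):
    """Assemble a /v1/sync/scrape body: url is required, everything else optional."""
    payload = {"url": url}
    payload.update(options)  # e.g. browser, use_proxy, extract, ...
    return payload

# The body for the curl example above:
body = build_scrape_payload("https://example.com", browser=True)
print(body)  # {'url': 'https://example.com', 'browser': True}

# Sending it (requires the third-party requests package):
# import requests
# resp = requests.post(
#     "https://api.scrapingpros.com/v1/sync/scrape",
#     headers={"Authorization": "Bearer <API-KEY>"},
#     json=body,
#     timeout=120,  # browser scrapes can take several seconds
# )
# print(resp.json()["statusCode"])
```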

network_capture Object

| Field | Type | Description |
| --- | --- | --- |
| resource_types | array \| null | Filter by resource type. Valid values: document, stylesheet, image, media, font, script, xhr, fetch, eventsource, websocket, manifest, other. null = capture all. |
| listen_after_load_ms | integer \| null | Extra milliseconds to listen after the page loads (maximum 10,000 ms). Useful for capturing background requests. |

Response

{
"html": "<html>...</html>",
"statusCode": 200,
"message": "Request completed successfully",
"executionTime": 3.45,
"screenshot": null,
"extracted_data": null,
"evaluate_results": null,
"network_requests": null,
"potentiallyBlockedByCaptcha": false
}

| Field | Type | Description |
| --- | --- | --- |
| html | string | Page HTML |
| statusCode | integer | HTTP code of the scraped page |
| message | string | Result message |
| executionTime | float | Execution time in seconds |
| screenshot | string \| null | Base64-encoded screenshot (if requested) |
| extracted_data | object \| null | Extracted data (if extract was used) |
| evaluate_results | array \| null | Results of executed evaluate actions, in order (one per evaluate) |
| network_requests | array \| null | Captured network requests (if network_capture was used) |
| potentiallyBlockedByCaptcha | boolean \| null | true if the response appears to be a block or captcha page |

Structure of each entry in network_requests

| Field | Type | Description |
| --- | --- | --- |
| url | string | Request URL |
| method | string | HTTP method (GET, POST, etc.) |
| resource_type | string | Resource type (xhr, fetch, document, etc.) |
| status | integer \| null | HTTP response code |
| content_type | string \| null | Response Content-Type (without parameters, e.g. application/json) |

Examples

Simple scraping (without browser)

{
"url": "https://example.com"
}

With browser and proxy

{
"url": "https://example.com",
"browser": true,
"use_proxy": "any"
}

With proxy and retries

{
"url": "https://example.com",
"use_proxy": {
"proxy": "US",
"max_retries": 3,
"delay_seconds": 1,
"backoff_factor": 2
}
}
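The exact retry timing is applied server-side and is not specified here; a natural reading of these fields is exponential backoff, waiting delay_seconds before the first retry and multiplying by backoff_factor each time. A sketch of that assumption:

```python
def backoff_schedule(max_retries, delay_seconds, backoff_factor):
    """Assumed wait (in seconds) before each retry: delay * factor**attempt."""
    return [delay_seconds * backoff_factor ** attempt for attempt in range(max_retries)]

# With the example configuration above (max_retries=3, delay_seconds=1, backoff_factor=2):
print(backoff_schedule(3, 1, 2))  # [1, 2, 4]
```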

With custom headers

{
"url": "https://api.example.com/data",
"headers": {
"ocp-apim-subscription-key": "abc123",
"Accept": "application/json"
}
}

With actions (navigate to login)

{
"url": "https://quotes.toscrape.com",
"browser": true,
"actions": [
{
"type": "click",
"selector": "css:a[href='/login']"
},
{
"type": "wait-for-selector",
"selector": "css:#username",
"time": 5000
}
]
}

With data extraction

{
"url": "https://example.com/products",
"browser": true,
"extract": {
"title": "css:h1",
"prices": {
"selector": "css:.price",
"multiple": true
},
"main_image": {
"selector": "css:img.product-image",
"attribute": "src"
}
}
}
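As the example shows, each entry in extract is either a bare selector string (shorthand) or an object with selector plus options. A small, hypothetical client-side helper that expands the shorthand; the defaults it assumes (a single match, extracting element text when no attribute is given) are our reading of the example, not a documented contract:

```python
def normalize_extract_spec(spec):
    """Expand bare selector strings into the full object form."""
    normalized = {}
    for name, rule in spec.items():
        if isinstance(rule, str):
            rule = {"selector": rule}
        # Assumed default: a single match unless "multiple" is set explicitly.
        normalized[name] = {"multiple": False, **rule}
    return normalized

spec = {
    "title": "css:h1",
    "prices": {"selector": "css:.price", "multiple": True},
}
print(normalize_extract_spec(spec))
```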

Loop: paginate and collect data

{
"url": "https://example.com/products",
"browser": true,
"actions": [
{
"type": "while",
"condition": {
"type": "selector-visible",
"selector": "css:button.next-page"
},
"actions": [
{
"type": "collect",
"extract": {
"product_names": {
"selector": "css:.product-name",
"multiple": true
}
}
},
{
"type": "click",
"selector": "css:button.next-page"
},
{
"type": "wait-for-timeout",
"time": 2000
}
],
"max_iterations": 10
}
]
}

Execute JavaScript on the page (evaluate)

The evaluate action executes arbitrary JavaScript in the page context. Results accumulate, in action order, in the evaluate_results array of the response.

{
"url": "https://example.com",
"browser": true,
"actions": [
{
"type": "evaluate",
"script": "document.title"
},
{
"type": "evaluate",
"script": "Array.from(document.querySelectorAll('a')).map(a => a.href)",
"timeout": 15000
}
]
}

Response:

{
"html": "...",
"evaluate_results": [
"Example Domain",
["https://www.iana.org/domains/reserved"]
]
}
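Because results come back in the same order as the evaluate actions, they can be re-associated with their scripts client-side. A small sketch of that pairing, using the example above:

```python
def pair_evaluate_results(actions, evaluate_results):
    """Map each evaluate action's script to its result, relying on order."""
    scripts = [a["script"] for a in actions if a.get("type") == "evaluate"]
    return dict(zip(scripts, evaluate_results or []))

actions = [
    {"type": "evaluate", "script": "document.title"},
    {"type": "evaluate",
     "script": "Array.from(document.querySelectorAll('a')).map(a => a.href)"},
]
results = ["Example Domain", ["https://www.iana.org/domains/reserved"]]
print(pair_evaluate_results(actions, results)["document.title"])  # Example Domain
```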

Register a request for project tracking

By default, requests are not counted in tracking. This lets you explore and test without skewing the usage estimates. Once the scrape is validated, enable the flag so the request counts:

{
"url": "https://example.com/products",
"browser": true,
"count_for_tracking": true
}

The accumulated data can be queried from /v1/sync/metrics?metric=tracking and shows, per client, the total counted requests and the breakdown by type (simple vs browser).

Lightweight browser

The browser_type parameter selects between two engines when browser: true:

| browser_type | Engine | Anti-detection | When to use |
| --- | --- | --- | --- |
| "heavy" (default) | camoufox + Playwright | Yes: fingerprint spoofing, humanized mouse, geoip | Sites with CAPTCHA, Cloudflare, or active bot detection |
| "light" | Pure Playwright (headless Chromium) | No | Sites with dynamic JS but no aggressive anti-bot protection |

When to use "light"

  • The site requires JS to render content, but has no bot detection
  • You want to minimize response time and resource usage
  • Scraping SPAs, internal dashboards, or APIs that return JSON via XHR

When to use "heavy" (or not specify anything)

  • The site returns CAPTCHA or blocks with a simple browser
  • Sites with Cloudflare Bot Management, Akamai, or PerimeterX
  • Any case where human behavior is necessary to pass detection

{
"url": "https://example.com",
"browser": true,
"browser_type": "light"
}

Practical comparison

// Option 1: Without browser (fastest, no JS)
{
"url": "https://example.com/static-page"
}

// Option 2: Lightweight browser (JS + no anti-detection)
{
"url": "https://example.com/spa-app",
"browser": true,
"browser_type": "light"
}

// Option 3: Heavy browser (JS + full anti-detection, default)
{
"url": "https://example.com/protected-page",
"browser": true
}

// Equivalent to the above: browser_type "heavy" is the default
{
"url": "https://example.com/protected-page",
"browser": true,
"browser_type": "heavy"
}

Lightweight browser with proxy and actions

browser_type: "light" supports the same feature set as "heavy": proxy, cookies, headers, actions, extract, screenshot, and network_capture.

{
"url": "https://app.example.com/dashboard",
"browser": true,
"browser_type": "light",
"use_proxy": "any",
"actions": [
{
"type": "wait-for-selector",
"selector": "css:#data-table",
"time": 5000
}
],
"extract": {
"rows": {
"selector": "css:#data-table tr",
"multiple": true
}
}
}

Lightweight browser with network capture

{
"url": "https://app.example.com/products",
"browser": true,
"browser_type": "light",
"network_capture": {
"resource_types": ["xhr", "fetch"]
}
}

POST request with payload

{
"url": "https://api.example.com/search",
"http_method": {
"method": "post",
"payload": {
"query": "shoes",
"category": "fashion",
"page": 1
}
}
}

POST /v1/sync/download

Downloads a file from a URL (PDF, image, etc.) and returns its content encoded in base64 along with the detected content type.

Request

curl -X POST \
'https://api.scrapingpros.com/v1/sync/download' \
-H 'Authorization: Bearer <API-KEY>' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com/document.pdf"
}'

Body Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | URL of the file to download |
| use_proxy | string \| object | No | Proxy configuration ("any", "US", or object with retries) |

Response

{
"content": "JVBERi0xLjQK...",
"contentType": "application/pdf",
"statusCode": 200,
"message": "OK",
"executionTime": 0.312
}

| Field | Type | Description |
| --- | --- | --- |
| content | string \| null | File content encoded in base64 |
| contentType | string \| null | MIME type of the file (e.g. application/pdf, image/png) |
| statusCode | integer | HTTP response code from the server |
| message | string | Result or error message |
| executionTime | float \| null | Execution time in seconds |

Examples

Download a PDF

{
"url": "https://example.com/document.pdf"
}

Download an image with proxy

{
"url": "https://example.com/image.jpg",
"use_proxy": "any"
}

With country-specific proxy and retries

{
"url": "https://example.com/report.pdf",
"use_proxy": {
"proxy": "US",
"max_retries": 3,
"delay_seconds": 1,
"backoff_factor": 2
}
}

Decoding the content

The content field is in base64. To get the original file:

import base64
import requests

response = requests.post(
    "https://api.scrapingpros.com/v1/sync/download",
    headers={"Authorization": "Bearer <API-KEY>"},
    json={"url": "https://example.com/document.pdf"},
).json()

file_bytes = base64.b64decode(response["content"])

with open("document.pdf", "wb") as f:
    f.write(file_bytes)

Or with bash:

echo "<CONTENT>" | base64 --decode > document.pdf

GET /v1/sync/metrics

Gets API usage metrics.

Parameters (query string)

| Parameter | Type | Description |
| --- | --- | --- |
| date | string | Date range: YYYY-MM-DD:YYYY-MM-DD |
| metric | string | Metric type: url, proxy, api_codes, page_codes, exe_time, scrape_type, tracking |

Example

curl 'https://api.scrapingpros.com/v1/sync/metrics?date=2026-03-01:2026-03-25&metric=scrape_type' \
-H 'Authorization: Bearer <API-KEY>'

Response

{
"date": "2026-03-01:2026-03-25",
"scrape_type": {
"browser": {
"total": 1500,
"success": 1420,
"failed": 80,
"success_rate": 94.67,
"percentage_of_total": 60.0
},
"simple": {
"total": 1000,
"success": 980,
"failed": 20,
"success_rate": 98.0,
"percentage_of_total": 40.0
},
"total_requests": 2500
}
}
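The derived fields are redundant with the raw counts, which makes the response easy to sanity-check client-side. A quick sketch recomputing success_rate from the sample above (the rounding to two decimals is our assumption, matching the sample values):

```python
def success_rate(bucket):
    """Recompute success_rate (%) for a scrape_type bucket."""
    return round(100 * bucket["success"] / bucket["total"], 2)

browser = {"total": 1500, "success": 1420, "failed": 80}
print(success_rate(browser))  # 94.67, matching the sample response
```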

Project usage tracking

curl 'https://api.scrapingpros.com/v1/sync/metrics?metric=tracking' \
-H 'Authorization: Bearer <API-KEY>'

{
"date": "2026-03-30",
"tracking": {
"cliente-lupa": {
"total": 1200,
"browser": 800,
"simple": 400
},
"cliente-ocula": {
"total": 300,
"browser": 0,
"simple": 300
}
}
}

The counters are cumulative (not reset by date) and only increment when the request includes "count_for_tracking": true.

Network request capture (discover internal APIs)

{
"url": "https://example.com",
"browser": true,
"network_capture": {
"resource_types": ["xhr", "fetch"]
}
}

Response with network_requests:

{
"html": "...",
"statusCode": 200,
"network_requests": [
{
"url": "https://api.example.com/v2/products?page=1",
"method": "GET",
"resource_type": "fetch",
"status": 200,
"content_type": "application/json"
},
{
"url": "https://api.example.com/v2/user/me",
"method": "GET",
"resource_type": "xhr",
"status": 401,
"content_type": "application/json"
}
]
}
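A captured list like this is the starting point for discovering a site's internal JSON APIs. A sketch of a client-side filter; the rule applied here (JSON Content-Type, non-error status) is our own choice, not part of the API:

```python
def json_api_endpoints(network_requests):
    """URLs of captured requests that returned JSON with a non-error status."""
    return [
        r["url"]
        for r in (network_requests or [])
        if r.get("content_type") == "application/json"
        and r.get("status") is not None
        and r["status"] < 400
    ]

# The sample network_requests from the response above:
sample = [
    {"url": "https://api.example.com/v2/products?page=1", "method": "GET",
     "resource_type": "fetch", "status": 200, "content_type": "application/json"},
    {"url": "https://api.example.com/v2/user/me", "method": "GET",
     "resource_type": "xhr", "status": 401, "content_type": "application/json"},
]
print(json_api_endpoints(sample))  # only the 200 OK endpoint survives
```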

Listen for post-load requests (background polling)

{
"url": "https://example.com/dashboard",
"browser": true,
"network_capture": {
"resource_types": ["xhr", "fetch"],
"listen_after_load_ms": 3000
}
}

GET /v1/sync/client-metrics

Gets per-client usage metrics for the authenticated client. Administrators can query metrics for any client.

Parameters (query string)

| Parameter | Type | Description |
| --- | --- | --- |
| date | string | YYYY-MM-DD (default: today), YYYY-MM-DD:YYYY-MM-DD (range), or YYYY-MM (full month) |
| client | string | (Admin only) Filter by a specific client_id |
| hourly | boolean | If true, includes an hourly breakdown (single day only) |
| detail | string | If "urls", includes a per-domain breakdown with browser_success, browser_failed, simple_success, simple_failed |

Example

curl 'https://api.scrapingpros.com/v1/sync/client-metrics?date=2026-03-25&hourly=true' \
-H 'Authorization: Bearer <API-KEY>'

Example with per-domain breakdown

curl 'https://api.scrapingpros.com/v1/sync/client-metrics?date=2026-03&detail=urls' \
-H 'Authorization: Bearer <API-KEY>'

GET /v1/sync/billing

Gets a monthly billing summary per client, calculated from MySQL (precise data, not a Redis approximation).

Parameters (query string)

| Parameter | Type | Description |
| --- | --- | --- |
| month | string | Month in YYYY-MM format (default: current month) |
| client | string | (Admin only) Filter by a specific client_id |
| detail | string | If "urls", includes a per-domain breakdown (by_url) |

Request

curl 'https://api.scrapingpros.com/v1/sync/billing?month=2026-03' \
-H 'Authorization: Bearer <API-KEY>'

Response

{
"month": "2026-03",
"clients": {
"my-client": {
"simple_success": 15000,
"simple_failed": 200,
"simple_total": 15200,
"browser_success": 8000,
"browser_failed": 150,
"browser_total": 8150,
"total_requests": 23350,
"total_success": 23000,
"total_failed": 350
}
}
}

Response with detail=urls

{
"month": "2026-03",
"clients": {
"my-client": {
"simple_success": 15000,
"simple_failed": 200,
"simple_total": 15200,
"browser_success": 8000,
"browser_failed": 150,
"browser_total": 8150,
"total_requests": 23350,
"total_success": 23000,
"total_failed": 350,
"by_url": {
"example.com": {
"simple_success": 10000,
"simple_failed": 100,
"browser_success": 5000,
"browser_failed": 50,
"total": 15150
}
}
}
}
}

| Field | Type | Description |
| --- | --- | --- |
| simple_success | integer | Successful requests without browser |
| simple_failed | integer | Failed requests without browser |
| simple_total | integer | Total requests without browser |
| browser_success | integer | Successful requests with browser |
| browser_failed | integer | Failed requests with browser |
| browser_total | integer | Total requests with browser |
| total_requests | integer | Total of all requests |
| total_success | integer | Total successful requests |
| total_failed | integer | Total failed requests |
| by_url | object \| null | Per-domain breakdown (only with detail=urls) |
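These aggregates are redundant by construction, so a billing response is easy to sanity-check client-side. A sketch using the sample figures above:

```python
def totals_consistent(client):
    """Verify each aggregate field against its component counts."""
    return (
        client["simple_total"] == client["simple_success"] + client["simple_failed"]
        and client["browser_total"] == client["browser_success"] + client["browser_failed"]
        and client["total_requests"] == client["simple_total"] + client["browser_total"]
        and client["total_success"] == client["simple_success"] + client["browser_success"]
        and client["total_failed"] == client["simple_failed"] + client["browser_failed"]
    )

# The "my-client" figures from the sample response:
client = {
    "simple_success": 15000, "simple_failed": 200, "simple_total": 15200,
    "browser_success": 8000, "browser_failed": 150, "browser_total": 8150,
    "total_requests": 23350, "total_success": 23000, "total_failed": 350,
}
print(totals_consistent(client))  # True
```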

GET /v1/proxy/countries

Lists available countries for geographic proxies.

Request

curl 'https://api.scrapingpros.com/v1/proxy/countries' \
-H 'Authorization: Bearer <API-KEY>'

Response

{
"countries": ["US", "GB", "MX", "BR", "AR", "DE", "FR", "ES"]
}

| Field | Type | Description |
| --- | --- | --- |
| countries | array | List of available ISO 3166-1 alpha-2 country codes |
| error | string \| null | Error message if the proxy service could not be queried |

POST /v1/proxy/request-country

Requests access to proxies from a specific country. The request remains pending until an administrator approves it.

Request

curl -X POST \
'https://api.scrapingpros.com/v1/proxy/request-country' \
-H 'Authorization: Bearer <API-KEY>' \
-H 'Content-Type: application/json' \
-d '{
"country_code": "US",
"reason": "We need to scrape prices on Amazon US"
}'

Body Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| country_code | string | Yes | ISO 3166-1 alpha-2 country code (e.g. "US", "MX") |
| reason | string | No | Reason for the request |

Response

{
"status": "pending",
"country_code": "US",
"message": "Request submitted. An admin will review and approve your access."
}

| Field | Type | Description |
| --- | --- | --- |
| status | string | Request status: "pending", "already_approved", or "error" |
| country_code | string | Requested country code |
| message | string | Descriptive result message |

GET /v1/proxy/status

Shows the proxy approval status by country for the authenticated client.

Request

curl 'https://api.scrapingpros.com/v1/proxy/status' \
-H 'Authorization: Bearer <API-KEY>'

Response

{
"client_id": "my-client",
"approved_countries": ["US", "GB"],
"pending_countries": ["MX"]
}

| Field | Type | Description |
| --- | --- | --- |
| client_id | string | Authenticated client identifier |
| approved_countries | array | Countries already approved for use as proxy |
| pending_countries | array | Countries with a pending approval request |

GET /v1/health

Health check of all API components. No authentication required.

Request

curl 'https://api.scrapingpros.com/v1/health'

Response

{
"status": "healthy",
"checks": {
"redis": {"status": "ok", "latency_ms": 1.2},
"mysql": {"status": "ok", "latency_ms": 3.5},
"proxies_api": {
"internal": {"status": "ok", "latency_ms": 15.0},
"external": {"status": "ok", "latency_ms": 120.0}
},
"workers": {
"sync": {"up": 50, "expected": 50},
"async": {"up": 6, "expected": 6}
},
"queues": {
"pending_jobs": 0,
"async_scheduler": 0
}
},
"uptime_seconds": 86400
}

| Field | Type | Description |
| --- | --- | --- |
| status | string | Overall status: healthy, degraded, or unhealthy |
| checks.redis | object | Redis status and latency |
| checks.mysql | object | MySQL status and latency |
| checks.proxies_api | object | Internal and external proxy status |
| checks.workers | object | Active vs. expected workers (sync and async) |
| checks.queues | object | Job queue depth |
| uptime_seconds | integer | Seconds since the API started |
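Since the endpoint needs no authentication, it fits naturally into an external monitor. A minimal sketch that treats the API as up only when the overall status is healthy and every worker pool is at full strength; the full-strength requirement is our own addition, not part of the API contract:

```python
import json
import urllib.request

def is_up(health):
    """Healthy overall status, and every worker pool at its expected count."""
    workers = health.get("checks", {}).get("workers", {})
    pools_ok = all(w.get("up") == w.get("expected") for w in workers.values())
    return health.get("status") == "healthy" and pools_ok

# Live check (stdlib only):
# with urllib.request.urlopen("https://api.scrapingpros.com/v1/health") as r:
#     print(is_up(json.load(r)))

# Against the sample response above:
sample = {
    "status": "healthy",
    "checks": {"workers": {"sync": {"up": 50, "expected": 50},
                           "async": {"up": 6, "expected": 6}}},
}
print(is_up(sample))  # True
```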