Viability Test Endpoints

The viability test endpoints let you analyze one or more URLs before scraping them, to determine the most appropriate scraping strategy for each one. The analysis runs in the background: you submit the URLs, then poll for the results.

POST `/v1/async/viability-test`

Submits a list of URLs for analysis and immediately returns a `run_id`, which you use to retrieve the results.

Request

```bash
curl -X POST \
  'https://api.scrapingpros.com/v1/async/viability-test' \
  -H 'Authorization: Bearer <API-KEY>' \
  -H 'Content-Type: application/json' \
  -d '{
    "urls": [
      "https://www.example.com",
      "https://www.booking.com"
    ]
  }'
```

Body

| Field | Type | Required | Description |
|---|---|---|---|
| `urls` | array of strings | Yes | URLs to analyze. Must contain at least one URL. |
| `actions` | array | No | Browser actions to execute before the analysis (click, input, etc.). Uses the same format as the scraping endpoint. |
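The same request can be built in Python with only the standard library. This is a minimal sketch: the function name `build_submit_request` is hypothetical. It builds the request without sending it, so the request can be inspected, and it checks the at-least-one-URL rule on the client side before the API sees the request.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://api.scrapingpros.com/v1/async/viability-test"


def build_submit_request(api_key, urls, actions=None):
    """Build (but do not send) the POST request that starts a viability test."""
    if not urls:
        # `urls` is required and must contain at least one URL.
        raise ValueError("urls must contain at least one URL")
    body = {"urls": list(urls)}
    if actions:
        # `actions` is optional; its format mirrors the scraping endpoint.
        body["actions"] = actions
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(req)` and parse the body with `json.load`. The parsed dict has the shape of the 202 response and contains the `run_id`.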

Response (202)

```json
{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "in_progress",
  "total_urls": 2,
  "completed_urls": 0
}
```

GET `/v1/async/viability-test/{run_id}`

Returns the status of an analysis and, once they are available, its results. Poll this endpoint periodically until `status` is `completed`.

Request

```bash
curl 'https://api.scrapingpros.com/v1/async/viability-test/a1b2c3d4-e5f6-7890-abcd-ef1234567890' \
  -H 'Authorization: Bearer <API-KEY>'
```

Response -- in progress (200)

```json
{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "in_progress",
  "total_urls": 2,
  "completed_urls": 1,
  "results": null
}
```
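Polling can be wrapped in a small loop. The sketch below assumes that `in_progress` and `completed` are the only status values, because no others are documented here. Any other value is treated as an error. The names `wait_for_completion` and `make_http_fetcher`, and the default interval and timeout, are hypothetical. The fetcher is passed in as a parameter, so the loop can be exercised without network access.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.scrapingpros.com/v1/async/viability-test"


def make_http_fetcher(api_key):
    """Return a callable that GETs /v1/async/viability-test/{run_id}."""
    def fetch(run_id):
        req = urllib.request.Request(
            f"{BASE_URL}/{run_id}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch


def wait_for_completion(fetch_status, run_id, interval=5.0, timeout=300.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until status is "completed", then return the `results` list."""
    deadline = clock() + timeout
    while True:
        body = fetch_status(run_id)
        status = body["status"]
        if status == "completed":
            return body["results"]
        if status != "in_progress":
            # Not documented on this page; fail loudly instead of looping.
            raise RuntimeError(f"unexpected status {status!r} for run {run_id}")
        # While in progress, `results` is null; `completed_urls` shows progress.
        if clock() >= deadline:
            raise TimeoutError(
                f"run {run_id} not completed after {timeout}s "
                f"({body['completed_urls']}/{body['total_urls']} URLs done)"
            )
        sleep(interval)
```

Typical use: `results = wait_for_completion(make_http_fetcher("<API-KEY>"), run_id)`.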

Response -- completed (200)

```json
{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "completed",
  "total_urls": 2,
  "completed_urls": 2,
  "results": [
    {
      "url": "https://www.example.com",
      "recommended_strategy": "extract_html",
      "captcha_detected": false,
      "captcha_providers": [],
      "javascript_detected": false,
      "javascript_required": false,
      "browser_confidence": 0.05,
      "content_type": "html",
      "rate_limited": false,
      "soft_block": false,
      "retry_after": null,
      "login_wall": false,
      "status_code": 200,
      "redirect_chain": [],
      "protection_signals": {},
      "cloudflare_level": "none",
      "api_detected": false,
      "api_endpoints": [],
      "graphql_detected": false,
      "api_auth_required": false,
      "can_use_simple_request": false,
      "can_use_extract_html": true,
      "can_use_browser": true,
      "proxy_recommended": false,
      "proxy_type": "none"
    },
    {
      "url": "https://www.booking.com",
      "recommended_strategy": "browser",
      "captcha_detected": false,
      "captcha_providers": [],
      "javascript_detected": true,
      "javascript_required": true,
      "browser_confidence": 0.92,
      "content_type": "html",
      "rate_limited": false,
      "soft_block": false,
      "retry_after": null,
      "login_wall": false,
      "status_code": 200,
      "redirect_chain": [],
      "protection_signals": {},
      "cloudflare_level": "none",
      "api_detected": false,
      "api_endpoints": [],
      "graphql_detected": false,
      "api_auth_required": false,
      "can_use_simple_request": false,
      "can_use_extract_html": false,
      "can_use_browser": true,
      "proxy_recommended": false,
      "proxy_type": "none"
    }
  ]
}
```
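When a run covers many URLs, it can help to bucket them by `recommended_strategy` before scraping. The helper name `group_by_strategy` is hypothetical. It relies only on the `url` and `recommended_strategy` fields shown above.

```python
from collections import defaultdict


def group_by_strategy(results):
    """Map each recommended_strategy to the list of URLs that received it."""
    groups = defaultdict(list)
    for item in results:
        groups[item["recommended_strategy"]].append(item["url"])
    return dict(groups)
```

Applied to the completed response above, it returns `{"extract_html": ["https://www.example.com"], "browser": ["https://www.booking.com"]}`.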

Per-URL Result Fields

Recommended Strategy

| Field | Type | Description |
|---|---|---|
| `recommended_strategy` | string | Suggested strategy: `extract_html`, `browser`, `api`, or `blocked` |
| `can_use_extract_html` | boolean | The full HTML content is available without executing JavaScript |
| `can_use_browser` | boolean | A browser can load the page without being blocked |
| `can_use_simple_request` | boolean | JSON/XHR endpoints are accessible without authentication |
| `proxy_recommended` | boolean | `true` if a proxy is recommended to access the site |
| `proxy_type` | string | Recommended proxy type: `none`, `datacenter`, or `residential` |

Protection and Blocks

| Field | Type | Description |
|---|---|---|
| `captcha_detected` | boolean | A captcha was detected on the page |
| `captcha_providers` | array | Detected captcha providers (e.g. `["cloudflare", "recaptcha"]`) |
| `cloudflare_level` | string | Cloudflare protection level: `none`, `cdn`, or `challenge` |
| `rate_limited` | boolean | The site is actively rate-limiting requests |
| `soft_block` | boolean | A soft block was detected (without an explicit captcha) |
| `retry_after` | integer \| null | Suggested wait time in seconds when rate limited |
| `login_wall` | boolean | The page requires a login to access its content |
| `protection_signals` | object | Protection signals detected in the response headers or body |
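`rate_limited` and `retry_after` can be combined to decide how long to pause before scraping a URL. This is a sketch under assumptions: the function name `suggested_wait` and the 60-second fallback used when `retry_after` is `null` are hypothetical choices, not API defaults.

```python
def suggested_wait(result, default_seconds=60):
    """Return how many seconds to wait before scraping this URL (0 = no wait)."""
    if not result.get("rate_limited"):
        return 0
    # `retry_after` is an integer or null; fall back when no hint was given.
    retry_after = result.get("retry_after")
    return retry_after if retry_after is not None else default_seconds
```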

JavaScript and Browser

| Field | Type | Description |
|---|---|---|
| `javascript_detected` | boolean | The page uses JavaScript |
| `javascript_required` | boolean | The content depends on JavaScript to render |
| `browser_confidence` | float | Heuristic score (0.0 to 1.0) of how likely a browser is needed |

API and XHR

| Field | Type | Description |
|---|---|---|
| `api_detected` | boolean | JSON/XHR endpoints were detected during the analysis |
| `api_endpoints` | array | URLs of the detected endpoints |
| `graphql_detected` | boolean | A GraphQL endpoint was detected |
| `api_auth_required` | boolean | The detected endpoints require authentication |

General Information

| Field | Type | Description |
|---|---|---|
| `url` | string | Analyzed URL |
| `status_code` | integer | HTTP status code of the response |
| `content_type` | string | Detected content type: `html`, `json`, `xml`, or `unknown` |
| `redirect_chain` | array | Intermediate URLs, if redirects occurred |

Strategies and How to Scrape Based on Them

`extract_html`

The site serves static HTML. The content is fully available in the initial HTTP response, without needing to execute JavaScript. No blocks or captchas were detected.

How to scrape: scrape without a browser. This is the fastest and most resource-efficient option.

`browser`

The site depends on JavaScript to render its main content. A simple HTTP request would return empty or incomplete HTML. The site does not present active blocks.

How to scrape: scrape with the browser enabled. The browser executes the JavaScript and waits for the content to become available.

`api`

During the analysis, JSON or XHR endpoints were detected that return the data directly, without requiring authentication. These endpoints are listed in `api_endpoints`.

How to scrape: scrape without a browser, targeting the endpoints listed in `api_endpoints` directly. When available, this is the most efficient strategy, because it avoids HTML parsing entirely.

`blocked`

The site presents one or more active barriers that prevent scraping under normal conditions: captcha, Cloudflare challenge, or login wall.

How to scrape: scraping is not possible under current conditions. Getting past the barrier may require residential proxies, captcha solving, or manual analysis of the authentication flow.
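The four strategies can be mapped to a simple plan per URL. This is only a sketch: the function `plan_scrape` and the shape of the returned dict (`targets`, `browser`) are hypothetical. The function does not call the scraping endpoint, whose parameters are not described on this page.

```python
def plan_scrape(result):
    """Turn one per-URL result into a hypothetical plan, or None if blocked."""
    strategy = result["recommended_strategy"]
    if strategy == "extract_html":
        # Static HTML: a plain request without a browser is enough.
        return {"targets": [result["url"]], "browser": False}
    if strategy == "browser":
        # Content is rendered by JavaScript: enable the browser.
        return {"targets": [result["url"]], "browser": True}
    if strategy == "api":
        # Call the detected JSON/XHR endpoints directly, no HTML parsing.
        return {"targets": list(result["api_endpoints"]), "browser": False}
    if strategy == "blocked":
        # Needs proxies, captcha solving, or manual auth analysis first.
        return None
    raise ValueError(f"unknown strategy: {strategy!r}")
```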

tip

When you get a `blocked` result, check the `captcha_providers`, `cloudflare_level`, and `login_wall` fields to see which specific barrier was detected.
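Following the tip above, a small helper can turn those fields into a readable list of barriers. The name `blocking_reasons` and the wording of the messages are hypothetical. The sketch uses only `captcha_detected`, `captcha_providers`, `cloudflare_level`, and `login_wall`.

```python
def blocking_reasons(result):
    """List the barriers behind a "blocked" result from the documented fields."""
    reasons = []
    if result.get("captcha_detected"):
        providers = ", ".join(result.get("captcha_providers", [])) or "unknown provider"
        reasons.append(f"captcha ({providers})")
    if result.get("cloudflare_level") == "challenge":
        reasons.append("Cloudflare challenge")
    if result.get("login_wall"):
        reasons.append("login wall")
    return reasons
```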

Proxy Recommendation (`proxy_recommended` / `proxy_type`)

Independently of the recommended strategy, the analysis indicates whether a proxy is needed and, if so, which type:

| `proxy_type` | When recommended |
|---|---|
| `none` | Clean site, no blocks or rate limiting |
| `datacenter` | Rate limiting or soft block detected without active fingerprinting |
| `residential` | Cloudflare challenge, DataDome, PerimeterX, or Akamai detected |

tip

A site can have `recommended_strategy: "blocked"` and `proxy_type: "residential"` at the same time. This means that scraping could be viable through a residential proxy.
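To pick out the URLs this tip describes, filter on the two fields together. The function name `proxy_retry_candidate` is hypothetical. Whether a residential proxy actually gets through is not guaranteed by the analysis.

```python
def proxy_retry_candidate(result):
    """True if a "blocked" URL might become viable with a residential proxy."""
    return (
        result.get("recommended_strategy") == "blocked"
        and result.get("proxy_type") == "residential"
    )
```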
