
Viability Test Endpoints

The viability test endpoints let you analyze one or more URLs before scraping them, to determine the most appropriate scraping strategy for each. The analysis runs in the background: you submit the URLs, receive a run_id, and then poll for the results.


POST /v1/async/viability-test

Submits a list of URLs for analysis and immediately returns a run_id.

Request

curl -X POST \
  'https://api.scrapingpros.com/v1/async/viability-test' \
  -H 'Authorization: Bearer <API-KEY>' \
  -H 'Content-Type: application/json' \
  -d '{
    "urls": [
      "https://www.example.com",
      "https://www.booking.com"
    ]
  }'

Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| urls | array of strings | Yes | List of URLs to analyze. At least one URL is required. |
| actions | array | No | Browser actions to execute before the analysis (click, input, etc.), in the same format as in the scraping endpoint. |

Response (202)

{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "in_progress",
  "total_urls": 2,
  "completed_urls": 0
}
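The same submission can be made from code. Below is a minimal Python sketch using only the standard library; the helper names build_submit_request and submit are hypothetical, while the URL, headers, and body fields mirror the curl example above. Building the request separately from sending it lets you inspect the payload before anything goes over the network.

```python
import json
import urllib.request

BASE_URL = "https://api.scrapingpros.com/v1/async/viability-test"


def build_submit_request(api_key, urls, actions=None):
    """Build (but do not send) the POST request that starts a viability test."""
    if not urls:
        raise ValueError("at least one URL is required")
    body = {"urls": list(urls)}
    if actions is not None:
        # Optional browser actions, in the same format as the scraping endpoint.
        body["actions"] = actions
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(api_key, urls, actions=None):
    """Send the request and return the run_id from the 202 response body."""
    req = build_submit_request(api_key, urls, actions)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["run_id"]
```

For example, run_id = submit(API_KEY, ["https://www.example.com", "https://www.booking.com"]), where API_KEY holds your key.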

GET /v1/async/viability-test/{run_id}

Returns the status of an analysis and, once it has finished, its results. Call it periodically until status is completed; while the run is still in progress, results is null.

Request

curl 'https://api.scrapingpros.com/v1/async/viability-test/a1b2c3d4-e5f6-7890-abcd-ef1234567890' \
  -H 'Authorization: Bearer <API-KEY>'

Response: in progress (200)

{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "in_progress",
  "total_urls": 2,
  "completed_urls": 1,
  "results": null
}

Response: completed (200)

{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "completed",
  "total_urls": 2,
  "completed_urls": 2,
  "results": [
    {
      "url": "https://www.example.com",
      "recommended_strategy": "extract_html",
      "captcha_detected": false,
      "captcha_providers": [],
      "javascript_detected": false,
      "javascript_required": false,
      "browser_confidence": 0.05,
      "content_type": "html",
      "rate_limited": false,
      "soft_block": false,
      "retry_after": null,
      "login_wall": false,
      "status_code": 200,
      "redirect_chain": [],
      "protection_signals": {},
      "cloudflare_level": "none",
      "api_detected": false,
      "api_endpoints": [],
      "graphql_detected": false,
      "api_auth_required": false,
      "can_use_simple_request": false,
      "can_use_extract_html": true,
      "can_use_browser": true,
      "proxy_recommended": false,
      "proxy_type": "none"
    },
    {
      "url": "https://www.booking.com",
      "recommended_strategy": "browser",
      "captcha_detected": false,
      "captcha_providers": [],
      "javascript_detected": true,
      "javascript_required": true,
      "browser_confidence": 0.92,
      "content_type": "html",
      "rate_limited": false,
      "soft_block": false,
      "retry_after": null,
      "login_wall": false,
      "status_code": 200,
      "redirect_chain": [],
      "protection_signals": {},
      "cloudflare_level": "none",
      "api_detected": false,
      "api_endpoints": [],
      "graphql_detected": false,
      "api_auth_required": false,
      "can_use_simple_request": false,
      "can_use_extract_html": false,
      "can_use_browser": true,
      "proxy_recommended": false,
      "proxy_type": "none"
    }
  ]
}
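Putting the two calls together, a client submits once and then polls until the run completes. The sketch below is a minimal Python version using only the standard library; fetch_status and wait_for_results are hypothetical helper names, and the 5-second interval and 300-second timeout are arbitrary assumptions, not documented limits. Because this page only documents the in_progress and completed states, the loop treats any other status as still running until the timeout expires.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.scrapingpros.com/v1/async/viability-test"


def fetch_status(api_key, run_id):
    """GET the current state of a run and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}/{run_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_results(fetch, run_id, interval=5.0, timeout=300.0):
    """Poll fetch(run_id) until status is "completed", then return the results list.

    fetch is any callable that returns the parsed JSON body of the GET endpoint.
    """
    deadline = time.monotonic() + timeout
    while True:
        run = fetch(run_id)
        if run["status"] == "completed":
            return run["results"]
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"run {run_id} still {run['status']} "
                f"({run['completed_urls']}/{run['total_urls']} URLs done)"
            )
        # Not finished yet: wait before asking again.
        time.sleep(interval)
```

A typical call is results = wait_for_results(lambda rid: fetch_status(API_KEY, rid), run_id). Passing the fetch function in, rather than calling the API inside the loop, keeps the polling logic testable without network access.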

Per-URL Result Fields

| Field | Type | Description |
| --- | --- | --- |
| recommended_strategy | string | Suggested strategy: extract_html, browser, api, or blocked |
| can_use_extract_html | boolean | The full content is available in the HTML without executing JavaScript |
| can_use_browser | boolean | A browser can scrape the page without being blocked |
| can_use_simple_request | boolean | JSON/XHR endpoints are accessible without authentication |
| proxy_recommended | boolean | true if a proxy is recommended to access the site |
| proxy_type | string | Recommended proxy type: none, datacenter, or residential |
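The can_use_* flags can also serve as a fallback list if the recommended strategy does not work out. A minimal sketch, assuming you want the cheapest usable option first; the ordering follows this page's notes that direct endpoints are the most efficient option and that extract_html is faster than browser. The label simple_request is chosen here for illustration and is not a documented strategy value.

```python
# Cheapest first: direct JSON/XHR endpoints, then plain HTML, then a browser.
MODES_BY_COST = [
    ("simple_request", "can_use_simple_request"),
    ("extract_html", "can_use_extract_html"),
    ("browser", "can_use_browser"),
]


def usable_modes(result):
    """Return the scraping modes the analysis marked as usable, cheapest first."""
    return [mode for mode, flag in MODES_BY_COST if result.get(flag)]
```

For the two results in the completed response above, this yields ["extract_html", "browser"] for example.com and ["browser"] for booking.com.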

Protection and Blocks

| Field | Type | Description |
| --- | --- | --- |
| captcha_detected | boolean | A captcha was detected on the page |
| captcha_providers | array | Detected providers (e.g. ["cloudflare", "recaptcha"]) |
| cloudflare_level | string | Cloudflare protection level: none, cdn, or challenge |
| rate_limited | boolean | The site actively rate-limits requests |
| soft_block | boolean | A soft block (a block without an explicit captcha) was detected |
| retry_after | integer or null | Suggested wait time in seconds when rate limited |
| login_wall | boolean | The page requires login to access its content |
| protection_signals | object | Protection signals detected in the response headers or body |
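Because a single result can carry several of these signals at once, it can help to collapse them into a short list for logging or triage. A minimal sketch; the function name and message wording are illustrative, protection_signals is ignored because its structure is not documented here, and cloudflare_level "cdn" is deliberately not reported, on the assumption that it only means the site sits behind Cloudflare's CDN rather than a challenge.

```python
def describe_barriers(result):
    """Turn the protection fields of one viability result into readable notes."""
    notes = []
    if result.get("captcha_detected"):
        providers = ", ".join(result.get("captcha_providers") or []) or "unknown provider"
        notes.append(f"captcha ({providers})")
    # Only an active challenge is treated as a barrier, not plain CDN presence.
    if result.get("cloudflare_level") == "challenge":
        notes.append("Cloudflare challenge")
    if result.get("login_wall"):
        notes.append("login wall")
    if result.get("rate_limited"):
        wait = result.get("retry_after")
        notes.append(f"rate limited (retry after {wait}s)" if wait is not None else "rate limited")
    if result.get("soft_block"):
        notes.append("soft block")
    return notes
```

Both results in the completed response above produce an empty list, since neither has any protection signal set.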

JavaScript and Browser

| Field | Type | Description |
| --- | --- | --- |
| javascript_detected | boolean | The page uses JavaScript |
| javascript_required | boolean | The content depends on JavaScript to render |
| browser_confidence | float | Heuristic score from 0.0 to 1.0 indicating how likely a browser is needed |

API and XHR

| Field | Type | Description |
| --- | --- | --- |
| api_detected | boolean | JSON/XHR endpoints were detected during the analysis |
| api_endpoints | array | URLs of the detected endpoints |
| graphql_detected | boolean | A GraphQL endpoint was detected |
| api_auth_required | boolean | The endpoints require authentication |

General Information

| Field | Type | Description |
| --- | --- | --- |
| url | string | Analyzed URL |
| status_code | integer | HTTP status code of the response |
| content_type | string | Detected content type: html, json, xml, or unknown |
| redirect_chain | array | Intermediate URLs, if any redirects occurred |

Strategies and How to Scrape Based on Them

extract_html

The site serves static HTML. The content is fully available in the initial HTTP response, without needing to execute JavaScript. No blocks or captchas were detected.

How to scrape: scrape without a browser. This is the fastest and most resource-efficient option.


browser

The site depends on JavaScript to render its main content. A simple HTTP request would return empty or incomplete HTML. The site does not present active blocks.

How to scrape: scrape with the browser enabled. The browser executes the JavaScript and waits for the content to become available.


api

During the analysis, JSON or XHR endpoints were detected that return the data directly, without requiring authentication. These endpoints are listed in api_endpoints.

How to scrape: scrape without a browser, pointing directly at the endpoints listed in api_endpoints. When available, this is the most efficient strategy, because it avoids parsing HTML entirely.


blocked

The site presents one or more active barriers that prevent scraping under normal conditions: captcha, Cloudflare challenge, or login wall.

How to scrape: scraping is not possible under the current conditions. It may require residential proxies, captcha solving, or manual analysis of the authentication flow.

Tip

When you get a blocked result, check the captcha_providers, cloudflare_level, and login_wall fields to understand which specific barrier was detected.
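The four strategies map naturally onto a small dispatch step in client code. A minimal sketch: plan_scrape, the returned dict, and its use_browser and targets keys are hypothetical and would need translating into whatever parameters your scraping call actually takes. For blocked it returns None, so the caller can inspect the barrier fields instead.

```python
def plan_scrape(result):
    """Map one viability result to a hypothetical scrape plan.

    Returns a dict with "use_browser" and "targets", or None when the site is blocked.
    """
    strategy = result["recommended_strategy"]
    if strategy == "extract_html":
        # Static HTML: a plain request for the page itself is enough.
        return {"use_browser": False, "targets": [result["url"]]}
    if strategy == "browser":
        # Content is rendered by JavaScript, so the page needs a browser.
        return {"use_browser": True, "targets": [result["url"]]}
    if strategy == "api":
        # Scrape the detected JSON/XHR endpoints instead of the page.
        return {"use_browser": False, "targets": list(result["api_endpoints"])}
    if strategy == "blocked":
        return None
    raise ValueError(f"unknown strategy: {strategy!r}")
```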


Regardless of the recommended strategy, the analysis indicates whether a proxy is needed and, if so, which type:

| proxy_type | When recommended |
| --- | --- |
| none | Clean site, with no blocks or rate limiting |
| datacenter | Rate limiting or a soft block detected, without active fingerprinting |
| residential | Cloudflare challenge, DataDome, PerimeterX, or Akamai detected |
Tip

A site can have recommended_strategy: "blocked" and proxy_type: "residential" at the same time. This means scraping may still be viable with a residential proxy.
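The proxy recommendation and the tip above can be combined into one check. A minimal sketch; proxy_decision is a hypothetical helper, and it conservatively treats a blocked site as worth retrying only when a residential proxy is recommended, since that is the only combination this page describes as potentially viable.

```python
def proxy_decision(result):
    """Return (proxy_type or None, worth_trying) for one viability result."""
    proxy = result.get("proxy_type", "none")
    proxy = None if proxy == "none" else proxy
    if result["recommended_strategy"] != "blocked":
        # Not blocked: scrape, using the recommended proxy if there is one.
        return proxy, True
    # Blocked: assume a retry is only worthwhile with a residential proxy.
    return proxy, proxy == "residential"
```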