Viability Test Endpoints
The viability test endpoints let you analyze one or more URLs before scraping them, to determine the most appropriate scraping strategy for each one. The analysis runs in the background: you submit the URLs, then retrieve the results asynchronously.
POST /v1/async/viability-test
Submits a list of URLs for analysis and immediately returns a run_id, which you use to query the results.
Request
```bash
curl -X POST \
  'https://api.scrapingpros.com/v1/async/viability-test' \
  -H 'Authorization: Bearer <API-KEY>' \
  -H 'Content-Type: application/json' \
  -d '{
    "urls": [
      "https://www.example.com",
      "https://www.booking.com"
    ]
  }'
```
Body
| Field | Type | Required | Description |
|---|---|---|---|
| urls | array of strings | Yes | List of URLs to analyze. At least one URL is required. |
| actions | array | No | Browser actions to execute before the analysis (click, input, etc.). Uses the same format as the scraping endpoint. |
Response (202)
```json
{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "in_progress",
  "total_urls": 2,
  "completed_urls": 0
}
```
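The request above can also be sent from code. The following is a minimal sketch using only Python's standard library. It assumes the endpoint and headers shown in the curl example. The function names build_submit_request and submit_viability_test are hypothetical helpers, not part of any official client, and actions is passed through unchanged in whatever format the scraping endpoint expects.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this section.
BASE_URL = "https://api.scrapingpros.com/v1/async/viability-test"


def build_submit_request(api_key, urls, actions=None):
    """Build (but do not send) the POST request that starts a viability test."""
    if not urls:
        # The API requires at least one URL.
        raise ValueError("urls must contain at least one URL")
    body = {"urls": list(urls)}
    if actions:
        # Optional browser actions, same format as the scraping endpoint.
        body["actions"] = actions
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_viability_test(api_key, urls, actions=None, timeout=30):
    """Send the request and return the run_id from the 202 response."""
    req = build_submit_request(api_key, urls, actions)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["run_id"]
```

Separating request construction from sending keeps the payload logic testable without network access.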
GET /v1/async/viability-test/{run_id}
Returns the status of an analysis and, once available, its results. Poll this endpoint periodically until status is completed. While the run is still in progress, results is null.
Request
```bash
curl 'https://api.scrapingpros.com/v1/async/viability-test/a1b2c3d4-e5f6-7890-abcd-ef1234567890' \
  -H 'Authorization: Bearer <API-KEY>'
```
Response: in progress (200)
```json
{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "in_progress",
  "total_urls": 2,
  "completed_urls": 1,
  "results": null
}
```
Response: completed (200)
```json
{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "completed",
  "total_urls": 2,
  "completed_urls": 2,
  "results": [
    {
      "url": "https://www.example.com",
      "recommended_strategy": "extract_html",
      "captcha_detected": false,
      "captcha_providers": [],
      "javascript_detected": false,
      "javascript_required": false,
      "browser_confidence": 0.05,
      "content_type": "html",
      "rate_limited": false,
      "soft_block": false,
      "retry_after": null,
      "login_wall": false,
      "status_code": 200,
      "redirect_chain": [],
      "protection_signals": {},
      "cloudflare_level": "none",
      "api_detected": false,
      "api_endpoints": [],
      "graphql_detected": false,
      "api_auth_required": false,
      "can_use_simple_request": false,
      "can_use_extract_html": true,
      "can_use_browser": true,
      "proxy_recommended": false,
      "proxy_type": "none"
    },
    {
      "url": "https://www.booking.com",
      "recommended_strategy": "browser",
      "captcha_detected": false,
      "captcha_providers": [],
      "javascript_detected": true,
      "javascript_required": true,
      "browser_confidence": 0.92,
      "content_type": "html",
      "rate_limited": false,
      "soft_block": false,
      "retry_after": null,
      "login_wall": false,
      "status_code": 200,
      "redirect_chain": [],
      "protection_signals": {},
      "cloudflare_level": "none",
      "api_detected": false,
      "api_endpoints": [],
      "graphql_detected": false,
      "api_auth_required": false,
      "can_use_simple_request": false,
      "can_use_extract_html": false,
      "can_use_browser": true,
      "proxy_recommended": false,
      "proxy_type": "none"
    }
  ]
}
```
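Polling until status becomes completed can be wrapped in a small loop. The sketch below assumes the GET endpoint and response shape shown above. The names fetch_status and wait_for_results, the 5-second interval, and the 300-second limit are illustrative choices, not values recommended by the API. The examples only show the in_progress and completed statuses, so this sketch treats any other status as "not finished yet" and stops at max_wait.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.scrapingpros.com/v1/async/viability-test"


def fetch_status(api_key, run_id, timeout=30):
    """Perform one GET against /v1/async/viability-test/{run_id}."""
    req = urllib.request.Request(
        f"{BASE_URL}/{run_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def wait_for_results(fetch, run_id, interval=5.0, max_wait=300.0, sleep=time.sleep):
    """Poll fetch(run_id) until status is "completed", then return results.

    fetch is any callable returning the parsed JSON body, for example
    lambda rid: fetch_status(API_KEY, rid). Injecting fetch and sleep
    keeps the loop testable without network access.
    """
    waited = 0.0
    while True:
        body = fetch(run_id)
        if body["status"] == "completed":
            return body["results"]
        if waited >= max_wait:
            raise TimeoutError(
                f"run {run_id} not completed after {waited:.0f}s "
                f"({body['completed_urls']}/{body['total_urls']} URLs done)"
            )
        sleep(interval)
        waited += interval
```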
Per-URL Result Fields
Recommended Strategy
| Field | Type | Description |
|---|---|---|
| recommended_strategy | string | Suggested strategy: extract_html, browser, api, or blocked |
| can_use_extract_html | boolean | The full HTML is accessible without executing JavaScript |
| can_use_browser | boolean | The page can be scraped with a browser without being blocked |
| can_use_simple_request | boolean | There are JSON/XHR endpoints accessible without authentication |
| proxy_recommended | boolean | true if a proxy is recommended to access the site |
| proxy_type | string | Recommended proxy type: none, datacenter, or residential |
Protection and Blocks
| Field | Type | Description |
|---|---|---|
| captcha_detected | boolean | A captcha was detected on the page |
| captcha_providers | array | Detected captcha providers (e.g. ["cloudflare", "recaptcha"]) |
| cloudflare_level | string | Cloudflare protection level: none, cdn, or challenge |
| rate_limited | boolean | The site is actively rate-limiting requests |
| soft_block | boolean | A soft block was detected (without an explicit captcha) |
| retry_after | integer or null | Suggested wait time in seconds when rate limited |
| login_wall | boolean | The page requires a login to access its content |
| protection_signals | object | Protection signals detected in the response headers or body |
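When rate_limited is true, retry_after gives the suggested pause before trying the URL again. A minimal sketch of honoring it follows. The helper name seconds_to_wait and the 60-second fallback used when no retry_after is provided are hypothetical; the API does not specify a default.

```python
# Hypothetical fallback: the API does not define one.
DEFAULT_BACKOFF_SECONDS = 60


def seconds_to_wait(result, default=DEFAULT_BACKOFF_SECONDS):
    """Seconds to pause before scraping this URL, based on its viability result."""
    if not result.get("rate_limited"):
        return 0
    retry_after = result.get("retry_after")
    # Fall back to the default when no retry_after was suggested (null).
    return retry_after if retry_after is not None else default
```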
JavaScript and Browser
| Field | Type | Description |
|---|---|---|
| javascript_detected | boolean | The page uses JavaScript |
| javascript_required | boolean | The content depends on JavaScript to render |
| browser_confidence | float | Heuristic score for how likely a browser is needed (0.0 to 1.0) |
API and XHR
| Field | Type | Description |
|---|---|---|
| api_detected | boolean | JSON/XHR endpoints were detected during the analysis |
| api_endpoints | array | URLs of the detected endpoints |
| graphql_detected | boolean | A GraphQL endpoint was detected |
| api_auth_required | boolean | The endpoints require authentication |
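Combining api_detected, api_auth_required, and api_endpoints tells you which endpoints you can call directly. This sketch uses a hypothetical helper name, usable_api_endpoints. It returns the endpoint list only when endpoints were detected and no authentication is required.

```python
def usable_api_endpoints(result):
    """Return detected JSON/XHR endpoints that can be called without authentication."""
    if not result.get("api_detected") or result.get("api_auth_required"):
        return []
    return list(result.get("api_endpoints", []))
```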
General Information
| Field | Type | Description |
|---|---|---|
| url | string | Analyzed URL |
| status_code | integer | HTTP status code of the response |
| content_type | string | Detected content type: html, json, xml, or unknown |
| redirect_chain | array | List of intermediate URLs if redirects occurred |
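If you process these results in typed code, you can mirror a subset of the documented fields in a dataclass. The class name ViabilityResult and the choice of fields are hypothetical. Fields not declared on the class are ignored when parsing, so the class does not break if the response carries fields it does not model.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ViabilityResult:
    """A subset of the per-URL result fields (hypothetical client-side type)."""

    url: str
    status_code: int
    content_type: str  # html, json, xml, or unknown
    recommended_strategy: str  # extract_html, browser, api, or blocked
    javascript_required: bool
    browser_confidence: float  # 0.0 to 1.0
    rate_limited: bool
    retry_after: Optional[int]  # seconds, or None
    proxy_type: str  # none, datacenter, or residential
    redirect_chain: List[str] = field(default_factory=list)
    api_endpoints: List[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, data: dict) -> "ViabilityResult":
        # Keep only keys declared on this class; ignore any other response fields.
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in names})
```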
Strategies and How to Scrape Based on Them
extract_html
The site serves static HTML. The content is fully available in the initial HTTP response, without needing to execute JavaScript. No blocks or captchas were detected.
How to scrape: scrape without a browser. This is the fastest and most resource-efficient option.
browser
The site depends on JavaScript to render its main content, so a simple HTTP request would return empty or incomplete HTML. No active blocks were detected.
How to scrape: scrape with the browser enabled. The browser executes the JavaScript and waits for the content to become available.
api
During the analysis, JSON or XHR endpoints were detected that return the data directly without requiring authentication. These endpoints are listed in api_endpoints.
How to scrape: scrape without a browser, targeting the endpoints listed in api_endpoints directly. When available, this is the most efficient strategy because it avoids parsing HTML entirely.
blocked
The site presents one or more active barriers that prevent scraping under normal conditions: captcha, Cloudflare challenge, or login wall.
How to scrape: scraping is not possible under the current conditions. It may require residential proxies, captcha solving, or manual analysis of the authentication flow.
When you get a blocked result, check the captcha_providers, cloudflare_level, and login_wall fields to understand which specific barrier was detected.
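The four strategies can be turned into a simple dispatch over recommended_strategy. In this sketch, plan_scrape and blocking_reasons are hypothetical helpers. The returned dict is an illustrative plan, not the request format of the scraping endpoint. For blocked results, the sketch reports the barriers from captcha_detected, captcha_providers, cloudflare_level, and login_wall.

```python
def blocking_reasons(result):
    """List the barriers behind a blocked result."""
    reasons = []
    if result.get("captcha_detected"):
        providers = "/".join(result.get("captcha_providers", [])) or "unknown provider"
        reasons.append(f"captcha ({providers})")
    if result.get("cloudflare_level") == "challenge":
        reasons.append("Cloudflare challenge")
    if result.get("login_wall"):
        reasons.append("login wall")
    return reasons or ["no specific barrier flagged"]


def plan_scrape(result):
    """Map a viability result to a rough scraping plan (illustrative only)."""
    strategy = result["recommended_strategy"]
    if strategy == "extract_html":
        return {"browser": False, "targets": [result["url"]]}
    if strategy == "browser":
        return {"browser": True, "targets": [result["url"]]}
    if strategy == "api":
        # Call the detected JSON/XHR endpoints instead of the page URL.
        return {"browser": False, "targets": list(result["api_endpoints"])}
    if strategy == "blocked":
        raise RuntimeError(f"{result['url']} is blocked: " + ", ".join(blocking_reasons(result)))
    raise ValueError(f"unknown strategy: {strategy!r}")
```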
Proxy Recommendation (proxy_recommended / proxy_type)
Regardless of the recommended strategy, the analysis indicates whether a proxy is needed and what type:
| proxy_type | When recommended |
|---|---|
| none | Clean site, no blocks or rate limiting |
| datacenter | Rate limiting or soft block detected without active fingerprinting |
| residential | Cloudflare challenge, DataDome, PerimeterX, or Akamai detected |
A site can have recommended_strategy: "blocked" and proxy_type: "residential" at the same time. This means scraping may become viable through a residential proxy.
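The proxy fields can be read independently of the strategy. This sketch uses two hypothetical helpers. proxy_for returns None when a direct connection is enough. worth_retrying_with_proxy flags the case described above: a blocked URL with a residential proxy recommendation.

```python
def proxy_for(result):
    """Return the recommended proxy type, or None for a direct connection."""
    proxy_type = result.get("proxy_type", "none")
    if not result.get("proxy_recommended") or proxy_type == "none":
        return None
    return proxy_type


def worth_retrying_with_proxy(result):
    """A blocked URL may still be viable through a residential proxy."""
    return result.get("recommended_strategy") == "blocked" and proxy_for(result) == "residential"
```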