Skip to main content

JavaScript Execution (EvaluateAction)

Run arbitrary JavaScript in the browser page context. Use this to call internal APIs, read JS variables, or extract data that isn't in the DOM.

Basic usage

from scrapingpros import EvaluateAction, WaitForSelectorAction

result = client.scrape(
"https://example.com",
browser=True,
actions=[
WaitForSelectorAction(selector="css:body", time=5000),
EvaluateAction(script="document.title"),
],
)
title = result.evaluate_results[0]

Results are returned in evaluate_results — one item per EvaluateAction, in order.

Call internal APIs

Many sites load data from internal API endpoints. Use EvaluateAction to call them directly:

result = client.scrape(
"https://example.com/products",
browser=True,
actions=[
WaitForSelectorAction(selector="css:.product-list", time=5000),
EvaluateAction(script="fetch('/api/products').then(r => r.json())"),
],
)
products = result.evaluate_results[0] # parsed JSON
Why this works

The fetch() runs inside the page context with the page's cookies and session. No authentication needed — it uses the same session as the browser.

Read JavaScript variables

Frameworks like Next.js, Nuxt, and others often embed page data in JS variables:

result = client.scrape(
"https://nextjs-app.com/product/123",
browser=True,
actions=[
EvaluateAction(script="window.__NEXT_DATA__"),
],
)
page_data = result.evaluate_results[0]

Multi-line scripts

result = client.scrape(
"https://example.com",
browser=True,
actions=[
EvaluateAction(script="""
const items = document.querySelectorAll('.item');
Array.from(items).map(el => ({
name: el.textContent.trim(),
href: el.querySelector('a')?.href
}))
"""),
],
)
items = result.evaluate_results[0]

Multiple evaluations

You can chain multiple EvaluateAction steps:

result = client.scrape(
"https://example.com",
browser=True,
actions=[
EvaluateAction(script="window.__NEXT_DATA__"),
EvaluateAction(script="document.title"),
EvaluateAction(script="fetch('/api/stats').then(r => r.json())"),
],
)
next_data = result.evaluate_results[0]
title = result.evaluate_results[1]
stats = result.evaluate_results[2]

Timeout

Default timeout is 30 seconds. Override for slow API calls:

EvaluateAction(
script="fetch('/api/heavy-report').then(r => r.json())",
timeout=60000, # 60 seconds
)

Browser automation actions

EvaluateAction is one of several browser actions. The full list:

ActionPurpose
ClickAction(selector)Click an element
InputAction(selector, text)Type text into a form field
SelectAction(selector, value)Select a dropdown option
KeyPressAction(key)Press a keyboard key
WaitForSelectorAction(selector, time, state=None)Wait for an element to appear (state="visible" server default; pass "attached" for hidden <script> / non-rendered nodes, "hidden" to wait for disappearance)
WaitForTimeoutAction(time)Wait a fixed number of milliseconds
EvaluateAction(script)Execute JavaScript
CollectAction(extract)Extract data during a loop
WhileControl(condition, actions)Loop while a condition is met

Wait for hidden DOM nodes (<script> tags, etc.)

The default wait state is "visible" — the server only considers the selector matched when the element is rendered (non-zero size, display not none). For <script> tags carrying embedded JSON like __NEXT_DATA__, that wait will time out (script tags are never visible). Pass state="attached":

from scrapingpros import WaitForSelectorAction

result = client.scrape(
"https://example.com/product/123",
browser=True,
actions=[
WaitForSelectorAction(
selector="css:script#__NEXT_DATA__",
time=8000,
state="attached", # match as soon as the node exists in the DOM
),
],
)

Available since v0.5.0. Accepted values: "visible", "attached", "hidden". Leave state=None (default) for visible.

Capture response bodies (auth tokens, GraphQL payloads)

network_capture records request metadata (URL, method, status) for every browser request. To also include the response body for selected URLs, set url_pattern to a glob. Useful for grabbing OAuth/Firebase tokens or GraphQL responses without re-running the request yourself:

from scrapingpros import NetworkCaptureConfig

result = client.scrape(
"https://app.example.com/dashboard",
browser=True,
network_capture=NetworkCaptureConfig(
resource_types=["xhr", "fetch"],
url_pattern="*identitytoolkit.googleapis.com*",
),
)

for entry in result.network_requests or []:
if "body" in entry: # only present when url_pattern matched
token = parse_token(entry["body"])

Bodies are capped at 64 KB server-side; larger responses are flagged with body_truncated: true. If the body cannot be captured (e.g. body fetch took more than 5 s, or the page navigated away), the entry gets a body_error field instead of body — the scrape itself never hangs. Single pattern only; for multiple endpoints, use a broader glob (*api*) and filter client-side.

Available since v0.5.0.

Example: login then scrape

from scrapingpros import ClickAction, InputAction, WaitForSelectorAction

result = client.scrape(
"https://example.com/login",
browser=True,
actions=[
InputAction(selector="css:#email", text="user@example.com"),
InputAction(selector="css:#password", text="secret"),
ClickAction(selector="css:button[type=submit]", wait_for_navigation=True),
WaitForSelectorAction(selector="css:.dashboard", time=5000),
],
)