Documentation - JavaScript Scenario
Interact with the webpage you want to scrape. You can also discover this feature using our Postman collection covering all of ScrapingBee's features.
💡 Important:
This page explains how to use a specific feature of our main web scraping API!
If you are not yet familiar with the ScrapingBee web scraping API, you can read the main API documentation first.
Basic usage
If you want to interact with the pages you scrape before we return the HTML, you can add a JavaScript scenario to your API call.
For example, to click on a button, you can use this scenario:
{
"instructions": [
{"click": "#buttonId"}
]
}
With this scenario, our scraper will load the webpage, click on the button #buttonId, and then return the HTML of the page.
You can use both CSS and XPath selectors in all instructions. All selectors beginning with / are treated as XPath selectors; all other selectors are treated as CSS selectors.
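The selector rule can be mirrored on your side, for example to log which engine will evaluate each selector before you send a scenario. A minimal Python sketch; selector_kind is a hypothetical helper (not part of any ScrapingBee library) that encodes only the documented rule, a leading / meaning XPath:

```python
def selector_kind(selector: str) -> str:
    """Classify a selector using the documented rule.

    Selectors beginning with "/" are treated as XPath; every other
    selector is treated as CSS.
    """
    return "xpath" if selector.startswith("/") else "css"


if __name__ == "__main__":
    for sel in ["#buttonId", "//button[@id='buttonId']", ".slow_div"]:
        print(f"{sel} -> {selector_kind(sel)}")
```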
Important: JavaScript scenarios are JSON formatted, so to pass one in a GET request you need to stringify it (and URL-encode the resulting string if you build the query string yourself).
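If you build the GET request yourself, Python's standard library covers both steps: json.dumps stringifies the scenario and urllib.parse.urlencode percent-encodes it along with the other parameters. A minimal sketch, assuming the https://app.scrapingbee.com/api/v1/ endpoint; build_request_url is a hypothetical helper name:

```python
import json
from urllib.parse import urlencode

# Endpoint assumed from the examples on this page.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_request_url(api_key: str, target_url: str, scenario: dict) -> str:
    """Return a GET URL carrying the scenario as a JSON string.

    urlencode() percent-encodes every value (the target URL and the
    stringified scenario included), so they survive in the query string.
    """
    params = {
        "api_key": api_key,
        "url": target_url,
        "js_scenario": json.dumps(scenario),  # stringify the scenario
    }
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    scenario = {"instructions": [{"click": "#buttonId"}]}
    print(build_request_url("YOUR-API-KEY", "https://www.scrapingbee.com/blog", scenario))
```

Decoding the query string with urllib.parse.parse_qs gives back the original JSON, which is a quick way to check the encoding.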
# Install the Python ScrapingBee library:
# pip install scrapingbee
from scrapingbee import ScrapingBeeClient
client = ScrapingBeeClient(api_key='YOUR-API-KEY')
response = client.get(
'https://www.scrapingbee.com/blog',
params={
'js_scenario': {"instructions": [{ "click": "#buttonId" }]},
},
)
print('Response HTTP Status Code: ', response.status_code)
print('Response HTTP Response Body: ', response.content)
// request Axios
const axios = require('axios');
axios.get('https://app.scrapingbee.com/api/v1', {
params: {
'api_key': 'YOUR-API-KEY',
'url': 'https://www.scrapingbee.com/blog',
// Axios URL-encodes params, so the stringified scenario can be passed as-is
'js_scenario': '{"instructions": [{ "click": "#buttonId" }]}',
}
}).then(function (response) {
// handle success
console.log(response);
}).catch(function (error) {
// handle error
console.error(error);
});
// Java: URL-encode values with java.net.URLEncoder before adding them to the query string
String encoded_url = URLEncoder.encode("YOUR URL", "UTF-8");
require 'net/http'
require 'net/https'
require 'uri'
# Classic (GET)
def send_request
# URI::encode was removed in Ruby 3.0; URI.encode_www_form_component works on current versions
js_scenario = URI.encode_www_form_component('{"instructions": [{ "click": "#buttonId" }]}')
uri = URI('https://app.scrapingbee.com/api/v1/?api_key=YOUR-API-KEY&url=https://www.scrapingbee.com/blog&js_scenario=' + js_scenario)
# Create client
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
# Create Request
req = Net::HTTP::Get.new(uri)
# Fetch Request
res = http.request(req)
puts "Response HTTP Status Code: #{ res.code }"
puts "Response HTTP Response Body: #{ res.body }"
rescue StandardError => e
puts "HTTP Request failed (#{ e.message })"
end
send_request()
<?php
// get cURL resource
$ch = curl_init();
// set url
$js_scenario = urlencode('{"instructions": [{ "click": "#buttonId" }]}');
curl_setopt($ch, CURLOPT_URL, 'https://app.scrapingbee.com/api/v1/?api_key=YOUR-API-KEY&url=https://www.scrapingbee.com/blog&js_scenario=' . $js_scenario);
// set method
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
// return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// send the request and save response to $response
$response = curl_exec($ch);
// stop if fails
if (!$response) {
die('Error: "' . curl_error($ch) . '" - Code: ' . curl_errno($ch));
}
echo 'HTTP Status Code: ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . PHP_EOL;
echo 'Response Body: ' . $response . PHP_EOL;
// close curl resource to free up system resources
curl_close($ch);
?>
package main
import (
"fmt"
"io"
"net/http"
"net/url"
)
func sendClassic() {
// Create client
client := &http.Client{}
// Stringify rules: URL-encode the JSON scenario
jsScenario := url.QueryEscape(`{"instructions": [{ "click": "#buttonId" }]}`)
// Create request
req, err := http.NewRequest("GET", "https://app.scrapingbee.com/api/v1/?api_key=YOUR-API-KEY&url=https://www.scrapingbee.com/blog&js_scenario="+jsScenario, nil)
if err != nil {
fmt.Println("Failure : ", err)
return
}
// Fetch Request
resp, err := client.Do(req)
if err != nil {
fmt.Println("Failure : ", err)
return
}
// Release the connection once the body has been read
defer resp.Body.Close()
// Read Response Body
respBody, err := io.ReadAll(resp.Body)
if err != nil {
fmt.Println("Failure : ", err)
return
}
// Display Results
fmt.Println("response Status : ", resp.Status)
fmt.Println("response Headers : ", resp.Header)
fmt.Println("response Body : ", string(respBody))
}
func main() {
sendClassic()
}
You can add multiple instructions to a scenario; they are executed one after the other, in the order you list them.
Important: We strongly advise you to use JS scenarios with json_response set to true, as the response will then include a detailed report of the scenario execution under the js_scenario_report key, which is very useful for debugging.
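As a sketch of that debugging workflow in Python, the hypothetical helpers below add json_response to the request parameters and pull js_scenario_report out of the JSON body. The report's internal structure is not described on this page, so the sketch returns it as-is instead of assuming any fields:

```python
import json
from urllib.parse import urlencode

# Endpoint assumed from the examples on this page.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_debug_params(api_key: str, target_url: str, scenario: dict) -> dict:
    """Query parameters for a scenario run with json_response enabled."""
    return {
        "api_key": api_key,
        "url": target_url,
        "js_scenario": json.dumps(scenario),
        "json_response": "true",  # ask for a JSON body that includes the report
    }


def scenario_report(body_text: str):
    """Return the js_scenario_report entry of a JSON response body, or None.

    The report is returned untouched because its structure is not
    described on this page.
    """
    return json.loads(body_text).get("js_scenario_report")


if __name__ == "__main__":
    params = build_debug_params(
        "YOUR-API-KEY",
        "https://www.scrapingbee.com/blog",
        {"instructions": [{"click": "#buttonId"}]},
    )
    print(API_ENDPOINT + "?" + urlencode(params))
    # Fetch that URL with any HTTP client, then inspect
    # scenario_report(response_text) while debugging.
```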
Below is a quick overview of all the instructions you can use.
{"evaluate": "console.log('foo')"} # Run custom JavaScript code
{"click": "#button_id"} # Click on an element
{"wait": 1000} # Wait for a fixed duration, in ms
{"wait_for": "#slow_div"} # Wait for an element to appear
{"wait_for_and_click": "#slow_div"} # Wait for an element to appear and then click on it
{"scroll_x": 1000} # Scroll the screen horizontally, in px
{"scroll_y": 1000} # Scroll the screen vertically, in px
{"fill": ["#input_1", "value_1"]} # Fill an input
{"infinite_scroll": # Scroll the page until the end
{
"max_count": 0, # Maximum number of scrolls, 0 for infinite
"delay": 1000, # Delay between each scroll, in ms
"end_click": {"selector": "#button_id"} # (optional) Click on a button when the end of the page is reached, usually a "load more" button
}
}
You can use the instructions in any order, and you can use the same instruction multiple times in one scenario.
Here is an example of a scenario that waits for a button to appear and clicks on it, then scrolls horizontally, waits a second, scrolls again, and waits another second.
{
"instructions": [
{"wait_for_and_click": "#slow_button"},
{"scroll_x": 1000},
{"wait": 1000},
{"scroll_x": 1000},
{"wait": 1000}
]
}
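Because a scenario is plain JSON, it can also be assembled in code. A minimal Python sketch with one hypothetical helper per instruction from the overview (these functions are illustrative and not part of any ScrapingBee library), used here to rebuild the scenario above:

```python
import json

# Hypothetical helpers, one per instruction in the overview above.
# Each returns a single instruction dict; none of them is a ScrapingBee API.


def click(selector):
    return {"click": selector}


def wait(ms):
    return {"wait": ms}


def wait_for(selector):
    return {"wait_for": selector}


def wait_for_and_click(selector):
    return {"wait_for_and_click": selector}


def scroll_x(px):
    return {"scroll_x": px}


def scroll_y(px):
    return {"scroll_y": px}


def fill(selector, value):
    return {"fill": [selector, value]}


def evaluate(code):
    return {"evaluate": code}


def scenario(*instructions):
    """Wrap instructions into a scenario, keeping the order they are given in."""
    return {"instructions": list(instructions)}


if __name__ == "__main__":
    s = scenario(
        wait_for_and_click("#slow_button"),
        scroll_x(1000),
        wait(1000),
        scroll_x(1000),
        wait(1000),
    )
    print(json.dumps(s))
```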
Strict mode
By default, JavaScript scenarios are executed in "strict mode": if an error occurs during the execution of the scenario, the whole scenario is aborted and an error is returned.
You can disable this behavior by setting the strict key to false in your scenario:
{
"strict": false,
"instructions": [
{"wait_for_and_click": "#slow_button"},
{"scroll_x": 1000},
{"wait": 1000},
{"scroll_x": 1000},
{"wait": 1000}
]
}
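If you generate scenarios in code, strict mode is a single top-level key next to instructions. A minimal Python sketch; with_strict is a hypothetical helper that returns a new dict rather than modifying the one passed in:

```python
import json


def with_strict(scenario: dict, strict: bool) -> dict:
    """Return a new top-level dict with the "strict" key set.

    Strict mode is the default, so only strict=False changes behavior.
    The dict passed in is not modified (its instructions list is shared).
    """
    return {**scenario, "strict": strict}


if __name__ == "__main__":
    base = {"instructions": [{"wait_for_and_click": "#slow_button"}, {"scroll_x": 1000}]}
    print(json.dumps(with_strict(base, False)))
```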
Clicking on a button
Instruction: click
Value: CSS/XPath selector
To click on a button, use this instruction with the CSS/XPath selector of the button you want to click on.
If you want to click on the button whose id is secretButton, you need to use this JavaScript scenario:
{
"instructions": [
{"click": "#secretButton"}
]
}
Wait for a fixed amount of time
Instruction: wait
Value: duration, in ms
To wait for a fixed amount of time, use this instruction with the duration, in ms, you want to wait for.
If you want to wait for 2 seconds, you need to use this JavaScript scenario:
{
"instructions": [
{"wait": 2000}
]
}
Wait for an element to appear
Instruction: wait_for
Value: CSS/XPath selector
To wait for a particular element to appear, use this instruction with the CSS/XPath selector of the element you want to wait for.
If you want to wait for the element whose class is slow_div to appear before getting some results, you need to use this JavaScript scenario:
{
"instructions": [
{"wait_for": ".slow_div"}
]
}
Wait for an element to appear and click
Instruction: wait_for_and_click
Value: CSS/XPath selector
To wait for an element matching a CSS/XPath selector to appear and then click on it, use this instruction.
If you want to wait for the element whose class is slow_div to appear before clicking on it, you need to use this JavaScript scenario:
{
"instructions": [
{"wait_for_and_click": ".slow_div"}
]
}
Note: this is exactly the same as using:
{
"instructions": [
{"wait_for": ".slow_div"},
{"click": ".slow_div"}
]
}
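The equivalence above can be applied mechanically, for example when you want to insert another step between waiting for an element and clicking it. A minimal Python sketch relying only on that documented equivalence; expand_wait_for_and_click is a hypothetical helper name:

```python
def expand_wait_for_and_click(instructions: list) -> list:
    """Rewrite each wait_for_and_click step as wait_for followed by click.

    Every other instruction is copied through unchanged, in order.
    """
    expanded = []
    for step in instructions:
        if "wait_for_and_click" in step:
            selector = step["wait_for_and_click"]
            expanded.append({"wait_for": selector})
            expanded.append({"click": selector})
        else:
            expanded.append(step)
    return expanded


if __name__ == "__main__":
    steps = expand_wait_for_and_click([{"wait_for_and_click": ".slow_div"}])
    # Example of using the split form: pause briefly before the click.
    steps.insert(1, {"wait": 500})
    print({"instructions": steps})
```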
Scroll Horizontally
Instruction: scroll_x
Value: number of pixels
To scroll horizontally on a page, use this instruction with the number of pixels you want to scroll.
If you want to scroll horizontally by 1000px, you need to use this JavaScript scenario:
{
"instructions": [
{"scroll_x": 1000}
]
}
Scroll Vertically
Instruction: scroll_y
Value: number of pixels
To scroll vertically on a page, use this instruction with the number of pixels you want to scroll.
If you want to scroll down by 1000px, you need to use this JavaScript scenario:
{
"instructions": [
{"scroll_y": 1000}
]
}
Filling form input
Instruction: fill
Value: [ selector, value ]
To fill an input, use this instruction with the CSS/XPath selector of the input you want to fill and the value you want to fill it with.
If you want to fill the input whose CSS/XPath selector is #input_1 with the value value_1, you need to use this JavaScript scenario:
{
"instructions": [
{"fill": ["#input_1", "value_1"]}
]
}
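fill is usually combined with other instructions. As an illustration, a hypothetical search form could be driven by filling its input, clicking its submit button, and waiting for the results; every selector in this Python sketch (#search, #search_button, .results) is made up:

```python
import json


def search_scenario(input_selector: str, query: str,
                    submit_selector: str, results_selector: str) -> dict:
    """Fill a search input, submit it, then wait for the results to appear."""
    return {
        "instructions": [
            {"fill": [input_selector, query]},
            {"click": submit_selector},
            {"wait_for": results_selector},
        ]
    }


if __name__ == "__main__":
    # Placeholder selectors for a made-up page.
    s = search_scenario("#search", "web scraping", "#search_button", ".results")
    print(json.dumps(s, indent=2))
```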
Infinite scrolling
Instruction: infinite_scroll
Value: configuration
To scroll the page until the end, use this instruction with the configuration you want to use.
{"infinite_scroll": # Scroll the page until the end
{
"max_count": 0,
"delay": 1000,
"end_click": {"selector": "#button_id", "selector_type": "css|auto|xpath"}
}
}
max_count: The maximum number of scrolls to perform, 0 for infinite.
delay: The delay between each scroll, in ms.
end_click: (optional) A click instruction to click on a button when the end of the page is reached, usually a "load more" button. You can use either a CSS or an XPath selector, but you need to pass the matching selector_type value.
Note: This instruction is currently not supported when stealth proxies are used.
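The configuration above can be validated before the request is sent. A minimal Python sketch; the helper name is hypothetical, its defaults simply mirror the example values (they are not documented API defaults), and it only checks what this page states: a non-negative max_count where 0 means infinite, a delay in ms, and a selector_type of css, auto, or xpath whenever end_click is used.

```python
from typing import Optional

# Values listed for selector_type in the configuration above.
SELECTOR_TYPES = {"css", "auto", "xpath"}


def infinite_scroll(max_count: int = 0, delay: int = 1000,
                    end_click_selector: Optional[str] = None,
                    selector_type: Optional[str] = None) -> dict:
    """Build an infinite_scroll instruction after basic validation.

    max_count=0 means no limit; delay is in ms. end_click is added only
    when a selector is given, and then selector_type is required.
    Defaults mirror the example values, not documented API defaults.
    """
    if max_count < 0:
        raise ValueError("max_count must be >= 0 (0 means infinite)")
    if delay < 0:
        raise ValueError("delay must be >= 0 ms")
    config = {"max_count": max_count, "delay": delay}
    if end_click_selector is not None:
        if selector_type not in SELECTOR_TYPES:
            raise ValueError(f"selector_type must be one of {sorted(SELECTOR_TYPES)}")
        config["end_click"] = {"selector": end_click_selector,
                               "selector_type": selector_type}
    return {"infinite_scroll": config}


if __name__ == "__main__":
    print(infinite_scroll(max_count=10, delay=1000,
                          end_click_selector="#load_more", selector_type="css"))
```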
Executing custom JavaScript
Instruction: evaluate
Value: JavaScript code
If you need more flexibility and want to run custom JavaScript, use this instruction. Note that this instruction is currently not supported when stealth proxies are used.
If you want to run the code console.log('foo') on the webpage, you need to use this JavaScript scenario:
{
"instructions": [
{"evaluate": "console.log('foo')"}
]
}
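Custom code often contains quotes, which is where hand-written JSON tends to break. Serializing the scenario with Python's json.dumps escapes them for you; in this sketch the JavaScript expression (counting the links on the page) is only an example:

```python
import json

# Example JavaScript expression (counts links on the page); it contains
# double quotes that would break hand-written JSON.
js_code = 'document.querySelectorAll("a[href]").length'

# json.dumps escapes the inner double quotes, producing a valid scenario string.
scenario = {"instructions": [{"evaluate": js_code}]}
payload = json.dumps(scenario)

print(payload)
```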
💡 Good to know: The results of any evaluate instruction are added to the evaluate_results key of the JSON response when json_response=True is used.
Timeout
Your whole scenario must complete within 40 seconds; otherwise, the API request will time out.
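Fixed wait instructions count toward this limit, and they are the only durations known before the request is sent. A rough Python sketch that sums them and reports the remaining budget; it is optimistic, since page loading, wait_for, clicks, scrolling, and evaluate all take extra time it cannot predict. The helper names are hypothetical:

```python
# Documented limit for a whole scenario, in ms.
TIMEOUT_MS = 40_000


def fixed_wait_ms(instructions: list) -> int:
    """Sum the durations of the explicit "wait" instructions."""
    return sum(step["wait"] for step in instructions if "wait" in step)


def remaining_budget_ms(instructions: list) -> int:
    """Time left for everything else (page load, wait_for, scrolls, ...).

    This is an optimistic figure: fixed waits are the only durations
    known in advance, so a positive result does not guarantee success.
    """
    return TIMEOUT_MS - fixed_wait_ms(instructions)


if __name__ == "__main__":
    steps = [{"wait_for_and_click": "#slow_button"}, {"scroll_x": 1000},
             {"wait": 1000}, {"scroll_x": 1000}, {"wait": 1000}]
    left = remaining_budget_ms(steps)
    if left <= 0:
        print("Fixed waits alone exceed the 40 s limit")
    else:
        print(f"{left} ms left for everything else")
```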