ScrapingBee


ScrapingBee API Overview

Web scraping is hard, and scraping at scale can be very challenging.

You have to handle:

💻 JavaScript rendering

🛠 Chrome headless

🤖 Captcha

🕵️‍♀️ Proxy

ScrapingBee is a simple API that does all the above for you.


ScrapingBee is meant to be the easiest scraping API available on the web.

You'll receive the page's HTML directly in the response.
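
For example, here is a minimal sketch using Python and the requests library; the endpoint URL and API key below are illustrative assumptions:

    import requests

    # The endpoint URL and API key are illustrative assumptions.
    response = requests.get(
        "https://app.scrapingbee.com/api/v1/",
        params={
            "api_key": "YOUR-API-KEY",
            "url": "https://example.com",
        },
    )
    print(response.text)  # raw HTML of the scraped page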

Please note that every URL that fails will be retried as many times as possible for up to 30 seconds.

So please be aware of this maximum timeout when writing your own code.

Warning: if you use one or more of these parameters, please always make sure the url parameter is URL-encoded! This is extremely important to avoid mixing your API parameters with the query parameters of the URL you want to scrape.

JavaScript rendering

You can ask ScrapingBee to fetch the URL you want to scrape either directly or through a headless browser, which will execute the JavaScript code on the target page. The latter is the default behavior.

This can be very useful if you are scraping a Single Page Application built with a framework like React, Angular, jQuery, or Vue.

To fetch the URL directly, use render_js=False in your GET request.

To fetch the URL through a headless Chrome browser, use render_js=True in the GET request.

Keep in mind that render_js=True is the default behavior. Use render_js=False if you don't need JavaScript rendering.

Example with a dummy Single Page Application (SPA):

`render_js=True` (default behavior)

Will return the full HTML page as you see it in your browser.

`render_js=False` 

Will return an empty HTML page, as the JavaScript won't have rendered any content.
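
A minimal sketch of both calls in Python; the endpoint, API key, and SPA URL are illustrative assumptions:

    import requests

    API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"  # assumed endpoint
    SPA_URL = "https://example.com/spa"                   # hypothetical SPA

    # render_js=True (the default): the headless browser executes the JavaScript,
    # so you get the fully rendered HTML.
    rendered = requests.get(API_ENDPOINT, params={
        "api_key": "YOUR-API-KEY",
        "url": SPA_URL,
        "render_js": "True",
    })

    # render_js=False: the URL is fetched directly, so the HTML shell comes back
    # without any JavaScript-rendered content.
    raw = requests.get(API_ENDPOINT, params={
        "api_key": "YOUR-API-KEY",
        "url": SPA_URL,
        "render_js": "False",
    })

    print(len(rendered.text), len(raw.text))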

JavaScript Execution

You can ask the ScrapingBee API to execute arbitrary JavaScript code inside our headless Chrome instance.

This can be useful, for example, if you need to scroll an infinite-scroll page that triggers Ajax requests to load more elements.

Or if you need to click a button before specific information is displayed.

To do so, you need to add the js_snippet parameter with your code encoded in base64.

If you need help encoding your JS snippet in base64, you can find below how to do it:

  • in Python

  • in JS

  • in PHP

  • in Ruby
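
As an illustration, here is how to do it in Python (the scroll snippet itself is a hypothetical example):

    import base64

    # Hypothetical snippet: scroll to the bottom of the page to trigger lazy loading.
    js_snippet = "window.scrollTo(0, document.body.scrollHeight);"

    # Base64-encode the snippet before passing it as the js_snippet parameter.
    encoded_snippet = base64.b64encode(js_snippet.encode("utf-8")).decode("utf-8")
    print(encoded_snippet)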

Warning: do not forget to correctly URL-encode your request URL before calling the API, because there is a good chance that special characters such as + appear in your base64 string.

If you need help encoding your URL you can find below how to do it:
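
As an illustration, in Python:

    from urllib.parse import quote_plus

    # quote_plus percent-encodes characters such as +, = and & so they are not
    # mistaken for API parameter separators.
    target_url = "https://example.com/search?q=shoes&page=2"
    print(quote_plus(target_url))
    # https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dshoes%26page%3D2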

If your code needs some time to execute, you can also add an optional wait parameter with a value in milliseconds between 0 and 10000. The headless browser will then wait for that many milliseconds before returning the page's HTML.
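
Putting it together, a minimal Python sketch; the endpoint, API key, and target URL are assumptions for illustration:

    import base64
    import requests

    snippet = base64.b64encode(
        b"window.scrollTo(0, document.body.scrollHeight);"
    ).decode("utf-8")

    # Endpoint, API key and target URL are illustrative assumptions. requests
    # URL-encodes the query parameters, including the base64 string, for you.
    response = requests.get(
        "https://app.scrapingbee.com/api/v1/",
        params={
            "api_key": "YOUR-API-KEY",
            "url": "https://example.com/infinite-scroll",
            "js_snippet": snippet,
            "wait": 5000,  # milliseconds, between 0 and 10000
        },
    )
    print(response.text)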

If you need some help setting all this up, do not hesitate to contact us :).

Custom Cookies

You can pass custom cookies to the webpages you want to crawl.

To do this, just pass the cookie string in the cookies parameter.

We currently only handle the name and value of custom cookies. If you want to set multiple cookies, just separate them with ;.

Example:

    cookies = "cookie_name_1=cookie_value_1;cookie_name_2=cookie_value_2"

Warning: do not forget to URL-encode your cookies parameter, since ; and = are special characters that need to be URL-encoded.
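
For example, in Python (requests URL-encodes the query parameters for you; the endpoint and API key are assumptions):

    import requests

    cookies = "cookie_name_1=cookie_value_1;cookie_name_2=cookie_value_2"

    # requests URL-encodes the query parameters, so the ; and = characters in the
    # cookie string are escaped automatically. Endpoint and API key are assumptions.
    response = requests.get(
        "https://app.scrapingbee.com/api/v1/",
        params={
            "api_key": "YOUR-API-KEY",
            "url": "https://example.com",
            "cookies": cookies,
        },
    )
    print(response.status_code)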
