Web scraping is hard, and scraping at scale can be very challenging.
You have to handle:
🛠 Chrome headless
ScrapingBee is a simple API that does all the above for you.
ScrapingBee is meant to be the easiest scraping API available on the web.
You'll directly receive HTML in your terminal:
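Below is a minimal Python sketch of such a call, using only the standard library. The endpoint `https://app.scrapingbee.com/api/v1/`, the `api_key` parameter name, and the helper names `build_request_url` and `fetch_html` are assumptions made for illustration; check them against your own account before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: not stated on this page, verify it for your account.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_request_url(api_key: str, target_url: str) -> str:
    """Build the API call URL; urlencode() also URL-encodes target_url."""
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


def fetch_html(api_key: str, target_url: str) -> str:
    """Send the request and return the response body as text."""
    # A 60 s socket timeout leaves room for the API's own 30 s retry window.
    with urlopen(build_request_url(api_key, target_url), timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `print(fetch_html("YOUR_API_KEY", "https://example.com"))`, with `YOUR_API_KEY` as a placeholder for your real key, would then print the page's HTML in your terminal.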
Please note that every URL that fails will be retried as many times as possible for up to 30 seconds.
So please keep this maximum timeout in mind when writing your own code: your HTTP client should not give up before the API has finished retrying.
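One way to honor that window, sketched in Python under our own assumptions: the helper `fetch_with_timeout` and the 40-second default are hypothetical choices, not values prescribed by ScrapingBee. The function simply refuses any client timeout that is not longer than the 30-second retry window.

```python
from urllib.request import urlopen

# The API keeps retrying a failing URL for up to 30 seconds.
API_RETRY_WINDOW_SECONDS = 30


def fetch_with_timeout(request_url: str, timeout: float = 40.0) -> bytes:
    """Fetch request_url, rejecting timeouts that could cut the retry window short."""
    if timeout <= API_RETRY_WINDOW_SECONDS:
        raise ValueError(
            f"timeout must exceed {API_RETRY_WINDOW_SECONDS}s, got {timeout}s"
        )
    # Note: urlopen's timeout applies to each blocking socket operation
    # (connect, read), not to the request as a whole.
    with urlopen(request_url, timeout=timeout) as response:
        return response.read()
```

The 10-second margin is arbitrary; the point is only that the client-side limit sits above the API's retry window.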
Warning: if you use one or more of these parameters, always make sure that the URL parameter is URL-ENCODED! This is extremely important to avoid mixing your API parameters with any HTTP parameters of the URL you want to scrape.
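The Python sketch below illustrates the problem with the standard library's `parse_qs`, which splits a query string the way a typical server would. The target URL is an arbitrary example.

```python
from urllib.parse import parse_qs, quote

target = "https://example.com/search?q=bee&page=2"

# Wrong: the target URL's own "&page=2" leaks out as a separate API parameter.
naive_query = f"url={target}&render_js=False"

# Right: percent-encode every reserved character of the target URL.
encoded_query = f"url={quote(target, safe='')}&render_js=False"

print(parse_qs(naive_query))
# {'url': ['https://example.com/search?q=bee'], 'page': ['2'], 'render_js': ['False']}
print(parse_qs(encoded_query))
# {'url': ['https://example.com/search?q=bee&page=2'], 'render_js': ['False']}
```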
Rendering JavaScript can be very useful if you are scraping a single-page application built with frameworks such as React.js, Angular.js, jQuery, or Vue.
To fetch the URL directly, use render_js=False in your request.
To fetch the URL through a headless Chrome browser, use render_js=True in the request.
Keep in mind that render_js=True is the default behavior. Use render_js=False if you don't need it.
render_js=True (default behavior): returns the full HTML page as you see it in your browser.
render_js=False: returns the HTML fetched directly from the URL, without executing JavaScript.
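A small Python sketch of how a client might toggle this option. The endpoint and the `api_key` parameter name are assumptions, `build_url` is a hypothetical helper, and the value "False" is written exactly as this page spells it.

```python
from urllib.parse import urlencode

# Assumed endpoint: not stated on this page, verify it for your account.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_url(api_key: str, target_url: str, render_js: bool = True) -> str:
    """Build an API call URL, optionally disabling JavaScript rendering."""
    params = {"api_key": api_key, "url": target_url}
    # render_js=True is the default, so the parameter is only sent to turn it off.
    if not render_js:
        params["render_js"] = "False"
    return f"{API_ENDPOINT}?{urlencode(params)}"
```

Omitting the parameter when rendering is wanted keeps the request identical to the default behavior described above.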
Running your own JavaScript can be useful, for example, to scroll an infinite-scroll page that triggers Ajax requests to load more elements.
Or to click a button that must be pressed before specific information is displayed.
To do so, add the js_snippet parameter containing your code encoded in base64.
If you need help encoding your JS snippet in base64, here is how to do it:
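In Python, the standard `base64` module is enough. The helper name `encode_js_snippet` is hypothetical, and the scroll snippet is just an example of the infinite-scroll case mentioned above.

```python
import base64


def encode_js_snippet(js_code: str) -> str:
    """Return js_code as a base64 string for the js_snippet parameter."""
    return base64.b64encode(js_code.encode("utf-8")).decode("ascii")


# Example: scroll to the bottom of an infinite-scroll page.
snippet = encode_js_snippet("window.scrollTo(0, document.body.scrollHeight);")
print(encode_js_snippet("alert(1)"))  # YWxlcnQoMSk=
```

The resulting string still has to be URL-encoded before it is sent, since base64 output can contain characters that are special in URLs.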
Warning: do not forget to correctly URL-encode your request before calling the API, because there is a good chance that your base64 string contains special characters such as +.
If you need help encoding your URL, here is how to do it:
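A Python sketch using `urllib.parse.urlencode`, which percent-encodes `+`, `/` and `=`. The helper `build_query` and the base64-style value are hypothetical, chosen only because they contain those characters.

```python
from urllib.parse import parse_qs, urlencode


def build_query(target_url: str, js_snippet_b64: str) -> str:
    """URL-encode url and js_snippet ('+' -> %2B, '/' -> %2F, '=' -> %3D)."""
    return urlencode({"url": target_url, "js_snippet": js_snippet_b64})


query = build_query("https://example.com", "YWJj+ZGVm/Zw+A==")
print(query)
# url=https%3A%2F%2Fexample.com&js_snippet=YWJj%2BZGVm%2FZw%2BA%3D%3D

# Without encoding, a server decoding the query string reads '+' as a space:
print(parse_qs("js_snippet=YWJj+ZGVm")["js_snippet"])  # ['YWJj ZGVm']
```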
If your code needs some time to execute, you can also add the optional wait parameter, with a value in milliseconds between 0 and 10000. The browser will then wait for that duration before returning the page's HTML.
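A Python sketch that checks the documented 0 to 10000 ms range on the client side before building the query. The helper name `build_snippet_query` is hypothetical, and skipping `wait` when it is 0 is our own design choice, since the parameter is optional.

```python
from urllib.parse import urlencode


def build_snippet_query(target_url: str, js_snippet_b64: str, wait_ms: int = 0) -> str:
    """Encode url, js_snippet and an optional wait (0 to 10000 ms)."""
    if not 0 <= wait_ms <= 10_000:
        raise ValueError(f"wait must be between 0 and 10000 ms, got {wait_ms}")
    params = {"url": target_url, "js_snippet": js_snippet_b64}
    # wait is optional, so it is only sent when a delay is actually requested.
    if wait_ms:
        params["wait"] = str(wait_ms)
    return urlencode(params)


print(build_snippet_query("https://example.com", "YWxlcnQoMSk=", wait_ms=2000))
# url=https%3A%2F%2Fexample.com&js_snippet=YWxlcnQoMSk%3D&wait=2000
```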
If you need some help setting all this up, do not hesitate to contact us :).
You can pass custom cookies to the webpages you want to crawl.
To do this, just pass the cookie string in the cookies parameter.
We currently only handle the name and value of custom cookies. If you want to set multiple cookies, just separate them with ;, for example:
cookies = "cookie_name_1=cookie_value_1;cookie_name_2=cookie_value_2"
Warning: do not forget to URL-encode your cookies parameter, because characters such as = are special characters that need to be URL-encoded.
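A Python sketch that builds the cookie string from a dictionary and then URL-encodes it. The helper `build_cookies_param` is hypothetical; it only joins name=value pairs, since names and values are all that is supported.

```python
from urllib.parse import quote


def build_cookies_param(cookies: dict[str, str]) -> str:
    """Join cookies as name=value pairs separated by ';'."""
    return ";".join(f"{name}={value}" for name, value in cookies.items())


cookie_string = build_cookies_param(
    {"cookie_name_1": "cookie_value_1", "cookie_name_2": "cookie_value_2"}
)
print(cookie_string)
# cookie_name_1=cookie_value_1;cookie_name_2=cookie_value_2

# Percent-encode '=' and ';' before placing the string in the query.
print(quote(cookie_string, safe=""))
# cookie_name_1%3Dcookie_value_1%3Bcookie_name_2%3Dcookie_value_2
```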