Review

Scrapinghub Platform

Overview

Synopsis

Scrapy Cloud is a battle-tested platform for running web crawlers (also known as spiders). Scrapy spiders run in the cloud and scale on demand, from thousands to billions of pages.

Category

Web Scraping Tools

Features

Code your Spiders
Full API access
HTTP and HTTPS proxy support (with CONNECT).
A ban detection database with over 130 ban types, status codes or captchas.
Instant access to thousands of IPs in our shared pool

License

Proprietary

Price

Contact for pricing.

Pricing

Subscription

Free Trial

Available

User Size

Small (<50 employees), Medium (50 to 1000 employees), Enterprise (>1001 employees)

Company

Scrapinghub Platform

PAT Rating™

Criterion: Editor Rating / Aggregated User Rating

Ease of use: 7.6 / 7.8
Features & Functionality: 7.7 / 9.2
Advanced Features: 7.6 / 8.5
Integration: 7.8 / 9.3
Performance: 7.8
Customer Support: 7.7
Implementation: not rated
Renew & Recommend: not rated

Bottom Line

Scrapy Cloud, our cloud-based web crawling platform, allows you to easily deploy crawlers and scale them on demand – without needing to worry about servers, monitoring, backups, or cron jobs.

Editor Rating: 7.7

Aggregated User Rating: 8.1 (3 ratings)


Scrapinghub Platform is a leading service for building, deploying and running web crawlers that deliver up-to-date data. Collected data is displayed in a stylized interface where it can be reviewed with ease. Scrapinghub also provides Portia, an open source tool for scraping websites. It requires zero programming knowledge: templates are created by clicking on the elements of the page you would like to scrape, and Portia handles the rest.

Portia creates an automated spider that scrapes similar pages from the website. Scrapy Cloud is another of Scrapinghub’s services: spiders run in the cloud and scale on demand, crawling from thousands to billions of pages. Scrapinghub’s Crawlera lets users crawl sites from multiple IPs and locations without fear of getting banned and without having to handle proxy management themselves. To achieve this, the smart downloader distributes requests among many internal nodes and throttles each node’s requests to a site, using a proprietary algorithm to minimize the risk of getting banned.
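Crawlera's actual algorithm is proprietary, so the following is only a minimal sketch of the general idea described above: requests are rotated across several outgoing nodes, and each node is throttled per site. The class name `ThrottledNodePool`, the `min_delay` parameter and the round-robin policy are all assumptions made for illustration, not Scrapinghub's implementation.

```python
import itertools


class ThrottledNodePool:
    """Hypothetical pool: rotate over nodes, throttling each node per site."""

    def __init__(self, nodes, min_delay):
        self.nodes = list(nodes)
        self.min_delay = min_delay          # seconds between hits from one node to one site
        self._cycle = itertools.cycle(self.nodes)
        self._last_used = {}                # (node, site) -> time of that node's last request

    def pick(self, site, now):
        """Return (node, wait_seconds) for the next request to `site`."""
        best = None
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            last = self._last_used.get((node, site))
            wait = 0.0 if last is None else max(0.0, last + self.min_delay - now)
            if wait == 0.0:
                best = (node, 0.0)          # this node may hit the site right away
                break
            if best is None or wait < best[1]:
                best = (node, wait)         # otherwise remember the shortest wait
        node, wait = best
        self._last_used[(node, site)] = now + wait
        return node, wait


pool = ThrottledNodePool(["node-a", "node-b"], min_delay=10.0)
first = pool.pick("example.com", now=0.0)
second = pool.pick("example.com", now=0.0)
third = pool.pick("example.com", now=0.0)   # both nodes already used, so a wait is returned
```

A caller would sleep for the returned wait before sending the request through the chosen node. A real service would presumably also react to ban signals, which this sketch leaves out.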

Best of all, this feature is available via a simple HTTP API. Scrapinghub’s Splash is a lightweight, scriptable browser with an HTTP API. Not only can it render JavaScript, it can also interact with pages and provide detailed information on the requests and responses a page initiates.

Scrapinghub lets eligible users deploy optimized Splash instances seamlessly. With Splash, screenshots of websites are taken as they appear in a browser. Splash also offers simple yet effective ad blocking: Adblock Plus filters can be applied at will. For an open source program, it is indeed fantastic.
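To give a feel for the Splash HTTP API mentioned above, here is a small sketch that only builds request URLs for Splash's `render.html` and `render.png` endpoints. It assumes a Splash instance reachable at `http://localhost:8050`. The helper name `splash_url` is hypothetical, and no network call is made.

```python
from urllib.parse import urlencode

# Assumption: a Splash instance running locally on port 8050.
SPLASH_BASE = "http://localhost:8050"


def splash_url(endpoint, page_url, **params):
    """Build a GET URL for a Splash render endpoint (hypothetical helper)."""
    query = urlencode({"url": page_url, **params})
    return f"{SPLASH_BASE}/{endpoint}?{query}"


# Rendered HTML after waiting half a second for JavaScript to run.
html_request = splash_url("render.html", "https://example.com", wait=0.5)

# A screenshot of the page with a given viewport width and height.
png_request = splash_url("render.png", "https://example.com", width=1024, height=768)
```

Fetching either URL with any HTTP client would return the rendered page or the PNG image, provided the Splash instance is running.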
