Review

Scrapy

Overview

Synopsis

Scrapy is an open source and collaborative framework for extracting the data you need from websites.

Category

Web Scraping Tools

Features

Built-in support for selecting and extracting data from HTML/XML sources
Built-in support for generating feed exports in multiple formats
Robust encoding support and auto-detection
Strong extensibility support
Wide range of built-in extensions and middlewares

License

Open Source

Price

Free

Pricing

Subscription

Free Trial

Available

Users Size

Small (<50 employees), Medium (50 to 1000 employees), Enterprise (>1001 employees)

Company

Scrapy

PAT Rating™

Criterion: Editor Rating / Aggregated User Rating

Ease of use: 8.1 / 7.5
Features & Functionality: 8.2 / 9.4
Advanced Features: 8.1 / 8.3
Integration: 8.0 / 8.7
Performance: 8.0 / 9.1
Customer Support: 8.2 / 8.6
Implementation: 3.4
Renew & Recommend: 0.0

Bottom Line

Scrapy is an open source and collaborative framework for extracting the data users need from websites in a fast, simple, yet extensible way. It is an application framework for crawling websites and extracting structured data, which can be used for a wide range of applications such as data mining, information processing, or historical archival.

Editor Rating: 8.1

Aggregated User Rating: 6.6 (5 ratings)


Although Scrapy was originally designed for web scraping, it can also be used to extract data through APIs (such as Amazon Associates Web Services) or as a general-purpose web crawler. Scrapy is supported under Python 2.7 and Python 3.3+. Python 2.6 support was dropped starting with Scrapy 0.20.

Python 3 support was added in Scrapy 1.1. Scrapy lets users write the rules for extracting the data and then handles everything that follows. It is extensible by design, since users can plug in new functionality without having to touch the core. Scrapy is written in Python and runs on Linux, Windows, Mac, and BSD, so it works across different systems.

Users can also deploy their spiders to Scrapy Cloud, a battle-tested platform for running web crawlers (a.k.a. spiders). These spiders run in the cloud and scale on demand, from thousands to billions of pages. Portia, a point-and-click tool that is also open source and extensible, makes it easier for users to manage their data. Users can also manage their spiders from a dashboard and schedule them to run automatically.
