Trending: PAT Index™
1. Pattern
2. Scrapy
3. Octoparse
4. Frontera
5. IEPY
6. TheWebMiner
7. DEiXTo
8. Portia
9. GNU Wget
Web Scraping Tools Free
Most Recent
 
June 7, 2017

Frontera

Frontera is a web crawling framework consisting of a crawl frontier and distribution/scaling primitives, making it possible to build a large-scale online web crawler. Frontera takes care of the logic and policies to follow during the crawl. It stores and prioritises the links extracted by the crawler to decide which pages to visit next, and it can do so in a distributed manner. The frontier is initialized with a list of start URLs, called the seeds. Once the frontier is initialized, the crawler asks it which pages should be visited next. As the crawler starts to visit the pages [...]
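The crawl-frontier idea can be sketched in a few lines of Python. This is not Frontera's own API, only a minimal, generic illustration of a frontier seeded with start URLs that decides which page is fetched next; extract_links is a hypothetical placeholder for the crawler's fetch-and-parse step.

```python
# Not Frontera's API: a minimal, generic sketch of the crawl-frontier concept.
from collections import deque

def extract_links(url):
    # Hypothetical placeholder: a real crawler would fetch `url` and return the
    # URLs found in its <a href="..."> tags.
    return []

def crawl(seeds, max_pages=100):
    frontier = deque(seeds)              # frontier initialised with the seed URLs
    visited = set()
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()         # the frontier decides which page is visited next
        if url in visited:
            continue
        visited.add(url)
        for link in extract_links(url):  # newly discovered links go back into the frontier
            if link not in visited:
                frontier.append(link)
    return visited

crawl(["https://example.com/"])          # the "seeds" for the crawl
```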

June 7, 2017

Scrapy

Scrapy is an open source and collaborative framework for extracting the data users need from websites in a fast, simple, yet extensible way. It is an application framework for crawling web sites and extracting structured data that can be used for a wide range of useful applications, such as data mining, information processing or historical archival. Although Scrapy was originally designed for web scraping, it can also be used to extract data using APIs (such as Amazon Associates Web Services) or as a general-purpose web crawler. Scrapy is supported under Python 2.7 and Python 3.3+; Python 2.6 support was dropped starting with Scrapy 0.20. [...]
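As a concrete illustration of the "fast, simple, yet extensible" claim, a complete spider fits in a dozen lines. The sketch below assumes Scrapinghub's quotes.toscrape.com demo site; the CSS selectors are specific to that site and are illustrative only.

```python
import scrapy

class QuotesSpider(scrapy.Spider):
    """Minimal sketch: crawl the quotes.toscrape.com demo site and yield structured items."""
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        # Each quote block on the page becomes one structured item.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").extract_first(),
                "author": quote.css("small.author::text").extract_first(),
            }
        # Follow the pagination link, if any, and parse the next page the same way.
        next_page = response.css("li.next a::attr(href)").extract_first()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)
```

Saved as quotes_spider.py, it can be run with `scrapy runspider quotes_spider.py -o quotes.json` to write the extracted items to a JSON file.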

June 7, 2017

Portia

Portia is a tool that allows the user to visually scrape websites without any programming knowledge. With Portia the user can annotate a web page to identify the data that needs to be extracted, and Portia will work out from these annotations how to scrape data from similar pages. Web scraping normally involves coding and programming crawlers; if the user is not a coder, Portia can help extract web content easily. This Scrapinghub tool offers a point-and-click UI for annotating (selecting) the web content to be scraped and stored. I'll go deeper into Portia later in this post. One can use Portia within a [...]

June 7, 2017

DEiXTo

DEiXTo is a powerful web data extraction tool based on the W3C Document Object Model (DOM). It allows users to create highly accurate extraction rules that describe which pieces of data to scrape from a website. DEiXTo consists of three separate components. GUI DEiXTo is an MS Windows application with a friendly graphical user interface used to manage extraction rules (build, test, fine-tune, save and modify); it is all a user needs for small-scale extraction tasks. DEiXToBot is a Perl module implementing a flexible and efficient Mechanize agent capable of extracting data of interest using GUI DEiXTo-generated [...]
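DEiXTo's extraction rules are patterns over the DOM tree. The sketch below is not DEiXTo itself (its rules are built in the GUI and executed by the Perl components); it only illustrates the DOM-based extraction idea using Python's lxml, with an invented HTML snippet.

```python
# Not DEiXTo: an illustration of DOM-based extraction, using lxml and an invented HTML page.
from lxml import html

PAGE = """
<html><body>
  <div class="product"><span class="name">Widget</span><span class="price">9.99</span></div>
  <div class="product"><span class="name">Gadget</span><span class="price">19.99</span></div>
</body></html>
"""

tree = html.fromstring(PAGE)
# An extraction "rule" here is simply an XPath pattern describing which DOM nodes to pull out.
for product in tree.xpath('//div[@class="product"]'):
    name = product.xpath('./span[@class="name"]/text()')[0]
    price = product.xpath('./span[@class="price"]/text()')[0]
    print(name, price)
```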

June 7, 2017

GNU Wget

GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely used Internet protocols. It is a non-interactive command-line tool, so it can easily be called from scripts, cron jobs, or terminals without X-Windows support. Recursive retrieval of HTML pages, as well as FTP sites, is supported: the user can use Wget to mirror archives and home pages, or traverse the web like a WWW robot (Wget understands /robots.txt). Wget works exceedingly well on slow or unstable connections, retrying until the document is fully retrieved. This gives the user freedom of movement, as they do not always need to be in a [...]
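Because Wget is non-interactive, it is easy to drive from a script or cron job. The snippet below is a sketch, not an official recipe: it mirrors a placeholder URL by calling wget from Python with a handful of standard Wget options.

```python
# Sketch: calling wget non-interactively from a script; the URL is a placeholder.
import subprocess

subprocess.run(
    [
        "wget",
        "--mirror",            # recursive retrieval with timestamping, suited to mirroring
        "--convert-links",     # rewrite links in downloaded pages for local browsing
        "--page-requisites",   # also fetch images/CSS needed to render the pages
        "--no-parent",         # never ascend above the starting directory
        "--wait=1",            # pause between requests to be polite to the server
        "https://example.com/archive/",
    ],
    check=True,                # raise an error if wget exits with a non-zero status
)
```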
