Portia is a tool that lets you visually scrape websites without any programming knowledge. With Portia you can annotate a web page to identify the data you wish to extract, and Portia learns from these annotations how to scrape data from similar pages.
Web Scraping Tools
Filters the pages it visits
Defines CSS or XPath selectors
Uses popular output formats such as CSV and JSON
Small (<50 employees), Medium (50 to 1000 employees), Enterprise (>1000 employees)
Web scraping usually involves writing and maintaining crawler code. For users who do not code, Portia makes extracting web content easy. This Scrapinghub tool provides a point-and-click interface for annotating (selecting) the web content to be scraped and stored. I'll go deeper into Portia later in this post.
Portia is available within a Scrapinghub account as a free add-on. It provides basic point-and-click tools to grab content from websites. To use Portia, the user first adds the service as an add-on to a Scrapinghub project. There is nothing to download or install, because Portia runs in the web browser.
If the user needs more fine-grained control, Portia lets them define CSS or XPath selectors to extract the data. The data can be output in popular formats such as CSV and JSON. There is no need to worry about vendor lock-in: Portia is open source, and the crawlers it produces run on Scrapy, an open-source web crawling framework. Portia is an efficient and free web scraping tool.
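To make the idea of selector-based extraction and CSV/JSON output more concrete, here is a minimal sketch. It is not Portia's implementation and does not use Scrapy: it relies only on the Python standard library, whose ElementTree module supports a limited subset of XPath and needs well-formed markup. The sample page, the field names (name, price), and the helper functions are all hypothetical, chosen purely for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page. A real crawler would fetch pages
# over HTTP and would need an HTML parser that tolerates messy markup.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="product">
      <h2 class="name">Blue Mug</h2>
      <span class="price">7.50</span>
    </div>
    <div class="product">
      <h2 class="name">Red Plate</h2>
      <span class="price">12.00</span>
    </div>
  </body>
</html>
"""


def extract_items(page):
    """Select every product block with an XPath expression and read its fields."""
    root = ET.fromstring(page.strip())
    items = []
    # ElementTree understands simple XPath predicates such as [@class='...'].
    for node in root.findall(".//div[@class='product']"):
        items.append({
            "name": node.findtext("h2[@class='name']"),
            "price": node.findtext("span[@class='price']"),
        })
    return items


def to_json(items):
    """Serialize the extracted items as a JSON array."""
    return json.dumps(items, indent=2)


def to_csv(items):
    """Serialize the extracted items as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


if __name__ == "__main__":
    scraped = extract_items(SAMPLE_PAGE)
    print(to_json(scraped))
    print(to_csv(scraped))
```

In a visual tool such as Portia, the annotations play the role of the hand-written selectors above: pointing at an element on one page stands in for writing the expression that finds the same element on similar pages.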