Data Toolbar

Data Toolbar is a browser add-on that extracts structured data from web pages and converts it to a table format for spreadsheets or databases. It uses a generalized tree-matching algorithm to recursively traverse a website's DOM tree and detect nested lists of data items. Features include collecting data and images directly from the browser, processing multi-page catalogs, and supporting irregular multi-row catalogs mixed with ads. Similar tools include web scrapers that are standalone applications, browser extensions, or web-based services.

Uploaded by

linda976
Copyright
© All Rights Reserved

Data Toolbar

Data Toolbar is a Web scraping add-on for the Internet Explorer, Mozilla Firefox, and Google Chrome Web browsers. It collects structured data from Web pages and converts it into a tabular format that can be loaded into a spreadsheet or database management program.[1]

Developer(s): DataTool Services
Operating system: Microsoft Windows
Type: Browser toolbar, Web scraping
Website: www.datatoolbar.com (http://datatoolbar.com/)

Algorithm
The program implements a variation of the generalized tree-matching algorithm with respect to nested lists.[2] That is, inside a given website, the program recursively traverses the branches of its DOM tree, aiming to detect nested lists of data items matching the format of the specified content. This approach is known to have several advantages over a simple string-matching algorithm.[3]
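
The recursive traversal described above can be illustrated with a small Python sketch. It is not Data Toolbar's implementation, nor the algorithm from the cited paper: it only groups sibling subtrees whose tag structure is exactly identical, whereas a real tree-matching approach would tolerate variation between items. All names here (`Node`, `TreeBuilder`, `signature`, `find_lists`, `extract`) are hypothetical, and the input is assumed to be well-formed HTML.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """A minimal DOM-like node: tag name, parent, children, direct text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree from well-formed HTML (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def signature(node):
    """Structural signature: the tag plus the signatures of its children."""
    return (node.tag, tuple(signature(c) for c in node.children))


def leaf_texts(node):
    """Collect non-empty text from a subtree in document order."""
    out = [node.text] if node.text else []
    for child in node.children:
        out.extend(leaf_texts(child))
    return out


def find_lists(node, min_items=2):
    """Recursively find groups of sibling subtrees with identical structure."""
    found = []
    groups = {}
    for child in node.children:
        groups.setdefault(signature(child), []).append(child)
    for sig, items in groups.items():
        # Require repeated siblings that themselves contain child elements.
        if len(items) >= min_items and sig[1]:
            found.append([leaf_texts(item) for item in items])
    for child in node.children:
        found.extend(find_lists(child, min_items))
    return found


def extract(html):
    """Parse HTML and return every detected list as rows of field texts."""
    builder = TreeBuilder()
    builder.feed(html)
    return find_lists(builder.root)


SAMPLE = """
<div>
  <div class="ad">Buy now!</div>
  <ul>
    <li><b>Widget</b><i>9.99</i></li>
    <li><b>Gadget</b><i>19.50</i></li>
    <li><b>Gizmo</b><i>4.25</i></li>
  </ul>
</div>
"""
rows = extract(SAMPLE)
```

In this sample the three `li` items share one signature and are returned as a single list of rows, while the lone advertisement `div` is ignored because it has no repeated sibling. Matching on structure rather than on literal strings is what lets the item texts differ freely.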

Features
Collection of data and images directly from Internet Explorer.
Collection of information from Details pages linked to the catalog.
Automatic processing of multi-page catalogs.
Support of irregular multi-row catalogs mixed with advertisements.
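
As a rough illustration of multi-page catalog processing combined with tabular output, the Python sketch below follows "next" links and writes the collected rows as CSV. It is an assumption-laden toy, not the product's behavior: the `PAGES` dictionary stands in for real page fetching and extraction, and `crawl_catalog`, `to_csv`, and the URL paths are hypothetical names invented for the example.

```python
import csv
import io

# Hypothetical in-memory "site": each page holds already-extracted rows
# and a link to the next page (None on the last page).
PAGES = {
    "/catalog?page=1": {"rows": [["Widget", "9.99"]], "next": "/catalog?page=2"},
    "/catalog?page=2": {"rows": [["Gadget", "19.50"]], "next": "/catalog?page=3"},
    "/catalog?page=3": {"rows": [["Gizmo", "4.25"]], "next": None},
}


def crawl_catalog(start, fetch, max_pages=100):
    """Follow 'next' links from start, collecting rows from every page."""
    rows, seen, url = [], set(), start
    # Stop at the last page, on a link cycle, or at the page limit.
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch(url)
        rows.extend(page["rows"])
        url = page["next"]
    return rows


def to_csv(rows, header):
    """Serialize rows as CSV text that a spreadsheet program can load."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


table = to_csv(crawl_catalog("/catalog?page=1", PAGES.__getitem__), ["name", "price"])
```

The `seen` set and `max_pages` limit guard against catalogs whose pagination links loop back on themselves.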

Similar tools
Automation Anywhere - The Web Extractor is a part of the larger automation system
Easy Web Extract (http://www.webextract.net) - Standalone application, Windows
Mozenda (http://www.mozenda.com/web-content-extractor) - Web based service
Newprosoft (http://www.newprosoft.com) - Standalone application, includes an Agent, Windows
OutWit (http://www.outwit.com/) - Standalone Application and Firefox Extension
Data Scraping Studio (https://www.datascraping.co/data-extraction-software.aspx) - Standalone Application for Windows and Chrome Extension
Diggernaut (https://www.diggernaut.com/) - Web platform with standalone application for Windows, Linux, MacOS and Google Chrome Extension

Sources
1. "A guide to the mortgage banking industry's leading providers of high-tech products and services" (http://issuu.com/zackinpublications/docs/sme1101_online). The Journal for Mortgage Banking Professionals. Zackin Publications. 25 (2): 14. January 2011.
2. Alberto H. F. Laender, Berthier A. Ribeiro-Neto, Altigran S. da Silva, Juliana S. Teixeira. "A Brief Survey of Web Data Extraction Tools" (http://homepages.dcc.ufmg.br/~berthier/books_journal_papers/sigmod_record_2002.pdf). Archived (https://web.archive.org/web/20110706162225/http://homepages.dcc.ufmg.br/~berthier/books_journal_papers/sigmod_record_2002.pdf) 2011-07-06 at the Wayback Machine. ACM SIGMOD, Volume 31, Issue 2.
3. Nitin Jindal, Bing Liu. "A Generalized Tree Matching Algorithm Considering Nested Lists for Web Data Extraction" (http://www.siam.org/proceedings/datamining/2010/dm10_081_jindaln.pdf). Proceedings of the Tenth SIAM International Conference on Data Mining, 2010.

External links
http://datatoolbar.com/

Retrieved from "https://en.wikipedia.org/w/index.php?title=Data_Toolbar&oldid=1159216778"
