Deep Web Research and Discovery Resources 2017 by Marcus P. Zillman, M.S., A.M.H.A.
In the last several years, some of the more comprehensive search engines have written
algorithms to search the deeper portions of the World Wide Web by attempting to find files
such as .pdf, .doc, .xls, .ppt, .ps, and others. These files are predominantly used by
businesses to communicate information within their organizations or to disseminate it to
the outside world. Searching for this information using deeper search techniques and the
latest algorithms allows researchers to obtain a vast amount of corporate information that
was previously unavailable or inaccessible. Research has also shown that even deeper
information can be obtained from these files by searching and accessing the “properties”
information (the metadata embedded in each file).
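As a rough illustration of the file-type searching described above, the sketch below builds one query string per file extension. It assumes the `filetype:` and `site:` operators, which Google and some other engines support; operator support varies by engine. The function names, the example topic, and the example.com domain are hypothetical and are not part of the original guide.

```python
from urllib.parse import quote_plus

# File extensions named in the paragraph above.
FILE_TYPES = ["pdf", "doc", "xls", "ppt", "ps"]


def build_filetype_queries(topic, site=None, file_types=FILE_TYPES):
    """Return one query string per file type, e.g. 'annual report filetype:pdf'.

    Assumes the engine understands the filetype: and site: operators.
    """
    queries = []
    for ext in file_types:
        parts = [topic, f"filetype:{ext}"]
        if site:
            # Optionally restrict results to a single domain.
            parts.append(f"site:{site}")
        queries.append(" ".join(parts))
    return queries


def to_url_parameter(query):
    """URL-encode a query so it can be placed in a search URL's query string."""
    return quote_plus(query)


if __name__ == "__main__":
    # Hypothetical topic and domain, for illustration only.
    for q in build_filetype_queries("annual report", site="example.com"):
        print(q, "->", to_url_parameter(q))
```

Each resulting string can be pasted into a search box as-is, or appended in encoded form to a search engine's query URL.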
This report and guide is designed to give you the resources you need to better understand
the history of deep web research, as well as various classified resources that allow
Access the Deep Web and Protect Your Privacy Online with Anonabox by Marco
Chiappetta
http://www.forbes.com/sites/marcochiappetta/2016/04/29/access-the-deep-web-and-protect-your-privacy-online-with-the-anonabox/#72cd0dc337c2
All of OCLC’s WorldCat Heading Toward the Open Web by Barbara Quint
http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=16353
A Primer on Staying Secure and Anonymous on the Dark Web by Mark Turner
http://www.techspot.com/guides/1292-web-security-anonymizer-primer/
Beyond Google: The Invisible Web - Tools for Teaching the Invisible Web
http://library.laguardia.edu/invisibleweb/teachingtools
Dark Web Version of Facebook Shows a New Way to Secure the Web by Tom
Simonite
https://www.technologyreview.com/s/532256/dark-web-version-of-facebook-shows-a-new-way-to-secure-the-web/
Deep Web - Exploring the Secrets of the Hidden Internet by Marcus P. Zillman,
M.S., A.M.H.A. - 23 minutes - Internet/Technology Channel
http://www.planetearthradio.com/technology.htm
Digging Deeper into Deep Web Databases by Breaking Through the Top-k Barrier
http://arxiv.org/abs/1208.3876
Everything You Need To Know About the Deep Web In One Simple Infographic
http://www.businessinsider.com/everything-you-need-to-know-about-the-deep-web-in-one-simple-infographic-2015-2
Grey Literature
http://en.wikipedia.org/wiki/Gray_literature
Here Are the 10 Best Deep Web Search Engines by Kristen Hubby
http://www.dailydot.com/layer8/best-deep-web-search-engines/
Information Retrieval and the Semantic Web by Tim Finin, James Mayfield, Clay
Fink, Anupam Joshi, and R. Scott Cost
http://ebiquity.umbc.edu/paper/html/id/185/
Journey Into the Hidden Web: A Guide for New Researchers by Ryan Dube
http://www.makeuseof.com/tag/journey-into-the-hidden-web-a-guide-for-new-researchers/
Just the Tip of the Iceberg: Why You Should Be Monitoring the Deep Web
http://www.information-age.com/technology/security/123461668/just-tip-iceberg-why-you-should-be-monitoring-deep-web
Lessons from the Deep Web That Could Lead To a More Secure IoT by Revathi
Subramanian
http://blogs.ca.com/2015/04/02/lessons-from-the-deep-web-that-could-lead-to-a-more-secure-iot/?mrm=425878&cid=GLOB-SMM-ABUS-AAR-000002-00000571
Mining the Deep Web: Search Strategies That Work by Lee Ratzan
http://www.computerworld.com/s/article/9005757/Mining_the_Deep_Web_Search_strategies_that_work?pageNumber=1
NASA Is Indexing the Deep Web to Show Mankind What Google Won’t by Danielle
Bronner
http://fusion.net/story/145885/nasa-is-indexing-the-deep-web-to-show-mankind-what-google-wont/
Publications about Web Analysis, Web Search, Citation Indexing, Digital Libraries,
Machine Learning, Neural Networks [Steve Lawrence, Google Labs]
http://research.google.com/pubs/author103.html
The Deep Web: What’s Lurking in the Underbelly of the Internet? by Michelle
Alvarez
http://securityintelligence.com/the-deep-web-whats-lurking-in-the-underbelly-of-the-internet/#.VUIUNmfD9D8
The Invisible Web: What it is, Why it exists, How to find it, and Its Inherent
Ambiguity
http://www.newworldencyclopedia.org/entry/Deep_Web
The New Search Engines Shining a Light On the Deep Web by Carola Frediani
http://kernelmag.dailydot.com/issue-sections/features-issue-sections/10376/how-to-search-deep-web-tor/
The Virtual Private Library™ and The Deep Web Video by Melissa Barker
http://zillman.blogspot.com/2009/07/virtual-private-library-and-deep-web.html
Toward the Semantic Deep Web by James Geller, Soon Ae Chun, and Yoo Jung An
http://www.mendeley.com/catalog/toward-semantic-deep-web/
Travel Industry and Deep Web: Exclusive Interview with Marcus P. Zillman
http://plrplr.com/90014/deep-web-and-travel-industry-exclusive-interview-with-marcus-p-zillman/
UMBC - AgentNews
http://agents.umbc.edu/
Understanding Metadata
http://www.niso.org/standards/resources/UnderstandingMetadata.pdf
Using the Internet As a Dynamic Resource Tool for Knowledge Discovery 2017
http://www.zillman.us/white-papers/using-the-internet-as-a-dynamic-resource-tool-for-knowledge-discovery/
Web Pages Search Engine Based on DNS by Wang Liang, Guo Yi-Ping, and Fang
Ming
http://arxiv.org/pdf/cs.NI/0403035
What Is the Deep Web? A WhatIs Podcast 15 Minute Interview with Marcus P.
Zillman
http://zillman.blogspot.com/2006/10/what-is-deep-web.html
Copernic
http://www.copernic.com/
MetaLib
http://www.exlibrisgroup.com/category/MetaLibOverview
MetaSearch Initiative
http://www.niso.org/workrooms/mi
MuseGlobal
http://www.museglobal.com/
BigChampagne
http://www.bigchampagne.com/
OpenP2P.com
http://www.openp2p.com/
P2P and the Future of Private Copying by Peter K. Yu, Michigan State University
College of Law
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=578568
Peer-to-Peer - Wikipedia
http://en.wikipedia.org/wiki/Peer-to-peer
Port Knocking
http://www.portknocking.org/
Skype
http://www.skype.com/
ToPeer
http://www.2peer.com/
TrustyFiles
http://www.trustyfiles.com/
YaCy - Distributed P2P Based Web Indexing and Anonymous Search Engine
http://www.yacy.net/
The Virtual Private Library™ and The Deep Web Video by Melissa Barker
http://zillman.blogspot.com/2009/07/virtual-private-library-and-deep-web.html
Cyber Cemetery
http://govinfo.library.unt.edu/
CyberGhost - One of the World's Most Trusted and Secure Virtual Private
Networks
http://www.cyberghostvpn.com/
Deep Web - Discover Resources That Help You Mine the Deep or Invisible Web
Instead of Just Searching the Surface
http://libguides.msubillings.edu/c.php?g=242182&p=1610131
ENDECA
http://www.oracle.com/us/products/applications/commerce/endeca/overview/index.html
Google Scholar
http://scholar.google.com/
HighWire Press - Largest Repository of Free Full-Text Life Science Articles in the
World
http://highwire.stanford.edu/
Internet Archive
http://www.archive.org/
Invisible Library
http://invislib.blogspot.com/
Knowledge Discovery
http://www.KnowledgeDiscovery.info/
MagPortal
http://www.magportal.com/
Mappa.Mundi Magazine
http://mappa.mundi.net/
OAIster
http://www.oclc.org/oaister.en.html
Open Datasets
https://github.com/caesar0301/awesome-public-datasets
https://www.kaggle.com/datasets
https://www.data.gov/
https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
https://aws.amazon.com/public-datasets/
https://data.world/
http://data.worldbank.org/
reSearcher
http://researcher.sfu.ca/
Science Commons
http://creativecommons.org/science
SciTech Connect
http://www.osti.gov/scitech/
Scrapy Webcrawler
http://scrapy.org/
SIMILE Widgets - Free, Open-Source Data Visualization Web Widgets and More
http://simile-widgets.org/
SWRC Ontology
http://ontoware.org/swrc/
The 10 Best Deep Web Search Engines to Explore the Hidden Web by Michelle
Fuchs
https://www.airsassociation.org/services-new/airs-knowledge-network-n/airs-articles/item/17217-top-10-best-deep-web-search-engines-to-explore-hidden-web
The Deep Web: Shutdowns, New Sites, New Tools by Vincenzo Ciancaglini
http://blog.trendmicro.com/trendlabs-security-intelligence/the-deep-web-shutdowns-new-sites-new-tools/
Tor Project
https://www.torproject.org/
Twitter/Search #deepweb
https://twitter.com/search?q=%23deepweb
Web IR & IE
https://groups.yahoo.com/neo/groups/webir/info
http://www.webir.org/
Zaba Search – Free People Search and Public Information Search Engine
http://www.zabasearch.com/
Deep Search, Wide Search and Everything Else You Should Know About Semantic
Search
http://www.dataversity.net/deep-search-wide-search-everything-else-know-semantic-search/
Knowledge Discovery
http://www.KnowledgeDiscovery.info/
KnowledgeNets
http://wissensnetze.ag-nbi.de/
Language Engineering for the Semantic Web: A Digital Library for Endangered
Languages
http://informationr.net/ir/9-3/paper176.html
Magpie - The Semantic Filter and Tool For the Semantic Web
http://projects.kmi.open.ac.uk/magpie/main.html
MetaData at W3C
http://www.w3.org/Metadata/
Rules and Rule Markup Languages for the Semantic Web - RuleML-2003
http://www.informatik.uni-trier.de/~ley/db/conf/semweb/ruleml2003.html
SemanticDeskTop.org
http://www.SemanticDeskTop.org/
Simile Widgets – Free, Open-Source Data Visualization Web Widgets and More
http://simile-widgets.org/
The Semantic Web by Tim Berners-Lee, James Hendler and Ora Lassila
http://www.scientificamerican.com/article.cfm?id=the-semantic-web
Web Semantics: Science, Services and Agents on the World Wide Web
http://www.sciencedirect.com/science/journal/15708268
XML.org
http://www.xml.org/
80legs - Powerful and Economical Service Platform for Crawling and Processing
Web Content
http://www.80legs.com/
AgentLink
http://www.AgentLink.org/
Agents
http://aitopics.org/
ALICEBot
http://www.alicebot.org/
ChatBottle Search
https://chatbottle.co/
Common Crawl - Open Repository of Web Crawl Data Composed Of Over 5 Billion
Freely Available Web Pages
http://www.CommonCrawl.org/
Google Guide
http://www.googleguide.com/
Imagination Engines
http://www.imagination-engines.com/
Import.io - Turn the Web Into Data With Extractors, Crawlers and Connectors
https://import.io/
Lurchr - I Keep Track of What’s Shared by Your Team So You Can Stay Focused
On Work
https://lurchr.com/
MultiAgent
http://www.MultiAgent.com/
MySpiders
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.3013
Nomibot - Bots Scour the Web To Bring You What You Want
http://nomibot.com/
Robo Brain - Large Scale Computational System That Learns from Publicly
Available Internet Resources
http://robobrain.me/
Semantic Web
http://www.semanticweb.org/
ShoppingBots 2017
http://www.ShoppingBots.info/
SocialBuzzBot - The Business and Social Intelligence Search Engine for Information
Discovery from Social Communities
http://www.SocialBuzzBot.com/
Spidering Hacks
http://www.oreilly.com/catalog/spiderhks/
Spinn3r: RSS Content, News Feeds, News Content, News Crawler and Web Crawler
APIs
http://spinn3r.com/
UMBC AgentWeb
http://agents.umbc.edu/
UMBC eBiquity
http://ebiquity.umbc.edu/
Web IR & IE
https://groups.yahoo.com/neo/groups/webir/info
http://www.webir.org/
Accessibility Resources
http://www.AccessibilityResources.info/
Agriculture Resources
http://www.AgricultureResources.info/
AnswerSpot
http://www.AnswerSpot.us/
Astronomy Resources
http://www.AstronomyResources.info/
Auction Resources
http://www.AuctionResources.info/
Biological Informatics
http://www.BiologicalInformatics.info/
Biotechnology Resources
http://www.BiotechnologyResources.info/
Bot Research
http://www.BotResearch.info/
Directory Resources
http://www.DirectoryResources.info/
eCommerce Resources
http://eCommerceResources.info/
Elder Resources
http://www.ElderResources.info/
Employment Resources
http://www.EmploymentResources.info/
Entrepreneurial Resources
http://www.EntrepreneurialResources.info/
Financial Sources
http://www.FinancialSources.info/
Finding People
http://www.FindingPeople.info/
Games Resources
http://www.GamesResources.info/
Genealogy Resources
http://www.GenealogyResources.info/
Green Files
http://www.GreenFiles.info/
Healthcare Resources
http://www.HealthcareResources.info/
Internet Alerts
http://www.InternetAlerts.info/
Internet Demographics
http://www.InternetDemographics.info/
Internet Experts
http://www.InternetExperts.info/
Internet Hoaxes
http://www.InternetHoaxes.info/
Intrapreneurial Resources
http://www.IntrapreneurialResources.info/
Journalism Resources
http://www.JournalismResources.info/
Knowledge Discovery
http://www.KnowledgeDiscovery.info/
Privacy Resources
http://www.PrivacyResources.info/
Reference Resources
http://www.ReferenceResources.info/
Research Resources
http://www.ResearchResources.info/
RestStress™
http://www.RestStress.com/
Script Resources
http://www.ScriptResources.info/
ShoppingBots
http://www.ShoppingBots.info/
Social Informatics
http://www.SocialInformatics.info/
Student Research
http://www.StudentResearch.info/
Theology Resources
http://www.TheologyResources.info/
Tutorial Resources
http://www.TutorialResources.info/
Internet MiniGuides™
http://www.InternetMiniguide.com/
LinkSeries Publications
http://www.LinkSeries.com/
Links By Marcus™
http://www.LinksByMarcus.com/
Workshops By Marcus™
http://www.WorkshopsByMarcus.com/
Deep Web Research and Discovery Resources 2017 Article - LLRX and Online White
Paper
http://zillman.blogspot.com/2017/01/llrx-deep-web-research-and-discovery.html
http://DeepWeb.us/
Using the Internet As a Dynamic Resource Tool for Knowledge Discovery 2017
http://www.zillman.us/white-papers/using-the-internet-as-a-dynamic-resource-tool-for-knowledge-discovery/