
22 WebIntelligence Tools Feb2008


Major Web Intelligence Tools

2005

Web Intelligence Tools


I. Collection
  Offline Explorer
  SpidersRUs (AI Lab)
  Google Scholar

II. Analysis (Data and Text Mining)
  Google APIs
  Google Translation
  GATE
  Arizona Noun Phraser (AI Lab)
  Self-Organizing Map, SOM (AI Lab)
  Weka

III. Visualization
  NetDraw
  JUNG
  Analyst's Notebook and Starlight


Collection: Offline Explorer


Developed by MetaProducts Corporation, Offline Explorer downloads Web sites to your hard disk for offline browsing.
http://www.metaproducts.com/OE.html

Advantages of Offline Explorer
  Save time: Download up to 500 files simultaneously.
  Save yesterday's Web sites for tomorrow's use.
  Monitor Web sites.
  Mine your data: The TextPipe tool in the Offline Explorer Pro edition can extract or change the desired data, or even export it to a database.


Offline Explorer

[Screenshot: the project list and the project properties setup window, with settings for download URLs, download level, file modification check, and file filters, URL filters, and other advanced properties.]
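The download level and URL filter settings can be pictured as a depth-limited, breadth-first crawl. The sketch below is not Offline Explorer's implementation. It runs on a hypothetical in-memory link map (`SITE`) instead of the network, and `crawl`, `max_level` and `url_filter` are illustrative names.

```python
from collections import deque

# Hypothetical link map standing in for a real site: page -> links on that page.
SITE = {
    "http://example.org/": ["http://example.org/a", "http://example.org/b.pdf"],
    "http://example.org/a": ["http://example.org/c", "http://other.net/x"],
    "http://example.org/c": ["http://example.org/d"],
}

def crawl(start, max_level, url_filter):
    """Return URLs reachable from start within max_level link hops that pass url_filter."""
    seen = {start}
    queue = deque([(start, 0)])
    downloaded = []
    while queue:
        url, level = queue.popleft()
        downloaded.append(url)
        if level == max_level:
            continue  # do not follow links beyond the download level
        for link in SITE.get(url, []):
            if link not in seen and url_filter(link):
                seen.add(link)
                queue.append((link, level + 1))
    return downloaded

def same_site_no_pdf(url):
    """Illustrative URL and file filter: stay on one host and skip PDF files."""
    return url.startswith("http://example.org/") and not url.endswith(".pdf")

pages = crawl("http://example.org/", max_level=2, url_filter=same_site_no_pdf)
```

With a download level of 2, the page `http://example.org/d` is not fetched because it sits three links away from the start URL.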

SpidersRUs
The SpidersRUs Digital Library Toolkit was developed by the Artificial Intelligence Lab at the University of Arizona.
http://ai.eller.arizona.edu/spidersrus/

It provides modular tools for spidering, indexing, and searching, so that users can build digital libraries in different languages in a simple do-it-yourself (DIY) way. Users can create their own search engines easily and quickly through its friendly user interface.
SpidersRUs can automate the development of vertical search engines in different domains and languages. It also works on non-English languages, such as Asian and Middle Eastern languages.
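The indexing and searching steps can be illustrated with a small inverted index. This is a generic sketch, not the SpidersRUs code. The function names and sample documents are made up for illustration. Languages written without spaces between words, such as Chinese, would need a word segmenter in place of this simple tokenizer.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it on runs of non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample collection.
docs = {
    1: "Digital libraries for nanotechnology research",
    2: "Vertical search engines in different domains",
    3: "Building search engines for digital libraries",
}
index = build_index(docs)
hits = search(index, "digital libraries")
```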


SpidersRUs

[Screenshot: an example of a Chinese search engine built by SpidersRUs, showing the keyword search box and the search results.]

Google Scholar
Google Scholar provides a simple way to broadly search for scholarly literature.
http://scholar.google.com/

Features of Google Scholar:
  Search diverse sources from one convenient place
  Find papers, abstracts and citations
  Locate the complete paper through your library or on the Web
  Learn about key papers and scholars in any area of research


Google Scholar
[Screenshot: a search for "Bioterrorism" in Google Scholar, with the list of papers citing one result (366 citations).]

Analysis: Google APIs


Google provides many APIs to help you quickly develop your own applications.
http://code.google.com/more/

Examples of Google APIs:
  Google API for Inlink: Discovers which pages link to your Web site.
  Google Data APIs: Provide a simple, standard protocol for reading and writing data on the Web. Several Google services provide a Google Data API, including Google Base, Blogger, Google Calendar, Google Spreadsheets and Picasa Web Albums.
  Google AJAX Search API: Uses JavaScript to embed a simple, dynamic Google search box and display search results in your own Web pages.
  Google Analytics: Allows users to gather, view, and analyze data about their Web site traffic. Users can see which content gets the most visits, and the average page views and time on site per visit.
  Google Safe Browsing APIs: Allow client applications to check URLs against Google's constantly updated blacklists of suspected phishing and malware pages.
  YouTube Data API: Integrates online videos from YouTube into your applications.


Example: Google API for Inlink

[Screenshot: enter a link URL and search; the results list all the related inlink Web pages.]
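Conceptually, an inlink lookup inverts a map of outgoing links. The sketch below shows that idea on a hypothetical, already-crawled link map. It does not call any Google API, and the page names are invented.

```python
from collections import defaultdict

def inlinks(outlinks):
    """Invert a page -> outgoing links map into a page -> linking pages map."""
    result = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            result[target].add(source)
    return result

# Hypothetical crawl result: each page with the pages it links to.
outlinks = {
    "a.html": ["home.html", "b.html"],
    "b.html": ["home.html"],
    "home.html": ["a.html"],
}
linking_to_home = inlinks(outlinks)["home.html"]
```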

Google Translation
Google's Translate function.
http://www.google.com/language_tools?hl=en

The input and output languages can be Arabic, Chinese, Dutch, English, French, German, Greek, Italian, Japanese, Korean, Portuguese, Russian or Spanish.

Major functions of Google Translation include:
  Search multilingual Web pages: Search the Internet in one language and get the results in another.
  Translate text: Translate free text into multiple languages.
  Translate a Web page: Translate a Web page into multiple languages.


Google Translation

[Screenshots: searching multilingual Web pages; translating text from Arabic to English; translating a Web page.]

GATE
The General Architecture for Text Engineering (GATE) is a toolkit for text mining. It was developed by the NLP group at the University of Sheffield (UK).
http://gate.ac.uk

Information Extraction tasks:
  Named Entity Recognition (NE): Finds names, places, dates, etc.
  Co-reference Resolution (CO): Identifies identity relations between entities in texts.
  Template Element Construction (TE): Adds descriptive information to NE results (using CO).
  Template Relation Construction (TR): Finds relations between TE entities.
  Scenario Template Production (ST): Fits TE and TR results into specified event scenarios.

GATE also includes:
  Parsers, stemmers, and Information Retrieval tools;
  Tools for visualizing and manipulating ontologies; and
  Evaluation and benchmarking tools.
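To make the Named Entity Recognition task concrete, here is a toy annotator that combines a lookup list of place names with a date pattern. It is a sketch in Python and does not use GATE, which is a Java toolkit. The `PLACES` list, the `DATE` pattern and the annotation tuple layout are assumptions for illustration.

```python
import re

# Hypothetical gazetteer: a tiny lookup list of known place names.
PLACES = {"Sheffield", "Arizona", "New Zealand"}

# Dates written as "12 February 2008".
DATE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December) \d{4}\b"
)

def find_entities(text):
    """Return (start, end, type, surface) annotations sorted by start offset."""
    found = [(m.start(), m.end(), "Date", m.group()) for m in DATE.finditer(text)]
    for place in PLACES:
        for m in re.finditer(r"\b" + re.escape(place) + r"\b", text):
            found.append((m.start(), m.end(), "Location", place))
    return sorted(found)

annotations = find_entities("The workshop in Sheffield opened on 12 February 2008.")
```

Each annotation keeps its character offsets, so later steps such as co-reference resolution could refer back to the exact span in the text.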


GATE

[Screenshot: the GATE interface, showing project information, attributes, and the results display.]
* Picture is from http://nlp.shef.ac.uk

Arizona Noun Phraser


The Arizona Noun Phraser was developed by the Artificial Intelligence Lab at the University of Arizona.
http://ai.arizona.edu/

The Arizona Noun Phraser is made up of three major components: a tokenizer, a part-of-speech tagger, and a phrase generation tool. It generates precise topic descriptions.
  Tokenizer: Separates punctuation and symbols from text without affecting content.
  Part-of-Speech (POS) Tagger: Uses both lexical and contextual disambiguation in POS assignment. Lexicons include the Brown Corpus, the Wall Street Journal, and the Specialist Lexicon.
  Phrase Generation: Uses a simple Finite State Automaton (FSA) of noun phrasing rules; breaks sentences and clauses into grammatically correct noun phrases.
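The three components can be sketched as a small pipeline. This is not the Arizona Noun Phraser itself. The lexicon is a hypothetical handful of words, the tagger does lexical lookup only (no contextual disambiguation), and the single phrasing rule accepts zero or more adjectives followed by one or more nouns.

```python
import re

# Hypothetical toy lexicon; the real tool draws on much larger lexicons.
LEXICON = {
    "the": "DT", "a": "DT", "an": "DT",
    "neural": "JJ", "digital": "JJ", "large": "JJ",
    "network": "NN", "library": "NN", "map": "NN", "documents": "NN",
    "organizes": "VB", "groups": "VB", "into": "IN", "of": "IN",
}

def tokenize(text):
    """Split words from punctuation without dropping either."""
    return re.findall(r"\w+|[^\w\s]", text)

def tag(tokens):
    """Lexicon lookup; unknown alphabetic words default to noun, others to SYM."""
    tags = []
    for tok in tokens:
        default = "NN" if tok.isalpha() else "SYM"
        tags.append(LEXICON.get(tok.lower(), default))
    return tags

def noun_phrases(text):
    """Finite-state pass that accepts JJ* NN+ sequences as noun phrases."""
    tokens = tokenize(text)
    phrases, current, has_noun = [], [], False
    for tok, t in zip(tokens, tag(tokens)):
        if t == "JJ" and not has_noun:
            current.append(tok)          # adjectives before the first noun
        elif t == "NN":
            current.append(tok)
            has_noun = True
        else:
            if has_noun:
                phrases.append(" ".join(current))
            current, has_noun = [], False
            if t == "JJ":
                current.append(tok)      # an adjective after a noun starts a new phrase
    if has_noun:
        phrases.append(" ".join(current))
    return phrases

phrases = noun_phrases("The neural network organizes documents into a large digital map.")
# phrases == ['neural network', 'documents', 'large digital map']
```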

SOM
The multi-level self-organizing map neural network algorithm was developed by the Artificial Intelligence Lab at the University of Arizona. On a 2D map display, similar topics are positioned closer together according to their co-occurrence patterns, and more important topics occupy larger regions.
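The basic (single-level) self-organizing map can be sketched in a few lines. This is a textbook version, not the AI Lab's multi-level algorithm. The grid size, decay schedule, seed and the sample vectors are all assumptions chosen for illustration.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; learning rate and neighborhood radius decay linearly."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = 1 - epoch / epochs
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        for x in data:
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Nodes near the winner on the grid are pulled more strongly.
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2 * radius ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
    return weights

# Hypothetical term co-occurrence vectors for four documents.
data = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.1], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
som = train_som(data, rows=3, cols=3)
cell = best_matching_unit(som, data[0])
```

After training, each document is assigned to the grid cell returned by `best_matching_unit`, and cells that attract many documents correspond to the larger regions on the map display.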


SOM
Example: FMD Paper Content Map (2001~2005)
[Figure: a content map divided into topic regions. Each region is labeled with its topic and the number of documents belonging to that topic. Warm colors represent new topics. Developed by the AI Lab at the University of Arizona.]

Weka
Weka was developed at the University of Waikato in New Zealand.
http://www.cs.waikato.ac.nz/~ml/

Tools include:
  Data preprocessing (e.g., Data Filters)
  Classification (e.g., BayesNet, KNN, C4.5 Decision Tree, Neural Networks, SVM)
  Regression (e.g., Linear Regression, Isotonic Regression, SVM for Regression)
  Clustering (e.g., Simple K-means, Expectation Maximization (EM), Farthest First)
  Association rules (e.g., Apriori Algorithm, Predictive Accuracy, Confirmation Guided)
  Feature Selection (e.g., Cfs Subset Evaluation, Information Gain, Chi-squared Statistic)
  Visualization (e.g., View different two-dimensional plots of the data)
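As one example from the feature selection list, information gain scores an attribute by how much knowing its value reduces the entropy of the class label. The sketch below follows the textbook definition for nominal attributes. It is not Weka's code, and the toy attribute columns are made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of labels minus the weighted entropy after splitting on values."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical data: one attribute predicts the class perfectly, the other not at all.
labels = ["yes", "yes", "no", "no"]
informative = ["a", "a", "b", "b"]
uninformative = ["x", "y", "x", "y"]
gains = (information_gain(informative, labels), information_gain(uninformative, labels))
```

An attribute ranking would sort the attributes by this score and keep the highest-scoring ones.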

Weka

[Screenshot: the Weka interface, showing the different analysis tools, the attributes available to choose from, and the value set of the chosen attribute with the number of input items for each value.]

Visualization: NetDraw
NetDraw is an open source program written by Steve Borgatti of Analytic Technologies for visualizing both 1-mode and 2-mode social network data.
http://www.analytictech.com/downloadnd.htm

It can handle multiple relations at the same time and can use node attributes to set the colors, shapes, and sizes of nodes. Pictures can be saved in metafile, jpg, gif and bitmap formats.

Two basic kinds of layouts are implemented: a circle and an MDS/spring embedding based on geodesic distance. You can also rotate, flip, shift, resize and zoom configurations.
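The two ingredients named above, a circle layout and geodesic distance, are simple to compute. The sketch below is a generic illustration, not NetDraw's code. The small friendship network and the function names are hypothetical.

```python
import math
from collections import deque

def circle_layout(nodes, radius=1.0):
    """Place nodes at equal angles around a circle centered on the origin."""
    n = len(nodes)
    return {node: (radius * math.cos(2 * math.pi * i / n),
                   radius * math.sin(2 * math.pi * i / n))
            for i, node in enumerate(nodes)}

def geodesic_distances(adjacency, source):
    """Breadth-first search: number of links on the shortest path from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Hypothetical 1-mode network: individuals and their undirected ties.
adjacency = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Cat"],
    "Cat": ["Bob", "Dan"],
    "Dan": ["Cat"],
}
positions = circle_layout(list(adjacency))
distances = geodesic_distances(adjacency, "Ann")
```

A spring or MDS embedding would then move the nodes so that their distances on screen approximate these geodesic distances.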


NetDraw

[Screenshot: the NetDraw window, showing the different functions, the display setup of the nodes and relations, and a network in which nodes represent the individuals and links represent the relations.]

JUNG
The Java Universal Network/Graph Framework (JUNG) is a software library for the modeling, analysis, and visualization of data that can be represented as a graph or network. It was developed by the School of Information and Computer Science at the University of California, Irvine.
http://jung.sourceforge.net/index.html

The current distribution of JUNG includes implementations of a number of algorithms from graph theory, data mining, and social network analysis:
  Clustering
  Decomposition
  Optimization
  Random Graph Generation
  Statistical Analysis
  Calculation of Network Distances and Flows and Importance Measures (Centrality, PageRank, HITS, etc.)
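PageRank, one of the importance measures listed, can be computed by power iteration. The sketch below is written in Python for brevity and does not use the JUNG library, which is Java. The damping factor of 0.85, the iteration count and the example graph are assumptions.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration on a dict mapping each node to its list of successors."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for source, targets in graph.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank

# Hypothetical directed graph: A, B and C form a cycle, and D also points to A.
graph = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
ranks = pagerank(graph)
top = max(ranks, key=ranks.get)
```

Node A ends up with the highest score because it receives links from both C and D, while D, which nothing links to, keeps only the baseline share.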

JUNG

[Figures: examples of visualization types.]
* Pictures are from http://jung.sourceforge.net/index.html

Analyst's Notebook & Starlight

Analyst's Notebook, by i2: A 2D graph and timeline layout tool for crime and intelligence analysis.
Starlight, by Pacific Northwest Lab (PNL): A 3D network visualization and navigation tool for intelligence analysis.


[Screenshots: Analyst's Notebook (i2) and Starlight (PNL).]