1 Introduction To Multimedia Databases

Uploaded by

maik9206
Introduction to Multimedia Databases
Motivation

 With the explosive growth of digital media data, there is a huge demand for new tools
and systems that enable average users to search, access, process, manage, author, and
share digital media content more efficiently and effectively.
Terms

 Information retrieval
 Multimedia (information) retrieval
 Image retrieval
 Video retrieval
 Audio retrieval (music retrieval)
Data Structures

 Structured (formatted) data


 Unstructured (unformatted) data
 Semi-structured data
 Multimedia includes a combination of text,
audio, still image, animation, and video
content forms.
Data Types

 Raw data:
represents unformatted information content, e.g. letters, pixel values
 Registering data:
supports interpretation and identification of the raw data
 Descriptive data:
information about the content and structure of the data, e.g. to support
semantic search
Multimedia Data

 Text/Document:
- characters represent raw data
- descriptive data may include information for layout and logical
structuring of text, or keywords

 Image:
- pixels represent the raw data
- registering data include the height and width of the picture
- descriptive data describe individual lines, surfaces, and subjects
Multimedia Data
 Video sequence:
- pixel matrices represent the raw data
- registering data provide the number of images per second
- descriptive data provide a scene description, e.g. ‘John’s birthday
party’
 Audio sequence:
- the coded digital sample values represent the raw data
- registering data represent the properties of the audio coding
- descriptive data represent the content of the audio
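The three kinds of data can be kept side by side in one record. The sketch below is only an illustration for the image case: the class name `ImageObject` and its fields are assumptions made for this example, not part of any particular MM-DBMS.

```python
from dataclasses import dataclass, field


@dataclass
class ImageObject:
    """Hypothetical record that separates the three kinds of data for an image."""

    # Raw data: the pixel values themselves (row-major list of RGB tuples).
    pixels: list
    # Registering data: what is needed to interpret the raw data.
    width: int
    height: int
    # Descriptive data: content-level annotations that a search can use.
    keywords: list = field(default_factory=list)

    def is_consistent(self) -> bool:
        """Check that the registering data agrees with the raw data."""
        return len(self.pixels) == self.width * self.height


# A tiny 3x2 all-red image with two descriptive keywords.
photo = ImageObject(
    pixels=[(255, 0, 0)] * 6,
    width=3,
    height=2,
    keywords=["birthday", "cake"],
)
```

Keeping registering data separate from raw data lets the system validate or decode the raw bytes without touching the descriptive annotations.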
Multimedia Data

 Video and audio differ from other media types, in particular:
- video/audio retrieval must appear to be continuous
- video/audio support operations like fast-forward, rewind, and pause
MM Database Architectures
What is a Multimedia DBMS?

 A multimedia database management system (MM-DBMS) is a framework that
manages different types of data potentially represented in a wide diversity
of formats on a wide array of media sources.
 ‘Management’ in an MM-DBMS generally covers the problems of ‘indexing’ and
‘query/retrieval’ of multimedia data.
The Major Issue 1: Query Support

 Allow easy querying of multimedia data
- What is query by content?
- Can a query be specified as a combination of media (examples)
and text description?
- How to handle different MM objects?
- What query language should be used?
 Allow efficient/effective querying of multimedia data
- What algorithms can be used to efficiently/effectively retrieve
media data on the basis of similarity?
- How should we index the content of different MM objects?
- How to provide traditional DBMS support?
A Sample Multimedia Scenario

 Consider a police investigation of a large-scale drug operation.
This investigation may generate the following types of data:
 Video data captured by surveillance cameras that record the
activities taking place at various locations.
 Audio data captured by legally authorized telephone wiretaps.
 Image data consisting of still photographs taken by
investigators.
 Document data seized by the police when raiding one or more
places.
 Structured relational data containing background information,
bank records, etc., of the suspects involved.
 Geographic information system data containing geographic
data relevant to the drug investigation being conducted.
Image Queries

Image-based Query (Query by example):


 Police officer Rocky has a photograph in front of
him.
 He wants to find the identity of the person in the
picture.
 Query: “Retrieve all images from the image
library in which the person appearing in the
(currently displayed) photograph appears”
Image Queries

Image Query (Query by keywords):


 Police officer Rocky wants to examine
pictures of “Big Spender”.
 Query: “Retrieve all images from the image
library in which ‘Big Spender’ appears.”
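A keyword query like this one only works if the images already carry textual annotations. A minimal sketch, assuming each image has a hand-assigned keyword list (the image ids and annotations below are made up for illustration), is an inverted index from keyword to image ids:

```python
from collections import defaultdict


def build_index(annotations):
    """Map each lowercased keyword to the set of image ids annotated with it."""
    index = defaultdict(set)
    for image_id, keywords in annotations.items():
        for keyword in keywords:
            index[keyword.lower()].add(image_id)
    return index


def query_by_keywords(index, *keywords):
    """Return the sorted ids of images annotated with ALL given keywords."""
    sets = [index.get(keyword.lower(), set()) for keyword in keywords]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical annotations for a small image library.
annotations = {
    "img_001": ["Big Spender", "car park"],
    "img_002": ["Denis Dopeman"],
    "img_003": ["Big Spender", "Denis Dopeman"],
}
index = build_index(annotations)
hits = query_by_keywords(index, "Big Spender")
```

The lookup itself is cheap; the expensive part in practice is producing the annotations in the first place.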
Video Queries

Video Query:
 Police officer Rocky is examining a surveillance video of
a particular person being fatally assaulted by an
assailant. However, the assailant's face is occluded and
image processing algorithms return very poor matches.
Rocky thinks the assault was by someone known to the
victim.
 Query: “Find all video segments in which the victim of
the assault appears.”
 By examining the answer of the above query, Rocky
hopes to find other people who have previously
interacted with the victim.
Audio Queries

 Police officer Rocky is listening to an audio surveillance tape.
 Example: Tape contains a conversation
between individual A (person under
surveillance) and individual B (somebody
meeting person A).
 Query: “Find the identity of individual B, given
that individual A is Denis Dopeman.”
Audio Queries

 Police officer Rocky wants to review all audio logs that Denis Dopeman
participated in during some specified time period.
 Query: “Find all audio tapes in which Denis
Dopeman was a participant.”
Text Queries

 Police officer Rocky is browsing an archive of text documents: these include
old newspaper archives, police department files on old, unsolved murder cases,
witness statements, etc.
 Query: “Find all documents that deal with the
Cali drug cartel’s financial transactions with
ABC Corp.”
Heterogeneous Queries

Heterogeneous Multimedia Query:


 Find all individuals who have been
photographed with “Big Spender” and who
have been convicted of attempted murder in
South China and who have recently had
electronic fund transfers made into their bank
accounts from ABC Corp.
Heterogeneous Queries
 Answering the above query is problematic
because:
- determining all people convicted of different
crimes may require accessing a wide variety
of databases belonging to different police
jurisdictions and courts
- ABC Corp. may have accounts in hundreds
of banks worldwide each of which uses
different formats and different database
systems
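Once each source has been queried separately, the final step is a conjunction of the partial answers. The sketch below assumes, optimistically, that all three sources already share a common person identifier; the function name and the sample ids are hypothetical, and reconciling identities across jurisdictions and banks is exactly the hard part described above.

```python
def heterogeneous_query(photo_hits, conviction_hits, transfer_hits):
    """Return the person ids that satisfy all three sub-queries."""
    return sorted(set(photo_hits) & set(conviction_hits) & set(transfer_hits))


# Hypothetical partial answers from three independent sources.
photographed_with_big_spender = ["p1", "p2", "p4"]   # image library
convicted_of_attempted_murder = ["p2", "p3", "p4"]   # court records
received_transfer_from_abc = ["p4", "p5"]            # bank records

suspects = heterogeneous_query(
    photographed_with_big_spender,
    convicted_of_attempted_murder,
    received_transfer_from_abc,
)
```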
The Major Issue 2: Content
 What is content of a media source? Under
what conditions can content be described
textually and under what conditions must it be
described directly through the original media
type (i.e. non-textual description)?
 How should we extract the content of:
- an image
- a video-clip
- an audio-clip
- a free/unstructured text document
The Major Issue 2: Content
 Text documents: keywords, summarization,
topic, abstract, etc.
 Images: color, texture, shape, object, abstract
concept, etc.
 Music: rhythm, pulse, tone, etc.
 Videos: moving objects, caption, image clip,
etc.
 How about meta-data (e.g. time, location,
size, authors, etc.)?
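As one concrete case of content extraction, a color feature for an image can be computed as a normalized histogram. This is a minimal sketch assuming the image is given as a list of 8-bit RGB tuples; the function name and the bin count are choices made for the example.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized RGB histogram with bins_per_channel**3 bins."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = len(pixels)
    # Normalize so that images of different sizes are comparable.
    return [h / total for h in hist] if total else hist


# Two red pixels, one blue pixel, one black pixel.
sample_pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 0)]
feature = color_histogram(sample_pixels, bins_per_channel=2)
```

A histogram discards spatial layout, which is why the slide lists texture, shape, and objects as separate kinds of content.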
The Major Issue 2: Content

 How should we index the results of this extracted content?
 What is retrieval by similarity?
 What algorithms can be used to efficiently/effectively retrieve media data on
the basis of similarity?
 What are efficient/effective algorithms for processing such queries?
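Retrieval by similarity can be illustrated with the simplest possible baseline: represent every media object as a feature vector and rank the whole collection by similarity to the query vector. The sketch below uses cosine similarity and a linear scan; the feature vectors and object names are invented for the example, and a real system would replace the scan with an index structure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, database, k=2):
    """Return the names of the k objects most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in database.items()]
    # Highest similarity first; ties are broken alphabetically by name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:k]]


# Hypothetical 3-dimensional feature vectors for three images.
database = {
    "sunset": [0.9, 0.1, 0.0],
    "forest": [0.1, 0.8, 0.1],
    "beach": [0.7, 0.2, 0.1],
}
results = top_k([1.0, 0.0, 0.0], database, k=2)
```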
Content vs. Concept

 Content vs. concept based indexing and retrieval


MMDB Technology
 MMDB covers the following technical domains:
- data processing: text, image, video, audio
processing
- similarity measure: classic information retrieval
- HCI (Human Computer Interaction)
- Artificial Intelligence: for data processing, similarity
measure, and intelligent interface
- User needs, searching behavior, and satisfaction,
etc.
MMDB Related Fields

 Research problems
- systems, content, services, user, evaluation,
implementation, social/business, applications
 Methodologies
- database, information retrieval, signal and image
processing, graphics, computer vision, HCI,
machine learning, statistical modeling, data mining,
pattern analysis, data fusion, social sciences, and
domain knowledge for specific applications
Multimedia Information Retrieval
 Text-based information retrieval
- too many images to annotate
- high cost of human interpretation
- subjectivity of visual content, e.g. “a picture is worth a thousand words”
 Content-based retrieval
- automatically retrieves images, video, and audio based on their visual and
audio content
 History
- Conference on Database Techniques for Pictorial Applications in 1979
- NSF workshop in 1992
- More active field since 1997, when the Internet and web browsing became
popular
Content-Based Image Retrieval
Content-Based Video Retrieval
Content-Based Audio Retrieval

 The aim: to search sounds by their features in the waveform, statistics, or
transform domains (speech, music, environmental audio)
 Entertainment
- film making: searching sound effects
- TV/radio studio: editing programs
- Karaoke, music stores, or online shopping
- query by humming the melody
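Query by humming can be sketched with a melodic contour code: each note is reduced to whether it goes up (U), down (D), or repeats (R) relative to the previous note, so the query matches regardless of the key in which it is hummed. This is a simplified illustration, not the method of any particular system; the pitch lists (MIDI-style note numbers) and tune names are made up, and a practical matcher would tolerate humming errors, for example with edit distance instead of exact substring matching.

```python
def contour_code(pitches):
    """Encode a pitch sequence as U (up), D (down), R (repeat) steps."""
    steps = []
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            steps.append("U")
        elif current < previous:
            steps.append("D")
        else:
            steps.append("R")
    return "".join(steps)


def match_hummed_query(hummed_pitches, melodies):
    """Return the sorted titles whose contour contains the hummed contour."""
    query = contour_code(hummed_pitches)
    return sorted(
        title for title, pitches in melodies.items() if query in contour_code(pitches)
    )


# Hypothetical melody collection.
melodies = {
    "tune_a": [60, 60, 67, 67, 69, 69, 67],
    "tune_b": [64, 62, 60, 62, 64, 64, 64],
}
# The same opening as tune_a, hummed ten semitones lower.
matches = match_hummed_query([50, 50, 57, 57], melodies)
```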
Content-Based Audio Retrieval

 Audio/video archive management:
- segmenting and indexing of raw recordings
- searching and browsing audio/video clips
 Surveillance:
- monitoring criminal or emergency events
- film rating
Query-Retrieval Matrix
[Figure: matrix of query types (text, stills, sound, sketch, speech, humming, examples)
against document types (text, video, images, speech, music, sketches, multimedia).
Labeled cells: conventional text retrieval; you roar and get a wildlife documentary;
type “floods” and get BBC radio news; hum a tune and get a music piece.]
Some Applications

 medicine
 get diagnosis of cases with similar scans
 law enforcement
 CCTV video retrieval (car park, public spaces)
 digital libraries
 searching, visualization, summaries, browsing
Multimedia Mining

 Discovering knowledge from large amounts of different types of data.
 Extraction of implicit knowledge, multimedia data relationships, or other
patterns not explicitly stored in multimedia databases
Multimedia Miner

 Multimedia data cube


- extraction of images
- feature extractor (preprocessor)
- user interface
- search engine
 Multimedia miner
- characterizer, comparator
- classifier, associator
Multimedia Miner
Multimedia Miner
WebSeek
 Automatically analyze, index, and assign the images
and videos to subject classes
 650000 images and 10000 videos
 Image content-based techniques
 Query modification using relevance feedback
 Image and video subject search and navigation
 Text-based searching
 http://ei.cs.vt.edu/~mm/cache/WebSeek.htm
Representative Systems

 QBIC
 VisualSEEk
 Virage
 MARS
 Blobworld, etc.
Blobworld
CHROMA
Example: Jupiter video search
 video segmentation: generate video paragraphs
 identify the key frame of each video paragraph
 get Jupiter example images, e.g. from a web image search such as Google image search
 treat video search as image search

Other Systems

 Diamond Eye: satellite detection (dynamic event detection)
 Algorithm Development and Mining system (ADAM): geophysical phenomenon
detection from large scientific datasets
 Intelligent Satellite Data Information System (ISIS)
 Informedia: TV and radio news
10 Problems in MIR
 Bridge the semantic gap
- high level concept (objects, events) and low level visual/audio features
(color, texture, shape and structure, layout; motion; audio-pitch, etc.)

 How about “Europe”, “Peace”, etc. as abstract concepts?


The Semantic Gap

1. 120,000 pixels with a particular spatial color distribution
2. human faces, white and yellow clothes
3. victory, triumph, ...


10 Problems in MIR

 How to best combine human intelligence and machine intelligence
- keep the human in the loop, e.g. relevance feedback
- machine learning/deep learning
 New query paradigms
- query by keywords, similarity, sketching an object, sketching a
trajectory, painting a rough image, etc. Can we think of useful
new paradigms (e.g. interactive formulation of queries)?
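Relevance feedback is one concrete way to keep the human in the loop. A classic formulation from text retrieval is the Rocchio update, which moves the query vector toward results the user marked relevant and away from those marked non-relevant. The sketch below applies it to generic feature vectors; the default weights are conventional starting values rather than tuned ones, and the vectors are invented for the example.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return alpha*query + beta*mean(relevant) - gamma*mean(non_relevant)."""

    def centroid(vectors):
        # Mean vector; an empty feedback set contributes nothing.
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    rel_centroid = centroid(relevant)
    non_centroid = centroid(non_relevant)
    return [
        alpha * q + beta * r - gamma * n
        for q, r, n in zip(query, rel_centroid, non_centroid)
    ]


# The user marks one result as relevant and one as non-relevant.
updated = rocchio(
    query=[1.0, 0.0],
    relevant=[[0.0, 1.0]],
    non_relevant=[[1.0, 0.0]],
    alpha=1.0,
    beta=1.0,
    gamma=0.5,
)
```

The updated vector is then used to rerun the similarity search, and the loop repeats until the user is satisfied.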
10 Problems in MIR

 Multimedia data mining
- searching for interesting/unusual patterns and correlations in
multimedia has many important applications, including Web search
engines and dealing with intelligence data
- work to date on data mining has been mainly on text data
 How to index unlabeled data
- active learning, e.g. relevance feedback
- label propagation, e.g. image/video annotation; LabelMe
 Virtual reality visualization
- Can we use 3D audio/visual visualization techniques to help a user
navigate through the data space to browse and retrieve?
10 Problems in MIR
 Incremental learning
- change the parameters of the retrieval algorithms incrementally,
not needing to start from scratch every time we have new data
 Structuring very large databases
- researchers in audio/visual scene analysis and those in Databases
and Information Retrieval should really collaborate closely to find
good ways of structuring very large multimedia databases for
efficient/effective retrieval and search
 Performance evaluation
- TRECVID for video retrieval; how about image retrieval (Flickr,
PASCAL, Caltech, ImageNet, etc.)?
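Whatever benchmark is used, performance evaluation of a retrieval system usually comes down to comparing a ranked result list against a set of items judged relevant. The sketch below computes precision and recall at a cutoff and average precision for a single query; the document ids and relevance judgments are made up, and this is a generic illustration rather than the official scoring procedure of any benchmark named above.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the top-k items of a ranked result list."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical ranked output and ground-truth judgments for one query.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
p_at_2, r_at_2 = precision_recall_at_k(ranked, relevant, k=2)
ap = average_precision(ranked, relevant)
```

Averaging `average_precision` over many queries gives mean average precision, a common single-number summary for ranked retrieval.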
10 Problems in MIR

 Domain-specific applications for multimedia retrieval
- e.g. medical multimedia document management, cultural heritage
resources
Survey Papers
 A review of content-based image retrieval systems
http://www.jisc.ac.uk/media/documents/programmes/jtap/jtap-054.pdf
 A survey on content-based search in image databases
http://www.cvl.isy.liu.se/ScOut/TechRep/Papers/LiTHISYR2215.pdf
 Content-based image retrieval systems: a survey
http://give-lab.cs.uu.nl/cbirsurvey/cbir-survey.pdf
