DBMS Learning Material 1

This document discusses big data and database design. It begins by defining big data and its key characteristics of volume, velocity, and variety. It then discusses sources of big data and different types of data. Finally, it discusses database technologies like RDF and advanced database technologies needed to manage large, complex datasets.

Uploaded by

Vishnu
Copyright
© All Rights Reserved

13.

405 DATABASE DESIGN


Faculty in charge: RESHMA SHEIK

BIG DATA

Big data usually includes data sets with sizes beyond the ability of commonly used software
tools to capture, manage, and process data within a tolerable elapsed time. Big data is being
generated by everything around us at all times. Every digital process and social media exchange
produces it. Systems, sensors and mobile devices transmit it. Big data is arriving from multiple
sources at an alarming velocity, volume and variety. To extract meaningful value from big data,
we need optimal processing power, analytics capabilities and skills.

Today, many organizations are collecting, storing, and analyzing massive amounts of data. This
data is commonly referred to as big data because of its volume, the velocity with which it
arrives, and the variety of forms it takes. Big data is creating a new generation of decision
support data management. Businesses are recognizing the potential value of this data and are
putting the technologies, people, and processes in place to capitalize on the opportunities. A key
to deriving value from big data is the use of analytics. Collecting and storing big data creates
little value; it is only data infrastructure at this point. It must be analyzed and the results used by
decision makers and organizational processes in order to generate value.

Big data can be described by the following characteristics:

Volume.

Many factors contribute to the increase in data volume: transaction-based data stored through
the years, unstructured data streaming in from social media, and increasing amounts of sensor and
machine-to-machine data being collected. In the past, excessive data volume was a storage issue.
But with decreasing storage costs, other issues emerge, including how to determine relevance
within large data volumes and how to use analytics to create value from relevant data.

Velocity.

In this context, velocity is the speed at which data is generated and processed to meet the
demands and challenges that lie in the path of growth and development. Data is streaming in at
unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart
metering are driving the need to deal with torrents of data in near-real time. Reacting quickly
enough to deal with data velocity is a challenge for most organizations.

Variety.

Data today comes in all types of formats: structured, numeric data in traditional databases;
information created from line-of-business applications; and unstructured text documents, email,
video, audio, stock ticker data and financial transactions. Managing, merging and governing
different varieties of data is something many organizations still grapple with.

BIG DATA SOURCES

Big data has many sources. For example, every mouse click on a web site can be captured in
Web log files and analyzed in order to better understand shoppers' buying behaviors and to
influence their shopping by dynamically recommending products. Social media sources such as
Facebook and Twitter generate tremendous amounts of comments and tweets. This data can be
captured and analyzed to understand, for example, what people think about new product
introductions. Machines, such as smart meters, generate data. These meters continuously stream
data about electricity, water, or gas consumption that can be shared with customers and combined
with pricing plans to motivate customers to move some of their energy consumption, such as for
washing clothes, to non-peak hours. There is a tremendous amount of geospatial (e.g., GPS) data,
such as that created by cell phones, that can be used by applications like Foursquare to help you
know the locations of friends and to receive offers from nearby stores and restaurants. Image,
voice, and audio data can be analyzed for applications such as facial recognition systems in
security systems.
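
As a small illustration of the web-log example above, the following Python sketch recommends to a shopper the product most often viewed by other shoppers that this shopper has not viewed yet. The log layout, the sample data and the function name `recommend` are assumptions made for illustration only; real click logs are far larger and carry many more fields.

```python
from collections import Counter

# Hypothetical click log: (shopper_id, product_viewed).
clicks = [
    ("u1", "kettle"), ("u1", "toaster"),
    ("u2", "kettle"), ("u2", "blender"),
    ("u3", "kettle"), ("u3", "blender"),
]

def recommend(shopper, clicks):
    """Return the product most viewed by others that `shopper` has not viewed."""
    seen = {product for s, product in clicks if s == shopper}
    popularity = Counter(
        product for s, product in clicks
        if s != shopper and product not in seen
    )
    # most_common(1) yields [(product, count)]; None if nothing is left to suggest
    return popularity.most_common(1)[0][0] if popularity else None
```

Calling `recommend("u1", clicks)` on this sample data suggests the blender, because two other shoppers viewed it and shopper `u1` did not.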

WHAT IS BIG DATA ANALYTICS

By itself, stored data does not generate business value, and this is true of traditional databases,
data warehouses, and the new technologies such as Hadoop for storing big data. Once the data is
appropriately stored, however, it can be analyzed, which can create tremendous value. A variety
of analysis technologies, approaches, and products have emerged that are especially applicable to
big data, such as in-memory analytics, in-database analytics, and appliances.

ADVANTAGES OF BIG DATA ANALYTICS

Anything involving customers could benefit from big data analytics. Recent economic
changes worldwide have changed consumer behaviors. Big data analytics can help develop
definitions of churn and other customer behaviors, as well as an understanding of consumer
behavior from clickstreams.

Business intelligence in general can benefit from big data analytics. This could result in
more numerous and accurate business insights, an understanding of business change, better
planning and forecasting, and the identification of root causes of cost.

Specific analytic applications are likely beneficiaries of big data analytics: big data
analytics might help automate decisions for real-time business processes such as loan approvals
or fraud detection. Potential benefits entered by survey respondents who selected "other" include
customer loyalty, service experience optimization, healthcare delivery optimization, and supplier
performance based on cost and quality.

RDF

The Resource Description Framework (RDF) is an infrastructure that enables the encoding,
exchange and reuse of structured metadata. RDF is an application of XML that imposes needed
structural constraints to provide unambiguous methods of expressing semantics. RDF
additionally provides a means for publishing both human-readable and machine-processable
vocabularies designed to encourage the reuse and extension of metadata semantics among
disparate information communities. The structural constraints RDF imposes to support the
consistent encoding and exchange of standardized metadata provide for the interchangeability
of separate packages of metadata defined by different resource description communities.

RDF stands for Resource Description Framework

RDF is a framework for describing resources on the web

RDF is designed to be read and understood by computers

RDF is not designed for being displayed to people

RDF is commonly written in XML (the RDF/XML syntax)

RDF is a part of the W3C's Semantic Web Activity

RDF is a W3C Recommendation from 10 February 2004

RDF - Examples of Use

Describing properties for shopping items, such as price and availability

Describing time schedules for web events

Describing information about web pages (content, author, created and modified date)

Describing content and rating for web pictures

Describing content for search engines

Describing electronic libraries

RDF identifies things using Web identifiers (URIs), and describes resources with properties and
property values.

A Resource is anything that can have a URI

A Property is a Resource that has a name, such as "author" or "homepage"

A Property value is the value of a Property, such as "Jan Egil Refsnes" or
"http://www.myschools.com"

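The resource, property and property value described above can be pictured as a three-part statement (a triple). The Python sketch below stores a few such statements as plain tuples and looks them up. It is a simplified illustration, not an RDF library: the `example.org` URIs and the helper names `values_of` and `describe` are hypothetical, and in real RDF the properties are themselves identified by URIs rather than short names.

```python
# Each statement is a (resource, property, value) triple.
# The URIs are hypothetical and used only for illustration.
triples = [
    ("http://www.example.org/page", "author", "Jan Egil Refsnes"),
    ("http://www.example.org/page", "homepage", "http://www.example.org"),
    ("http://www.example.org/other", "author", "Another Author"),
]

def values_of(resource, prop, triples):
    """Return every value that property `prop` takes for `resource`."""
    return [v for r, p, v in triples if r == resource and p == prop]

def describe(resource, triples):
    """Collect all property/value pairs of one resource into a dict."""
    description = {}
    for r, p, v in triples:
        if r == resource:
            description.setdefault(p, []).append(v)
    return description
```

For example, `values_of("http://www.example.org/page", "author", triples)` returns the list of authors recorded for that page.
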
Here are some of the likely benefits of RDF:

By providing a consistent framework, RDF will encourage the providing of metadata
about Internet resources.

Because RDF will include a standard syntax for describing and querying data, software
that exploits metadata will be easier and faster to produce.

The standard syntax and query capability will allow applications to exchange information
more easily.

Searchers will get more precise results from searching, based on metadata rather than on
indexes derived from full text gathering.

Intelligent software agents will have more precise data to work with.

How RDF Works

An Internet resource is defined as any resource with a Uniform Resource Identifier (URI). This
includes the Uniform Resource Locators (URLs) that identify entire Web sites as well as specific
Web pages. As with HTML META tags, the RDF description statements, encased as part of an
Extensible Markup Language (XML) section, could be included within a Web page (that is, a
Hypertext Markup Language, or HTML, file) or could be in separate files. RDF is a formal W3C
Recommendation, meaning that it is ready for general use. A companion W3C specification, RDF
Schema, describes a system in which the descriptions related to a particular purpose (for
example, all descriptions related to security and privacy) constitute a class of such
descriptions (using class here much as it is used in object-oriented data modeling and
programming). Such classes fit into a schema or hierarchy of classes, with subclasses of a class
able to inherit the descriptions of the entire class. The schema of classes saves having to
repeat descriptions, since a single reference to the class of which a particular RDF description
is a part suffices. The schema, that is, the description of the collection of classes, can
itself be written in RDF.
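
The idea of subclasses inheriting the descriptions of their parent class can be sketched in a few lines of Python. This is only an illustration of the inheritance principle, not RDF syntax: the class names, the description values and the function `inherited_description` are all hypothetical.

```python
# Hypothetical class hierarchy: child class -> parent class.
subclass_of = {
    "PrivacyPolicy": "SecurityDescription",
    "SecurityDescription": "Description",
}

# Descriptions attached directly to each class (illustrative values).
class_descriptions = {
    "Description": {"language": "en"},
    "SecurityDescription": {"audience": "administrators"},
    "PrivacyPolicy": {"retention": "30 days"},
}

def inherited_description(cls):
    """Merge the descriptions of `cls` with those of all its ancestors."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = subclass_of.get(cls)   # None once the root class is reached
    merged = {}
    for c in reversed(chain):        # root first, so subclasses can override
        merged.update(class_descriptions.get(c, {}))
    return merged
```

Here `PrivacyPolicy` only states its own `retention` value, yet `inherited_description("PrivacyPolicy")` also carries `language` and `audience` from its ancestors, which is the saving in repetition that the class schema is meant to provide.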

ADVANCED DATABASE TECHNOLOGIES

The existing relational DBMS technology has been successfully applied to many application
domains. RDBMS technology has proved to be an effective solution for data management
requirements in large and small organizations, and today this technology forms a key component
of most information systems. However, applications in domains such as multimedia,
geographical information systems, digital libraries and mobile databases demand a completely
different set of requirements in terms of the underlying database models. The conventional
relational database model is often not well suited to these types of data, so there is a need
for new databases and new technologies.

1. MULTIMEDIA DATABASE: Multimedia computing has emerged as a major area of research and now
touches many facets of daily life. A multimedia database is a database that hosts one or more
primary media file types such as video, audio, radar signals and documents or pictures in
various encodings. These forms have in common that they are much larger than earlier forms of
data (integers and fixed-length character strings) and that they vary vastly in size. They fall
into three main categories:

Static media (time-independent, e.g. images and handwriting)

Dynamic media (time-dependent, e.g. video and sound clips)

Dimensional media (e.g. 3D games or computer-aided drafting programs, CAD)


2. TEMPORAL DATABASE: Time is an important aspect of real-world phenomena. Events
occur at specific points in time. Objects and relationships among objects exist over time. The
ability to model this temporal dimension of the real world is essential to many computer
applications such as econometrics, inventory control, airline reservations, medical records,
accounting, law, banking, land and geographical information systems. A temporal database is
formed by compiling and storing temporal data. The difference between temporal data and
non-temporal data is that a time period is appended to the data, expressing when it was valid or
when it was stored in the database. Conventional databases consider stored data to be valid at
the present time, that is, at the time instant "now". When data in such a database is modified,
removed or inserted, the state of the database is overwritten to form a new state. The state
prior to any changes to the database is no longer available. In essence, temporal data is formed
by time-stamping ordinary data (the type of data we associate with and store in conventional
databases).
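
The time-stamping idea can be illustrated with a short Python sketch. Each record carries a validity period, updates close the current record instead of overwriting it, and a query can ask what was valid on a given date. The record layout, the sample salaries and the function names `update_salary` and `salary_as_of` are assumptions for illustration, not the design of any particular temporal DBMS.

```python
from datetime import date

# Each row carries a valid-time period [valid_from, valid_to).
# valid_to = None means "still valid". All values are hypothetical.
salary_history = [
    {"emp": "E1", "salary": 3000,
     "valid_from": date(2020, 1, 1), "valid_to": date(2021, 1, 1)},
    {"emp": "E1", "salary": 3500,
     "valid_from": date(2021, 1, 1), "valid_to": None},
]

def update_salary(history, emp, new_salary, when):
    """Close the current row and append a new one instead of overwriting."""
    for row in history:
        if row["emp"] == emp and row["valid_to"] is None:
            row["valid_to"] = when
    history.append({"emp": emp, "salary": new_salary,
                    "valid_from": when, "valid_to": None})

def salary_as_of(history, emp, when):
    """Return the salary that was valid for `emp` on the date `when`."""
    for row in history:
        if (row["emp"] == emp and row["valid_from"] <= when
                and (row["valid_to"] is None or when < row["valid_to"])):
            return row["salary"]
    return None
```

Unlike a conventional update, calling `update_salary` keeps the earlier rows, so `salary_as_of` can still answer questions about past states.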

3. MOBILE DATABASE: Recent advances in portable and wireless technology have led to mobile
computing, a new dimension in data communication and processing. Portable computing devices
coupled with wireless communications allow clients to access data from virtually anywhere and
at any time. Nowadays you can even connect to your intranet from an aeroplane. Mobile databases
are databases that allow the development and deployment of database applications for
handheld devices, thus putting relational database applications in the hands of mobile
workers. The database technology allows employees using handheld devices to link to their
corporate networks, download data, work offline, and then connect to the network again to
synchronise with the corporate database. Mobile computing applications, residing fully or
partially on mobile devices, typically use cellular networks to transmit information over wide
areas, and wireless LANs over short distances.
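
The download, work offline, synchronise cycle described above can be sketched in Python as follows. The dictionaries stand in for real databases, the class `HandheldClient` and its methods are hypothetical, and conflicts are resolved by a simple "last write wins" rule, which is an assumption; real synchronisation needs proper conflict detection.

```python
# Hypothetical in-memory stand-in for the corporate database.
corporate_db = {"order-1": "pending", "order-2": "pending"}

class HandheldClient:
    def __init__(self):
        self.local = {}      # local copy used while offline
        self.pending = []    # changes made offline, not yet uploaded

    def download(self, server):
        """Copy the server data to the device while connected."""
        self.local = dict(server)

    def update_offline(self, key, value):
        """Change the local copy and remember the change for later upload."""
        self.local[key] = value
        self.pending.append((key, value))

    def synchronise(self, server):
        """Replay offline changes on the server, then refresh the local copy."""
        for key, value in self.pending:
            server[key] = value          # last write wins (simplification)
        self.pending.clear()
        self.local = dict(server)
```

A client would call `download(corporate_db)` while connected, make changes with `update_offline` while disconnected, and call `synchronise(corporate_db)` once the network is available again.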

4. GEOGRAPHIC INFORMATION SYSTEMS: GIS is a technological field that incorporates
geographical features with tabular data in order to map, analyse, and assess real-world problems.
The key word in this technology is geography: some portion of the data is spatial, in other
words, data that is in some way referenced to locations on the earth. Coupled with this data is
usually tabular data known as attribute data. Attribute data can be generally defined as
additional information about each of the spatial features. Geographic information systems (GIS)
are used to collect, model, and analyse information describing physical properties of the
geographical world. The scope of GIS broadly encompasses two types of data:

Spatial data, originating from maps, digital images, administrative and political
boundaries, roads, transportation networks, and physical data such as rivers, soil
characteristics, climatic regions and land elevations.

Non-spatial data, such as socio-economic data (like census counts), economic data, and
sales or marketing information.

GIS is a rapidly developing domain that offers highly innovative approaches to meet some
challenging technical demands.
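
The pairing of spatial data with attribute data can be illustrated with a minimal Python sketch. Each feature has a location, and a separate table holds non-spatial attributes keyed by the same feature id; a query first selects features by location and then uses their attributes. The store names, coordinates, sales figures and function names are all hypothetical, and points in a bounding box are used as the simplest possible spatial query.

```python
# Spatial data: feature id -> (longitude, latitude). Illustrative values.
spatial = {
    "store-A": (76.27, 9.93),
    "store-B": (77.59, 12.97),
    "store-C": (76.95, 8.52),
}

# Attribute (non-spatial) data about the same features.
attributes = {
    "store-A": {"sales": 120},
    "store-B": {"sales": 340},
    "store-C": {"sales": 90},
}

def features_in_box(min_lon, min_lat, max_lon, max_lat):
    """Return ids of features whose point lies inside the bounding box."""
    return sorted(
        fid for fid, (lon, lat) in spatial.items()
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    )

def total_sales_in_box(min_lon, min_lat, max_lon, max_lat):
    """Combine the spatial selection with the attribute data."""
    return sum(attributes[fid]["sales"]
               for fid in features_in_box(min_lon, min_lat, max_lon, max_lat))
```

With this sample data, the box from (76, 8) to (77, 10) contains `store-A` and `store-C`, so the combined query sums the sales of those two stores only.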

5. GENOME DATA: The biological sciences encompass an enormous variety of information.
Environmental science gives us a view of how species live and interact in a world filled with
natural phenomena. Biology and ecology study particular species. Anatomy focuses on the
overall structure of an organism, documenting the physical aspects of individual bodies. Genetics
has emerged as an ideal field for the application of information technology. In a broad sense, it
can be thought of as the construction of models based on information about genes (which can be
defined as units of heredity) and populations, and the seeking out of relationships in that
information. The study of genetics can be divided into three branches:

Mendelian genetics: This is the study of the transmission of traits between generations.

Molecular genetics: This is the study of the chemical structure and function of genes at
the molecular level.

Population genetics: This is the study of how genetic information varies across
populations of organisms.

6. DIGITAL LIBRARY: Digital libraries are an important and active research area.
Conceptually, a digital library is an analog of a traditional library (a large collection of
information sources in various media) coupled with the advantages of digital technologies.
However, digital libraries differ from their traditional counterparts in significant ways:
storage is digital, remote access is quick and easy, and materials are copied from a master
version. Furthermore, keeping extra copies on hand is easy and is not hampered by budget and
storage restrictions, which are major problems in traditional libraries. Thus, digital
technologies overcome many of the physical and economic limitations of traditional libraries.

7. BIG DATA: Nowadays, advances in technology generate large, diverse, longitudinal,
complex, and/or distributed data sets, mainly from instruments, sensors, Internet transactions,
email, video, click streams, and/or other digital sources. Individuals with smartphones and on
social network sites, together with multimedia, will continue to fuel exponential growth of
data. The large pools of data that can be captured, communicated, aggregated, stored, and
analysed are part of every sector and function of the global economy. The amount of data has
been exploding. Companies capture trillions of bytes of information about their customers,
suppliers, and operations, and millions of networked sensors are being embedded in the physical
world in devices such as mobile phones and automobiles, sensing, creating, and communicating
data.
