DBMS Learning Material 1
BIG DATA
Big data usually includes data sets with sizes beyond the ability of commonly used software
tools to capture, manage, and process data within a tolerable elapsed time. Big data is being
generated by everything around us at all times. Every digital process and social media exchange
produces it. Systems, sensors and mobile devices transmit it. Big data is arriving from multiple
sources at an alarming velocity, volume and variety. To extract meaningful value from big data,
we need optimal processing power, analytics capabilities and skills.
Today, many organizations are collecting, storing, and analyzing massive amounts of data. This
data is commonly referred to as big data because of its volume, the velocity with which it
arrives, and the variety of forms it takes. Big data is creating a new generation of decision
support data management. Businesses are recognizing the potential value of this data and are
putting the technologies, people, and processes in place to capitalize on the opportunities. A key
to deriving value from big data is the use of analytics. Collecting and storing big data creates
little value; it is only data infrastructure at this point. It must be analyzed and the results used by
decision makers and organizational processes in order to generate value.
Volume.
Many factors contribute to the increase in data volume: transaction-based data stored through
the years, unstructured data streaming in from social media, and increasing amounts of sensor and
machine-to-machine data being collected. In the past, excessive data volume was a storage issue.
But with decreasing storage costs, other issues emerge, including how to determine relevance
within large data volumes and how to use analytics to create value from relevant data.
Velocity.
In this context, velocity is the speed at which data is generated and must be processed to meet
the demands and challenges that lie in the path of growth and development. Data is streaming in at unprecedented
speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are
driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal
with data velocity is a challenge for most organizations.
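One common way to handle readings that arrive continuously is to keep only a small, fixed-size window of the most recent values and update a summary as each reading comes in. The following is a minimal sketch of that idea, not a production streaming system; the class name SlidingWindowAverage, the function stream_averages, and the sample readings are all hypothetical.

```python
from collections import deque


class SlidingWindowAverage:
    """Keep a running average over the most recent `size` readings."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.window = deque()
        self.total = 0.0

    def add(self, reading):
        """Add one reading and return the average of the current window."""
        self.window.append(reading)
        self.total += reading
        if len(self.window) > self.size:
            # Drop the oldest reading so memory use stays bounded.
            self.total -= self.window.popleft()
        return self.total / len(self.window)


def stream_averages(readings, size):
    """Yield the windowed average after each incoming reading."""
    window = SlidingWindowAverage(size)
    for reading in readings:
        yield window.add(reading)


if __name__ == "__main__":
    # Hypothetical smart-meter readings arriving one at a time.
    for average in stream_averages([1.0, 2.0, 3.0, 4.0], size=2):
        print(average)
```

Because each reading is processed once and old readings are discarded, the work per reading stays constant no matter how long the stream runs.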
Variety.
Data today comes in all types of formats: structured, numeric data in traditional
databases; information created from line-of-business applications; and unstructured text documents,
email, video, audio, stock ticker data and financial transactions. Managing, merging and
governing different varieties of data is something many organizations still grapple with.
BIG DATA SOURCES
Big data has many sources. For example, every mouse click on a web site can be captured in
Web log files and analyzed in order to better understand shoppers' buying behaviors and to
influence their shopping by dynamically recommending products. Social media sources such as
Facebook and Twitter generate tremendous amounts of comments and tweets. This data can be
captured and analyzed to understand, for example, what people think about new product
introductions. Machines, such as smart meters, generate data. These meters continuously stream
data about electricity, water, or gas consumption that can be shared with customers and combined
with pricing plans to motivate customers to move some of their energy consumption, such as for
washing clothes, to non-peak hours. There is a tremendous amount of geospatial (e.g., GPS) data,
such as that created by cell phones, that can be used by applications like Foursquare to help you
know the locations of friends and to receive offers from nearby stores and restaurants. Image,
voice, and audio data can be analyzed for applications such as facial recognition systems in
security systems.
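As a rough illustration of how captured clicks might feed product recommendations, the sketch below counts how often two products are viewed in the same session and suggests the most frequent companions. It assumes the Web log has already been parsed into (session_id, product_id) pairs; that input format, the function names build_co_views and recommend, and the sample data are hypothetical, and real recommenders use far richer signals.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_views(clicks):
    """Count how often two products are viewed in the same session.

    `clicks` is an iterable of (session_id, product_id) pairs, a
    hypothetical, simplified stand-in for parsed Web log entries.
    """
    sessions = defaultdict(set)
    for session_id, product_id in clicks:
        sessions[session_id].add(product_id)

    co_views = defaultdict(Counter)
    for products in sessions.values():
        # Every ordered pair of distinct products seen in one session.
        for a, b in permutations(products, 2):
            co_views[a][b] += 1
    return co_views


def recommend(co_views, product_id, limit=3):
    """Return up to `limit` products most often co-viewed with product_id."""
    ranked = sorted(co_views.get(product_id, {}).items(),
                    key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:limit]]


if __name__ == "__main__":
    clicks = [
        ("s1", "tent"), ("s1", "stove"),
        ("s2", "tent"), ("s2", "stove"), ("s2", "lamp"),
        ("s3", "lamp"),
    ]
    print(recommend(build_co_views(clicks), "tent"))
```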
By itself, stored data does not generate business value, and this is true of traditional databases,
data warehouses, and the new technologies such as Hadoop for storing big data. Once the data is
appropriately stored, however, it can be analyzed, which can create tremendous value. A variety
of analysis technologies, approaches, and products have emerged that are especially applicable to
big data, such as in-memory analytics, in-database analytics, and appliances.
Anything involving customers could benefit from big data analytics. Recent economic
changes worldwide have changed consumer behaviors. Big data analytics can help develop
definitions of churn and other customer behaviors, as well as an understanding of consumer
behavior from clickstreams.
Business intelligence in general can benefit from big data analytics. This could result in
more numerous and accurate business insights, an understanding of business change, better
planning and forecasting, and the identification of root causes of cost.
Specific analytic applications are likely beneficiaries of big data analytics. Big data
analytics might help automate decisions for real-time business processes such as loan approvals
or fraud detection. Potential benefits entered by survey respondents selecting "other" include
customer loyalty, service experience optimization, healthcare delivery optimization, and supplier
performance based on cost and quality.
RDF
The Resource Description Framework (RDF) is an infrastructure that enables the encoding,
exchange and reuse of structured metadata. RDF is an application of XML that imposes needed
structural constraints to provide unambiguous methods of expressing semantics. RDF
additionally provides a means for publishing both human-readable and machine-processable
vocabularies designed to encourage the reuse and extension of metadata semantics among
disparate information communities. The structural constraints RDF imposes to support the
consistent encoding and exchange of standardized metadata provide for the interchangeability
of separate packages of metadata defined by different resource description communities.
A typical use of RDF is describing information about web pages (content, author, created and
modified dates).
RDF identifies things using Web identifiers (URIs), and describes resources with properties and
property values.
A resource is anything that can have a URI.
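The resource, property, and property value model can be sketched with plain Python tuples, one tuple per statement. This is only an illustration of the data model, not an RDF library. The page URI, the literal values, and the helper describe are hypothetical; the property URIs are taken from the Dublin Core element set, which is an assumption about the vocabulary a real description would use.

```python
# Each statement is a (resource, property, value) triple.
# The page URI and the values are hypothetical examples; the property
# names come from the Dublin Core element set, written as full URIs.
PAGE = "http://www.example.org/index.html"
DC = "http://purl.org/dc/elements/1.1/"

triples = [
    (PAGE, DC + "creator", "Jane Doe"),
    (PAGE, DC + "title", "Example Home Page"),
    (PAGE, DC + "date", "2015-08-16"),
]


def describe(triples, resource):
    """Collect every property/value pair stated about one resource."""
    description = {}
    for subject, prop, value in triples:
        if subject == resource:
            # A property may have several values, so keep a list.
            description.setdefault(prop, []).append(value)
    return description


if __name__ == "__main__":
    for prop, values in describe(triples, PAGE).items():
        print(prop, values)
```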
Because RDF will include a standard syntax for describing and querying data, software
that exploits metadata will be easier and faster to produce.
The standard syntax and query capability will allow applications to exchange information
more easily.
Searchers will get more precise results from searching, based on metadata rather than on
indexes derived from full text gathering.
Intelligent software agents will have more precise data to work with.
An Internet resource is defined as any resource with a Uniform Resource Identifier (URI). This
includes the Uniform Resource Locators (URL) that identify entire Web sites as well as specific
Web pages. As with today's HTML META tags, the RDF description statements, encased as part
of an Extensible Markup Language (XML) section, could be included within a Web page (that is,
a Hypertext Markup Language - HTML - file) or could be in separate files. RDF is now a formal
W3C Recommendation, meaning that it is ready for general use. Currently, a second W3C
recommendation, still at the Proposal stage, proposes a system in which the descriptions related
to a particular purpose (for example, all descriptions related to security and privacy) would
constitute a class of such descriptions (using class here much as it is used in object-oriented
data modeling and programming). Such classes could fit into a schema or
hierarchy of classes, with subclasses of a class able to inherit the descriptions of the entire class.
The schema of classes proposal would save having to repeat descriptions since a single reference
to the class of which a particular RDF description was a part would suffice. The schema or
description of the collection of classes could itself be written in RDF.
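The class hierarchy idea can be sketched with the same triple representation: the schema is itself a set of triples, and membership in a class extends to every class above it. The sketch assumes the subClassOf and type properties as defined in the RDF Schema and RDF namespaces; the example vocabulary under EX, the class names, and the helper functions are hypothetical.

```python
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://www.example.org/schema#"  # hypothetical vocabulary

# The schema is written as triples, just like ordinary descriptions.
schema = [
    (EX + "PrivacyDescription", RDFS_SUBCLASS_OF, EX + "SecurityDescription"),
    (EX + "SecurityDescription", RDFS_SUBCLASS_OF, EX + "Description"),
]


def superclasses(schema, cls):
    """Return every class that `cls` is a direct or indirect subclass of."""
    found = set()
    pending = [cls]
    while pending:
        current = pending.pop()
        for subject, prop, obj in schema:
            if subject == current and prop == RDFS_SUBCLASS_OF and obj not in found:
                found.add(obj)
                pending.append(obj)
    return found


def is_instance_of(data, schema, resource, cls):
    """True if `resource` is typed as `cls` or as any subclass of `cls`."""
    for subject, prop, obj in data:
        if subject == resource and prop == RDF_TYPE:
            if obj == cls or cls in superclasses(schema, obj):
                return True
    return False


if __name__ == "__main__":
    data = [("http://www.example.org/policy", RDF_TYPE, EX + "PrivacyDescription")]
    print(is_instance_of(data, schema, "http://www.example.org/policy", EX + "Description"))
```

A single type statement is enough here: the resource is treated as a member of the broader classes without repeating the description for each one.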
The existing relational DBMS technology has been successfully applied to many application
domains. RDBMS technology has proved to be an effective solution for data management
requirements in large and small organizations, and today this technology forms a key component
of most information systems. However, applications in domains such as multimedia,
geographical information systems, digital libraries, and mobile databases demand a completely
different set of requirements in terms of the underlying database models. The conventional
relational database model is no longer appropriate for these types of data, so there is a need for
new databases and new technologies.
Spatial data, originating from maps, digital images, administrative and political
boundaries, roads, transportation networks, and physical data such as rivers, soil
characteristics, climatic regions, and land elevations.
Non-spatial data, such as socio-economic data (like census counts), economic data, and
sales or marketing information. GIS is a rapidly developing domain that offers highly
innovative approaches to meet some challenging technical demands.
Molecular genetics: This is the study of the chemical structure and function of genes at
the molecular level.
6. DIGITAL LIBRARY: Digital libraries are an important and active research area.
Conceptually, a digital library is an analog of a traditional library (a large collection of
information sources in various media) coupled with the advantages of digital technologies.
However, digital libraries differ from their traditional counterparts in significant ways: storage is
digital, remote access is quick and easy, and materials are copied from a master version.
Furthermore, keeping extra copies on hand is easy and is not hampered by budget and storage
restrictions, which are major problems in traditional libraries. Thus, digital technologies
overcome many of the physical and economic limitations of traditional libraries.
7. BIG DATA: Nowadays, advances in technology generate large, diverse, longitudinal,
complex, and/or distributed data sets, mainly from instruments, sensors, Internet transactions,
email, video, click streams, and/or all other digital sources. Individuals using smartphones,
social network sites, and multimedia will continue to fuel the exponential growth of data. The large
pools of data that can be captured, communicated, aggregated, stored, and analysed are part of
every sector and function of the global economy. This amount of data has been exploding.
Companies capture trillions of bytes of information about their customers, suppliers, and
operations, and millions of networked sensors are being embedded in the physical world in
devices such as mobile phones and automobiles, sensing, creating, and communicating data.