ADBMS original-output

The document provides an overview of MongoDB and Hive, highlighting their features, advantages, and architecture. It also compares relational and NoSQL databases, explains various types of NoSQL databases, and discusses Apache Cassandra's architecture and features. Additionally, it covers JSON and XML, including their data types and schemas.

Uploaded by

Oreo Boi

UNIT 3

① Explain features and advantages of MongoDB


① Explain Mongo DB NoSQL Database

(1) MongoDB
MongoDB is an open-source document database management system.
MongoDB is one of the most popular NoSQL databases and is written in C++.
High performance, high availability and automatic scaling are its important features.
MongoDB has its own ad hoc query language with a rich feature set.
A large number of use cases can be handled easily. Mobile applications, CMS (Content Management Systems), e-commerce, gaming applications, analytics, archiving and logging are application areas of MongoDB.
MongoDB provides atomic operations on a single document.
Multi-document transactions were not possible in early versions of MongoDB; they are supported from version 4.0 on replica sets and from version 4.2 on sharded clusters.
A major advantage of MongoDB is that it automatically uses free memory available on the machine as a cache.

(2) MongoDB Features
High performance compared to traditional SQL databases for many workloads.
Good support for embedded data models.
Faster query processing using index support for embedded documents and arrays.
Higher availability provided by MongoDB's replication facility.
Automatic failover using replica sets. A replica set is a group of MongoDB servers that maintain the same data set, providing redundancy and increasing data availability.
MongoDB provides horizontal scalability.
It also provides automatic sharding, which distributes data across a cluster of machines.

(3) Advantages of MongoDB:
Flexibility: Schema-less design allows easy changes to data structures.
Scalability: Can scale out to handle large datasets and high traffic.
High Availability: Automatic failover through replica sets.
Performance: Fast read/write operations, optimized for big data.
Cost-Effective: Runs on commodity hardware, reducing costs.
Easy Development: Simple to develop with dynamic data models and rich querying capabilities.

② Explain Hive along with its architecture


② Describe NoSQL database development tools
② Explain NoSQL database Development Tool: MapReduce

(1) HIVE
Hive is a data warehouse infrastructure tool.
It processes structured data in HDFS. Hive structures data into tables, rows, columns and partitions.
It resides on top of Hadoop.
It is used to summarize and analyze big data.
It is suited to online analytical processing (OLAP).
It supports ad hoc queries. It has its own SQL-like language called HiveQL or HQL.
SQL-like scripts can be created for MapReduce operations using Hive.
Primitive data types such as integers, floats, doubles and strings are supported by Hive.
Associative arrays, lists, structs etc. can be used.
Serializer and deserializer APIs are used to store and retrieve data.
Hive is easy to scale and has fast processing.

●●● Architecture of Hive :-

1. User Interface
Hive supports Hive Web UI, Hive command line, and Hive HD, through which the user can easily process queries.

2. Meta Store
Hive stores metadata, schema etc. in their respective database servers, known as metastores.

3. HiveQL Process Engine
HiveQL is used as the query language to get information from the metastore. It is an alternative to writing a MapReduce program in Java: a HiveQL query can be written in place of a MapReduce job.

4. Execution Engine
Query processing and result generation are the job of the execution engine. Its results are the same as those of the equivalent MapReduce job.

5. HDFS or HBASE
The Hadoop Distributed File System (HDFS) or HBase is the data storage technique used to store data in the file system.

(2) MapReduce :-
Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results.
For map-reduce operations, MongoDB provides the mapReduce database command.
In a map-reduce operation, MongoDB applies the map phase to each input document (i.e. the documents in the collection that match the query condition).
The map function emits key-value pairs.
For those keys that have multiple values, MongoDB applies the reduce phase, which collects and condenses the aggregated data.
MongoDB then stores the results in a collection.
● The map-reduce function can be used on both structured and unstructured data.
▪ map is a JavaScript function that maps a value to a key and emits a key-value pair. It divides the big problem into multiple small problems, which can be further subdivided into sub-problems.
▪ reduce is a JavaScript function that reduces or groups all the documents having the same key and produces the final output, which is the answer to the big problem you were trying to solve.

To understand how it works, consider the following example, in which you find the number of male, female and other employees in a collection named emp:
The first step is to create the map and reduce functions; you then call the mapReduce function and pass the necessary arguments.
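
The two phases described above can be illustrated with a minimal plain-Python sketch. This is not MongoDB's mapReduce command (which takes JavaScript functions); it only mimics the map, group and reduce steps in memory. The sample emp documents, the field name gender, and the helper names map_phase, reduce_phase and map_reduce are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample of the emp collection; the "gender" field is an assumption.
emp = [
    {"name": "A", "gender": "male"},
    {"name": "B", "gender": "female"},
    {"name": "C", "gender": "male"},
    {"name": "D", "gender": "others"},
    {"name": "E", "gender": "female"},
]

def map_phase(doc):
    """Map step: emit one (key, value) pair per document."""
    yield doc["gender"], 1

def reduce_phase(key, values):
    """Reduce step: condense all values emitted for one key."""
    return sum(values)

def map_reduce(docs, mapper, reducer):
    """Group the emitted values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}

result = map_reduce(emp, map_phase, reduce_phase)
print(result)
```

The result maps each gender value to the number of documents that emitted it, which is the count the example in the text asks for.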

③ Compare Relational and NoSQL databases :-

Feature | Relational Databases (RDBMS) | NoSQL Databases
Data Model | Tabular (Rows and Columns) | Flexible (Document, Key-Value, Graph, etc.)
Schema | Fixed Schema (Predefined) | Schema-less or Flexible
Scalability | Vertical Scaling (Harder Horizontal) | Horizontal Scaling (Easier to Scale Out)
Consistency | ACID (Atomicity, Consistency, Isolation, Durability) | Eventual Consistency (BASE Model)
Query Language | SQL (Structured Query Language) | Custom Query Languages (e.g., MongoDB Query Language, CQL)
Joins | Supported (Complex Joins between tables) | Not Typically Supported
Flexibility | Rigid Structure (Schema Changes are Hard) | Highly Flexible (Schema can change easily)
Data Integrity | Strong Data Integrity (Foreign Keys, Constraints) | Limited Integrity (Application-driven)
Performance | Slower for Large Unstructured Data | Optimized for High Volume, Fast Performance
Use Cases | Transactional Applications (e.g., Banking, CRMs) | Big Data, Real-time Analytics, IoT, Content Management
Examples | MySQL, PostgreSQL, Oracle DB, MS SQL Server | MongoDB, Cassandra, Redis, Neo4j, Couchbase

④ Explain with example Four Types of No SQL Databases :-


(1). Key-value store databases
This is a very simple kind of NoSQL database.
It is specially designed for storing schema-free data.
Each item is stored as a value together with an indexed key.
This type is generally used when you need quick performance for basic Create-Read-Update-Delete operations and the data is not connected.
Example :
Storing and retrieving session information for web pages.
Storing user profiles and preferences.
Storing shopping cart data for e-commerce.
Limitations
It may not work well for complex queries attempting to connect multiple relations of data.
If the data contains many many-to-many relationships, a key-value store is likely to show poor performance.
Examples
Cassandra (more commonly classified as a wide-column store)
Azure Table Storage (ATS)
DynamoDB

Fig : Key-value store databases

(2). Column store database
Instead of storing data in relational tuples (table rows), data is stored in cells grouped in columns. It offers very high performance and a highly scalable architecture.
Examples:
1. HBase
2. Big Table
3. Hyper Table
Use Cases
Common use cases for column-family databases include event logging and blogs, as for document databases, but the data is stored in a different fashion.
In logging, every application can write its own set of columns and have each row key formatted so as to promote easy lookup based on application and timestamp.
Counters can be a unique use case. It is possible to design an application that needs an easy way to count or increment as events occur.

Fig : Column store database

(3). Document database
Document databases build on the concept of key-value stores, where the "documents" contain a lot of complex data.
Every document contains a unique key, used to retrieve the document.
The key is used for storing, retrieving and managing document-oriented information, also known as semi-structured data.
Examples:
MongoDB
CouchDB
Use Cases
Examples of such a system are an event logging system for an application, or online blogging.
In online blogging, each user is a document; each post is a document; and each comment, like, or action is a document.
All documents contain information about the type of data, username, post content, or timestamp of document creation.
Limitations
It is challenging for a document store to handle a transaction that spans multiple documents.
Document databases may not be a good choice if the data is required in aggregated form.

(4). Graph database
Data is stored as a graph: each entity acts as a node, and the relationships between entities are stored as links between the nodes.
Examples:
Neo4j
Polyglot
Use Cases
A very important and popular application is social networking: such sites benefit from quickly locating friends, friends of friends, likes, and so on.
Google Maps can use graphs to model its data easily for finding nearby locations or building the shortest routes for directions.
Many recommendation systems make effective use of this model.
Limitations
Graph databases may not offer a better choice than other NoSQL variations.
If the application needs to scale horizontally, this may introduce poor performance.
They are not very efficient when all nodes with a given parameter need to be updated.
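
The "friends of friends" use case for graph databases can be illustrated with a small sketch. It stores the graph as a plain Python dictionary and walks it breadth-first; a real graph database would use its own storage and query language. The names in the friends graph and the helper within_hops are hypothetical.

```python
from collections import deque

# Hypothetical friendship graph: each node maps to the set of nodes it links to.
friends = {
    "asha": {"ben", "chen"},
    "ben": {"asha", "dev"},
    "chen": {"asha", "dev", "eli"},
    "dev": {"ben", "chen"},
    "eli": {"chen"},
}

def within_hops(graph, start, max_hops):
    """Breadth-first search returning {person: hop_count} for everyone within max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[person]:
            if neighbour not in distance:
                distance[neighbour] = distance[person] + 1
                queue.append(neighbour)
    del distance[start]
    return distance

reachable = within_hops(friends, "asha", 2)
direct_friends = sorted(p for p, hops in reachable.items() if hops == 1)
friends_of_friends = sorted(p for p, hops in reachable.items() if hops == 2)
print(direct_friends, friends_of_friends)
```

Because relationships are followed link by link, the cost of the lookup depends on the size of the neighbourhood visited rather than on the total number of records, which is why this model suits social and recommendation queries.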

⑤ Also explain Column-Oriented Database: Apache Cassandra


⑤ Explain features and advantages of Cassandra
(1). Introduction
Cassandra is a distributed storage system mainly designed for managing large amounts of structured data across multiple servers.
Cassandra can run on hundreds of servers and manages them as one system.
Cassandra provides a simple data model that supports dynamic control over data.
Cassandra was designed to run on economical hardware while handling large amounts of data.

Example: Cassandra was originally developed at Facebook, where it ran on servers located in many data centres.

(2). Types of databases
▪ Besides Cassandra, a few other NoSQL databases are very popular among users:
▪ Apache's HBase
▪ MongoDB

(3) Key Features of Apache Cassandra:

Distributed Architecture: Peer-to-peer model with no single point of failure.
Scalability: Horizontal scaling by adding more nodes without downtime.
Fault Tolerance: Data replication across nodes and data centers ensures high availability.
Flexible Data Model: Wide-column store with schema-less design.
Tunable Consistency: Choose consistency levels to balance performance and consistency.
Linear Scalability: Increases throughput as more nodes are added.
Multi-Data Center Support: Replication across geographically distributed data centers.
CQL (Cassandra Query Language): SQL-like language for querying.

(4) Advantages:
High Availability: Continuous data access with no downtime, even during failures.
Fault Tolerant: Survives hardware failures and network partitions.
Optimized for Write-Heavy Workloads: Handles high write throughput.
Flexible Schema: Easily adapts to changes in data structure.
Global Distribution: Supports multi-region replication for global applications.
Open Source: No licensing fees, with strong community support.

⑥ Cassandra Architecture
(1). Overview
Cassandra is designed to handle big data workloads across multiple nodes without any single point of failure.
Cassandra manages the distribution of data in a peer-to-peer distributed system across multiple nodes in a cluster. Every node in a cluster plays the same role. Each node is independent of the others, and the nodes are interconnected.
Every node in a cluster can accept read and write requests from users, independent of where the data is located.
If any node in the cluster goes down, read/write requests can be served from other nodes in the network.

(2). Components of Cassandra

(i) Node:
It is a single computer where data is actually stored.

(ii) Data center:
Multiple related nodes working together form a data center.

(iii) Cluster:
A cluster contains one or more data centers.

(iv) Commit log:
The commit log is mainly used as the crash-recovery mechanism in Cassandra.
Every write operation is written to the commit log for recovery purposes.

(v) Mem-table:
A mem-table in Cassandra is a memory-resident data structure.
All data entered in the commit log is then written to the mem-table.

(vi) SSTable:
It is a file on secondary storage (disk) to which data is flushed from the mem-table when its contents reach a threshold value.

(vii) Bloom filter:
It is an algorithm for checking whether a particular element is a member of a set or not.
It is a special kind of cache.
Bloom filters are accessed for each and every query.
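
How the commit log, mem-table, SSTable and Bloom filter fit together can be sketched in a few lines of Python. This is a toy, single-node model under simplifying assumptions, not Cassandra's actual implementation: the class names Node and BloomFilter, the flush threshold of two entries, and the in-memory stand-ins for disk files are all hypothetical.

```python
import hashlib

class BloomFilter:
    """Set-membership test that may give false positives but never false negatives."""
    def __init__(self, size=64, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

class Node:
    def __init__(self, flush_threshold=2):
        self.commit_log = []   # append-only record of writes, kept for crash recovery
        self.memtable = {}     # memory-resident structure holding recent writes
        self.sstables = []     # stand-ins for on-disk files: (data, bloom), newest last
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write first
        self.memtable[key] = value            # 2. then apply it to the mem-table
        if len(self.memtable) >= self.flush_threshold:
            self._flush()                     # 3. flush once the threshold is reached

    def _flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        self.sstables.append((dict(sorted(self.memtable.items())), bloom))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for data, bloom in reversed(self.sstables):
            # The Bloom filter lets the read skip SSTables that cannot hold the key.
            if bloom.might_contain(key) and key in data:
                return data[key]
        return None

node = Node(flush_threshold=2)
node.write("user:1", "Alice")
node.write("user:2", "Bob")    # mem-table reaches the threshold and is flushed
node.write("user:3", "Carol")  # stays in the mem-table
print(node.read("user:1"), node.read("user:3"), len(node.sstables))
```

In this sketch the commit log is never trimmed; it is kept whole only to show that every write is recorded before it reaches the mem-table.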

⑦ What is JSON ? Explain data types and features of JSON


⑦ Explain NoSQL Programming Languages JSON

(1) What is JSON?
JSON (JavaScript Object Notation) is a lightweight, text-based, and easy-to-read format used for representing structured data. It is widely used for data interchange between a server and a client, particularly in web applications and APIs. JSON is language-independent but derives its syntax from JavaScript, and is used in many programming environments due to its simplicity and flexibility.

(2) JSON Data Types:
JSON supports the following data types:
● String: A string is a sequence of characters enclosed in double quotes (" "). It can contain letters, numbers, spaces, punctuation, or special characters. Example: "name": "Alice"

● Number: Numbers can be integers or floating-point values. JSON supports both. Example: "age": 30, "height": 5.9

● Boolean: Represents a binary value: true or false. Example: "isStudent": false

● Array: An ordered collection of values enclosed in square brackets ([ ]). An array can contain multiple values of any data type, including other arrays and objects. Example: "hobbies": ["reading", "traveling", "coding"]

● Object: Represents a collection of key-value pairs enclosed in curly braces ({ }). Keys are strings, and values can be any valid JSON data type.

● Null: Represents a null or empty value. It indicates that a field has no value or is intentionally left blank. Example: "middleName": null

(3) Features of JSON:
1. Lightweight and Compact: Efficient for data transfer.
2. Human-readable: Simple and easy to understand.
3. Interoperable: Supported by many programming languages.
4. Text-based: Easy to share and store.
5. Flexible: Supports complex, nested data structures.
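
The data types listed above can be seen together in one document. The sketch below parses a JSON text with Python's standard json module and reports the Python type each JSON value maps to. The field values come from the examples in the text, except the nested "address" object, which is an assumption added to show the Object type.

```python
import json

# One JSON object combining the examples above; "address" is a hypothetical field.
text = """
{
  "name": "Alice",
  "age": 30,
  "height": 5.9,
  "isStudent": false,
  "hobbies": ["reading", "traveling", "coding"],
  "address": {"city": "Pune"},
  "middleName": null
}
"""

record = json.loads(text)

# Map each key to the name of the Python type its JSON value was parsed into.
types = {key: type(value).__name__ for key, value in record.items()}
for key, type_name in types.items():
    print(f"{key}: {type_name}")

# Serializing back to text gives an equivalent JSON document.
round_trip = json.loads(json.dumps(record))
```

JSON strings, numbers, booleans, arrays, objects and null become Python str, int or float, bool, list, dict and None respectively.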

⑧ Explain NoSQL Programming Languages XML


(1) Introduction
XML is a markup language very much like HTML, but XML was designed to carry (transfer) data and not to display data. XML stands for Extensible Markup Language. It gives a mechanism to define the structure of a document that is to be transferred over the internet. XML defines a standard way of adding elements to documents. Hence, XML is used for structured documentation.
Unlike HTML, XML tags are not predefined; one can define one's own tags in XML.

(2) Goals of XML :
XML should be directly usable over the Internet. Users must be able to view XML documents as easily as HTML documents. This may be possible only when XML browsers are as robust and widely available as HTML browsers.
XML should support a wide variety of applications.
It should be easy to write programs that process XML documents.
The number of optional features in XML should be kept to a minimum, as optional features cause confusion in the programmer's mind.
XML documents should be logically clear.
The design of XML shall be formal and concise, and should be prepared quickly.
XML documents shall be easy to create.
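
The point that XML tags are user-defined and carry data rather than presentation can be shown with a short sketch using Python's standard xml.etree.ElementTree module. The tag names student, name and course and the attribute code are made up for this example.

```python
import xml.etree.ElementTree as ET

# A small XML document with self-defined tags (hypothetical names).
xml_text = """<student>
  <name>Alice</name>
  <course code="ADBMS">Advanced Database Management Systems</course>
</student>"""

root = ET.fromstring(xml_text)            # parse the text into an element tree

name = root.find("name").text             # text content of the <name> element
course = root.find("course")
course_code = course.get("code")          # value of the "code" attribute
course_title = course.text

print(root.tag, name, course_code, course_title)
```

The program reads the data by element and attribute name; nothing in the document says how it should be displayed.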

⑨ Explain schemas in XML :-


What is an XML Schema?
An XML Schema (often referred to as XSD, which stands for XML Schema Definition) is a formal language used to describe the structure and data types of XML documents. It defines the allowed elements, attributes, and the relationships between them, as well as the data types for the values contained in those elements and attributes.

Key Features of XML Schema:

Data Validation: Ensures that XML documents conform to a defined structure, helping to prevent errors.
Data Types: Specifies the data types of elements and attributes (e.g., integer, string, date).
Element Relationships: Defines the relationships and constraints between elements, such as required elements, optional elements, and ordering.
Namespaces: Allows for the use of multiple XML vocabularies within a single document, avoiding naming conflicts.
Extensibility: XML Schema is extensible, allowing the definition of custom data types and constraints.

Advantages of Using XML Schemas:

Validation: XML schemas ensure that XML documents are valid, that is, that they follow a defined structure, reducing the risk of errors.
Data Integrity: By specifying data types and constraints, schemas ensure that data is consistent and accurate.
Documentation: Schemas act as documentation for the XML format, providing a clear understanding of the structure and requirements.
Interoperability: XML schemas ensure that XML documents can be shared and understood by different systems, even across different platforms.

UNIT 4
① Draw and explain Datawarehouse Architecture
(1). Introduction
Data warehouse architecture is primarily based on the business processes of an enterprise. It takes into consideration data consolidation across the enterprise with adequate security, data modelling and organization, the extent of query requirements, metadata management and application, warehouse staging area planning for optimum bandwidth utilization, and full technology implementation.

(2). Main areas of data warehouse
a. Data acquisition
b. Data storage
c. Information delivery

(3). Building blocks of the data warehouse
a. Source data
b. Data staging
c. Data storage
d. Information delivery
e. Metadata
f. Management and control

(4). Data warehouse Architecture
In order to set up this information delivery system, we need different building blocks. These building blocks are arranged together in the most optimal way to serve the intended target.
Architecture, in the context of an organization's data warehousing efforts, is a conceptualization of how the data warehouse is built. The architecture relates all the components (each of which has definite functions and provides specific services) so as to make a fully functional data warehouse.
Architecture is the proper arrangement of the components.
We can build a data warehouse with software and hardware components.
To suit the organizational requirements, we need to arrange these building blocks in a certain way for maximum benefit.

② What is a data warehouse? Explain characteristics and limitations of a data warehouse
② Explain in brief characteristics and limitations of data warehouse

What is a Data Warehouse in ADBMS?

A data warehouse in the context of an Advanced Database Management System (ADBMS) is a large, centralized repository used for storing, analyzing, and reporting large volumes of historical and current data from different sources. It is designed for analytical processing, not for daily transactional operations, and supports complex queries to assist in decision-making and business intelligence.

(1) Characteristics of a Data Warehouse in ADBMS

Subject-Oriented: Organized around key business topics (e.g., sales, customers) rather than individual applications.
Integrated: Combines data from various sources into a consistent format, ensuring data uniformity.
Non-Volatile: Data is read-only and not frequently updated, allowing for consistent, historical analysis.
Time-Variant: Stores historical data, enabling analysis over time for trends and comparisons.
Optimized for Queries: Optimized for complex queries and analytical processing, rather than transactional operations.
Large-Scale Storage: Capable of handling large amounts of data, often reaching terabytes or petabytes.
Parallel Processing: ADBMS features like parallel processing help improve query performance on large datasets.

(2) Limitations of a Data Warehouse in ADBMS

High Cost: Building and maintaining a data warehouse is expensive, requiring significant resources.
Complex Setup: The ETL (Extract, Transform, Load) process is complex, requiring careful planning and expertise.
Latency in Data Updates: Data is typically updated in batches, leading to delays in reflecting real-time information.
Storage Overheads: Data warehouses can require large amounts of storage, especially with denormalized data.
Limited Real-Time Processing: Data warehouses are not designed for real-time transactional processing or updates.
Performance Degradation: As data volumes grow, query performance can degrade unless properly optimized.
Data Quality Challenges: Ensuring consistent and accurate data is difficult, particularly when integrating data from diverse sources.

③ Explain data warehouse schema with example :-

Data Warehouse Schema
A data warehouse schema defines how data is structured in a data warehouse and how tables are related to each other. It is important for organizing data for efficient querying and analysis. The main types of data warehouse schemas are:
1. Star Schema
2. Snowflake Schema
3. Fact Constellation Schema (Galaxy Schema)

(1). Star Schema
In the Star Schema, the central table is the fact table, which contains quantitative data (like sales, revenue). Around the fact table are dimension tables, which store descriptive data (like product name, date, or customer).
● Fact Table: Contains metrics or facts (e.g., total sales, quantity sold).
● Dimension Tables: Contain attributes related to facts (e.g., product, time, location).
Example:
For a retail store data warehouse:
● Fact Table:
Transaction_ID | Product_ID | Customer_ID | Date_ID | Total_Sales
1 | 101 | 1001 | 20240101 | 500
● Dimension Tables:
o Product: Stores product details.
o Customer: Stores customer details.
o Date: Stores date-related information.

(2). Snowflake Schema
The Snowflake Schema is a more normalized version of the Star Schema. In this schema, dimension tables are split into additional tables to reduce redundancy.
Example:
In a Snowflake schema, the Product table might be split into two:
● Product Table:
Product_ID | Product_Name | Category_ID
101 | Laptop | 1
● Category Table:
Category_ID | Category
1 | Electronics

(3). Fact Constellation Schema (Galaxy Schema)
The Fact Constellation Schema involves multiple fact tables that share common dimension tables. This schema is useful for complex data warehouses that need to support different business processes (e.g., sales and inventory).
Example:
● Fact Tables: Sales and inventory data.
● Shared Dimension Tables: Product, store, and time dimensions used by both sales and inventory fact tables.
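
The retail star schema described above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the table names, the customer name, and the year and month columns of the date dimension are assumptions added for illustration; the fact row uses the sample values from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

conn.executescript("""
-- Dimension tables hold descriptive attributes.
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, product_name  TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE date_dim (date_id     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- The fact table holds the measure plus one foreign key per dimension.
CREATE TABLE sales_fact (
    transaction_id INTEGER PRIMARY KEY,
    product_id     INTEGER REFERENCES product(product_id),
    customer_id    INTEGER REFERENCES customer(customer_id),
    date_id        INTEGER REFERENCES date_dim(date_id),
    total_sales    REAL
);

INSERT INTO product    VALUES (101, 'Laptop');
INSERT INTO customer   VALUES (1001, 'Alice');
INSERT INTO date_dim   VALUES (20240101, 2024, 1);
INSERT INTO sales_fact VALUES (1, 101, 1001, 20240101, 500);
""")

# A typical star-schema query: join the fact table to its dimensions and aggregate.
rows = conn.execute("""
    SELECT p.product_name, d.year, SUM(f.total_sales)
    FROM sales_fact AS f
    JOIN product  AS p ON p.product_id = f.product_id
    JOIN date_dim AS d ON d.date_id    = f.date_id
    GROUP BY p.product_name, d.year
""").fetchall()

print(rows)
```

Turning this into a snowflake schema would mean moving the category out of product into its own table and adding one more join to the query.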

④ Define the following terms : (i) Roll Up (ii) Drill Down (iii) Slice (iv) Dice :-
(i) Roll Up : Roll Up is the process of summarizing or aggregating data by moving up a hierarchy in a dimension. It involves consolidating data into higher levels, like summarizing daily data into monthly data.
Example: From daily sales data, roll up to monthly sales data.

(ii) Drill Down : Drill Down is the opposite of roll-up. It allows users to view more detailed data by going down to a lower level of granularity, such as from yearly data to monthly or daily data.
Example: From yearly sales data, drill down to see monthly or daily sales figures.

(iii) Slice : Slice involves selecting a single layer of data from a multidimensional data cube based on one dimension. Fixing one dimension of a three-dimensional cube essentially creates a 2D view.
Example: Slice the data cube by a specific year (e.g., 2023) to see data for that year across other dimensions.

(iv) Dice : Dice is similar to slice, but it selects data from multiple dimensions, forming a subcube. It allows filtering data along two or more dimensions.
Example: Dice the data by selecting data for a specific year (2023) and a specific product (e.g., electronics).
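
The four operations can be tried on a tiny data set in plain Python. This is a sketch under simple assumptions: the cube is a list of records rather than a real OLAP cube, and the sample rows, dimension names and the helper aggregate are hypothetical.

```python
from collections import defaultdict

# Hypothetical cube cells with dimensions year, month, product and region.
sales = [
    {"year": 2023, "month": 1, "product": "electronics", "region": "east", "amount": 100},
    {"year": 2023, "month": 1, "product": "clothing",    "region": "east", "amount": 40},
    {"year": 2023, "month": 2, "product": "electronics", "region": "west", "amount": 60},
    {"year": 2024, "month": 1, "product": "electronics", "region": "east", "amount": 80},
]

def aggregate(rows, dims):
    """Sum the amount measure for each combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["amount"]
    return dict(totals)

# Drill down: view sales at the finer year/month level.
monthly = aggregate(sales, ["year", "month"])

# Roll up: move up the time hierarchy from months to years.
yearly = aggregate(sales, ["year"])

# Slice: fix a single dimension (year = 2023).
slice_2023 = [r for r in sales if r["year"] == 2023]

# Dice: filter on two dimensions (year = 2023 and product = electronics).
dice = [r for r in sales if r["year"] == 2023 and r["product"] == "electronics"]

print(monthly, yearly, len(slice_2023), len(dice))
```

Roll up and drill down change the level of aggregation, while slice and dice only choose which cells of the cube are looked at.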

⑤ Explain different OLAP Architectures


⑤ What is OLAP? Explain type of OLAP
OLAP Architectures
OLAP (Online Analytical Processing) systems provide tools to interactively analyze multidimensional data. There are different OLAP architectures that define how data is stored, accessed, and processed in OLAP systems. The key OLAP architectures are:
1. ROLAP Architecture (Relational OLAP)
2. MOLAP Architecture (Multidimensional OLAP)
3. HOLAP Architecture (Hybrid OLAP)

(1). ROLAP Architecture (Relational OLAP)

In ROLAP, data is stored in traditional relational databases, and OLAP operations are performed on the data using SQL queries. This architecture does not require the use of pre-aggregated data cubes. Instead, it dynamically generates SQL queries to calculate the necessary aggregations or summaries when a user runs a query.

Data Storage: Data is stored in relational tables (in databases like MySQL, Oracle, etc.).
Query Processing: Queries are written in SQL and are processed in real-time, without pre-aggregation.
Key Feature: ROLAP can handle large amounts of data because it relies on relational database management systems (RDBMS) that can scale well.
Example:
For example, if you want to query the total sales per region for a specific year, ROLAP would query the relational database and aggregate the data as requested on the fly.
Advantages:
● Can handle large datasets since it works directly with relational databases.
● No need for complex data cubes; it is more flexible with real-time query generation.
Disadvantages:
● Slower query performance because it needs to calculate aggregations dynamically.
● Complex SQL queries for advanced analysis.

(2). MOLAP Architecture (Multidimensional OLAP)

In MOLAP, data is stored in a multidimensional cube. The cube pre-aggregates data, making querying faster since the required calculations are already performed. MOLAP systems use a multidimensional data model, where each cell in the cube represents a measure (e.g., sales), and the cube is organized by dimensions (e.g., time, product, region).

Data Storage: Data is stored in a multidimensional cube (usually optimized for faster retrieval).
Query Processing: OLAP operations (e.g., roll-up, drill-down) are performed on the pre-aggregated data, leading to very fast query responses.
Key Feature: Fast query performance because data is pre-aggregated and stored in a specialized format.
Example:
In a MOLAP system, a sales data cube might pre-aggregate data by dimensions like product, region, and time. A query asking for total sales for a specific year can be answered immediately without recalculating the data.
Advantages:
● Fast query response due to pre-aggregation.
● Excellent for interactive analysis and ad-hoc querying.
Disadvantages:
● Less flexible than ROLAP; can struggle with extremely large datasets.
● Requires significant storage and memory to store pre-aggregated data.

(3). HOLAP Architecture (Hybrid OLAP)

HOLAP combines elements of both ROLAP and MOLAP architectures. It stores summary data (pre-aggregated) in a multidimensional cube (like MOLAP) and detailed data in relational tables (like ROLAP). HOLAP systems aim to provide the speed of MOLAP for aggregated data while maintaining the flexibility of ROLAP for detailed data.

Data Storage: Summary data is stored in a multidimensional cube, while detailed transactional data is stored in relational tables.
Query Processing: For high-level aggregations, MOLAP is used. For detailed data, ROLAP is used.
Key Feature: Combines the best of both MOLAP and ROLAP to balance performance and scalability.
Example:
A HOLAP system may store sales data by product and region in an OLAP cube for fast queries, but detailed transactional records (e.g., individual sales transactions) are stored in a relational database.
Advantages:
● Balanced approach offering both fast aggregation and detailed query capability.
● Efficient use of storage by separating detailed data and summary data.
Disadvantages:
● More complex than both MOLAP and ROLAP.
● May require more management and configuration.
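
The trade-off between computing aggregates on the fly (ROLAP) and reading them from a pre-aggregated cube (MOLAP) can be sketched in plain Python. The sample facts and the names rolap_total, molap_total and cube are hypothetical; a real system would use SQL for the first style and a cube engine for the second.

```python
from collections import defaultdict

# Hypothetical detailed fact rows: (year, region, sales amount).
facts = [
    ("2023", "east", 100),
    ("2023", "west", 60),
    ("2024", "east", 80),
]

def rolap_total(year):
    """ROLAP style: scan the detailed rows and aggregate when the query runs."""
    return sum(amount for y, _region, amount in facts if y == year)

# MOLAP style: aggregate once, ahead of time, into cube cells.
cube = defaultdict(int)
for year, region, amount in facts:
    cube[(year, region)] += amount   # cell at the (year, region) level
    cube[(year, "ALL")] += amount    # pre-computed total for the whole year

def molap_total(year):
    """MOLAP style: answer the query with a single lookup in the cube."""
    return cube[(year, "ALL")]

print(rolap_total("2023"), molap_total("2023"))
```

Both functions return the same total; the difference is when the work is done. A HOLAP system would keep something like cube for summaries and fall back to the detailed facts for row-level questions.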

⑥ Define OLAP? Compare different OLAP Architecture


OLAP Architectures

OLAP (Online Analytical Processing) systems provide tools to interactively analyze multidimensional data. There are different OLAP architectures that define how data is stored, accessed, and processed in OLAP systems.

OLAP Architecture | ROLAP (Relational OLAP) | MOLAP (Multidimensional OLAP) | HOLAP (Hybrid OLAP)
1. Data Storage | Uses traditional relational databases (tables). | Stores data in multidimensional cubes (pre-aggregated). | Hybrid: stores summary data in cubes and detailed data in relational tables.
2. Query Processing | Queries are processed in real-time using SQL on relational databases. | Pre-aggregated data allows for fast querying of multidimensional data. | Combines ROLAP and MOLAP: uses relational databases for detailed data and cubes for summary.
3. Performance | Slower because data is calculated on-demand, requires real-time computation. | Fast performance due to pre-aggregation of data in cubes. | Offers balanced performance: fast for summary data (MOLAP), flexible for detailed data (ROLAP).
4. Flexibility | Highly flexible in terms of querying and analysis as it works with relational data. | Less flexible because data is stored in predefined cubes and aggregations. | Flexible: supports both detailed queries (ROLAP) and fast summary queries (MOLAP).
5. Scalability | Highly scalable: can handle large volumes of data, as it uses relational databases that can scale. | Limited scalability: cube sizes are often restricted by available memory and storage. | Scalable: balances scalability by using relational storage for detailed data and cubes for summary data.
6. Data Complexity | Can handle complex data models and relationships, including many-to-many relationships. | Works best with simpler, multidimensional data models that are structured for cubes. | Can handle complex data by separating detailed data (ROLAP) and aggregated data (MOLAP).
7. Examples | Oracle OLAP, SAP BW, IBM DB2 OLAP. | Microsoft Analysis Services, Oracle Essbase, IBM Cognos. | Microsoft SQL Server Analysis Services (SSAS), Cognos TM1.

⑦ Write a note on Decision support system. Explain with example views and Decision support
Decision Support System (DSS)
A Decision Support System (DSS) is a computer-based tool that helps decision-makers analyze data and make informed decisions, particularly in complex and unstructured situations. It integrates data from various sources, applies analytical models, and provides interactive tools for scenario analysis and decision-making.

In the context of an Advanced Database Management System (ADBMS), a DSS relies on a database to store and manage large volumes of structured and unstructured data. The ADBMS ensures fast data retrieval and manipulation, supporting decision-making processes effectively.

Example of DSS in ADBMS
Example: Retail Sales Analysis. A retail company uses a DSS built on an ADBMS to analyze sales data. The system:
• Retrieves historical sales data from the ADBMS.
• Simulates scenarios like the impact of discounts or marketing campaigns on future sales.
• Generates reports for managers to optimize inventory and pricing strategies.

Decision Support Views
DSS in ADBMS provides different views to help decision-makers:
1. Descriptive View: Analyzes past data to understand trends.
   Example: A report showing sales performance over the past month.
2. Predictive View: Forecasts future outcomes based on historical data.
   Example: Predicting next quarter's sales based on past data.
3. Prescriptive View: Recommends the best course of action.
   Example: Suggesting pricing strategies based on predicted demand.
4. Diagnostic View: Explains the causes of past outcomes.
   Example: Identifying why sales dropped in a specific region.

Decision Support in ADBMS
In an ADBMS, decision support involves:
• Data Retrieval: Quickly accessing relevant data from various sources.
• Analysis and Modeling: Using tools for forecasting, simulations, and optimization.
• Scenario Analysis: Running "what-if" analyses to predict different outcomes.
• Reporting: Generating user-friendly reports and dashboards for decision-makers.
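
The "what-if" scenario analysis described above can be sketched in a few lines of Python. This is only an illustration: the function name project_revenue, the base sales figure, the price, and the assumed sales uplift from a discount are all made-up values, not data from any real DSS.

```python
# Hypothetical what-if analysis: every number here is an illustrative assumption.
def project_revenue(base_units, price, discount, uplift):
    """Projected revenue if a discount raises unit sales by the fraction `uplift`."""
    units = base_units * (1 + uplift)
    return units * price * (1 - discount)

# Compare two scenarios a manager might want to evaluate.
scenarios = {
    "no discount": project_revenue(1000, 20.0, 0.00, 0.00),
    "10% discount": project_revenue(1000, 20.0, 0.10, 0.25),
}

# Pick the scenario with the highest projected revenue.
best = max(scenarios, key=scenarios.get)
print(best, scenarios[best])
```

A real DSS would read the base figures from the ADBMS and estimate the uplift from historical data instead of hard-coding it.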

⑧ What is Decision Support System? Explain different types of Decision Support Systems
What is a Decision Support System (DSS)

A Decision Support System (DSS) is a computer-based system that helps decision-makers analyze data, evaluate alternatives, and make informed decisions. It provides interactive tools for analyzing complex problems and scenarios, often in situations with uncertainty or ambiguity. DSS is commonly used in business, healthcare, and management to support strategic decision-making.

Types of Decision Support Systems

1. Data-Driven DSS
Focus: Manages and analyzes large volumes of data.
Function: Helps users retrieve data, generate reports, and analyze trends.
Example: A sales analysis system that helps managers understand past sales data and predict future trends.

2. Model-Driven DSS
Focus: Uses mathematical models and simulations to analyze decision alternatives.
Function: Supports what-if analysis and optimization.
Example: A financial system that models different investment strategies based on risk and return.

3. Knowledge-Driven DSS
Focus: Provides expert knowledge and recommendations.
Function: Helps with specialized decision-making, often using expert systems or rule-based systems.
Example: A medical expert system that helps doctors diagnose diseases based on symptoms.

4. Communication-Driven DSS
Focus: Facilitates group decision-making and collaboration.
Function: Supports teamwork, discussion, and decision-making within a group.
Example: A project management system that enables a team to collaborate on resource allocation.

5. Document-Driven DSS
Focus: Manages and analyzes documents and unstructured data.
Function: Helps decision-makers retrieve and analyze documents to inform decisions.
Example: A legal document system used to find case studies, precedents, or contract clauses.

⑨ What is Decision Support System (DSS)? Explain advantages and disadvantages of DSS
What is a Decision Support System (DSS)?

A Decision Support System (DSS) is a computer-based system that helps decision-makers analyze data and make informed decisions, especially in complex and uncertain situations. It integrates data, analytical models, and user-friendly interfaces to support decision-making processes. DSS is widely used in fields like business, healthcare, and finance for better decision-making by analyzing trends, forecasting outcomes, and evaluating alternatives.

(1) Advantages of DSS

Improved Decision Quality: DSS provides accurate and timely data, helping users make better-informed decisions.
Increased Efficiency: Automates data analysis and reporting, speeding up decision-making and reducing the time needed for manual analysis.
Scenario Analysis: Allows decision-makers to run simulations (what-if analysis) to explore different possible outcomes before making decisions.
Better Problem Solving: Helps break down complex issues into manageable parts, making it easier to find optimal solutions.
Collaboration: Facilitates group decision-making by enabling collaboration among team members and stakeholders.

(2) Disadvantages of DSS

High Cost: Developing and maintaining a DSS can be expensive due to software, hardware, and training requirements.
Complexity: DSS can be difficult to set up and use, requiring specialized knowledge and user training.
Data Quality Issues: The effectiveness of a DSS depends on the quality of the data. Inaccurate or outdated data can lead to poor decisions.
Over-Reliance on Technology: Decision-makers may become overly dependent on DSS, ignoring other qualitative factors that are not captured in the system.
Resistance to Change: Users may resist adopting a DSS, especially if they are used to traditional decision-making methods.

UNIT 5
① What is KDD? Explain KDD seven step process with suitable diagram
① Explain knowledge discovery process in detail
① What is KDD (Knowledge Discovery in Databases)?

Knowledge Discovery in Databases (KDD) is the process of extracting useful patterns, trends, and insights from large datasets. It combines techniques from statistics, machine learning, and database systems to discover hidden knowledge. KDD includes various steps to prepare, analyze, and interpret data to make informed decisions.

The Seven-Step KDD Process

1) Data Selection
Description: Choose relevant data from different sources (databases, data warehouses, etc.) for analysis.
Example: Selecting sales and customer data for analysis.

2) Data Preprocessing
Description: Clean and transform the data to ensure it is consistent and suitable for analysis.
Example: Handling missing values and removing errors in data.

3) Data Transformation
Description: Convert data into a format that is easier for mining algorithms to process.
Example: Normalizing numerical values or aggregating data.

4) Data Mining
Description: Apply mining techniques (e.g., classification, clustering) to discover patterns or relationships in the data.
Example: Using clustering to group customers by behavior.

5) Pattern Evaluation
Description: Assess the discovered patterns to determine their usefulness and relevance.
Example: Checking if an association rule is valid across different subsets of the data.

6) Knowledge Representation
Description: Present the discovered knowledge in a clear and understandable format, such as graphs or reports.
Example: Visualizing customer segments with pie charts.

7) Knowledge Interpretation and Use
Description: Interpret the results and apply them to make business decisions or strategies.
Example: Using customer insights to create targeted marketing campaigns.
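
The seven steps can be traced end to end on a toy dataset. The sketch below is an assumption-laden illustration, not a standard KDD implementation: the customer records are invented, and a simple threshold split stands in for a real clustering algorithm in the mining step.

```python
# Toy records (invented for illustration); one spend value is missing.
records = [
    {"customer": "A", "spend": 120.0, "region": "north"},
    {"customer": "B", "spend": None, "region": "south"},
    {"customer": "C", "spend": 480.0, "region": "north"},
    {"customer": "D", "spend": 300.0, "region": "east"},
]

# 1) Data selection: keep only the attributes needed for the analysis.
selected = [{"customer": r["customer"], "spend": r["spend"]} for r in records]

# 2) Data preprocessing: fill the missing value with the mean of the known ones.
known = [r["spend"] for r in selected if r["spend"] is not None]
mean_spend = sum(known) / len(known)
for r in selected:
    if r["spend"] is None:
        r["spend"] = mean_spend

# 3) Data transformation: min-max scale spend into [0, 1].
lo = min(r["spend"] for r in selected)
hi = max(r["spend"] for r in selected)
for r in selected:
    r["scaled"] = (r["spend"] - lo) / (hi - lo)

# 4) Data mining: a threshold split used here as a stand-in for clustering.
segments = {
    "low": [r["customer"] for r in selected if r["scaled"] < 0.5],
    "high": [r["customer"] for r in selected if r["scaled"] >= 0.5],
}

# 5) Pattern evaluation: treat the split as useful only if no segment is empty.
useful = all(segments.values())

# 6) Knowledge representation: summarize the segments as counts.
summary = {name: len(members) for name, members in segments.items()}

# 7) Interpretation and use: report the result to the decision-maker.
print(summary, "useful" if useful else "not useful")
```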

③ Explain architecture of data mining system :-

Architecture of a Data Mining System
The architecture of a data mining system refers to the structure and components that work together to extract useful patterns and knowledge from large datasets. The system typically includes the following main components:

Basic Working:
1. It all starts when the user submits data mining requests; these requests are then sent to the data mining engine for pattern evaluation.
2. These applications try to find the solution to the query using the already present database.
3. The metadata then extracted is sent for proper analysis to the data mining engine, which sometimes interacts with pattern evaluation modules to determine the result.
4. This result is then sent to the front end in an easily understandable manner using a suitable interface.

● A detailed description of parts of data mining architecture is shown:

1. Data Sources: Databases, the World Wide Web (WWW), and data warehouses are parts of data sources. The data in these sources may be in the form of plain text, spreadsheets, or other forms of media like photos or videos. The WWW is one of the biggest sources of data.

2. Database Server: The database server contains the actual data ready to be processed. It performs the task of handling data retrieval as per the request of the user.

3. Data Mining Engine: It is one of the core components of the data mining architecture that performs all kinds of data mining techniques like association, classification, characterization, clustering, prediction, etc.

4. Pattern Evaluation Modules: They are responsible for finding interesting patterns in the data, and sometimes they also interact with the database servers to produce the result of the user requests.

5. Graphical User Interface: Since the user cannot fully understand the complexity of the data mining process, a graphical user interface helps the user communicate effectively with the data mining system.

6. Knowledge Base: The knowledge base is an important part of the data mining engine that is quite beneficial in guiding the search for the result patterns. Data mining engines may also sometimes get inputs from the knowledge base. This knowledge base may contain data from user experiences. The objective of the knowledge base is to make the result more accurate and reliable.

④ What are different applications of data mining

1. Customer Relationship Management (CRM): Identifies customer behavior, segments customers, and predicts churn to improve marketing strategies.
2. Fraud Detection: Detects fraudulent activities, such as credit card fraud or insurance fraud, by identifying unusual patterns.
3. Market Basket Analysis: Identifies products frequently bought together, helping with cross-selling, up-selling, and product placement.
4. Healthcare: Predicts diseases, recommends treatments, and detects fraudulent healthcare claims using patient data.
5. Financial Market Analysis: Analyzes stock trends, manages risks, and optimizes portfolios based on historical data.
6. Manufacturing Optimization: Predicts equipment failure, improves quality control, and optimizes supply chain management.
7. Text Mining and NLP: Analyzes unstructured text data, such as sentiment analysis and document classification.
8. Social Network Analysis: Detects communities, influencers, and predicts information spread on social media.
9. Education: Predicts student performance, personalizes learning, and optimizes curriculum design.
10. Telecommunications: Optimizes network performance, predicts faults, and analyzes customer behavior for service improvement.
11. E-commerce and Retail: Recommends products, segments customers, and manages inventory efficiently.

⑤ What is data mining? What are different challenges in implementation of data mining
What is Data Mining?
Data Mining is the process of discovering patterns, trends, and useful information from large datasets using techniques like machine learning, statistics, and database systems. It helps businesses and organizations make data-driven decisions by identifying hidden patterns in data.

Challenges in Implementation of Data Mining

Data Quality Issues: Incomplete, noisy, or inconsistent data can affect the accuracy of results.
Data Integration: Combining data from different sources and formats is complex.
Scalability: Handling large datasets efficiently requires significant computational power.
High Dimensionality: Datasets with many attributes can be difficult to analyze without losing important information.
Complexity of Models: Some data mining models are hard to interpret and explain.
Privacy and Security: Ensuring the security and privacy of sensitive data is critical.
Overfitting: Complex models may overfit the training data, leading to poor generalization.
Lack of Domain Expertise: Data mining requires both technical and domain knowledge for meaningful analysis.
Cost: Implementing data mining systems can be expensive for organizations.
Ethical Concerns: Ensuring fairness and avoiding biases in models is crucial to prevent discrimination.

⑥ Define data mining. Explain data mining tasks with suitable example
What is Data Mining?
Data Mining is the process of discovering patterns, trends, correlations, and useful information from large datasets using techniques from machine learning, statistics, and database systems. It helps organizations extract valuable insights and make informed decisions based on data.

Data Mining Tasks
Data mining tasks can be broadly classified into two categories: Descriptive and Predictive tasks. Here's an explanation of both, along with examples.

(1). Descriptive Data Mining Tasks
These tasks aim to summarize the general characteristics or patterns in a dataset.
a) Clustering
• Description: Groups data into clusters based on similar characteristics without predefined labels.
• Example: Segmenting customers into different groups based on purchasing behavior, such as "frequent buyers," "seasonal buyers," etc.
b) Association Rule Mining
• Description: Identifies relationships or associations between items in large datasets.
• Example: In retail, discovering that "Customers who buy bread also buy butter" is a common buying pattern.

(2). Predictive Data Mining Tasks
These tasks involve predicting unknown or future values based on historical data.
a) Classification
• Description: Assigns data to predefined categories or classes, using a model learned from labeled examples.
• Example: Classifying emails as "spam" or "non-spam" based on features like subject, sender, and content.
b) Regression
• Description: Predicts continuous values based on the relationship between variables.
• Example: Predicting house prices based on features like size, location, and number of rooms.
c) Anomaly Detection (Outlier Detection)
• Description: Identifies unusual data points that do not fit the general pattern of the dataset.
• Example: Detecting fraudulent credit card transactions by identifying transactions that deviate from typical spending behavior.

⑦ Explain any data mining tool in detail

Data Mining Tool: RapidMiner
RapidMiner is an open-source data mining tool used for data preparation, machine learning, model building, and deployment. It features a user-friendly drag-and-drop interface, making it accessible for users without deep programming knowledge.

Key Features
1. User-Friendly Interface: Drag-and-drop functionality to build models without coding.
2. Data Preparation: Tools for cleaning, transforming, and preprocessing data.
3. Machine Learning Algorithms: Supports various algorithms for classification, regression, clustering, etc.
4. Model Evaluation: Offers tools to evaluate model performance using metrics like accuracy and cross-validation.
5. Integration: Can connect to databases, Excel, and big data platforms like Hadoop.
6. Automation and Deployment: Automates workflows and deploys models to production.

Example Use Case: Customer Churn Prediction
1. Data Import: Load customer data.
2. Preprocessing: Clean the data (handle missing values, convert categories).
3. Modeling: Apply a classification algorithm (e.g., Decision Tree).
4. Evaluation: Assess model performance (accuracy, precision).
5. Deployment: Use the model to predict churn for new customers.

Advantages:
• Easy to use, with no coding required.
• Open-source and free.
• Extensive machine learning and data mining capabilities.

Disadvantages:
• Can be resource-intensive for large datasets.
• Learning curve for advanced features.

⑧ Explain predictive and descriptive algorithms in data mining

1. Descriptive Algorithms
Descriptive algorithms aim to summarize and find patterns or relationships in data, without predicting future outcomes.
Key Descriptive Algorithms:
• Clustering: Groups similar data points together (e.g., K-means clustering to segment customers based on buying behavior).
• Association Rule Mining: Finds relationships between variables (e.g., "If a customer buys bread, they will likely buy butter").
• Principal Component Analysis (PCA): Reduces data dimensionality while retaining key patterns.
Example: Market Basket Analysis, where association rules identify frequent item pairs in retail.

2. Predictive Algorithms
Predictive algorithms use historical data to predict future outcomes or classify new data.
Key Predictive Algorithms:
• Classification: Predicts categories (e.g., Decision Trees to classify spam emails).
• Regression: Predicts continuous values (e.g., predicting house prices with linear regression).
• Anomaly Detection: Identifies unusual data points (e.g., fraud detection).
Example: Using classification to predict if a customer will default on a loan.
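
As a concrete instance of a descriptive algorithm, here is a minimal one-dimensional K-means sketch in plain Python. The spending values and the two starting centroids are assumptions chosen for illustration; a practical implementation would handle multiple dimensions and pick initial centroids more carefully.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Minimal 1-D K-means: alternate assignment and centroid update steps."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical monthly spend values for six customers.
spend = [1, 2, 3, 10, 11, 12]
centers, groups = kmeans_1d(spend, centroids=[1, 12])
print(centers, groups)
```

The two resulting groups correspond to the "low spender" and "high spender" segments mentioned in the clustering example.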

⑨ Explain the descriptive modeling in detail

Descriptive modeling in ADBMS focuses on understanding and summarizing patterns, relationships, and structures in data. It helps organizations uncover hidden insights and trends, aiding decision-making.

Key Techniques in Descriptive Modeling:
1. Clustering:
   o Groups similar data points based on features.
   o Example: Segmenting customers by purchasing behavior (e.g., high vs. low spenders).
2. Association Rule Mining:
   o Finds relationships between items in data.
   o Example: "Customers who buy bread also buy butter" (market basket analysis).
3. Dimensionality Reduction (PCA):
   o Reduces data complexity by transforming features into fewer dimensions.
   o Example: Simplifying customer data into principal components for analysis.
4. Data Summarization:
   o Aggregates data into key statistics.
   o Example: Calculating average sales by region.
5. Anomaly Detection:
   o Identifies outliers or unusual patterns in data.
   o Example: Detecting fraudulent transactions in banking data.

Descriptive Modeling Process in ADBMS:
1. Data Collection: Data is stored in relational or NoSQL databases.
2. Data Preprocessing: Clean and transform data for analysis.
3. Model Application: Techniques like clustering, association rules, and anomaly detection are applied.
4. Analysis and Interpretation: Insights are derived from the results for decision-making.

Applications:
• Customer Segmentation: Grouping customers based on behavior.
• Market Basket Analysis: Identifying frequently co-purchased products.
• Fraud Detection: Identifying unusual or fraudulent transactions.
• Reporting: Summarizing key business metrics from large datasets.

Advantages:
• Efficient handling of large datasets.
• Real-time analysis for immediate insights.
• Better decision-making through data-driven insights.

⑩ What is data preprocessing? Explain data preprocessing techniques

What is Data Preprocessing?
Data preprocessing is the process of preparing raw data for analysis by cleaning, transforming, and organizing it. It helps to ensure that data is in a suitable format for analysis and machine learning, improving accuracy and model performance.

Data Preprocessing Techniques

1. Data Cleaning
▪ Handling Missing Data: Missing values can be filled with the mean, median, or mode, or removed entirely.
▪ Example: Replacing missing customer age with the average age.

2. Data Transformation
▪ Normalization: Scaling numerical values to a standard range (e.g., [0, 1]).
▪ Example: Scaling income values between 0 and 1.
▪ Standardization: Adjusting data to have a mean of 0 and a standard deviation of 1.
▪ Example: Standardizing test scores for analysis.

3. Feature Selection and Dimensionality Reduction
▪ Feature Selection: Removing irrelevant or redundant features.
▪ Example: Dropping unnecessary columns like "middle name" in a customer dataset.
▪ PCA: Reducing the number of features while retaining key information.
▪ Example: Reducing customer attributes from 20 to 5 key components.

4. Handling Categorical Data
▪ Encoding: Converting categorical values into numerical formats.
▪ One-hot encoding: Creating binary columns for each category.
▪ Example: "Red", "Blue", "Green" becomes three binary columns.

5. Outlier Detection
▪ Handling Outliers: Detecting and either removing or adjusting extreme values.
▪ Example: Removing extreme income values in a financial dataset.

6. Discretization
▪ Converting Continuous Data into Categories: Turning numerical data into categorical intervals.
▪ Example: Grouping ages into ranges like "0-18", "19-35".

7. Resampling
▪ Handling Imbalanced Data: Oversampling minority classes or undersampling majority classes in classification problems.
▪ Example: Balancing fraud vs. non-fraud cases in a fraud detection model.
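
Several of these techniques reduce to short formulas: min-max normalization computes (x - min) / (max - min), and standardization computes (x - mean) / standard deviation. The sketch below shows them with Python's standard library only; the helper names (impute_mean, min_max, standardize, one_hot) and the sample values are illustrative assumptions rather than any particular library's API.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Data cleaning: replace missing entries (None) with the mean of the known ones."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def min_max(values):
    """Normalization: rescale values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization: shift to mean 0 and scale to (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def one_hot(values):
    """One-hot encoding: one binary column per category, columns in sorted order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

ages = impute_mean([20, None, 40])          # missing age replaced by the average
scaled = min_max(ages)                      # values now lie in [0, 1]
z_scores = standardize(ages)                # mean 0, standard deviation 1
colors = one_hot(["Red", "Blue", "Green"])  # columns ordered Blue, Green, Red
print(ages, scaled, z_scores, colors)
```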

⑪ Explain different issues in Data mining task. Explain Data Preprocessing Tasks in KDD Process :-

(1) Issues in Data Mining Tasks

Data Quality Issues: Missing, noisy, or inconsistent data can distort analysis. Solutions include imputation and smoothing.
Scalability: Large datasets and high-dimensional data are difficult to handle. Feature selection and efficient algorithms are needed.
Complexity of Data: Unstructured data (text, images) and heterogeneous data (from different sources) are challenging to process.
Interpretability: Complex models may be hard to interpret, making it difficult to trust their results.
Privacy and Security: Protecting sensitive data is essential, with anonymization and encryption methods used.
Overfitting: Models may perform well on training data but poorly on new data, requiring regularization techniques.
Data Integration: Combining data from different sources can introduce inconsistencies and conflicts.

(2) Data Preprocessing Tasks in KDD Process

Data Cleaning: Handling missing values and noise through imputation or smoothing.
Data Integration: Combining data from different sources, ensuring consistency in format and values.
Data Transformation: Normalizing or standardizing data for consistency and combining data (aggregation).
Data Reduction: Reducing the dataset's dimensionality by selecting relevant features or using PCA.
Discretization: Converting continuous values into discrete categories.
Handling Imbalanced Data: Resampling to balance underrepresented classes.

⑫ Explain Pattern Evaluation and knowledge presentation steps in Data Mining

1. Pattern Evaluation
Pattern evaluation assesses the quality and usefulness of the discovered patterns to ensure they are relevant and actionable.
Key Points:
• Interestingness: Patterns should provide valuable insights.
• Usefulness: Patterns must be relevant to the goal, like improving sales or targeting customers.
• Simplicity: Patterns should be easy to interpret.
• Validity: Patterns should generalize well across different datasets.
Evaluation Metrics:
• Support: Frequency of the pattern in the dataset.
• Confidence: Reliability of a pattern or rule.
• Lift: Strength of the association between items, considering their individual frequencies.
Example:
In market basket analysis, a pattern like "If a customer buys bread, they are likely to buy butter" would be evaluated using support and confidence.

2. Knowledge Presentation
Knowledge presentation involves making the discovered patterns easy to understand and actionable.
Methods:
• Visualization: Graphs, charts, and plots to display patterns clearly.
• Rule Representation: Presenting patterns as readable rules (e.g., "If X, then Y").
• Reports and Dashboards: Summarized patterns or KPIs in interactive dashboards for quick insights.
• Natural Language: Converting patterns into textual descriptions for easy interpretation.
Example:
A decision tree or a sales dashboard could be used to visualize purchasing patterns or customer segments.
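
The three evaluation metrics have standard definitions for a rule "A implies B": support is the fraction of transactions containing both A and B, confidence is support(A and B) / support(A), and lift is confidence / support(B). The following sketch computes them for the bread-and-butter rule on a small, invented list of transactions.

```python
# Four made-up shopping baskets used only for illustration.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """How often the consequent appears in transactions containing the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline frequency (above 1 means positive association)."""
    return confidence(antecedent, consequent) / support(consequent)

rule_support = support({"bread", "butter"})
rule_confidence = confidence({"bread"}, {"butter"})
rule_lift = lift({"bread"}, {"butter"})
print(rule_support, rule_confidence, rule_lift)
```

Here bread and butter appear together in 2 of 4 baskets, and butter appears in 2 of the 3 baskets that contain bread, so the lift is above 1.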

UNIT 6
① Explain Deductive Databases

Deductive Databases in ADBMS
A deductive database in an Advanced Database Management System (ADBMS) combines traditional databases with logical reasoning, allowing new data to be derived from existing facts using rules. It enables logical inference to automatically generate new knowledge based on predefined rules.

Key Features

1. Rules and Inference:
   o Allows the use of rules (like logical conditions) to infer new facts from the stored data.
   o Queries are processed using deductive logic, which involves applying rules to data.
2. Logical Queries:
   o Queries are expressed in a logical language (like Prolog) and are resolved by applying rules to facts in the database.
3. Recursion:
   o Supports recursive queries, where a query can reference itself, allowing complex relationships to be captured (e.g., hierarchical data).

Advantages

1. Expressiveness: Can model complex relationships and logic.
2. Declarative Queries: Focuses on "what" to retrieve, not "how" to compute it.
3. Knowledge Representation: Suitable for dynamic data and relationships.

Challenges

1. Performance: Can be slower than traditional databases due to the overhead of reasoning.
2. Complexity: Managing large sets of rules can be difficult.
3. Scalability: Performance issues may arise with large data sets.
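
Rule-based inference with recursion can be imitated in plain Python. The sketch below assumes two rules written in the usual logic-programming style, ancestor(X, Y) if parent(X, Y), and ancestor(X, Z) if parent(X, Y) and ancestor(Y, Z), and applies them repeatedly until no new facts appear. The names and facts are invented, and a real deductive database would evaluate such rules far more efficiently.

```python
# Stored facts: parent(X, Y) means X is a parent of Y (illustrative data).
parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}

def derive_ancestors(parent_facts):
    """Apply the two ancestor rules repeatedly until a fixpoint is reached."""
    # Rule 1: every parent is an ancestor.
    ancestor = set(parent_facts)
    while True:
        # Rule 2 (recursive): parent(X, Y) and ancestor(Y, Z) give ancestor(X, Z).
        new = {
            (x, z)
            for (x, y) in parent_facts
            for (y2, z) in ancestor
            if y == y2
        }
        if new <= ancestor:  # nothing new was derived, so stop
            return ancestor
        ancestor |= new

ancestors = derive_ancestors(parent)
print(sorted(ancestors))
```

Only three facts are stored, yet six ancestor facts can be queried; the other three are derived by the rules.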

② What are active databases? Elaborate with example

Active Databases
An active database is a type of database that has built-in triggers or rules to automatically execute certain actions in response to specific events, such as changes in the data or the occurrence of certain conditions. These databases are often referred to as Event-Driven databases because they respond to events (such as insertions, deletions, or updates) by executing predefined rules or actions without user intervention.

Key Features of Active Databases

1. Event-Condition-Action (ECA) Model:
   o Event: Something that happens in the database, such as a data modification (e.g., insert, update, or delete).
   o Condition: A logical condition that must be true for the action to be executed (e.g., a threshold value).
   o Action: The operation or action that the database system automatically performs when the event occurs and the condition is met (e.g., sending an alert, updating other tables).
2. Triggers:
   o Triggers are a specific type of rule in active databases that define actions to be taken automatically in response to events.
   o Triggers can be set for specific operations (insert, update, delete) on certain tables.
3. Rule Execution:
   o Rules are executed automatically when specific events are detected, without requiring a user to manually execute queries or procedures.

Example of Active Database

Consider an e-commerce system where the database needs to automatically handle stock updates when an order is placed.
Event:
• An order is placed in the system, causing an insert operation into the Orders table.
Condition:
• The order quantity exceeds the available stock for a product.
Action:
• If the condition is true (i.e., insufficient stock), the database might:
   o Automatically send a notification to the warehouse team to restock.
   o Update the stock in the Inventory table.
   o Trigger a backorder if stock is insufficient.

Advantages
1. Automation: Actions happen automatically based on events.
2. Consistency: Ensures business rules are consistently followed.
3. Real-time: Provides immediate responses to database changes.

Challenges
1. Complexity: Managing numerous triggers can be complicated.
2. Performance: Triggers may slow down database operations.
3. Maintenance: Debugging and managing triggers can be challenging.
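
The e-commerce example can be tried with an SQLite trigger driven from Python's built-in sqlite3 module. This is a sketch under assumptions: the table and column names (inventory, orders, backorders) are invented, and the trigger only records backorders and adjusts stock; sending a notification to a warehouse team would need application code outside the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (product TEXT PRIMARY KEY, stock INTEGER);
CREATE TABLE orders (product TEXT, quantity INTEGER);
CREATE TABLE backorders (product TEXT, quantity INTEGER);

-- Event: a row is inserted into orders.
CREATE TRIGGER handle_order AFTER INSERT ON orders
BEGIN
    -- Condition and action 1: record a backorder when stock is insufficient.
    INSERT INTO backorders (product, quantity)
    SELECT NEW.product, NEW.quantity
    WHERE NEW.quantity > (SELECT stock FROM inventory WHERE product = NEW.product);

    -- Condition and action 2: otherwise deduct the ordered quantity from stock.
    UPDATE inventory SET stock = stock - NEW.quantity
    WHERE product = NEW.product AND stock >= NEW.quantity;
END;

INSERT INTO inventory VALUES ('widget', 5);
""")

conn.execute("INSERT INTO orders VALUES ('widget', 3)")  # enough stock: 5 becomes 2
conn.execute("INSERT INTO orders VALUES ('widget', 4)")  # exceeds remaining stock

stock = conn.execute("SELECT stock FROM inventory").fetchone()[0]
backorders = conn.execute("SELECT product, quantity FROM backorders").fetchall()
print(stock, backorders)
```

The application only inserts orders; the stock update and the backorder record happen inside the database without any further queries from the user.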

③ Compare Temporal, Spatial and Deductive Databases
③ Compare Spatial and Temporal databases

Feature | Temporal Databases | Spatial Databases | Deductive Databases
--------|--------------------|-------------------|--------------------
Purpose | Manage time-related data | Store and query spatial/geographical data | Use logical rules for data inference and reasoning
Key Data Type | Time-stamped data, historical records | Points, lines, polygons, raster data | Facts and rules for logical reasoning
Queries | Queries about data at different points in time | Spatial queries (distance, proximity, containment) | Logical queries based on rules and facts
Example | Bank transaction history, historical records | Geographic information systems (GIS) | Family tree, deriving relationships via rules
Applications | Financial records, medical history, version control | Mapping, environmental monitoring, urban planning | Expert systems, knowledge bases, rule-based systems
Challenges | Complex time-related queries, large historical data | Efficient spatial indexing, handling large spatial data | Performance overhead, managing rules and facts
Indexing | Time-based indexing | Spatial indexing (e.g., R-trees, Quadtrees) | Rule-based indexing and inference
Data Updates | Tracking changes over time (valid/transaction time) | Changes in spatial coordinates or new spatial data | Deriving new facts through inference (rules)
Query Example | "What was the balance on account X on 2022-01-01?" | "Find all locations within 10 miles of a point" | "If X is a parent of Y and Y is a parent of Z, then X is a grandparent of Z."
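
The temporal query example from the table ("What was the balance on account X on 2022-01-01?") can be answered by storing each balance with a validity interval. The sketch below uses SQLite through Python's sqlite3 module; the balance_history table, its valid_from and valid_to columns, and the sample rows are assumptions made for illustration, and dedicated temporal databases provide built-in support for such queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balance_history ("
    "account TEXT, balance REAL, valid_from TEXT, valid_to TEXT)"
)
# Each row is valid from valid_from (inclusive) to valid_to (exclusive).
conn.executemany(
    "INSERT INTO balance_history VALUES (?, ?, ?, ?)",
    [
        ("X", 100.0, "2021-06-01", "2021-12-15"),
        ("X", 250.0, "2021-12-15", "2022-03-01"),
        ("X", 90.0, "2022-03-01", "9999-12-31"),
    ],
)

def balance_as_of(account, day):
    """Return the balance that was valid on `day` (ISO date string), or None."""
    row = conn.execute(
        "SELECT balance FROM balance_history "
        "WHERE account = ? AND valid_from <= ? AND ? < valid_to",
        (account, day, day),
    ).fetchone()
    return row[0] if row else None

print(balance_as_of("X", "2022-01-01"))
```

ISO-formatted date strings sort in chronological order, which is why plain string comparison is enough in this sketch.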

④ Explain multimedia databases in detail with respect to characteristics and challenges
④ Write a note on : (i) Multimedia databases

Multimedia Databases in ADBMS
Multimedia databases store, manage, and retrieve multimedia content such as images, videos, audio, and 3D models. They are designed to handle large, complex, and unstructured data, which traditional databases are not optimized for.

Characteristics of Multimedia Databases:
1. Support for Complex Data Types:
   o Manages different types of multimedia: images, videos, audio, 3D models, and text.
2. Large File Storage:
   o Efficiently stores large files like high-resolution images and long videos using techniques like compression.
3. Content-Based and Metadata Indexing:
   o Content-Based: Allows searching based on media features (e.g., color, shape, sound).
   o Metadata-Based: Uses descriptive tags or keywords associated with media for search.
4. Multimedia Querying:
   o Supports content-based (e.g., finding similar images) and metadata-based queries (e.g., searching videos tagged with specific terms).
5. Efficient Storage:
   o Uses compression algorithms (e.g., JPEG for images, MP4 for video) to reduce storage needs while maintaining quality.
6. Real-Time Processing:
   o Enables real-time access for applications like video streaming or live surveillance.
7. Handling Unstructured Data:
   o Designed to manage unstructured data, which doesn't fit neatly into traditional rows and columns.
8. Integration of Media Types:
   o Combines and integrates various media (e.g., text, audio, video) for applications like multimedia education or entertainment.

Challenges in Multimedia Databases:
1. Storage and Scalability:
   o Multimedia files are large, requiring advanced storage solutions to handle vast amounts of data.
2. Indexing and Retrieval:
   o Efficiently indexing diverse media (e.g., images based on shape, audio based on sound) can be complex and time-consuming.
3. Query Complexity:
   o Processing content-based queries (e.g., finding similar videos based on visual features) requires complex algorithms.
4. Compression and Quality:
   o Compression reduces file size but may result in loss of quality (e.g., lossy formats like MP4).
5. Real-Time Processing:
   o Real-time applications like video conferencing need low-latency data access and processing.
6. Data Integration:
   o Integrating different media types (text, audio, video) in a unified system can be challenging.
7. Semantic Understanding:
   o Understanding the context of multimedia data (e.g., interpreting the meaning of images or videos) requires advanced techniques like AI.
8. Security and Privacy:
   o Multimedia data, especially sensitive content (e.g., medical images), requires strong security and privacy measures.

⑤ Write a note on : (i) Mobile databases


Mobile Databases
Mobile databases are designed to manage and store data on mobile devices like smartphones and tablets. These databases are optimized for mobile environments, where resources like storage, processing power, and battery life are limited. They are widely used in applications that require offline access, synchronization with central systems, and efficient handling of multimedia or transactional data.

Characteristics of Mobile Databases:
1. Lightweight:
   o Mobile databases are compact and optimized for devices with limited resources, enabling efficient storage and data handling.
2. Offline Capability:
   o Data can be stored locally on the device when offline and synchronized with a central server once the device is online.
3. Synchronization:
   o Mobile databases support synchronization, ensuring data consistency between the local database and the central system.
4. Efficient Storage:
   o Data is stored in compact, often compressed, formats, ensuring efficient use of the device's limited storage space.

Types of Mobile Databases:
1. Relational Databases:
   o Examples: SQLite, a lightweight, serverless relational database used in many mobile applications.
2. NoSQL Databases:
   o Examples: MongoDB or Couchbase Mobile, suitable for apps requiring flexible, scalable data storage.
3. Key-Value Databases:
   o Examples: LevelDB, efficient for storing data as key-value pairs.
4. Object-Oriented Databases:
   o Examples: db4o or Realm, used for applications requiring object-oriented data storage.

Advantages of Mobile Databases:
1. Offline Access:
   o Allows applications to store data locally and operate even when not connected to the internet.
2. Reduced Network Dependency:
   o Reduces the need for constant network access, which enhances performance and responsiveness.

Challenges of Mobile Databases:
1. Limited Storage:
   o Mobile devices have constrained storage, which limits the amount of data that can be stored locally.
2. Network Issues:
   o Network connectivity can be intermittent or slow, causing delays or issues during synchronization.
3. Security Concerns:
   o Mobile devices are vulnerable to theft and data breaches, so strong security measures like encryption are essential.
4. Data Consistency:
   o Ensuring data consistency between the local device and central database can be complex, especially when multiple devices are involved.
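
The offline capability and synchronization characteristics described in this note can be sketched with SQLite through Python's built-in sqlite3 module. This is only an illustrative sketch: the notes table, the synced flag, and the upload callback are hypothetical names chosen for the example, not part of any particular mobile framework.

```python
import sqlite3


def open_local_db(path=":memory:"):
    """Open the on-device database and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS notes (
               id     INTEGER PRIMARY KEY,
               body   TEXT    NOT NULL,
               synced INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def save_note(conn, body):
    """Store a note locally; works with no network connection."""
    cur = conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
    conn.commit()
    return cur.lastrowid


def sync_pending(conn, upload):
    """Push unsynced rows to the central system once the device is online.

    `upload` is a hypothetical callback that returns True when the
    central server has accepted the row.
    """
    rows = conn.execute(
        "SELECT id, body FROM notes WHERE synced = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for note_id, body in rows:
        if upload(note_id, body):
            # Mark the row as synced only after the server confirms it.
            conn.execute("UPDATE notes SET synced = 1 WHERE id = ?", (note_id,))
            sent += 1
    conn.commit()
    return sent
```

Rows whose upload fails stay marked as unsynced and are retried on the next call, which is one simple way to cope with the intermittent connectivity listed under the challenges.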

⑥ Write a note on: (i) Geographical Information Systems (GIS) (ii) Genome data management.
(i) Geographical Information Systems (GIS) in ADBMS
Geographical Information Systems (GIS) are systems designed for capturing, storing, analyzing, and visualizing spatial and geographic data. GIS integrates various technologies such as hardware, software, data, and people to manage spatial data. It is used in many sectors such as urban planning, environmental monitoring, transportation, and disaster management.
Key Components of GIS:
1. Hardware: Physical devices like computers, GPS devices, and sensors.
2. Software: Programs that help manage and analyze spatial data (e.g., ArcGIS, QGIS).
3. Data: Includes spatial data (coordinates, maps) and non-spatial data (e.g., demographic data).
4. Methods: Techniques like spatial analysis, modeling, and querying for geographic data.
Types of GIS Data:
• Vector Data: Uses points, lines, and polygons to represent geographic features (e.g., roads, rivers).
• Raster Data: Uses grids to represent continuous data like satellite imagery.
GIS in ADBMS:
• Spatial Databases: GIS integrates with spatial databases like PostGIS (PostgreSQL) and Oracle Spatial, which are optimized for storing and querying spatial data.
• Spatial Indexing: Uses specialized indexes like R-tree and Quad-tree for efficient spatial queries.
• Querying and Analysis: ADBMS supports SQL extensions for spatial queries, enabling location-based analysis, such as finding the shortest path or proximity analysis.
Applications:
• Urban Planning: Zoning, infrastructure management.
• Environmental Management: Land use, deforestation monitoring.
• Disaster Management: Earthquake predictions, recovery plans.
• Transportation: Route planning, traffic analysis.

(ii) Genome Data Management in ADBMS
Genome Data Management involves the storage and analysis of DNA sequence data, a critical component in genomics and bioinformatics. With advancements in high-throughput sequencing technologies, genome data has grown exponentially, requiring efficient databases to store and manage large volumes of genetic information.
Challenges in Genome Data Management:
1. Large Data Volume: Genomic data can reach terabytes, requiring scalable storage solutions.
2. Complexity: Genomic data includes nucleotide sequences, annotations, and metadata, which need specialized structures.
3. Data Integration: Combining data from different genomic sources (e.g., sequencing, experimental results) can be complex.
4. Computational Queries: Complex queries like sequence matching or finding gene mutations require powerful database systems.
Role of ADBMS:
• Relational Databases: Traditional relational databases store genomic sequences and related data in structured formats.
• NoSQL Databases: Used for handling unstructured and large-scale genomic data, such as MongoDB.
• Graph Databases: Graph databases (e.g., Neo4j) model complex relationships between genes, mutations, and pathways.
Techniques in Genome Data Management:
• Indexing: Specialized indexing (e.g., B-trees, hash indexes) is used for efficient retrieval of genetic sequences.
• Compression: Techniques like gzip reduce the storage footprint of large genomic datasets.
• Cloud Computing: Cloud storage and distributed computing help manage and process large-scale genomic data.
Applications:
• Personalized Medicine: Analyzing genetic data to customize medical treatments.
• Genomic Research: Identifying genetic links to diseases.
• Gene Annotation: Identifying functional elements in genomes.
• Mutational Analysis: Linking mutations to diseases like cancer.
Challenges:
• Scalability: Genomic databases must scale as data volume increases.
• Data Security: Genomic data requires high levels of security due to privacy concerns.
• Standardization: Lack of universal standards for genomic data formats can complicate data integration.
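
For part (i), the proximity analysis mentioned under "Querying and Analysis" can be illustrated with a small Python sketch. It is an assumption-laden simplification: it scans every point and uses the haversine great-circle formula on a spherical Earth, whereas a spatial database such as PostGIS would answer the same question through an R-tree style index. The function names and the sample coordinates are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to guard against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearby(points, lat, lon, radius_km):
    """Return (name, distance_km) pairs within radius_km, nearest first.

    `points` is an iterable of (name, lat, lon) tuples. This is a linear
    scan; a spatial index would avoid visiting every point.
    """
    hits = []
    for name, plat, plon in points:
        dist = haversine_km(lat, lon, plat, plon)
        if dist <= radius_km:
            hits.append((name, dist))
    return sorted(hits, key=lambda item: item[1])


# Usage: cities within 200 km of Pune (approximate coordinates).
cities = [
    ("Pune", 18.5204, 73.8567),
    ("Mumbai", 19.0760, 72.8777),
    ("Delhi", 28.6139, 77.2090),
]
close_to_pune = nearby(cities, 18.5204, 73.8567, 200.0)
```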
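
For part (ii), the hash-index and gzip techniques listed under "Techniques in Genome Data Management" can be sketched in Python as follows. This is a toy illustration under stated assumptions: the k-mer index is held in an in-memory dictionary, sequences are plain ASCII strings of bases, and the function names are hypothetical. Production genome stores use far more elaborate structures.

```python
import gzip
from collections import defaultdict


def build_kmer_index(sequence, k):
    """Hash index mapping every length-k substring to its start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_occurrences(sequence, index, k, pattern):
    """Locate a pattern by looking up its first k bases, then verifying."""
    if len(pattern) < k:
        raise ValueError("pattern must be at least k bases long")
    hits = []
    for pos in index.get(pattern[:k], []):
        # The index only proves the first k bases match; check the rest.
        if sequence.startswith(pattern, pos):
            hits.append(pos)
    return hits


def compress_sequence(sequence):
    """Compress a base string with gzip to reduce its storage footprint."""
    return gzip.compress(sequence.encode("ascii"))


def decompress_sequence(blob):
    """Recover the original base string from its gzip form."""
    return gzip.decompress(blob).decode("ascii")


# Usage: index a short sequence and search it.
genome = "ACGTACGTTAGCACGT"
kmer_index = build_kmer_index(genome, 4)
positions = find_occurrences(genome, kmer_index, 4, "ACGT")
```

The lookup touches only the positions that share the pattern's first k bases, which is the point of indexing: the whole sequence is not rescanned for every query.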

⑦ Explain the “Event-Condition-Action” model in Active Databases


Event-Condition-Action (ECA) Model in Active Databases
The Event-Condition-Action (ECA) model is a key concept in active databases. It allows databases to automatically trigger actions based on specific events, provided certain conditions are met. This model ensures that the database responds dynamically to changes, improving automation and real-time data management.
Components of ECA Model:
1. Event:
   o An event is a trigger that signals a change in the database, such as an insert, update, or delete operation. It can also be a time-based or external event.
   o Examples: Inserting a new record, updating a field, or reaching a specific time.
2. Condition:
   o The condition is a logical check that is evaluated when the event occurs. If the condition is true, the associated action is triggered.
   o Example: If a stock's inventory count is below a threshold.
3. Action:
   o The action is the operation that happens if the event occurs and the condition is met. It could be modifying the database or notifying the user.
   o Example: Updating stock levels or sending an email.

Working of ECA Model:
When a specific event occurs, the database checks whether the condition is true. If the condition holds, the system performs the predefined action.
For example:
• Event: A new order is placed (insert into orders table).
• Condition: Order amount is over $1000.
• Action: Apply a discount or notify the sales team.

Advantages:
1. Automation: ECA automates database responses, reducing manual intervention.
2. Real-Time Processing: The database reacts instantly to changes.
3. Consistency: Ensures that business rules are applied consistently.
4. Efficiency: Reduces the need for additional software logic by implementing it directly in the database.

Example:
• Event: A new transaction is entered into the transaction table.
• Condition: If the transaction amount exceeds $5000.
• Action: Notify the fraud department by sending an alert.

Applications:
1. Monitoring: Automated alerts for threshold-based conditions like low inventory or overdue payments.
2. Data Integrity: Ensures rules like foreign key constraints are followed.
3. Business Logic: Encapsulates business rules directly into the database.

Challenges:
1. Rule Complexity: Managing numerous ECA rules can be complex.
2. Performance: Frequent evaluations can impact database performance.
3. Error Handling: Unintended consequences may arise if rules are not properly set up.
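
The fraud-alert example in this answer maps naturally onto a database trigger. The sketch below uses SQLite through Python's sqlite3 module, which is an assumption made for illustration; the table names, the trigger name, and the idea of recording the alert as a row in a fraud_alerts table (rather than actually notifying a department) are all hypothetical. The AFTER INSERT clause is the event, the WHEN clause is the condition, and the trigger body is the action.

```python
import sqlite3


def create_schema(conn):
    """Create the tables and the ECA rule, expressed as a trigger."""
    conn.executescript(
        """
        CREATE TABLE transactions (
            id      INTEGER PRIMARY KEY,
            account TEXT NOT NULL,
            amount  REAL NOT NULL
        );
        CREATE TABLE fraud_alerts (
            transaction_id INTEGER NOT NULL,
            message        TEXT    NOT NULL
        );
        -- Event: a row is inserted. Condition: amount > 5000.
        -- Action: record an alert row for the fraud department.
        CREATE TRIGGER flag_large_transaction
        AFTER INSERT ON transactions
        WHEN NEW.amount > 5000
        BEGIN
            INSERT INTO fraud_alerts (transaction_id, message)
            VALUES (NEW.id, 'Amount exceeds 5000 for account ' || NEW.account);
        END;
        """
    )


def record_transaction(conn, account, amount):
    """Insert a transaction; the trigger fires inside the database."""
    conn.execute(
        "INSERT INTO transactions (account, amount) VALUES (?, ?)",
        (account, amount),
    )
    conn.commit()


def pending_alerts(conn):
    """Return all alerts raised so far as (transaction_id, message) rows."""
    return conn.execute(
        "SELECT transaction_id, message FROM fraud_alerts ORDER BY transaction_id"
    ).fetchall()
```

The application code never checks the amount itself. The rule lives in the database, which is the "Business Logic" application named above, and it also shows why rule complexity becomes a challenge once many such triggers interact.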