module 2-2

The document covers key concepts in data mining, including definitions of fact tables, load managers, operational databases, OLAP vs. OLTP, and various schemas for data warehousing. It explains the functionalities of bitmap indexing and the three-tier architecture of a data warehouse. Additionally, it compares ROLAP and MOLAP servers, emphasizing their strengths and weaknesses in data management.

Uploaded by pp6524878
Copyright © All Rights Reserved

Part A

Module 2 : Data Mining

2020 March
4. What is a fact table?
A fact table is the central table in a dimensional model. It stores the measures (metrics) of a
business process, such as sales revenue or a customer satisfaction score, together with
foreign keys that link each row to the dimension tables describing the context of those
measures, such as time, location, or product.
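As a minimal sketch (the table layout, key names, and figures are hypothetical), a fact table row can be modeled as a set of foreign keys into dimension tables plus numeric measures. Summing a measure grouped by one dimension key is the basic analytical operation such a table supports.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesFact:
    # Foreign keys pointing to rows in (hypothetical) dimension tables
    time_key: int
    product_key: int
    location_key: int
    # Measures: the numeric metrics of the business process
    units_sold: int
    revenue: float


def revenue_by_product(facts):
    """Sum the revenue measure for each product key."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact.product_key] += fact.revenue
    return dict(totals)


facts = [
    SalesFact(time_key=1, product_key=10, location_key=100, units_sold=2, revenue=20.0),
    SalesFact(time_key=1, product_key=11, location_key=100, units_sold=1, revenue=15.0),
    SalesFact(time_key=2, product_key=10, location_key=101, units_sold=3, revenue=30.0),
]
print(revenue_by_product(facts))  # {10: 50.0, 11: 15.0}
```

The descriptive attributes (product name, city, month) are deliberately absent from `SalesFact`; they would live in the dimension tables that the keys refer to.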

2021 April
4. What are the functions of a load manager?
The load manager manages the process of loading data from various sources into a data
warehouse or other target system. Its functions are to extract the data from the source
systems, transform and cleanse it, integrate it according to the business rules and
requirements, and load it into the target.
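The extract, transform, and load steps can be sketched as follows. This is a minimal illustration only, assuming a CSV export as the source and an in-memory SQLite database as the target; the column names and the cleansing rule (reject non-numeric amounts, normalize region names) are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from an operational system
SOURCE_CSV = """order_id,region,amount
1, north ,100.50
2,SOUTH,abc
3,south,75
"""


def extract(text):
    """Read raw rows from the source file."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Cleanse and standardize rows; drop rows that violate the (assumed) rules."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows with non-numeric amounts
        clean.append((int(row["order_id"]), row["region"].strip().lower(), amount))
    return clean


def load(conn, rows):
    """Insert the cleansed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
print(conn.execute("SELECT * FROM sales ORDER BY order_id").fetchall())
# [(1, 'north', 100.5), (3, 'south', 75.0)]
```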

2022 April
4. What is an operational database?
An operational database is a database designed to support the day-to-day operations of an
organization, such as managing transactions, inventory, or customer information. It is
optimized for transaction processing rather than for data analysis or reporting.

Part B
Module 2 : Data Mining

2020 March
14. Compare and contrast ROLAP and MOLAP servers.
ROLAP (Relational Online Analytical Processing) and MOLAP (Multidimensional Online
Analytical Processing) servers are two different approaches to storing and managing OLAP data.

ROLAP stores the data in relational tables and translates OLAP operations into SQL queries.
This allows flexible queries and joins between tables and scales well to large data volumes,
but complex queries can be slow because aggregations are computed at query time.

MOLAP, on the other hand, stores the data in multidimensional arrays (cubes), usually with
many aggregations pre-computed. This allows much faster processing of complex queries, but
it is less flexible for ad-hoc querying and data manipulation, and sparse cubes need
compression to avoid wasting storage. MOLAP may therefore be preferred for smaller datasets,
while ROLAP may be better for larger and more complex datasets.
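To make the contrast concrete, here is a toy sketch in which SQLite stands in for a relational engine (ROLAP style) and a nested Python list stands in for a multidimensional array (MOLAP style). The data and names are hypothetical, and real OLAP servers are far more elaborate.

```python
import sqlite3

rows = [("north", "pen", 10.0), ("north", "ink", 5.0), ("south", "pen", 7.0)]

# ROLAP style: keep the data as relational rows and aggregate with SQL at query time
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
rolap_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("north",)
).fetchone()[0]

# MOLAP style: load the data into a dense array indexed by dimension positions
regions = ["north", "south"]
products = ["pen", "ink"]
cube = [[0.0] * len(products) for _ in regions]
for region, product, amount in rows:
    cube[regions.index(region)][products.index(product)] += amount

# Pre-compute region totals once, so a later query is a single array lookup
region_totals = [sum(row) for row in cube]
molap_total = region_totals[regions.index("north")]

print(rolap_total, molap_total)  # 15.0 15.0
```

The empty south/ink cell in `cube` hints at the MOLAP storage trade-off: a dense array reserves space for every combination of dimension values, including combinations with no data.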

15. Explain bitmap indexing of OLAP data.
Bitmap indexing is a technique used in OLAP databases to improve query performance. A
separate bitmap (bit vector) is created for each distinct value of a dimension attribute, with
one bit per row of the table: the bit is 1 if that row has the value and 0 otherwise. Queries
can then be answered quickly by performing logical operations (such as AND, OR, and NOT) on
the bitmaps rather than searching through the raw data. Bitmap indexing is especially
effective for low-cardinality attributes, where the number of distinct values is small (for
example, gender or region), because only a few compact bitmaps are needed. For
high-cardinality attributes, the number of bitmaps and the size of the index grow large unless
compression is used.
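A minimal sketch of the idea, using Python integers as bit vectors (bit `i` stands for row `i`). The columns and values are hypothetical; production systems use compressed bitmap formats rather than plain integers.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i has that value."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def rows_from_bitmap(bitmap):
    """Decode a bitmap back into the list of matching row ids."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


# Toy columns from a hypothetical sales table (row i is the i-th record)
region = ["north", "south", "north", "east", "south", "north"]
channel = ["web", "store", "store", "web", "web", "web"]

region_idx = build_bitmap_index(region)
channel_idx = build_bitmap_index(channel)

# WHERE region = 'north' AND channel = 'web'  ->  bitwise AND of two bitmaps
print(rows_from_bitmap(region_idx["north"] & channel_idx["web"]))  # [0, 5]
```

An OR condition works the same way: `region_idx["north"] | region_idx["east"]` selects rows 0, 2, 3, and 5 without touching the underlying rows.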

2021 April
14. Explain the illustration of a data cube.
A data cube is a multidimensional representation of data that allows efficient analysis and
summarization. It consists of measures (numerical values), dimensions (categorical
attributes), and hierarchies (levels of granularity within dimensions). The cube can be
navigated with operations such as roll-up, drill-down, slice, and dice to explore different
perspectives of the data, and each cell holds the measure values for one specific combination
of dimension values.
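The following sketch assumes three hypothetical dimensions (time, item, location) and one measure (units sold). It materializes every cuboid of the cube in a dictionary, which shows how roll-up and slice map onto cells but is only practical for tiny examples.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("time", "item", "location")

# Hypothetical base facts: (time, item, location, units_sold)
facts = [
    ("Q1", "phone", "Delhi", 5),
    ("Q1", "laptop", "Delhi", 2),
    ("Q2", "phone", "Mumbai", 4),
    ("Q2", "phone", "Delhi", 1),
]


def build_cube(facts):
    """Aggregate units_sold for every subset of dimensions (every cuboid).

    A key like (("item", "phone"),) is one cell of the 1-D 'item' cuboid;
    the empty key () is the apex cell holding the grand total.
    """
    cube = defaultdict(int)
    for *dim_values, units in facts:
        coords = dict(zip(DIMENSIONS, dim_values))
        for k in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, k):
                cube[tuple((d, coords[d]) for d in dims)] += units
    return dict(cube)


cube = build_cube(facts)
print(cube[()])                                   # 12 (grand total)
print(cube[(("item", "phone"),)])                 # 10 (rolled up to item)
print(cube[(("time", "Q2"), ("item", "phone"))])  # 5

# Slice: fix time = 'Q1' and keep the item x location cells
q1_slice = {
    key: value for key, value in cube.items()
    if len(key) == 3 and key[0] == ("time", "Q1")
}
print(q1_slice)
```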
15. Explain the concept of metadata repository.
Metadata is data about the data in a database, such as its structure, format, and the
relationships between tables. A metadata repository is a central location for storing and
managing this information so that the data can be managed efficiently and consistently. The
repository can contain information about the schema, data types, relationships, constraints,
and other aspects of the data; in a data warehouse it also records where the data came from
and how it was transformed (data lineage). By using a metadata repository, organizations can
improve the consistency and accuracy of their data and simplify the process of managing and
querying it.
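As a minimal in-memory sketch, assuming hypothetical class, table, and source names, a metadata repository can be pictured as a catalogue keyed by table name that records each table's schema, constraints, source (lineage), and refresh schedule. Real repositories are usually stored in database tables and hold much more detail.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    name: str
    columns: dict            # column name -> data type
    source: str              # where the data came from (lineage)
    refresh_schedule: str    # when the table is reloaded
    constraints: list = field(default_factory=list)


class MetadataRepository:
    """A central, in-memory catalogue of table descriptions (hypothetical design)."""

    def __init__(self):
        self._tables = {}

    def register(self, meta):
        self._tables[meta.name] = meta

    def describe(self, name):
        return self._tables[name]

    def tables_from_source(self, source):
        """Lineage query: which warehouse tables are loaded from a given source?"""
        return sorted(t.name for t in self._tables.values() if t.source == source)


repo = MetadataRepository()
repo.register(TableMetadata(
    name="sales_fact",
    columns={"time_key": "INTEGER", "product_key": "INTEGER", "revenue": "REAL"},
    source="orders_db",
    refresh_schedule="daily",
    constraints=["revenue >= 0"],
))
repo.register(TableMetadata(
    name="product_dim",
    columns={"product_key": "INTEGER", "name": "TEXT"},
    source="catalog_db",
    refresh_schedule="weekly",
))
print(repo.describe("sales_fact").columns["revenue"])  # REAL
print(repo.tables_from_source("orders_db"))            # ['sales_fact']
```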

2022 April
14. Differentiate between OLAP and OLTP.
OLAP (Online Analytical Processing) and OLTP (Online Transaction Processing) are two
distinct approaches to processing data in a database system.

OLAP is designed for complex queries that aggregate large volumes of historical data to
provide business intelligence insights, while OLTP is optimized for processing large numbers
of short transactions that manage operational data in real time. OLAP systems present a
multidimensional view of data that supports flexible querying and analysis, whereas OLTP
systems typically store data in normalized, two-dimensional relational tables with a
predefined schema for efficient transaction processing. OLAP systems are read-intensive and
need large storage capacity and fast retrieval, while OLTP systems handle frequent reads and
writes and prioritize data consistency, availability, and concurrency control.
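The two workload shapes can be contrasted in a small sketch. SQLite stands in for both systems here only for illustration (in practice OLTP and OLAP workloads usually run on separate systems), and the `accounts` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, branch TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "north", 500.0), (2, "north", 300.0), (3, "south", 800.0)],
)
conn.commit()

# OLTP style: a short transaction that touches a few rows and must stay consistent.
# Using the connection as a context manager commits on success, rolls back on error.
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 3")

# OLAP style: a read-only query that scans and aggregates many rows.
summary = conn.execute(
    "SELECT branch, SUM(balance), COUNT(*) FROM accounts GROUP BY branch ORDER BY branch"
).fetchall()
print(summary)  # [('north', 700.0, 2), ('south', 900.0, 1)]
```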
15. Explain bitmap indexing of OLAP data.
(Same answer as 15th question from 2020 March paper)

Part C
Module 2 : Data Mining

2020 March
23. Explain various schemas involved in conceptual modeling of a data warehouse.
Conceptual modeling of a data warehouse involves the use of various schemas to represent
the information contained in the data warehouse. The three main schemas used in
conceptual modeling are the star schema, the snowflake schema, and the fact constellation
schema.

● Star schema: This schema is the simplest and most commonly used schema in data
warehousing. It consists of a fact table surrounded by one or more dimension tables.
The fact table contains the measures or quantitative data, while the dimension tables
contain the attributes or descriptive data that help to provide context for the
measures in the fact table.

● Snowflake schema: The snowflake schema is a variation of the star schema in which some
dimension tables are normalized into multiple related tables, for example when a dimension
has a hierarchy such as product → category. This reduces redundancy but requires more
joins, and the resulting diagram resembles a snowflake.

● Fact constellation schema: This schema, also known as the galaxy schema, is used when
multiple fact tables are needed to represent the data warehouse. The fact tables share some
of the dimension tables, so the schema can be viewed as a collection of stars, forming a
constellation-like structure.
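A star schema can be sketched in SQLite as follows; the table and column names are hypothetical. The fact table holds foreign keys and measures, and a typical "star join" query joins it directly to each dimension it needs before aggregating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT);

-- Fact table: foreign keys to each dimension plus numeric measures
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    units       INTEGER,
    revenue     REAL
);

INSERT INTO dim_product VALUES (1, 'pen', 'stationery'), (2, 'mug', 'kitchen');
INSERT INTO dim_store   VALUES (1, 'Kochi'), (2, 'Chennai');
INSERT INTO fact_sales  VALUES (1, 1, 10, 50.0), (2, 1, 2, 30.0), (1, 2, 4, 20.0);
""")

# Star join: the fact table joins directly to each dimension it needs
rows = conn.execute("""
    SELECT p.category, s.city, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_key = p.product_key
    JOIN dim_store   AS s ON f.store_key   = s.store_key
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
""").fetchall()
print(rows)
# [('kitchen', 'Kochi', 30.0), ('stationery', 'Chennai', 20.0), ('stationery', 'Kochi', 50.0)]
```

In a snowflake version, `category` would move into its own table referenced from `dim_product`, adding one more join; a fact constellation would add a second fact table (for example, a hypothetical `fact_shipments`) that shares `dim_product` and `dim_store`.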

Overall, the use of these schemas in conceptual modeling is important in defining the
structure and relationships between data elements in a data warehouse. They help to
provide a clear and concise representation of the data that can be easily understood by end-
users, facilitating effective analysis and decision making.

2021 April
23. Explain various schemas involved in conceptual modeling of a data warehouse.
(Same answer as 23rd question from 2020 March paper)

2022 April
23. With a diagram, explain the three-tier architecture of a data warehouse.
The three-tier architecture of a data warehouse consists of three layers:

1. Bottom Tier or Data Storage Layer: This layer, also known as the data layer or storage
layer, is the data warehouse database server. It stores the data obtained from various
sources, such as operational databases, spreadsheets, and flat files, after back-end tools
have extracted, cleansed, transformed, and loaded it. This layer is usually a relational
database management system optimized for large-scale data storage and efficient querying.

2. Middle Tier or OLAP Server Layer: The middle tier or OLAP (Online Analytical
Processing) server layer is responsible for performing complex analysis and presenting a
multidimensional view of the data to the front-end tools. The OLAP server is typically
implemented either as a ROLAP server, which maps multidimensional operations onto the
relational data, or as a MOLAP server, which stores pre-aggregated data in multidimensional
cubes for fast retrieval.

3. Top Tier or Front-end Tools Layer: The top tier or front-end tools layer is
responsible for presenting the data to the end-users in a user-friendly manner. It
includes various reporting and visualization tools such as dashboards, scorecards,
and ad-hoc query tools. These tools provide the end-users with easy access to the
data and enable them to make informed decisions based on the data.
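As a minimal sketch of how the three tiers hand data to one another, all names below are hypothetical: an in-memory SQLite database stands in for the bottom-tier warehouse server, a small class stands in for a middle-tier OLAP server (working in ROLAP style by issuing SQL), and a formatting function stands in for a top-tier reporting tool.

```python
import sqlite3


# Bottom tier: warehouse database server holding integrated, cleansed data
def bottom_tier():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(2021, "north", 120.0), (2021, "south", 80.0), (2022, "north", 150.0)],
    )
    return conn


# Middle tier: OLAP server exposing multidimensional operations over the warehouse
class OlapServer:
    def __init__(self, conn):
        self.conn = conn

    def roll_up(self, dimension):
        """Aggregate the amount measure along one dimension (ROLAP-style, via SQL)."""
        if dimension not in ("year", "region"):  # whitelist before building SQL
            raise ValueError(f"unknown dimension: {dimension}")
        sql = f"SELECT {dimension}, SUM(amount) FROM sales GROUP BY {dimension} ORDER BY {dimension}"
        return self.conn.execute(sql).fetchall()


# Top tier: front-end tool that formats results for end users
def report(title, rows):
    lines = [title] + [f"  {key}: {total:.2f}" for key, total in rows]
    return "\n".join(lines)


server = OlapServer(bottom_tier())
print(report("Sales by year", server.roll_up("year")))
```

Keeping the tiers behind separate interfaces is the point of the design: the reporting function never touches SQL, and the storage layer could be swapped without changing the front end.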

Overall, the three-tier architecture of a data warehouse enables efficient and effective data
storage, processing, and presentation to the end-users. It helps to provide a comprehensive
view of the data, which can be used for decision-making and business intelligence purposes.
