Data warehouses contain large amounts of data from multiple sources to support business intelligence and analytics. There are two main types of metadata in data warehouses - technical metadata for development and administration, and business metadata to understand the stored data. Data marts contain subsets of data from warehouses focused on specific business units or departments to allow quick access to insights. ETL processes extract, transform, and load data into warehouses from source systems to integrate and prepare the data for analysis.


In the data warehouse architecture, metadata describes the data warehouse database and offers a framework for the data. It helps in constructing, preserving, managing, and using the data warehouse.
There are two types of metadata in a data warehouse:

i) Technical metadata comprises information used by developers and administrators when carrying out
warehouse development and administration tasks.
ii) Business metadata comprises information that offers an easily understandable view of the data
stored in the warehouse. Metadata plays an important role in helping both business and technical teams
understand the data present in the warehouse and convert it into information.
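The split between the two kinds of metadata can be sketched as two records describing the same warehouse table. All table, column, and team names below are invented for illustration:

```python
# Hypothetical metadata for one warehouse table, split into the two views.
technical_metadata = {
    "table": "fact_sales",
    "source_system": "orders_db",        # lineage, for developers/administrators
    "load_job": "etl_sales_daily",
    "columns": {"amount": "DECIMAL(10,2)", "sold_at": "TIMESTAMP"},
}

business_metadata = {
    "table": "fact_sales",
    "description": "One row per completed sale, used for revenue reporting.",
    "owner": "Finance team",
    "refresh": "daily, by 6am",
}

def describe(table_name, tech, biz):
    """Combine both views so analysts and developers share one picture."""
    return f"{table_name}: {biz['description']} (loaded by {tech['load_job']})"

print(describe("fact_sales", technical_metadata, business_metadata))
```

The point of keeping both records is that the business view answers "what does this data mean?" while the technical view answers "where did it come from and how is it maintained?".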

A data mart is a subset of a data warehouse focused on a particular line of business, department, or subject
area. Data marts make specific data available to a defined group of users, which allows those users to quickly
access critical insights without wasting time searching through an entire data warehouse. For example, many
companies may have a data mart that aligns with a specific department in the business, such as finance, sales,
or marketing.

Data Mart vs. Data Warehouse


Data marts and data warehouses are both highly structured repositories where data is stored and managed until
it is needed. However, they differ in the scope of data stored: a data warehouse is built to serve as the central
store of data for the entire business, whereas a data mart serves a specific division or business function.
Because a data warehouse contains data for the entire company, it is best practice to strictly control who can
access it. Moreover, querying exactly the data you need from a full warehouse can be a difficult task for
business users. Thus, the primary purpose of a data mart is to isolate, or partition, a smaller set of data from
the whole to provide easier data access for the end consumers. A data mart can be created from an existing
data warehouse (the top-down approach) or from other sources, such as internal operational systems or
external data. Like a data warehouse, it is a relational database that stores transactional data (a time value,
numerical order, and references to one or more objects) in rows and columns, making the data easy to
organize and access.
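The top-down approach can be sketched as carving a departmental view out of a central warehouse table. This is a minimal illustration using SQLite; the table, column, and department names are invented:

```python
import sqlite3

# A tiny "warehouse" table holding data for the whole company.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (region TEXT, dept TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [("EU", "finance", 100.0), ("US", "marketing", 250.0), ("EU", "marketing", 75.0)],
)

# The marketing data mart exposes only its own slice of the warehouse.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT region, amount FROM warehouse_sales WHERE dept = 'marketing'"
)

rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY region"
).fetchall()
print(rows)  # [('EU', 75.0), ('US', 250.0)]
```

In practice a mart is often a physically separate database rather than a view, but the idea is the same: marketing users query their mart without ever touching, or even seeing, finance data.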

Extract, Transform, Load (ETL) is a data integration process that encompasses three steps: extraction,
transformation, and loading. In a nutshell, ETL systems take large volumes of raw data from multiple sources,
convert it for analysis, and load that data into your warehouse.
ETL saves you significant time on data extraction and preparation, time that you can better spend on
evaluating your business. Practicing ETL is also part of a healthy data management workflow, ensuring high data
quality, availability, and reliability. Each of the three major components of ETL saves time and development
effort by running just once in a dedicated data flow:

Extract: In ETL, the first link determines the strength of the chain. The extract stage determines which data
sources to use, the refresh rate (velocity) of each source, and the priorities (extract order) between them, all of
which heavily impact your time to insight.
Transform: After extraction, the transformation process brings clarity and order to the initial data swamp.
Dates and times are combined into a single format, and strings are parsed down to their true underlying
meanings. Location data is converted to coordinates, zip codes, or cities/countries. The transform step also
sums, rounds, and averages measures, and it either deletes useless data and errors or sets them aside for later
inspection. It can also mask personally identifiable information (PII) to comply with GDPR, CCPA, and other
privacy requirements.
Load: In the last phase, much as in the first, ETL determines targets and refresh rates. The load phase also
determines whether loading will happen incrementally or will require an "upsert" (updating existing data and
inserting new data) for each new batch of data.
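The three steps can be sketched as a toy pipeline over in-memory sources. The source layout, field names, and transformation rules here are invented for illustration, not taken from any particular ETL tool:

```python
from datetime import datetime

def extract(sources):
    # Pull raw rows from every configured source, highest priority first.
    for source in sorted(sources, key=lambda s: s["priority"]):
        yield from source["rows"]

def transform(rows):
    # Normalize dates to one format, trim strings, discard bad rows.
    for row in rows:
        if row.get("amount") is None:
            continue  # drop errors instead of loading them
        yield {
            "city": row["city"].strip().title(),
            "day": datetime.strptime(row["date"], row["fmt"]).date().isoformat(),
            "amount": round(float(row["amount"]), 2),
        }

def load(clean_rows, target):
    # "Upsert": replace an existing (city, day) entry or insert a new one.
    for row in clean_rows:
        target[(row["city"], row["day"])] = row["amount"]

sources = [
    {"priority": 1, "rows": [
        {"city": " paris", "date": "2024-03-01", "fmt": "%Y-%m-%d", "amount": "10.50"}]},
    {"priority": 2, "rows": [
        {"city": "Paris ", "date": "01/03/2024", "fmt": "%d/%m/%Y", "amount": None}]},
]
warehouse = {}
load(transform(extract(sources)), warehouse)
print(warehouse)  # {('Paris', '2024-03-01'): 10.5}
```

Note how each concern lives in exactly one function, mirroring the "runs just once in a dedicated data flow" point above: the extract stage owns source order, the transform stage owns formats and error handling, and the load stage owns upsert behaviour.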

ROLAP stands for Relational OLAP, an approach built on relational DBMSs. It performs dynamic
multidimensional analysis of data stored in a relational database. The architecture is three-tier, with three
components: the front end (user interface), the ROLAP server (a metadata request-processing engine), and the
back end (database server). In this three-tier architecture, the user submits a request, and the ROLAP engine
converts the request into SQL and submits it to the back-end database. Popular ROLAP products include
Metacube by Stanford Technology Group and Red Brick Warehouse by Red Brick Systems.
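The core job of the ROLAP server, turning a multidimensional request into SQL, can be sketched in a few lines. The function and its arguments are hypothetical, a simplification of what a real engine does:

```python
def rolap_sql(measure, dimensions, fact_table):
    """Toy ROLAP translation: a request for a measure broken down by
    some dimensions becomes a relational GROUP BY query."""
    dims = ", ".join(dimensions)
    return f"SELECT {dims}, SUM({measure}) FROM {fact_table} GROUP BY {dims}"

print(rolap_sql("amount", ["region", "year"], "fact_sales"))
# SELECT region, year, SUM(amount) FROM fact_sales GROUP BY region, year
```

Because every aggregate is computed by the relational back end at query time, ROLAP scales to large detail data but pays the query cost on each request.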

MOLAP stands for Multidimensional Online Analytical Processing. It processes data using a multidimensional
cube covering the various dimension combinations. Since the data is stored in a multidimensional structure, the
MOLAP engine uses pre-computed, pre-stored information. It can dynamically aggregate along concept
hierarchies. MOLAP is very useful in time-series data analysis and economic evaluation. Tools that incorporate
MOLAP include Oracle Essbase, IBM Cognos, and Apache Kylin.
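The pre-computation idea can be sketched with a toy cube: every combination of dimension values, including an "all" level, is aggregated once up front, so answering a query is just a lookup. The data and dimension names are invented:

```python
from itertools import product

# Raw facts: (region, year, amount) -- invented sample data.
facts = [
    ("EU", "2023", 100), ("EU", "2024", 150),
    ("US", "2023", 200), ("US", "2024", 250),
]

# Pre-compute every (region, year) cell; '*' means "all" at that level.
cube = {}
for region, year in product(["EU", "US", "*"], ["2023", "2024", "*"]):
    cube[(region, year)] = sum(
        a for r, y, a in facts
        if region in (r, "*") and year in (y, "*")
    )

# Queries never touch the raw facts, only the pre-stored cube.
print(cube[("EU", "*")])    # 250
print(cube[("*", "2024")])  # 400
print(cube[("*", "*")])     # 700
```

This is the MOLAP trade-off in miniature: queries are fast because the work was done at load time, but the cube must be rebuilt or refreshed when the underlying facts change.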
HOLAP stands for Hybrid Online Analytical Processing, a hybrid of the ROLAP and MOLAP technologies that
combines both in one architecture. It stores part of the data in ROLAP form and part in MOLAP form, and
accesses the appropriate store depending on the query. Relational tables are kept in the ROLAP structure,
while data that requires multidimensional views is stored and processed using the MOLAP architecture. A
popular HOLAP product is Microsoft SQL Server 2000, which provides a hybrid OLAP server.
Desktop Online Analytical Processing (DOLAP) is an architecture best suited for local multidimensional
analysis. It is like a miniature multidimensional database, a sub-cube of a larger business data cube.
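The HOLAP dispatch idea, answer from the MOLAP cube when the aggregate was pre-computed, otherwise fall back to relational SQL, can be sketched as a small router. All names and the sample data are hypothetical:

```python
def holap_route(query_dims, cube, sql_fallback):
    """Toy HOLAP dispatch: use the pre-computed MOLAP cell if it exists,
    otherwise build a ROLAP-style SQL query for the detail data."""
    if query_dims in cube:
        return ("MOLAP", cube[query_dims])
    return ("ROLAP", sql_fallback(query_dims))

cube = {("EU", "*"): 250}  # one pre-computed summary cell (invented)

def detail(dims):
    # Illustrative detail query; a real engine would generate full SQL.
    return f"query detail rows for region={dims[0]}, year={dims[1]}"

print(holap_route(("EU", "*"), cube, detail))     # ('MOLAP', 250)
print(holap_route(("EU", "2024"), cube, detail))  # falls through to ROLAP
```

Summaries come back instantly from the cube, while drill-down requests for detail rows are pushed to the relational store, which is exactly the division of labour described above.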

Features of Star Schema: (i) The data is stored in a denormalized database. (ii) It provides quick query
response. (iii) A star schema is flexible and can be changed or extended easily. (iv) It reduces the complexity of
metadata for developers and end users. Advantages of Star Schema: query performance, load performance and
easy administration, and built-in referential integrity.
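A minimal star schema can be sketched in SQLite: one fact table joined directly to denormalized dimension tables, one join per dimension. All table, column, and product names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimensions: category lives right on the product row.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales  (product_id INTEGER REFERENCES dim_product,
                          date_id    INTEGER REFERENCES dim_date,
                          amount     REAL);
INSERT INTO dim_product VALUES (1, 'Widget', 'Tools'), (2, 'Gadget', 'Toys');
INSERT INTO dim_date    VALUES (10, 2024);
INSERT INTO fact_sales  VALUES (1, 10, 5.0), (2, 10, 7.5), (1, 10, 2.5);
""")

# A typical star query: one join per dimension, then aggregate.
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date    d ON d.date_id    = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category
""").fetchall()
print(rows)  # [('Tools', 2024, 7.5), ('Toys', 2024, 7.5)]
```

The shallow join pattern is what gives the star schema its quick query response: the query planner never has to chain through intermediate lookup tables.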

Features of Snowflake Schema: (i) It has normalized tables. (ii) It occupies less disk space. (iii) It requires more
lookup time, as many tables are interconnected and dimensions are extended. Advantages of Snowflake Schema:
i) A snowflake schema occupies a much smaller amount of disk space than a star schema; less disk space
means more convenience and less hassle. ii) A snowflake schema offers some protection from data integrity
issues, and many people prefer it because of how safe it is. iii) Data is easier to maintain and more structured.
iv) Data quality is better than in a star schema.
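The difference from a star schema is visible if the same dimension is snowflaked: the category attribute moves into its own normalized table, so each category name is stored once, at the cost of an extra lookup join. Names and data are again illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (snowflaked) dimension: category is its own table.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT,
                           category_id INTEGER REFERENCES dim_category);
CREATE TABLE fact_sales   (product_id INTEGER REFERENCES dim_product,
                           amount     REAL);
INSERT INTO dim_category VALUES (1, 'Tools');
INSERT INTO dim_product  VALUES (1, 'Widget', 1);
INSERT INTO fact_sales   VALUES (1, 5.0), (1, 2.5);
""")

rows = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON p.product_id  = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id  -- the extra hop
    GROUP BY c.name
""").fetchall()
print(rows)  # [('Tools', 7.5)]
```

Renaming a category now means updating one row in dim_category, which is the data-integrity benefit listed above; the price is the longer join chain on every query.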

FACT CONSTELLATION SCHEMA: There is another schema for representing a multidimensional model. The
term fact constellation evokes a galaxy containing several stars: it is a collection of star schemas having one or
more dimension tables in common, as shown in the figure below. This logical representation is mainly used in
designing complex database systems.
Advantages of Fact Constellation Schema: i) Different fact tables are explicitly assigned to the dimensions. ii) It
provides a flexible schema for implementation.
Limitations of Fact Constellation Schema: i) The schema is complex because of the several aggregations
involved. ii) A fact constellation solution is hard to maintain and support.
