Metadata helps in constructing, preserving, handling, and making use of the data warehouse.
There are two types of metadata in a data warehouse:
i) Technical Metadata comprises information that can be used by developers and managers when executing
warehouse development and administration tasks.
ii) Business Metadata comprises information that offers an easily understandable perspective of the data
stored in the warehouse. Metadata plays an important role in helping business and technical teams
understand the data present in the warehouse and convert it into information.
A data mart is a subset of a data warehouse focused on a particular line of business, department, or subject
area. Data marts make specific data available to a defined group of users, which allows those users to quickly
access critical insights without wasting time searching through an entire data warehouse. For example, many
companies may have a data mart that aligns with a specific department in the business, such as finance, sales,
or marketing.
Extract, Transform, Load (ETL) is a process of data integration that encompasses three steps: extraction,
transformation, and loading. In a nutshell, ETL systems take large volumes of raw data from multiple sources,
convert it for analysis, and load that data into your warehouse.
ETL saves you significant time on data extraction and preparation - time that you can better spend on
evaluating your business. Practicing ETL is also part of a healthy data management workflow, ensuring high data
quality, availability, and reliability. Each of the three major phases of ETL saves time and development
effort by running just once in a dedicated data flow:
Extract: In ETL, the first link determines the strength of the chain. The extract stage determines which data
sources to use, the refresh rate (velocity) of each source, and the priorities (extract order) between them, all of
which heavily impact your time to insight.
Transform: After extraction, the transformation process brings clarity and order to the initial data swamp. Dates
and times are combined into a single format, and strings are parsed down to their true underlying meanings.
Location data is converted to coordinates, zip codes, or cities/countries. The transform step also sums up,
rounds, and averages measures, and it deletes useless data and errors or sets them aside for later inspection. It
can also mask personally identifiable information (PII) to comply with GDPR, CCPA, and other privacy
requirements.
Load: In the last phase, much as in the first, ETL determines targets and refresh rates. The load phase also
determines whether loading will happen incrementally, or whether it will require an "upsert" (updating existing
data and inserting new data) for each new batch of data.
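The three phases above can be sketched as a minimal pipeline. This is an illustrative toy, not a production ETL tool: the source records, field names, and date formats are assumptions chosen to show normalization during the transform step.

```python
import sqlite3
from datetime import datetime

# Hypothetical raw records from two sources with inconsistent date formats.
SOURCE_A = [{"sold": "03/01/2024", "region": "north", "amount": 120.0}]
SOURCE_B = [{"sold": "2024-01-05", "region": "South", "amount": 80.5}]

def extract():
    # Extract: pull raw rows from each configured source.
    return SOURCE_A + SOURCE_B

def transform(rows):
    # Transform: normalize dates to ISO format and region names to one case.
    out = []
    for r in rows:
        for fmt in ("%d/%m/%Y", "%Y-%m-%d"):
            try:
                day = datetime.strptime(r["sold"], fmt).date().isoformat()
                break
            except ValueError:
                continue
        out.append((day, r["region"].title(), round(r["amount"], 2)))
    return out

def load(rows):
    # Load: insert the cleaned rows into the warehouse table.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (sold TEXT, region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return con

con = load(transform(extract()))
print(con.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone())
# → (2, 200.5)
```

In a real system each phase would run against external sources and a dedicated warehouse rather than in-memory data, but the division of labor is the same.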
ROLAP stands for Relational OLAP, an application based on relational DBMSs. It performs dynamic
multidimensional analysis of data stored in a relational database. The architecture is three-tier, with three
components: the front end (user interface), the ROLAP server (metadata and request-processing engine), and
the back end (database server). In this three-tier architecture, the user submits a request, and the ROLAP engine
converts the request into SQL and submits it to the backend database. Popular ROLAP products include
Metacube by Stanford Technology Group and Red Brick Warehouse by Red Brick Systems.
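The translation step at the heart of a ROLAP engine can be illustrated with a toy function; the fact-table and column names here are assumptions, and a real engine would of course handle joins, filters, and metadata lookups as well.

```python
# Toy sketch of the ROLAP translation step: a multidimensional request
# (a list of dimensions plus a measure) is rewritten as a SQL GROUP BY
# over the fact table. Names like "sales_fact" are illustrative only.
def to_sql(dimensions, measure, fact_table="sales_fact"):
    dims = ", ".join(dimensions)
    return (f"SELECT {dims}, SUM({measure}) "
            f"FROM {fact_table} GROUP BY {dims}")

print(to_sql(["region", "quarter"], "amount"))
# → SELECT region, quarter, SUM(amount) FROM sales_fact GROUP BY region, quarter
```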
MOLAP stands for Multidimensional Online Analytical Processing. It processes data stored in a
multidimensional cube across various combinations of dimensions. Since the data is stored in a
multidimensional structure, the MOLAP engine uses pre-computed (pre-stored) information, and it can
dynamically perform aggregation along concept hierarchies. MOLAP is very useful in time-series data analysis
and economic evaluation. Tools that incorporate MOLAP include Oracle Essbase, IBM Cognos, and Apache
Kylin.
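The pre-computation idea can be sketched with plain dictionaries standing in for cube storage; the dimensions and figures below are invented for illustration.

```python
# Minimal sketch of a MOLAP-style pre-computed cube: base cells are keyed
# by (region, quarter), and roll-ups are aggregated once, ahead of query
# time, so each query is just a dictionary lookup.
base = {
    ("North", "Q1"): 120.0, ("North", "Q2"): 90.0,
    ("South", "Q1"): 80.5,  ("South", "Q2"): 60.0,
}

# Pre-compute the roll-up along the quarter dimension (one step up the
# concept hierarchy), as a MOLAP engine would do when loading the cube.
by_region = {}
for (region, _quarter), amount in base.items():
    by_region[region] = by_region.get(region, 0.0) + amount

print(by_region["North"])  # → 210.0, answered without scanning detail rows
```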
HOLAP stands for Hybrid Online Analytical Processing. It is a hybrid of the ROLAP and MOLAP technologies,
connecting both approaches in one architecture. It stores part of the data in ROLAP form and part in MOLAP
form, and accesses whichever store a query requires. Detailed relational tables are kept in the ROLAP structure,
while data that requires multidimensional views is stored and processed using the MOLAP architecture. A
popular HOLAP product is Microsoft SQL Server 2000, which provides a hybrid OLAP server.
Desktop Online Analytical Processing (DOLAP) architecture is most suitable for local multidimensional
analysis. It is like a miniature multidimensional database, or a sub-cube of a larger business data cube.
Features of Star Schema: (i) The data is stored in a denormalized form. (ii) It provides quick query response.
(iii) The star schema is flexible and can be changed or extended easily. (iv) It reduces the complexity of
metadata for developers and end users. Advantages of Star Schema: query performance, load performance and
administration, and built-in referential integrity.
Features of Snowflake Schema: (i) It has normalized tables. (ii) It occupies less disk space. (iii) It requires
more lookup time, as many tables are interconnected and dimensions are extended. Advantages of Snowflake
Schema: i) A snowflake schema occupies a much smaller amount of disk space than a star schema; less disk
space means more convenience and less hassle. ii) The snowflake schema offers some protection from data
integrity issues, which is one reason many people prefer it. iii) Data is easy to maintain and more structured.
iv) Data quality is better than in a star schema.
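The snowflake variant of a product dimension can be sketched the same way: the category attribute is normalized out into its own table, saving space but adding an extra join. Table and column names are again illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY,
                           name TEXT, category_id INTEGER);
CREATE TABLE fact_sales   (product_id INTEGER, amount REAL);
INSERT INTO dim_category VALUES (1, 'Hardware');
INSERT INTO dim_product VALUES (1, 'Widget', 1);
INSERT INTO fact_sales VALUES (1, 99.0);
""")

# The extra hop (product -> category) is the snowflake's added lookup cost.
row = con.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category
""").fetchone()
print(row)  # → ('Hardware', 99.0)
```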
FACT CONSTELLATION SCHEMA :- This is another schema for representing a multidimensional model. The
term fact constellation evokes a galaxy containing several stars: it is a collection of fact schemas having one or
more dimension tables in common, as shown in the figure below. This logical representation is mainly used in
designing complex database systems.
Advantages of Fact Constellation Schema:- i) Different fact tables are explicitly assigned to the dimensions.
ii) It provides a flexible schema for implementation.
Limitations of Fact Constellation Schema:- i) The schema is complex because of the several aggregations
involved. ii) A fact constellation solution is hard to maintain and support.