BI Assignment 1
1. OLTP vs OLAP (5)
We can divide IT systems into transactional (OLTP) and analytical (OLAP). In general, we can assume that
OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.
                     OLTP                              OLAP
Source of data       Operational data; OLTP systems    Consolidated data; OLAP data
                     are the original source of the    comes from the various OLTP
                     data.                             databases.
Inserts and updates  Short and fast inserts and        Periodic long-running batch
                     updates initiated by end users.   jobs refresh the data.
Queries              Relatively standardized and       Often complex queries
                     simple queries returning          involving aggregations.
                     relatively few records.
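The difference between the two query styles can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The orders table and its columns are hypothetical, and a single in-memory database stands in for both workloads purely for brevity; in practice the analytical query would run against a separate warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical operational table, used only for illustration.
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)

# OLTP style: short, fast inserts and updates that touch single rows.
conn.executemany(
    "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
    [("Alice", "North", 120.0), ("Bob", "South", 80.0), ("Carol", "North", 45.5)],
)
conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (95.0, 2))

# OLTP style: a simple, standardized query returning very few records.
one_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = ?", (1,)
).fetchone()

# OLAP style: a query that scans many rows and aggregates them.
totals_by_region = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders "
    "GROUP BY region ORDER BY region"
).fetchall()

print(one_order)
print(totals_by_region)
```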
2. Write a case study of your choice for the following dimensional models.
a) Draw a star schema model. Give an explanation for each. (6)
b) Draw a fact constellation model. Explain the concept of conformed dimensions and fact tables.
Give an example for each. (9)
a)
Dimension tables are used to describe the data we want to store. For example: a retailer might want to
store the date, store, and employee involved in a specific purchase. Each dimension table is its own
category (date, employee, store) and can have one or more attributes. For each store, we can save its
location at the city, region, state and country level. For each date, we can store the year, month, day of
the month, day of the week, etc. This is related to the hierarchy of attributes in the dimension table.
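As a rough illustration of such an attribute hierarchy, the sketch below derives the date attributes mentioned above from a single calendar date. The function name build_date_row and the key names are assumptions made for this example only.

```python
from datetime import date

# English weekday names, indexed by date.weekday() (Monday is 0).
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

def build_date_row(d: date) -> dict:
    """Return one date-dimension row with attributes at several levels."""
    return {
        "full_date": d.isoformat(),
        "year": d.year,
        "month": d.month,
        "day_of_month": d.day,
        "day_of_week": WEEKDAYS[d.weekday()],
    }

row = build_date_row(date(2024, 3, 15))
print(row)
```

Every attribute is computed from the date itself, which is why a date dimension can be generated once in advance rather than entered by hand.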
In the star schema, we’ll usually find that some attributes are a subset of other attributes in the same
record. This redundancy is deliberate and done in the name of better performance. We could use date,
location, and sales agent dimensions to aggregate (the transform part of the ETL process) and store data
inside the DWH. In dimensional modeling, it’s very important to define the right dimensions and choose
the proper granularity.
The Star Schema
The star schema is the simplest model used in DWH. Because the fact table is in the center of the schema
with dimension tables around it, it looks roughly like a star. This is especially apparent when the fact table
is surrounded by five dimension tables. A variant of the star schema is the centipede schema, where the
fact table is surrounded by a large number of small dimension tables.
Star schemas are very commonly used in data marts. We can relate them to the top-down data model
approach. We’ll analyze two star schemas (data marts) and then combine them to make a single model.
The sales report is one of today’s most common reports. As we mentioned before, in most cases we could
generate sales reports from the live system. But when data or business size makes this too cumbersome,
we’ll have to build a data warehouse or a data mart to streamline the process. After designing our star
schema, an ETL process will get the data from operational database(s), transform the data into the
proper format for the DWH, and load the data into the warehouse.
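The three ETL steps can be sketched as follows. This is a simplified illustration, not a production pipeline: the sale_line and fact_daily_sales tables are hypothetical, two in-memory SQLite databases stand in for the operational system and the warehouse, and price is assumed to be a unit price.

```python
import sqlite3
from collections import defaultdict

def extract(source):
    """Extract: read raw sale lines from the operational database."""
    return source.execute(
        "SELECT sold_at, product, price, quantity FROM sale_line"
    ).fetchall()

def transform(rows):
    """Transform: aggregate sale lines to one row per (day, product)."""
    totals = defaultdict(lambda: [0.0, 0])  # (day, product) -> [revenue, quantity]
    for sold_at, product, price, quantity in rows:
        key = (sold_at[:10], product)  # keep only the YYYY-MM-DD part
        totals[key][0] += price * quantity
        totals[key][1] += quantity
    return [(day, product, revenue, qty)
            for (day, product), (revenue, qty) in sorted(totals.items())]

def load(warehouse, facts):
    """Load: write the aggregated rows into the warehouse table."""
    warehouse.executemany(
        "INSERT INTO fact_daily_sales VALUES (?, ?, ?, ?)", facts
    )
    warehouse.commit()

# Hypothetical operational database with three sale lines.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE sale_line (sold_at TEXT, product TEXT, price REAL, quantity INTEGER)"
)
source.executemany(
    "INSERT INTO sale_line VALUES (?, ?, ?, ?)",
    [("2024-03-15 09:12:00", "pen", 2.0, 3),
     ("2024-03-15 17:40:00", "pen", 2.0, 1),
     ("2024-03-16 10:05:00", "ink", 5.5, 2)],
)

# Hypothetical warehouse with a daily-grain fact table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_daily_sales "
    "(sale_date TEXT, product TEXT, revenue REAL, quantity INTEGER)"
)

load(warehouse, transform(extract(source)))
loaded = warehouse.execute(
    "SELECT * FROM fact_daily_sales ORDER BY sale_date"
).fetchall()
print(loaded)
```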
The model presented above consists of one fact table (colored light red) and five dimension tables
(colored light blue). The tables in the model are:
fact_sales – This table contains references to the dimension tables plus two facts (price and
quantity sold). Note that all five foreign keys together form the primary key of the table.
dim_sales_type – This is a sales-type dimension table with only one attribute, “type_name”.
dim_employee – This is an employee dimension table that stores basic employee attributes: full
name and birth year.
dim_product – This is a product dimension table with only two attributes (other than the
primary key): product name and product type.
dim_time – This table handles the time dimension. It contains five attributes besides the
primary key. The lowest-level data is sales by date (action_date). The action_week attribute is
the number of the week in that year (i.e. the first week in January would be given the number 1;
the last week in December would get the number 52, etc.)
The actual_month and actual_year attributes store the calendar month and year when the
sale occurred. These can be extracted from the action_date attribute.
The action_weekday attribute stores the name of the day when the sale took place.
dim_store – This is a store dimension. For each store we’ll save the city, region, state and
country where it is located. Here we can clearly notice that the star schema is denormalized.
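The tables described above could be declared roughly as follows. This is a sketch under stated assumptions: the table names and the attributes named in the text are taken from the description, while the key column names (such as store_id), the data types, and the sample rows are invented for illustration. SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_sales_type (sales_type_id INTEGER PRIMARY KEY, type_name TEXT);
CREATE TABLE dim_employee (employee_id INTEGER PRIMARY KEY,
                           full_name TEXT, birth_year INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY,
                          product_name TEXT, product_type TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, action_date TEXT,
                       action_week INTEGER, actual_month INTEGER,
                       actual_year INTEGER, action_weekday TEXT);
-- Denormalized: city, region, state and country all live in one table.
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY,
                        city TEXT, region TEXT, state TEXT, country TEXT);
-- The five foreign keys together form the primary key of the fact table.
CREATE TABLE fact_sales (
    sales_type_id INTEGER REFERENCES dim_sales_type (sales_type_id),
    employee_id   INTEGER REFERENCES dim_employee (employee_id),
    product_id    INTEGER REFERENCES dim_product (product_id),
    time_id       INTEGER REFERENCES dim_time (time_id),
    store_id      INTEGER REFERENCES dim_store (store_id),
    price REAL,
    quantity INTEGER,
    PRIMARY KEY (sales_type_id, employee_id, product_id, time_id, store_id)
);

-- Invented sample rows.
INSERT INTO dim_sales_type VALUES (1, 'retail');
INSERT INTO dim_employee VALUES (1, 'Ana Smith', 1990);
INSERT INTO dim_product VALUES (1, 'pen', 'stationery'), (2, 'ink', 'stationery');
INSERT INTO dim_time VALUES (1, '2024-03-15', 11, 3, 2024, 'Friday');
INSERT INTO dim_store VALUES (1, 'Austin', 'Central', 'Texas', 'USA'),
                             (2, 'Dallas', 'North', 'Texas', 'USA');
INSERT INTO fact_sales VALUES (1, 1, 1, 1, 1, 2.0, 4),
                              (1, 1, 2, 1, 1, 5.5, 2),
                              (1, 1, 1, 1, 2, 2.0, 10);
""")

# Quantity sold per city: one join from the fact table to the store dimension.
by_city = conn.execute(
    "SELECT s.city, SUM(f.quantity) FROM fact_sales f "
    "JOIN dim_store s ON s.store_id = f.store_id "
    "GROUP BY s.city ORDER BY s.city"
).fetchall()

# Rolling up to state level needs no extra join, because the store
# dimension is denormalized.
by_state = conn.execute(
    "SELECT s.state, SUM(f.quantity) FROM fact_sales f "
    "JOIN dim_store s ON s.store_id = f.store_id "
    "GROUP BY s.state"
).fetchall()

print(by_city)
print(by_state)
```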
b)
A Fact constellation means two or more fact tables sharing one or more dimensions. It is also
called Galaxy schema.
A fact constellation schema describes the logical structure of a data warehouse or data mart. It
can be designed as a collection of denormalized fact tables together with shared, conformed
dimension tables.
We can look at the two previous models as two data marts, one for the sales department and the other
for the supply department. Each of them consists of only one fact table and a few dimensional tables. If
we wanted, we could combine these two data marts into one model. This type of schema, containing
several fact tables and sharing some dimension tables, is called a galaxy schema. Sharing dimension
tables can reduce database size, especially where shared dimensions have many possible values. Ideally,
in both data marts the dimensions are defined in the same manner; dimensions that have the same
meaning and content for every fact table that uses them are called conformed dimensions. If that’s not
the case, we’ll have to adjust the dimensions to fit both needs.
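A minimal sketch of two fact tables sharing the same dimension tables is given below. The supply-side table fact_supply_order, the key columns, and the sample rows are hypothetical and were chosen only to show the idea. Because both fact tables reference the same dim_product rows, their results can be compared product by product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared dimensions, defined once and used by both fact tables.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, action_date TEXT);

-- Fact table of the sales data mart.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product (product_id),
    time_id    INTEGER REFERENCES dim_time (time_id),
    quantity   INTEGER
);
-- Fact table of the supply data mart (hypothetical name and columns).
CREATE TABLE fact_supply_order (
    product_id INTEGER REFERENCES dim_product (product_id),
    time_id    INTEGER REFERENCES dim_time (time_id),
    amount     INTEGER
);

INSERT INTO dim_product VALUES (1, 'pen'), (2, 'ink');
INSERT INTO dim_time VALUES (1, '2024-03-15'), (2, '2024-03-16');
INSERT INTO fact_sales VALUES (1, 1, 4), (1, 2, 6), (2, 2, 2);
INSERT INTO fact_supply_order VALUES (1, 1, 20), (2, 1, 5);
""")

def total_per_product(fact_table, measure):
    """Sum one measure per product name for the given fact table."""
    # Table and column names come from this script, not from user input.
    rows = conn.execute(
        f"SELECT p.product_name, SUM(f.{measure}) FROM {fact_table} f "
        "JOIN dim_product p ON p.product_id = f.product_id "
        "GROUP BY p.product_name"
    ).fetchall()
    return dict(rows)

sold = total_per_product("fact_sales", "quantity")
supplied = total_per_product("fact_supply_order", "amount")

# Combine both results on the shared product dimension.
report = {name: (sold.get(name, 0), supplied.get(name, 0))
          for name in sorted(set(sold) | set(supplied))}
print(report)
```

Each fact table is aggregated separately and the results are merged afterwards, which avoids double counting that a direct join between two fact tables could cause.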
A galaxy schema, built out of our two example data marts, is shown below:
3. Explain any two Business Intelligence applications with an example. (5)
Business intelligence (BI) leverages software and services to transform data into actionable insights that
inform an organization’s strategic and tactical business decisions. BI tools access and analyze data sets
and present analytical findings in reports, summaries, dashboards, graphs, charts and maps to provide
users with detailed intelligence about the state of the business.
The term business intelligence often also refers to a range of tools that provide quick, easy-to-digest
access to insights about an organization's current state, based on available data.
Reporting:
A crucial business application of BI is reporting. As covered above, business intelligence tools collect,
organize, and analyze data sets, including unstructured ones, and use them to generate a range of
different types of reports. These can cover staffing, expenses, sales, customer service, and other processes.
Reporting and data analysis are similar, but they vary significantly in purpose, delivery, tasks and value.
Reporting is the process of organizing data in summaries with the intention of monitoring business
performance. Analysis is the process of exploring data to extract insights that can be applied to improve
business practices.
Basically, reporting turns data into plain information. Analysis takes data and turns it into actionable
insights. Both help businesses improve their performance and monitor operations, but use different
methods to do so. Reporting shows users what’s happening and analysis explains why it’s happening.
Both processes can be carried out using visualizations, but don’t have to.
Business intelligence tools are ideal for handling dynamic data. Historically, data visualizations were
static, and a new one would have to be created for every variable change. Modern BI software provides
interactive dashboards that can update in real time, offering a new level of usability and agility in data
analysis.
Performance management
With BI applications, organizations can monitor goal progress based on pre-defined or customizable
timeframes. The data-driven goals may include project completion deadlines, target delivery time, or
sales goals. For example, if you’d like to reach a certain sales goal, your BI system can analyze previous
months of data and suggest a reasonable goal to aim for based on past performance.
These goals can be tracked closely to deliver real-time updates on goal progress. This helps you
understand what gaps might remain. Users can set the system to alert them when they are getting close
to a target or if the time limit is ending and they have yet to reach their goal. This helps managers and
employees alike stay on top of their progress and helps keep teams goal-oriented.
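The goal suggestion and alerting described above might look like the sketch below. The functions, the thresholds, and the rule for suggesting a goal (average of past months plus a small growth factor) are all assumptions made for illustration; real BI tools each define their own logic.

```python
from statistics import mean

def suggest_goal(past_monthly_sales, growth=0.05):
    """Suggest a goal: average of past months plus an assumed growth factor."""
    return round(mean(past_monthly_sales) * (1 + growth), 2)

def goal_alert(current, target, days_left, near_ratio=0.9, deadline_days=3):
    """Return an alert message, or None when no alert is needed."""
    if current >= target:
        return "goal reached"
    if days_left <= deadline_days:
        return "deadline approaching, goal not yet reached"
    if current >= near_ratio * target:
        return "close to target"
    return None

goal = suggest_goal([100.0, 120.0, 110.0])
print(goal)
print(goal_alert(current=95, target=100, days_left=10))
print(goal_alert(current=50, target=100, days_left=2))
```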
Users can also monitor goal fulfillment and use progress data to gauge the overall productivity of an
organization. Because the information is always readily accessible, no substantial amount of time is lost
tracking down or organizing urgently needed data. This saves businesses time and money, and makes
everyone’s work easier.