Week 3 - Data Warehouse Design

The document outlines data warehouse design principles, focusing on data modeling techniques such as Entity-Relationship (ER) diagrams and dimensional modeling. It explains star and snowflake schemas, detailing their structures, advantages, and limitations, along with the roles of fact and dimension tables in organizing data. A practical exercise is included to demonstrate how to create a data model for an online retail store, emphasizing the iterative nature of data warehousing design.

Uploaded by

moroansoma23
Copyright
© All Rights Reserved

Week 3: Data Warehouse Design

1. Data Modeling Techniques


• Data Modeling: The process of formally describing the data and information within an
organization. It involves defining entities, their attributes, and the relationships
between them.
• Key Techniques:
o Entity-Relationship (ER) Diagrams:
 A visual representation of data using entities (boxes representing real-
world objects like customers, products), attributes (characteristics of
entities like customer name, product price), and relationships
(connections between entities like "Customer" orders "Product").
 Example:
 Entities: Customer, Product, Order
 Attributes:
 Customer: CustomerID, CustomerName, Address
 Product: ProductID, ProductName, Price
 Order: OrderID, OrderDate, CustomerID (foreign key
referencing Customer), ProductID (foreign key
referencing Product)
 Relationships:
 Customer places Order (one-to-many)
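 Product is included in Order (many-to-many). Note that a single
ProductID column on Order allows only one product per order row;
a many-to-many relationship is normally resolved with a junction
table (e.g., a hypothetical OrderItem holding OrderID and
ProductID).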
 ER Diagram Symbols:
 Rectangle: Entity
 Ellipse: Attribute
 Diamond: Relationship
 Lines: Represent relationships (with cardinality notations like
1:1, 1:N, N:M)
o Dimensional Modeling:
 Specifically designed for data warehouses.
 Focuses on organizing data around business dimensions (e.g., Time,
Customer, Product) and measures (e.g., Sales, Revenue, Quantity).
 Aims to simplify data analysis by making it easier to answer business
questions.
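The ER example above can be sketched as relational tables. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The column types, the sample rows, the table name Orders (used because ORDER is a reserved word in SQL), and the OrderItem junction table with its Quantity column are all assumptions made for illustration; they are not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
CREATE TABLE Customer (
    CustomerID   INTEGER PRIMARY KEY,
    CustomerName TEXT NOT NULL,
    Address      TEXT
);
CREATE TABLE Product (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT NOT NULL,
    Price       REAL NOT NULL
);
-- One customer places many orders (one-to-many): the foreign key sits on the "many" side.
CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    OrderDate  TEXT NOT NULL,
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID)
);
-- Hypothetical junction table: resolves the many-to-many Product/Order relationship.
CREATE TABLE OrderItem (
    OrderID   INTEGER NOT NULL REFERENCES Orders(OrderID),
    ProductID INTEGER NOT NULL REFERENCES Product(ProductID),
    Quantity  INTEGER NOT NULL,
    PRIMARY KEY (OrderID, ProductID)
);
""")

# Illustrative sample data (made up for this sketch).
conn.execute("INSERT INTO Customer VALUES (1, 'Alice', '1 Main St')")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [(1, "Keyboard", 30.0), (2, "Mouse", 15.0)])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [(100, "2024-01-05", 1), (101, "2024-01-06", 1)])
conn.executemany("INSERT INTO OrderItem VALUES (?, ?, ?)",
                 [(100, 1, 1), (100, 2, 2), (101, 2, 1)])

# One-to-many: how many orders did customer 1 place?
order_count = conn.execute(
    "SELECT COUNT(*) FROM Orders WHERE CustomerID = ?", (1,)
).fetchone()[0]

# Many-to-many, direction 1: which products are in order 100?
products_in_order_100 = [row[0] for row in conn.execute(
    """SELECT p.ProductName
       FROM OrderItem oi JOIN Product p ON p.ProductID = oi.ProductID
       WHERE oi.OrderID = ? ORDER BY p.ProductName""", (100,))]

# Many-to-many, direction 2: which orders contain the product 'Mouse'?
orders_with_mouse = [row[0] for row in conn.execute(
    """SELECT oi.OrderID
       FROM OrderItem oi JOIN Product p ON p.ProductID = oi.ProductID
       WHERE p.ProductName = ? ORDER BY oi.OrderID""", ("Mouse",))]

print(order_count, products_in_order_100, orders_with_mouse)
```

The junction table is what lets one order hold several products and one product appear in several orders, which a single ProductID column on the order row cannot express.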
2. Star Schema and Snowflake Schema
• Star Schema:
o The simplest and most common data warehouse schema.
o Structure:
 A central fact table containing measures (e.g., sales amount, quantity
sold).
 Multiple dimension tables surrounding the fact table, each representing
a dimension (e.g., Time, Customer, Product).
 The fact table contains foreign keys that link to the primary keys of the
dimension tables.
o Example:
 Fact Table: Sales
 Attributes: SaleID, ProductID, CustomerID, TimeID,
QuantitySold, SalesAmount
 Dimension Tables:
 Product: ProductID, ProductName, Category, Brand
 Customer: CustomerID, CustomerName, City, Country
 Time: TimeID, Date, Year, Month, Day
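o Advantages: Simple to understand and query; most queries need only one
join per dimension, which keeps reporting and aggregation efficient.
o Limitations: Dimension tables are denormalized, so descriptive values
(e.g., the same Category name on many Product rows) are stored
redundantly and must be kept consistent whenever they change.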
• Snowflake Schema:
o An extension of the star schema.
o Structure:
 Dimension tables are further normalized (broken down into smaller,
more granular tables).
 Creates a "snowflake" shape in the diagram.
o Example:
 Product Dimension (in Snowflake):
 Product: ProductID, ProductName, CategoryID (foreign key
referencing Category), BrandID (foreign key referencing Brand)
 Category: CategoryID, CategoryName
 Brand: BrandID, BrandName
 (Relationships: Product belongs to Category, Product belongs
to Brand)
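o Advantages: Reduces data redundancy and improves data integrity, because
each category or brand value is stored only once; makes the hierarchies
inside a dimension explicit.
o Disadvantages: More complex to design and query than a star schema;
queries need additional joins, which can make them slower.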
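To make the star schema example concrete, here is a minimal sketch that builds the Sales fact table and its three dimension tables in an in-memory SQLite database and runs two typical reporting queries. The column types and all sample rows are assumptions made for illustration.

```python
import sqlite3

star = sqlite3.connect(":memory:")
star.execute("PRAGMA foreign_keys = ON")  # enforce the fact-to-dimension links

star.executescript("""
CREATE TABLE Product  (ProductID INTEGER PRIMARY KEY, ProductName TEXT, Category TEXT, Brand TEXT);
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, City TEXT, Country TEXT);
CREATE TABLE Time     (TimeID INTEGER PRIMARY KEY, Date TEXT, Year INTEGER, Month INTEGER, Day INTEGER);
-- Central fact table: measures plus one foreign key per dimension.
CREATE TABLE Sales (
    SaleID       INTEGER PRIMARY KEY,
    ProductID    INTEGER NOT NULL REFERENCES Product(ProductID),
    CustomerID   INTEGER NOT NULL REFERENCES Customer(CustomerID),
    TimeID       INTEGER NOT NULL REFERENCES Time(TimeID),
    QuantitySold INTEGER NOT NULL,
    SalesAmount  REAL NOT NULL
);
""")

# Illustrative sample data (made up). Note 'Electronics' is repeated on two
# Product rows: this is the redundancy a star schema accepts.
star.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    (1, "Laptop", "Electronics", "Acme"),
    (2, "Phone", "Electronics", "Globex"),
    (3, "Desk", "Furniture", "Acme"),
])
star.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?)", [
    (1, "Alice", "Accra", "Ghana"),
    (2, "Bob", "Lagos", "Nigeria"),
])
star.executemany("INSERT INTO Time VALUES (?, ?, ?, ?, ?)", [
    (1, "2024-01-15", 2024, 1, 15),
    (2, "2024-02-20", 2024, 2, 20),
])
star.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 1, 1000.0),
    (2, 2, 2, 1, 2, 1200.0),
    (3, 3, 1, 2, 1, 300.0),
    (4, 1, 2, 2, 1, 1000.0),
])

# Each question needs exactly one join from the fact table to one dimension.
by_category = star.execute("""
    SELECT p.Category, SUM(s.SalesAmount)
    FROM Sales s JOIN Product p ON p.ProductID = s.ProductID
    GROUP BY p.Category ORDER BY p.Category
""").fetchall()

by_country = star.execute("""
    SELECT c.Country, SUM(s.SalesAmount)
    FROM Sales s JOIN Customer c ON c.CustomerID = s.CustomerID
    GROUP BY c.Country ORDER BY c.Country
""").fetchall()

print(by_category)  # [('Electronics', 3200.0), ('Furniture', 300.0)]
print(by_country)   # [('Ghana', 1300.0), ('Nigeria', 2200.0)]
```

In a snowflake version of the same model, Category would live in its own table, so the first query would need a second join (Sales to Product to Category) in exchange for storing each category name once.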
3. Fact and Dimension Tables
• Fact Tables:
o Store numerical measurements or facts about the business.
o Contain foreign keys that link to dimension tables.
o Examples:
 Sales: SalesAmount, QuantitySold, UnitCost
 Orders: OrderID, OrderDate, OrderAmount
 Inventory: ProductID, WarehouseID, QuantityInStock
• Dimension Tables:
o Provide context and meaning to the facts.
o Contain attributes that describe the dimensions of the business.
o Examples:
 Customer: CustomerID, CustomerName, Address, City, Country
 Product: ProductID, ProductName, Category, Brand, Color
 Time: TimeID, Date, Year, Month, Day, Hour
 Geography: Country, Region, City
o Key Characteristics:
 Usually contain a single primary key (often a surrogate key - an
artificial key generated for a table).
 Attributes are typically descriptive and non-additive (e.g.,
CustomerName, ProductName).
 Often contain hierarchies (e.g., Country > Region > City).
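The interplay between surrogate keys, hierarchies, and additive measures can be illustrated with a small sketch in plain Python. It generates a Time dimension with a sequential surrogate key (TimeID) and then rolls an additive measure (SalesAmount) up the Year > Month level of the hierarchy. The function name build_time_dimension, the date range, and the fact rows are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_time_dimension(start, days):
    """Return one dimension row per calendar day, keyed by a surrogate TimeID."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append({
            "TimeID": offset + 1,   # surrogate key: artificial, carries no business meaning
            "Date": d.isoformat(),  # descriptive attributes of the dimension
            "Year": d.year,
            "Month": d.month,
            "Day": d.day,
        })
    return rows


# Four days spanning a month boundary: Jan 30, Jan 31, Feb 1, Feb 2.
time_dim = build_time_dimension(date(2024, 1, 30), 4)
time_by_id = {row["TimeID"]: row for row in time_dim}

# Fact rows (made up): (TimeID, QuantitySold, SalesAmount).
sales_facts = [(1, 2, 40.0), (2, 1, 20.0), (3, 3, 60.0), (4, 1, 25.0)]

# Roll the additive measure up from day level to month level.
monthly = defaultdict(float)
for time_id, _quantity, amount in sales_facts:
    t = time_by_id[time_id]  # look up the dimension row via the foreign key
    monthly[(t["Year"], t["Month"])] += amount
monthly_sales = dict(monthly)

# Roll up one more level, from month to year.
yearly = defaultdict(float)
for (year, _month), amount in monthly_sales.items():
    yearly[year] += amount
yearly_sales = dict(yearly)

print(monthly_sales)  # {(2024, 1): 60.0, (2024, 2): 85.0}
print(yearly_sales)   # {2024: 145.0}
```

The measures are summed at every level, while the descriptive attributes (Date, Year, Month, Day) are only used for grouping and filtering, which is the sense in which they are non-additive.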
Practical: Creating a Data Model for a Sample Business Scenario
• Scenario: Let's consider an online retail store.
• Steps:
1. Identify Entities: Customer, Product, Order, Time, Employee
2. Define Attributes:
 Customer: CustomerID, CustomerName, Address, Email, Phone
 Product: ProductID, ProductName, Description, Price, Category,
Brand
 Order: OrderID, OrderDate, OrderAmount, CustomerID, EmployeeID
 Time: TimeID, Date, Year, Month, Day, Hour
 Employee: EmployeeID, EmployeeName
3. Define Relationships:
 Customer places Order (one-to-many)
 Product is included in Order (many-to-many)
 Employee processes Order (one-to-many)
4. Create ER Diagram: (Use a tool like Lucidchart or draw it manually)
5. Design Star Schema:
 Fact Table: Sales
 Attributes: SaleID, ProductID, CustomerID, TimeID,
EmployeeID, QuantitySold, SalesAmount
 Dimension Tables: Customer, Product, Time, Employee
6. Consider Snowflake Schema:
 Product Dimension:
 Product: ProductID, ProductName, Description, CategoryID
(foreign key referencing Category), BrandID (foreign key
referencing Brand)
 Category: CategoryID, CategoryName
 Brand: BrandID, BrandName
 Customer Dimension:
 Customer: CustomerID, CustomerName, Email, AddressID
(foreign key referencing Address)
 Address: AddressID, Street, City, State, Country
This practical exercise will help you understand how to apply the concepts of data modeling,
star schema, and snowflake schema in a real-world scenario.
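Step 6 of the exercise can be tried out with a short sketch. It builds the snowflaked Product dimension for the online retail store in an in-memory SQLite database, together with a reduced Sales fact table (only the product key and the measures are kept to stay brief). The CategoryID and BrandID columns on Product are assumed here as the links between the normalized tables, and the column types and sample rows are likewise made up for illustration.

```python
import sqlite3

snow = sqlite3.connect(":memory:")
snow.execute("PRAGMA foreign_keys = ON")

snow.executescript("""
CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT NOT NULL);
CREATE TABLE Brand    (BrandID    INTEGER PRIMARY KEY, BrandName    TEXT NOT NULL);
-- Snowflaked dimension: Product points at Category and Brand instead of
-- repeating their names on every row.
CREATE TABLE Product (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT NOT NULL,
    Description TEXT,
    CategoryID  INTEGER NOT NULL REFERENCES Category(CategoryID),
    BrandID     INTEGER NOT NULL REFERENCES Brand(BrandID)
);
-- Reduced fact table for this sketch (other dimension keys omitted).
CREATE TABLE Sales (
    SaleID       INTEGER PRIMARY KEY,
    ProductID    INTEGER NOT NULL REFERENCES Product(ProductID),
    QuantitySold INTEGER NOT NULL,
    SalesAmount  REAL NOT NULL
);
""")

snow.executemany("INSERT INTO Category VALUES (?, ?)", [(1, "Electronics"), (2, "Furniture")])
snow.executemany("INSERT INTO Brand VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
snow.executemany("INSERT INTO Product VALUES (?, ?, ?, ?, ?)", [
    (1, "Laptop", "14-inch laptop", 1, 1),
    (2, "Phone", "Smartphone", 1, 2),
    (3, "Desk", "Oak desk", 2, 1),
])
snow.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 1000.0),
    (2, 2, 2, 1200.0),
    (3, 3, 1, 300.0),
])

CATEGORY_QUERY = """
    SELECT c.CategoryName, SUM(s.SalesAmount)
    FROM Sales s
    JOIN Product  p ON p.ProductID  = s.ProductID   -- fact to dimension
    JOIN Category c ON c.CategoryID = p.CategoryID  -- extra join added by snowflaking
    GROUP BY c.CategoryName ORDER BY c.CategoryName
"""
sales_by_category = snow.execute(CATEGORY_QUERY).fetchall()

# Renaming a category touches exactly one row, however many products use it.
rows_changed = snow.execute(
    "UPDATE Category SET CategoryName = ? WHERE CategoryID = ?",
    ("Consumer Electronics", 1),
).rowcount
sales_after_rename = snow.execute(CATEGORY_QUERY).fetchall()

print(sales_by_category)   # [('Electronics', 2200.0), ('Furniture', 300.0)]
print(rows_changed)        # 1
print(sales_after_rename)  # [('Consumer Electronics', 2200.0), ('Furniture', 300.0)]
```

The sketch shows both sides of the trade-off: the category report needs one more join than it would in the star schema from step 5, but a change to a category name is made in a single place.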
Remember: Data warehousing is an iterative process. The design of the data warehouse will
evolve over time as business requirements and data needs change.