Healthcare Data Models UC Davis Continuing and Professional Education

Part 1: The Star Schema Data Model

A star schema model is composed of dimension tables and fact tables.

Dimension tables store your ‘Nouns’: people, places, things, and time. For the purposes of this exercise,
assume that each dimension table has exactly one row (one entry) for each noun. For example, a
patient dimension has exactly one row for each patient at your institution. Each ‘Noun’ is
uniquely identified by a primary key.

Fact tables are a bit different. Consider encounters as an example. Each encounter can be viewed as an
interaction between many ‘Nouns’: a patient, a clinician, a place (e.g., office, hospital, or lab),
and a time. When all of these ‘Nouns’ intersect, we have an encounter. ‘Nouns’ exist in the fact table only
as a reference back to their home in their respective dimension table. There is no point in describing the
same noun, in the same way, in a large number of fact tables.

That reference is a copy of the dimension row's primary key. In the fact table, this key is called a
foreign key because it uniquely identifies a row in a foreign table, specifically the dimension table
of interest.
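
To make the primary key / foreign key relationship concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than the Oracle database used in this handout, so that it runs anywhere. The table names, column names, and sample rows (patient_dim, provider_dim, patient_key, 'Patient A', and so on) are hypothetical and chosen only for illustration; they are not the course's actual tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
-- Dimension tables: one row per noun, identified by a primary key
CREATE TABLE patient_dim (
    patient_key  INTEGER PRIMARY KEY,
    patient_name TEXT NOT NULL
);
CREATE TABLE provider_dim (
    provider_key  INTEGER PRIMARY KEY,
    provider_name TEXT NOT NULL
);
-- Fact table: each noun appears only as a foreign key back to its dimension
CREATE TABLE encounters_fact (
    encounter_key INTEGER PRIMARY KEY,
    patient_key   INTEGER NOT NULL REFERENCES patient_dim(patient_key),
    provider_key  INTEGER NOT NULL REFERENCES provider_dim(provider_key)
);
INSERT INTO patient_dim VALUES (1, 'Patient A'), (2, 'Patient B');
INSERT INTO provider_dim VALUES (10, 'Clinician X');
INSERT INTO encounters_fact VALUES (100, 1, 10), (101, 1, 10), (102, 2, 10);
""")

# The patient's name is stored once in the dimension and recovered through the key
rows = conn.execute("""
    SELECT p.patient_name, COUNT(*)
    FROM encounters_fact e
    JOIN patient_dim p ON e.patient_key = p.patient_key
    GROUP BY p.patient_name
    ORDER BY p.patient_name
""").fetchall()

# A fact row that points at a patient missing from the dimension is rejected
try:
    conn.execute("INSERT INTO encounters_fact VALUES (103, 999, 10)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Patient A has two encounters, but the name is described only once, in patient_dim. The fact rows carry just the key.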

Here is an example of a star schema representing this scenario:


Step 1: Review the SQL Used to Create the Date Dimension Table: DATE_DIM
This SQL creates a table named DATE_DIM which contains one row for every day from January 3rd,
2000 through December 31st, 2015 (5,842 rows). January 3rd was selected because it was the first
Monday of the year 2000. This table should have an entry for every day that you wish to report on.
Notice how each row describes the same day in a number of ways (e.g., day of week, day of month,
day of year). This is part of the power of describing all dates in a single table: all of this data is
created once and can be reused in every query. The downside is that it takes time to create and
maintain these tables.

CREATE TABLE Date_Dim(
    Date_Key        number(18,0) NOT NULL,
    Date_Value      Date NOT NULL,
    my_Day          Char(10),
    Day_Of_Week     number(18,0),
    Day_Of_Month    number(18,0),
    Day_Of_Year     number(18,0),
    Week_Of_Year    number(18,0),
    my_Month        Char(10),
    Month_Of_Year   number(18,0),
    Quarter_Of_Year number(18,0),
    my_Year         number(18,0)
);

--SQL to populate the table:

INSERT INTO
    Date_Dim
SELECT
    to_number(to_char(CurrDate, 'YYYYMMDD')) AS Date_Key,
    CurrDate AS Date_Value,
    TO_CHAR(CurrDate,'Day') AS my_Day,
    --'D' numbering depends on the session's NLS_TERRITORY setting
    to_number(TO_CHAR(CurrDate,'D')) AS Day_Of_Week,
    to_number(TO_CHAR(CurrDate,'DD')) AS Day_Of_Month,
    to_number(TO_CHAR(CurrDate,'DDD')) AS Day_Of_Year,
    --ISO week of the following day, so each numbered week runs Sunday through Saturday
    to_number(TO_CHAR(CurrDate+1,'IW')) AS Week_Of_Year,
    TO_CHAR(CurrDate,'Month') AS my_Month,
    to_number(TO_CHAR(CurrDate,'MM')) AS Month_Of_Year,
    to_number(TO_CHAR(CurrDate,'Q')) AS Quarter_Of_Year,
    to_number(TO_CHAR(CurrDate,'YYYY')) AS my_Year
FROM
    (
        select
            level n,
            --The day before the reporting period starts (level begins at 1)
            TO_DATE('01-02-2000','MM-DD-YYYY') + NUMTODSINTERVAL(level,'DAY') CurrDate
        from
            dual
        connect by
            --number of days to get to the end of the reporting period
            level <= 5842
    );
commit;

You should review the DATE_DIM that was provided to you.
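
As a cross-check on the row count and date range described above, the same one-row-per-day idea can be sketched outside the database. This is an illustrative Python version, not the course's method. The function name build_date_dim is hypothetical. Day_Of_Week and Week_Of_Year are left out on purpose, because in the Oracle SQL their values depend on session settings and on the week convention chosen.

```python
from datetime import date, timedelta


def build_date_dim(start=date(2000, 1, 3), end=date(2015, 12, 31)):
    """Return one dict per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20000103
            "date_value": current,
            "my_day": current.strftime("%A"),              # day name (locale dependent)
            "day_of_month": current.day,
            "day_of_year": current.timetuple().tm_yday,
            "my_month": current.strftime("%B"),            # month name (locale dependent)
            "month_of_year": current.month,
            "quarter_of_year": (current.month - 1) // 3 + 1,
            "my_year": current.year,
        })
        current += timedelta(days=1)
    return rows


date_dim = build_date_dim()
print(len(date_dim), date_dim[0]["date_key"], date_dim[-1]["date_key"])
```

Every date_key is unique, which is what lets it serve as the primary key of the dimension.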

Now that we have seen the creation of the DATE_DIM table, we can discuss its use. The purpose of
this table is to provide a single interface through which to query by date. For example, think about a
table with 30,000 rows that spans 15 years of data. Over the course of 15 years there are roughly
5,500 unique days (a rough estimate; leap years shift the exact count), yet a date is stored in each
of the 30,000 rows of this table. This means there will be a lot of duplicate dates. Imagine now that
you are asked to find all rows that have a date from the first year of the 15-year span. If you were to
query the table directly, your database would have to iterate through all 30,000 rows and test
whether the date on each row is in your year of interest.

By using the DATE_DIM interface, we can significantly improve our database performance. Refer
back to the SQL that created DATE_DIM as well as your review of the table. Now consider
querying this table to find the dates in your specific year. In this case, your database has to scan
fewer than 6,000 rows to find the applicable dates. (Indexes change the truth of this statement quite
a bit. I am excluding them for simplicity.)

Now the question becomes how we connect DATE_DIM to our data to achieve this performance
benefit. This is where we exercise the FOREIGN KEY-PRIMARY KEY relationship between the
fact table and the dimension table.

Since each fact carries a copy of the dimension's primary key as a foreign key, we can use these keys to
JOIN the two tables (reference: http://www.w3schools.com/sql/sql_join.asp). We will walk through
an example below to illustrate this. But first, let us introduce our fact table.

ENCOUNTERS_FACT

Since this is our fact table, no create script is shown. Rather, this table would be loaded with data
from your institution's EMR. Notice how this table contains foreign keys which act as the connection
back to the dimension tables. It is unlikely that your institution captures the data in this format; rather,
this table will be created by a database expert who will make it available for reporting. There can be a
significant amount of processing required to put data in this format. In this example we have done it
for you.

In a SQL environment you could execute this query:

Select *
from encounters_fact;

In our case, please study the ENCOUNTERS_FACT table that was provided to you.

Take special notice of these fields:

1. Enc_start_datetime – this is a standard date field
2. Date_key – this is a foreign key which uniquely identifies one row in the DATE_DIM table
3. Time_key – this is a foreign key which uniquely identifies one row in the TIME_DIM table

Step 2 – Compare Standard Queries vs Those Written for a Star Schema

Here we illustrate the example proposed above, selecting one year from the 15-year span. We choose
calendar year 2008.

Standard Query for all encounters that take place during calendar year 2008

select
    *
from
    encounters_fact
where
    trunc(enc_start_datetime) >= to_date('01-01-2008','MM-DD-YYYY')
    and trunc(enc_start_datetime) < to_date('01-01-2009','MM-DD-YYYY');

Notice here how the date field in the encounters_fact table is referenced directly. This query
took 0.536 seconds on my machine.

Query using star_schema and DATE_DIM

select
    *
from
    encounters_fact enc
    join date_dim ddim on enc.date_key = ddim.date_key
where
    ddim.my_year = 2008;

Notice here how the ENCOUNTERS_FACT.DATE_KEY foreign key is joined to the
DATE_DIM.DATE_KEY primary key. This is the join which allows us to exercise the efficiency
of the star schema. This query took 0.398 seconds on my machine, which is about 25% faster.
Now imagine how much of an improvement there would be if the encounters table had 300,000
or 300,000,000 records. The star schema query would still only have to search the fewer than 6,000
date_dim records to find the dates of interest, even as ENCOUNTERS_FACT grows to a large scale.
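
The two query styles are meant to return the same encounters. The sketch below checks that on a tiny, made-up data set. It uses Python's sqlite3 module instead of Oracle, so dates are stored as ISO text and compared as strings rather than with to_date and trunc. The five sample encounters and the reduced two-column DATE_DIM are assumptions for illustration only, and the sketch says nothing about timing.

```python
import sqlite3
from datetime import date, datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (
    date_key INTEGER PRIMARY KEY,
    my_year  INTEGER NOT NULL
);
CREATE TABLE encounters_fact (
    encounter_key      INTEGER PRIMARY KEY,
    enc_start_datetime TEXT NOT NULL,
    date_key           INTEGER NOT NULL REFERENCES date_dim(date_key)
);
""")

# One date_dim row per day, 2007-01-01 through 2009-12-31
day = date(2007, 1, 1)
while day <= date(2009, 12, 31):
    conn.execute("INSERT INTO date_dim VALUES (?, ?)",
                 (int(day.strftime("%Y%m%d")), day.year))
    day += timedelta(days=1)

# Hypothetical encounters; date_key is derived from the start timestamp
starts = [
    datetime(2007, 12, 31, 23, 59),
    datetime(2008, 1, 1, 0, 0),
    datetime(2008, 6, 15, 14, 30),
    datetime(2008, 12, 31, 23, 59),
    datetime(2009, 1, 1, 0, 0),
]
for key, start in enumerate(starts, start=1):
    conn.execute("INSERT INTO encounters_fact VALUES (?, ?, ?)",
                 (key, start.isoformat(sep=" "), int(start.strftime("%Y%m%d"))))

# Standard query: filter on the date field stored in the fact table
standard = conn.execute("""
    SELECT encounter_key FROM encounters_fact
    WHERE enc_start_datetime >= '2008-01-01'
      AND enc_start_datetime <  '2009-01-01'
    ORDER BY encounter_key
""").fetchall()

# Star schema query: filter on the dimension, reach the facts through the key
star = conn.execute("""
    SELECT enc.encounter_key
    FROM encounters_fact enc
    JOIN date_dim ddim ON enc.date_key = ddim.date_key
    WHERE ddim.my_year = 2008
    ORDER BY enc.encounter_key
""").fetchall()

print(standard == star, star)
```

Both queries select encounters 2, 3, and 4, including the two that sit exactly on the first and last day of 2008.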

Disclaimer:
Performance will still degrade as ENCOUNTERS_FACT grows large, because your database has to
connect DATE_DIM with ENCOUNTERS_FACT through a join. This join will consume more
resources as the table size increases, but most databases are highly optimized for joins and the
expense can be mitigated through indexing.
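
As a rough sketch of the indexing mentioned in the disclaimer, the foreign key column of the fact table can be indexed so that the join does not need to read every fact row. The example again uses Python's sqlite3 module rather than Oracle, and the index name encounters_fact_date_key_ix is hypothetical. Whether a given database actually uses the index depends on its optimizer and on table statistics, so the query plan is printed for inspection rather than interpreted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (
    date_key INTEGER PRIMARY KEY,
    my_year  INTEGER NOT NULL
);
CREATE TABLE encounters_fact (
    encounter_key INTEGER PRIMARY KEY,
    date_key      INTEGER NOT NULL REFERENCES date_dim(date_key)
);
-- Index the foreign key column so the join can look up matching fact rows
CREATE INDEX encounters_fact_date_key_ix ON encounters_fact (date_key);
""")

# List the indexes SQLite now holds for the fact table
index_names = [row[1] for row in conn.execute("PRAGMA index_list('encounters_fact')")]

# Ask SQLite how it would run the star schema query; the wording of the plan
# varies between SQLite versions, so it is printed rather than interpreted
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT enc.encounter_key
    FROM encounters_fact enc
    JOIN date_dim ddim ON enc.date_key = ddim.date_key
    WHERE ddim.my_year = 2008
""").fetchall()
for step in plan:
    print(step)
```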
