
Star Query Versus Star Transformation Query: Which to Choose?
Michael Janesch
Innovative Consulting

Abstract
Star schema design is the backbone of the data warehouse architecture. The method to query the star schema has
evolved over the past releases of Oracle with enhancements to the Cost-Based Optimizer. Oracle7 introduced the
“Star Query” execution path and Oracle8 gave us the option of using the “Star Transformation Query” execution
path. This paper will track the evolution of querying the star schema and describe how both work so you will be
able to identify and implement which option is best for your data warehousing queries.

Stating the Facts


Before starting, let’s identify the components of star schema design.

Star Schema:
The Star Schema is the basic design of the data warehouse. It is made up of a Fact table and several Dimension
tables. Multiple star schemas can exist in a data warehouse where fact tables may share the same dimensions.

Fact Table:
The Fact Table contains the quantitative information that users ultimately want to analyze. Its key components are
foreign keys to the dimension tables. Its non-key components are the numeric facts that will be reported,
summarized, and analyzed. Because they hold mostly numbers, fact tables are narrow in record width but large in
row count, typically tens or hundreds of millions of rows. Sales and shipments are typical examples of fact tables.

Dimension Table:
Dimension tables contain the qualitative information that defines how users will analyze the fact information. The
key component is the single column that uniquely identifies each record. The non-key components contain the
descriptive information about the record. The important thing to remember about dimension tables is that they are
denormalized, which improves query performance in a star schema because each dimension is reached with a single
join. Dimension tables are wide in record width due to their descriptive character data and small in number of rows
compared to the facts to which they relate. One dimension table that is always found in a data warehouse is Time.

Snowflake Schema:
A Star becomes a Snowflake when the dimension tables are no longer completely denormalized and have foreign
keys to other dimensions. This design may be necessary to save physical space or to fulfill requirements, but can
lead to performance issues and tuning challenges.

Paper #311 / Page 1


Bitmap Index:
Bitmap indexes were introduced in Oracle7 and have become very useful in data warehouses. A B-tree index stores
one entry (column value plus ROWID) for every indexed row. A bitmap index instead stores each distinct column
value once, together with a bitmap containing one bit (0 or 1) per row that records whether the row holds that value.
This drastically reduces the space required for the index, which allows more entries per block and leads to
performance increases. Bitmap indexes are more efficient for low-cardinality columns (columns with few distinct
values relative to the total number of rows), and B-tree indexes are more efficient for high-cardinality columns.
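To make the "one bitmap per distinct value" idea concrete, here is a minimal Python sketch. It is not Oracle's on-disk format, which also compresses the bitmaps and records ROWID ranges. Each distinct value maps to one bitmap, held here as a Python integer whose bit n is set when row n holds that value. The names `build_bitmap_index` and `rows_matching` and the sample column are hypothetical.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int bitmap; bit n is set if row n has that value."""
    index = {}
    for row_number, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_number)
    return index


def rows_matching(bitmap):
    """Turn a bitmap back into row numbers (the idea behind BITMAP CONVERSION TO ROWIDS)."""
    rows = []
    row_number = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_number)
        bitmap >>= 1
        row_number += 1
    return rows


# A low-cardinality column: 8 rows but only 3 distinct values.
region = ["east", "west", "east", "north", "west", "east", "north", "east"]
region_index = build_bitmap_index(region)

# Three index entries (one per distinct value) instead of eight (one per row).
print(len(region_index))
print(rows_matching(region_index["east"]))

# Predicates combine with cheap bitwise operations, e.g. region IN ('east', 'north').
print(rows_matching(region_index["east"] | region_index["north"]))
```

Combining predicates with bitwise OR and AND is what makes bitmap indexes attractive for the multi-filter queries typical of a star schema.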

Thirty-Thousand Foot View of Star Query History


The origin of Fact and Dimension table design goes back to the early 1970s and is credited to A.C. Nielsen, who
provided customer sales information to grocery and drug stores. Bill Inmon and Ralph Kimball are two of the
well-known names in the industry today who have made major contributions to data warehousing over the past
two decades.
While database designers could create a perfect star schema over twenty years ago, the database engines could not
retrieve the information with the same efficiency they can today.
In Oracle Version 6, the Rule-Based Optimizer (RBO) joined a star schema through a series of NESTED LOOPS: it
started by joining the fact table to one dimension, then joined that result to each remaining dimension through
subsequent NESTED LOOPS. When run against non-analyzed tables in Oracle8, the RBO still uses this same
execution method.
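A minimal Python sketch of that join order follows. It assumes in-memory dicts stand in for the tables and that each dimension is probed through its primary key (a dict lookup here). It illustrates the shape of the plan, not the RBO's actual code, and the function name `rbo_style_join` and the sample rows are hypothetical. Every fact row must be read before any dimension filter can reject it.

```python
def rbo_style_join(fact_rows, dimensions, filters):
    """Nested loops driven by the fact table, one dimension at a time.

    fact_rows:  list of dicts holding foreign keys such as "dim1_key"
    dimensions: {"dim1": {dim1_key: dim1_attr, ...}, ...}
    filters:    {"dim1": {allowed dim1_attr values}, ...}
    """
    results = []
    fact_rows_read = 0
    for fact in fact_rows:                       # outer loop: the fact table
        fact_rows_read += 1
        row = dict(fact)
        for name, dim in dimensions.items():     # inner loops: one per dimension
            attr = dim.get(fact[name + "_key"])  # probe the dimension's primary key
            if attr not in filters[name]:
                break                            # rejected only after the fact row was read
            row[name + "_attr"] = attr
        else:                                    # no break: every dimension matched
            results.append(row)
    return results, fact_rows_read


dimensions = {"dim1": {1: "a", 2: "b", 3: "c"}, "dim2": {1: "c", 2: "d", 3: "z"}}
filters = {"dim1": {"a", "b"}, "dim2": {"c", "d"}}
fact_rows = [
    {"dim1_key": 1, "dim2_key": 1, "fact1": 10},
    {"dim1_key": 3, "dim2_key": 1, "fact1": 20},
    {"dim1_key": 2, "dim2_key": 3, "fact1": 30},
    {"dim1_key": 2, "dim2_key": 2, "fact1": 40},
]
matches, rows_read = rbo_style_join(fact_rows, dimensions, filters)
print([m["fact1"] for m in matches], rows_read)
```

With millions of fact rows, reading the whole fact table first is what makes this approach expensive.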
Over the releases of Oracle7, the Cost-Based Optimizer (CBO) became much better at executing queries against
star schemas. The underlying execution path of Oracle7’s Star Query was to build a Cartesian product across the
dimension tables and then join that virtual table to the fact table using a composite index on the fact table.
Earlier releases of Oracle7 had difficulty identifying star queries without explicit hints or when there were too
many dimensions. Release 7.2 was able to explicitly identify a star schema, but sometimes needed help because,
with large data volumes, the Star Query execution was not always the best choice. Release 7.3 improved star query
identification, expanded the limits, and optimized Cartesian joins.
Oracle8 introduced another option in the CBO for querying the star schema. The Star Transformation Query
execution path uses single-column bitmap indexes and a query rewrite (transformation). As opposed to the
Oracle7-based path, which reads the fact table last, the Oracle8 transformation path reads the fact table first using
the bitmap indexes and then joins the dimension tables.

Non-Traditional Example
These methods are best demonstrated through examples of tables, queries, and execution paths. Instead of the
traditional SALES, STORE, PRODUCT, and TIME star schema, these examples use a simple FACT table and five
dimension tables (DIM1 through DIM5). In the “real world”, these tables would have more columns and appropriate
storage clauses.
CREATE TABLE dim1( dim1_key NUMBER, dim1_attr VARCHAR2(100), CONSTRAINT dim1_pk PRIMARY KEY(dim1_key));

CREATE TABLE dim2( dim2_key NUMBER, dim2_attr VARCHAR2(100), CONSTRAINT dim2_pk PRIMARY KEY(dim2_key));

CREATE TABLE dim3( dim3_key NUMBER, dim3_attr VARCHAR2(100), CONSTRAINT dim3_pk PRIMARY KEY(dim3_key));

CREATE TABLE dim4( dim4_key NUMBER, dim4_attr VARCHAR2(100), CONSTRAINT dim4_pk PRIMARY KEY(dim4_key));



CREATE TABLE dim5( dim5_key NUMBER, dim5_attr VARCHAR2(100), CONSTRAINT dim5_pk PRIMARY KEY(dim5_key));

CREATE TABLE fact( dim1_key NUMBER, dim2_key NUMBER, dim3_key NUMBER, dim4_key NUMBER, dim5_key NUMBER,
    fact1 NUMBER, fact2 NUMBER,
    CONSTRAINT fact_pk PRIMARY KEY(dim1_key, dim2_key, dim3_key, dim4_key, dim5_key),
    CONSTRAINT fact_dim1_fk FOREIGN KEY(dim1_key) REFERENCES dim1(dim1_key),
    CONSTRAINT fact_dim2_fk FOREIGN KEY(dim2_key) REFERENCES dim2(dim2_key),
    CONSTRAINT fact_dim3_fk FOREIGN KEY(dim3_key) REFERENCES dim3(dim3_key),
    CONSTRAINT fact_dim4_fk FOREIGN KEY(dim4_key) REFERENCES dim4(dim4_key),
    CONSTRAINT fact_dim5_fk FOREIGN KEY(dim5_key) REFERENCES dim5(dim5_key));

The tables are populated with variations of the alphabet (e.g., INSERT INTO dim1 VALUES(1,'a'); INSERT
INTO dim1 VALUES(2,'b'); … ) and have the following row counts:
DIM1 (26), DIM2 (30,000), DIM3 (100,000), DIM4 (26), DIM5 (26), and FACT (2.7 million).
Notice that DIM2 and DIM3 are the larger dimension tables in these examples.

Star Query Execution Method


The Star Query execution method builds a Cartesian product of the qualifying rows from each dimension table,
based on the WHERE clause, and then joins this result set to the fact table through a composite index read.
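The two phases can be sketched in a few lines of Python, assuming dicts stand in for the tables. The composite B-tree index on FACT is modeled as a dict keyed by the tuple (dim1_key, dim2_key, dim3_key). That is a hypothetical simplification: the real FACT_PK has five columns and is range-scanned on its three leading ones. The function name `star_query` and the sample data are also hypothetical.

```python
from itertools import product


def star_query(dimensions, filters, fact_index):
    """Sketch of the Oracle7 Star Query path.

    dimensions: {"dim1": {key: attr, ...}, ...} listed in composite-index column order
    filters:    {"dim1": {allowed attrs}, ...}
    fact_index: {(dim1_key, dim2_key, dim3_key): fact1}, a stand-in for the
                composite B-tree index on the fact table (hypothetical layout)
    """
    # 1. Apply each dimension's WHERE-clause filter.
    qualifying = [
        [key for key, attr in dimensions[name].items() if attr in filters[name]]
        for name in dimensions
    ]
    # 2. MERGE JOIN (CARTESIAN): every combination of qualifying keys.
    # 3. NESTED LOOPS: one composite-index probe per combination; the fact
    #    table is touched last.
    results, probes = [], 0
    for combo in product(*qualifying):
        probes += 1
        fact1 = fact_index.get(combo)
        if fact1 is not None:
            results.append((combo, fact1))
    return results, probes


dimensions = {
    "dim1": {1: "a", 2: "b", 3: "c"},
    "dim2": {10: "c", 11: "d", 12: "x"},
    "dim3": {100: "e", 101: "f", 102: "y"},
}
filters = {"dim1": {"a", "b"}, "dim2": {"c", "d"}, "dim3": {"e", "f"}}
fact_index = {(1, 10, 100): 5, (2, 11, 101): 7, (3, 10, 100): 9, (1, 12, 100): 11}

rows, probes = star_query(dimensions, filters, fact_index)
print(rows, probes)
```

The number of index probes equals the product of the qualifying row counts (2 × 2 × 2 = 8 here). This is why the method favors a handful of selective dimension filters: the same two-rows-per-dimension filter on 15 dimensions would already mean 2^15 = 32,768 probes.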

How to Implement the Star Query Execution


• Create indexes on the dimension attribute columns (dim#_attr) of large dimension tables that will be used in
the WHERE clause. In our example this avoids unnecessary full table scans of the DIM2 (30,000 rows) and
DIM3 (100,000 rows) tables.
CREATE INDEX dim2_attr_indx ON dim2(dim2_attr);

CREATE INDEX dim3_attr_indx ON dim3(dim3_attr);

• Create a composite B-tree index on the dimension foreign keys in the FACT table. This index does not have to
be the primary key index, and multiple composite indexes may be required depending on the queries. In this
example, the composite unique index was created implicitly by the primary key constraint, so the equivalent
statement is shown commented out. The optimizer can use a composite index only when its leading columns are
referenced in the WHERE clause without functions applied to them.
-- Created implicitly by CONSTRAINT fact_pk; equivalent DDL shown for reference only:
-- CREATE UNIQUE INDEX fact_pk ON fact (dim1_key, dim2_key, dim3_key, dim4_key, dim5_key);

• Analyze the tables and indexes for the Cost-Based Optimizer.


ANALYZE TABLE dim1 COMPUTE STATISTICS;  -- repeat for dim2 through dim5

ANALYZE TABLE fact ESTIMATE STATISTICS SAMPLE 25 PERCENT;

• Use the STAR hint in the query if necessary to force the execution path. The CBO should identify a star,
but sometimes the data distributions or lack of quality statistics could require a hint to help the optimizer.
SELECT /*+ STAR */
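The leading-column rule for composite indexes follows from how a B-tree is ordered. The Python sketch below models the index as a sorted list of (dim1_key, dim2_key, dim3_key, rowid) tuples, a hypothetical simplification of a B-tree's leaf entries. A filter on the leading columns can seek straight to its range with `bisect_left`. A filter on dim2_key alone has no sort order to seek on, so every entry must be examined. The entries and the helper names `range_scan` and `full_index_scan` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical composite index entries: (dim1_key, dim2_key, dim3_key, rowid),
# kept sorted just as a B-tree keeps its leaf entries in key order.
INDEX = sorted([
    (1, 10, 100, "r1"), (1, 11, 101, "r2"), (2, 10, 100, "r3"),
    (2, 11, 100, "r4"), (3, 10, 101, "r5"),
])


def range_scan(index, prefix):
    """Seek to the first entry >= prefix, then read while the leading
    columns still equal the prefix (an INDEX RANGE SCAN)."""
    rowids = []
    for entry in index[bisect_left(index, prefix):]:
        if entry[:len(prefix)] != prefix:
            break                      # past the matching range; stop reading
        rowids.append(entry[-1])
    return rowids


def full_index_scan(index, position, value):
    """Without the leading column there is nothing to seek on,
    so every entry has to be examined."""
    return [entry[-1] for entry in index if entry[position] == value]


print(range_scan(INDEX, (2,)))          # leading column only
print(range_scan(INDEX, (1, 11)))       # first two columns
print(full_index_scan(INDEX, 1, 10))    # dim2_key alone: no usable prefix
```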



Sample Query
The following query is a good candidate for the Star Query execution because it joins a small number of dimension
tables and uses the leading columns of the composite primary key index.
SELECT dim1.dim1_attr, dim2.dim2_attr, dim3.dim3_attr, fact.fact1
FROM fact, dim1, dim2, dim3
WHERE fact.dim1_key = dim1.dim1_key /* joins */
AND fact.dim2_key = dim2.dim2_key
AND fact.dim3_key = dim3.dim3_key
AND dim1.dim1_attr IN ('a','b') /* dimension filters */
AND dim2.dim2_attr IN ('c','d')
AND dim3.dim3_attr IN ('e','f');

Explain Plan
In SQL*Plus, use SET AUTOTRACE ON with your PLAN_TABLE accessible to produce an Explain Plan. This plan
shows whether you are actually executing a Star Query. Reading the plan inside-out and top to bottom, you can see
that the CBO did execute a Star Query plan: it performed Cartesian joins of the DIM tables and accessed the FACT
table last through the composite index.
Starting with the index read on the 100,000-row DIM3 table (A), the plan performs a Cartesian join (C) with the
rows retrieved from the 30,000-row DIM2 table, then a second Cartesian join (D) with a full table scan of the
26-row DIM1 table (B). The result of this second Cartesian join then drives the NESTED LOOPS operation, which
retrieves rows through the composite index on the FACT table (E).
SELECT STATEMENT Optimizer=CHOOSE
  NESTED LOOPS
    MERGE JOIN (CARTESIAN) (D)
      MERGE JOIN (CARTESIAN) (C)
        INLIST ITERATOR
          TABLE ACCESS (BY INDEX ROWID) OF 'DIM2'
            INDEX (RANGE SCAN) OF 'DIM2_ATTR_INDX' (NON-UNIQUE)
        SORT (JOIN)
          INLIST ITERATOR
            TABLE ACCESS (BY INDEX ROWID) OF 'DIM3' (A)
              INDEX (RANGE SCAN) OF 'DIM3_ATTR_INDX' (NON-UNIQUE)
      SORT (JOIN)
        TABLE ACCESS (FULL) OF 'DIM1' (B)
    TABLE ACCESS (BY INDEX ROWID) OF 'FACT' (E)
      INDEX (RANGE SCAN) OF 'FACT_PK' (UNIQUE)



Star Transformation Query Execution Method
Whereas the Star Query execution goes to the Fact table last, the Star Transformation Query goes to the Fact table
first via bitmap indexes and then joins to the Dimension tables. This method does not limit the query to an ordered
composite index and does not default to Cartesian product joins of the dimensions. The bitmap indexes reduce
storage space requirements and provide flexibility for complex star schemas. The method is called a
“transformation” because the CBO actually rewrites the query as a series of subqueries on the joined dimension
tables.
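The bitmap half of the transformation can be sketched in Python under simple assumptions: the fact table is stored column-wise in lists, each single-column bitmap index is a dict from key to an int bitmap, and dimension filters are sets of allowed attribute values. The function names `bitmap_index` and `star_transformation` and the sample rows are hypothetical. The comments map each step to the plan operation it imitates, but this is an illustration, not Oracle's implementation.

```python
def bitmap_index(column):
    """Single-column bitmap index: value -> int bitmap (bit n set = row n)."""
    index = {}
    for rownum, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << rownum)
    return index


def star_transformation(fact_columns, dimensions, filters):
    """Return the fact row numbers that satisfy every dimension filter.

    fact_columns: {"dim2_key": [...], ...}, the fact table stored column-wise
    dimensions:   {"dim2": {key: attr, ...}, ...}
    filters:      {"dim2": {allowed attrs}, ...}
    """
    num_rows = len(next(iter(fact_columns.values())))
    survivors = (1 << num_rows) - 1              # all rows, before any filter
    for name, allowed in filters.items():
        # Subquery: SELECT dim#_key FROM dim# WHERE dim#_attr IN (...)
        keys = [k for k, attr in dimensions[name].items() if attr in allowed]
        # Built on the fly here; in the database the bitmap index already exists.
        index = bitmap_index(fact_columns[name + "_key"])
        merged = 0
        for key in keys:                         # BITMAP KEY ITERATION
            merged |= index.get(key, 0)          # BITMAP MERGE (bitwise OR)
        survivors &= merged                      # BITMAP AND
    # BITMAP CONVERSION (TO ROWIDS)
    return [r for r in range(num_rows) if (survivors >> r) & 1]


fact_columns = {
    "dim1_key": [1, 2, 1, 3, 2, 1],              # never touched: DIM1 is not joined
    "dim2_key": [10, 11, 10, 12, 11, 10],
    "dim3_key": [100, 101, 102, 100, 100, 101],
    "dim5_key": [1, 2, 1, 1, 3, 2],
}
dimensions = {
    "dim2": {10: "c", 11: "d", 12: "x"},
    "dim3": {100: "e", 101: "f", 102: "y"},
    "dim5": {1: "l", 2: "m", 3: "z"},
}
filters = {"dim2": {"c", "d"}, "dim3": {"e", "f"}, "dim5": {"l", "m"}}
print(star_transformation(fact_columns, dimensions, filters))
```

Because each foreign key has its own bitmap index, any subset of dimensions can be combined in any order. No leading column is required, so skipping DIM1 costs nothing.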

How to Implement the Star Transformation Query Execution


• Set the init.ora parameter STAR_TRANSFORMATION_ENABLED = TRUE (it can also be set for a single session with
ALTER SESSION).
• As with the Star Query Execution, create indexes on dimension attribute columns (dim#_attr) for large
dimension tables that will be used in the WHERE clause.
CREATE INDEX dim2_attr_indx ON dim2(dim2_attr);

CREATE INDEX dim3_attr_indx ON dim3(dim3_attr);

• As with the Star Query Execution, analyze the tables and indexes for the Cost-Based Optimizer.
ANALYZE TABLE dim1 COMPUTE STATISTICS;  -- repeat for dim2 through dim5

ANALYZE TABLE fact ESTIMATE STATISTICS SAMPLE 25 PERCENT;

• Create single column bitmap indexes on all of the foreign key columns in the fact table.
CREATE BITMAP INDEX fact_dim1_bm ON fact(dim1_key);

CREATE BITMAP INDEX fact_dim2_bm ON fact(dim2_key);

CREATE BITMAP INDEX fact_dim3_bm ON fact(dim3_key);

CREATE BITMAP INDEX fact_dim4_bm ON fact(dim4_key);

CREATE BITMAP INDEX fact_dim5_bm ON fact(dim5_key);

• Use the STAR_TRANSFORMATION hint in the query if necessary to force the execution path.
SELECT /*+ STAR_TRANSFORMATION */

Sample Query
The following query is a good candidate for the Star Transformation query execution because it queries the FACT
table outside the order of the composite index: it skips the DIM1 join, so the leading column of FACT_PK is not
used. A better example might join the FACT table to 10-15 DIM tables, but the resulting explain plan would not fit
on the pages of this paper.
SELECT dim2.dim2_attr, dim3.dim3_attr, dim5.dim5_attr, fact.fact1
FROM fact, dim2, dim3, dim5
WHERE fact.dim2_key = dim2.dim2_key /* joins */
AND fact.dim3_key = dim3.dim3_key
AND fact.dim5_key = dim5.dim5_key
AND dim2.dim2_attr IN ('c','d') /* dimension filters */
AND dim3.dim3_attr IN ('e','f')
AND dim5.dim5_attr IN ('l','m');



Explain Plan
Reading this plan the same way as the Star Query execution plan, you can see that the CBO did execute a Star
Transformation query plan: it went to the FACT table first, using the bitmap indexes.
The plan contains three BITMAP MERGE operations, one for each FACT bitmap index used (A, B, C). Each is driven
by a BITMAP KEY ITERATION over the qualifying keys of the corresponding DIM table. Notice that, for the larger
DIM2 and DIM3 dimension tables, the optimizer created temporary segments (ORA_TEMP_1_4 and ORA_TEMP_1_3)
that are reused later in the plan. The merged bitmaps are combined with a BITMAP AND and converted to ROWIDs,
and the FACT table is read by ROWID (E) for every row that survives.
The DIM3 temp segment (D) is Cartesian-joined (MERGE JOIN (CARTESIAN)) with a full scan of the 26-row DIM5
table. This result is joined to the FACT result set with a HASH JOIN (F), and that output is in turn hash-joined (H)
to the DIM2 temp segment (G).
SELECT STATEMENT Optimizer=CHOOSE
  TEMP TABLE GENERATION
  TEMP TABLE GENERATION
  TEMP TABLE GENERATION
  TEMP TABLE GENERATION
  HASH JOIN (H)
    HASH JOIN (F)
      MERGE JOIN (CARTESIAN)
        TABLE ACCESS (FULL) OF 'DIM5'
        SORT (JOIN)
          TABLE ACCESS (FULL) OF 'ORA_TEMP_1_3' (dim3) (D)
      TABLE ACCESS (BY INDEX ROWID) OF 'FACT' (E)
        BITMAP CONVERSION (TO ROWIDS)
          BITMAP AND
            BITMAP MERGE
              BITMAP KEY ITERATION
                TABLE ACCESS (FULL) OF 'ORA_TEMP_1_4' (IAS dim2)
                BITMAP INDEX (RANGE SCAN) OF 'FACT_DIM2_BM' (A)
            BITMAP MERGE
              BITMAP KEY ITERATION
                TABLE ACCESS (FULL) OF 'ORA_TEMP_1_3' (IAS dim3)
                BITMAP INDEX (RANGE SCAN) OF 'FACT_DIM3_BM' (B)
            BITMAP MERGE
              BITMAP KEY ITERATION
                TABLE ACCESS (FULL) OF 'DIM5'
                BITMAP INDEX (RANGE SCAN) OF 'FACT_DIM5_BM' (C)
    TABLE ACCESS (FULL) OF 'ORA_TEMP_1_4' (dim2) (G)
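The join-back half of this plan can be imitated in a short Python sketch. It assumes the FACT rows have already been fetched by ROWID (E) and uses dicts as stand-ins for hash tables. The helper names `materialize` and `join_back` and the sample rows are hypothetical. The sketch follows the shape of the plan: the Cartesian product of DIM5 and the DIM3 temp table is hashed on both keys for (F), and each surviving row is then looked up in the DIM2 temp table for (H). It is not Oracle's implementation.

```python
from itertools import product


def materialize(dimension, allowed):
    """Stand-in for TEMP TABLE GENERATION: keep only qualifying key -> attr pairs."""
    return {key: attr for key, attr in dimension.items() if attr in allowed}


def join_back(fact_rows, dim2, dim3, dim5):
    # Filters are those of the sample query.
    temp_dim2 = materialize(dim2, {"c", "d"})    # ORA_TEMP_1_4 analogue (G)
    temp_dim3 = materialize(dim3, {"e", "f"})    # ORA_TEMP_1_3 analogue (D)
    small_dim5 = materialize(dim5, {"l", "m"})   # filtered full scan of DIM5

    # MERGE JOIN (CARTESIAN) of DIM5 with the DIM3 temp table, hashed on
    # (dim5_key, dim3_key) as the build side of HASH JOIN (F).
    build_f = {
        (k5, k3): (a5, a3)
        for (k5, a5), (k3, a3) in product(small_dim5.items(), temp_dim3.items())
    }

    results = []
    for row in fact_rows:                        # probe side: FACT rows read by ROWID (E)
        pair = build_f.get((row["dim5_key"], row["dim3_key"]))   # HASH JOIN (F)
        if pair is None:
            continue
        dim2_attr = temp_dim2.get(row["dim2_key"])               # HASH JOIN (H)
        if dim2_attr is None:
            continue
        # Column order of the SELECT list: dim2_attr, dim3_attr, dim5_attr, fact1
        results.append((dim2_attr, pair[1], pair[0], row["fact1"]))
    return results


fact_rows = [
    {"dim2_key": 10, "dim3_key": 100, "dim5_key": 1, "fact1": 5},
    {"dim2_key": 11, "dim3_key": 101, "dim5_key": 2, "fact1": 7},
    {"dim2_key": 10, "dim3_key": 101, "dim5_key": 2, "fact1": 9},
]
dim2 = {10: "c", 11: "d", 12: "x"}
dim3 = {100: "e", 101: "f", 102: "y"}
dim5 = {1: "l", 2: "m", 3: "z"}
print(join_back(fact_rows, dim2, dim3, dim5))
```

Materializing the DIM2 and DIM3 subquery results once lets the same filtered rows feed both the bitmap key iteration and this final join, which is the point of the temporary segments for the larger dimensions.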



The internal transformation part of this query would look like this to retrieve the FACT data:
SELECT … FROM fact
WHERE fact.dim2_key IN (SELECT dim2.dim2_key FROM dim2 WHERE dim2.dim2_attr IN ('c','d'))
AND fact.dim3_key IN (SELECT dim3.dim3_key FROM dim3 WHERE dim3.dim3_attr IN ('e','f'))
AND fact.dim5_key IN (SELECT dim5.dim5_key FROM dim5 WHERE dim5.dim5_attr IN ('l','m'))
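The rewrite is safe because each dimension key is a primary key: a fact row matches at most one row in each dimension, so turning the joins into IN subqueries (semi-joins) neither adds nor drops fact rows. The Python sketch below checks this on hypothetical sample data, assuming every fact foreign key exists in its dimension (the FOREIGN KEY constraints). The function names `star_join` and `transformed_query` are hypothetical.

```python
def star_join(fact, dims, filters):
    """The query as written: join each dimension on its key, then filter on attributes."""
    return [
        row["fact1"]
        for row in fact
        if all(dims[name][row[name + "_key"]] in allowed for name, allowed in filters.items())
    ]


def transformed_query(fact, dims, filters):
    """The rewrite: fact.dim#_key IN (SELECT dim#_key FROM dim# WHERE dim#_attr IN (...))."""
    key_sets = {
        name: {key for key, attr in dims[name].items() if attr in allowed}
        for name, allowed in filters.items()
    }
    return [
        row["fact1"]
        for row in fact
        if all(row[name + "_key"] in keys for name, keys in key_sets.items())
    ]


dims = {
    "dim2": {10: "c", 11: "d", 12: "x"},
    "dim3": {100: "e", 101: "f", 102: "y"},
    "dim5": {1: "l", 2: "m", 3: "z"},
}
filters = {"dim2": {"c", "d"}, "dim3": {"e", "f"}, "dim5": {"l", "m"}}
fact = [
    {"dim2_key": 10, "dim3_key": 100, "dim5_key": 1, "fact1": 1},
    {"dim2_key": 11, "dim3_key": 101, "dim5_key": 2, "fact1": 2},
    {"dim2_key": 10, "dim3_key": 102, "dim5_key": 1, "fact1": 3},
    {"dim2_key": 12, "dim3_key": 100, "dim5_key": 1, "fact1": 4},
    {"dim2_key": 11, "dim3_key": 100, "dim5_key": 3, "fact1": 5},
    {"dim2_key": 10, "dim3_key": 101, "dim5_key": 2, "fact1": 6},
]
print(star_join(fact, dims, filters), transformed_query(fact, dims, filters))
```

In the rewritten form each subquery yields only a key list, which is exactly what a BITMAP KEY ITERATION over a single-column bitmap index needs.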

The following INSERT statement, extracted from the SGA, populates the DIM3 temporary segment (ORA_TEMP_1_3).
Its explain plan shows that the attribute index was used when building the temporary segments for the larger
dimension tables.
INSERT INTO "SYS"."ORA_TEMP_1_3"

SELECT /*+ SEMIJOIN_DRIVER */ "DIM3"."DIM3_KEY" "C0","DIM3"."DIM3_ATTR" "C1"

FROM "DIM3" "DIM3"

WHERE "DIM3"."DIM3_ATTR"='e' OR "DIM3"."DIM3_ATTR"='f'

Explain Plan
SELECT STATEMENT Optimizer=CHOOSE
  INLIST ITERATOR
    TABLE ACCESS (BY INDEX ROWID) OF 'DIM3'
      INDEX (RANGE SCAN) OF 'DIM3_ATTR_INDX' (NON-UNIQUE)

Which Method to Choose and When to Choose it?


After reviewing the progression of star schema querying and seeing how the Cost-Based Optimizer handles these
requests, let’s look at the data and design issues that would make you pick one over the other:

Star Query is best used when:


• The query patterns are predefined and ordered to take advantage of the composite key indexes. Multiple
composite key indexes may still be required.
• The number of dimensions is small (5 or under).
• The fact table is densely populated.
• The fact foreign key columns have high cardinality relative to the total number of fact rows.
• The data warehouse or mart was built under Oracle7 and is working fine with this method.
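The density and cardinality criteria can be put into numbers. The paper gives no thresholds, so the Python sketch below only computes the quantities. It applies them to the example schema's row counts, treating "density" as the fraction of all possible dimension-key combinations present in FACT. That is one reasonable reading, not a definition from the paper. A foreign key cannot hold more distinct values than its dimension has rows, so the computed cardinality ratios are upper bounds. The helper names are hypothetical.

```python
from math import prod

# Row counts from the example schema.
FACT_ROWS = 2_700_000
DIMENSION_ROWS = {"dim1": 26, "dim2": 30_000, "dim3": 100_000, "dim4": 26, "dim5": 26}


def density(fact_rows, dimension_rows):
    """Fraction of all possible dimension-key combinations that occur in the fact table."""
    return fact_rows / prod(dimension_rows)


def max_cardinality_ratio(dimension_rows, fact_rows):
    """Upper bound on (distinct FK values / fact rows): an FK cannot have
    more distinct values than its dimension has rows."""
    return min(dimension_rows, fact_rows) / fact_rows


def cardinality_ratio(column):
    """Exact ratio for a real column sample: distinct values / rows."""
    return len(set(column)) / len(column)


print(f"density: {density(FACT_ROWS, DIMENSION_ROWS.values()):.2e}")
for name, rows in DIMENSION_ROWS.items():
    print(f"fact.{name}_key cardinality ratio <= {max_cardinality_ratio(rows, FACT_ROWS):.6f}")
```

With the paper's row counts the density is about 5 × 10⁻⁸, and even the DIM3 foreign key has at most one distinct value per 27 fact rows, so the example FACT table is sparse and its foreign keys are low-cardinality.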

Star Transformation Query is best used when:


• The query patterns are undefined so building and maintaining multiple composite keys would be too
difficult, if not impossible.
• The number of joined dimensions is large (over 5).
• Cartesian products would be too costly due to the size of the individual dimension tables.
• The fact table is sparsely populated.



• The fact foreign key columns have low cardinality relative to the total number of fact rows, making them good
candidates for bitmap indexes.
• Space is an issue. A bitmap index stores each distinct key value once, with a compressed bitmap of the rows
that hold it, instead of one entry per row. This greatly reduces the space requirements.
• The design is a snowflake, in which the dimensions are not completely denormalized. The bitmap transformation
handles the snowflaked dimension inside the subquery that joins the dimension to its snowflake table:
SELECT … FROM fact
WHERE fact.dim1_key IN (SELECT dim1.dim1_key
FROM dim1, dim1_snowflake
WHERE dim1.snowflake_key = dim1_snowflake.snowflake_key
AND dim1_snowflake.attr IN ('a','b','c'))

• Complex queries have WHERE clause conditions on fact table columns other than the foreign keys. These
columns need bitmap indexes to benefit from the bitmap transformation.
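The snowflake case changes only what happens inside the subquery, as this Python sketch shows. It assumes dicts for DIM1 (dim1_key to snowflake_key) and DIM1_SNOWFLAKE (snowflake_key to attr) and a list for the fact's dim1_key column. The column names follow the SQL above, while the helper name `snowflake_dim1_keys` and the sample data are hypothetical. The outer filter still receives a plain set of dim1_key values, just as it would for a fully denormalized dimension.

```python
def snowflake_dim1_keys(dim1, dim1_snowflake, allowed):
    """SELECT dim1.dim1_key FROM dim1, dim1_snowflake
       WHERE dim1.snowflake_key = dim1_snowflake.snowflake_key
         AND dim1_snowflake.attr IN (...)"""
    return {
        dim1_key
        for dim1_key, snowflake_key in dim1.items()
        if dim1_snowflake.get(snowflake_key) in allowed
    }


# Hypothetical sample data.
dim1 = {1: 100, 2: 200, 3: 100, 4: 300}          # dim1_key -> snowflake_key
dim1_snowflake = {100: "a", 200: "x", 300: "c"}  # snowflake_key -> attr
fact_dim1_key = [1, 2, 3, 4, 2, 1]               # fact.dim1_key by row number

keys = snowflake_dim1_keys(dim1, dim1_snowflake, {"a", "b", "c"})
# The outer query sees only a key list, exactly as for a plain dimension.
matching_rows = [row for row, key in enumerate(fact_dim1_key) if key in keys]
print(sorted(keys), matching_rows)
```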

Summary
One of my standard interview questions is “How do you implement queries on a star schema?” The response is
usually, “I create the star schema and it automatically creates star queries from that.” My follow-up question is
“How do you implement queries on a star schema?” This paper reviewed the Star Query and Star Transformation
Query execution methods as ways to implement queries on a star schema. Both attack the query differently, and
both can be used effectively when you understand your data distributions and querying requirements. Now you’ll
be better prepared for that question in the interview!


About the Author


Michael Janesch is an Advisory Consultant with Innovative Consulting, a Malvern, Pennsylvania-based consulting
firm specializing in Data Warehousing and Business Intelligence solutions. Michael has specialized in Oracle
since 1993 as a Developer and DBA. He can be reached at (610) 993-8700 or www.Innovative-Consult.com.
