
5-1

Module 5
Introduction to Indexes
Contents:
Module Overview 5-1
Lesson 1: Core Indexing Concepts 5-2
Lesson 2: Data Types and Indexes 5-8
Lesson 3: Heaps, Clustered, and Nonclustered Indexes 5-12
Lesson 4: Single Column and Composite Indexes 5-22
Lab: Implementing Indexes 5-26
Module Review and Takeaways 5-29

Module Overview
An index is a collection of pages associated with a table. Indexes are used to improve the performance of
queries or enforce uniqueness. Before learning to implement indexes, it is helpful to understand how they
work, how effective different data types are when used within indexes, and how indexes can be
constructed from multiple columns. This module discusses table structures that do not have indexes, and
the different index types available in Microsoft® SQL Server®.

Objectives
After completing this module, you will be able to:
- Explain core indexing concepts.
- Evaluate which index to use for different data types.
- Describe the difference between single and composite column indexes.


5-2 Introduction to Indexes

Lesson 1
Core Indexing Concepts
Although it is possible for Microsoft SQL Server data management software to read all of the pages in a
table when it is calculating the results of a query, doing so is often highly inefficient. Instead, you can use
indexes to point to the location of required data and to minimize the need for scanning entire tables. In
this lesson, you will learn how indexes are structured and learn the principles associated with the design of
indexes. Finally, you will see how indexes can become fragmented over time, and the steps required to
resolve this fragmentation.

Lesson Objectives
After completing this lesson, you will be able to:

- Describe how SQL Server accesses data.
- Describe the need for indexes.
- Explain the concept of b-tree index structures.
- Explain the concepts of selectivity, density, and index depth.
- Understand why index fragmentation occurs.
- Deduce the level of fragmentation on an index.

How SQL Server Accesses Data


SQL Server can access data in a table by reading
all of the pages in the table, which is known as a
table scan, or by using index pages to locate the
required rows. Each page in an index is 8 kilobytes
(KB) in size.
Whenever SQL Server needs to access data in a
table, it has to choose between doing a table scan
or seeking and reading one or more indexes. SQL
Server will choose the option with the least
amount of effort to locate the required rows.

You can always resolve queries by reading the underlying table data. Indexes are not required, but accessing data by reading large numbers of pages is considerably slower than methods that use appropriate indexes.
Sometimes SQL Server creates its own temporary indexes to improve query performance. However, doing so is up to the optimizer and beyond the control of the database administrator or programmer; these temporary indexes will not be discussed in this module. Temporary indexes are only used to improve a query plan if no suitable indexing already exists. A table without a clustered index is referred to as a heap.

In this module, you will consider standard indexes that are created on tables. SQL Server also includes
other types of index:

- Integrated full-text search is a special type of index that provides flexible searching of text.
- Spatial indexes are used with the GEOMETRY and GEOGRAPHY data types.
Developing SQL Databases 5-3

- Primary and secondary XML indexes assist when querying XML data.
- Columnstore indexes are used to speed up aggregate queries against large data sets.

Each of these other index types is discussed in later modules.

The Need for Indexes


Indexes are not described in ANSI Structured
Query Language (SQL) definitions. Indexes are
considered to be an implementation detail for the
vendor. SQL Server uses indexes for improving the
performance of queries and for implementing
certain constraints.

As mentioned in the last topic, SQL Server can always read the entire table to return the required results, but doing so can be inefficient. Indexes can reduce the effort required to locate results, but only if they are well designed.

SQL Server also uses indexes as part of its implementation of PRIMARY KEY and UNIQUE constraints. When you assign a PRIMARY KEY or UNIQUE constraint to a column or set of columns, SQL Server automatically indexes that column or set of columns. It does this to make it possible to quickly check whether a given value is already present.
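For example, both constraints in the following sketch cause SQL Server to build an index automatically. The table and constraint names here are illustrative, not part of the course material; the sys.indexes query at the end is a standard way to confirm which indexes exist.

```sql
-- Illustrative example: each constraint below is enforced by an
-- automatically created index.
CREATE TABLE Library.Publisher (
    PublisherID INT NOT NULL PRIMARY KEY,  -- enforced by a clustered index (by default)
    Name NVARCHAR(100) NOT NULL UNIQUE     -- enforced by a unique nonclustered index
);

-- The automatically created indexes are visible in the catalog views:
SELECT name, type_desc, is_unique
FROM sys.indexes
WHERE object_id = OBJECT_ID('Library.Publisher');
```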

A Useful Analogy
It is useful to consider an analogy that might be easier to relate to. Consider a physical library. Most
libraries store books in a given order, which is basically an alphabetical order within a set of defined
categories.

Note that, even when you store the books in alphabetical order, there are various ways to do it. The order
of the books could be based on the title of the book or the name of the author. Whichever option is
chosen makes one form of search easy and other searches harder. For example, if books were stored in
title order, how would you find the ones that were written by a particular author? An index on a book’s
title and an index on the author would mean a librarian could find books quickly for either type of search.

Index Structures
Tree structures provide rapid search capabilities
for large numbers of entries in a list.

Indexes in database systems are often based on binary tree structures. Binary trees are simple structures where, at each level, a decision is made to navigate left or right. However, this style of tree can quickly become unbalanced and less useful; therefore, SQL Server uses a balanced tree.

Binary Tree Example


Using the topic slide as an example of a binary tree storing the values 1 to 200, consider how to find the value 136. If the data was stored randomly, each record would need to be examined until the desired value was found, which could take a maximum of 200 inspections.

Compare this against using an index. In the binary tree structure, 136 is compared against 100. As it is greater than 100, you inspect the next level down on the right side of the tree. Is 136 less than or greater than 150? It is less than, so you navigate down the left side. Because 136 is greater than 125, the desired value must be in the page containing values 126 to 150. Looking at each value in this page, 136 is found at the 10th record. So, using the binary tree, a total of 13 values need to be compared, against a possible 200 inspections on a random heap.
SQL Server indexes are based on a form of self-balancing tree. Whereas binary trees have, at most, two
children per node, SQL Server indexes can have a larger number of children per node. This helps improve
the efficiency of the indexes and reduces the overall depth of an index—depth being defined as the
number of levels from the top node (called the root node) to the bottom nodes (called leaf nodes).

Selectivity, Density and Index Depth

Selectivity
Additional indexes on a table are most useful
when they are highly selective. Selectivity is the
most important consideration when selecting
which columns should be included in an index.

For example, imagine how you would locate books by a specific author in a physical library by using a card file index. The process would involve the following steps:
- Finding the first entry for the author in the index.
- Locating the book in the bookcases, based on the information in the index entry.
- Returning to the index and finding the next entry for the author.
- Locating the book in the bookcases, based on the information in that next index entry.

You would need to keep repeating the same steps until you had found all of the books by that author.
Now imagine doing the same for a range of authors, such as one-third of all of the authors in the library.
You soon reach a point where it would be quicker to just scan the whole library and ignore the author
index, rather than running backward and forward between the index and the bookcases.

Density
Density is a measure of the lack of uniqueness of the data in a table. It is a value between 0 and 1.0, and
can be calculated for a column with the following formula:

Density = 1 / number of unique values in a column


A dense column is one that has a high number of duplicate values. An index will perform at its best when
it has a low level of density.
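The density formula translates directly into Transact-SQL. The following sketch computes it for the AuthorID column of the Library.Book table used later in this module; any column name could be substituted.

```sql
-- Density = 1 / number of unique values in a column.
-- Lower values (closer to 0) mean the column is more selective.
SELECT 1.0 / COUNT(DISTINCT AuthorID) AS Density
FROM Library.Book;
```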

Index depth
Index depth is a measure of the number of levels from the root node to the leaf nodes. Users often
imagine that SQL Server indexes are quite deep, but the reality is different. The large number of children
that each node in the index can have produces a very flat index structure. Indexes that are only three or
four levels deep are very common.

Index Fragmentation
Index fragmentation is the inefficient use of pages
within an index. Fragmentation can occur over
time, as data in a table is modified.

For operations that read data, indexes perform best when each page of the index is as full as possible. Although indexes may initially start full, or relatively full, modifications to the data in the indexes can cause the need to split index pages.

From our physical library analogy, imagine a library that has full bookcases. What occurs when a new book needs to be added? If the book is added to the end of the library, the process is easy; however, if the book needs to be added in the middle of a full bookcase, there is a need to readjust all the surrounding bookcases.

Internal vs. External Fragmentation


Internal fragmentation is similar to what would occur if an existing bookcase was split into two bookcases.
Each bookcase would then be only half full.

External fragmentation relates to where the new bookcase would be physically located. It would probably
need to be placed at the end of the library, even though it would “logically” need to be in a different
order. This means that, to read the bookcases in order, you could no longer just walk directly from one
bookcase to another. Instead, you would need to follow pointers around the library to track a chain
between bookcases.

Detecting Fragmentation
SQL Server provides a measure of fragmentation in the sys.dm_db_index_physical_stats dynamic
management view. The avg_fragmentation_in_percent column shows the percentage of fragmentation.
SQL Server Management Studio also provides details of index fragmentation in the properties page for
each index.
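A typical query against this dynamic management function looks like the following sketch. It targets the AdventureWorks database and the Production.Product table used in the demonstration that follows; run it in the context of that database so OBJECT_ID resolves correctly.

```sql
-- Report fragmentation for each index on Production.Product.
-- 'LIMITED' is the fastest scan mode; 'SAMPLED' and 'DETAILED' return more detail.
SELECT i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID('AdventureWorks'), OBJECT_ID('Production.Product'),
         NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id = ips.index_id;
```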

Demonstration: Viewing Index Fragmentation


In this demonstration, you will see how to:

- Identify fragmented indexes.
- View the fragmentation of an index in SSMS.

Demonstration Steps
Identify Fragmented Indexes

1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running and then log
on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. Navigate to D:\Demofiles\Mod05, right-click Setup.cmd, and then click Run as administrator.

3. In the User Account Control dialog box, click Yes.

4. On the taskbar, click Microsoft SQL Server Management Studio.

5. In the Connect to Server dialog box, in the Server name box, type MIA-SQL, and then click
Connect.

6. On the File menu, point to Open, click Project/Solution.

7. In the Open Project dialog box, navigate to D:\Demofiles\Mod05, click Demo05.ssmssln, and then
click Open.
8. In Solution Explorer, expand Queries, and then double-click Demonstration 1.sql.

9. Select the code under the Step 1: Open a query window to the AdventureWorks database
comment, and then click Execute.
10. Select the code under the Step 2: Query the index physical stats DMV comment, and then click
Execute.

11. Note the avg_fragmentation_in_percent returned.

12. Select the code under the Step 4: Note that there are choices on the level of detail returned
comment, and then click Execute.

13. Select the code under the Step 5: The final choice is DETAILED comment, and then click Execute.
View the Fragmentation of an Index in SSMS

1. In Object Explorer, expand Databases, expand AdventureWorks, expand Tables, expand Production.Product, and then expand Indexes.

2. Right-click AK_Product_Name (Unique, Non-Clustered), and then click Properties.

3. In the Index Properties - AK_Product_Name dialog box, in the Select a page pane, click
Fragmentation.

4. Note the Total fragmentation is 75%, and that this matches the results of the query executed in step 11 of the previous task.

5. In the Index Properties - AK_Product_Name dialog box, click Cancel.

6. Keep SQL Server Management Studio open for the next demonstration.

Categorize Activity
Categorize each item into the appropriate property of an index. Indicate your answer by writing the
category number to the right of each item.

Items

1. A factor of the number of rows returned, compared to the total number of rows in the index.
2. How unique the data is, compared to the other data in the index.
3. The number of unique levels between the root node and leaf nodes in the index.
4. The most important consideration when designing an index.

Category 1: Selectivity | Category 2: Density | Category 3: Depth


Lesson 2
Data Types and Indexes
Not all data types work equally well when included in an index. The size of the data and the selectivity of
the search are the most important considerations for performance, but you should also consider usability.
In this lesson, you will gain a better understanding of the impacts of the different data types on index
performance.

Lesson Objectives
After completing this lesson, you will be able to:

- Discuss the impact of numeric data on indexes.
- Discuss the impact of character data on indexes.
- Discuss the impact of date-related data on indexes.
- Discuss the impact of globally-unique identifier (GUID) data on indexes.
- Discuss the impact of BIT data on indexes.
- Describe the benefits of using computed columns with indexes.

Numeric Index Data


When numeric values are used as components in
indexes, a large number of entries can fit in a small
number of index pages. This makes reading
indexes based on numeric values very fast.
The type of numeric data will have an impact on
the indexes’ overall size and performance.

Data Type                   Storage Space
tinyint                     1 byte
smallint                    2 bytes
int                         4 bytes
bigint                      8 bytes
decimal(p,s), numeric(p,s)  5 to 17 bytes
smallmoney                  4 bytes
money                       8 bytes
real                        4 bytes
float(n)                    4 bytes or 8 bytes



As each page in an index is 8 KB in size, an index on a single int column could hold a maximum of 2,048 values in a single page (8,192 bytes / 4 bytes, ignoring page and row overhead).
The disadvantage of using smaller numerical data types is that, inherently, the column will be more dense.
As the range of numbers reduces, the number of duplicates increases.

Character Index Data


Character-based indexes are typically less efficient
than numeric indexes, but character data is often
used to search for a record—so, in those
circumstances, an index can be very beneficial.

Character data values tend to be larger than numeric values. For example, a character column might hold a customer's name or address details. This means that far fewer entries can exist in a given number of index pages, which makes character-based indexes slower to seek. Character-based indexes also tend to cause fragmentation problems because new values are seldom at the end of an index.

Data Type  Maximum Storage Space
char       8,000 bytes
varchar    8,000 bytes
text       2,147,483,647 bytes

The preceding table shows the maximum size for each of these columns. As character columns like
varchar will only be as big as the largest data stored in them, these sizes could be considerably smaller.

Date-Related Index Data


Date-related data types are only slightly less
efficient than integer data types. Date-related data
types are relatively small, so can be compared and
sorted quickly.

Data Type       Storage Space
date            3 bytes
datetime        8 bytes
datetimeoffset  10 bytes
smalldatetime   4 bytes
time            5 bytes
timestamp       8 bytes

Like the smaller numerical data types, a date-related column will inherently be more dense: as the range of dates reduces, the number of duplicates increases.

GUID Index Data


Globally unique identifier (GUID) values are reasonably efficient within indexes. There is a common misconception that they are large, but they are only 16 bytes long and can be compared in a binary fashion. This means that they pack quite tightly into indexes and can be compared and sorted quickly.

Data Type Storage Space

uniqueidentifier 16 bytes

GUIDs are typically used as keys in a table, so each value is unique; this makes the data type highly selective, with very low density.

The downside to this uniqueness is that it will take longer to perform updates and deletes to records in the middle of the table. This is because the index may need to be updated and reordered.
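One common way to reduce mid-index churn with GUID keys is to generate sequential rather than random values. The sketch below is illustrative (the table and names are not from the course): NEWID() produces random GUIDs that land at arbitrary points in the index, whereas NEWSEQUENTIALID(), which is valid only as a column default, produces ever-increasing values on a given server, so new rows go to the logical end of the index.

```sql
-- Hypothetical table using sequential GUIDs to keep inserts at the
-- end of the index rather than scattered through the middle.
CREATE TABLE Library.Borrower (
    BorrowerID UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Borrower_BorrowerID DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_Borrower PRIMARY KEY,
    FullName NVARCHAR(100) NOT NULL
);
```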

BIT Index Data


There is a very common misconception that bit
columns are not useful in indexes. This stems from
the fact that there are only two values, which
means a bit column is very dense.

Remember, though, that the selectivity of queries is the most important factor in the performance of an index. For example, consider a transaction table that contains 100 million rows, where one of the columns, IsFinalized, indicates whether a transaction has been completed. There might only be 500 transactions that are not completed. An index that uses the IsFinalized column would be very useful for finding the nonfinalized transactions, because for that query it would be extremely selective.

Data Type  Storage Space
bit        1 bit (SQL Server packs up to 8 bit columns into a single byte)

As the data type is extremely small, many more values can be stored in a page.
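One way to exploit the selectivity described in the IsFinalized scenario is a filtered index. The sketch below uses illustrative table and column names: the WHERE clause restricts the index to only the few hundred nonfinalized rows, so the index stays tiny even though the table holds 100 million rows.

```sql
-- Filtered index covering only the nonfinalized transactions.
-- dbo.Transactions, TransactionID, and IsFinalized are hypothetical names.
CREATE NONCLUSTERED INDEX IX_Transactions_NotFinalized
ON dbo.Transactions (TransactionID)
WHERE IsFinalized = 0;
```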

Indexing Computed Columns


A computed column is a column in a table that is
derived from the values of other columns in the
same table. For example, in a table that tracks
product sales, you might create a computed
column that multiplies the unit price of a product
by the quantity of that product sold—to calculate
a revenue value for each order. Applications that
query the database could then obtain the revenue
values without having to specify the calculation
themselves.

When you create a computed column, SQL Server does not store the computed values; it only calculates them when the column is included in a query. Building an index on a computed column improves performance because the index includes the computed values, so SQL Server does not need to calculate them when the query is executed. Furthermore, the values in the index automatically update when the values in the base columns change, so the index remains up to date.

When you are deciding whether to index computed columns, you should consider the following:

- When the data in the base columns that the computed column references changes, the index is correspondingly updated. If the data changes frequently, these index updates can impair performance.
- When you rebuild an index on a computed column, SQL Server recalculates the values in the column. The amount of time that this takes will depend on the number of rows and the complexity of the calculation, but if you rebuild indexes often, you should consider the impact that this can have.
- You can only build indexes on computed columns that are deterministic.
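The product-sales example above can be sketched as follows. The table and column names are illustrative; the key point is that UnitPrice * Quantity is deterministic, so the computed Revenue column can be indexed.

```sql
-- Hypothetical order-line table with an indexed computed column.
CREATE TABLE Sales.OrderLine (
    OrderLineID INT IDENTITY(1,1) PRIMARY KEY,
    UnitPrice MONEY NOT NULL,
    Quantity INT NOT NULL,
    Revenue AS (UnitPrice * Quantity)  -- deterministic, so it can be indexed
);

-- Queries filtering or sorting on Revenue can now seek this index
-- instead of recalculating the expression for every row.
CREATE NONCLUSTERED INDEX IX_OrderLine_Revenue
ON Sales.OrderLine (Revenue);
```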

Sequencing Activity
Put the following data types in order of the smallest to the maximum possible size.

Steps

BIT

date

bigint

datetimeoffset

uniqueidentifier

char

text

Lesson 3
Heaps, Clustered, and Nonclustered Indexes
Tables in SQL Server can be structured in two ways. Rows can be added in any order, or tables can be
structured with rows added in a specific order. In this lesson, you will investigate both options, and gain
an understanding of how each option affects common data operations.

Lesson Objectives
After completing this lesson, you will be able to:

- Describe the attributes of a table created without any order.
- Complete operations on a heap.
- Detail the characteristics of a clustered index, and its benefits over a heap.
- Complete operations on a clustered index.
- Describe how a primary key is different from a clustering key.
- Explain the reasons for using nonclustered indexes, and how they can be used in conjunction with heaps and clustered indexes.
- Complete operations on a nonclustered index.

Heaps
The simplest table structure that is available in SQL
Server is a heap. A heap is a table that has no
enforced order for either the pages within the
table, or for the data rows within each page. Data
rows are added to the first available location
within the table's pages that have sufficient space.
If no space is available, additional pages are added
to the table and the rows are placed in those
pages.

Although no index structure exists for a heap, SQL Server tracks the available pages by using an entry in an internal structure called an Index Allocation Map (IAM).

Physical Library Analogy


In the physical library analogy, a heap would be represented by structuring your library so that every book
is just placed in any available space in a bookcase. Without any other assistance, finding a book would
involve scanning one bookcase after another—a bookcase being the equivalent of a page in SQL Server.

Creation
To create a heap in SQL Server, all that is required is the creation of a table.

Creating a heap
CREATE TABLE Library.Book (
    BookID INT IDENTITY(1,1) NOT NULL,
    ISBN VARCHAR(14) NOT NULL,
    Title VARCHAR(4000) NOT NULL,
    AuthorID INT NOT NULL,
    PublisherID INT NOT NULL,
    ReleaseDate DATETIME NOT NULL,
    BookData XML NOT NULL
);

Physical Library Analogy


Now imagine that three additional indexes were created in the library, to make it easier to find books by
author, International Standard Book Number (ISBN), and release date.

There was no order to the books on the bookcases, so when an entry was found in the ISBN index, the
entry would refer to the physical location of the book. The entry would include an address like “Bookcase
12, Shelf 5, Book 3.” That is, there would need to be a specific address for a book. An update to the book
that required moving it to a different location would be problematic. One option for resolving this would
be to locate all index entries for the book and update the new physical location.
An alternate option would be to leave a note in the location where the book used to be, pointing to
where the book has been moved. In SQL Server, this is called a forwarding pointer—it means rows can be
updated and moved without needing to update other indexes that point to them.

A further challenge arises if the book needs to be moved again. There are two ways in which this could be
handled—another note could be left pointing to the new location or the original note could be modified
to point to the new location. Either way, the original indexes would not need to be updated. SQL Server
deals with this by updating the original forwarding pointer. This way, performance does not continue to
degrade by having to follow a chain of forwarding pointers.

Remove Forwarding Pointers


When other indexes point to rows in a heap, data modification operations cause forwarding pointers to
be inserted into the heap. Over time, this can cause performance issues.

Forwarding pointers were a common performance problem with tables in SQL Server that were structured as heaps. They can be resolved via the following command:

Resolve forwarding pointer issues in a heap

ALTER TABLE Library.Book REBUILD;

You can also use this command to change the compression settings for a table. Page and row
compression are advanced topics that are beyond the scope of this course.
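Before rebuilding, you can check how many forwarding pointers a heap has accumulated. The sketch below uses the Library.Book table from this lesson; the forwarded_record_count column is only populated when the DETAILED (or SAMPLED) scan mode is requested, and index_id 0 identifies the heap itself.

```sql
-- Count forwarding pointers in the Library.Book heap.
SELECT forwarded_record_count, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('Library.Book'), 0, NULL, 'DETAILED');
```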

Operations on a Heap
The most common operations that are performed
on tables are INSERT, UPDATE, DELETE, and
SELECT. It is important to understand how each of
these operations is affected by structuring a table
as a heap.

Physical Library Analogy


In the library analogy, an INSERT operation would
be executed by locating any gap that was large
enough to hold the book and placing it there. If
no space large enough is available, a new
bookcase would be allocated and the book placed
there. This would continue unless there was a limit
on the number of bookcases that could fit in the library.

A DELETE operation could be imagined as scanning the bookcases until the book is found, removing the
book, and throwing it away. More precisely, it would be like placing a tag on the book, to say that it
should be thrown out the next time the library is cleaned up or space on the bookcase is needed.

An UPDATE operation would be represented by replacing a book with a (potentially) different copy of the
same book. If the replacement book was the same (or smaller) size as the original book, it could be placed
directly back in the same location as the original. However, if the replacement book was larger, the
original book would be removed and the replacement moved to another location. The new location for
the book could be in the same bookcase or in another bookcase.
There is a common misconception that including additional indexes always reduces the performance of
data modification operations. However, it is clear that for the DELETE and UPDATE operations described
above, having another way to find these rows might well be useful. In Module 6, you will see how to
achieve this.

Clustered Indexes
Rather than storing rows of data as a heap, you can design tables that have an internal logical ordering. This kind of table is known as a clustered index, or a rowstore.

A table that has a clustered index has a predefined order for rows within a page and for pages within the table. The order is based on a key that consists of one or more columns. The key is commonly called a clustering key.

The rows of a table can only be in a single order, so there can only be one clustered index on a table. An IAM entry is used to point to a clustered index.

There is a common misconception that pages in a clustered index are “physically stored in order.”
Although this is possible in rare situations, it is not commonly the case. If it were true, fragmentation of
clustered indexes would not exist. SQL Server tries to align physical and logical order while it creates an
index, but disorder can arise as data is modified.
Index and data pages are linked within a logical hierarchy and also double-linked across all pages at the
same level of the hierarchy—to assist when scanning across an index.

Creation
You can create clustered indexes, either directly by using the CREATE CLUSTERED INDEX command, or automatically in situations where a PRIMARY KEY constraint is specified on the table. Note that CREATE INDEX without the CLUSTERED keyword creates a nonclustered index by default:

Create a clustered index directly on an existing table

CREATE CLUSTERED INDEX IX_ISBN ON Library.Book (ISBN);

The following Transact-SQL will create a table. The ALTER TABLE statement then adds a constraint, with the side effect of a clustered index being created.

Create an index indirectly

CREATE TABLE Library.LogData (
    LogID INT IDENTITY(1,1),
    LogData XML NOT NULL
);

ALTER TABLE Library.LogData ADD CONSTRAINT PK_LogData PRIMARY KEY (LogID);

Updating
You can rebuild, reorganize, and disable an index. Disabling isn't really applicable to clustered indexes, because disabling one prevents any access to the underlying data in the table. However, disabling a nonclustered index does have its uses; these will be discussed in a future topic.

Transact-SQL can be used to rebuild a single index.

Rebuild a specific index


ALTER INDEX IX_ISBN ON Library.Book REBUILD;

You can also rebuild all the indexes on a specified table.

Rebuild all indexes on a table


ALTER INDEX ALL ON Library.Book REBUILD;

The REORGANIZE statement can be used in the same way, either on a specific index or on a whole table.

Reorganize all indexes on a table


ALTER INDEX ALL ON Library.Book REORGANIZE;
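Whether to rebuild or reorganize is usually decided by measuring fragmentation first. The sketch below applies commonly cited (not mandatory) thresholds to the Library.Book table from this lesson: REORGANIZE for moderate fragmentation, REBUILD above roughly 30 percent.

```sql
-- Choose a maintenance action based on the worst fragmentation on the table.
-- The 5% and 30% thresholds are conventional guidance, not fixed rules.
DECLARE @frag FLOAT =
    (SELECT MAX(avg_fragmentation_in_percent)
     FROM sys.dm_db_index_physical_stats(
              DB_ID(), OBJECT_ID('Library.Book'), NULL, NULL, 'LIMITED'));

IF @frag > 30
    ALTER INDEX ALL ON Library.Book REBUILD;
ELSE IF @frag > 5
    ALTER INDEX ALL ON Library.Book REORGANIZE;
```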

Deleting
If a clustered index is created explicitly, then the following Transact-SQL will delete it:

Delete a clustered index


DROP INDEX IX_ISBN ON Library.Book;

If the clustered index was created as a consequence of defining a constraint, the table must instead be altered to drop that constraint.

Delete a clustered index if created as part of adding a constraint


ALTER TABLE Library.LogData DROP CONSTRAINT PK_LogData;

Physical Library Analogy


In the library analogy, a clustered index is similar to storing all books in a specific order. An example of
this would be to store books in ISBN order. Clearly, the library can only be sorted in one direction.

Operations on a Clustered Index


So far in this module, you have seen how common
operations are performed on tables that are
structured as heaps. It is important to understand
how each of those operations is affected when you
are structuring a table that has a clustered index.

Physical Library Analogy


In a library that is structured in ISBN order, an
INSERT operation requires a new book to be
placed in exactly the correct logical ISBN order. If
there is space somewhere on the bookcase that is
in the required position, the book can be placed
into the correct location and all other books
moved to accommodate the new one. If there is not sufficient space, the bookcase needs to be split. Note
that a new bookcase would be physically placed at the end of the library, but would be logically inserted
into the list of bookcases.
INSERT operations would be straightforward if the books were being added in ISBN order. New books
could always be added to the end of the library and new bookcases added as required. In this case, no
splitting is required.

When an UPDATE operation is performed, if the replacement book is the same size or smaller and the
ISBN has not changed, the book can just be replaced in the same location. If the replacement book is
larger, the ISBN has not changed, and there is spare space within the bookcase, all other books in the
bookcase can slide along to enable the larger book to be replaced in the same spot.

If there was insufficient space in the bookcase to accommodate the larger book, the bookcase would need
to be split. If the ISBN of the replacement book was different from the original book, the original book
would need to be removed and the replacement book treated like the insertion of a new book.

A DELETE operation would involve the book being removed from the bookcase. (Again, more formally, it
would be flagged as free in a free space map, but simply left in place for later removal.)
When a SELECT operation is performed, if the ISBN is known, the required book can be quickly located by
efficiently searching the library. If a range of ISBNs was requested, the books would be located by finding
the first book and continuing to collect books in order, until a book was encountered that was out of
range, or until the end of the library was reached.

Primary Keys and Clustering Keys


As seen in a previous topic, you can create clustered indexes directly by using the CREATE CLUSTERED INDEX command, or as a side effect of creating a PRIMARY KEY constraint on the table.

It is very important to understand the distinction


between a primary key and a clustering key. Many
users confuse the two terms or attempt to use
them interchangeably. A primary key is a
constraint. It is a logical concept that is enforced
by an index, but the index may or may not be a
clustered index. When a PRIMARY KEY constraint
is added to a table, the default action in SQL
Server is to make it a clustered primary key—if no other clustered index already exists on the table.

You can override this action by specifying the word NONCLUSTERED when declaring the PRIMARY KEY
constraint.

Creating a PRIMARY KEY without a clustered index


CREATE TABLE dbo.Author
(Name NVARCHAR(100) NOT NULL PRIMARY KEY NONCLUSTERED,
Publisher INT NOT NULL);

A primary key on a SQL table is used to uniquely identify rows in that table, and it must not contain any
NULL values. In most situations, a primary key is a good candidate for a clustering key. However, a real-world
scenario where the primary key may not be the clustering key is a table that requires a high
volume of inserts. If such a table has a sequential primary key, all inserts occur at the end of the table, in the
last page. SQL Server may need to lock this page while inserting, forcing the inserts to become sequential
instead of parallel.
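
For example, such a table might be declared with a nonclustered primary key and a clustered index on a
different column. The table and column names below are hypothetical, for illustration only.

Creating a high-insert table with a nonclustered primary key

CREATE TABLE dbo.CallLog
(CallID INT IDENTITY NOT NULL PRIMARY KEY NONCLUSTERED,
CallTime DATETIME2 NOT NULL,
Caller NVARCHAR(100) NOT NULL);

/* Clustering on Caller spreads concurrent inserts across the table,
   avoiding a single hot page at the end. */
CREATE CLUSTERED INDEX CIX_CallLog_Caller ON dbo.CallLog (Caller);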

Nonclustered Indexes
You have seen how tables can be structured as
heaps or have clustered indexes. A third option is
that you can create additional indexes on top of
these tables to provide alternative ways to rapidly
locate required data. These additional indexes are
called nonclustered indexes.
A table can have up to 999 nonclustered indexes.
Nonclustered indexes can be defined on a table—
regardless of whether the table uses a clustered
index or a heap—and are used to improve the
performance of important queries.

Whenever you update key columns of the nonclustered index, or update clustering keys on the base table,
the nonclustered indexes need to be updated, too. This affects the data modification performance of the
system. Each additional index that is added to a table increases the work that SQL Server might need to
perform when modifying the data

rows in the table. You must take care to balance the number of indexes that are created against the
overhead that they introduce.

Creation
Similar to clustered indexes, nonclustered indexes are created explicitly on a table. The columns to be
included also need to be specified.

Creating a nonclustered index


CREATE NONCLUSTERED INDEX IX_Book_Publisher
ON Library.Book (PublisherID, ReleaseDate DESC);

There is an option that is unique to nonclustered indexes: an additional INCLUDE clause on declaration
that is used to create covering indexes. These are discussed in further detail in Module 6: Designing
Optimized Index Strategies.

Creating a covering nonclustered index


CREATE NONCLUSTERED INDEX NCIX_Author_Publisher
ON Library.Book (BookID)
INCLUDE (AuthorID, PublisherID, ReleaseDate);

Updating
The Transact-SQL for nonclustered indexes is exactly the same as for clustered indexes. You can rebuild,
reorganize and disable an index.
Disabling an index can be very useful for nonclustered indexes on tables that are going to have large
amounts of data, either inserted or deleted. Before performing these data operations, all nonclustered
indexes can be disabled. After the data has been processed, the indexes can then be enabled by executing
a REBUILD statement. This reduces the performance impacts of having nonclustered indexes on tables.
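
As a sketch, using the IX_Book_Publisher index created earlier, the pattern looks like this. Note that
disabling a clustered index makes the whole table inaccessible, so only nonclustered indexes should be
disabled in this way.

Disabling and rebuilding a nonclustered index

ALTER INDEX IX_Book_Publisher ON Library.Book DISABLE;

/* ... perform the large insert or delete operations here ... */

ALTER INDEX IX_Book_Publisher ON Library.Book REBUILD;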

Deletion
The same Transact-SQL that is used for clustered indexes will delete a nonclustered index.

Delete a nonclustered index


DROP INDEX NCIX_Author_Publisher ON Library.Book;

Physical Analogy
The nonclustered indexes can be thought of as indexes that point back to the bookcases. They provide
alternate ways to look up the information in the library. For example, they might give access by author, by
release date, or by publisher. They can also be composite indexes where you might find an index by
release date, within the entries for each author. Composite indexes will be discussed in the next lesson.

Operations on Nonclustered Indexes


The operations on a nonclustered index depend on
the underlying table structure on which it is
declared. However, there are common
considerations across both structures, the most
important being that every extra nonclustered
index takes more space within the database.
Another downside is that, when data is modified in
the underlying table, each nonclustered index will
need to be kept up to date.

Because of these downsides, when large batches of inserts or deletes are being performed, it is common
practice to disable nonclustered indexes before the data operations, and then rebuild them afterwards.

Physical Analogy
It does not matter how the library is structured—whether the books are stored in ISBN order, or category
and author order, or randomly, a nonclustered index is like an extra card index pointing to the locations
of books in the bookcases. These extra indexes can be on any attribute of a book; for example, the release
date, or whether it has a soft or hard cover. The cards in the index will have a pointer to the book’s
physical location on a shelf.

Each of these extra indexes then needs to be maintained by the librarian. When a librarian inserts a new
book into the library, they must make a note of its location in each card index. The same is true when
removing a book, or moving it to another shelf. Each of these operations requires updates to be made to
every card index.
These extra indexes have an advantage when they are used to search for books. The additional indexes
will improve the performance of finding books released in 2003, for example. Without this extra index, the
librarian would have to check the release date of every single book, to see if it matched the required date.

Demonstration: Working with Clustered and Nonclustered Indexes


In this demonstration, you will see how to:

 Create a clustered index.


 Remove fragmentation on a clustered index.

 Create a covering index.

Demonstration Steps
1. In SQL Server Management Studio, in Solution Explorer, double-click Demonstration 2.sql.

2. Select the code under the Step 1: Open a new query window against the tempdb database
comment, and then click Execute.
3. Select the code under the Step 2: Create a table with a primary key specified comment, and then
click Execute.

4. Select the code under the Step 3: Query sys.indexes to view the structure comment, and then click
Execute.

5. In Object Explorer, expand Databases, expand System Databases, expand tempdb, expand Tables,
expand dbo.PhoneLog, and then expand Indexes.
6. Note that a clustered index was automatically created on the table.

7. Select the code under the Step 4: Insert some data into the table comment, and then click
Execute.
8. Select the code under the Step 5: Check the level of fragmentation via
sys.dm_db_index_physical_stats comment, and then click Execute.

9. Scroll to the right, and note the avg_fragmentation_in_percent and


avg_page_space_used_in_percent returned.

10. Select the code under the Step 7: Modify the data in the table - this will increase data and cause
page fragmentation comment, and then click Execute.

11. Select the code under the Step 8: Check the level of fragmentation via
sys.dm_db_index_physical_stats comment, and then click Execute.

12. Scroll to the right, and note the avg_fragmentation_in_percent and


avg_page_space_used_in_percent returned.

13. Select the code under the Step 10: Rebuild the table and its indexes comment, and then click
Execute.
14. Select the code under the Step 11: Check the level of fragmentation via
sys.dm_db_index_physical_stats comment, and then click Execute.

15. Scroll to the right, and note the avg_fragmentation_in_percent and


avg_page_space_used_in_percent returned.

16. On the Query menu, click Include Actual Execution Plan.


17. Select the code under the Step 13: Run a query showing the execution plan comment, and then
click Execute.

18. Review the execution plan.


19. Select the code under the Step 14: Create a covering index, point out the columns included
comment, and then click Execute.

20. Select the code under the Step 15: Run the query showing the execution plan (CTR+M) – it now
uses the new index comment, and then click Execute.

21. Review the execution plan.

22. Select the code under the Step 16: Drop the table comment, and then click Execute.

23. Keep SQL Server Management Studio open for the next demonstration.

Categorize Activity
Categorize each attribute of an index. Indicate your answer by writing the attribute number to the right of
each index.

Items

1. Data is stored in the table wherever there is space.

2. Data is stored by a key in a specified order.

3. Will be defined on top of a heap or rowstore.

4. Most efficient operation is an insert.

5. Can improve the performance of updates, deletes and selects.

6. Can improve the performance of selects.

7. Best used when scanning for data.

8. Best used when seeking for data.

Category 1: Heap
Category 2: Clustered
Category 3: Nonclustered



Lesson 4
Single Column and Composite Indexes
The indexes discussed so far have been based on data from single columns. Indexes can also be based on
data from multiple columns, and constructed in ascending or descending order. This lesson investigates
these concepts and the effects that they have on index design, along with details of how SQL Server
maintains statistics on the data that is contained within indexes.

Lesson Objectives
After completing this lesson, you will be able to:

 Describe the differences between single column and composite indexes.

 Describe the differences between ascending and descending indexes.

 Explain how SQL Server keeps statistics on indexes.

Single Column vs. Composite Indexes


Indexes can be constructed on multiple columns
rather than on single columns. Multicolumn
indexes are known as composite indexes.

In applications, composite indexes are often more


useful than single column indexes. The advantages
of composite indexes are:

 Higher selectivity.

 The possibility of avoiding the need to sort


the output rows.

In our physical library analogy, consider a query


that required the location of books by a publisher
within a specific release year. Although a publisher index would be useful for finding all of the books that
the publisher released, it would not help to narrow down the search to those books within a specific year.
Separate indexes on the publisher and the release year would not be useful, but an index that contained
both publisher and release year could be very selective.

Similarly, an index by topic would be of limited value. After the correct topic had been located, it would
be necessary to search all of the books on that topic to determine if they were by the specified author.
The best option would be an author index that also included details of each book's topic. In that case, a
scan of the index pages for the author would be all that was required to work out which books needed to
be accessed.
When you are constructing composite indexes, in the absence of any other design criteria, you should
typically index the most selective column first. The order of columns in a composite index is important,
not only for performance, but also for whether the query optimizer will even use the index. For example,
an index on City, State would not be used in queries where State is the only column in the WHERE clause.
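
To illustrate, consider a composite index on (City, State); the table here is hypothetical. A query that filters
on the leading column can seek on the index, but a query that filters only on the second column cannot.

Column order in a composite index

CREATE NONCLUSTERED INDEX IX_Address_City_State
ON dbo.Address (City, State);

-- Can seek on the index, because City is the leading column:
SELECT City, State FROM dbo.Address WHERE City = N'Redmond';

-- Cannot seek on the index, because State is not the leading column:
SELECT City, State FROM dbo.Address WHERE State = N'WA';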

Considerations
The following should also be considered when choosing columns to add to a composite index:

 Is the column selective? Only columns that are selective should be used.

 How volatile is the column? Columns that are frequently updated will likely cause an index to be
rebuilt. Choose columns that have more static data.

 Is the column queried upon? The column should be included provided that it passes the above
considerations.

 The most selective columns should be first in the composite index; columns with inequality predicates
should be towards the end.

 Keep the number of columns to a minimum, as each column added increases the overall size of the
composite index.
A specific type of composite index is a covering index. This kind of index is outside the scope of this
module, but is covered in Module 6: Designing Optimized Index Strategies.

Ascending vs. Descending Indexes


Each component of an index can be created in an
ascending or descending order. By default, if no
order is specified when creating an index, then it
will be in ascending order. For single column
indexes, ascending and descending indexes are
equally useful. For composite indexes, specifying
the order of individual columns within the index
might be useful.
In general, it makes no difference whether a single
column index is ascending or descending. From
our physical library analogy, you could scan either
the bookshelves or the indexes from either end.
The same amount of effort would be required, no matter which end you started from.
Composite indexes can benefit from each component having a different order. Often this is used to avoid
sorts. For example, you might need to output orders by date descending within customer ascending.
From our physical library analogy, imagine that the author index lists each author's books in release date
order. Answering the query would be easier if the index was already structured this way.
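
For example, to support the "orders by date descending within customer ascending" output described
above, an index such as the following could be created; the table and column names are illustrative.

Mixing ascending and descending index columns

CREATE NONCLUSTERED INDEX IX_Orders_Customer_OrderDate
ON Sales.Orders (CustomerID ASC, OrderDate DESC);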

Index Statistics
SQL Server keeps statistics on indexes to assist
when making decisions about how to access the
data in a table.

By default, these statistics are automatically


created and updated on indexes.

Earlier in this module, you saw that SQL Server


needs to make decisions about how to access the
data in a table. For each table that is referenced in
a query, SQL Server might decide to read the data
pages or it may decide to use an index.

It is important to realize that SQL Server must


make this decision before it begins to execute a query. This means that it needs to have information that
will assist it in making this determination. For each index, SQL Server keeps statistics that tell it how the
data is distributed. The query optimizer uses these statistics to estimate the cardinality, or number of rows,
that will be in the query result.

Identifying Out of Date Statistics


As data in a table is updated, deleted or inserted, the associated statistics can become out of date. There
are two ways to explore the accuracy of statistics for any given table:
1. Inspect a query's execution plan and check that the "Actual Number of Rows" and "Estimated
Number of Rows" values are approximately the same.

2. Use a Transact-SQL command—DBCC SHOW_STATISTICS—and check the Updated column.
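
For example, the following statement returns the statistics for an index, including the date on which they
were last updated; the table and index names here match those used in the examples below.

Viewing statistics with DBCC SHOW_STATISTICS

DBCC SHOW_STATISTICS ('Production.Product', PK_Product_ProductID);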

If statistics are determined to be out of date they can be manually updated with one of the following
Transact-SQL commands:

Update statistics
/* Update all statistics in a database */
EXEC sp_updatestats;

/* Update all the statistics on a specific table */


UPDATE STATISTICS Production.Product;

/* Update the statistics on a specific index */


UPDATE STATISTICS Production.Product PK_Product_ProductID;

Physical Library Analogy


When discussing the physical library analogy earlier, it was mentioned that, if you were looking up the
books for an author, it could be useful to use an index that is ordered by author. However, if you were
locating books for a range of authors, there would be a point at which scanning the entire library would
be quicker than running backward and forward between the index and the bookshelves.

The key issue here is that, before executing the query, you need to know how selective (and therefore
useful) the indexes would be. The statistics that SQL Server holds on indexes provide this knowledge.

Demonstration: Viewing Index Statistics


In this demonstration, you will see how to:

 View the statistics of an index via Transact-SQL.

 View the statistics of an index with SSMS.

 View the database settings related to statistics.

Demonstration Steps
Use Transact-SQL to View Statistics

1. In SQL Server Management Studio, in Solution Explorer, double-click Demonstration 3.sql.

2. Select the code under the comment Step 1: Run the Transact-SQL up to the end step 1, and then
click Execute.

3. Walk through the important columns in the results.

4. On the Query menu, click Include Actual Execution Plan.


5. Select the code under the comment Step 2: Check the freshness of the statistics, CTRL-M to
switch on the Execution Plan, and then click Execute.

6. In the Execution Plan pane, scroll right and point at the last Clustered Index Scan. Note that the actual
and estimated number of rows are equal.
Use SQL Server Management Studio to View Statistics

1. In Object Explorer, expand Databases, expand AdventureWorks, expand Tables, expand


HumanResources.Employee, and then expand Statistics.
2. Right-click AK_Employee_LoginID, and then click Properties.

3. In the Statistics Properties - AK_Employee_LoginID dialog box, in the Select a page section, click
Details.
4. Review the details, and then click Cancel.

Inspect the Statistics Settings for a Database

1. In Object Explorer, under Databases, right-click AdventureWorks, and then click Properties.

2. In the Database Properties - AdventureWorks dialog box, in the Select a page section, click
Options.

3. In the Other options list, scroll to the top, under the Automatic heading, note that the Auto Create
Statistics and Auto Update Statistics are set to True.

4. In the Database Properties - AdventureWorks dialog box, click Cancel.

5. Close SSMS without saving any changes.



Lab: Implementing Indexes


Scenario
One of the most important decisions when designing a table is to choose an appropriate table structure.
In this lab, you will choose an appropriate structure for some new tables required for the relationship
management system.

Objectives
After completing this lab, you will be able to:

 Create a table without any indexes.

 Create a table with a clustered index.

 Add a nonclustered key in the form of a covering index.


Estimated Time: 30 minutes

Virtual machine: 20762C-MIA-SQL

User name: ADVENTUREWORKS\Student


Password: Pa55w.rd

Exercise 1: Creating a Heap


Scenario
The design documentation requires you to create tables to store sales related data. You will create these
two tables to support the requirement of the sales department.
The supporting documentation for this exercise is located in D:\Labfiles\Lab05\Starter\Supporting
Documentation.docx.

The main tasks for this exercise are as follows:

1. Prepare the Lab Environment

2. Review the Documentation

3. Create the Tables

 Task 1: Prepare the Lab Environment


1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are both running, and then
log on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. Run Setup.cmd in the D:\Labfiles\Lab05\Starter folder as Administrator.

 Task 2: Review the Documentation


 Review the requirements in Supporting Documentation.docx in the D:\Labfiles\Lab05\Starter
folder and decide how you are going to meet them.

 Task 3: Create the Tables


1. Create a table based on the supporting documentation for Table 1: Sales.MediaOutlet.
2. Create a table based on the supporting documentation for Table 2: Sales.PrintMediaPlacement.

Results: After completing this exercise, you will have created two new tables in the AdventureWorks
database.

Exercise 2: Creating a Clustered Index


Scenario
The sales department has started to use the new tables, and are finding that, when trying to query the
data, the performance is unacceptable. They have asked you to make any database changes you can to
improve performance.

The main tasks for this exercise are as follows:

1. Add a Clustered Index to Sales.MediaOutlet


2. Add a Clustered Index to Sales.PrintMediaPlacement

 Task 1: Add a Clustered Index to Sales.MediaOutlet


1. Consider which column is best suited to an index.

2. Using Transact-SQL statements, add a clustered index to that column on the Sales.MediaOutlet
table.

Consider implementing the index by creating a unique constraint.

3. Use Object Explorer to check that the index was created successfully.

 Task 2: Add a Clustered Index to Sales.PrintMediaPlacement


1. Consider which column is best suited to an index.

2. Using Transact-SQL statements, add a clustered index to that column on the


Sales.PrintMediaPlacement table.
3. Use Object Explorer to check that the index was created successfully.

Results: After completing this exercise, you will have created clustered indexes on the new tables.

Exercise 3: Creating a Covering Index


Scenario
The sales team has found that the performance improvements that you have made are not working for
one specific query. You have been tasked with adding additional performance improvements to handle
this query.

The main tasks for this exercise are as follows:

1. Add Some Test Data

2. Run the Poor Performing Query

3. Create a Covering Index


4. Check the Performance of the Sales Query

 Task 1: Add Some Test Data


 Run the Transact-SQL in D:\Labfiles\Lab05\Starter\InsertDummyData.sql to insert test data into
the two tables.

 Task 2: Run the Poor Performing Query


1. Switch on Include Actual Execution Plan.

2. Run the Transact-SQL in D:\Labfiles\Lab05\Starter\SalesQuery.sql.

3. Examine the Execution Plan.

4. Note the missing index warning in SQL Server Management Studio.

 Task 3: Create a Covering Index


1. On the Execution Plan tab, right-click the green Missing Index text and click Missing Index
Details.

2. Use the generated Transact-SQL to create the missing covering index.

3. Use Object Explorer to check that the index was created successfully.

 Task 4: Check the Performance of the Sales Query


1. Rerun the sales query.

2. Check the Execution Plan and ensure the database engine is using the new
NCI_PrintMediaPlacement index.
3. Close SQL Server Management Studio without saving any changes.

Results: After completing this exercise, you will have created a covering index suggested by SQL Server
Management Studio.

Module Review and Takeaways


Best Practice:

 Choose columns with a high level of selectivity for indexes.

 Rebuild highly fragmented indexes.

 Use nonclustered indexes to improve the performance of particular queries, but balance their use
with the overhead that they introduce.

 Use actual execution plans to obtain missing index hints.

Module 6
Designing Optimized Index Strategies
Contents:
Module Overview 6-1
Lesson 1: Index Strategies 6-2

Lesson 2: Managing Indexes 6-7

Lesson 3: Execution Plans 6-16


Lesson 4: The Database Engine Tuning Advisor 6-25

Lesson 5: Query Store 6-27

Lab: Optimizing Indexes 6-34

Module Review and Takeaways 6-37

Module Overview
Indexes play an important role in enabling SQL Server to retrieve data from a database quickly and
efficiently. This module discusses advanced index topics including covering indexes, the INCLUDE clause,
query hints, padding and fill factor, statistics, using DMOs, the Database Engine Tuning Advisor, and
Query Store.

Objectives
After completing this module, you will be able to understand:

 What a covering index is, and when to use one.


 The issues involved in managing indexes.

 Actual and estimated execution plans.

 How to use the Database Engine Tuning Advisor to improve the performance of queries.

 How to use Query Store to improve query performance.



Lesson 1
Index Strategies
This lesson considers index strategies, including covering indexes, the INCLUDE clause, heaps and
clustered indexes, and filtered indexes.

Lesson Objectives
After completing this lesson, you will be able to:

 Understand when to use a covering index.

 Explain when to use the INCLUDE clause.

 Understand the performance difference between a heap and a clustered index.


 Understand when to use a filtered index.

Covering Indexes

Covering Indexes Include All Columns


A covering index is a nonclustered index that
includes all the columns required by a specific
query. Because the query has all the columns in
the nonclustered index, there is no need for SQL
Server to retrieve data from clustered indexes. This
can dramatically improve the performance of a
query. When an index includes all the columns
required by a query, the index is said to cover the
query.

When Should You Use a Covering Index?


You can use a covering index to improve the performance of any query that is running too slowly.
Although indexes improve query performance, they also create an overhead. When data is entered or
modified, related indexes have to be updated. So adding a new index is always a balance between
decreasing the time taken to retrieve data, against the time taken to add or change data. Create covering
indexes where they can be most effective—for queries that have performance problems, or for queries
that are used frequently.

Drop Unnecessary Indexes


An index is a covering index when it includes all the columns required by the query. However, when that
query is no longer required, the index has no further purpose and should be dropped. Keeping
unnecessary indexes makes data entry and modification slower, and statistics take longer to maintain.
Dropping unnecessary indexes helps to keep your database running smoothly.
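
One way to identify candidates for removal is to query the sys.dm_db_index_usage_stats dynamic
management view, which records how often each index has been read and written since SQL Server last
started. Treat the results as a guide rather than a definitive answer, because the counters are reset on
restart.

Finding rarely used indexes

SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id
    AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY s.user_seeks + s.user_scans + s.user_lookups;

Indexes that show many user_updates but few seeks, scans, or lookups are costing more to maintain than
they return in query performance.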

Note: A covering index is just a nonclustered index that includes all the columns required
by a particular query. It is sometimes referred to as an “index covering the query”.

Using the INCLUDE Clause

Index Limitations
We have already seen that a covering index can
improve the performance of some queries;
however, SQL Server limits how large an index can
be. The limitations are:

 Maximum of 32 columns.

 Clustered indexes must be 900 bytes or less.

 Nonclustered indexes must be 1700 bytes or


less.

 Indexes cannot include LOB columns as key columns. LOB data types include ntext, text, varchar(max),
nvarchar(max), varbinary(max), xml, and image.
 All columns must be from the same table or view.

Use the INCLUDE clause if you want to create a covering index, but some columns are of excluded data
types, or the index might already be at its maximum size.

Key and Nonkey Columns


Columns in an index are known as key columns, whereas columns added using the INCLUDE clause are
known as nonkey columns. You can add nonkey columns to any index using the INCLUDE clause, but they
give the most benefit when they cover a query.

Nonkey Columns Are at the Leaf Level


Columns added using the INCLUDE clause are added at the leaf level of the index, rather than in the
higher nodes. This makes them suitable for SELECT statement columns, rather than columns used for joins
or sorting. Use the INCLUDE clause to make an index more efficient, keeping smaller columns used for
lookups as key columns, and adding larger columns as nonkey columns.

Note: All columns in an index must be from the same table. If you want an index with
columns from more than one table, create an indexed view.

When to Use the INCLUDE Clause


For example, a nonclustered index might have two key columns; you can use the INCLUDE clause to add
three more columns as nonkey columns, creating a covering index. In this example, the key columns are
the two columns of the nonclustered index; the columns added using the INCLUDE clause are the nonkey
columns.
The most efficient way of using the INCLUDE clause is to use it for columns with larger data types that are
only used in the SELECT statement. Make columns that are used in the WHERE or GROUP BY clauses into
key columns, so keeping indexes small and efficient.

The following index is created with columns used for searching and sorting as key columns, with the
larger columns that only appear in the SELECT statement as INCLUDED columns:

Covering Index Using the INCLUDE Clause


USE AdventureWorks2016;
GO

CREATE NONCLUSTERED INDEX AK_Product


ON Production.Product
(ProductID ASC, ListPrice ASC)
INCLUDE (Name, ProductNumber, Color)
GO

Data Stored Twice


Adding columns with larger data types to an index using the INCLUDE clause means that the data is
stored both in the original table, and in the index. So whilst covering indexes can give a performance
boost, this strategy increases the required disk space. There is also an overhead when inserting, updating,
and deleting data as the index is maintained.

For more information about the INCLUDE clause, see Microsoft Docs:

Create Indexes with Included Columns


http://aka.ms/K39f9y

Heap vs. Clustered Index

What Is a Heap?
A heap is a table that has been created without a
clustered index, or a table that had a clustered
index that has now been dropped. A heap is
unordered: data is written to the table in the order
in which it is created. However, you cannot rely on
data being stored in the same order as it was
created, because SQL Server can reorder the data
to store it more efficiently.

Note: You cannot rely on any data set being


in a particular order unless you use the ORDER BY clause—whether or not the table is a heap.
Although data might appear to be in the order you require, SQL Server does not guarantee that
data will always be returned in a particular sequence, unless you specify the order.

Tables are normally created with a primary key, which automatically creates a clustered index. Additional
nonclustered indexes can then be created as required. However, you can also create nonclustered indexes
on a heap. So why would you want to create a heap?

When to Use a Heap


There are two main reasons to use a heap:

1. The table is so small that it doesn’t matter. A parameters table, or lookup table that contains only a
few rows, might be stored as a heap.

2. You need to write data to disk as fast as possible. A table that holds log records, for example, might
need to write data without delay.

However, it is fair to say that effective use of heaps is rare. Tables are almost always more efficient when
created with a clustered index because, with a heap, the whole table must be scanned to find a record.

Note: You can make the storage of a heap more efficient by creating a clustered index on
the table, and then dropping it. Be aware, however, that each time a clustered index is created or
dropped from a table, the whole table is rewritten to disk. This can be time-consuming, and
requires there to be enough disk space in tempdb.
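
As a sketch, either approach below compacts a heap; the table and column names are hypothetical.
ALTER TABLE ... REBUILD (available in SQL Server 2008 and later) achieves the same result without
creating a temporary clustered index.

Rebuilding a heap

-- Option 1: create a clustered index, and then drop it.
CREATE CLUSTERED INDEX CIX_Temp ON dbo.LogRecord (LogID);
DROP INDEX CIX_Temp ON dbo.LogRecord;

-- Option 2: rebuild the heap in place.
ALTER TABLE dbo.LogRecord REBUILD;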

For more information about heaps, see Microsoft Docs:

Heaps (Tables without Clustered Indexes)


http://aka.ms/Dqip6i

What Is a Clustered Index?


A clustered index defines the order in which the physical table data is stored. A table can have only one
clustered index because the data can only be stored in one sequence. A clustered index is automatically
created when a primary key is created, providing there is not already a clustered index on the table.
Tables in a relational database almost always have a clustered index. A well-designed clustered index
improves query performance, and uses less system resources.

Filtered Index

What Is a Filtered Index?


A filtered index is created on a subset of the
table’s records. The index must be a nonclustered
index, and the filter defined using a WHERE clause.
Filtered indexes improve query performance when
queries use only some of a table’s data, and that
subset can be clearly defined.

When to Use a Filtered Index


Filtered indexes can be used when a table contains
subsets of data that are queried frequently. For
example:

 Nullable columns that contain few non-NULL values.

 Ranges of values that are queried frequently, such as financial values, dates, or geographic regions.

 Category data, such as status codes.



Consider a products table that is frequently queried for products that have the FinishedGoodsFlag.
Although the table holds both components and finished items, the marketing department almost always
queries the table for finished items. Creating a filtered index will improve performance, especially if the
table is large.

The following code example shows an index filtered on finished goods only:

Filtered Index
USE AdventureWorks2016;
GO

CREATE INDEX ix_product_finished
ON Production.Product (FinishedGoodsFlag)
WHERE FinishedGoodsFlag = 1;

Creating filtered indexes not only increases the performance of some queries, but also reduces the size of
an index, taking up less space on disk and making index maintenance operations faster.

Filtered Index Limitations


However, there are limitations to creating filtered indexes. These are:

 Filtered indexes can only be created on tables, not views.

 Filter predicates support only simple comparisons.


 If you need to perform a data conversion, this can only be done on the right-hand side of the
predicate operator.
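As a hedged illustration of the last two limitations (the table and column names here are hypothetical), a filter predicate can compare the column to a constant, with any conversion on the right-hand side of the operator, but cannot apply a function to the column itself:

```sql
-- Allowed: a simple comparison; any conversion happens on the
-- right-hand side of the operator.
CREATE NONCLUSTERED INDEX ix_orders_recent
ON dbo.Orders (OrderDate)
WHERE OrderDate > '20180101';

-- Not allowed: a function applied to the filter column is not a
-- simple comparison, so this CREATE INDEX statement fails.
-- CREATE NONCLUSTERED INDEX ix_orders_year
-- ON dbo.Orders (OrderDate)
-- WHERE YEAR(OrderDate) = 2018;
```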

For more information about filtered indexes, see Microsoft Docs:


Create Filtered Indexes
http://aka.ms/mudq8y

Check Your Knowledge

Question

Your sales table holds data for customers across the world. You want to improve the
performance of a query run by the French office that calculates the total sales, only for
customers based in France. Which index would be most appropriate, and why?

Select the correct answer.

A covering index.

A clustered index.

A filtered index.

A nonclustered index on a heap.

Add region as a nonkey column using the INCLUDE clause.

Lesson 2
Managing Indexes
This lesson introduces topics related to managing indexes, including fill factor, padding, statistics, and
query hints.

Lesson Objectives
After completing this lesson, you will be able to understand:

 The FILL FACTOR setting, and how to use it.

 The PAD INDEX option.

 What statistics are, and how to manage them.


 When to use query hints.

What Is Fill Factor?


Fill factor is a setting that determines how much
spare space remains on each leaf-level page of an
index and is used to reduce index fragmentation.
As indexes are created and rebuilt, the leaf-level
pages are filled to a certain level—determined by
the fill factor—and the remaining space is left for
future growth. You can configure a default fill
factor for the server, or for each individual index.

Fill Factor Settings


The fill factor sets the amount of spare space that
will be left on each leaf-level page as it fills with
data. The fill factor is expressed as the percentage
of the page that is filled with data—a fill factor of 75 means that 75 percent of the page will be filled with
data, and 25 percent will be left spare. Fill factors of 0 and 100 are treated as the same, with the leaf-level
page being completely filled with data.

How Fill Factor Improves Performance


New data is added to the appropriate leaf-level page. When a page is full, however, it must split before
new data can be added. This means that an additional leaf-level page is created, and the data is split
between the old and the new pages. Page splitting slows down data inserts and should be minimized. Set
the fill factor according to how much data you expect to be added between index reorganizations or
rebuilds.

Note: A fill factor other than 0 or 100 assumes that data will be added roughly evenly
across leaf-level pages. For example, an index with a LastName key would have data added to
different leaf-level pages. However, when data is always added to the last page, as in the case of
an IDENTITY column, fill factor will not necessarily prevent page splits.

View the Default Fill Factor Setting


If you do not specify a fill factor for an index, SQL Server will use the default. Later in this module, you will
see how to set the default, but you can view the default by running sp_configure.

Use Transact-SQL to view the default fill factor for the server:

Sp_configure
EXEC sp_configure 'show advanced options', '1';
RECONFIGURE;

EXEC sp_configure 'fill factor';

Note: Fill factor is an advanced option for sp_configure. This means you must set “show
advanced options” to 1 before you can view the default fill factor settings, or change the settings.

What Is Pad Index?


Pad index is a setting used in conjunction with fill
factor. It determines whether space is left on the
intermediate nodes of an index.

The option WITH PAD_INDEX specifies that the same amount of space should be left at the intermediate
nodes of an index as is left on the leaf nodes. Pad index uses the same percentage as specified with fill
factor. However, SQL Server always leaves at least two rows on an intermediate index page, and will make
sure there is enough space for one row to be added to an intermediate page.

PAD_INDEX works in conjunction with FILL FACTOR.

PAD_INDEX
USE AdventureWorks2016;
GO

CREATE INDEX ix_person_lastname
ON Person.Person (LastName)
WITH (PAD_INDEX = ON, FILLFACTOR = 10);

Implementing Fill Factor and Padding


In the last two topics, we discussed fill factor and
pad index when managing indexes. This topic
covers how to implement the two settings.

Setting Default Fill Factor


If the fill factor is not specified for an index, SQL
Server uses the default. There are two ways of
setting the default—either by using the graphical
user interface (GUI), or by using Transact-SQL. To
set the default using the SSMS GUI, follow these
steps:

1. In Object Explorer, right-click the SQL Server instance name and select Properties from the menu.

2. From the Server Properties dialog box, select the Database Settings page.

3. The Default index fill factor is the first option. Set a value between 0 and 100 by typing, or selecting
a value.

4. Click OK to save and close the dialog box.

Alternatively, you can amend the default fill factor using a Transact-SQL script. This has the same effect as
setting the fill factor through the GUI—if you check in the server properties, you will see that the value
has changed.
Setting the Default Fill Factor using Transact-SQL:

Sp_configure
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'fill factor', 75;
GO
RECONFIGURE;
GO

Setting the Fill Factor for an Index


When you create a new index, you can set the fill factor. This will override the server default. You can
specify the fill factor in a number of ways:
1. Use Transact-SQL to set the fill factor.

2. Use the SSMS GUI and index properties.

Create an index and specify the fill factor and pad index.

CREATE INDEX
USE AdventureWorks2016;
GO

-- Creates the IX_AddressID_City_PostalCode index on the
-- Person.Address table with a fill factor of 80.
CREATE INDEX IX_AddressID_City_PostalCode
ON Person.Address (AddressID, City, PostalCode)
WITH (PAD_INDEX = ON, FILLFACTOR = 80);
GO

Amend the fill factor and rebuild the index.

ALTER INDEX
USE AdventureWorks2016;
GO

-- Rebuild the IX_JobCandidate_BusinessEntityID index
-- with a fill factor of 90.
ALTER INDEX IX_JobCandidate_BusinessEntityID
ON HumanResources.JobCandidate
REBUILD WITH (FILLFACTOR = 90);
GO

Best Practice: Amending the fill factor will cause the index to be rebuilt. Make the change
at a time when the database is not being used heavily, or when it is not being used at all.

Index Properties
To change the fill factor using index properties:

1. In Object Explorer, expand the Databases node, in the database you are working on, then expand the
Tables node. Expand the table relating to the index, and expand Indexes.

2. Right-click the index name, and select Properties.

3. Select the Options page, and find the Storage section.

4. For Fill Factor, enter a value between 0 and 100.

5. Pad index may be set to True or False.

6. Click OK to save and close the Properties dialog box.

Note: Setting a very low fill factor increases the size of the index, and makes it less efficient.
Set the fill factor appropriately, depending on the speed at which new data will be added.

Managing Statistics

What Are Statistics?


Statistics are used by the SQL Server query
optimizer to create execution plans. SQL Server
uses internal objects to hold information about
the data that is held within a database; specifically,
how values are distributed in columns or indexed
views. Statistics that are out of date, or missing,
can lead to poor query performance.
SQL Server calculates the statistics by knowing
how many distinct values are held in a column.
Statistics are then used to estimate the number of
rows that are likely to be returned at different stages of a query.

How Are Statistics Used in SQL Server?


The query optimizer is the part of SQL Server that plans how the database engine will execute a query.
The query optimizer uses statistics to choose between different ways of executing a query; for example,
whether to use an index or table scan. For statistics to be useful, they must be up to date and reflect the
current state of the data.

Statistics can be updated automatically or manually, as required.

Automatically Update Statistics


You can determine how statistics are created and updated within your database by setting one of three
options. These are specific to each database, and changed using the ALTER DATABASE statement:

 AUTO_CREATE_STATISTICS. The query optimizer creates statistics as required, to improve query


performance. This is on by default and should normally stay on.

 AUTO_UPDATE_STATISTICS. The query optimizer updates out-of-date statistics as required. This is on


by default and should normally stay on.

 AUTO_UPDATE_STATISTICS_ASYNC. This is only used when AUTO_UPDATE_STATISTICS is on. It allows


the query optimizer to select an execution plan before statistics are updated. This is off by default.

Use ALTER DATABASE to set how statistics are updated in your database.

Alter Database Options


ALTER DATABASE AdventureWorks2016
SET AUTO_UPDATE_STATISTICS ON;
GO

There are two ways to create statistics manually: the CREATE STATISTICS command, or the sp_createstats
stored procedure.

CREATE STATISTICS
You can create statistics on a single column, several columns, or filtered statistics. Using CREATE
STATISTICS, you can specify:

 The name of the statistics object.


 The table or indexed view to which the statistics refer.

 The column or columns to be included in the statistics.

 The sample size on which the statistics should be based—this may be a scan of the whole table, a
percentage of rows, or a count of rows.

 A filter for the statistics—filtered statistics are covered later in this lesson.

 Whether statistics should be created per-partition or for the whole table.


 Whether statistics should be excluded from automatic update.

Note: Filtered statistics are created on a subset of rows in a table. Filtered statistics are
created using the WHERE clause as part of the CREATE STATISTICS statement.
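The options above can be combined in a single statement. The following sketch (the statistics name and the column choices are illustrative) creates filtered statistics from a 50 percent sample of rows rather than a full scan:

```sql
USE AdventureWorks2016;
GO

-- Filtered statistics covering finished goods only, based on a
-- 50 percent sample of the table's rows.
CREATE STATISTICS st_product_finished
ON Production.Product (FinishedGoodsFlag, ListPrice)
WHERE FinishedGoodsFlag = 1
WITH SAMPLE 50 PERCENT;
```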

For more information about CREATE STATISTICS, see Microsoft Docs:


CREATE STATISTICS (Transact-SQL)
http://aka.ms/gdg0hw

SP_CREATESTATS
If you want a quick way to create single-column statistics for all columns in a database that do not have a
statistics object, you can use the sp_createstats stored procedure. This calls CREATE STATISTICS and is used
for creating single-column statistics on all columns in a database not already covered by statistics. It
accepts a limited selection of the options and parameters supported by CREATE STATISTICS.

For more information about sp_createstats, see Microsoft Docs:

Sp_createstats (Transact-SQL)
http://aka.ms/ak1mcd

For more information about database statistics, see Microsoft Docs:

Statistics
http://aka.ms/X36v84

When to Update Statistics


You may need to update statistics manually, even when AUTO_UPDATE_STATISTICS is on, if statistics are
not being updated often enough. The stale statistics detection method requires approximately 20 percent
of table rows to change before statistics are updated. As a table grows, statistics will be updated less and
less frequently, so you may want to update statistics manually on larger tables.

You could also update statistics manually as part of a job that makes large changes to the number of
rows in a table; for example, a bulk insert or a table truncation. This can improve the performance of queries
because the query optimizer will have up-to-date statistics.
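To update statistics manually, use the UPDATE STATISTICS statement or the sp_updatestats stored procedure. For example, after a large bulk insert you might run:

```sql
USE AdventureWorks2016;
GO

-- Update all statistics on one table, scanning every row.
UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;

-- Alternatively, update out-of-date statistics for all tables
-- in the current database.
EXEC sp_updatestats;
```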

Index Maintenance
Index maintenance operations do not alter the distribution of data, and so do not require statistics to be
updated. You do not need to update statistics after running ALTER INDEX REBUILD, DBCC DBREINDEX,
DBCC INDEXDEFRAG, or ALTER INDEX REORGANIZE.

Statistics are, in any case, updated automatically when you run ALTER INDEX REBUILD or DBCC DBREINDEX.

Best Practice: Do not update statistics more than necessary, because cached query
execution plans will be marked for recompilation. Excessive updating of statistics may result in
unnecessary CPU time being spent recompiling query execution plans.

Using DMOs to Improve Index Usage


Indexes play a crucial role in optimizing query
performance. There are a number of dynamic
management objects (DMOs) that provide helpful
information about both current index usage, and
missing indexes.

DMOs and Current Indexes


DMOs that provide information about current
indexes:
 sys.dm_db_index_usage_stats—shows when
and how indexes were last used. Use in
combination with sys.objects. The information
can be helpful to identify the following:

o Indexes with high scan count and low seek count.

o Unused indexes, which are not listed under this DMV.

o Indexes with low seeks and high updates.

o Frequently used indexes.

o The last time the index was used.



 sys.dm_db_index_physical_stats—returns information about the index size and type, fragmentation,
record count, and space used. Use it to track the rate of fragmentation and to create an effective index
maintenance strategy. Run it during off-peak hours, because this DMV can affect performance.

 sys.dm_db_index_operational_stats—returns I/O information, plus data about latches, locking, and


waits.

For example, use the sys.dm_db_index_physical_stats DMV to get the level of fragmentation within an
index.

sys.dm_db_index_physical_stats
USE AdventureWorks2016;
GO

SELECT * FROM sys.dm_db_index_physical_stats
(DB_ID(N'AdventureWorks2016'), OBJECT_ID(N'Person.Address'), NULL, NULL, 'DETAILED');
GO

DMOs and Missing Indexes


Missing indexes can have a huge impact on query performance. Some of the DMVs that provide
information about missing indexes are:
 sys.dm_db_missing_index_details—identifies the table with missing indexes, and the columns
required in an index. Does not work with spatial indexes. Test index suggestions on a test server
before deploying them to a production environment.
 sys.dm_db_missing_index_group_stats—shows details of how the index would be used.

 sys.dm_db_missing_index_groups—a link DMV that links sys.dm_db_missing_index_details with


sys.dm_db_missing_index_group_stats.

 sys.dm_db_missing_index_columns—a function that takes the index handle as an input, and returns
the index columns required, and how they would be used.
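The four DMOs above can be joined to produce a ranked list of suggestions. The following query is a sketch of that join; treat its output as candidates to test, not as indexes to create blindly:

```sql
-- Missing index suggestions, ranked by the optimizer's estimate
-- of the improvement each suggested index would provide.
SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON d.index_handle = g.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON g.index_group_handle = s.group_handle
ORDER BY s.avg_user_impact DESC;
```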

For a list of index related DMOs, see Microsoft Docs:


Index Related Dynamic Management Views and Functions (Transact-SQL)
https://aka.ms/p5xu6z

Consolidating Indexes
Although indexes are effective in improving the performance of queries, they add a small overhead to
inserts, deletes, and updates. If you have two or more indexes that contain very similar fields, it might be
worth consolidating them into one larger index. Not every query will use all of the columns in the
consolidated index, so some reads will take a little longer; however, instead of updating several indexes,
only one index will be updated when records are inserted, deleted, or updated.

Even when indexes are used regularly, it might still be advantageous to consolidate them into one index,
due to the reduced update overhead when changes occur. You should, however, measure the effect of
changes you make to indexes, to ensure they have the results you intend, and that there are no unwanted
side effects.

Note: The order of columns in an index makes a difference. Although two indexes might
have the same columns, they may produce very different performance results for different
queries.
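As a sketch of the consolidation idea (the index, table, and column names here are hypothetical), two overlapping indexes might be replaced with a single composite index that serves both query patterns:

```sql
-- Two similar indexes with overlapping key columns...
DROP INDEX IF EXISTS ix_orders_customer ON dbo.Orders;
DROP INDEX IF EXISTS ix_orders_date ON dbo.Orders;

-- ...consolidated into one composite index. Note that queries
-- filtering only on OrderDate cannot seek on this index, because
-- CustomerID is the leading column.
CREATE NONCLUSTERED INDEX ix_orders_customer_date
ON dbo.Orders (CustomerID, OrderDate);
```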

Using Query Hints


Query hints are signals to the query optimizer that
a query should be executed in a certain way.

The query optimizer is sophisticated and,


providing statistics are up to date, it will almost
always find the best way of executing a query.
Query hints should be the last thing, not the first,
you try in tuning a query. Before attempting to
add query hints, ensure the query is written
properly, and that appropriate indexes are
available.

Common Issues and Troubleshooting Tips

Common Issue: Adding a query hint generates error 8622.
Troubleshooting Tip: Error 8622 means that the query optimizer cannot create a query plan with the
specified hint; remove or revise the hint.

In the following example, the query optimizer would normally use a MERGE JOIN to join the two tables,
but a query hint is used to force a HASH JOIN. The cost of the join increases from 22 percent with a
MERGE JOIN to 67 percent with a HASH JOIN:

Using a Query Hint


SELECT *
FROM Production.Product AS P
INNER JOIN Production.ProductDescription AS PD
ON P.ProductID = PD.ProductDescriptionID
WHERE SafetyStockLevel >= 500
OPTION (HASH JOIN);
GO

There are more than 20 query hints, including types of joins, the maximum number of processors to be
used, and forcing a plan to be recompiled. For a full list of query hints, see Microsoft Docs:

Query Hints (Transact-SQL)


http://aka.ms/t97w4b

Best Practice: Use query hints judiciously. Unless there is a good reason to use a query
hint, the query optimizer will find the best query plan.

Some query hints that usefully give the query optimizer additional information, and can be included, are:

 FAST numberofrows. This tells the query optimizer to retrieve the first n rows quickly, and then return
the full result set. This may help provide a better user experience for some large result sets.

 OPTIMIZE FOR. Tells the query optimizer to use a specific variable when creating a plan. If you know
that you will use one variable much more than any other, this may be useful. Other values are
accepted when the query is executed.

 ROBUST PLAN. This tells the query optimizer to plan for the maximum row size, rather than for
performance. This reduces errors when data has some very wide rows.
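For example, OPTIMIZE FOR can build the plan around the parameter value you expect to see most often. The following is a sketch; the view and column names are taken from AdventureWorks, but verify them before use:

```sql
USE AdventureWorks2016;
GO

DECLARE @Country nvarchar(50) = N'France';

-- The plan is compiled as though @Country were 'United States',
-- regardless of the value actually passed at run time.
SELECT BusinessEntityID, FirstName, LastName
FROM Sales.vSalesPerson
WHERE CountryRegionName = @Country
OPTION (OPTIMIZE FOR (@Country = N'United States'));
GO
```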

Check Your Knowledge

Question

When would you set FILL FACTOR for an index?

Select the correct answer.

When the index contains an IDENTITY column.

When a table is used frequently for data inserts, and is heavily used in queries.

When you want to force the query optimizer to join tables in a certain way.

When you are running short of disk space.

When you want to be sure that statistics are up to date.

Lesson 3
Execution Plans
Execution plans are generated before a SQL script is executed. They are generated by the query optimizer,
and can help database developers and DBAs to understand why a query is being executed in a certain
way.

Lesson Objectives
After completing this lesson, you will be able to explain:

 What an execution plan is, and why it is useful.

 The difference between an actual and estimated execution plan.

 The different methods for capturing an execution plan.

 How to use system DMVs related to execution plans.

 What live statistics are, and how they are helpful.

What Is an Execution Plan?

How Execution Plans Help


Execution plans help you to understand why
queries perform as they do. Database
administrators (DBAs) are frequently faced with
trying to understand:

 Why a particular query takes so long to


execute.
 Why one query takes longer to run than a
seemingly similar query.

 Why a newly created index doesn’t make a


difference to query performance.

 If a new index could help a query to run faster.

Execution plans help you to answer these questions by showing how the SQL Server Database Engine
expects to execute a query, or how it has actually executed a query.

What Is the Query Optimizer?


Transact-SQL is a script language that tells SQL Server what data is required, rather than how to retrieve
data. When you write a query, it is the job of the query optimizer to figure out the best way of handling it.
The query optimizer is the part of SQL Server that predicts how the database engine will execute the
query.

How the Query Optimizer Works


The query optimizer uses statistics to generate a cost-based plan for each query. To select the optimum
plan, the query optimizer calculates a number of ways of executing the query. This is called the execution
plan.

Trivial Plans
If a query is simple and executes quickly, the query optimizer does not plan a number of different ways to
execute it—it just uses the most obvious plan. This is known as a trivial execution plan, because it is faster
to execute the query than to compare a number of alternatives. Trivial queries normally retrieve data from
a single table, and do not include calculations or aggregates.

Here is an example of a trivial query plan:

Simple Query with no Calculations or Aggregates


USE AdventureWorks2016
GO

SELECT *
FROM Production.Document

Adding a table join to this query would make the plan nontrivial. The query optimizer would then do
cost-based calculations to select the best execution plan. You can identify trivial execution plans by
running a system DMV (dynamic management view). Run the DMV before and after the query, noting the
number of occurrences of trivial plans.

Count the number of trivial plans occurrences—before and after running your query.

sys.dm_exec_query_optimizer_info
SELECT * FROM sys.dm_exec_query_optimizer_info

Database Statistics
The query optimizer uses statistics to figure out how to execute a query. Statistics describe the data within
the database, including its uniqueness. If statistics are out of date, the query optimizer will make incorrect
calculations, and potentially chose a suboptimal query plan.

Cost-based Selection
The query optimizer uses a cost-based selection process to determine how to execute a query. The cost is
calculated based on a number of factors, including CPU resources, memory, I/O (input/output)
operations, and the time taken to retrieve data from disk. It is not an absolute measure.

The query optimizer cannot try all possible execution plans. It has to balance the time taken to compare
plans with the time taken to execute the query—and it has to cut off at a certain point. It aims to find a
satisfactory plan within a reasonable period of time. Some data definition language (DDL) statements,
such as CREATE, ALTER, or DROP, do not require alternative plans—they are executed straightaway.

Actual vs. Estimated Execution Plans

Selecting the Actual or Estimated


Execution Plan
You can use SSMS to display either an actual or
estimated execution plan for a query. Hover over
the toolbar icons to display tool tips and find icons
for Display Estimated Execution Plan and
Include Actual Execution Plan. Alternatively,
right-click the query window and select either
Display Estimated Execution Plan or Include
Actual Execution Plan from the context-sensitive
menu. You can also use the keyboard shortcuts
Ctrl-L with the query highlighted for the
estimated plan, or Ctrl-M to display the actual plan after the query has run. Whichever option you
choose, SQL Server must generate an execution plan before the query can be executed.

Estimated Execution Plan


The estimated execution plan is the query optimizer’s intended plan. If the query is run straightaway, it
will be no different from the actual plan, except that it has not yet run. Estimated execution plans do not
include the actual number of rows at each stage. The estimated execution plan is useful for:
 Designing or debugging queries that take a long time to run.

 Designing a query that modifies data; for example, a query that includes an UPDATE statement. The
estimated execution plan will display the plan without changing the data.
 SQL Server shows estimates for the number of rows returned.

Actual Execution Plan


The actual execution plan is displayed after a query has been executed. This displays actual data, including
the number of rows accessed for a particular operator. The plan is normally the same as the estimated
plan, but with additional information. If there are missing or out-of-date statistics, the plans may be
different.

Hover over each part of the actual execution plan to display more information about how the query was
executed.

Note: Comparing estimated and actual row counts can help you to identify out-of-date
table statistics. When statistics are up to date, estimated and actual counts will be the same.

Query Plan Cache


Execution plans are stored in the plan cache so they can be reused. Execution plan reuse saves time and
allows queries to run faster, because the query optimizer does not have to consider alternative plans.

For more information about optimizing queries using the plan cache, see MSDN:

Plan Cache Internals


http://aka.ms/Vsiip6

Common Execution Plan Elements


Execution plans include logical or physical
operators to build the query plan. More than 100
different operators may appear in an execution
plan. Operators can have an output stream of data
rows, in addition to zero, one, or two input data
streams. Execution plans are read from right to
left.

Many operators are represented with an icon in the graphical execution plan display. Operators can be
categorized according to their function. Commonly used categories include:

 Data retrieval operators

 Join operators

 Parallelized query plans

Data Retrieval Operators


Queries that retrieve data from database tables will have a query plan that includes a scan or seek
operator.
 Scan: a scan reads records sequentially and retrieves the required records. A scan is used when a large
proportion of records are retrieved from a table, or if there is no suitable index.

 Seek: a seek finds specific records by looking them up in an index. This is normally faster than a scan,
because specific records can be quickly located and retrieved using the index.
A scan may be used on a clustered index, a nonclustered index, or a heap. A seek may be used on a
clustered or nonclustered index. Both scan and seek operators may output some or all of the rows they
read, depending on what filters are required for the query.

A query plan can always use a scan, but a seek is used only when there is a suitable index. Indexes that
contain all the columns required by a query are called covering indexes.

Best Practice: The query execution plan is a guide to help you to understand how queries
are being executed. Do not, however, try to manipulate how the query optimizer handles a
query. When table statistics are accurate, and appropriate indexes are available, the query
optimizer will almost always find the fastest way of executing the query.

Join Operators
JOIN clauses are used in queries that retrieve records from more than one table. The query execution plan
includes join operators that combine data that is returned by scans or seeks. The data is transformed into
a single data stream by the join operators.

The query optimizer uses one of three join operators, each of which takes two input data streams and
produces one output data stream:

 Nested loop

 Merge join

 Hash match

Nested Loop
A nested loop join performs a search from the second input data stream for each row in the first input
data stream. Where the first input data stream has 1,000 rows, the second input will be searched once for
each row—so it performs 1,000 searches.

In a graphical query plan, the upper input is the first input and the lower input is the second input. In an
XML or text query plan, the second input will appear as a child of the first input. Nested loop joins are
used when the second input is inexpensive to search, either because it is small or has a covering index.

Merge Join
A merge join combines two sorted inputs by interleaving them. The sequence of the input streams has no
impact on the cost of the join. Merge joins are optimal when the input data streams are already sorted,
and are of similar volumes.

Hash Match
A hash match calculates a hash value for each input data stream, and the hash values are compared. The
operation details vary according to the source query, but typically a complete hash table is calculated for
the first input, then the hash table is searched for individual values from the second input. Hash matches
are optimal for large, unsorted input data streams, and for aggregate calculations.

Parallel Query Plans

When multiple processors are available, the query optimizer might attempt to speed up queries by
running tasks in parallel on more than one CPU. This is known as parallelism and normally involves large
numbers of rows.
The query execution plan does not actually show the individual threads participating in a parallel query
plan; however, it does show a logical sequence of operators, and the operators that use parallelism are
flagged.
In a graphical plan, parallelized operators have a small orange circle containing two arrows overlaid on
the bottom right-hand corner of the icon. In XML query plans, parallelized operators have the “parallel”
attribute set to “true”. In text query plans generated by SHOWPLAN_ALL or STATISTICS PROFILE, the result
set contains a parallel column with a value of 1 for parallelized operators.

Parallel query plans will also contain at least one instance of the Gather Streams operator, which combines
the results of parallelized operators.
For more information about query plan operators, see Microsoft Docs:

Showplan Logical and Physical Operators Reference


http://aka.ms/ifow8r

Methods for Capturing Plans


The way you capture a query execution plan varies
according to the format in which the plan is
displayed.

Note: Capturing a query execution plan


requires the SHOWPLAN permission, as well as
permission to the objects referenced in the query.
You cannot generate an execution plan for a
query that you do not have permission to execute.

Graphical Execution Plan


To save a graphical execution plan, right-click the graphical execution plan displayed in the results
window, and click Save Execution Plan As. The Save As dialog appears, allowing you to name the plan
and save it to an appropriate folder. The plan will be saved in XML format, with a .sqlplan extension. By
default, this extension is associated with SSMS and will open in SSMS.

To view a graphical plan in XML format, right-click the plan and click Show Execution Plan XML. The
execution plan is already in XML format, but is displayed graphically by default.

XML Execution Plan


Use the SET SHOWPLAN_XML option to display an estimated execution plan in XML format.

Generate an estimated execution plan in XML format by using the SHOWPLAN_XML option.

SET SHOWPLAN_XML ON
USE AdventureWorks2016;
GO

-- Display an estimated execution plan in XML format


SET SHOWPLAN_XML ON;
GO

SELECT *
FROM HumanResources.Employee
WHERE gender = 'F';
GO
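While SHOWPLAN_XML is ON, Transact-SQL statements are not executed; only their estimated plans are returned. Remember to turn the option off before running further statements in the same session:

```sql
SET SHOWPLAN_XML OFF;
GO
```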

Use the SET STATISTICS XML ON to display the actual execution plan in XML format.

Display the actual execution plan in XML format by using the SET STATISTICS XML ON option.

SET STATISTICS XML ON


USE AdventureWorks2016;
GO

-- Display the actual execution plan in XML format


SET STATISTICS XML ON;
GO

SELECT *
FROM HumanResources.Employee
WHERE gender = 'F';
GO

Note: The toolbar icon Include Actual Execution Plan and the Ctrl-M keyboard shortcut
will toggle, showing the actual execution plan on and off. Ensure this is off before running the
SET statements.

Execution Plan Related DMVs


In addition to query execution plans, SQL Server
also stores performance statistics for cached plans,
such as execution time, I/O activity, and CPU
utilization. These statistics are accessible through
system DMVs to help you troubleshoot
performance problems, or identify areas that
require optimization.

 sys.dm_exec_query_stats returns
performance information for all the cached
plans.
 sys.dm_exec_procedure_stats returns the
same information as sys.dm_exec_query_stats,
but for stored procedures only.

Use this query to find the top 10 cached plans with the highest average run time per execution.

Use DMVs To Find the Longest Running Queries


SELECT TOP(10) OBJECT_NAME(st.objectid, st.dbid) AS obj_name, qs.creation_time,
qs.last_execution_time, SUBSTRING (st.[text], (qs.statement_start_offset/2)+1,
(( CASE statement_end_offset WHEN -1 THEN DATALENGTH(st.[text])
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2)+1) AS sub_statement_text, [text],
query_plan, total_worker_time, qs.execution_count,qs.total_elapsed_time /
qs.execution_count AS avg_duration

FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp

ORDER BY avg_duration DESC;

Use the same query with a small amendment to find the 10 cached plans that use the highest average
CPU per execution.

Amendment to Find Top Average CPU Consumption


SELECT TOP(10) OBJECT_NAME(st.objectid, st.dbid) AS obj_name, qs.creation_time,
qs.last_execution_time, SUBSTRING (st.[text], (qs.statement_start_offset/2)+1,
(( CASE statement_end_offset WHEN -1 THEN DATALENGTH(st.[text])
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2)+1) AS sub_statement_text,
[text], query_plan, total_worker_time, qs.execution_count, qs.total_worker_time /
qs.execution_count AS avg_cpu_time,
qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time

FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp

ORDER BY avg_cpu_time DESC;



Note: The unit for time columns in sys.dm_exec_query_stats (for example


total_worker_time) is microseconds (millionths of a second). However, the values are only
accurate to milliseconds (thousandths of a second).

Amend the example to find the 10 most expensive queries by the average logical reads per execution.

Top Ten Most Expensive Cached Plans by Average Logical Read


SELECT TOP(10) OBJECT_NAME(st.objectid, st.dbid) AS obj_name, qs.creation_time,
qs.last_execution_time,
SUBSTRING (st.[text], (qs.statement_start_offset/2)+1,
(( CASE statement_end_offset WHEN -1 THEN DATALENGTH(st.[text])
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2)+1) AS sub_statement_text,
[text], query_plan, total_worker_time, qs.execution_count,
qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
qs.total_elapsed_time / qs.execution_count AS avg_duration

FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp

ORDER BY avg_logical_reads DESC;

You can adapt these queries to return the most expensive queries by any of the measures in
sys.dm_exec_query_stats, or to limit the results to stored procedure cached plans by using
sys.dm_exec_procedure_stats in place of sys.dm_exec_query_stats.
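As a sketch of the stored procedure variant, the following query lists the ten procedures with the highest average duration. Because sys.dm_exec_procedure_stats already aggregates at the procedure level, no statement offset arithmetic is needed:

```sql
SELECT TOP(10) OBJECT_NAME(ps.object_id, ps.database_id) AS proc_name,
    ps.execution_count,
    ps.total_elapsed_time / ps.execution_count AS avg_duration,
    ps.total_worker_time / ps.execution_count AS avg_cpu_time
FROM sys.dm_exec_procedure_stats AS ps
ORDER BY avg_duration DESC;
```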
For more details, see sys.dm_exec_query_stats (Transact-SQL) in Microsoft Docs:

sys.dm_exec_query_stats (Transact-SQL)
http://aka.ms/n3fhua

Remember, sys.dm_exec_query_stats only returns information about cached plans. Plans that are not in
the cache, either because they have been recompiled, or because they have not been executed again
since the plan cache was last flushed, will not appear.

Live Query Statistics

Live Query Statistics


Use Live Query Statistics to obtain a real-time view
of how a query is executed. The execution plan is
just a plan—the database engine may execute the
query differently—but, with Live Query Statistics,
you can see statistics as the query is executing.

Live Query Statistics can be used with databases


developed using SQL Server 2014 or later. Live
Query Statistics is a feature of SQL Server
Management Studio, and you should download
the most recent version of SSMS independently of
the database engine. See Microsoft Docs:

Download SQL Server Management Studio (SSMS)


http://aka.ms/o4vgkz

Displaying Live Query Statistics


There are two ways of including Live Query Statistics when you run your query:

1. Click the Include Live Query Statistics icon on the toolbar. The icon is highlighted to show that it is
selected. Execute the query—the Live Query Statistics tab is displayed.

2. Right-click the query window and select Include Live Query Statistics from the context-sensitive
menu. Execute the query—the Live Query Statistics tab is displayed.

By including Live Query Statistics using either method, you are enabling statistics for the current session. If
you want to view Live Query Statistics from other sessions, including Activity Monitor, you must execute
one of the following statements:

 SET STATISTICS XML ON

 SET STATISTICS PROFILE ON


Alternatively, you can use query_post_execution_showplan to enable the server setting. See Microsoft
Docs:

Monitor System Activity Using Extended Events


http://aka.ms/N8n81l
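A minimal sketch of such an Extended Events session follows. The session name is arbitrary, and because query_post_execution_showplan captures the actual plan for every statement, it adds significant overhead and should be used sparingly:

```sql
-- Capture actual execution plans server-wide in a ring buffer.
CREATE EVENT SESSION capture_actual_plans ON SERVER
ADD EVENT sqlserver.query_post_execution_showplan
ADD TARGET package0.ring_buffer;
GO

ALTER EVENT SESSION capture_actual_plans ON SERVER STATE = START;
GO
```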

What Does Live Query Statistics Show?


Live Query Statistics differs from viewing an execution plan, in that you can see query progress. Execution
plans show you either the expected query plan, or how the plan was actually executed—but neither
provides an in-progress view. Live Query Statistics shows you the numbers of rows produced by each
operator, and the elapsed time for each operation.

For more information about Live Query Statistics, see Microsoft Docs:

Live Query Statistics


http://aka.ms/Pvznef

Note: Live Query Statistics is a troubleshooting tool, and should be used for optimizing
queries. It adds an overhead to the query, and can affect performance.

Question: Why would you use the execution plan view?



Lesson 4
The Database Engine Tuning Advisor
The Database Engine Tuning Advisor analyzes a workload and makes recommendations to improve its
performance. You can use either the graphical view, or the command-line tool to analyze a trace file.

Lesson Objectives
After completing this lesson, you will be able to:

 Understand what the Database Engine Tuning Advisor is, and when to use it.

 Explain how to use the Database Engine Tuning Advisor.

Introduction to the Database Engine Tuning Advisor


The Database Engine Tuning Advisor is a tool to
help you improve the performance of your
queries. It is a stand-alone tool that must be
started independently of SSMS.
The Database Engine Tuning Advisor may
recommend:

 New indexes or indexed views.

 Statistics that need to be updated.

 Aligned or nonaligned partitions.

 Better use of existing indexes.

Before you can use the Database Engine Tuning Advisor, you must capture a typical workload.

Note: The Database Engine Tuning Advisor is not widely used, although it might be useful
in some circumstances. New tools such as Query Store, discussed later in this module, may be
easier to use for most situations.

Using the Database Engine Tuning Advisor

Using the Database Engine Tuning


Advisor for the First Time
The first time Database Engine Tuning Advisor is
used, it must be started by a member of the
sysadmin role, because system tables are created
in the msdb database.

Workload Formats
The Database Engine Tuning Advisor can accept
any of the following workloads:

 Plan Cache

 SQL Profiler tracer file or table



 Transact-SQL script

 XML file

For more information about using the Database Engine Tuning Advisor, see Microsoft Docs:

Start and Use the Database Engine Tuning Advisor


http://aka.ms/Mdpzxr

Check Your Knowledge


Question

The Database Engine Tuning Advisor is best used in which situations?

Select the correct answer.

When you need to add a large number of records to a table.

When you want to ensure indexes are rebuilt regularly.

When you want to identify missing indexes.

When you need to create an XML file.

Every time you run a Transact-SQL script.

Lesson 5
Query Store
SQL Server includes Query Store, a feature that makes it easier to find and fix problem queries. This lesson
introduces the Query Store, how to enable it, and how to use it.

Lesson Objectives
After completing this lesson, you will be able to:

 Explain the benefits of using Query Store.

 Enable and configure Query Store for a database.

 Use the different Query Store views to monitor query performance.

What Is the Query Store?


Query Store helps you to improve the
performance of troublesome queries. Query Store
is available in all SQL Server editions, and Azure
SQL Database.

It provides better insight into the way queries are


executed, in addition to capturing a history of
every query that is run on the database. Query
Store also keeps every version of the execution
plan used by each query, plus execution statistics.

What Problems Does Query Store Solve?


Query performance problems often occur when
the query plan changes—perhaps because statistics have been updated, the database version has been
upgraded, or the query cache has been cleared. Slow running queries can cause big problems with
applications or websites, potentially making them unusable. Even temporary performance problems can
create a lot of work in trying to understand what has caused the issue.
The most recent execution plan is stored in the plan cache but, prior to Query Store, it could be time-
consuming and difficult to work out what was causing a problem with query performance. Query Store
has been designed to make it easier to understand and fix problem queries.

What Does Query Store Do?


Query Store is a repository for all your queries, execution plans, properties, and metrics. You can see how
long queries took to run, in addition to which execution plan was used by the query. You can also force
the query optimizer to use a particular execution plan. In short, Query Store keeps data that makes it
easier to solve query problems.

How Does Query Store Work?


The information is stored in each user database, with settings to control when old queries should be
deleted. Query Store writes query information to disk, so a permanent record is kept after failovers, server
restarts, and database upgrades. This makes it simpler to identify and fix the most problematic queries,
and analyze query performance.

For more information about Query Store, see Microsoft Docs:

Monitoring Performance By Using the Query Store


http://aka.ms/M1iwvm

Enabling Query Store


Before you can use Query Store, you must enable
it for your database. This can be done either by
using a database properties dialog box, or by
using Transact-SQL.

Amend Database Properties


To enable Query Store using the GUI, start SSMS
and select the relevant database in Object
Explorer. Right-click the database name and select
Properties from the context-sensitive menu.
Amend the first setting under General to Read
Write. By default, this is set to off.

Transact-SQL ALTER DATABASE


This is done using the ALTER DATABASE statement.

Use ALTER DATABASE to enable Query Store for your database.

SET QUERY_STORE = ON
ALTER DATABASE AdventureWorks2016
SET QUERY_STORE = ON;
GO

Query Store works well with the default settings, but you can also configure a number of other settings. If
you have a high workload, configure Query Store to behave when it gets close to disk space capacity.

Query Store Options


Set or amend these settings using ALTER DATABASE.

 OPERATION_MODE (OFF, READ_WRITE, or READ_ONLY; default OFF). When Query Store is running, this
should be set to READ_WRITE. It might automatically be set to READ_ONLY if it runs out of disk space.

 CLEANUP_POLICY (STALE_QUERY_THRESHOLD_DAYS; default 367). Determines how long a query with no
policy is kept. Set to 0 to disable.

 DATA_FLUSH_INTERVAL_SECONDS (number of seconds; default 900, or 15 minutes). The interval at which
SQL Server writes in-memory data to disk.

 MAX_STORAGE_SIZE_MB (number of MB; default 100 MB). The maximum disk space in MB reserved for
Query Store.

 INTERVAL_LENGTH_MINUTES (number of minutes; default 60 minutes). Determines how statistics are
aggregated.

 SIZE_BASED_CLEANUP_MODE (AUTO or OFF; default OFF). AUTO means that the oldest queries are
deleted once 90 percent of the disk space is filled, and deletion stops when usage falls back to 80
percent. With OFF, the default, no more queries are captured when the available disk space is full, and
the operation mode switches to READ_ONLY.

 QUERY_CAPTURE_MODE (ALL, AUTO, or NONE; default ALL). ALL means that all queries are captured.
AUTO captures queries with a high execution count and resource consumption. NONE does not capture
new queries.

 MAX_PLANS_PER_QUERY (number; default 200). Determines the number of plans that can be kept per
query.

 QUERY_STORE (ON, OFF, or CLEAR). Determines whether or not Query Store is enabled for a database.
CLEAR removes the contents of Query Store.
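Several of these settings can be changed together in a single ALTER DATABASE statement. A sketch with illustrative values (choose limits appropriate to your own workload):

```sql
ALTER DATABASE AdventureWorks2016
SET QUERY_STORE (
    OPERATION_MODE = READ_WRITE,
    MAX_STORAGE_SIZE_MB = 500,
    QUERY_CAPTURE_MODE = AUTO,
    SIZE_BASED_CLEANUP_MODE = AUTO
);
GO
```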

You can view the settings either by using database properties in SSMS Object Explorer, or by using
Transact-SQL.
View the Query Store settings using sys.database_query_store_options.

sys.database_query_store_options
SELECT *
FROM sys.database_query_store_options;

Note: Query Store is only used with user databases; it cannot be enabled for master,
msdb, or tempdb.

Using Query Store


After Query Store has been enabled, a Query Store
folder for the database is created in Object
Explorer. Query Store then starts to store
information about queries that have been
executed. Data collected by the Query Store can
be accessed, either by using SSMS, or by using
Transact-SQL queries.

Query Store in SSMS


The Query Store folder in SSMS includes four
views that show different information:

1. Regressed Queries.

2. Overall Resource Consumption.

3. Top Resource Consuming Queries.

4. Tracked Queries.

Regressed Queries
This view shows queries whose performance has degraded over a period of time. You use a drop-down
box to select performance measured by CPU time, duration, logical read count, logical write count,
memory consumption, or physical reads. You can also see the execution plan for each query.

Overall Resource Consumption


This view displays resource consumption over time. Histograms can show consumption by CPU time,
duration, logical read count, logical write count, memory consumption, or physical reads.

Top Resource Consuming Queries


This view shows the top 25 most expensive queries over time by CPU time, duration, logical read count,
logical write count, memory consumption, or physical reads. You can also see the execution plan for each
query.

Tracked Queries
This view shows historical data for a single query.

Query Store using Transact-SQL


In addition to the graphical view, system catalog views are available to expose the data collected by the
Query Store. These catalog views are similar to the query plan cache DMVs discussed earlier in this
module. Some of the Query Store catalog views are:

 sys.query_store_runtime_stats. Similar to sys.dm_exec_query_stats, this catalog view exposes


performance information captured by Query Store.

 sys.query_store_plan. Similar to sys.dm_exec_query_plan, this catalog view exposes query plans data
captured by Query Store.

 sys.query_store_query_text. Similar to sys.dm_exec_query_text, this catalog view exposes the query


text ID and the query text relating to queries captured by the Query Store.
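These catalog views join on shared keys (query_text_id, query_id, plan_id). As a sketch, the following query counts how many distinct plans Query Store has captured for each query text, which helps highlight queries whose plans have changed over time:

```sql
SELECT qt.query_sql_text,
    q.query_id,
    COUNT(p.plan_id) AS plan_count
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
    ON q.query_id = p.query_id
GROUP BY qt.query_sql_text, q.query_id
ORDER BY plan_count DESC;
```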

For more information on Query Store catalog views, see MSDN:

Query Store Catalog Views (Transact-SQL)


http://aka.ms/u60m6c

Improving Query Performance


After you have identified the queries that are
taking the longest time to run, you can use the
tools in the Query Store to discover why this
happens. You can use the Top Resource
Consuming Queries view to see the query plans
used by the query over time, and to view the
associated execution plan. The execution plan
shows you where there are issues in your query:
perhaps where you have joined to a column
without an index, or used a function that is
causing the query to slow down. Using this view,
you can force a query to use the most efficient
plan.
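Plan forcing can also be done in Transact-SQL. In this sketch the query and plan IDs are placeholders; obtain the real values from the Query Store catalog views or the graphical reports:

```sql
-- Force plan 7 for query 42 (placeholder IDs).
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;

-- Remove the forced plan when it is no longer needed.
EXEC sp_query_store_unforce_plan @query_id = 42, @plan_id = 7;
```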

The following code sample demonstrates how you can directly query the data captured by the Query
Store:

Querying the Query Store System Views


SELECT plan_id, query_id, [compatibility_level], is_forced_plan,
count_compiles, last_compile_start_time, last_execution_time
FROM sys.query_store_plan

Demonstration: Using Query Store with Azure SQL Database


Demonstration Steps
1. Ensure the MT17B-WS2016-NAT, 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are
running, and log on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password
Pa55w.rd.
2. Open SSMS and connect to the server you created earlier—for example,
20762ce20160405.database.windows.net, using SQL Server Authentication.

3. In the Login box, type Student, in the Password box, type Pa55w.rd, and then click Connect.

4. In Object Explorer, expand Databases right-click AdventureWorksLT, and then click Properties.

5. In the Database Properties - AdventureWorksLT dialog box, click the Query Store page, and in
the General section, ensure the Operation Mode (Requested) is set to Read Write. Point out the
Query Store settings to students. When you are finished, click OK.

6. In Object Explorer, expand the AdventureWorksLT node to see that a folder called Query Store has
been created.

7. Expand the Query Store node to show the four views.

8. On the File menu, point to Open, and then click File.

9. In the Open File dialog box, navigate to D:\Demofiles\Mod06, and open QueryStore_Demo.sql.

10. Select the code under the comment Create a covering index on the TempProduct table, and then
click Execute. First, the query creates a covering index on the TempProduct table, and then uses
three columns from this table—point out that the text columns have been included as nonkey
columns.

11. Select the code under the comment Clear the Query Store, and then click Execute.

12. Select the code under the comment Work load 1, and then click Execute.

13. Repeat the previous step another five times, waiting a few seconds between each execution.

14. Select the code under the comment Work load 2, and then click Execute.

15. Repeat the previous step another three times, waiting a few seconds between each execution.
16. Select the code under the comment Regress work load 1, and then click Execute.

17. Select the code under the comment Work load 1, and then click Execute.

18. Repeat the previous step another five times, waiting a few seconds between each execution.

19. In Object Explorer, open the Top Resource Consuming Queries window, and see the difference
between the two execution plans for Workload 1.

20. Demonstrate how you can change the metric using the drop-down box. Note the force plan button.
21. On the File menu, point to Open, and then click File.

22. In the Open File dialog box, navigate to D:\Demofiles\Mod06, select


QueryStore_Demo_CatalogViews.sql, and then click Open.
23. Select the code under the comment Create a covering index on the TempProduct table, and then
click Execute.

24. Select the code under the comment Clear the Query Store, and then click Execute.
25. Select the code under the comment Work load 1, and then click Execute.

26. Repeat the previous step.

27. Select the code under the comment Work load 2, and then click Execute.
28. Repeat the previous step another two times, waiting a few seconds between each execution.

29. Select the code under the comment Regress work load 1, and then click Execute.

30. Select the code under the comment Work load 1, and then click Execute.

31. Select the code under the comment Examine sys.query_store_query_text and
sys.query_context_settings, and then click Execute.

32. Select the code under the comment Examine sys.query_store_query, and then click Execute.
33. Select the code under the comment Examine sys.query_store_plan, and then click Execute.

34. Select the code under the comment Examine sys.query_store_runtime_stats_interval, and then
click Execute.
35. Select the code under the comment Examine runtime statistics, and then click Execute.

36. Close SSMS without saving any changes.



Verify the correctness of the statement by placing a mark in the column to the right.

Statement Answer

True or false? Due to limitations of using the cloud, Azure SQL Database does not contain all the Query Store functionality that is available in the SQL Server on-premises product.

Lab: Optimizing Indexes


Scenario
You have been hired by the IT Director of the Adventure Works Bicycle Company to work with their DBA
to improve the use of indexes in the database. You want to show the DBA how to use Query Store to
improve query performance and identify missing indexes. You will also highlight the importance of having
a clustered index on each table.

Objectives
In this lab, you will practice:

 Using Query Store to monitor queries and identify missing indexes.

 Comparing a heap against a table with a clustered index.

Estimated Time: 45 minutes

Virtual machine: 20762C-MIA-SQL

User name: ADVENTUREWORKS\Student


Password: Pa55w.rd

Exercise 1: Using Query Store


Scenario
You are the DBA for the Adventure Works Bicycle Company. You have been working with a consultant to
implement the features in Query Store, and now want to simulate a typical query load.
The main tasks for this exercise are as follows:

1. Use Query Store to Monitor Query Performance

 Task 1: Use Query Store to Monitor Query Performance


1. Log on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
2. In the D:\Labfiles\Lab06\Starter folder, right-click Setup.cmd, and then click Run as
administrator.

3. In SSMS, connect to the MIA-SQL database engine instance using Windows authentication.

4. Open QueryStore_Lab1.sql.

5. Make AdventureWorks2016 the current database.

6. Execute the code to create an indexed view.


7. Select the code under the comment Clear the Query Store, and then click Execute.

8. Select the code under the comment Run a select query six times, and then click Execute.

9. Repeat five times, waiting a few seconds each time.

10. Select the code to Update the statistics with fake figures, and then click Execute.

11. Repeat Step 10 twice.

12. In Query Store, double-click Top Resource Consuming Queries.


13. Examine the different views of the Top 25 Resources report and force the original query plan to be
used.

14. Switch to the QueryStore_Lab1.sql tab and repeat Step 10 three times.

15. Switch to the Top Resource Consuming Queries tab to identify which query plans used a clustered
index seek and which ones used a clustered index scan.

16. Keep SSMS open for the next lab exercise.

Results: After completing this lab exercise you will have used Query Store to monitor query performance,
and used it to force a particular execution plan to be used.

Exercise 2: Heaps and Clustered Indexes


Scenario
You are the DBA for the Adventure Works Bicycle Company. You have had complaints that a number of
queries have been running slowly. When you take a closer look, you realize that a number of tables have
been created without a clustered index. Before adding a clustered index, you decide to run some tests to
find out what difference a clustered index makes to query performance.

The main tasks for this exercise are as follows:

1. Compare a Heap with a Clustered Index

 Task 1: Compare a Heap with a Clustered Index


1. Open ClusterVsHeap_lab.sql, and run each part of the script in turn.

2. Make AdventureWorks2016 the current database.

3. Run the script to create a table as a heap.


4. Run the script to create a table with a clustered index.

5. Run the script to SET STATISTICS ON. Run each set of select statements on both the heap, and the
clustered index.
6. Open HeapVsClustered_Timings.docx, and use the document to note the CPU times for each.

7. Run the script to select from each table.

8. Run the script to select from each table with the ORDER BY clause.
9. Run the script to select from each table with the WHERE clause.

10. Run the script to select from each table with both the WHERE clause and ORDER BY clause.

11. Run the script to insert data into each table.

12. Compare your results with the timings in the Solution folder.

13. If you have time, run the select statements again and Include Live Query Statistics.

14. Close SQL Server Management Studio, without saving any changes.

15. Close Wordpad.

Results: After completing this lab exercise, you will:

Understand the effect of adding a clustered index to a table.

Understand the performance difference between a clustered index and a heap.



Question: Which Query Store features will be most beneficial to your SQL Server
environment?

Question: In which situation might a heap be a better choice than a table with a clustered
index?

Question: Why is it sometimes quicker to retrieve records from a heap than a clustered
index with a simple SELECT statement?

Module Review and Takeaways


Indexes are one of the most important tools in improving the efficiency of queries, and the performance
of your database. However, we have seen that each index has a cost in terms of additional overhead when
inserting records into a table—in addition to maintenance as leaf-level pages are filled. Use FILL FACTOR
and PAD INDEX appropriately, depending on the structure of your table, and how much data will be
added.

Use the Query Store to understand how the most resource intensive queries are performing, and to take
corrective action before they become a problem. Because it stores historical query plans, you can compare
them over time to see when and why a plan has changed. After enabling the Query Store on your
databases, it automatically runs in the background, collecting run-time statistics and query plans; it also
categorizes queries, so it is easy to find those using the most resources, or the longest-running operations.
Query Store separates data into time windows, so you can uncover database usage patterns over time.

Best Practice: Understand how queries are being executed using the estimated and actual
execution plans, in addition to using Query Store. When you need to optimize a query, you will
then be well prepared and have a good understanding of how SQL Server executes your
Transact-SQL script.
7-1

Module 7
Columnstore Indexes
Contents:
Module Overview 7-1
Lesson 1: Introduction to Columnstore Indexes 7-2

Lesson 2: Creating Columnstore Indexes 7-7

Lesson 3: Working with Columnstore Indexes 7-12


Lab: Using Columnstore Indexes 7-17

Module Review and Takeaways 7-21

Module Overview
Introduced in Microsoft® SQL Server® 2012, columnstore indexes are used in large data warehouse
solutions by many organizations. This module highlights the benefits of using these indexes on large
datasets, the improvements made to columnstore indexes in the latest versions of SQL Server, and the
considerations needed to use columnstore indexes effectively in your solutions.

Objectives
After completing this module, you will be able to:

 Describe columnstore indexes and identify suitable scenarios for their use.

 Create clustered and nonclustered columnstore indexes.


 Describe considerations for using columnstore indexes.

Lesson 1
Introduction to Columnstore Indexes
This lesson provides an overview of the types of columnstore indexes available in SQL Server; the
advantages they have over equivalent row-based indexes; and the circumstances in which you should
consider using them. By the end of this lesson, you will see the potential cost savings to your business of
using a clustered columnstore index, purely in terms of gigabytes of disk storage.

Lesson Objectives
After completing this lesson, you will be able to:

 Explain the differences between rowstore and columnstore indexes.

 Describe the properties of nonclustered columnstore indexes.

 Describe the properties of a clustered columnstore index and how it differs from a nonclustered
columnstore index.

What are Columnstore Indexes?


Columnstore indexes reference data in a columnar
fashion, and use compression to reduce the disk
I/O when responding to queries.
Traditional rowstore tables are stored on disk in
pages; each page contains a number of rows and
includes all the associated columns with each row.
Columnstore indexes also store data in pages, but
they store all the column values in a page—so the
page consists of the same column of data from
multiple rows.

Consider a data warehouse containing fact tables


that are used to calculate aggregated data across
multiple dimensions. These fact tables might consist of many rows, perhaps numbering tens of millions.

Totaling Sales Orders by Product

Using a code example:

Totaling Sales Orders by Product

SELECT ProductID,
SUM(LineTotal) AS ProductTotalSales
FROM Sales.OrderDetail
GROUP BY ProductID
ORDER BY ProductID

With a row based index, the previous example would need to load into memory all the rows and columns
in all the pages, for all the products. With a column based index, the query only needs to load the pages
associated with the two referenced columns, ProductID and LineTotal. This makes columnstore indexes a
good choice for large data sets.

Using a columnstore index can improve the performance for a typical data warehouse query by up to 10
times. There are two key characteristics of columnstore indexes that impact this gain.
 Storage. Columnstore indexes store data in a compressed columnar data format instead of by row.
This makes it possible to achieve compression ratios up to seven times greater than a standard
rowstore table.
 Batch mode execution. Columnstore indexes process data in batches (of 1,000-row blocks) instead
of row by row. Depending on filtering and other factors, a query might also benefit from “segment
elimination,” which involves bypassing million-row chunks (segments) of data and further reducing
I/O.

Columnstore indexes perform well because:

 Columns often store matching data—for example, a set of States—enabling the database engine to
compress the data better. This compression can reduce or eliminate any I/O bottlenecks in your
system, while also reducing the memory footprint as a whole.

 High compression rates improve overall query performance because the results have a smaller in-
memory footprint.

 Batch execution also improves query performance: instead of processing individual rows, processing
is undertaken on multiple rows simultaneously, typically giving a performance improvement of around
two to four times.

 Aggregation queries often select only a few columns from a table, which also reduces the total I/O
required from the physical media.
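The compression gains described above can be observed directly. As a minimal sketch (the table name dbo.FactInternetSales is an illustrative assumption; substitute a columnstore table from your own database), the sys.column_store_segments catalog view reports the on-disk size of each column's segments:

```sql
-- Approximate on-disk size per column of a columnstore index.
-- dbo.FactInternetSales is an example table name; substitute your own.
SELECT c.name AS ColumnName,
       COUNT(s.segment_id) AS SegmentCount,
       SUM(s.on_disk_size) / 1048576.0 AS SizeMB
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
    ON p.hobt_id = s.hobt_id
JOIN sys.columns AS c
    ON c.object_id = p.object_id
   AND c.column_id = s.column_id
WHERE p.object_id = OBJECT_ID('dbo.FactInternetSales')
GROUP BY c.name
ORDER BY SizeMB DESC;
```

Comparing the totals from this query with the size reported by sp_spaceused for an equivalent rowstore table gives a rough measure of the compression ratio achieved.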
Nonclustered and clustered columnstore indexes are supported in Azure® SQL Database Premium Edition. For a full
list of the columnstore features available in different versions of SQL Server, see Microsoft Docs:

Columnstore Indexes Versioned Feature Summary


http://aka.ms/gj0fky

There are two types of columnstore indexes—nonclustered and clustered columnstore indexes—that both
function in the same way. The difference is that a nonclustered index will normally be a secondary index
created on top of a rowstore table; a clustered columnstore index will be the primary storage for a table.

Nonclustered Columnstore Indexes


Nonclustered columnstore indexes contain a copy
of part or all of the columns in an underlying
table. Because this kind of index is a copy of the
data, one of the disadvantages is that it will take
up more space than if you were using a rowstore
table alone.

A nonclustered columnstore index has the


following characteristics:
 It can include some or all of the columns in
the table.

 It can be combined with other rowstore


indexes on the same table.
7-4 Columnstore Indexes

 It is a full or partial copy of the data and takes more disk space than a rowstore table.

Columnstore Features by Version


In SQL Server 2014, tables with nonclustered columnstore indexes were read-only. You had to drop and
recreate the index to update the data – this restriction was removed in SQL Server 2016.

SQL Server 2016 also introduced support for filtered nonclustered columnstore indexes. This allows a
predicate condition to filter which rows are included in the index. Use this feature to create an index on
only the cold data of an operational workload. This will greatly reduce the performance impact of having
a columnstore index on an online transaction processing (OLTP) table.

Clustered Columnstore Indexes


Clustered columnstore indexes, like their rowstore
alternatives, optimize the arrangement of the
physical data on disk or in memory. In a
columnstore index, this will store all the columns
next to each other on disk; the structure of the
index dictates how the data is stored on disk.
To reduce the impact of fragmentation and to
improve performance, a clustered columnstore
index might use a deltastore. You can think of a
deltastore as a temporary b-tree table with rows
that a tuple-mover process moves into the
clustered columnstore index at an appropriate
time. This moving of row data is performed in the background. When the index is queried, SQL Server
automatically combines results from the columnstore and deltastore to ensure that the query receives the
correct results.

A clustered columnstore index has the following characteristics:


 It includes all of the columns in the table.

 It does not store the columns in a sorted order, but optimizes storage for compression and
performance.

 It can be updated.

New features
You can now have a nonclustered rowstore index on top of a clustered columnstore index, making it
possible to have efficient table seeks on an underlying columnstore. You can also enforce a primary key
constraint by using a unique rowstore index.

SQL Server 2016 introduced columnstore indexes on memory optimized tables—the most relevant use
being for real-time operational analytical processing.

SQL Server 2017 introduced support for non-persisted computed columns in nonclustered columnstore
indexes.

Note: You cannot include persisted computed columns, and you cannot create
nonclustered indexes on computed columns.

For further information, see:

Columnstore Indexes for Real-Time Operational Analytics


http://aka.ms/bpntqk

Demonstration: The Benefits of Using Columnstore Indexes


In this demonstration, you will see how to create a columnstore index.

Demonstration Steps
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running and then log
on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. Run D:\Demofiles\Mod07\Setup.cmd as an administrator to revert any changes.

3. In the User Account Control dialog box, click Yes. When the script completes, press any key to close
the window.

4. On the taskbar, click Microsoft SQL Server Management Studio.


5. In the Connect to Server window, in the Server name box, type MIA-SQL. Ensure Windows
Authentication is selected in the Authentication box, and then click Connect.

6. On the File menu, point to Open, and then click File.

7. In the Open File dialog box, navigate to the D:\Demofiles\Mod07\Demo\Demo.sql script file, and
then click Open.

8. Select the code under the Step 1 comment, and then click Execute.
9. Select the code under the Step 2 comment, and then click Execute.

10. Select the code under the Step 3 comment, and then click Execute.

11. Select the code under the Step 1 comment, and then click Execute.
12. Close Microsoft SQL Server Management Studio without saving changes.

Categorize Activity
Categorize each index property into the appropriate index type. Indicate your answer by writing the
category number to the right of each property.

Items

1 Perform best when seeking specific data.

2 A high degree of compression is possible, due to data being of the same category.

3 Can greatly improve the performance of database queries.

4 Implemented as a b-tree index structure.

5 Perform best when aggregating data.

6 Can be stored in memory optimized tables.

Categories

Category 1: Rowstore Index

Category 2: Columnstore Index

Category 3: Applies to both types of index

Lesson 2
Creating Columnstore Indexes
This lesson shows you the techniques required to create columnstore indexes on a table. You will see how
to quickly create indexes by using Transact-SQL, or the SQL Server Management Studio user interface.

Lesson Objectives
After completing this lesson, you will be able to:

 Create nonclustered columnstore indexes.

 Create clustered columnstore indexes.

 Create tables that use both columnstore and rowstore indexes.

Creating a Nonclustered Columnstore Index


You can create a nonclustered columnstore index
by using a Transact-SQL statement or by using
SQL Server Management Studio (SSMS).

Transact-SQL
To create a nonclustered columnstore index, use
the CREATE NONCLUSTERED COLUMNSTORE
INDEX statement, as shown in the following code
example:

Creating a Nonclustered Columnstore Index

CREATE NONCLUSTERED COLUMNSTORE INDEX


NCCSIX_FactInternetSales
ON FactInternetSales (
OrderQuantity
,UnitPrice
,DiscountAmount);

A nonclustered index does not need to include all the columns from the underlying table. In the
preceding example, only three columns are included in the index.

You can also restrict nonclustered indexes to a subset of the rows contained in a table:

Example of a filtered nonclustered columnstore index

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCSIX_FactInternetSales
ON FactInternetSales (
OrderQuantity
,UnitPrice
,DiscountAmount)
WHERE ShipDate < '2013-01-01';

The business reason for wanting to limit a columnstore index to a subset of rows is that it’s possible to use
a single table for both OLTP and analytical processing. In the preceding example, the index supports
analytical processing on historical orders that shipped before 2013.

SQL Server Management Studio


You can also use Object Explorer in SSMS to create columnstore indexes:

1. In Object Explorer, expand Databases.

2. Expand the required Database; for example, AdventureWorksDW.

3. Expand Tables, and then expand the required table; for example, FactFinance.
4. Right-click Indexes, point to New Index, and then click Nonclustered Columnstore Index.

5. Add at least one column to the index, and then click OK.

Creating a Clustered Columnstore Index


Similar to the nonclustered columnstore indexes, a
clustered columnstore index is created by using
Transact-SQL or SSMS. The main difference
between the declaration of a clustered index and
nonclustered index is that the former must contain
all the columns in the table being indexed.

Transact-SQL
To create a clustered columnstore index, use the
CREATE CLUSTERED COLUMNSTORE INDEX
statement as shown in the following code
example:

Creating a Clustered Columnstore Index

CREATE CLUSTERED COLUMNSTORE INDEX CCSIX_FactSalesOrderDetails


ON FactSalesOrderDetails;

An optional parameter on a CREATE statement for a clustered index is DROP_EXISTING. You can use this
to rebuild an existing clustered columnstore index or to convert an existing rowstore table into a
columnstore table.

Note: To use the DROP_EXISTING option, the new columnstore index must have the same
name as the index it is replacing.

The following example creates a clustered rowstore table, and then converts it into a clustered
columnstore table:

Converting a Rowstore Table to a Columnstore Table

CREATE TABLE ExampleFactTable (


ProductKey [int] NOT NULL,
OrderDateKey [int] NOT NULL,
DueDateKey [int] NOT NULL,
ShipDateKey [int],
CostPrice [money] NOT NULL);
GO
CREATE CLUSTERED INDEX CCI_ExampleFactTable ON ExampleFactTable (ProductKey);
GO
CREATE CLUSTERED COLUMNSTORE INDEX CCI_ExampleFactTable ON ExampleFactTable
WITH (DROP_EXISTING = ON);
GO

Common Issues and Troubleshooting Tips


Common Issue: Unable to create a columnstore index on an Azure SQL Database.

Troubleshooting Tip: Ensure you are using at least V12 of an Azure SQL Database. The pricing tier of the
database also has to be a minimum of Premium.

SQL Server Management Studio


You can also create a clustered columnstore index by using SSMS:

1. In Object Explorer, expand Databases.


2. Expand the required Database; for example, AdventureWorksDW.

3. Expand Tables, and then expand the required table; for example, FactFinance.

4. Right-click Indexes, point to New Index, and then click Clustered Columnstore Index.

5. Click OK.

Note: You don’t need to select columns to create a clustered columnstore index, because
all the columns of a table must be included.

Creating a Clustered Columnstore Table with Primary and Foreign Keys


In SQL Server 2014, clustered columnstore indexes
are limited in several ways, one of the most
significant being that you cannot have any other
index on the clustered columnstore table. This
essentially means that, in SQL Server 2014, you
cannot have a primary key, foreign keys, or unique
value constraints, on a clustered columnstore
table.

These limitations were removed in SQL Server


2016—these features are all supported in SQL
Server 2016 and SQL Server 2017.

This is an example of creating a table with both a


primary key and a clustered columnstore index:

Create a table with a primary key and columnstore index

CREATE TABLE ColumnstoreAccount (


accountkey INT NOT NULL,
Accountdescription NVARCHAR(50),
accounttype NVARCHAR(50),
unitsold INT,
CONSTRAINT PK_NC_ColumnstoreAccount PRIMARY KEY NONCLUSTERED(accountkey ASC),
INDEX CCI_columnstore_account CLUSTERED COLUMNSTORE);

After you create the table, you can add a foreign key constraint:

Add a foreign key constraint

ALTER TABLE ColumnstoreAccount WITH CHECK ADD CONSTRAINT FK_ColumnstoreAccount_DimAccount


FOREIGN KEY(AccountKey)
REFERENCES DimAccount (AccountKey)

ALTER TABLE ColumnstoreAccount CHECK CONSTRAINT FK_ColumnstoreAccount_DimAccount


GO

Check the table—it shows that two indexes and two keys exist:
 CCI_columnstore_account: a clustered columnstore index.

 PK_NC_ColumnstoreAccount: a unique nonclustered rowstore index.

 FK_ColumnstoreAccount_DimAccount: a foreign key to the DimAccount table.

 PK_NC_ColumnstoreAccount: the primary key.

The previous Transact-SQL results in a table with a clustered columnstore index, plus a nonclustered
rowstore index that enforces the primary key constraint.

Demonstration: Creating Columnstore Indexes Using SQL Server


Management Studio
In this demonstration, you will see how to:

 Create a nonclustered columnstore index using SSMS.

 Create a clustered columnstore index using SSMS.

Demonstration Steps
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running and then log
on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. On the taskbar, click Microsoft SQL Server Management Studio.

3. In the Connect to Server window, in the Server name box, type MIA-SQL. Ensure that Windows
Authentication is selected in the Authentication box, and then click Connect.
4. In Object Explorer, expand Databases, expand AdventureWorksDW, expand Tables, and then
expand dbo.AdventureWorksDWBuildVersion.

5. Right-click Indexes, point to New Index, and then click Clustered Columnstore Index.

6. In the New Index dialog box, click OK to create the index.


7. In Object Explorer, expand Indexes to show the new clustered index.

8. In Object Explorer, expand dbo.FactResellersSales.

9. Right-click Indexes, point to New Index, and then click Non-Clustered Columnstore Index.
10. In the Columnstore columns table, click Add.

11. Select the SalesOrderNumber, UnitPrice, and ExtendedAmount check boxes, and then click OK.

12. In the New Index dialog box, click OK.

13. In Object Explorer, expand Indexes to show the created nonclustered index.

14. Close Microsoft SQL Server Management Studio without saving.

Question: How will you create your indexes in a database—with SSMS or Transact-SQL?

Lesson 3
Working with Columnstore Indexes
When working with columnstore indexes, you should consider fragmentation and how SQL Server
processes the insertion of data into the index. From SQL Server 2016, columnstore tables can be created in
memory. This makes real-time operational analytics possible.

Lesson Objectives
After completing this lesson, you will be able to:

 Efficiently add data into a columnstore table.

 Check the fragmentation of an index and choose the best approach to resolving the fragmentation.

 Create a memory optimized table to support real-time operational analytics.

Managing Columnstore Indexes


Columnstore indexes have similar management
considerations as rowstore indexes—however,
special consideration must be given to DML
operations.

For data to be manipulated and changed in a


columnstore table, SQL Server manages a
deltastore. The deltastore collects up to 1,048,576
rows before compressing them into the
compressed rowgroup, and then marking that
rowgroup as closed. The tuple-mover background
process then adds the closed rowgroup back into
the columnstore index.

Bypassing the Deltastore


To ensure that data is inserted directly into the columnstore, you should load the data in batches of
between 102,400 and 1,048,576 rows. This bulk data loading is performed through normal insertion
methods, including using the bcp utility, SQL Server Integration Services, and Transact-SQL insert
statements from a staging table.

When bulk loading data, you have the following options for optimizations:

 Parallel Load: perform multiple concurrent bulk imports (bcp or bulk insert), each loading separate
data.

 Log Optimization: the bulk load will be minimally logged when the data is loaded into a compressed
rowgroup. Minimal logging is not available when loading data with a batch size of less than 102,400
rows.
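As one hedged example of such a bulk load (the file path, terminators, and table name are assumptions for illustration), a BULK INSERT statement with a batch size at or above the 102,400-row threshold loads rows straight into compressed rowgroups:

```sql
-- BATCHSIZE >= 102400 lets each batch bypass the deltastore and be
-- compressed directly into the columnstore; 1048576 matches the maximum
-- rowgroup size. File path and table name are illustrative.
BULK INSERT dbo.FactSalesOrderDetails
FROM 'D:\Data\SalesOrderDetails.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    BATCHSIZE = 1048576
);
```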

Index Fragmentation
When it comes to management, columnstore indexes
are no different from rowstore indexes. The utilities
and techniques used to keep an index healthy are
the same.

Over time, and after numerous DML operations,


indexes can become fragmented and their
performance degrades. SSMS provides Transact-
SQL commands and user interface elements to
help you manage this fragmentation.

SQL Server Management Studio


You can examine the level of fragmentation by
following these steps:

1. In Object Explorer, expand Databases.

2. Expand the required Database; for example, AdventureWorksDW.

3. Expand Tables, and then expand the required table; for example, FactFinance.

4. Expand Indexes, right-click on the desired index, and in the context menu, click Properties.

5. In the Select a page panel, click Fragmentation.

Transact-SQL
SQL Server provides dynamic management views and functions that make it possible for a database
administrator to inspect and review the health of indexes.
One of these functions is sys.dm_db_index_physical_stats, which can be run against all the databases on a
server, a specific table in a database, or even a specific index.

The following code sample shows a useful query that joins the results from the
sys.dm_db_index_physical_stats function with the sys.indexes system view, and then returns the
fragmentation and names of the indexes for a specific database:

Show indexes with fragmentation greater than five percent for a specific database

SELECT DB_NAME(database_id) AS 'Database'


, OBJECT_NAME(dm.object_id) AS 'Table'
, si.name AS 'Index'
, dm.index_type_desc
, dm.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats
(DB_ID(N'AdventureworksDW'), NULL, NULL, NULL, 'DETAILED') dm
JOIN sys.indexes si ON si.object_id = dm.object_id AND si.index_id = dm.index_id
WHERE avg_fragmentation_in_percent > 5
ORDER BY avg_fragmentation_in_percent DESC;

When you identify that an index requires maintenance, there are two options available in SQL Server: you
can either rebuild or reorganize it. The previous guidance for this was:

 If the fragmentation is between five percent and 30 percent: Reorganize.


 If the fragmentation is greater than 30 percent: Rebuild.

However, the reorganizing of columnstore indexes is enhanced in SQL Server 2016 and SQL Server 2017;
therefore, it is rarely necessary to rebuild an index.

Use the following Transact-SQL to reorganize an index:

Transact-SQL to reorganize an index online

ALTER INDEX CCI_columnstore_account ON ColumnstoreAccount


REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON)
GO

ALTER INDEX CCI_columnstore_account ON ColumnstoreAccount REORGANIZE


GO

The first statement in this code sample adds deltastore rowgroups into the columnstore index. Using the
COMPRESS_ALL_ROW_GROUPS option forces all open and closed rowgroups into the index, in a similar
way to rebuilding an index. After the query adds these deltastore rowgroups to the columnstore, the
second statement then combines these, possibly smaller, rowgroups into one or more larger rowgroups.
With a large number of smaller rowgroups, performing the reorganization a second time will improve the
performance of queries against the index. Using these statements in SQL Server 2016 or SQL Server 2017
means that, in most situations, you no longer need to rebuild a columnstore index.

Note: Rebuilding an index will mean SQL Server can move the data in the index between
segments to achieve better overall compression. If a large number of rows are deleted, and the
index fragmentation is more than 30 percent, rebuilding the index may be the best option, rather
than reorganizing.
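Where a rebuild is warranted, the syntax mirrors the reorganize statements above (index and table names reuse the earlier ColumnstoreAccount example):

```sql
-- Rebuilding recompresses all rowgroups, removing deleted rows and
-- redistributing data across segments for better overall compression.
ALTER INDEX CCI_columnstore_account ON ColumnstoreAccount
REBUILD;
GO
```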

For more information on all the available views and functions, see:

http://aka.ms/vched5

Columnstore Indexes and Memory Optimized Tables


In SQL Server 2016 and SQL Server 2017, you can
now have both a clustered columnstore index and
rowstore index on an in-memory table. The
combination of these indexes on a table enables
real-time operational analytics and all of its
associated benefits, including:
 Simplicity. No need to implement an ETL
process to move data into reporting tables.
Analytics can run directly on the operational
data.

 Reduced costs. Removes the need to develop


and support ETL processes. The associated
hardware and infrastructure to support the ETL are no longer necessary.

 Zero data latency. Data is analyzed in real time. There are no background or schedule processes to
move data to enable analytics to be completed.
The combination of indexes enables analytical queries to run against the columnstore index and OLTP
operations to run against the OLTP b-tree indexes. The OLTP workloads will continue to perform, but you
may incur some additional overhead when maintaining the columnstore index.

For more information on Real-Time Operational Analytics, see:


http://aka.ms/bpntqk

As with other similar in-memory tables, you must declare the indexes on memory optimized columnstore
tables at creation. To support larger datasets—for example, those used in data warehouses—the size of
in-memory tables has increased from a previous limit of 256 GB to 2 TB in SQL Server 2016 and SQL
Server 2017.

The Transact-SQL to create an in-memory table is simple. Add WITH (MEMORY_OPTIMIZED = ON) at the
end of a table declaration.

Transact-SQL to create a columnstore in-memory table.

CREATE TABLE InMemoryAccount (


accountkey int NOT NULL,
accountdescription nvarchar (50),
accounttype nvarchar(50),
unitsold int,
CONSTRAINT [PK_NC_InMemoryAccount] PRIMARY KEY NONCLUSTERED([accountkey]
ASC),
INDEX CCI_InMemoryAccount CLUSTERED COLUMNSTORE)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO

Note: The above Transact-SQL will not work on a database without a memory optimized
file group. Before creating any memory optimized tables, the database must have a memory
optimized file group associated with it.

The following is an example of the code required to create a memory optimized filegroup:

Adding a memory optimized filegroup


ALTER DATABASE AdventureWorksDW
ADD FILEGROUP AdventureWorksDW_Memory_Optimized_Data CONTAINS MEMORY_OPTIMIZED_DATA
GO

ALTER DATABASE AdventureWorksDW ADD


FILE (name='AdventureworksDW_MOD', filename='D:\AdventureworksDW_MOD')
TO FILEGROUP AdventureWorksDW_Memory_Optimized_Data
GO

You can alter and join this new in-memory table in the same way as its disk based counterpart. However,
you should use caution when altering an in-memory table, because this is an offline task. There should be
twice the memory available to store the current table—a temporary copy of the table is built, and then
switched in when the rebuild completes.

Depending on the performance requirements, you can control the durability of the table by using:

 SCHEMA_ONLY: creates a nondurable table.

 SCHEMA_AND_DATA: creates a durable table; this is the default if no option is supplied.


If SQL Server restarts, a schema-only table loses all of its data. Temporary tables, or transient data, are
examples of where you might use a nondurable table. Because SCHEMA_ONLY durability avoids both
transaction logging and checkpoints, I/O operations can be reduced significantly.
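For example, a nondurable staging table might be declared as follows (the table and column names are illustrative assumptions):

```sql
-- SCHEMA_ONLY: the table definition survives a restart, but the rows do not.
-- No transaction logging or checkpoint I/O is incurred for this table.
CREATE TABLE dbo.StagingSessionData (
    SessionKey INT NOT NULL PRIMARY KEY NONCLUSTERED,
    Payload NVARCHAR(100)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
GO
```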

Introduced in SQL Server 2014, the Memory Optimization Advisor is a GUI tool inside SSMS. The tool will
analyze an existing disk based table and warn if there are any features of that table—for example, an
unsupported type of index—that aren’t possible on a memory-optimized table. It can then migrate the
data contained in the disk based table to a new memory-optimized table. The Memory Optimization
Advisor is available on the context menu of any table in Management Studio.

Check Your Knowledge


Question

When would you consider converting a


rowstore table, containing dimension
data in a data warehouse, to a
columnstore table?

Select the correct answer.

When mission critical analytical


queries join one or more fact tables to
the dimension table, and those fact
tables are columnstore tables.

When the data contained in the


dimension table has a high degree of
randomness and uniqueness.

When the dimension table has very


few rows.

When the dimension table has many


millions of rows, with columns
containing small variations in data.

It is never appropriate to convert a


dimension table to a columnstore
table.

Lab: Using Columnstore Indexes


Scenario
Adventure Works has created a data warehouse for analytics processing of its current online sales
business. Due to large business growth, existing analytical queries are no longer performing as required.
Disk space is also becoming more of an issue.

You have been tasked with optimizing the existing database workloads and, if possible, reducing the
amount of disk space being used by the data warehouse.

Objectives
After completing this lab, you will be able to:

 Create clustered and nonclustered columnstore indexes.

 Examine an execution plan to check the performance of queries.

 Convert disk based tables into memory optimized tables.

Lab Setup
Estimated Time: 45 minutes

Virtual machine: 20762C-MIA-SQL

User name: ADVENTUREWORKS\Student


Password: Pa55w.rd

Dropping and recreating indexes can take time, depending on the performance of the lab machines.

Exercise 1: Create a Columnstore Index on the FactProductInventory Table


Scenario
You plan to improve the performance of the AdventureWorksDW data warehouse by using columnstore
indexes. You need to improve the performance of queries that use the FactProductInventory tables
without causing any database downtime, or dropping any existing indexes. Disk usage for this table is not
an issue.

You must retain the existing indexes on the FactProductInventory table, and ensure you do not impact
current applications by any alterations you make.

The main tasks for this exercise are as follows:

1. Prepare the Lab Environment

2. Examine the Existing Size of the FactProductInventory Table and Query Performance

3. Create a Columnstore Index on the FactProductInventory Table

 Task 1: Prepare the Lab Environment


1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are both running, and then
log on to MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. In the D:\Labfiles\Lab07\Starter folder, run Setup.cmd as Administrator.



 Task 2: Examine the Existing Size of the FactProductInventory Table and Query
Performance
1. In SQL Server Management Studio, in the D:\Labfiles\Lab07\Starter folder, open the Query
FactProductInventory.sql script file.

2. Configure SQL Server Management Studio to include the actual execution plan.

3. Execute the script against the AdventureWorksDW database. Review the execution plan, making a
note of the indexes used, the execution time, and disk space used.

 Task 3: Create a Columnstore Index on the FactProductInventory Table


1. Based on the scenario for this exercise, decide whether a clustered or nonclustered columnstore index
is appropriate for the FactProductInventory table.

2. Create the required columnstore index. Re-execute the query to verify that the new columnstore
index is used, along with existing indexes.

3. What, if any, are the disk space and query performance improvements?

Results: After completing this exercise, you will have created a columnstore index and improved the
performance of an analytical query. This will have been done in real time without impacting transactional
processing.

Exercise 2: Create a Columnstore Index on the FactInternetSales Table


Scenario
You need to improve the performance of queries that use the FactInternetSales table. The table has also
become large and there are concerns over the disk space being used. You can use scheduled downtime to
amend the table and its existing indexes.
Due to existing processing requirements, you must retain the foreign keys on the FactInternetSales table,
but you can add any number of new indexes to the table.

The main tasks for this exercise are as follows:

1. Examine the Existing Size of the FactInternetSales Table and Query Performance

2. Create a Columnstore Index on the FactInternetSales Table

 Task 1: Examine the Existing Size of the FactInternetSales Table and Query
Performance
1. In SQL Server Management Studio, in the D:\Labfiles\Lab07\Starter folder, open the Query
FactInternetSales.sql script file.

2. Configure SQL Server Management Studio to include the actual execution plan.

3. Execute the script against the AdventureWorksDW database. Review the execution plan, making a
note of the indexes used, the execution time, and disk space used.

 Task 2: Create a Columnstore Index on the FactInternetSales Table


1. Based on the scenario for this exercise, decide whether a clustered or nonclustered columnstore index
is appropriate for the FactInternetSales table.

2. Create the required columnstore index. Depending on your chosen index, you may need to drop and
recreate keys on the table.

3. Re-execute the query to verify that the new columnstore index is used, along with the existing
indexes.
4. What, if any, are the disk space and query performance improvements?

Results: After completing this exercise, you will have greatly reduced the disk space taken up by the
FactInternetSales table, and improved the performance of analytical queries against the table.

Exercise 3: Create a Memory Optimized Columnstore Table


Scenario
Due to the improved performance and reduced disk space that columnstore indexes provide, you have
been tasked with taking the FactInternetSales table from disk and into memory.

The main tasks for this exercise are as follows:

1. Use the Memory Optimization Advisor

2. Enable the Memory Optimization Advisor to Create a Memory Optimized FactInternetSales Table

3. Examine the Performance of the Memory Optimized Table

 Task 1: Use the Memory Optimization Advisor


1. In SQL Server Management Studio, run the Memory Optimization Advisor on the
FactInternetSales table.
2. Note that there are several issues that need to be resolved before the Memory Optimization Advisor
can automatically convert the table.

 Task 2: Enable the Memory Optimization Advisor to Create a Memory Optimized


FactInternetSales Table
1. Using either SQL Server Management Studio, or Transact-SQL statements, drop all the foreign keys
and the clustered columnstore index.
2. Memory optimized tables cannot have more than eight indexes. Choose another three indexes to
drop.

Note: Hint: consider the rows being used in the Query FactInternetSales.sql to guide your
decision.

3. Use Memory Optimization Advisor on the FactInternetSales table.

4. Instead of running the migration with the wizard, script the results for the addition of a columnstore
index.

Note: The Memory Optimization Advisor won’t suggest columnstore indexes as they are
not applicable in all situations. Therefore, these have to be added manually.

5. Note the statements to create a memory optimized filegroup, and the code to copy the existing data.

6. Add a clustered columnstore index to the create table script.

7. Run the edited Transact-SQL.
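
The edited script from steps 4 to 6 follows this general pattern. Everything below is an illustrative sketch only — the Memory Optimization Advisor generates the real script, and the table definition here is heavily abbreviated:

```sql
-- Sketch of the advisor-generated pattern, plus the manually added
-- clustered columnstore index (step 6). Names and columns are illustrative.
ALTER DATABASE AdventureWorksDW
ADD FILEGROUP AdventureWorksDW_mod CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE AdventureWorksDW
ADD FILE (NAME = 'AdventureWorksDW_mod', FILENAME = 'D:\Data\AdventureWorksDW_mod')
TO FILEGROUP AdventureWorksDW_mod;
GO

CREATE TABLE dbo.FactInternetSales_MO
(
    SalesOrderNumber NVARCHAR(20) NOT NULL,
    SalesOrderLineNumber TINYINT NOT NULL,
    ProductKey INT NOT NULL,
    OrderDateKey INT NOT NULL,
    SalesAmount MONEY NOT NULL,
    CONSTRAINT PK_FactInternetSales_MO
        PRIMARY KEY NONCLUSTERED (SalesOrderNumber, SalesOrderLineNumber),
    INDEX CCI_FactInternetSales CLUSTERED COLUMNSTORE   -- added manually
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- The advisor also scripts an INSERT...SELECT to copy the existing rows:
INSERT INTO dbo.FactInternetSales_MO (SalesOrderNumber, SalesOrderLineNumber,
    ProductKey, OrderDateKey, SalesAmount)
SELECT SalesOrderNumber, SalesOrderLineNumber, ProductKey, OrderDateKey, SalesAmount
FROM dbo.FactInternetSales;
```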


7-20 Columnstore Indexes

 Task 3: Examine the Performance of the Memory Optimized Table


1. In SQL Server Management Studio, in the D:\Labfiles\Lab07\Starter folder, open the Query
FactProductInventory.sql script file.

2. Configure SQL Server Management Studio to include the actual execution plan.

3. Execute the script against the AdventureWorksDW database, and then review the disk space used
and the execution plan.

Results: After completing this exercise, you will have created a memory optimized version of the
FactInternetSales disk-based table, using the Memory Optimization Advisor.

Question: Why do you think the disk space savings were so large for the disk-based
clustered columnstore index?
Developing SQL Databases 7-21

Module Review and Takeaways


Best Practice: Introduced in SQL Server 2012, columnstore indexes are used in large data
warehouse solutions by many organizations. This module highlighted the benefits of using these
indexes on large datasets; the improvements made to columnstore indexes in SQL Server 2016;
and the considerations needed to use columnstore indexes effectively in your solutions.

Module 8
Designing and Implementing Views
Contents:
Module Overview 8-1
Lesson 1: Introduction to Views 8-2

Lesson 2: Creating and Managing Views 8-9

Lesson 3: Performance Considerations for Views 8-18


Lab: Designing and Implementing Views 8-22

Module Review and Takeaways 8-25

Module Overview
This module describes the design and implementation of views. A view is a special type of query—one
that is stored and can be used in other queries, just like a table. With a view, only the query definition is
stored on disk, not the result set. The only exception is an indexed view, where the result set is also
stored on disk, just like a table.
Views simplify the design of a database by providing a layer of abstraction, and hiding the complexity of
table joins. Views are also a way of securing your data by giving users permissions to use a view, without
giving them permissions to the underlying objects. This means data can be kept private, and can only be
viewed by appropriate users.

Objectives
After completing this module, you will be able to:

 Understand the role of views in database design.


 Create and manage views.

 Understand the performance considerations with views.



Lesson 1
Introduction to Views
In this lesson, you will explore the role of views in the design and implementation of a database. You will
also investigate the system views supplied with Microsoft® SQL Server data management software.

Lesson Objectives
After completing this lesson, you will be able to:

 Describe a view.

 Describe the different types of views in SQL Server®.

 Explain the benefits of using views.

 Work with dynamic management views.

 Work with other types of system views.
A view is a named SELECT query that produces a result set for a particular purpose. Unlike the underlying
tables that hold data, a view is not part of the physical schema. Views are dynamic, virtual tables that
display specific data from tables.

The data returned by a view might filter the table data, or perform operations on the table data to make it
suitable for a particular need. For example, you might create a view that produces data for reporting, or a
view that is relevant to a specific group of users. The effective use of views in database design improves
performance, security, and manageability of data.

In this lesson, you will learn about views, the different types of views, and how to use them.

What Is a View?
A view is a stored query expression. The query
expression defines what the view will return; it is
given a name, and is stored ready for use when
the view is referenced. Although a view behaves
like a table, it does not store any data. So a view
object takes up very little space—the data that is
returned comes from the underlying base tables.

Views are defined by using a SELECT statement.


They are named—so the definition can be stored
and referenced. They can be referenced in SELECT
statements; and they can reference other views.

Filter Data Using Views


Views can filter the base tables by limiting the columns that a view returns. For example, an application
might show a drop-down list of employee names. This data could be retrieved from the Employee table;
however, not all the columns in the Employee table might be suitable for including in a selection box. By
creating a view, you can limit the returned columns to only those that are necessary, and only those that
users are permitted to see. This is known as vertical filtering.

Horizontal filtering limits the rows that a view returns. For example, a Sales table might hold details of
sales for an organization, but sales staff are only permitted to view sales for their own region. You could
create a view that returns only the rows for a particular state or region.
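
Both filtering styles are easy to express in Transact-SQL. The following sketch uses illustrative object and column names (an Employee table, and a SalesOrderHeader table that is assumed to have a State column), not necessarily objects that exist in the course databases:

```sql
-- Vertical filtering: return only a subset of the table's columns.
CREATE VIEW HumanResources.vwEmployeeNames AS
SELECT FirstName, MiddleName, LastName
FROM HumanResources.Employee;
GO

-- Horizontal filtering: return only the rows for one region.
CREATE VIEW Sales.vwWashingtonSales AS
SELECT SalesOrderID, CustomerID, TotalDue
FROM Sales.SalesOrderHeader
WHERE State = 'WA';   -- State is an illustrative column
GO
```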

When to Use a View


Views can be used to simplify the way data is presented, hiding complex relationships between tables.
Views can also be used to prevent unauthorized access to data. If a user does not have permissions to see
salary information, for example, a view can be created that displays data without the salary data.
Appropriate groups of users can then be given permissions for that view.

Types of Views
There are two main groups of views: user-defined
views that you create and manage in a database,
and system views that SQL Server manages.

User-defined Views
You can create three types of user-defined views:

 Views or standard views. A view combines


data from one or more base tables, or views.
Any computations, such as joins or
aggregations, are performed during query
execution for each query that references the
view. This is sometimes called a standard view,
or a nonindexed view.
 Indexed views. An indexed view stores data by creating a clustered index on the view. By indexing
the view, the data is stored on disk and so can be retrieved more quickly in future. This can
significantly improve the performance of some queries, including those that aggregate a large
number of rows, or where tables are joined and the results stored. If the underlying data changes
frequently, however, an indexed view is less likely to be suitable. Indexed views are discussed later in
this module.
 Partitioned views. Partitioned views combine data from two or more tables by using the UNION ALL
operator. Rows from one table are combined with the rows from one or more other tables into a
single view. The column structures must match, and CHECK constraints on the underlying tables
enforce which rows belong to which table. Local partitioned views include tables from the same SQL
Server instance, and distributed partitioned views include tables that are located on different servers.
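
The two less common user-defined view types can be sketched in Transact-SQL. The names used below (Sales.vwSalesByProduct, dbo.Sales2014, and so on) are illustrative, not objects that exist in the course databases:

```sql
-- Indexed view: WITH SCHEMABINDING is mandatory, and an aggregated indexed
-- view must include COUNT_BIG(*). The unique clustered index materializes
-- the view's result set on disk.
CREATE VIEW Sales.vwSalesByProduct
WITH SCHEMABINDING
AS
SELECT ProductID,
       SUM(OrderQty) AS TotalQuantity,
       COUNT_BIG(*) AS OrderCount
FROM Sales.SalesOrderDetail
GROUP BY ProductID;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vwSalesByProduct
ON Sales.vwSalesByProduct (ProductID);
GO

-- Local partitioned view: each base table carries a CHECK constraint
-- (for example, CHECK (SalesYear = 2014) on dbo.Sales2014) so that
-- SQL Server knows which table holds which rows.
CREATE VIEW dbo.vwAllSales AS
SELECT SalesID, SalesYear, Amount FROM dbo.Sales2014
UNION ALL
SELECT SalesID, SalesYear, Amount FROM dbo.Sales2015
UNION ALL
SELECT SalesID, SalesYear, Amount FROM dbo.Sales2016;
GO
```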

System Views
In addition, there are different types of system views, including:
 Dynamic management views (DMVs) provide dynamic state information, such as data about the
current session or the queries that are currently executing.

 System catalog views provide information about the state of the SQL Server Database Engine.
 Compatibility views are provided for backwards compatibility and replace the system tables used by
previous versions of SQL Server.

 Information schema views provide internal, table-independent metadata that comply with the ISO
standard definition for the INFORMATION_SCHEMA.

Advantages of Views
Views have a number of benefits in your database.

Simplify
Views simplify the complex relationships between
tables by showing only relevant data. Views help
users to focus on a subset of data that is relevant
to them, or that they are permitted to work with.
Users do not need to see the complex queries that
are often involved in creating the view; they work
with the view as if it were a single table.

Security
Views provide security by permitting users to see
only what they are authorized to see. You can use views to limit the access to certain data sets. By only
including data that users are authorized to see, private data is kept private. Views are widely used as a
security mechanism by giving users access to data through the view, but not granting permissions to the
underlying base tables.

Provide an Interface
Views can provide an interface to the underlying tables for users and applications. This provides a layer of
abstraction in addition to backwards compatibility if the base tables change.
Many external applications cannot execute stored procedures or Transact-SQL code, but can select
data from tables or views. By creating a view, you can isolate the data that is needed.
Creating a view as an interface makes it easier to maintain backwards compatibility. Providing the
view still works, the application will work—even if changes have been made to the underlying
schema. For example, if you split a Customer table into two, CustomerGeneral and CustomerCredit, a
Customer view can make it appear that the Customer table still exists, allowing existing applications
to query the data without modifications.

Format Data for Reporting


Correctly formatted data can be provided to reporting applications, thereby removing the need for
complex queries at the application layer.

Reporting applications often need to execute complex queries to retrieve the report data. Rather than
embedding this logic in the reporting application, a view can supply the data in the format required
by the reporting application.

Dynamic Management Views


Dynamic management views (DMVs) provide
information about the internal state of a SQL
Server database or server. DMVs, together with
dynamic management functions (DMFs), are
known as dynamic management objects (DMOs).
The key difference between DMVs and DMFs is
that DMFs take parameters.

All DMOs have the name prefix sys.dm_ and are stored in the sys schema. As their name implies, they
return dynamic results, based on the current state of the object that you are querying.

You can view a complete list of DMVs under the System Views node for a database in Object Explorer.
DMFs are listed by category under the System Functions node.
You can use DMOs to view and monitor the internal health and performance of a server, along with
aspects of its configuration. They also have an important role in assisting with troubleshooting problems,
such as blocking issues, and with performance tuning.

Note: The schema and data returned by DMVs and DMFs may change in future releases of
SQL Server, impacting forward compatibility. It is recommended that you explicitly define the
columns of interest in SELECT statements, rather than using SELECT *, to ensure the expected
number and order of columns are returned.
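
For example, this query against the sys.dm_exec_sessions DMV names its columns explicitly, as the note recommends:

```sql
-- Lists current user sessions; the explicit column list protects the query
-- against schema changes in future SQL Server releases.
SELECT session_id, login_name, status, cpu_time, memory_usage
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;
```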

Other System Views


In addition to DMOs, SQL Server provides catalog
views, compatibility views, and information
schema views that you can use to access system
information.

Catalog Views
SQL Server exposes information relating to
database objects through catalog views. Catalog
views provide metadata that describes both user
created database objects, and SQL Server system
objects. For example, you can use catalog views to
retrieve metadata about tables, indexes, and other
database objects.

This sample code uses the sys.tables catalog view, together with the OBJECTPROPERTY system function,
to retrieve all the tables that have IDENTITY columns. The sys.tables catalog view returns a row for each
user table in the database.

Using the sys.tables Catalog View

USE AdventureWorks2016;
GO

SELECT SCHEMA_NAME(schema_id) AS 'Schema', name AS 'Table'


FROM sys.tables
WHERE OBJECTPROPERTY(object_id,'TableHasIdentity') = 1
ORDER BY 'Schema', 'Table';
GO

Catalog views are categorized by their functionality. For example, object catalog views report on object
metadata.

Some catalog views inherit from others. For example, sys.views and sys.tables inherit from sys.objects.
That is, sys.objects returns metadata for all user-defined database objects, whereas sys.views returns a
row for each user-defined view, and sys.tables returns a row for each user-defined table.

Note: Catalog views gain new information with each release of SQL Server. Use
SELECT * FROM sys.objects, for example, to see every column that a catalog view currently returns.

For more information about catalog views and the different categories of catalog views, see Microsoft
Docs:

System Catalog Views (Transact-SQL)


http://aka.ms/lfswev

Compatibility Views
Before catalog views were introduced in SQL Server 2005, you would use system tables to retrieve
information about internal objects. For backward compatibility, a set of compatibility views are available
so that applications continue to work. However, compatibility views only expose information relevant to
SQL Server 2000. For new development work, you should use the more up-to-date catalog views.

For more information on compatibility views, see Microsoft Docs:

Compatibility Views (Transact-SQL)


http://aka.ms/w0a477

Information Schema Views


Information schema views comply with the ISO standard for SQL. These were developed because different
database vendors use different methods of storing and accessing metadata; the ISO standard provides a
common format across all database products.

Examples of commonly used INFORMATION_SCHEMA views include:

Using INFORMATION_SCHEMA Views

SELECT * FROM INFORMATION_SCHEMA.TABLES;


SELECT * FROM INFORMATION_SCHEMA.PARAMETERS;
SELECT * FROM INFORMATION_SCHEMA.COLUMNS;
SELECT * FROM INFORMATION_SCHEMA.COLUMN_PRIVILEGES;
SELECT * FROM INFORMATION_SCHEMA.CHECK_CONSTRAINTS;

For more details about information schema views, see Microsoft Docs:

Information Schema Views (Transact-SQL)


http://aka.ms/yh3gmm

Demonstration: Querying Catalog Views and DMVs


In this demonstration, you will see how to:
 Query catalog views and INFORMATION_SCHEMA views.

 Query DMVs.

Demonstration Steps
Query System Views and Dynamic Management Views

1. Ensure that the MT17B-WS2016-NAT, 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are
running, and then log on to 20762C-MIA-SQL as AdventureWorks\Student with the password
Pa55w.rd.

2. Start SQL Server Management Studio.


3. In the Connect to Server dialog box, in the Server name box, type the name of the Azure server you
created before the course started; for example, <servername>.database.windows.net.

4. In the Authentication list, click SQL Server Authentication.

5. In the Login box, type Student, in the Password box, type Pa55w.rd, and then click Connect.
6. On the File menu, point to Open, and then click File.

7. In the Open File dialog box, navigate to D:\Demofiles\Mod08, click Mod_08_Demo_1A.sql, and
then click Open.
8. On the toolbar, in the Available Databases list, click AdventureWorksLT.

9. Select the code under the Step 2 - Query sys.views comment, and then click Execute.

10. Select the code under the Step 3 - Query sys.tables comment, and then click Execute.

11. Select the code under the Step 4 - Query sys.objects comment, and then click Execute.

12. Select the code under the Step 5 - Query information_schema.tables comment, and then click
Execute.

13. In Object Explorer, expand Databases, expand AdventureWorksLT, expand Views, and then expand
System Views. Note the system views and user-defined views.

14. Select the code under the Step 6 - Query sys.dm_exec_connections comment, and then click
Execute.

15. Select the code under the Step 7 - Query sys.dm_exec_sessions comment, and then click Execute.

16. Select the code under the Step 8 - Query sys.dm_exec_requests comment, and then click Execute.

17. Select the code under the Step 9 - Query sys.dm_exec_query_stats comment, and then click
Execute.

18. Select the code under the Step 10 - Modify the query to add a TOP(20) and an ORDER BY
comment, and then click Execute.

19. Leave SSMS open for the next demonstration.

Check Your Knowledge


Question

What is the purpose of a view?

Select the correct answer.

To list all the DMVs available in SQL Server.

To present relevant information to users and hide complexity.

To remove the need for any security in a SQL Server database.

To encrypt data in certain tables.

To make tables faster to update.



Lesson 2
Creating and Managing Views
In this lesson, you will learn how to create a view, and how to alter and drop a view. You will learn how
views, and the objects on which they are based, have owners. You will also learn how to find information
about existing views, work with updateable views, and obfuscate view definitions.

Lesson Objectives
After completing this lesson, you will be able to:

 Create a view.

 Drop a view.

 Alter a view.

 Explain the concept of ownership chains, and how it applies to views.

 Retrieve information about views.

 Work with updateable views.

 Obfuscate view definitions.

Create a View

CREATE VIEW
To create a new view, use the CREATE VIEW
command. At its simplest, you create a view by
giving it a name and writing a SELECT statement.
To show only the names of current employees,
you can create an employee list view that includes
the BusinessEntityID, Title, FirstName, MiddleName,
and LastName columns. As with tables, column
names must be unique. If you are using an
expression, it must have an alias.

Use the CREATE VIEW command to create a new view:

CREATE VIEW Person.vwEmployeeList AS
SELECT Person.BusinessEntityID, Title, FirstName, MiddleName, LastName
FROM Person.Person
INNER JOIN HumanResources.Employee
ON Person.BusinessEntityID = Employee.BusinessEntityID
WHERE Employee.CurrentFlag = 1;
GO

Best Practice: It is good practice to prefix the name of your view with vw; for example,
vwEmployeeList. Although database developers differ in their naming conventions, most would
agree that it is beneficial to be able to see clearly which objects are tables, and which are views.

Within the SELECT statement, you can reference other views instead of, or in addition to, base tables. Up
to 32 levels of nesting are permitted. However, the practice of deeply nesting views quickly becomes
difficult to understand and debug. Any performance problems can also be difficult to fix.

In a view definition, you cannot use either the INTO keyword or the OPTION clause. Also, because view
definitions are permanent objects within the database, you cannot reference a temporary table or table
variable. Views have no natural output order so queries that access the views need to specify the order for
the returned rows.

Note: You cannot guarantee ordered results in a view definition. Although you can use the
ORDER BY clause, it is only used to determine the rows returned by the TOP, OFFSET, or FOR XML
clauses. It does not determine the order of the returned rows.

Once created, views behave much like a table; for example, you can query the view just as you would
query a table.

View Attributes
There are three view attributes:

WITH ENCRYPTION
The WITH ENCRYPTION attribute obfuscates the view definition in catalog views where the text of
CREATE VIEW is held. It also prevents the view definition being displayed from Object Explorer. WITH
ENCRYPTION also stops the view from being included in SQL Server replication.

WITH SCHEMABINDING
You can specify the WITH SCHEMABINDING option to stop the underlying table(s) being changed in
a way that would affect the view definition. Indexed views must use the WITH SCHEMABINDING
option.

WITH VIEW_METADATA
The WITH VIEW_METADATA attribute determines how SQL Server returns information to ODBC
drivers and the OLE DB API. Normally, metadata about the underlying tables is returned, rather than
metadata about the view. This is a potential security loophole—by using WITH VIEW_METADATA, the
metadata returned is the view name, and not the underlying table names.

Note: The WITH ENCRYPTION attribute does not encrypt the data that the view returns;
it only obfuscates the view definition stored in the database metadata, such as in sys.sql_modules.
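
For example, the following sketch (the view name is illustrative) creates a schema-bound view; after it exists, attempts to change the referenced columns of the base table fail:

```sql
CREATE VIEW HumanResources.vwCurrentEmployees
WITH SCHEMABINDING
AS
SELECT BusinessEntityID, JobTitle, HireDate
FROM HumanResources.Employee   -- two-part names are required with SCHEMABINDING
WHERE CurrentFlag = 1;
GO

-- This now fails, because the view is bound to the column:
-- ALTER TABLE HumanResources.Employee DROP COLUMN HireDate;
```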

The WITH CHECK Option


The WITH CHECK option is used if you want to ensure that, when updates are made to underlying tables
through the view, they comply with any filtering that the view defines. For example, consider a view that
includes a WHERE clause to return only employees working in the Sales department. This would prevent
you from adding a new record to the underlying table for an employee working in Finance.

After you have created a view, you can work with it as if it were a table.

Querying Views

SELECT *
FROM Person.vwEmployeeList;
GO

You can join a view to a table as if it were a table.

Joining Views to Tables

SELECT E.FirstName, E.LastName, H.NationalIDNumber


FROM Person.vwEmployeeList E
INNER JOIN HumanResources.Employee H
ON E.BusinessEntityID = H.BusinessEntityID;
GO

Drop a View
To remove a view from a database, use the DROP
VIEW statement. This removes the definition of the
view, and all associated permissions.

Use the DROP VIEW statement to delete a view


from the database.

DROP VIEW

DROP VIEW Person.vwEmployeeList;


GO

Even if a view is recreated with exactly the same


name as a view that has been dropped, permissions that were formerly associated with the view are
removed.

Best Practice: Keep database documentation up to date, including the purpose for each
view you create, and where they are used. This will help to identify views that are no longer
required. These views can then be dropped from the database. Keeping old views that have no
use makes database administration more complex, and adds unnecessary work, particularly at
upgrade time.

If a view was created using the WITH SCHEMABINDING option, you will need to either alter the view, or
drop the view, if you want to make changes to the structure of the underlying tables.

You can drop multiple views with one comma-delimited list, as shown in the following example:

Dropping Multiple Views

DROP VIEW Person.vwEmployeeList, Person.vwSalaries;


GO

Alter a View
After a view is defined, you can modify its
definition without dropping and recreating the
view.

The ALTER VIEW statement modifies a previously


created view. This includes indexed views, which
are discussed in the next lesson. One advantage of
ALTER VIEW is that any associated permissions are
retained.

For example, use the following code to remove the Title column from the Person.vwEmployeeList view,
and ensure the BusinessEntityID column is included:

Altering a View

ALTER VIEW Person.vwEmployeeList AS


SELECT Person.BusinessEntityID, FirstName, MiddleName, LastName
FROM Person.Person
INNER JOIN HumanResources.Employee
ON Person.BusinessEntityID = Employee.BusinessEntityID
WHERE Employee.CurrentFlag = 1;
GO

Ownership Chains and Views


An ownership chain refers to database objects that
reference each other in a chain. Each database
object has an owner—SQL Server compares the
owner of an object to the owner of the calling
object.

When you are querying a view, you must have an


unbroken chain of ownership from the view to the
underlying tables. Users who execute a query on a
view, and who have permissions on the underlying
tables, will always be allowed to query the view.

Checking Permissions in an Ownership


Chain
Views are often used to provide a layer of security between the underlying tables, and the data that users
see. Access is allowed to a view, but not the underlying tables. For this to function correctly, an unbroken
ownership chain must exist.

For example, John has no access to a table that Nupur owns. If Nupur creates a view or stored procedure
that accesses the table and gives John permission to the view, John can then access the view and, through
it, the data in the underlying table. However, if Nupur creates a view or stored procedure that accesses a
table that Tim owns and grants John access to the view or stored procedure, John would not be able to

use the view or stored procedure—even if Nupur has access to Tim's table, because of the broken
ownership chain. Two options are available to correct this situation:
 Tim could own the view or stored procedure instead of Nupur.

 John could be granted permission to the underlying table.

It is not always desirable to grant permissions to the underlying table, and views are often used as a way
of limiting access to certain data.

Ownership Chains vs. Schemas


SQL Server 2005 introduced the concept of schemas. At that point, the two-part naming for objects
changed from owner.object to schema.object. All objects still have owners, including schemas. Security is
simplified if schema owners also own the objects that are contained in the schemas.

Sources of Information about Views


After views have been created, there are a number
of ways you can find information about them,
including their definition.

SSMS Object Explorer


SQL Server Management Studio (SSMS) Object
Explorer lists system views and user defined views
under the database that contains them. Here you
can also see the columns, triggers, indexes, and
statistics that are defined on the view.

You can use the standard SSMS scripting


functionality from Object Explorer to script the
view to a defined location. However, if the view was created using the WITH ENCRYPTION option, you
cannot script the definition.
You can also use the Script View as menu option to alter, drop, or select from a view. In each case, the
script is opened in a new query window for you to inspect, and then run.

Note: Despite appearances, each database does not have its own system views. Object
Explorer simply gives you a view onto the system views available to all databases.

Catalog Views
There are a number of catalog views that give you information about views, including:

 sys.objects

 sys.views

 sys.sql_expression_dependencies

 sys.dm_sql_referenced_entities

The sys.sql_expression_dependencies catalog view lets you find column level dependencies. If you
change the name of an object that a view references, you must modify the view so that it references the
new name. Before renaming an object, it is helpful to display the dependencies of the object so you can
determine whether the proposed change will affect any views.

You can find overall dependencies by querying the sys.sql_expression_dependencies view. You can find
column-level dependencies by querying the sys.dm_sql_referenced_entities view.
Display the referenced entities for Person.vwEmployeeList.

Using sys.dm_sql_referenced_entities

USE AdventureWorks2016;
GO
SELECT referenced_schema_name, referenced_entity_name, referenced_minor_name,
referenced_class_desc, is_caller_dependent
FROM sys.dm_sql_referenced_entities ('Person.vwEmployeeList', 'OBJECT');
GO

For more information about sys.dm_sql_referenced_entities, see Microsoft Docs:

sys.dm_sql_referenced_entities (Transact-SQL)
http://aka.ms/wr0oz6

For more information about sys.sql_expression_dependencies, see Microsoft Docs:

sys.sql_expression_dependencies (Transact-SQL)
http://aka.ms/mgon4g
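
A query against sys.sql_expression_dependencies for the same view might look like this sketch:

```sql
USE AdventureWorks2016;
GO
-- Returns one row per object that Person.vwEmployeeList references.
SELECT OBJECT_NAME(referencing_id) AS referencing_entity_name,
       referenced_schema_name,
       referenced_entity_name
FROM sys.sql_expression_dependencies
WHERE referencing_id = OBJECT_ID(N'Person.vwEmployeeList');
GO
```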

System Stored Procedure


You can display object definitions, including unencrypted view definitions, by executing the system
stored procedure sp_helptext and passing it the name of the view. The view definition will not be
displayed, however, if WITH ENCRYPTION was used to create the view.
You can use a system stored procedure to display a view definition.

Display a view definition using sp_helptext

USE AdventureWorks2016;
GO

EXEC sp_helptext 'Person.vwEmployeeList';


GO

System Function
The OBJECT_DEFINITION() function returns the definition of an object in relational format. This is more
appropriate for an application to use than the output of a system stored procedure such as sp_helptext.
Again, the view must not have been created using the WITH ENCRYPTION attribute.

Use a system function to return the view definition.

Return the view definition with OBJECT_DEFINITION()

USE AdventureWorks2016;
GO
SELECT OBJECT_DEFINITION (OBJECT_ID(N'Person.vwEmployeeList')) AS [View Definition];
GO

Updateable Views
An updateable view lets you modify data in the
underlying table or tables. This means that, in
addition to being able to query the view, you can
also insert, update, or delete rows through the
view.

Limitations for Updateable Columns


If you want to update data through your view, you
must ensure that the columns:

 Are from one table only.

 Directly reference the base table columns.

 Do not include an aggregate function: AVG, COUNT, SUM, MIN, MAX, GROUPING, STDEV, STDEVP,
VAR, and VARP.
 Are not affected by DISTINCT, GROUP BY, or HAVING clauses.

 Are not computed columns formed by using any other columns.

 Do not have TOP used in the view definition.

Although views can contain aggregated values from the base tables, you cannot update these columns.
Columns that are involved in operations, such as GROUP BY, HAVING, or DISTINCT, cannot be updated.
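
For example, an update through the Person.vwEmployeeList view created earlier in this module is permitted, because every modified column comes from a single base table (Person.Person); the name values below are illustrative:

```sql
-- MiddleName comes only from Person.Person, so this update succeeds
-- and modifies the base table through the view.
UPDATE Person.vwEmployeeList
SET MiddleName = N'J.'
WHERE FirstName = N'Ken' AND LastName = N'Sánchez';
```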

INSTEAD OF Triggers
Updates through views cannot affect columns from more than one base table. However, to work around
this restriction, you can create INSTEAD OF triggers. Triggers are discussed in Module 11: Responding to
Data Manipulation via Triggers.

Updates Must Comply with Table Constraints


Data that is modified through a view must still comply with the base table constraints. This means that
updates must still comply with constraints such as NULL or NOT NULL, primary and foreign keys, and
defaults. The data must comply with the base table’s constraints, as if the base table was being modified
directly. This can be difficult if not all the columns in the base table are displayed in the view. For example,
an INSERT operation on the view would fail if the base table had NOT NULL columns that were not
displayed in the view.

WITH CHECK Option


You can modify a row so that the row no longer meets the view definition. For example, a view might
include WHERE State = ‘WA’. If a user updated a row and set State = ‘CA’, this row would not be returned
when the view is next queried. It would seem to have disappeared.

If you want to stop updates that do not meet the view definition, specify the WITH CHECK option. SQL
Server will then ensure that any data modifications meet the view definition. In the previous example, it
would prevent anyone from modifying a row that did not include State = ‘WA’.
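
The scenario above can be sketched as follows, assuming an illustrative dbo.Customer table with a State column:

```sql
CREATE VIEW dbo.vwWashingtonCustomers AS
SELECT CustomerID, Name, State
FROM dbo.Customer
WHERE State = 'WA'
WITH CHECK OPTION;
GO

-- Fails: the modified row would no longer satisfy the view's WHERE clause.
UPDATE dbo.vwWashingtonCustomers
SET State = 'CA'
WHERE CustomerID = 1;
```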

Hide View Definitions


As we have seen, the view definition can be
retrieved from catalog views. As views are
sometimes used as a security mechanism to
prevent users from seeing data from some
columns of a table, developers may want to
prevent access to the view definition.

Adding the WITH ENCRYPTION option to a view means that the view definition cannot be retrieved by
using OBJECT_DEFINITION() or sp_helptext. WITH ENCRYPTION also blocks the Script View as
functionality in SSMS.

If you encrypt your view definitions, it is therefore


critical to keep an accurate and up-to-date copy of all view definitions for maintenance purposes.

Whilst encrypting view definitions provides a certain level of security, this also makes it more difficult to
troubleshoot when there are performance problems. The encryption is not strong by today’s standards—
there are third-party tools that can decrypt the source code. Do not rely on this option if protecting the
view definition is critical to your business.

Use the WITH ENCRYPTION option to hide the view definition.

WITH ENCRYPTION option

CREATE VIEW vwVacationByJobTitle
WITH ENCRYPTION
AS
SELECT JobTitle, Gender, VacationHours
FROM HumanResources.Employee
WHERE CurrentFlag = 1;
GO

Note: WITH ENCRYPTION also stops the view being published with SQL Server replication.
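You can verify the effect of WITH ENCRYPTION by querying the definition of the view created above; for an encrypted view, both techniques fail to return the source:

```sql
-- Returns NULL for a view created WITH ENCRYPTION.
SELECT OBJECT_DEFINITION(OBJECT_ID(N'vwVacationByJobTitle'));

-- Reports that the text for the object is encrypted.
EXEC sp_helptext N'vwVacationByJobTitle';
```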

Demonstration: Creating, Altering, and Dropping a View


In this demonstration, you will see how to:

 Create a view using two AdventureWorksLT tables.

 Query the view and order the result set.

 Alter the view to add the encryption option.

 Drop the view.

Demonstration Steps
1. In SSMS, on the File menu, point to Open, and then click File.

2. In the Open File dialog box, navigate to D:\Demofiles\Mod08, click Mod_08_Demo_2A.sql, and
then click Open.

3. On the toolbar, in the Available Databases list, click AdventureWorksLT.


Developing SQL Databases 8-17

4. Select the code under the Step 2 - Create a new view comment, and then click Execute.

5. Select the code under the Step 3 - Query the view comment, and then click Execute.

6. Select the code under the Step 4 - Query the view and order the results comment, and then click
Execute.

7. Select the code under the Step 5 - Query the view definition via OBJECT_DEFINITION comment,
and then click Execute.

8. Select the code under the Step 6 - Alter the view to use WITH ENCRYPTION comment, and then
click Execute.

9. Select the code under the Step 7 - Requery the view definition via OBJECT_DEFINITION
comment, and then click Execute.

10. Note that the query definition is no longer accessible because the view is encrypted.
11. Select the code under the Step 8 - Drop the view comment, and then click Execute.

12. Close SSMS without saving any changes.

Check Your Knowledge


Question

What does the WITH CHECK option do?

Select the correct answer.

Checks that the data in the view does not contain mistakes.

Checks that the view definition is well formed.

Checks that there is no corruption in the underlying tables.

Checks that inserted data conforms to the view definition.

Checks that inserted data complies with table constraints.

Lesson 3
Performance Considerations for Views
This lesson discusses how the query optimizer handles views, what an indexed view is, and when you
might use them. It also considers a special type of view—the partitioned view.

Lesson Objectives
After completing this lesson, you will be able to:

 Explain how the query optimizer handles views.

 Understand indexed views and when to use them.

 Understand nested views and when to use them.


 Understand partitioned views and when to use them.

Views and Dynamic Resolution


Standard views are expanded and incorporated
into the queries in which they are referenced; the
objects that they reference are resolved at
execution time.

A single query plan is created that merges the query being executed and the definition of any views that it accesses. A separate query plan for the view is not created.
Merging the view query into the outer query is
called “inlining” the query. It can be very beneficial
to performance because SQL Server can eliminate
unnecessary joins and table accesses from queries.
Standard views do not appear in execution plans for queries because the views are not accessed. The
underlying objects that they reference will be seen in the execution plans.
You should avoid using SELECT * in a view definition. As an example, you will notice that, if you add a new
column to the base table, the view will not reflect the column until the view has been refreshed. You can
correct this situation by executing an updated ALTER VIEW statement or by calling the sp_refreshview
system stored procedure.
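For example, after adding a column to a base table referenced by a SELECT * view, you can refresh the view’s metadata with the system stored procedure (the view name here is hypothetical):

```sql
-- Re-derives the view's column metadata from the current
-- definition of the underlying objects.
EXEC sp_refreshview N'dbo.vwProductList';
```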

Indexed Views
An indexed view has a clustered index added to it.
By adding a clustered index, the view is
“materialized” and the data is permanently stored
on disk. Complex views that include aggregations
and joins can benefit from having an index added
to the view. The data stored on disk is faster to
retrieve, because any calculations and
aggregations do not need to be done at run time.

Creating Indexed Views


You can create an indexed view using Transact-SQL. Before creating the view, you should check that the SET options on the underlying tables have specific values. The definition of the view must also be deterministic, so that the data stored will always be the same.
To create the view, use the CREATE VIEW <name> WITH SCHEMABINDING statement followed by a
CREATE UNIQUE CLUSTERED INDEX statement.

For more information about creating indexed views, see Microsoft Docs:

Create Indexed Views


http://aka.ms/ypi7xz
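The pattern looks like this; the view and index names are illustrative, and the SELECT must meet the indexed view requirements (schema binding, two-part names for base tables, and COUNT_BIG(*) when GROUP BY is used):

```sql
CREATE VIEW Sales.vwOrderTotals
WITH SCHEMABINDING
AS
SELECT CustomerID,
       SUM(TotalDue) AS TotalDue,
       COUNT_BIG(*)  AS OrderCount   -- required when GROUP BY is used
FROM Sales.SalesOrderHeader          -- two-part names are required
GROUP BY CustomerID;
GO

-- The first index on the view must be unique and clustered;
-- this materializes the view's data on disk.
CREATE UNIQUE CLUSTERED INDEX IX_vwOrderTotals
ON Sales.vwOrderTotals (CustomerID);
```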

Using Indexed Views


Having created an indexed view, you use it in two ways:

1. It can be referenced directly in a FROM clause. Depending on the query, the indexed view will be
faster to access than a nonindexed view. The performance improvement for certain queries can be
dramatic when an index is added to a view.

2. The indexed view can be used by the query optimizer, in place of the underlying tables, whenever
there is a performance benefit.

When updates to the underlying data are made, SQL Server automatically makes updates to the data that
is stored in the indexed view. This means that there is an overhead to using an indexed view, because
modifications to data in the underlying tables may not be as quick. Although an indexed view is
materialized and the data is stored on disk, it is not a table. An indexed view is defined by a SELECT statement, and data modifications are made to the underlying tables.

Indexed views have a negative impact on the performance of INSERT, DELETE, and UPDATE operations on
the underlying tables because the view must also be updated. However, for some queries, they
dramatically improve the performance of SELECT queries on the view. They are most useful for data that is
regularly selected, but less frequently updated.

Best Practice: Indexed views are useful in decision support systems that are regularly
queried, but updated infrequently. A data warehouse or data mart might use indexed views
because much of the data is aggregated for reporting.

Nested View Considerations


A nested view is one that calls another view, which
may in turn call one or more other views, and so
on. Although the developer of the original view
might not have planned it, some views end up as
“Russian dolls”, with views inside views, inside
views.

SQL Server does not restrict how deep you can nest views, but there are implications to using nesting. Issues that arise include:
 Broken ownership chains. For permissions granted on the view to be sufficient, the view and the underlying objects must have the same owner; if the ownership chain is broken, users also need permissions on the underlying objects.

 Performance. Slow running queries can be difficult to debug when views are nested within one
another. In theory, the query optimizer handles the script as if the views were not nested; in practice,
it can make bad decisions trying to optimize each part of the code. This type of performance problem
can be difficult to debug.

 Maintenance. When developers leave, the views they created may still be used in someone else’s
code—because the application depends on the view, it cannot be deleted. But no one wants to
amend the view because they do not understand the full implications of how the original view is
used. The business has to put up with poorly performing queries and views, because it would take too
long to go back and understand how the original view is being used.
Having pointed out the pitfalls of nested views, there are also some advantages. After a view has been
written, tested, and documented, it can be used in different parts of an application, just like a table.
However, it is important to understand the potential problems.

Partitioned Views
A partitioned view is a view onto a partitioned
table. The view makes the table appear to be one
table, even though it is actually several tables.

Partitioned Tables
To understand partitioned views, we first have to
understand partitioned tables. A partitioned table
is a large table that has been split into a number
of smaller tables. Although the actual size of the
table may vary, tables are normally partitioned
when performance problems occur, or
maintenance jobs take an unacceptable time to
complete. To solve these problems, the table is
split into a number of smaller tables, using one of the columns as the criteria. For example, a customer
table might be partitioned on the date of the last order with a separate table for each year. This speeds up
queries, and allows maintenance jobs to complete more quickly. A CHECK constraint is created on the partitioning column of each table to ensure that each table only contains rows that match its partitioning criteria. All tables must have the same columns, and all columns must be of the same data type and size.

In a local partitioned view, all the constituent tables are located on the same SQL Server instance. A
distributed partitioned view is where at least one table resides on a different server.

Update Data Using a Partitioned View


A partitioned view is a view onto the constituent tables of a partitioned table. The view allows the tables
to be used as if they were one table, so simplifying the management of partitioned tables. You can update
the underlying tables through the view; SQL Server ensures that inserts, updates or deletions affect the
correct underlying table.
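A minimal local partitioned view might look like this; the table names and the partitioning column are hypothetical:

```sql
-- One table per year; the CHECK constraint defines each table's range,
-- and the partitioning column must be part of the primary key.
CREATE TABLE dbo.Orders2016
(
    OrderID   int NOT NULL,
    OrderYear int NOT NULL CHECK (OrderYear = 2016),
    PRIMARY KEY (OrderID, OrderYear)
);
CREATE TABLE dbo.Orders2017
(
    OrderID   int NOT NULL,
    OrderYear int NOT NULL CHECK (OrderYear = 2017),
    PRIMARY KEY (OrderID, OrderYear)
);
GO

CREATE VIEW dbo.vwOrders
AS
SELECT OrderID, OrderYear FROM dbo.Orders2016
UNION ALL
SELECT OrderID, OrderYear FROM dbo.Orders2017;
GO

-- SQL Server routes the insert to dbo.Orders2017
-- based on the CHECK constraints.
INSERT INTO dbo.vwOrders (OrderID, OrderYear)
VALUES (1, 2017);
```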

Performance Benefits of Partitioned Views


Large tables benefit from being partitioned because smaller tables are faster to work with. They are faster
to query, and faster to index. When used in conjunction with a partitioned view, you can hide the
complexity of working with several tables, and simplify application logic.

Question: Can you think of queries in your SQL Server environment that use nested views?
What advantages and disadvantages are there with using nested views?

Lab: Designing and Implementing Views


Scenario
A new web-based stock promotion is being tested at the Adventure Works Bicycle Company. Your
manager is worried that providing access from the web-based system directly to the database tables will
be insecure, so has asked you to design some views for the web-based system.

The Sales department has also asked you to create a view that enables a temporary worker to enter new
customer data without viewing credit card, email address, or phone number information.

Objectives
After completing this lab, you will be able to:

 Create standard views.

 Create updateable views.

Estimated Time: 45 minutes

Virtual machine: 20762C-MIA-SQL

User name: AdventureWorks\Student

Password: Pa55w.rd

Exercise 1: Creating Standard Views


Scenario
The web-based stock promotion requires two new views: OnlineProducts and Available Models. The
documentation for each view is shown in the following tables:

View 1: OnlineProducts
View Column Table Column

ProductID Production.Product.ProductID

Name Production.Product.Name

Product Number Production.Product.ProductNumber

Color Production.Product.Color. If NULL, return ‘N/A’

Availability Production.Product.DaysToManufacture. If 0, return ‘In stock’; if 1, return ‘Overnight’; if 2, return ‘2 to 3 days delivery’; otherwise, return ‘Call us for a quote’.

Size Production.Product.Size

Unit of Measure Production.Product.SizeUnitMeasureCode

Price Production.Product.ListPrice

Weight Production.Product.Weight

This view is based on the Production.Product table. Products should be displayed only if the product is on
sale, which can be determined using the SellStartDate and SellEndDate columns.

View 2: Available Models


View Column Table Column

Product ID Production.Product.ProductID

Product Name Production.Product.Name

Product Model ID Production.ProductModel.ProductModelID

Product Model Production.ProductModel.Name

This view is based on two tables: Production.Product and Production.ProductModel. Products should be
displayed only if the product is on sale, which can be determined using the SellStartDate and SellEndDate
columns.
The main tasks for this exercise are as follows:

1. Prepare the Environment

2. Design and Implement the Views


3. Test the Views

 Task 1: Prepare the Environment


1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are both running, and then
log on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
2. Run Setup.cmd in the D:\Labfiles\Lab08\Starter folder as Administrator.

 Task 2: Design and Implement the Views


1. Review the documentation for the new views.

2. Using SSMS, connect to MIA-SQL using Windows Authentication.


3. Open a new query window.

4. Write and execute scripts to create the new views.

 Task 3: Test the Views


 Query both views to ensure that they return the data required in the original documentation.

Results: After completing this exercise, you will have two new views in the AdventureWorks database.

Exercise 2: Creating an Updateable View


Scenario
The Sales department has asked you to create an updateable view based on the Sales.CustomerPII table,
enabling a temporary worker to enter a batch of new customers while keeping the credit card, email and
phone number information secure.

The view must contain three columns from the Sales.CustomerPII table: CustomerID, FirstName and
LastName. You must be able to update the view with new customers.

View Columns Table Columns

CustomerID Sales.CustomerPII.CustomerID

FirstName Sales.CustomerPII.FirstName

LastName Sales.CustomerPII.LastName

The main tasks for this exercise are as follows:

1. Design and Implement the Updateable View

2. Test the Updateable View

 Task 1: Design and Implement the Updateable View


1. Review the requirements for the updateable view.

2. Write and execute a script to create the new view.

 Task 2: Test the Updateable View


1. Write and execute a SELECT query to check that the view returns the correct columns. Order the
result set by CustomerID.

2. Write and execute an INSERT statement to add a new record to the view.

3. Check that the new record appears in the view results.

4. Close SSMS without saving any changes.

Results: After completing this exercise, you will have a new updateable view in the database.

Question: What are three requirements for a view to be updateable?

Question: What is a standard, nonindexed view?



Module Review and Takeaways


In this module, we have discussed what a view is, why you would use one, and the advantages of using
views. We have looked at system views, and practiced creating views.

You have created views:

 Based on one underlying table.

 Based on two underlying tables.

 Based on one underlying table that can be updated.

We have discussed the problems with nesting views, and the advantages of creating an indexed view.

Review Question(s)
Question: When you create a new view, what does SQL Server store in the database?

You might also like