Module 5
Introduction to Indexes
Contents:
Module Overview 5-1
Lesson 1: Core Indexing Concepts 5-2
Module Overview
An index is a collection of pages associated with a table. Indexes are used to improve the performance of
queries or enforce uniqueness. Before learning to implement indexes, it is helpful to understand how they
work, how effective different data types are when used within indexes, and how indexes can be
constructed from multiple columns. This module discusses table structures that do not have indexes, and
the different index types available in Microsoft® SQL Server®.
Objectives
After completing this module, you will be able to:
Explain core indexing concepts.
Lesson 1
Core Indexing Concepts
Although it is possible for Microsoft SQL Server data management software to read all of the pages in a
table when it is calculating the results of a query, doing so is often highly inefficient. Instead, you can use
indexes to point to the location of required data and to minimize the need for scanning entire tables. In
this lesson, you will learn how indexes are structured and learn the principles associated with the design of
indexes. Finally, you will see how indexes can become fragmented over time, and the steps required to
resolve this fragmentation.
Lesson Objectives
After completing this lesson, you will be able to:
In this module, you will consider standard indexes that are created on tables. SQL Server also includes
other types of index:
Integrated full-text search is a special type of index that provides flexible searching of text.
Spatial indexes are used with the GEOMETRY and GEOGRAPHY data types.
Developing SQL Databases 5-3
Primary and secondary XML indexes assist when querying XML data.
Columnstore indexes are used to speed up aggregate queries against large data sets.
A Useful Analogy
It is useful to consider an analogy that might be easier to relate to. Consider a physical library. Most
libraries store books in a given order, which is basically an alphabetical order within a set of defined
categories.
Note that, even when you store the books in alphabetical order, there are various ways to do it. The order
of the books could be based on the title of the book or the name of the author. Whichever option is
chosen makes one form of search easy and other searches harder. For example, if books were stored in
title order, how would you find the ones that were written by a particular author? An index on a book’s
title and an index on the author would mean a librarian could find books quickly for either type of search.
Index Structures
Tree structures provide rapid search capabilities
for large numbers of entries in a list.
Compare this against using an index. In the binary tree structure, 136 is compared against 100. As it is
greater than 100, you inspect the next level down on the right side of the tree. Is 136 less than or greater
than 150? It is less than, so you navigate down the left side. The desired value can be found in the page
containing values 126 to 150, as 136 is greater than 125. Looking at each value in this page for 136, it is
found at the 10th record. So, using the binary tree, a total of 13 values need to be inspected, compared
with a possible 200 inspections against a random heap.
SQL Server indexes are based on a form of self-balancing tree. Whereas binary trees have, at most, two
children per node, SQL Server indexes can have a larger number of children per node. This helps improve
the efficiency of the indexes and reduces the overall depth of an index—depth being defined as the
number of levels from the top node (called the root node) to the bottom nodes (called leaf nodes).
Selectivity
Additional indexes on a table are most useful
when they are highly selective. Selectivity is the
most important consideration when selecting
which columns should be included in an index.
Locating the book in the bookcases, based on the information in the index entry.
Returning to the index and finding the next entry for the author.
Locating the book in the bookcases, based on the information in that next index entry.
You would need to keep repeating the same steps until you had found all of the books by that author.
Now imagine doing the same for a range of authors, such as one-third of all of the authors in the library.
You soon reach a point where it would be quicker to just scan the whole library and ignore the author
index, rather than running backward and forward between the index and the bookcases.
Density
Density is a measure of the lack of uniqueness of the data in a table. It is a value between 0 and 1.0, and
can be calculated for a column with the following formula:
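The formula itself is missing from this extract. SQL Server defines density as 1 divided by the number of distinct values in the column, so it can be estimated with a query such as the following (the table and column names are illustrative):

```sql
-- Density = 1 / (number of distinct values in the column).
-- A unique column has a density of 1 / (number of rows);
-- a column with many duplicates has a density approaching 1.0.
SELECT 1.0 / COUNT(DISTINCT City) AS Density
FROM Person.Address;
```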
Index depth
Index depth is a measure of the number of levels from the root node to the leaf nodes. Users often
imagine that SQL Server indexes are quite deep, but the reality is different. The large number of children
that each node in the index can have produces a very flat index structure. Indexes that are only three or
four levels deep are very common.
Index Fragmentation
Index fragmentation is the inefficient use of pages
within an index. Fragmentation can occur over
time, as data in a table is modified.
External fragmentation relates to where the new bookcase would be physically located. It would probably
need to be placed at the end of the library, even though it would “logically” need to be in a different
order. This means that, to read the bookcases in order, you could no longer just walk directly from one
bookcase to another. Instead, you would need to follow pointers around the library to track a chain
between bookcases.
Detecting Fragmentation
SQL Server provides a measure of fragmentation in the sys.dm_db_index_physical_stats dynamic
management view. The avg_fragmentation_in_percent column shows the percentage of fragmentation.
SQL Server Management Studio also provides details of index fragmentation in the properties page for
each index.
Demonstration Steps
Identify Fragmented Indexes
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running and then log
on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
5. In the Connect to Server dialog box, in the Server name box, type MIA-SQL, and then click
Connect.
7. In the Open Project dialog box, navigate to D:\Demofiles\Mod05, click Demo05.ssmssln, and then
click Open.
8. In Solution Explorer, expand Queries, and then double-click Demonstration 1.sql.
9. Select the code under the Step 1: Open a query window to the AdventureWorks database
comment, and then click Execute.
10. Select the code under the Step 2: Query the index physical stats DMV comment, and then click
Execute.
12. Select the code under the Step 4: Note that there are choices on the level of detail returned
comment, and then click Execute.
13. Select the code under the Step 5: The final choice is DETAILED comment, and then click Execute.
View the Fragmentation of an Index in SSMS
3. In the Index Properties - AK_Product_Name dialog box, in the Select a page pane, click
Fragmentation.
4. Note that the Total fragmentation is 75%, and that this matches the results from the query executed
in step 11 of the previous task.
6. Keep SQL Server Management Studio open for the next demonstration.
Categorize Activity
Categorize each item into the appropriate property of an index. Indicate your answer by writing the
category number to the right of each item.
Items
Lesson 2
Data Types and Indexes
Not all data types work equally well when included in an index. The size of the data and the selectivity of
the search are the most important considerations for performance, but you should also consider usability.
In this lesson, you will gain a better understanding of the impacts of the different data types on index
performance.
Lesson Objectives
After completing this lesson, you will be able to:
tinyint 1 byte
smallint 2 bytes
int 4 bytes
bigint 8 bytes
decimal(p,s), numeric(p,s) 5 to 17 bytes
smallmoney 4 bytes
money 8 bytes
real 4 bytes
As each page in an index is 8 KB in size, an index with only an int data type will hold a maximum of 2,048
values in a single page.
The disadvantage of using smaller numerical data types is that, inherently, the column will be more dense.
As the range of numbers reduces, the number of duplicates increases.
The preceding table shows the maximum size for each of these columns. As character columns like
varchar will only be as big as the largest data stored in them, these sizes could be considerably smaller.
date 3 bytes
datetime 8 bytes
datetimeoffset 10 bytes
smalldatetime 4 bytes
time 5 bytes
timestamp 8 bytes
As with the numerical data types, columns that use the smaller date and time types will inherently be
more dense: as the range of dates reduces, the number of duplicates increases.
uniqueidentifier 16 bytes
The downside to this uniqueness is that it will take longer to perform updates and deletes to records in
the middle of the table. This is because the index may need to be updated and reordered.
bit 1 bit
As the data type is extremely small, many more values can be stored in a page.
When you are deciding whether to index computed columns, you should consider the following:
When the data in the base columns that the computed column references changes, the index is
correspondingly updated. If the data changes frequently, these index updates can impair
performance.
When you rebuild an index on a computed column, SQL Server recalculates the values in the column.
The amount of time that this takes will depend on the number of rows and the complexity of the
calculation, but if you rebuild indexes often, you should consider the impact that this can have.
You can only build indexes on computed columns that are deterministic.
Sequencing Activity
Put the following data types in order, from the smallest to the largest maximum possible size.
Steps
bit
date
bigint
datetimeoffset
uniqueidentifier
char
text
Lesson 3
Heaps, Clustered, and Nonclustered Indexes
Tables in SQL Server can be structured in two ways: rows can be added in any order, or in a specific
enforced order. In this lesson, you will investigate both options, and gain an understanding of how each
option affects common data operations.
Lesson Objectives
After completing this lesson, you will be able to:
Detail the characteristics of a clustered index, and its benefits over a heap.
Heaps
The simplest table structure that is available in SQL
Server is a heap. A heap is a table that has no
enforced order for either the pages within the
table, or for the data rows within each page. Data
rows are added to the first available location
within the table's pages that have sufficient space.
If no space is available, additional pages are added
to the table and the rows are placed in those
pages.
Creation
To create a heap in SQL Server, all that is required is the creation of a table.
Creating a heap
CREATE TABLE Library.Book(
BookID INT IDENTITY(1,1) NOT NULL,
ISBN VARCHAR(14) NOT NULL,
Title VARCHAR(4000) NOT NULL,
AuthorID INT NOT NULL,
PublisherID INT NOT NULL,
ReleaseDate DATETIME NOT NULL,
BookData XML NOT NULL
);
There was no order to the books on the bookcases, so when an entry was found in the ISBN index, the
entry would refer to the physical location of the book. The entry would include an address like “Bookcase
12, Shelf 5, Book 3.” That is, there would need to be a specific address for a book. An update to the book
that required moving it to a different location would be problematic. One option for resolving this would
be to locate all index entries for the book and update the new physical location.
An alternate option would be to leave a note in the location where the book used to be, pointing to
where the book has been moved. In SQL Server, this is called a forwarding pointer—it means rows can be
updated and moved without needing to update other indexes that point to them.
A further challenge arises if the book needs to be moved again. There are two ways in which this could be
handled—another note could be left pointing to the new location or the original note could be modified
to point to the new location. Either way, the original indexes would not need to be updated. SQL Server
deals with this by updating the original forwarding pointer. This way, performance does not continue to
degrade by having to follow a chain of forwarding pointers.
Forwarding pointers were a common performance problem with tables in SQL Server that were structured
as heaps. They can be resolved via the following command:
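The command itself does not appear in this extract. Forwarding pointers in a heap are removed by rebuilding the table with ALTER TABLE ... REBUILD (the table name below reuses the earlier Library.Book example):

```sql
-- Rebuilding a heap rewrites its pages and removes any
-- forwarding pointers that have accumulated.
ALTER TABLE Library.Book REBUILD;
```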
You can also use this command to change the compression settings for a table. Page and row
compression are advanced topics that are beyond the scope of this course.
Operations on a Heap
The most common operations that are performed
on tables are INSERT, UPDATE, DELETE, and
SELECT. It is important to understand how each of
these operations is affected by structuring a table
as a heap.
A DELETE operation could be imagined as scanning the bookcases until the book is found, removing the
book, and throwing it away. More precisely, it would be like placing a tag on the book, to say that it
should be thrown out the next time the library is cleaned up or space on the bookcase is needed.
An UPDATE operation would be represented by replacing a book with a (potentially) different copy of the
same book. If the replacement book was the same (or smaller) size as the original book, it could be placed
directly back in the same location as the original. However, if the replacement book was larger, the
original book would be removed and the replacement moved to another location. The new location for
the book could be in the same bookcase or in another bookcase.
There is a common misconception that including additional indexes always reduces the performance of
data modification operations. However, it is clear that for the DELETE and UPDATE operations described
above, having another way to find these rows might well be useful. In Module 6, you will see how to
achieve this.
Clustered Indexes
Rather than storing rows of data as a heap, you can
design tables that have an internal logical
ordering. A table structured in this way is known
as a clustered index, or a rowstore.
There is a common misconception that pages in a clustered index are “physically stored in order.”
Although this is possible in rare situations, it is not commonly the case. If it were true, fragmentation of
clustered indexes would not exist. SQL Server tries to align physical and logical order while it creates an
index, but disorder can arise as data is modified.
Index and data pages are linked within a logical hierarchy and also double-linked across all pages at the
same level of the hierarchy—to assist when scanning across an index.
Creation
You can create clustered indexes, either directly by using the CREATE INDEX command, or automatically
in situations where a PRIMARY KEY constraint is specified on the table:
The following Transact-SQL will create a table. The alter statement then adds a constraint, with the side
effect of a clustered index being created.
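The code block is missing from this extract; a minimal sketch of the pattern described (the table, column, and constraint names are illustrative):

```sql
CREATE TABLE Library.LibraryCard (
    CardID INT IDENTITY(1,1) NOT NULL,
    OwnerName VARCHAR(200) NOT NULL
);

-- Adding the PRIMARY KEY constraint creates a clustered index
-- on CardID as a side effect.
ALTER TABLE Library.LibraryCard
ADD CONSTRAINT PK_LibraryCard PRIMARY KEY CLUSTERED (CardID);
```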
Updating
You can rebuild, reorganize and disable an index. The last option of disabling an index isn’t really
applicable for clustered indexes, because disabling one doesn’t allow any access to the underlying data in
the table. However, disabling a nonclustered index does have its uses. These will be discussed in a future
topic.
The REORGANIZE statement can be used in the same way, either on a specific index or on a whole table.
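The corresponding statements are not shown in this extract; illustrative examples (the index and table names are hypothetical):

```sql
-- Rebuild a single index.
ALTER INDEX PK_LibraryCard ON Library.LibraryCard REBUILD;

-- Reorganize all indexes on a table.
ALTER INDEX ALL ON Library.LibraryCard REORGANIZE;
```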
Deleting
If a clustered index is created explicitly, then the following Transact-SQL will delete it:
A table will need to be altered to delete a clustered index, if it was created as a consequence of defining a
constraint.
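The statements themselves are missing from this extract; sketches of both cases (the object names are illustrative):

```sql
-- Delete a clustered index that was created explicitly.
DROP INDEX CIX_Book_ISBN ON Library.Book;

-- Delete a clustered index that backs a PRIMARY KEY constraint
-- by dropping the constraint itself.
ALTER TABLE Library.LibraryCard DROP CONSTRAINT PK_LibraryCard;
```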
When an UPDATE operation is performed, if the replacement book is the same size or smaller and the
ISBN has not changed, the book can just be replaced in the same location. If the replacement book is
larger, the ISBN has not changed, and there is spare space within the bookcase, all other books in the
bookcase can slide along to enable the larger book to be replaced in the same spot.
If there was insufficient space in the bookcase to accommodate the larger book, the bookcase would need
to be split. If the ISBN of the replacement book was different from the original book, the original book
would need to be removed and the replacement book treated like the insertion of a new book.
A DELETE operation would involve the book being removed from the bookcase. (Again, more formally, it
would be flagged as free in a free space map, but simply left in place for later removal.)
When a SELECT operation is performed, if the ISBN is known, the required book can be quickly located by
efficiently searching the library. If a range of ISBNs was requested, the books would be located by finding
the first book and continuing to collect books in order, until a book was encountered that was out of
range, or until the end of the library was reached.
You can override this action by specifying the word NONCLUSTERED when declaring the PRIMARY KEY
constraint.
A primary key on a SQL table is used to uniquely identify rows in that table, and it must not contain any
NULL values. In most situations, a primary key is a good candidate for a clustering key. However, a real
world scenario, where the primary key may not be the clustered key, is in a table that requires a high
volume of inserts. If this table has a sequential primary key, all inserts will be at the end of the table, in the
last page. SQL Server may need to lock this page whilst inserting, forcing the inserts to become sequential
instead of parallel.
Nonclustered Indexes
You have seen how tables can be structured as
heaps or have clustered indexes. A third option is
that you can create additional indexes on top of
these tables to provide alternative ways to rapidly
locate required data. These additional indexes are
called nonclustered indexes.
A table can have up to 999 nonclustered indexes.
Nonclustered indexes can be defined on a table—
regardless of whether the table uses a clustered
index or a heap—and are used to improve the
performance of important queries.
Every nonclustered index must be maintained when changes are made to the rows in the table. You
must take care to balance the number of indexes that are created against the overhead that they
introduce.
Creation
Similar to clustered indexes, nonclustered indexes are created explicitly on a table. The columns to be
included also need to be specified.
There is an option that is unique to nonclustered indexes. They can have an additional INCLUDE option on
declaration that is used to create covering indexes. These will be discussed in further detail in Module 6:
Advanced Indexing.
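Examples of both forms (the index names and column lists reuse the earlier Library.Book table and are illustrative):

```sql
-- A nonclustered index on a single key column.
CREATE NONCLUSTERED INDEX NCI_Book_AuthorID
ON Library.Book (AuthorID);

-- The INCLUDE clause adds nonkey columns to the leaf level,
-- which can make the index cover a query.
CREATE NONCLUSTERED INDEX NCI_Book_ReleaseDate
ON Library.Book (ReleaseDate)
INCLUDE (Title);
```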
Updating
The Transact-SQL for nonclustered indexes is exactly the same as for clustered indexes. You can rebuild,
reorganize and disable an index.
Disabling an index can be very useful for nonclustered indexes on tables that are going to have large
amounts of data, either inserted or deleted. Before performing these data operations, all nonclustered
indexes can be disabled. After the data has been processed, the indexes can then be enabled by executing
a REBUILD statement. This reduces the performance impacts of having nonclustered indexes on tables.
Deletion
The same Transact-SQL that is used for clustered indexes will delete a nonclustered index.
Physical Analogy
The nonclustered indexes can be thought of as indexes that point back to the bookcases. They provide
alternate ways to look up the information in the library. For example, they might give access by author, by
release date, or by publisher. They can also be composite indexes where you might find an index by
release date, within the entries for each author. Composite indexes will be discussed in the next lesson.
Physical Analogy
It does not matter how the library is structured—whether the books are stored in ISBN order, or category
and author order, or randomly, a nonclustered index is like an extra card index pointing to the locations
of books in the bookcases. These extra indexes can be on any attribute of a book; for example, the release
date, or whether it has a soft or hard cover. The cards in the index will have a pointer to the book’s
physical location on a shelf.
Each of these extra indexes then needs to be maintained by the librarian. When a librarian adds a new
book to the library, they should make a note of its location in each card index. The same is true for removing a
book, or updating its location to another shelf. Each of these operations requires updates to be made to
every card index.
These extra indexes have an advantage when they are used to search for books. The additional indexes
will improve the performance of finding books released in 2003, for example. Without this extra index, the
librarian would have to check the release date of every single book, to see if it matched the required date.
Demonstration Steps
1. In SQL Server Management Studio, in Solution Explorer, double-click Demonstration 2.sql.
2. Select the code under the Step 1: Open a new query window against the tempdb database
comment, and then click Execute.
3. Select the code under the Step 2: Create a table with a primary key specified comment, and then
click Execute.
4. Select the code under the Step 3: Query sys.indexes to view the structure comment, and then click
Execute.
5. In Object Explorer, expand Databases, expand System Databases, expand tempdb, expand Tables,
expand dbo.PhoneLog, and then expand Indexes.
6. Note that a clustered index was automatically created on the table.
7. Select the code under the Step 4: Insert some data into the table comment, and then click
Execute.
8. Select the code under the Step 5: Check the level of fragmentation via
sys.dm_db_index_physical_stats comment, and then click Execute.
10. Select the code under the Step 7: Modify the data in the table - this will increase data and cause
page fragmentation comment, and then click Execute.
11. Select the code under the Step 8: Check the level of fragmentation via
sys.dm_db_index_physical_stats comment, and then click Execute.
13. Select the code under the Step 10: Rebuild the table and its indexes comment, and then click
Execute.
14. Select the code under the Step 11: Check the level of fragmentation via
sys.dm_db_index_physical_stats comment, and then click Execute.
20. Select the code under the Step 15: Run the query showing the execution plan (CTRL+M) – it now
uses the new index comment, and then click Execute.
22. Select the code under the Step 16: Drop the table comment, and then click Execute.
23. Keep SQL Server Management Studio open for the next demonstration.
Categorize Activity
Categorize each attribute of an index. Indicate your answer by writing the attribute number to the right of
each index.
Items
Lesson 4
Single Column and Composite Indexes
The indexes discussed so far have been based on data from single columns. Indexes can also be based on
data from multiple columns, and constructed in ascending or descending order. This lesson investigates
these concepts and the effects that they have on index design, along with details of how SQL Server
maintains statistics on the data that is contained within indexes.
Lesson Objectives
After completing this lesson, you will be able to:
Higher selectivity.
Similarly, an index by topic would be of limited value. After the correct topic had been located, it would
be necessary to search all of the books on that topic to determine if they were by the specified author.
The best option would be an author index that also included details of each book's topic. In that case, a
scan of the index pages for the author would be all that was required to work out which books needed to
be accessed.
When you are constructing composite indexes, in the absence of any other design criteria, you should
typically index the most selective column first. The order of columns in a composite index is important,
not only for performance, but also for whether the query optimizer will even use the index. For example,
an index on City, State would not be used in queries where State is the only column in the WHERE clause.
Considerations
The following should also be considered when choosing columns to add to a composite index:
Is the column selective? Only columns that are selective should be used.
How volatile is the column? Columns that are frequently updated will likely cause an index to be
rebuilt. Choose columns that have more static data.
Is the column queried upon? This column should be included providing it passes the above
considerations.
The most selective columns should be first in the composite index; columns with inequality predicates
should be towards the end.
Keep the number of columns to a minimum, as each column added increases the overall size of the
composite index.
A specific type of composite index is a covering index. This kind of index is outside the scope of this
module, but is covered in Module 6: Advanced Indexing.
Index Statistics
SQL Server keeps statistics on indexes to assist
when making decisions about how to access the
data in a table.
If statistics are determined to be out of date, they can be manually updated with one of the following
Transact-SQL commands:
Update statistics
/* Update all statistics in a database */
EXEC sp_updatestats;
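Alongside sp_updatestats, the UPDATE STATISTICS statement refreshes statistics at the table level or for a single statistics object (the AdventureWorks object names below are illustrative):

```sql
-- Update all statistics on a single table.
UPDATE STATISTICS Production.Product;

-- Update a specific statistics object, scanning every row
-- rather than sampling.
UPDATE STATISTICS Production.Product AK_Product_Name
WITH FULLSCAN;
```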
The key issue here is that, before executing the query, you need to know how selective (and therefore
useful) the indexes would be. The statistics that SQL Server holds on indexes provide this knowledge.
Demonstration Steps
Use Transact-SQL to View Statistics
2. Select the code under the comment Step 1: Run the Transact-SQL up to the end of step 1, and then
click Execute.
6. In the Execution Plan pane, scroll right and point at the last Clustered Index Scan. Note that the actual
and estimated number of rows are equal.
Use SQL Server Management Studio to View Statistics
3. In the Statistics Properties - AK_Employee_LoginID dialog box, in the Select a page section, click
Details.
4. Review the details, and then click Cancel.
1. In Object Explorer, under Databases, right-click AdventureWorks, and then click Properties.
2. In the Database Properties - AdventureWorks dialog box, in the Select a page section, click
Options.
3. In the Other options list, scroll to the top, under the Automatic heading, note that the Auto Create
Statistics and Auto Update Statistics are set to True.
Objectives
After completing this lab, you will be able to:
Results: After completing this exercise, you will have created two new tables in the AdventureWorks
database.
2. Using Transact-SQL statements, add a clustered index to that column on the Sales.MediaOutlet
table.
3. Use Object Explorer to check that the index was created successfully.
Results: After completing this exercise, you will have created clustered indexes on the new tables.
3. Use Object Explorer to check that the index was created successfully.
2. Check the Execution Plan and ensure the database engine is using the new
NCI_PrintMediaPlacement index.
3. Close SQL Server Management Studio without saving any changes.
Results: After completing this exercise, you will have created a covering index suggested by SQL Server
Management Studio.
Module 6
Designing Optimized Index Strategies
Contents:
Module Overview 6-1
Lesson 1: Index Strategies 6-2
Module Overview
Indexes play an important role in enabling SQL Server to retrieve data from a database quickly and
efficiently. This module discusses advanced index topics including covering indexes, the INCLUDE clause,
query hints, padding and fill factor, statistics, using DMOs, the Database Tuning Advisor, and Query Store.
Objectives
After completing this module, you will be able to understand:
Lesson 1
Index Strategies
This lesson considers index strategies, including covering indexes, the INCLUDE clause, heaps and
clustered indexes, and filtered indexes.
Lesson Objectives
After completing this lesson, you will be able to:
Covering Indexes
Note: A covering index is just a nonclustered index that includes all the columns required
by a particular query. It is sometimes referred to as an “index covering the query”.
Index Limitations
We have already seen that a covering index can
improve the performance of some queries;
however, SQL Server has limits on how large an
index can be. In SQL Server, these limitations are:
Maximum of 32 columns.
Use the INCLUDE clause if you want to create a covering index, but some columns have data types that
cannot be used as index keys, or the index would otherwise exceed its maximum size.
Note: All columns in an index must be from the same table. If you want an index with
columns from more than one table, create an indexed view.
The following index is created with columns used for searching and sorting as key columns, with the
larger columns that only appear in the SELECT statement as INCLUDED columns:
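The index definition is missing from this extract; a sketch of the pattern described, against the AdventureWorks Person.Address table (the index name and column choices are illustrative):

```sql
-- Key columns (City, PostalCode) support seeking and sorting;
-- the larger address columns appear only at the leaf level.
CREATE NONCLUSTERED INDEX NCI_Address_City_PostalCode
ON Person.Address (City, PostalCode)
INCLUDE (AddressLine1, AddressLine2);
```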
For more information about the INCLUDE clause, see Microsoft Docs:
What Is a Heap?
A heap is a table that has been created without a
clustered index, or a table that had a clustered
index that has now been dropped. A heap is
unordered: data is written to the table in the order
in which it is created. However, you cannot rely on
data being stored in the same order as it was
created, because SQL Server can reorder the data
to store it more efficiently.
Tables are normally created with a primary key, which automatically creates a clustered index. Additional
nonclustered indexes can then be created as required. However, you can also create nonclustered indexes
on a heap. So why would you want to create a heap?
1. The table is so small that it doesn’t matter. A parameters table, or lookup table that contains only a
few rows, might be stored as a heap.
2. You need to write data to disk as fast as possible. A table that holds log records, for example, might
need to write data without delay.
However, it is fair to say that effective use of heaps is rare. Tables are almost always more efficient when
created with a clustered index because, with a heap, the whole table must be scanned to find a record.
Note: You can make the storage of a heap more efficient by creating a clustered index on
the table, and then dropping it. Be aware, however, that each time a clustered index is created or
dropped from a table, the whole table is rewritten to disk. This can be time-consuming, and
requires there to be enough disk space in tempdb.
Filtered Index
Ranges of values that are queried frequently, such as financial values, dates, or geographic regions.
Consider a products table that is frequently queried for products that have the FinishedGoodsFlag.
Although the table holds both components and finished items, the marketing department almost always
queries the table for finished items. Creating a filtered index will improve performance, especially if the
table is large.
The following code example shows an index filtered on finished goods only:
Filtered Index
USE AdventureWorks2016;
GO
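The CREATE statement itself is missing from this extract. Continuing after the USE statement above, a sketch against the AdventureWorks Production.Product table (the index name and key columns are illustrative; the WHERE clause is the filter):

```sql
-- Only rows where FinishedGoodsFlag = 1 are included
-- in the index.
CREATE NONCLUSTERED INDEX FIX_Product_FinishedGoods
ON Production.Product (ProductID, Name)
WHERE FinishedGoodsFlag = 1;
```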
Creating filtered indexes not only increases the performance of some queries, but also reduces the size of
an index, taking up less space on disk and making index maintenance operations faster.
A covering index.
A clustered index.
A filtered index.
Lesson 2
Managing Indexes
This lesson introduces topics related to managing indexes, including fill factor, padding, statistics, and
query hints.
Lesson Objectives
After completing this lesson, you will be able to:
Note: Setting a fill factor other than 0 or 100 assumes that data will be added roughly evenly
across leaf-level pages. For example, an index with a LastName key would have data added to
different leaf-level pages. However, when data is always added to the last page, as in the case of
an IDENTITY column, fill factor will not necessarily prevent page splits.
Use Transact-SQL to view the default fill factor for the server:
sp_configure
EXEC sp_configure 'show advanced options', '1';
RECONFIGURE;
GO
EXEC sp_configure 'fill factor';
GO
Note: Fill factor is an advanced option for sp_configure. This means you must set “show
advanced options” to 1 before you can view the default fill factor settings, or change the settings.
PAD_INDEX
USE AdventureWorks2016;
GO
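The PAD_INDEX option applies the index's fill factor to the intermediate-level pages as well as the leaf level. A sketch (the index name and key column are illustrative):

```sql
USE AdventureWorks2016;
GO
-- Leave 25 percent free space on leaf pages, and pad the
-- intermediate-level pages to the same fill factor.
CREATE NONCLUSTERED INDEX IX_Person_LastName
ON Person.Person (LastName)
WITH (FILLFACTOR = 75, PAD_INDEX = ON);
GO
```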
1. In Object Explorer, right-click the SQL Server instance name and select Properties from the menu.
2. From the Server Properties dialog box, select the Database Settings page.
3. The Default index fill factor is the first option. Set a value between 0 and 100 by typing or
selecting a value.
Alternatively, you can amend the default fill factor using a Transact-SQL script. This has the same effect as
setting the fill factor through the GUI—if you check in the server properties, you will see that the value
has changed.
Setting the default fill factor using Transact-SQL:
sp_configure
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'fill factor', 75;
GO
RECONFIGURE;
GO
Create an index and specify the fill factor and pad index.
CREATE INDEX
USE AdventureWorks2016;
GO
-- Creates the IX_AddressID_City_PostalCode index
-- on the Person.Address table with a fill factor of 75.
-- (The column list is inferred from the index name.)
CREATE NONCLUSTERED INDEX IX_AddressID_City_PostalCode
ON Person.Address (AddressID, City, PostalCode)
WITH (FILLFACTOR = 75, PAD_INDEX = ON);
GO
ALTER INDEX
USE AdventureWorks2016;
GO
ALTER INDEX IX_AddressID_City_PostalCode ON Person.Address
REBUILD WITH (FILLFACTOR = 75, PAD_INDEX = ON);
GO
Best Practice: Amending the fill factor will cause the index to be rebuilt. Make the change
at a time when the database is not being used heavily, or when it is not being used at all.
Index Properties
To change the fill factor using index properties:
1. In Object Explorer, expand the Databases node, expand the database you are working on, and then
expand the Tables node. Expand the table relating to the index, and then expand Indexes.
Note: Setting a very low fill factor increases the size of the index, and makes it less efficient.
Set the fill factor appropriately, depending on the speed at which new data will be added.
Managing Statistics
Use ALTER DATABASE to set how statistics are updated in your database.
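For example, the AUTO_UPDATE_STATISTICS and AUTO_UPDATE_STATISTICS_ASYNC database options control whether, and how, statistics are updated automatically:

```sql
-- Update out-of-date statistics automatically during query compilation.
ALTER DATABASE AdventureWorks2016
SET AUTO_UPDATE_STATISTICS ON;
GO
-- Queries compile with existing statistics; updates happen in the background.
ALTER DATABASE AdventureWorks2016
SET AUTO_UPDATE_STATISTICS_ASYNC OFF;
GO
```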
There are two ways to create statistics manually: the CREATE STATISTICS command, or the sp_createstats
stored procedure.
CREATE STATISTICS
You can create statistics on a single column, several columns, or filtered statistics. Using CREATE
STATISTICS, you can specify:
The sample size on which the statistics should be based—this may be a scan of the whole table, a
percentage of rows, or a count of rows.
A filter for the statistics—filtered statistics are covered later in this lesson.
Note: Filtered statistics are created on a subset of rows in a table. Filtered statistics are
created using the WHERE clause as part of the CREATE STATISTICS statement.
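A sketch showing these options together (the statistics name, columns, and filter value are illustrative):

```sql
-- Multicolumn, filtered statistics based on a full scan of the table.
CREATE STATISTICS Stats_Address_City_PostalCode
ON Person.Address (City, PostalCode)
WHERE StateProvinceID = 9
WITH FULLSCAN;
GO
```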
SP_CREATESTATS
If you want a quick way to create single-column statistics for all columns in a database that do not have a
statistics object, you can use the sp_createstats stored procedure. This calls CREATE STATISTICS and is used
for creating single-column statistics on all columns in a database not already covered by statistics. It
accepts a limited selection of the options and parameters supported by CREATE STATISTICS.
Sp_createstats (Transact-SQL)
http://aka.ms/ak1mcd
6-12 Designing Optimized Index Strategies
Statistics
http://aka.ms/X36v84
You could also update statistics manually as part of a task that makes large changes to the number of
rows in a table—for example, a bulk insert or truncating a table. This can improve the performance of
queries because the query optimizer will have up-to-date statistics.
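For example, after a bulk insert you might refresh the statistics on the affected table:

```sql
-- Update all statistics on the table with a full scan
-- (by default, SQL Server samples rows instead).
UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;
GO
```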
Index Maintenance
Index maintenance operations do not alter the distribution of data, so they do not require statistics to be
updated afterwards. You do not need to update statistics after running ALTER INDEX REBUILD, DBCC
DBREINDEX, DBCC INDEXDEFRAG, or ALTER INDEX REORGANIZE.
Statistics are automatically updated when you run ALTER INDEX REBUILD or DBCC DBREINDEX.
Best Practice: Do not update statistics more than necessary, because cached query
execution plans will be marked for recompilation. Excessive updating of statistics may result in
unnecessary CPU time being spent recompiling query execution plans.
sys.dm_db_index_physical_stats—returns information about the index size and type, fragmentation, record
count, and space used. Use it to track the rate of fragmentation and to create an effective index
maintenance strategy. Run it during off-peak hours, because this DMV can affect performance.
For example, use the sys.dm_db_index_physical_stats DMV to get the level of fragmentation within an
index.
sys.dm_db_index_physical_stats
Use AdventureWorks2016;
GO
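A possible completion of the example ('LIMITED' is the fastest scan mode; passing NULL for the object, index, and partition parameters returns every index in the database):

```sql
SELECT OBJECT_NAME(ips.object_id) AS TableName,
       ips.index_id,
       ips.index_type_desc,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
ORDER BY ips.avg_fragmentation_in_percent DESC;
```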
sys.dm_db_missing_index_columns—a function that takes the index handle as an input, and returns
the index columns required, and how they would be used.
Consolidating Indexes
Although indexes are effective in improving the performance of queries, they come with a small overhead
for inserts, deletes, and updates. If you have two or more indexes that contain very similar fields, it might
be worth consolidating the indexes into one larger index. This will mean that not all the columns will be
used for each query; in turn, it will take a little more time to read. However, instead of updating several
indexes, only one index will be updated when records are inserted, deleted, or updated.
Even when indexes are used regularly, it might still be advantageous to consolidate them into one index,
due to the reduced update overhead when changes occur. You should, however, measure the effect of
changes you make to indexes, to ensure they have the results you intend, and that there are no unwanted
side effects.
Note: The order of columns in an index makes a difference. Although two indexes might
have the same columns, they may produce very different performance results for different
queries.
In the following example, the query optimizer would normally use a MERGE JOIN to join the two tables. In
this code segment, a query hint is used to force a HASH JOIN. The cost of the join increases from 22
percent with a MERGE JOIN, to 67 percent with a HASH JOIN:
The following example uses a query hint to indicate that a HASH JOIN should be used:
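A sketch of such a query (the AdventureWorks order tables are assumed; only the OPTION clause forces the join type, and the cost percentages quoted above will vary by system):

```sql
SELECT soh.SalesOrderID, sod.ProductID, sod.LineTotal
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
    ON soh.SalesOrderID = sod.SalesOrderID
OPTION (HASH JOIN);
```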
There are more than 20 query hints, including types of joins, the maximum number of processors to be
used, and forcing a plan to be recompiled. For a full list of query hints, see Microsoft Docs:
Best Practice: Use query hints judiciously. Unless there is a good reason to use a query
hint, the query optimizer will find the best query plan.
Some query hints that usefully give the query optimizer additional information, and can be included, are:
FAST numberofrows. This tells the query optimizer to retrieve the first n rows quickly, and then return
the full result set. This may help provide a better user experience for some large result sets.
OPTIMIZE FOR. Tells the query optimizer to use a specific value for a variable when compiling the
plan. If you know that one value will be used much more often than any other, this can be useful.
Other values are still accepted when the query is executed.
ROBUST PLAN. This tells the query optimizer to plan for the maximum row size, rather than for
performance. This reduces errors when data has some very wide rows.
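A sketch combining two of these hints (the parameter name and values are illustrative):

```sql
DECLARE @city nvarchar(30) = N'Seattle';

-- Compile the plan as if @city were N'London', and return
-- the first 10 rows as quickly as possible.
SELECT AddressID, AddressLine1
FROM Person.Address
WHERE City = @city
OPTION (OPTIMIZE FOR (@city = N'London'), FAST 10);
```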
Lesson 3
Execution Plans
Execution plans are generated before a SQL script is executed. They are generated by the query optimizer,
and can help database developers and DBAs to understand why a query is being executed in a certain
way.
Lesson Objectives
After completing this lesson, you will be able to explain:
Execution plans help you to answer these questions by showing how the SQL Server Database Engine
expects to execute a query, or how it has actually executed a query.
Trivial Plans
If a query is simple and executes quickly, the query optimizer does not plan a number of different ways to
execute it—it just uses the most obvious plan. This is known as a trivial execution plan, because it is faster
to execute the query than to compare a number of alternatives. Trivial queries normally retrieve data from
a single table, and do not include calculations or aggregates.
SELECT *
FROM Production.Document
Adding a table join to this query would make the plan nontrivial. The query optimizer would then do
cost-based calculations to select the best execution plan. You can identify trivial execution plans by
running a system DMV (dynamic management view). Run the DMV before and after the query, noting the
number of occurrences of trivial plans.
Count the number of trivial plan occurrences—before and after running your query.
sys.dm_exec_query_optimizer_info
SELECT * FROM sys.dm_exec_query_optimizer_info
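To focus on the trivial plan counter only:

```sql
SELECT counter, occurrence, value
FROM sys.dm_exec_query_optimizer_info
WHERE counter = 'trivial plan';
```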
Database Statistics
The query optimizer uses statistics to figure out how to execute a query. Statistics describe the data within
the database, including its uniqueness. If statistics are out of date, the query optimizer will make incorrect
calculations, and potentially choose a suboptimal query plan.
Cost-based Selection
The query optimizer uses a cost-based selection process to determine how to execute a query. The cost is
calculated based on a number of factors, including CPU, memory, and I/O (input/output) operations, such
as the time taken to retrieve data from disk. It is not an absolute measure.
The query optimizer cannot try all possible execution plans. It has to balance the time taken to compare
plans with the time taken to execute the query—and it has to cut off at a certain point. It aims to find a
satisfactory plan within a reasonable period of time. Some data definition language (DDL) statements,
such as CREATE, ALTER, or DROP, do not require alternative plans—they are executed straightaway.
When designing a query that modifies data—for example, a query that includes an UPDATE statement—
the estimated execution plan will display the plan without changing the data.
SQL Server shows estimates for the number of rows returned.
Hover over each part of the actual execution plan to display more information about how the query was
executed.
Note: Comparing estimated and actual row counts can help you to identify out-of-date
table statistics. When statistics are up to date, estimated and actual counts will closely match.
For more information about optimizing queries using the plan cache, see MSDN:
Scan and Seek Operators
Seek: a seek finds specific records by looking them up in an index. This is normally faster than a scan,
because specific records can be quickly located and retrieved using the index.
A scan may be used on a clustered index, a nonclustered index, or a heap. A seek may be used on a
clustered or nonclustered index. Both scan and seek operators may output some or all of the rows they
read, depending on what filters are required for the query.
A query plan can always use a scan, but a seek is used only when there is a suitable index. Indexes that
contain all the columns required by a query are called covering indexes.
Best Practice: The query execution plan is a guide to help you to understand how queries
are being executed. Do not, however, try to manipulate how the query optimizer handles a
query. When table statistics are accurate, and appropriate indexes are available, the query
optimizer will almost always find the fastest way of executing the query.
Join Operators
JOIN clauses are used in queries that retrieve records from more than one table. The query execution plan
includes join operators that combine data that is returned by scans or seeks. The data is transformed into
a single data stream by the join operators.
The query optimizer uses one of three join operators, each of which takes two input data streams and
produces one output data stream:
Nested loop
Merge join
Hash match
Nested Loop
A nested loop join performs a search from the second input data stream for each row in the first input
data stream. Where the first input data stream has 1,000 rows, the second input will be searched once for
each row—so it performs 1,000 searches.
In a graphical query plan, the upper input is the first input and the lower input is the second input. In an
XML or text query plan, the second input will appear as a child of the first input. Nested loop joins are
used when the second input is inexpensive to search, either because it is small or has a covering index.
Merge Join
A merge join combines two sorted inputs by interleaving them. The sequence of the input streams has no
impact on the cost of the join. Merge joins are optimal when the input data streams are already sorted,
and are of similar volumes.
Hash Match
A hash match calculates a hash value for each input data stream, and the hash values are compared. The
operation details vary according to the source query, but typically a complete hash table is calculated for
the first input, then the hash table is searched for individual values from the second input. Hash matches
are optimal for large, unsorted input data streams, and for aggregate calculations.
When multiple processors are available, the query optimizer might attempt to speed up queries by
running tasks in parallel on more than one CPU. This is known as parallelism and normally involves large
numbers of rows.
The query execution plan does not actually show the individual threads participating in a parallel query
plan; however, it does show a logical sequence of operators, and the operators that use parallelism are
flagged.
In a graphical plan, parallelized operators have a small orange circle containing two arrows overlaid on
the bottom right-hand corner of the icon. In XML query plans, parallelized operators have the “parallel”
attribute set to “true”. In text query plans generated by SHOWPLAN_ALL or STATISTICS PROFILE, the result
set contains a parallel column with a value of 1 for parallelized operators.
Parallel query plans will also contain at least one instance of the Gather Streams operator, which combines
the results of parallelized operators.
For more information about query plan operators, see Microsoft Docs:
To view a graphical plan in XML format, right-click the plan and click Show Execution Plan XML. The
execution plan is already in XML format, but is displayed graphically by default.
Generate an estimated execution plan in XML format by using the SHOWPLAN_XML option.
SET SHOWPLAN_XML ON
USE AdventureWorks2016;
GO
SET SHOWPLAN_XML ON;
GO
SELECT *
FROM HumanResources.Employee
WHERE Gender = 'F';
GO
SET SHOWPLAN_XML OFF;
GO
Display the actual execution plan in XML format by using the SET STATISTICS XML ON option.
SET STATISTICS XML ON
SET STATISTICS XML ON;
GO
SELECT *
FROM HumanResources.Employee;
GO
SET STATISTICS XML OFF;
GO
Note: The toolbar icon Include Actual Execution Plan and the Ctrl-M keyboard shortcut
will toggle, showing the actual execution plan on and off. Ensure this is off before running the
SET statements.
sys.dm_exec_query_stats returns performance information for all the cached plans.
sys.dm_exec_procedure_stats returns the same information as sys.dm_exec_query_stats, but for stored
procedures only.
Use this query to find the top 10 cached plans with the highest average run time per execution (the
SELECT list shown here is one possibility):
SELECT TOP 10 qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time, st.text, qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY avg_elapsed_time DESC;
Use the same query with a small amendment to find the 10 cached plans that use the highest average
CPU per execution.
SELECT TOP 10 qs.total_worker_time / qs.execution_count AS avg_cpu_time, st.text, qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY avg_cpu_time DESC;
Amend the example to find the 10 most expensive queries by the average logical reads per execution.
SELECT TOP 10 qs.total_logical_reads / qs.execution_count AS avg_logical_reads, st.text, qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY avg_logical_reads DESC;
You can adapt these queries to return the most expensive queries by any of the measures in
sys.dm_exec_query_stats, or to limit the results to stored procedure cached plans by using
sys.dm_exec_procedure_stats in place of sys.dm_exec_query_stats.
For more details, see sys.dm_exec_query_stats (Transact-SQL) in Microsoft Docs:
sys.dm_exec_query_stats (Transact-SQL)
http://aka.ms/n3fhua
Remember, sys.dm_exec_query_stats only returns information about cached plans. Plans that are not in
the cache, either because they have been recompiled, or because they have not been executed again
since the plan cache was last flushed, will not appear.
1. Click the Include Live Query Statistics icon on the toolbar. The icon is highlighted to show that it is
selected. Execute the query—the Live Query Statistics tab is displayed.
2. Right-click the query window and select Include Live Query Statistics from the context-sensitive
menu. Execute the query—the Live Query Statistics tab is displayed.
By including Live Query Statistics using either method, you are enabling statistics for the current session. If
you want to view Live Query Statistics from other sessions, including Activity Monitor, you must execute
one of the following statements:
For more information about Live Query Statistics, see Microsoft Docs:
Note: Live Query Statistics is a troubleshooting tool, and should only be used when
optimizing queries. It adds an overhead to the query, and can affect performance.
Lesson 4
The Database Engine Tuning Advisor
The Database Engine Tuning Advisor analyzes a workload and makes recommendations to improve its
performance. You can use either the graphical view, or the command-line tool to analyze a trace file.
Lesson Objectives
After completing this lesson, you will be able to:
Understand what the Database Engine Tuning Advisor is, and when to use it.
Before you can use the Database Engine Tuning Advisor, you must capture a typical workload.
Note: The Database Engine Tuning Advisor is not widely used, although it might be useful
in some circumstances. New tools such as Query Store, discussed later in this module, may be
easier to use for most situations.
Workload Formats
The Database Engine Tuning Advisor can accept any of the following workloads:
Plan Cache
Transact-SQL script
XML file
For more information about using the Database Engine Tuning Advisor, see Microsoft Docs:
Lesson 5
Query Store
SQL Server includes Query Store, a feature that makes it easier to find and fix problem queries. This lesson
introduces the Query Store, how to enable it, and how to use it.
Lesson Objectives
After completing this lesson, you will be able to:
SET QUERY_STORE = ON
ALTER DATABASE AdventureWorks2016
SET QUERY_STORE = ON;
GO
Query Store works well with the default settings, but you can also configure a number of other settings. If
you have a high workload, configure how Query Store behaves when it gets close to its disk space capacity.
QUERY_CAPTURE_MODE. Values: ALL, AUTO, or NONE; default: ALL. ALL means that all queries will be
captured. AUTO captures queries with high execution count and resource consumption. NONE does not
capture new queries.
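For example, the following statement changes the capture mode and caps the disk space that Query Store can use (the values are illustrative):

```sql
ALTER DATABASE AdventureWorks2016
SET QUERY_STORE (
    QUERY_CAPTURE_MODE = AUTO,
    MAX_STORAGE_SIZE_MB = 100
);
GO
```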
You can view the settings either by using database properties in SSMS Object Explorer, or by using
Transact-SQL.
View the Query Store settings using sys.database_query_store_options.
sys.database_query_store_options
SELECT *
FROM sys.database_query_store_options;
Note: Query Store can only be used with user databases; it cannot be enabled for master,
msdb, or tempdb.
1. Regressed Queries.
4. Tracked Queries.
Regressed Queries
This view shows queries whose performance has degraded over a period of time. You use a drop-down
box to select the performance measure: CPU time, duration, logical read count, logical write count,
memory consumption, or physical reads. You can also see the execution plan for each query.
Tracked Queries
This view shows historical data for a single query.
sys.query_store_plan. Similar to sys.dm_exec_query_plan, this catalog view exposes the query plan data
captured by Query Store.
The following code sample demonstrates how you can directly query the data captured by the Query
Store:
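A sketch joining the main Query Store catalog views (the column list is a small subset of what is available):

```sql
SELECT qt.query_sql_text,
       q.query_id,
       p.plan_id,
       rs.avg_duration
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
    ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON p.plan_id = rs.plan_id;
```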
3. In the Login box, type Student, in the Password box, type Pa55w.rd, and then click Connect.
4. In Object Explorer, expand Databases right-click AdventureWorksLT, and then click Properties.
5. In the Database Properties - AdventureWorksLT dialog box, click the Query Store page, and in
the General section, ensure the Operation Mode (Requested) is set to Read Write. Point out the
Query Store settings to students. When you are finished, click OK.
6. In Object Explorer, expand the AdventureWorksLT node to see that a folder called Query Store has
been created.
9. In the Open File dialog box, navigate to D:\Demofiles\Mod06, and open QueryStore_Demo.sql.
10. Select the code under the comment Create a covering index on the TempProduct table, and then
click Execute. First, the query creates a covering index on the TempProduct table, and then uses
three columns from this table—point out that the text columns have been included as nonkey
columns.
11. Select the code under the comment Clear the Query Store, and then click Execute.
12. Select the code under the comment Work load 1, and then click Execute.
13. Repeat the previous step another five times, waiting a few seconds between each execution.
14. Select the code under the comment Work load 2, and then click Execute.
15. Repeat the previous step another three times, waiting a few seconds between each execution.
16. Select the code under the comment Regress work load 1, and then click Execute.
17. Select the code under the comment Work load 1, and then click Execute.
18. Repeat the previous step another five times, waiting a few seconds between each execution.
19. In Object Explorer, open the Top Resource Consuming Queries window, and see the difference
between the two execution plans for Workload 1.
20. Demonstrate how you can change the metric using the drop-down box. Note the force plan button.
21. On the File menu, point to Open, and then click File.
24. Select the code under the comment Clear the Query Store, and then click Execute.
25. Select the code under the comment Work load 1, and then click Execute.
27. Select the code under the comment Work load 2, and then click Execute.
28. Repeat the previous step another two times, waiting a few seconds between each execution.
29. Select the code under the comment Regress work load 1, and then click Execute.
30. Select the code under the comment Work load 1, and then click Execute.
31. Select the code under the comment Examine sys.query_store_query_text and
sys.query_context_settings, and then click Execute.
32. Select the code under the comment Examine sys.query_store_query, and then click Execute.
33. Select the code under the comment Examine sys.query_store_plan, and then click Execute.
34. Select the code under the comment Examine sys.query_store_runtime_stats_interval, and then
click Execute.
35. Select the code under the comment Examine runtime statistics, and then click Execute.
Objectives
In this lab, you will practice:
3. In SSMS, connect to the MIA-SQL database engine instance using Windows authentication.
4. Open QueryStore_Lab1.sql.
8. Select the code under the comment Run a select query six times, and then click Execute.
10. Select the code to Update the statistics with fake figures, and then click Execute.
14. Switch to the QueryStore_Lab1.sql tab and repeat Step 10 three times.
15. Switch to the Top Resource Consuming Queries tab to identify which query plans used a clustered
index seek and which ones used a clustered index scan.
Results: After completing this lab exercise you will have used Query Store to monitor query performance,
and used it to force a particular execution plan to be used.
5. Run the script to SET STATISTICS TIME ON. Run each set of select statements on both the heap, and
the clustered index.
6. Open HeapVsClustered_Timings.docx, and use the document to note the CPU times for each.
8. Run the script to select from each table with the ORDER BY clause.
9. Run the script to select from each table with the WHERE clause.
10. Run the script to select from each table with both the WHERE clause and ORDER BY clause.
12. Compare your results with the timings in the Solution folder.
13. If you have time, run the select statements again and Include Live Query Statistics.
14. Close SQL Server Management Studio, without saving any changes.
Question: Which Query Store features will be most beneficial to your SQL Server
environment?
Question: In which situation might a heap be a better choice than a table with a clustered
index?
Question: Why is it sometimes quicker to retrieve records from a heap than a clustered
index with a simple SELECT statement?
Use the Query Store to understand how the most resource intensive queries are performing, and to take
corrective action before they become a problem. Because it stores historical query plans, you can compare
them over time to see when and why a plan has changed. After enabling the Query Store on your
databases, it automatically runs in the background, collecting run-time statistics and query plans; it also
categorizes queries, so it is easy to find those using the most resources, or the longest-running operations.
Query Store separates data into time windows, so you can uncover database usage patterns over time.
Best Practice: Understand how queries are being executed using the estimated and actual
execution plans, in addition to using Query Store. When you need to optimize a query, you will
then be well prepared and have a good understanding of how SQL Server executes your
Transact-SQL script.
Module 7
Columnstore Indexes
Contents:
Module Overview 7-1
Lesson 1: Introduction to Columnstore Indexes 7-2
Module Overview
Introduced in Microsoft® SQL Server® 2012, columnstore indexes are used in large data warehouse
solutions by many organizations. This module highlights the benefits of using these indexes on large
datasets, the improvements made to columnstore indexes in the latest versions of SQL Server, and the
considerations needed to use columnstore indexes effectively in your solutions.
Objectives
After completing this module, you will be able to:
Describe columnstore indexes and identify suitable scenarios for their use.
Lesson 1
Introduction to Columnstore Indexes
This lesson provides an overview of the types of columnstore indexes available in SQL Server, the
advantages they have over equivalent row-based indexes, and the circumstances in which you should
consider using them. By the end of this lesson, you will see the potential cost savings to your business of
using clustered columnstore indexes, purely in terms of the gigabytes of disk storage saved.
Lesson Objectives
After completing this lesson, you will be able to:
Describe the properties of a clustered columnstore index and how it differs from a nonclustered
columnstore index.
SELECT ProductID,
       SUM(LineTotal) AS ProductTotalSales
FROM Sales.SalesOrderDetail
GROUP BY ProductID
ORDER BY ProductID;
With a row-based index, the previous example would need to load into memory all the rows and columns
in all the pages, for all the products. With a column-based index, the query only needs to load the pages
associated with the two referenced columns, ProductID and LineTotal. This makes columnstore indexes a
good choice for large data sets.
Using a columnstore index can improve the performance for a typical data warehouse query by up to 10
times. There are two key characteristics of columnstore indexes that impact this gain.
Storage. Columnstore indexes store data in a compressed columnar data format instead of by row.
This makes it possible to achieve compression ratios of seven times greater than a standard rowstore
table.
Batch mode execution. Columnstore indexes process data in batches (of 1,000-row blocks) instead
of row by row. Depending on filtering and other factors, a query might also benefit from “segment
elimination,” which involves bypassing million-row chunks (segments) of data and further reducing
I/O.
Columns often store matching data—for example, a column of US states—enabling the database
engine to compress the data better. This compression can reduce or eliminate any I/O bottlenecks in
your system, while also reducing the memory footprint as a whole.
High compression rates improve overall query performance because the results have a smaller in-
memory footprint.
Instead of processing individual rows, batch execution also improves query performance. This can
typically be a performance improvement of around two to four times because processing is
undertaken on multiple rows simultaneously.
Aggregation queries often select only a few columns from a table, which also reduces the total I/O
required from the physical media.
Nonclustered and clustered columnstore indexes are supported in Azure® SQL Database Premium Edition.
For a full list of the columnstore features available in different versions of SQL Server, see Microsoft Docs:
There are two types of columnstore indexes—nonclustered and clustered columnstore indexes—that both
function in the same way. The difference is that a nonclustered columnstore index will normally be a
secondary index created on top of a rowstore table, whereas a clustered columnstore index is the primary
storage for a table. Because a nonclustered columnstore index is a full or partial copy of the data, it takes
additional disk space on top of the rowstore table.
SQL Server 2016 also introduced support for filtered nonclustered columnstore indexes. This allows a
predicate condition to filter which rows are included in the index. Use this feature to create an index on
only the cold data of an operational workload. This will greatly reduce the performance impact of having
a columnstore index on an online transaction processing (OLTP) table.
A clustered columnstore index does not store the columns in a sorted order, but optimizes storage for
compression and performance.
It can be updated.
New features
You can now have a nonclustered row index on top of a clustered columnstore index, making it possible
to have efficient table seeks on an underlying columnstore. You can also enforce a primary key constraint
by using a unique rowstore index.
SQL Server 2016 introduced columnstore indexes on memory optimized tables—the most relevant use
being for real-time operational analytical processing.
SQL Server 2017 introduced support for non-persisted computed columns in nonclustered columnstore
indexes.
Note: You cannot include persisted computed columns, and you cannot create
nonclustered indexes on computed columns.
Demonstration Steps
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running and then log
on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
3. In the User Account Control dialog box, click Yes. When the script completes, press any key to close
the window.
8. Select the code under the Step 1 comment, and then click Execute.
9. Select the code under the Step 2 comment, and then click Execute.
10. Select the code under the Step 3 comment, and then click Execute.
11. Select the code under the Step 1 comment, and then click Execute.
12. Close Microsoft SQL Server Management Studio without saving changes.
Categorize Activity
Categorize each index property into the appropriate index type. Indicate your answer by writing the
category number to the right of each property.
Items
Lesson 2
Creating Columnstore Indexes
This lesson shows you the techniques required to create columnstore indexes on a table. You will see how
to quickly create indexes by using Transact-SQL, or the SQL Server Management Studio user interface.
Lesson Objectives
After completing this lesson, you will be able to:
Transact-SQL
To create a nonclustered columnstore index, use
the CREATE NONCLUSTERED COLUMNSTORE
INDEX statement, as shown in the following code
example:
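A minimal sketch of such a statement, assuming a hypothetical dbo.FactSales table with the three columns named below, might look like this:

```sql
-- Hypothetical table and index names; the index copies only three columns.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales (SalesOrderNumber, UnitPrice, ExtendedAmount);
```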
A nonclustered index does not need to include all the columns from the underlying table. In the
preceding example, only three columns are included in the index.
You can also restrict nonclustered indexes to a subset of the rows contained in a table:
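A hedged sketch of a filtered nonclustered columnstore index, again assuming a hypothetical dbo.FactSales table, this time with a ShipDate column:

```sql
-- Only rows for orders shipped before 2013 are included in the index.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales_History
ON dbo.FactSales (SalesOrderNumber, UnitPrice, ExtendedAmount)
WHERE ShipDate < '20130101';
```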
The business reason for wanting to limit a columnstore index to a subset of rows is that it’s possible to use
a single table for both OLTP and analytical processing. In the preceding example, the index supports
analytical processing on historical orders that shipped before 2013.
3. Expand Tables, and then expand the required table; for example, FactFinance.
4. Right-click Indexes, point to New Index, and then click Nonclustered Columnstore Index.
5. Add at least one column to the index, and then click OK.
Transact-SQL
To create a clustered columnstore index, use the
CREATE CLUSTERED COLUMNSTORE INDEX
statement as shown in the following code
example:
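A minimal sketch, assuming the same hypothetical dbo.FactSales table; note that a clustered columnstore index takes no column list, because it always includes every column of the table:

```sql
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales;
```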
An optional parameter on a CREATE statement for a clustered index is DROP_EXISTING. You can use this
to rebuild an existing clustered columnstore index or to convert an existing rowstore table into a
columnstore table.
Note: To use the DROP_EXISTING option, the new columnstore index must have the same
name as the index it is replacing.
The following example creates a clustered rowstore table, and then converts it into a clustered
columnstore table:
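A hedged sketch, using hypothetical table and index names; with DROP_EXISTING = ON, the columnstore index reuses the name of the rowstore index it replaces:

```sql
-- Create a rowstore table with a clustered index.
CREATE TABLE dbo.SalesHistory
(
    SalesOrderID int NOT NULL,
    OrderDate datetime NOT NULL,
    Amount money NOT NULL
);
CREATE CLUSTERED INDEX CIX_SalesHistory ON dbo.SalesHistory (SalesOrderID);

-- Convert the rowstore table into a columnstore table,
-- keeping the same index name.
CREATE CLUSTERED COLUMNSTORE INDEX CIX_SalesHistory
ON dbo.SalesHistory
WITH (DROP_EXISTING = ON);
```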
Issue: Unable to create a columnstore index on an Azure SQL Database.
Resolution: Ensure you are using at least V12 of Azure SQL Database. The pricing tier of the database also
has to be a minimum of Premium.
3. Expand Tables, and then expand the required table; for example, FactFinance.
4. Right-click Indexes, point to New Index, and then click Clustered Columnstore Index.
5. Click OK.
Note: You don’t need to select columns to create a clustered columnstore index, because
all the columns of a table must be included.
After you create the table, you can add a foreign key constraint:
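A hedged sketch, assuming a hypothetical dbo.account table with a customer_id column that references a dbo.customer table:

```sql
-- Hypothetical table, column, and constraint names.
ALTER TABLE dbo.account
ADD CONSTRAINT FK_account_customer
    FOREIGN KEY (customer_id) REFERENCES dbo.customer (customer_id);
```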
Check the table—it shows that two indexes and two keys exist:
CCI_columnstore_account: a clustered columnstore index.
The previous Transact-SQL results in a table with a clustered columnstore index, and a nonclustered
rowstore index that enforces the primary key constraint.
Demonstration Steps
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running and then log
on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
3. In the Connect to Server window, in the Server name box, type MIA-SQL. Ensure that Windows
Authentication is selected in the Authentication box, and then click Connect.
4. In Object Explorer, expand Databases, expand AdventureWorksDW, expand Tables, and then
expand dbo.AdventureWorksDWBuildVersion.
5. Right-click Indexes, point to New Index, and then click Clustered Columnstore Index.
9. Right-click Indexes, point to New Index, and then click Non-Clustered Columnstore Index.
10. In the Columnstore columns table, click Add.
11. Select the SalesOrderNumber, UnitPrice, and ExtendedAmount check boxes, and then click OK.
13. In Object Explorer, expand Indexes to show the created nonclustered index.
Question: How will you create your indexes in a database—with SSMS or Transact-SQL?
Lesson 3
Working with Columnstore Indexes
When working with columnstore indexes, you should consider fragmentation and how SQL Server
processes the insertion of data into the index. From SQL Server 2016, columnstore tables can be created in
memory. This makes real-time operational analytics possible.
Lesson Objectives
After completing this lesson, you will be able to:
Check the fragmentation of an index and choose the best approach to resolving the fragmentation.
When bulk loading data, you have the following options for optimizations:
Parallel Load: perform multiple concurrent bulk imports (bcp or bulk insert), each loading separate
data.
Log Optimization: the bulk load will be minimally logged when the data is loaded into a compressed
rowgroup. Minimal logging is not available when loading data with a batch size of less than 102,400
rows.
Index Fragmentation
When it comes to maintenance, columnstore indexes
are no different from rowstore indexes. The utilities
and techniques used to keep an index healthy are
the same.
3. Expand Tables, and then expand the required table; for example, FactFinance.
4. Expand Indexes, right-click on the desired index, and in the context menu, click Properties.
Transact-SQL
SQL Server provides dynamic management views and functions that make it possible for a database
administrator to inspect and review the health of indexes.
One of these functions is sys.dm_db_index_physical_stats, which can be run against all the databases on a
server, a specific table in a database, or even a specific index.
The following code sample shows a useful query that joins the results from the
sys.dm_db_index_physical_stats function with the sys.indexes catalog view, and then returns the
fragmentation and names of the indexes for a specific database:
Show indexes with fragmentation greater than five percent for a specific database
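A sketch of such a query; the exact columns returned by the original sample may differ:

```sql
SELECT i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(N'AdventureWorksDW'), NULL, NULL, NULL, 'LIMITED') AS ips
INNER JOIN sys.indexes AS i
    ON ips.object_id = i.object_id
    AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 5
ORDER BY ips.avg_fragmentation_in_percent DESC;
```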
When you identify that an index requires maintenance, there are two options available in SQL Server: you
can either rebuild or reorganize it. The previous guidance for this was:
However, the reorganizing of columnstore indexes is enhanced in SQL Server 2016 and SQL Server 2017;
therefore, it is rarely necessary to rebuild an index.
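The reorganize statements discussed below can be sketched as follows; the index and table names here are hypothetical:

```sql
-- Force all open and closed deltastore rowgroups into the columnstore.
ALTER INDEX CCI_FactSales ON dbo.FactSales
REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);

-- Combine the resulting smaller rowgroups into one or more larger rowgroups.
ALTER INDEX CCI_FactSales ON dbo.FactSales
REORGANIZE;
```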
The first statement in this code sample adds deltastore rowgroups into the columnstore index. Using the
COMPRESS_ALL_ROW_GROUPS option forces all open and closed rowgroups into the index, in a similar
way to rebuilding an index. After the query adds these deltastore rowgroups to the columnstore, the
second statement then combines these, possibly smaller, rowgroups into one or more larger rowgroups.
With a large number of smaller rowgroups, performing the reorganization a second time will improve the
performance of queries against the index. Using these statements in SQL Server 2016 or SQL Server 2017
means that, in most situations, you no longer need to rebuild a columnstore index.
Note: Rebuilding an index will mean SQL Server can move the data in the index between
segments to achieve better overall compression. If a large number of rows are deleted, and the
index fragmentation is more than 30 percent, rebuilding the index may be the best option, rather
than reorganizing.
For more information on all the available views and functions, see:
http://aka.ms/vched5
Zero data latency. Data is analyzed in real time. There are no background or scheduled processes to
move data to enable analytics to be completed.
The combination of indexes enables analytical queries to run against the columnstore index and OLTP
operations to run against the OLTP b-tree indexes. The OLTP workloads will continue to perform, but you
may incur some additional overhead when maintaining the columnstore index.
As with other similar in-memory tables, you must declare the indexes on memory optimized columnstore
tables at creation. To support larger datasets—for example, those used in data warehouses—the size of
in-memory tables has increased from a previous limit of 256 GB to 2 TB in SQL Server 2016 and SQL
Server 2017.
The Transact-SQL to create an in-memory table is simple. Add WITH (MEMORY_OPTIMIZED = ON) at the
end of a table declaration.
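For example, a hedged sketch of a memory optimized table with a clustered columnstore index declared at creation; the table and index names are hypothetical:

```sql
CREATE TABLE dbo.FactSales_InMemory
(
    SalesOrderID int NOT NULL PRIMARY KEY NONCLUSTERED,
    OrderDate datetime2 NOT NULL,
    Amount money NOT NULL,
    -- The columnstore index must be declared when the table is created.
    INDEX CCI_FactSales_InMemory CLUSTERED COLUMNSTORE
)
WITH (MEMORY_OPTIMIZED = ON);
```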
Note: The above Transact-SQL will not work on a database without a memory optimized
file group. Before creating any memory optimized tables, the database must have a memory
optimized file group associated with it.
The following is an example of the code required to create a memory optimized filegroup:
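A sketch, assuming hypothetical filegroup and file names, and a local file path that you would change to suit your environment:

```sql
ALTER DATABASE AdventureWorksDW
ADD FILEGROUP MemoryOptimizedFG CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE AdventureWorksDW
ADD FILE (NAME = 'MemoryOptimizedData', FILENAME = 'D:\Data\MemoryOptimizedData')
TO FILEGROUP MemoryOptimizedFG;
```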
You can alter and join this new in-memory table in the same way as its disk based counterpart. However,
you should use caution when altering an in-memory table, because this is an offline task. There should be
twice the table's memory available, because a temporary copy of the table is built and then switched in
when the rebuild completes.
Depending on the performance requirements, you can control the durability of the table by using the
DURABILITY option: SCHEMA_AND_DATA (the default; both the table definition and the data are
persisted) or SCHEMA_ONLY (only the table definition is persisted, so the data is lost when the server
restarts).
Introduced in SQL Server 2014, the Memory Optimization Advisor is a GUI tool inside SSMS. The tool will
analyze an existing disk based table and warn if there are any features of that table—for example, an
unsupported type of index—that aren’t possible on a memory-optimized table. It can then migrate the
data contained in the disk based table to a new memory-optimized table. The Memory Optimization
Advisor is available on the context menu of any table in Management Studio.
You have been tasked with optimizing the existing database workloads and, if possible, reducing the
amount of disk space being used by the data warehouse.
Objectives
After completing this lab, you will be able to:
Lab Setup
Estimated Time: 45 minutes
Dropping and recreating indexes can take time, depending on the performance of the lab machines.
You must retain the existing indexes on the FactProductInventory table, and ensure you do not impact
current applications by any alterations you make.
2. Examine the Existing Size of the FactProductInventory Table and Query Performance
Task 2: Examine the Existing Size of the FactProductInventory Table and Query
Performance
1. In SQL Server Management Studio, in the D:\Labfiles\Lab07\Starter folder, open the Query
FactProductInventory.sql script file.
2. Configure SQL Server Management Studio to include the actual execution plan.
3. Execute the script against the AdventureWorksDW database. Review the execution plan, making a
note of the indexes used, the execution time, and disk space used.
2. Create the required columnstore index. Re-execute the query to verify that the new columnstore
index is used, along with existing indexes.
3. What, if any, are the disk space and query performance improvements?
Results: After completing this exercise, you will have created a columnstore index and improved the
performance of an analytical query. This will have been done in real time without impacting transactional
processing.
1. Examine the Existing Size of the FactInternetSales Table and Query Performance
Task 1: Examine the Existing Size of the FactInternetSales Table and Query
Performance
1. In SQL Server Management Studio, in the D:\Labfiles\Lab07\Starter folder, open the Query
FactInternetSales.sql script file.
2. Configure SQL Server Management Studio to include the actual execution plan.
3. Execute the script against the AdventureWorksDW database. Review the execution plan, making a
note of the indexes used, the execution time, and disk space used.
2. Create the required columnstore index. Depending on your chosen index, you may need to drop and
recreate keys on the table.
3. Re-execute the query to verify that the new columnstore index is used, along with the existing
indexes.
4. What, if any, are the disk space and query performance improvements?
Results: After completing this exercise, you will have greatly reduced the disk space taken up by the
FactInternetSales table, and improved the performance of analytical queries against the table.
2. Enable the Memory Optimization Advisor to Create a Memory Optimized FactInternetSales Table
Note: Hint: consider the rows being used in the Query FactInternetSales.sql to guide your
decision.
4. Instead of running the migration with the wizard, script the results for the addition of a columnstore
index.
Note: The Memory Optimization Advisor won’t suggest columnstore indexes as they are
not applicable in all situations. Therefore, these have to be added manually.
5. Note the statements to create a memory optimized filegroup, and the code to copy the existing data.
2. Configure SQL Server Management Studio to include the actual execution plan.
3. Execute the script against the AdventureWorksDW database, and then review the disk space used
and the execution plan.
Results: After completing this exercise, you will have created a memory optimized version of the
FactInternetSales disk based table, using the Memory Optimization Advisor.
Question: Why do you think the disk space savings were so large for the disk based
clustered columnstore index?
Module 8
Designing and Implementing Views
Contents:
Module Overview 8-1
Lesson 1: Introduction to Views 8-2
Module Overview
This module describes the design and implementation of views. A view is a special type of query—one
that is stored and can be used in other queries—just like a table. With a view, only the query definition is
stored on disk, not the result set. The only exception to this is indexed views, where the result set is also
stored on disk, just like a table.
Views simplify the design of a database by providing a layer of abstraction, and hiding the complexity of
table joins. Views are also a way of securing your data by giving users permissions to use a view, without
giving them permissions to the underlying objects. This means data can be kept private, and can only be
viewed by appropriate users.
Objectives
After completing this module, you will be able to:
Lesson 1
Introduction to Views
In this lesson, you will explore the role of views in the design and implementation of a database. You will
also investigate the system views supplied with Microsoft® SQL Server data management software.
Lesson Objectives
After completing this lesson, you will be able to:
Describe a view.
A view is a named SELECT query that produces a result set for a particular purpose. Unlike the underlying
tables that hold data, a view is not part of the physical schema. Views are dynamic, virtual tables that
display specific data from tables.
The data returned by a view might filter the table data, or perform operations on the table data to make it
suitable for a particular need. For example, you might create a view that produces data for reporting, or a
view that is relevant to a specific group of users. The effective use of views in database design improves
performance, security, and manageability of data.
In this lesson, you will learn about views, the different types of views, and how to use them.
What Is a View?
A view is a stored query expression. The query
expression defines what the view will return; it is
given a name, and is stored ready for use when
the view is referenced. Although a view behaves
like a table, it does not store any data. So a view
object takes up very little space—the data that is
returned comes from the underlying base tables.
Horizontal filtering limits the rows that a view returns. For example, a Sales table might hold details of
sales for an organization, but sales staff are only permitted to view sales for their own region. You could
create a view that returns only the rows for a particular state or region.
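A hedged sketch of such a view, assuming a hypothetical Sales.Orders table with a State column:

```sql
-- Hypothetical table and view names; returns only Washington sales.
CREATE VIEW Sales.vwSalesWA
AS
SELECT SalesOrderID, OrderDate, TotalDue, State
FROM Sales.Orders
WHERE State = 'WA';
GO
```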
Types of Views
There are two main groups of views: user-defined
views that you create and manage in a database,
and system views that SQL Server manages.
User-defined Views
You can create three types of user-defined views: standard views, indexed views, and partitioned views.
System Views
In addition, there are different types of system views, including:
Dynamic management views (DMVs) provide dynamic state information, such as data about the
current session or the queries that are currently executing.
System catalog views provide information about the state of the SQL Server Database Engine.
Compatibility views are provided for backwards compatibility and replace the system tables used by
previous versions of SQL Server.
Information schema views provide internal, table-independent metadata that comply with the ISO
standard definition for the INFORMATION_SCHEMA.
Advantages of Views
Views have a number of benefits in your database.
Simplify
Views simplify the complex relationships between
tables by showing only relevant data. Views help
users to focus on a subset of data that is relevant
to them, or that they are permitted to work with.
Users do not need to see the complex queries that
are often involved in creating the view; they work
with the view as if it were a single table.
Security
Views provide security by permitting users to see
only what they are authorized to see. You can use views to limit the access to certain data sets. By only
including data that users are authorized to see, private data is kept private. Views are widely used as a
security mechanism by giving users access to data through the view, but not granting permissions to the
underlying base tables.
Provide an Interface
Views can provide an interface to the underlying tables for users and applications. This provides a layer of
abstraction in addition to backwards compatibility if the base tables change.
Many external applications cannot execute stored procedures or Transact-SQL code, but can select
data from tables or views. By creating a view, you can isolate the data that is needed.
Creating a view as an interface makes it easier to maintain backwards compatibility. Providing the
view still works, the application will work—even if changes have been made to the underlying
schema. For example, if you split a Customer table into two, CustomerGeneral and CustomerCredit, a
Customer view can make it appear that the Customer table still exists, allowing existing applications
to query the data without modifications.
Reporting applications often need to execute complex queries to retrieve the report data. Rather than
embedding this logic in the reporting application, a view can supply the data in the format required
by the reporting application.
Note: The schema and data returned by DMVs and DMFs may change in future releases of
SQL Server, impacting forward compatibility. It is recommended that you explicitly define the
columns of interest in SELECT statements, rather than using SELECT *, to ensure the expected
number and order of columns are returned.
Catalog Views
SQL Server exposes information relating to
database objects through catalog views. Catalog
views provide metadata that describes both user
created database objects, and SQL Server system
objects. For example, you can use catalog views to
retrieve metadata about tables, indexes, and other
database objects.
This sample code uses the sys.tables catalog view, together with the OBJECTPROPERTY system function,
to retrieve all the tables that have IDENTITY columns. The sys.tables catalog view returns a row for each
user table in the database.
USE AdventureWorks2016;
GO
SELECT name AS TableName
FROM sys.tables
WHERE OBJECTPROPERTY(object_id, 'TableHasIdentity') = 1;
GO
Catalog views are categorized by their functionality. For example, object catalog views report on object
metadata.
Some catalog views inherit from others. For example, sys.views and sys.tables inherit from sys.objects.
That is, sys.objects returns metadata for all user-defined database objects, whereas sys.views returns a
row for each user-defined view, and sys.tables returns a row for each user-defined table.
Note: Catalog views are updated with new information in each release of SQL Server. To see
all the columns that a catalog view returns, use a query such as SELECT * FROM sys.objects.
For more information about catalog views and the different categories of catalog views, see Microsoft
Docs:
Compatibility Views
Before catalog views were introduced in SQL Server 2005, you would use system tables to retrieve
information about internal objects. For backward compatibility, a set of compatibility views are available
so that applications continue to work. However, compatibility views only expose information relevant to
SQL Server 2000. For new development work, you should use the more up-to-date catalog views.
For more details about information schema views, see Microsoft Docs:
Query DMVs.
Demonstration Steps
Query System Views and Dynamic Management Views
1. Ensure that the MT17B-WS2016-NAT, 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are
running, and then log on to 20762C-MIA-SQL as AdventureWorks\Student with the password
Pa55w.rd.
5. In the Login box, type Student, in the Password box, type Pa55w.rd, and then click Connect.
6. On the File menu, point to Open, and then click File.
7. In the Open File dialog box, navigate to D:\Demofiles\Mod08, click Mod_08_Demo_1A.sql, and
then click Open.
8. On the toolbar, in the Available Databases list, click AdventureWorksLT.
9. Select the code under the Step 2 - Query sys.views comment, and then click Execute.
10. Select the code under the Step 3 - Query sys.tables comment, and then click Execute.
11. Select the code under the Step 4 - Query sys.objects comment, and then click Execute.
12. Select the code under the Step 5 - Query information_schema.tables comment, and then click
Execute.
13. In Object Explorer, expand Databases, expand AdventureWorksLT, expand Views, and then expand
System Views. Note the system views and user-defined views.
14. Select the code under the Step 6 - Query sys.dm_exec_connections comment, and then click
Execute.
15. Select the code under the Step 7 - Query sys.dm_exec_sessions comment, and then click Execute.
16. Select the code under the Step 8 - Query sys.dm_exec_requests comment, and then click Execute.
17. Select the code under the Step 9 - Query sys.dm_exec_query_stats comment, and then click
Execute.
18. Select the code under the Step 10 - Modify the query to add a TOP(20) and an ORDER BY
comment, and then click Execute.
Lesson 2
Creating and Managing Views
In this lesson, you will learn how to create a view, in addition to how to alter and drop a view. You will
learn about how views, and the objects on which they are based, have owners. You will also learn how to
find information about existing views, work with updateable views, and obfuscate the definition of views.
Lesson Objectives
After completing this lesson, you will be able to:
Create a view.
Drop a view.
Alter a view.
Create a View
CREATE VIEW
To create a new view, use the CREATE VIEW
command. At its simplest, you create a view by
giving it a name and writing a SELECT statement.
To show only the names of current employees,
you can create an employee list view that includes
only Title, FirstName, MiddleName, and LastName.
As with tables, column names must be unique. If
you are using an expression, it must have an alias.
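A sketch of such a view; the filter used to identify current employees is an assumption and may differ from the original definition:

```sql
CREATE VIEW Person.vwEmployeeList
AS
SELECT Title, FirstName, MiddleName, LastName
FROM Person.Person
WHERE PersonType = 'EM';  -- assumption: 'EM' identifies employees
GO
```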
Best Practice: It is good practice to prefix the name of your view with vw; for example,
vwEmployeeList. Although database developers differ in their naming conventions, most would
agree that it is beneficial to be able to see clearly which objects are tables, and which are views.
Within the SELECT statement, you can reference other views instead of, or in addition to, base tables. Up
to 32 levels of nesting are permitted. However, the practice of deeply nesting views quickly becomes
difficult to understand and debug. Any performance problems can also be difficult to fix.
In a view definition, you cannot use either the INTO keyword or the OPTION clause. Also, because view
definitions are permanent objects within the database, you cannot reference a temporary table or table
variable. Views have no natural output order, so queries that access the views need to specify the order for
the returned rows.
Note: You cannot guarantee ordered results in a view definition. Although you can use the
ORDER BY clause, it is only used to determine the rows returned by the TOP, OFFSET, or FOR XML
clauses. It does not determine the order of the returned rows.
Once created, views behave much like a table; for example, you can query the view just as you would
query a table.
View Attributes
There are three view attributes:
WITH ENCRYPTION
The WITH ENCRYPTION attribute obfuscates the view definition in catalog views where the text of
CREATE VIEW is held. It also prevents the view definition being displayed from Object Explorer. WITH
ENCRYPTION also stops the view from being included in SQL Server replication.
WITH SCHEMABINDING
You can specify the WITH SCHEMABINDING option to stop the underlying table(s) being changed in
a way that would affect the view definition. Indexed views must use the WITH SCHEMABINDING
option.
WITH VIEW_METADATA
The WITH VIEW_METADATA attribute determines how SQL Server returns information to ODBC
drivers and the OLE DB API. Normally, metadata about the underlying tables is returned, rather than
metadata about the view. This is a potential security loophole—by using WITH VIEW_METADATA, the
metadata returned is the view name, and not the underlying table names.
Note: The WITH ENCRYPTION attribute does not encrypt the data being returned by the
view; it only encrypts the view definition stored in catalog views, such as sys.syscomments.
After you have created a view, you can work with it as if it were a table.
Querying Views
SELECT *
FROM Person.vwEmployeeList;
GO
Drop a View
To remove a view from a database, use the DROP
VIEW statement. This removes the definition of the
view, and all associated permissions.
DROP VIEW
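For example, to drop the employee list view used earlier in this lesson:

```sql
DROP VIEW Person.vwEmployeeList;
GO
```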
Best Practice: Keep database documentation up to date, including the purpose for each
view you create, and where they are used. This will help to identify views that are no longer
required. These views can then be dropped from the database. Keeping old views that have no
use makes database administration more complex, and adds unnecessary work, particularly at
upgrade time.
If a view was created using the WITH SCHEMABINDING option, you will need to either alter the view, or
drop the view, if you want to make changes to the structure of the underlying tables.
You can drop multiple views with one comma-delimited list, as shown in the following example:
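A sketch with hypothetical view names:

```sql
-- Both views are removed in a single statement.
DROP VIEW dbo.vwOrders, dbo.vwCustomers;
GO
```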
Alter a View
After a view is defined, you can modify its
definition without dropping and recreating the
view.
Altering a View
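For example, a hedged sketch that redefines Person.vwEmployeeList; the added filter is purely illustrative:

```sql
ALTER VIEW Person.vwEmployeeList
AS
SELECT Title, FirstName, MiddleName, LastName
FROM Person.Person
WHERE MiddleName IS NOT NULL;  -- illustrative change to the definition
GO
```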
For example, John has no access to a table that Nupur owns. If Nupur creates a view or stored procedure
that accesses the table and gives John permission to the view, John can then access the view and, through
it, the data in the underlying table. However, if Nupur creates a view or stored procedure that accesses a
table that Tim owns and grants John access to the view or stored procedure, John would not be able to
use the view or stored procedure—even if Nupur has access to Tim's table, because of the broken
ownership chain. Two options are available to correct this situation:
Tim could own the view or stored procedure instead of Nupur.
John could be granted permissions directly on the underlying table that Tim owns.
It is not always desirable to grant permissions to the underlying table, and views are often used as a way
of limiting access to certain data.
Note: Despite appearances, each database does not have its own system views. Object
Explorer simply gives you a view onto the system views available to all databases.
Catalog Views
There are a number of catalog views that give you information about views, including:
sys.objects
sys.views
sys.sql_expression_dependencies
sys.dm_sql_referenced_entities
The sys.sql_expression_dependencies catalog view lets you find column level dependencies. If you
change the name of an object that a view references, you must modify the view so that it references the
new name. Before renaming an object, it is helpful to display the dependencies of the object so you can
determine whether the proposed change will affect any views.
You can find overall dependencies by querying the sys.sql_expression_dependencies view. You can find
column-level dependencies by querying the sys.dm_sql_referenced_entities view.
Display the referenced entities for Person.vwEmployeeList.
Using sys.dm_sql_referenced_entities
USE AdventureWorks2016;
GO
SELECT referenced_schema_name, referenced_entity_name, referenced_minor_name,
referenced_class_desc, is_caller_dependent
FROM sys.dm_sql_referenced_entities ('Person.vwEmployeeList', 'OBJECT');
GO
sys.dm_sql_referenced_entities (Transact-SQL)
http://aka.ms/wr0oz6
sys.sql_expression_dependencies (Transact-SQL)
http://aka.ms/mgon4g
You can also display the definition of a view by executing the sp_helptext system stored procedure,
providing the view was not created by using the WITH ENCRYPTION attribute:
USE AdventureWorks2016;
GO
EXEC sp_helptext 'Person.vwEmployeeList';
GO
System Function
The OBJECT_DEFINITION() function returns the definition of an object in relational format. This is more
appropriate for an application to use than the output of a system stored procedure such as sp_helptext.
Again, the view must not have been created using the WITH ENCRYPTION attribute.
USE AdventureWorks2016;
GO
SELECT OBJECT_DEFINITION (OBJECT_ID(N'Person.vwEmployeeList')) AS [View Definition];
GO
Updateable Views
An updateable view lets you modify data in the
underlying table or tables. This means that, in
addition to being able to query the view, you can
also insert, update, or delete rows through the
view.
Do not include an aggregate function: AVG, COUNT, SUM, MIN, MAX, GROUPING, STDEV, STDEVP,
VAR, or VARP.
Are not affected by DISTINCT, GROUP BY, or HAVING clauses.
Although views can contain aggregated values from the base tables, you cannot update these columns.
Columns that are involved in operations, such as GROUP BY, HAVING, or DISTINCT, cannot be updated.
INSTEAD OF Triggers
Updates through views cannot affect columns from more than one base table. However, to work around
this restriction, you can create INSTEAD OF triggers. Triggers are discussed in Module 11: Responding to
Data Manipulation via Triggers.
If you want to stop updates that do not meet the view definition, specify WITH CHECK OPTION. SQL
Server will then ensure that any data modifications made through the view meet the view definition. In
the previous example, it would prevent anyone from modifying a row so that it no longer satisfied
State = ‘WA’.
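A sketch of specifying WITH CHECK OPTION on such a view, assuming a hypothetical Sales.Orders table:

```sql
CREATE VIEW Sales.vwSalesWA
AS
SELECT SalesOrderID, OrderDate, TotalDue, State
FROM Sales.Orders
WHERE State = 'WA'
WITH CHECK OPTION;  -- modifications through the view must keep State = 'WA'
GO
```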
Whilst encrypting view definitions provides a certain level of security, this also makes it more difficult to
troubleshoot when there are performance problems. The encryption is not strong by today’s standards—
there are third-party tools that can decrypt the source code. Do not rely on this option if protecting the
view definition is critical to your business.
Note: WITH ENCRYPTION also stops the view being published with SQL Server replication.
Demonstration Steps
1. In SSMS, on the File menu, point to Open, and then click File.
2. In the Open File dialog box, navigate to D:\Demofiles\Mod08, click Mod_08_Demo_2A.sql, and
then click Open.
4. Select the code under the Step 2 - Create a new view comment, and then click Execute.
5. Select the code under the Step 3 - Query the view comment, and then click Execute.
6. Select the code under the Step 4 - Query the view and order the results comment, and then click
Execute.
7. Select the code under the Step 5 - Query the view definition via OBJECT_DEFINITION comment,
and then click Execute.
8. Select the code under the Step 6 - Alter the view to use WITH ENCRYPTION comment, and then
click Execute.
9. Select the code under the Step 7 - Requery the view definition via OBJECT_DEFINITION
comment, and then click Execute.
10. Note that the query definition is no longer accessible because the view is encrypted.
11. Select the code under the Step 8 - Drop the view comment, and then click Execute.
Lesson 3
Performance Considerations for Views
This lesson discusses how the query optimizer handles views, what an indexed view is, and when you
might use them. It also considers a special type of view—the partitioned view.
Lesson Objectives
After completing this lesson, you will be able to:
Indexed Views
An indexed view has a clustered index added to it. By adding a clustered index, the view is “materialized”
and the data is permanently stored on disk. Complex views that include aggregations and joins can benefit
from having an index added to the view. The data stored on disk is faster to retrieve, because any
calculations and aggregations do not need to be done at run time.
For more information about creating indexed views, see Microsoft Docs:
An indexed view can be used in two ways:
1. It can be referenced directly in a FROM clause. Depending on the query, the indexed view will be
faster to access than a nonindexed view. The performance improvement for certain queries can be
dramatic when an index is added to a view.
2. The indexed view can be used by the query optimizer, in place of the underlying tables, whenever
there is a performance benefit.
When updates to the underlying data are made, SQL Server automatically updates the data that is stored
in the indexed view. This means that there is an overhead to using an indexed view, because modifications
to data in the underlying tables may not be as quick. Although an indexed view is materialized and the
data is stored on disk, it is not a table: it is still defined by a SELECT statement, and data modifications are
made against the underlying tables.
Indexed views have a negative impact on the performance of INSERT, DELETE, and UPDATE operations on
the underlying tables because the view must also be updated. However, for some queries, they
dramatically improve the performance of SELECT queries on the view. They are most useful for data that is
regularly selected, but less frequently updated.
Best Practice: Indexed views are useful in decision support systems that are regularly
queried, but updated infrequently. A data warehouse or data mart might use indexed views
because much of the data is aggregated for reporting.
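As a sketch of how an indexed view is created (the table and column names are illustrative; note that WITH SCHEMABINDING is required, and SQL Server requires COUNT_BIG(*) in an indexed view that uses GROUP BY):

```sql
-- Illustrative sketch of an indexed (materialized) view.
-- WITH SCHEMABINDING is required, and referenced objects must use
-- two-part names.
CREATE VIEW Sales.OrderTotalsByProduct
WITH SCHEMABINDING
AS
SELECT ProductID,
       SUM(LineTotal) AS TotalSales,
       COUNT_BIG(*)   AS OrderLineCount   -- required when GROUP BY is used
FROM Sales.OrderDetail
GROUP BY ProductID;
GO

-- Materialize the view by adding a unique clustered index.
CREATE UNIQUE CLUSTERED INDEX IX_OrderTotalsByProduct
ON Sales.OrderTotalsByProduct (ProductID);
```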
Performance. Slow running queries can be difficult to debug when views are nested within one
another. In theory, the query optimizer handles the script as if the views were not nested; in practice,
it can make bad decisions trying to optimize each part of the code. This type of performance problem
can be difficult to debug.
Maintenance. When developers leave, the views they created may still be used in someone else’s
code. Because the application depends on the view, it cannot be deleted; but no one wants to amend
it either, because no one understands the full implications of how it is used. The business has to put
up with poorly performing queries and views, because it would take too long to go back and untangle
the dependencies.
Having pointed out the pitfalls of nested views, there are also some advantages. After a view has been
written, tested, and documented, it can be used in different parts of an application, just like a table.
However, it is important to understand the potential problems.
Partitioned Views
A partitioned view is a view onto a partitioned
table. The view makes the table appear to be one
table, even though it is actually several tables.
Partitioned Tables
To understand partitioned views, we first have to
understand partitioned tables. A partitioned table
is a large table that has been split into a number
of smaller tables. Although the actual size of the
table may vary, tables are normally partitioned
when performance problems occur, or
maintenance jobs take an unacceptable time to
complete. To solve these problems, the table is
split into a number of smaller tables, using one of the columns as the criteria. For example, a customer
table might be partitioned on the date of the last order with a separate table for each year. This speeds up
queries, and allows maintenance jobs to complete more quickly. A CHECK constraint is created on each
table to ensure that its data complies with the partitioning rule. All tables must have the same columns,
and all columns must be of the same data type and size.
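The pattern can be sketched as follows (the table names, year ranges, and partitioning column are illustrative):

```sql
-- Illustrative sketch of a local partitioned view over yearly tables.
-- Each member table carries a CHECK constraint on the partitioning column.
CREATE TABLE dbo.Orders2023
(
    OrderID   int  NOT NULL,
    OrderDate date NOT NULL
        CHECK (OrderDate >= '20230101' AND OrderDate < '20240101'),
    CONSTRAINT PK_Orders2023 PRIMARY KEY (OrderID, OrderDate)
);

CREATE TABLE dbo.Orders2024
(
    OrderID   int  NOT NULL,
    OrderDate date NOT NULL
        CHECK (OrderDate >= '20240101' AND OrderDate < '20250101'),
    CONSTRAINT PK_Orders2024 PRIMARY KEY (OrderID, OrderDate)
);
GO

-- The partitioned view presents the member tables as a single table;
-- the CHECK constraints let the optimizer skip irrelevant tables.
CREATE VIEW dbo.Orders
AS
SELECT OrderID, OrderDate FROM dbo.Orders2023
UNION ALL
SELECT OrderID, OrderDate FROM dbo.Orders2024;
```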
In a local partitioned view, all the constituent tables are located on the same SQL Server instance. A
distributed partitioned view is where at least one table resides on a different server.
Question: Can you think of queries in your SQL Server environment that use nested views?
What advantages and disadvantages are there with using nested views?
The Sales department has also asked you to create a view that enables a temporary worker to enter new
customer data without viewing credit card, email address, or phone number information.
Objectives
After completing this lab, you will be able to:
Password: Pa55w.rd
View 1: OnlineProducts
View Column Table Column
ProductID Production.Product.ProductID
Name Production.Product.Name
Availability Production.Product.DaysToManufacture. If 0, return ‘In
stock’; if 1, return ‘Overnight’; if 2, return ‘2 to 3
days delivery’; otherwise, return ‘Call us for a
quote’.
Size Production.Product.Size
Price Production.Product.ListPrice
Weight Production.Product.Weight
This view is based on the Production.Product table. Products should be displayed only if the product is on
sale, which can be determined using the SellStartDate and SellEndDate columns.
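A possible shape for this view (a sketch, not the lab answer key; the filter assumes that SellEndDate is NULL for products still on sale):

```sql
-- Sketch of the OnlineProducts view described above.
CREATE VIEW Sales.OnlineProducts
AS
SELECT p.ProductID,
       p.Name,
       CASE p.DaysToManufacture
           WHEN 0 THEN 'In stock'
           WHEN 1 THEN 'Overnight'
           WHEN 2 THEN '2 to 3 days delivery'
           ELSE 'Call us for a quote'
       END AS Availability,
       p.Size,
       p.ListPrice AS Price,
       p.Weight
FROM Production.Product AS p
WHERE p.SellStartDate <= GETDATE()
  AND (p.SellEndDate IS NULL OR p.SellEndDate > GETDATE());
```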
Product ID Production.Product.ProductID
This view is based on two tables: Production.Product and Production.ProductModel. Products should be
displayed only if the product is on sale, which can be determined using the SellStartDate and SellEndDate
columns.
The main tasks for this exercise are as follows:
Results: After completing this exercise, you will have two new views in the AdventureWorks database.
The view must contain three columns from the Sales.CustomerPII table: CustomerID, FirstName and
LastName. You must be able to update the view with new customers.
CustomerID Sales.CustomerPII.CustomerID
FirstName Sales.CustomerPII.FirstName
LastName Sales.CustomerPII.LastName
2. Write and execute an INSERT statement to add a new record to the view.
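A sketch of such a view and a test INSERT (the names follow the table above; the INSERT assumes CustomerID is generated automatically, for example by an IDENTITY column):

```sql
-- Sketch of an updateable view exposing only non-sensitive columns.
CREATE VIEW Sales.Customers
AS
SELECT CustomerID, FirstName, LastName
FROM Sales.CustomerPII;
GO

-- Because the view references a single table and exposes no derived
-- columns, it is updateable. Any other NOT NULL columns in
-- Sales.CustomerPII would need defaults for this INSERT to succeed.
INSERT INTO Sales.Customers (FirstName, LastName)
VALUES ('Dan', 'Dresner');
```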
Results: After completing this exercise, you will have a new updateable view in the database.
We have discussed the problems with nesting views, and the advantages of creating an indexed view.
Review Question(s)
Question: When you create a new view, what does SQL Server store in the database?