Module 1
An Introduction to Database Development
Contents:
Module Overview
Lesson 1: Introduction to the SQL Server Platform
Lesson 2: SQL Server Database Development Tasks
Module Review and Takeaways

Module Overview
Before beginning to work with Microsoft® SQL Server® in either a development or an administration
role, it is important to understand the scope of the SQL Server platform. In particular, it is useful to
understand that SQL Server is not just a database engine—it is a complete platform for managing
enterprise data.

SQL Server provides a strong data platform for all sizes of organizations, in addition to a comprehensive
set of tools to make development easier, and more robust.

Objectives
After completing this module, you will be able to:

 Describe the SQL Server platform.

 Understand the common tasks undertaken during database development, and the SQL Server tools provided to support these tasks.

Lesson 1
Introduction to the SQL Server Platform
Microsoft SQL Server data management software is a platform for developing business applications that
are data focused. Rather than being a single, monolithic application, SQL Server is structured as a series of
components. It is important to understand the use of each component.

You can install more than one copy of SQL Server on a server. Each copy is called an instance and you can
configure and manage them separately.

There are various editions of SQL Server, and each one has a different set of capabilities. It is important to
understand the target business cases for each, and how SQL Server has evolved through a series of
improving versions over many years. It is a stable and robust data management platform.

Lesson Objectives
After completing this lesson, you will be able to:
 Describe the overall SQL Server platform.

 Explain the role of each of the components that make up the SQL Server platform.

 Describe the functionality that SQL Server instances provide.

 Explain the available SQL Server editions.

 Explain how SQL Server has evolved through a series of versions.

SQL Server Architecture

The SQL Server Database Engine is one component in the suite of products that comprise SQL Server. However, the database engine is not a homogenous piece of software; it is made up of different modules, each with a separate function.

SQL Server Operating System

Underpinning the SQL Server Database Engine is the SQL Server Operating System (SQLOS). This performs functions such as managing memory, managing locks, thread scheduling, the buffer pool, and much more. It also provides access to external components, such as the common language runtime (CLR).
If you want to explore the SQLOS in more detail, look at the list of dynamic management views (DMVs)
related to the SQLOS. A full list is provided in Microsoft Docs:

SQL Server Operating System Related Dynamic Management Views (Transact-SQL)
http://aka.ms/Ye1hmt
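For example, the following queries, shown here as an illustrative sketch, list the SQLOS-related DMVs on an instance and then examine the SQLOS schedulers:

-- List the SQLOS-related dynamic management views on the instance.
SELECT name
FROM sys.system_objects
WHERE name LIKE 'dm_os%'
ORDER BY name;

-- One scheduler per logical CPU is visible by default.
SELECT scheduler_id, cpu_id, status, current_tasks_count
FROM sys.dm_os_schedulers;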

Database Engine

The SQL Server Database Engine is made up of two main parts:

1. The storage engine.

2. The query processor.

The storage engine manages access to data stored in the database, including how the data is physically
stored on disk, backups and restores, indexes, and more.

The query processor ensures that queries are formatted correctly; it plans how best to execute a query,
and then executes the query.

SQL Server is an integrated and enterprise-ready platform for data management that offers a low total
cost of ownership.

Enterprise Ready
SQL Server provides a very secure, robust, and stable relational database management system, but there is
much more to it than that. You can use SQL Server to manage organizational data and provide both
analysis of, and insights into, that data. Microsoft also provides other enterprise development
environments—for example, Visual Studio®—that have excellent integration and support for SQL Server.

The SQL Server Database Engine is one of the highest performing database engines available and
regularly features in the top tier of industry performance benchmarks. You can review industry
benchmarks and scores on the Transaction Processing Performance Council (TPC) website.

High Availability
Impressive performance is necessary, but not at the cost of availability. Many enterprises are now finding
it necessary to provide access to their data 24 hours a day, seven days a week. The SQL Server platform
was designed with the highest levels of availability in mind. As each version of the product has been
released, more capabilities have been added to minimize any potential downtime.
Security

Uppermost in the minds of enterprise managers is the requirement to secure organizational data. It is not
possible to retrofit security after an application or product has been created. From the outset, SQL Server
has been built with the highest levels of security as a goal. SQL Server includes encryption features such as
Always Encrypted, designed to protect sensitive data such as credit card numbers, and other personal
information.

Scalability

Organizations require data management capabilities for systems of all sizes. SQL Server scales from the
smallest needs, running on a single desktop, to the largest—a high-availability server farm—via a series of
editions that have increasing capabilities.

Cost of Ownership

Many competing database management systems are expensive both to purchase and to maintain. SQL
Server offers very low total cost of ownership. SQL Server tooling (both management and development)
builds on existing Windows® knowledge. Most users tend to quickly become familiar with the tools. The
productivity users can achieve when they use the various tools is enhanced by the high degree of
integration between them. For example, many of the SQL Server tools have links to launch and
preconfigure other SQL Server tools.

SQL Server Components


SQL Server is an excellent relational database engine, but as a data platform, it offers much more than this. SQL Server consists of several components, as described below:

SQL Server Database Engine: A relational database engine based on Structured Query Language (SQL). It is the core service for storing, processing, and securing data, and provides replication, full-text search, and tools for managing relational and XML data.

Analysis Services (SSAS): An online analytical processing (OLAP) engine that works with analytic cubes and supports data mining applications.

Reporting Services (SSRS): Offers a reporting engine based on web services, and provides a web portal and end-user reporting tools. It is also an extensible platform that you can use to develop report applications.

Integration Services (SSIS): Used to orchestrate the movement of data between SQL Server components and other external systems. Traditionally used for extract, transform, and load (ETL) operations.

Master Data Services (MDS): Provides tooling for managing master or reference data, and includes hierarchies, granular security, transactions, data versioning, and business rules, as well as an add-in for Excel® for managing data.

Data Quality Services (DQS): With DQS, you can build a knowledge base and use it to perform data quality tasks, including correction, enrichment, standardization, and de-duplication of data.

Replication: A set of technologies for copying and distributing data and database objects between multiple databases.

Alongside these server components, SQL Server provides the following management tools:

SQL Server Management Studio (SSMS): An integrated data management environment designed for developers and database administrators to manage the core database engine.

SQL Server Configuration Manager: Provides basic configuration management of services, client and server protocols, and client aliases.

SQL Server Profiler: A graphical user interface to monitor and assist in the management of performance of Database Engine and Analysis Services components.

Database Engine Tuning Advisor: Provides guidance on, and helps to create, the optimal sets of indexes, indexed views, and partitions.

Data Quality Services Client: A graphical user interface that connects to a DQS server, and then provides data cleansing operations and monitoring of their performance.

SQL Server Data Tools (SSDT): An integrated development environment for developing business intelligence (BI) solutions utilizing SSAS, SSRS, and SSIS.

Connectivity Components: Components that facilitate communication between clients and servers; for example, ODBC and OLE DB.

SQL Server Instances


It is sometimes useful to install more than one copy of the SQL Server Database Engine on a single server. You can install many instances; the first instance becomes the default, and all other instances must be named. Some components, such as SQL Server Data Tools (SSDT), are installed once and shared by all instances of the database engine.

Multiple Instances

The ability to install multiple instances of SQL Server components on a single server is useful in several situations:
 You may require different administrators or security environments for sets of databases. Each instance
of SQL Server can be managed and secured separately.

 Applications may require SQL Server configurations that are inconsistent or incompatible with the
server requirements of other applications. Each instance of SQL Server can be configured
independently.
 You might want to support application databases with different levels of service, particularly in
relation to availability. To meet different service level agreements (SLAs), you can create SQL Server
instances to separate workloads.

 You might want to support different versions or editions of SQL Server.


 Applications might require different server-level collations. Although each database can have a different collation, an application might depend on the collation of the tempdb database, which is shared between all databases. In this case, you can create separate instances, each with the correct collation.

You can install different versions of SQL Server side by side, using multiple instances. This can assist when
testing upgrade scenarios or performing upgrades.
Default and Named Instances

One instance can be the default instance on a database server; this instance has no name. Connection requests that are sent to a computer without specifying an instance name connect to the default instance. There is no requirement to have a default instance, because you can name every instance.

All other instances of SQL Server require an instance name, in addition to the server name, and are known
as ‘‘named’’ instances. You cannot install all components of SQL Server in more than one instance. A
substantial change in SQL Server 2012 was the introduction of multiple instance support for SQL Server
Integration Services (SSIS).
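Once connected, you can confirm which instance you are working with. The following queries are a brief sketch and use only built-in functions:

-- Server and instance name for the current connection.
SELECT @@SERVERNAME AS ServerAndInstance;

-- NULL for the default instance; the instance name for a named instance.
SELECT SERVERPROPERTY('InstanceName') AS InstanceName;

-- Version and edition of the connected instance.
SELECT SERVERPROPERTY('ProductVersion') AS Version,
       SERVERPROPERTY('Edition') AS Edition;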

You do not have to install SSDT more than once. A single installation of the tools can manage and
configure all installed instances.

SQL Server Editions


SQL Server is available in a wide variety of editions. These have different price points and different levels of capability.

Use Cases for SQL Server Editions

Each SQL Server edition is targeted to a specific business use case, as shown in the following table:

Enterprise: Provides comprehensive high-end datacenter capabilities and includes end-to-end BI.

Standard: Provides basic data management and BI capabilities.

Developer: Includes all the capabilities of the Enterprise edition, licensed to be used in development and test environments. It cannot be used as a production server.

Express: Free database that supports learning and building small desktop data-driven applications.

Web: Gives a low total-cost-of-ownership option to provide database capabilities to small and large scale web properties.

Azure® SQL Database: Helps you to build database applications on a scalable and robust cloud platform. This is the Azure version of SQL Server.

SQL Server Versions


SQL Server has a rich history of innovation that has been achieved while maintaining strong levels of stability. It has been available for many years, yet it is still rapidly evolving, with new capabilities and features. Indeed, this evolution has begun to increase in speed, with Microsoft supporting SQL Server in the cloud with Azure.

Early Versions

The earliest versions of SQL Server (1.0 and 1.1) were based on the OS/2 operating system. Versions 4.2 and later moved to the Windows operating system, initially on the Windows NT operating system.

Later Versions
Version 7.0 saw a significant rewrite of the product. Substantial advances were made to reduce the
administration workload for the product. OLAP Services, which later became Analysis Services, was
introduced.

SQL Server 2000 featured support for multiple instances and collations. It also introduced support for
data mining. SSRS was introduced after the product release as an add-on enhancement, along with
support for 64-bit processors.

SQL Server 2005 provided support for non-relational data that was stored and queried as XML, and SSMS
was released to replace several previous administrative tools. SSIS replaced a tool formerly known as Data
Transformation Services (DTS). Dynamic management views (DMVs) and functions were introduced to
provide detailed health monitoring, performance tuning, and troubleshooting. Also, substantial high-
availability improvements were included in the product, and database mirroring was introduced.

SQL Server 2008 provided AlwaysOn technologies to reduce potential downtime. Database compression
and encryption technologies were added. Specialized date-related and time-related data types were
introduced, including support for time zones within date/time data. Full-text indexing was integrated
directly within the database engine. (Previously, full-text indexing was based on interfaces to services at
the operating system level.) Additionally, a Windows PowerShell® provider for SQL Server was introduced.

SQL Server 2008 R2 added substantial enhancements to SSRS—the introduction of advanced analytic
capabilities with PowerPivot; support for managing reference data with the introduction of Master Data
Services; and the introduction of StreamInsight, with which users could query data that was arriving at
high speed, before storing the data in a database.

SQL Server 2012 introduced tabular data models into SSAS. New features included: FileTable, an enhancement of FILESTREAM; Semantic Search, with which users could extract statistically relevant words; and the ability to migrate BI projects into Microsoft Visual Studio 2010.
SQL Server 2014 included substantial performance gains from the introduction of in-memory tables and
native stored procedures. It also increased integration with Microsoft Azure.
SQL Server 2016 was a major release and added three important security features: Always Encrypted,
dynamic data masking, and row-level security. This version also included stretch database to archive data
in Microsoft Azure, Query Store to maintain a history of execution plans, PolyBase to connect to Hadoop
data, temporal tables, and support for R, plus in-memory enhancements and columnstore indexes.

Current Version
SQL Server 2017 includes many fixes and enhancements, including:

 SQL Graph. Enables many-to-many relationships to be modelled more easily. Extensions to Transact-SQL include new syntax to create tables as EDGES or NODES, and the MATCH keyword for querying.

 Adaptive query processing. This is a family of features that help queries to run more efficiently.
They are batch mode adaptive joins, batch mode memory grant feedback, and interleaved
execution for multi-statement table-valued functions.
 Automatic database tuning. This allows query performance to either be fixed automatically, or to
provide insight into potential problems so that fixes can be applied.

 In-memory enhancements. This includes computed columns in memory-optimized tables, support for CROSS APPLY in natively compiled modules, and support for JSON functions in natively compiled modules.

 SQL Server Analysis Services. This includes several enhancements for tabular models.
 Machine Learning. R Services is now known as SQL Server Machine Learning Services, and includes support for Python as well as R.

Compatibility
Businesses can run databases created on different versions of SQL Server on a single instance. Each version of SQL Server can build and maintain databases created on previous versions of SQL Server. For example, SQL Server 2016 can read and create databases at compatibility level 100; that is, databases created on SQL Server 2008. The compatibility level specifies the supported features of the database. For more information on compatibility levels, see Microsoft Docs:

ALTER DATABASE Compatibility Level (Transact-SQL)
http://aka.ms/xrf36h
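As a brief sketch, you can inspect and change compatibility levels as follows (the database name assumes the AdventureWorks sample used in this course):

-- Compatibility level of every database on the instance.
SELECT name, compatibility_level
FROM sys.databases;

-- Move a database to the SQL Server 2017 compatibility level (140).
ALTER DATABASE AdventureWorks
SET COMPATIBILITY_LEVEL = 140;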

Categorize Activity
Place each item into the appropriate category. Indicate your answer by writing the category number to
the right of each item.

Items

1 Database Engine

2 Data Quality Services Client

3 Enterprise

4 Master Data Services

5 Connectivity

6 Developer

7 Replication

8 Profiler

9 Web

10 Integration Services

11 SQL Server Management Studio

12 Standard

13 SQL Server Data Tools

Category 1: Component
Category 2: Tool
Category 3: Edition

Lesson 2
SQL Server Database Development Tasks
Microsoft provides numerous tools and integrated development environments to support database
developers. This lesson investigates some of the common tasks undertaken by developers, how SQL Server
supports those tasks, and which tools you can use to complete them.

Lesson Objectives
After completing this lesson, you will be able to:

 List the common tasks that a database developer undertakes.

 Describe the functionality of some of the SQL Server tools.

 Use SSMS and SSDT to connect to local and cloud-based databases.

Common Database Development Tasks


Developing software to solve business problems normally requires some form of data processing. To solve these problems successfully, the data has to be stored, manipulated, and managed effectively. This topic discusses, at a high level, the different kinds of tasks you, as a developer, might have to undertake. Later in this course, other modules will expand on these tasks in more detail.

Storing and managing data

The primary SQL Server object for storing and retrieving data is the table. Creating, deleting, and altering tables are some of the most important tasks you will perform. After a table is created, you can insert data into it, amend that data, and move data between tables.
With SQL Server views, you can create a single logical view across multiple tables, filtering data so that
only relevant information is returned.

Indexes can also be added to tables to ensure good performance when querying the data. As the volume
of data becomes greater, ensuring good performance becomes important.
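As a minimal sketch of these two ideas (the table, view, and index names are illustrative, not from the course database):

-- A view that returns only the relevant columns from two related tables.
CREATE VIEW dbo.CustomerOrderSummary
AS
SELECT c.CustomerName, o.OrderDate, o.TotalDue
FROM dbo.Customer AS c
JOIN dbo.CustomerOrder AS o ON o.CustomerID = c.CustomerID;
GO

-- An index to support frequent lookups of orders by date.
CREATE INDEX IX_CustomerOrder_OrderDate
ON dbo.CustomerOrder (OrderDate);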
Processing data programmatically

SQL Server encapsulates business logic through the use of stored procedures and functions. Rather than
build business logic into multiple applications, client applications can call stored procedures and functions
to perform data operations. This centralizes business logic, and makes it easier to maintain.
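For example, a sketch of a stored procedure that centralizes one validation rule might look like this (the object and column names are hypothetical):

CREATE PROCEDURE dbo.UpdateCreditLimit
    @CustomerID int,
    @NewLimit decimal(18,2)
AS
BEGIN
    -- The business rule lives in one place, not in every client application.
    IF @NewLimit < 0
        THROW 50001, 'Credit limit cannot be negative.', 1;

    UPDATE dbo.Customer
    SET CreditLimit = @NewLimit
    WHERE CustomerID = @CustomerID;
END;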

SSIS supports more complex data processing in the form of extraction and transformation.

Enforcing and inspecting data quality

Data quality is maintained by placing constraints on columns in a table. For example, you can specify a
data type to restrict the type of data that can be stored. This constrains the column to only holding, for
example, integers, date and time, or character data types. Columns can be further constrained by
specifying the length of the data type, or whether or not it can be left empty (null).

Primary keys and unique keys ensure a value is unique amongst other rows in the table. You can also link
tables together by creating foreign keys.
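A short sketch of these constraints in Transact-SQL (the schema is illustrative, not the course database):

-- Column constraints restrict type, length, and nullability.
CREATE TABLE dbo.Customer
(
    CustomerID   int           NOT NULL PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL,            -- must be supplied
    CountryCode  char(2)       NULL,                -- optional, fixed length
    CreditLimit  decimal(18,2) NOT NULL DEFAULT (0)
);

-- A foreign key links each order to an existing customer.
CREATE TABLE dbo.CustomerOrder
(
    OrderID    int  NOT NULL PRIMARY KEY,
    CustomerID int  NOT NULL REFERENCES dbo.Customer (CustomerID),
    OrderDate  date NOT NULL
);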

If a database is poorly designed, without using any of these data quality constraints, a developer might
have to inspect and resolve data issues—removing duplicate data or performing some kind of data
cleansing.

Securing data

This can include:

 Restricting access to the data to certain groups or specific individuals.


 Protecting data by using a backup routine to ensure nothing is lost in the case of a hardware failure.

 Encrypting data to reduce the possibility of unauthorized access to sensitive data.

These tasks are usually performed by a database administrator. However, developers often provide
guidance about who needs access to what data; the business requirements around availability and
scheduling of backups; and what form of encryption is required.

Development Tools for SQL Server


SQL Server provides a range of tools for developers to complete the previously described tasks. In some circumstances, more than one tool might be suitable.

SQL Server Management Studio

SSMS is the primary tool that Microsoft provides for interacting with SQL Server services. It is an integrated environment that exists in the Visual Studio platform shell. SSMS shares many common features with Visual Studio.
You use SSMS to execute queries and return results,
but it can also help you to analyze queries. It offers rich editors for a variety of document types, including
SQL and XML files. When you are working with SQL files, SSMS provides IntelliSense® to help you
write queries.
You can perform most SQL Server relational database management tasks by using the Transact-SQL
language, but many developers prefer the graphical administration tools because they are typically easier
to use. SSMS provides graphical interfaces for configuring databases and servers.

SSMS can connect to a variety of SQL Server services, including the Database Engine, Analysis Services,
Integration Services, and Reporting Services. SSMS uses the Visual Studio environment and will be familiar
to Visual Studio developers.

SQL Server Data Tools

SSDT brings SQL Server functionality into Visual Studio. With SSDT, you can develop both on-premises and cloud-based applications using SQL Server components. You can work with .NET
Framework code and database-specific code, such as Transact-SQL, in the same environment. If you want
to change the database design, you do not have to leave Visual Studio and open SSMS; you can work with
the schema within SSDT.

Visual Studio Code


Visual Studio Code (VS Code) is a free, open source code editor for Windows, Linux, and macOS. VS Code uses an extension framework to add functionality to the base editor. To connect to SQL Server from VS Code, you must install the mssql extension, which gives you the capability to connect to SQL Server and execute Transact-SQL commands. You can download VS Code from Microsoft.
SQL Operations Studio

SQL Operations Studio is a free, lightweight administration tool for SQL Server that runs on Windows, Linux, and macOS. SQL Operations Studio offers many of the same features as SQL Server Management Studio, and includes new features such as customizable dashboards that you can use to get an overview of server performance. At the time of writing, SQL Operations Studio is in public preview, and features are subject to change. You can download SQL Operations Studio from Microsoft.

SQL Server Profiler

SQL Server Profiler is a graphical user interface tool that is used to view the output of a SQL Trace. You
use SQL Server Profiler to monitor the performance of the Database Engine or Analysis Services by
capturing traces and saving them to a file or table. You can use Profiler to step through problem queries
to investigate the causes.

You can also use the saved trace to replicate the problem on a test server, making it easier to diagnose
the problem.

Profiler also supports auditing an instance of SQL Server. Audits record security-related actions so they
can be reviewed later.
Database Engine Tuning Advisor

The Database Engine Tuning Advisor analyzes databases and makes recommendations that you can use to
optimize performance. You can use it to select and create an optimal set of indexes, indexed views, or
table partitions. Common usage includes the following tasks:

 Identify and troubleshoot the performance of a problem query.

 Tune a large set of queries across one or more databases.

 Perform an exploratory what-if analysis of potential physical design changes.

Data Quality Services (DQS) Client


With the DQS client application, you can complete various data quality operations, including creating
knowledge bases, creating and running data quality projects, and performing administrative tasks. After
completing these operations, you can perform a number of data quality tasks, including correction,
enrichment, standardization, and deduplication of data.

Demonstration: Using SSMS and SSDT


In this demonstration, you will see how to:

 Use SSMS to connect to an on-premises instance of SQL Server.

 Run a Transact-SQL script.

 Open an SSMS project.

 Connect to servers and databases.

 Use SSDT to run a Transact-SQL script.

Demonstration Steps
Use SSMS to Connect to an On-premises Instance of SQL Server

1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running and then log
on to 20762C-MIA-SQL as AdventureWorks\Student with the password Pa55w.rd.

2. Navigate to D:\Demofiles\Mod01, right-click Setup.cmd, and then click Run as administrator.

3. In the User Account Control dialog box, click Yes.

4. On the taskbar, click Microsoft SQL Server Management Studio.

5. In the Connect to Server dialog box, ensure that Server type is set to Database Engine.

6. In the Server name text box, type MIA-SQL.


7. In the Authentication drop-down list, select Windows Authentication, and then click Connect.

Run a Transact-SQL Script


1. In Object Explorer, expand Databases, expand AdventureWorks, and then review the database
objects.

2. In Object Explorer, right-click AdventureWorks, and then click New Query.

3. Type the query shown in the snippet below:

SELECT * FROM Production.Product ORDER BY ProductID;

Note the use of IntelliSense while you are typing this query, and then on the toolbar, click Execute.
Note how the results are returned.

4. On the File menu, click Save SQLQuery1.sql As.

5. In the Save File As dialog box, navigate to D:\Demofiles\Mod01, and then click Save. Note that this
saves the query to a file.

6. On the Results tab, right-click the cell for ProductID 1 (first row and first cell), and then click Save
Results As.

7. In the Save Grid Results dialog box, navigate to the D:\Demofiles\Mod01 folder.

8. In the File name box, type Demonstration2AResults, and then click Save. Note that this saves the
query results to a file.

9. On the Query menu, click Display Estimated Execution Plan. Note that SSMS can do more than just
execute queries.
10. On the Tools menu, click Options.

11. In the Options dialog box, expand Query Results, expand SQL Server, and then click General.
Review the available configuration options, and then click Cancel.
12. On the File menu, click Close.

Open a SQL Server Management Studio Project

1. On the File menu, point to Open, and then click Project/Solution.

2. In the Open Project dialog box, open the D:\Demofiles\Mod01\Demo01.ssmssln project.

3. In Solution Explorer, note the contents of Solution Explorer, and that by using a project or solution
you can save the state of the IDE. This means that any open connections, query windows, or Solution
Explorer panes will reopen in the state they were saved in.

4. On the File menu, click Close Solution.

Connect to Servers and Databases

1. In Object Explorer, click Connect and note the other SQL Server components to which connections
can be made.

2. On the File menu, point to New, and then click Database Engine Query to open a new connection.

3. In the Connect to Database Engine dialog box, in the Server name box, type MIA-SQL.

4. In the Authentication drop-down list, select Windows Authentication, and then click Connect.

5. In the Available Databases drop-down list on the toolbar, click tempdb. Note that this changes the
database against which the query is executed.

6. Right-click in the query window, point to Connection, and then click Change Connection. This will
reconnect the query to another instance of SQL Server.
7. In the Connect to Database Engine dialog box, click Cancel.

8. Close SSMS.

Use SSDT to Run a Transact-SQL Script

1. On the taskbar, click Visual Studio 2015.

2. On the Tools menu, click Connect to Database.


3. In the Choose Data Source dialog box, in the Data source list, click Microsoft SQL Server, and then
click Continue.

4. In the Add Connection dialog box, in the Server name box, type MIA-SQL.

5. In the Select or enter a database name drop-down list, click AdventureWorks, and then click OK.
6. In Server Explorer, expand Data Connections.

7. Right-click mia-sql.AdventureWorks.dbo, and then click New Query.

8. In the SQLQuery1.sql pane type:

SELECT * FROM Production.Product ORDER BY ProductID;

9. On the toolbar, click Execute.

10. Note that you can view results, just as you can in SSMS.

11. Close Visual Studio 2015 without saving any changes.



Check Your Knowledge


Question

Which one of the following tools can you use to create and deploy SSIS packages?

Select the correct answer.

SQL Server Management Studio.

SQL Server Profiler.

Database Engine Tuning Advisor.

SQL Server Data Tools.

SQL Server Configuration Manager.



Module Review and Takeaways


Review Question(s)
Question: Which IDE do you think you will use to develop on SQL Server, SSMS or SSDT?

Module 2
Designing and Implementing Tables
Contents:
Module Overview
Lesson 1: Designing Tables
Lesson 2: Data Types
Lesson 3: Working with Schemas
Lesson 4: Creating and Altering Tables
Lab: Designing and Implementing Tables
Module Review and Takeaways

Module Overview
In a relational database management system (RDBMS), user and system data is stored in tables. Each table
consists of a set of rows that describe entities and a set of columns that hold the attributes of an entity.
For example, a Customer table might have columns such as CustomerName and CreditLimit, and a row
for each customer. In Microsoft SQL Server® data management software, tables are contained within
schemas that are very similar in concept to folders that contain files in the operating system. Designing
tables is one of the most important tasks that a database developer undertakes, because incorrect table
design leads to the inability to query the data efficiently.

After an appropriate design has been created, it is important to know how to correctly implement the
design.

Objectives
After completing this module, you will be able to:

 Design tables using normalization, primary and foreign keys.

 Work with identity columns.


 Understand built-in and user data types.

 Use schemas in your database designs to organize data, and manage object security.

 Work with computed columns and temporary tables.



Lesson 1
Designing Tables
The most important aspect of designing tables involves determining what data each column will hold. All
organizational data is held within database tables, so it is critical to store the data with an appropriate
structure.

The best practices for table and column design are often represented by a set of rules that are known as
“normalization” rules. In this lesson, you will learn the most important aspects of normalized table design,
along with the appropriate use of primary and foreign keys. In addition, you will learn to work with the
system tables that are supplied when SQL Server is installed.

Lesson Objectives
After completing this lesson, you will be able to:

 Describe what a table is.


 Normalize data.

 Describe common normal forms.

 Explain the role of primary keys.

 Explain the role of foreign keys.

 Work with system tables.

 Design tables for concurrency.

 Use identity columns and sequences.

What Is a Table?
Relational databases store data about entities in tables, and tables are defined by columns and rows. Rows represent entities, and columns define the attributes of the entities. By default, the rows of a table have no predefined order. A table can be used as a security boundary.

Tables

In the terminology of formal relational database management systems, tables are referred to as “relations.”

Tables store data about entities such as customers, suppliers, orders, products, and sales. Each row of a table represents the details of a single entity, such as a single customer, supplier, order, product, or sale.

Columns define the information that is being held about each entity. For example, a Product table might
have columns such as ProductID, Size, Name, and UnitWeight. Each of these columns is defined by
using a specific data type. For example, the UnitWeight column of a product might be allocated a
decimal (18,3) data type.

Naming Conventions
There is strong disagreement within the industry over naming conventions for tables. The use of prefixes
(such as tblCustomer or tblProduct) is widely discouraged. Prefixes were commonly used in higher level
programming languages before the advent of strong data typing—that is, the use of strict data types
rather than generic data types—but are now rare. The main reason for this is that names should represent
the entities, not how they are stored. For example, during a maintenance operation, it might become
necessary to replace a table with a view, or vice versa. This could lead to views named tblProduct or
tblCustomer when trying to avoid breaking existing code.

Another area of strong disagreement relates to whether table names should be singular or plural. For
example, should a table that holds the details of a customer be called Customer or Customers?
Proponents of plural naming argue that the table holds the details of many customers; proponents of
singular naming say that it is common to expose these tables via object models in higher level languages,
and that the use of plural names complicates this process. Although we might have a Customers table, in
a high level language, we are likely to have a Customer object. SQL Server system tables and views have
plural names.

The argument is not likely to be resolved either way and is not a problem that is specific to the SQL
language. For example, an array of customers in a higher level language could sensibly be called
“Customers,” yet referring to a single customer via “Customers[49]” seems awkward. The most important
aspect of naming conventions is that you should adopt one that you can work with and apply
consistently.
Security

You can use tables as security boundaries because users can be assigned permissions at the table level.
However, note that SQL Server supports the assignment of permissions at the column level, in addition to
the table level; row-level security is available for tables in SQL Server. Row, column, and table security can
also be implemented by using a combination of views, stored procedures, and/or triggers.
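As an illustrative sketch (the role and table names are hypothetical), permissions might be granted at the table and column level like this:

-- Table-level permission: members of SalesReaders can query the whole table.
GRANT SELECT ON dbo.Customer TO SalesReaders;

-- Column-level permission: SupportUsers may read only these two columns.
GRANT SELECT (CustomerID, CustomerName) ON dbo.Customer TO SupportUsers;

-- Remove the table-level permission again.
REVOKE SELECT ON dbo.Customer FROM SalesReaders;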

Row Order

Tables are containers for rows but, by default, they do not define any order for the rows that they contain.
When users select rows from a table, they should only specify the order that the rows should be returned
in if the output order matters. SQL Server might have to expend additional sorting effort to return rows in
a given order, and it is important that this effort is only made when necessary.

Normalizing Data
Normalization is a systematic process that you can
use to improve the design of databases.

Normalization

Edgar F. Codd (1923–2003) was an English computer scientist who is widely regarded as having invented the relational model. This model underpins the development of relational database management systems. Codd introduced the concept of normalization and helped it evolve over many years, through a series of “normal forms.”

Codd introduced first normal form in 1970, followed by second normal form, and then third normal form
in 1971. Since that time, higher forms of normalization have been introduced by theorists, but most
database designs today are considered to be “normalized” if they are in third normal form.

Intentional Denormalization
Not all databases should be normalized. It is common to intentionally denormalize databases for
performance reasons or for ease of end-user analysis.

For example, dimensional models that are widely used in data warehouses (such as the data warehouses
that are commonly used with SQL Server Analysis Services) are intentionally designed not to be
normalized.

Tables might also be denormalized to avoid the need for time-consuming calculations or to minimize
physical database design constraints, such as locking.

Common Normalization Forms


In general, normalizing a database design leads to an improved design. You can avoid most common table design errors in database systems by applying normalization rules.

Normalization

You should use normalization to:

 Free the database of modification anomalies.

 Minimize redesign when the structure of the database needs to be changed.

 Ensure that the data model is intuitive to users.

 Avoid any bias toward particular forms of querying.

Although there is disagreement on the interpretation of these rules, there is general agreement on most
common symptoms of violating the rules.

First Normal Form

To adhere to the first normal form, you must eliminate repeating groups in individual tables. To do this,
you should create a separate table for each set of related data, and identify each set of related data by
using a primary key.
For example, a Product table should not include columns such as Supplier1, Supplier2, and Supplier3.
Column values should not include repeating groups. For example, a column should not contain a comma-
separated list of suppliers.
Duplicate rows should not exist in tables. You can use unique keys to avoid having duplicate rows. A
candidate key is a column or set of columns that you can use to uniquely identify a row in a table. An
alternate interpretation of first normal form rules would disallow the use of nullable columns.

Second Normal Form


To adhere to second normal form, you must create separate tables for sets of values that apply to multiple
records, and relate these tables by using a foreign key.

For example, a second normal form error would be to hold the details of products that a supplier provides
in the same table as the details of the supplier's credit history. You should store these values in a separate
table.

Third Normal Form

To adhere to third normal form, eliminate fields that do not depend on the key.

Imagine a Sales table that has OrderNumber, ProductID, ProductName, SalesAmount, and SalesDate
columns. This table is not in third normal form. A candidate key for the table might be the OrderNumber
column. However, the ProductName column only depends on the ProductID column and not on the
candidate key, so the Sales table should be separated from a Products table, and probably linked to it by
ProductID.
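A sketch of the same design after separation, using the column names from this example:

-- ProductName now depends only on the key of the Products table.
CREATE TABLE dbo.Products
(
    ProductID   int          NOT NULL PRIMARY KEY,
    ProductName nvarchar(50) NOT NULL
);

CREATE TABLE dbo.Sales
(
    OrderNumber int           NOT NULL PRIMARY KEY,
    ProductID   int           NOT NULL REFERENCES dbo.Products (ProductID),
    SalesAmount decimal(18,2) NOT NULL,
    SalesDate   date          NOT NULL
);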
Formal database terminology is precise, but can be hard to follow when it is first encountered. In the next
demonstration, you will see examples of common normalization errors.

Demonstration: Working with Normalization


In this demonstration, you will see how to alter a table to conform to third normal form.

Demonstration Steps
1. Ensure that the MT17B-WS2016-NAT, 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are
running, and then log on to 20762C-MIA-SQL as AdventureWorks\Student with the password
Pa55w.rd.
2. In File Explorer, navigate to D:\Demofiles\Mod02, right-click Setup.cmd, and then click Run as
administrator.

3. In the User Account Control dialog box, click Yes.

4. On the taskbar, click Microsoft SQL Server Management Studio.

5. In the Connect to Server dialog box, connect to MIA-SQL, using Windows Authentication.

6. On the File menu, point to Open, and then click Project/Solution.


7. In the Open Project dialog box, navigate to the D:\Demofiles\Mod02 folder, click Demo.ssmssln,
and then click Open.

8. In Solution Explorer, expand Queries, and then double-click 1 - Normalization.sql.

9. Select the code under the Step 1: Set AdventureWorks as the current database comment, and
then click Execute.

10. Select the code under the Step 2: Create a table for denormalizing comment, and then click
Execute.

11. Select the code under the Step 3: Alter the table to conform to third normal form comment, and
then click Execute.

12. Select the code under the Step 4: Drop and recreate the ProductList table comment, and then
click Execute.

13. Select the code under the Step 5: Populate the ProductList table comment, and then click Execute.

14. Close SQL Server Management Studio without saving any changes.

Primary Keys
A primary key is a form of constraint that uniquely
identifies each row within a table. A candidate key
is a column or set of columns that you could use
to identify a row uniquely—it is a candidate to be
chosen for the primary key. A primary key must be
unique and cannot be NULL.

Consider a table that holds an EmployeeID column and a NationalIDNumber column, along with the employee's name and personal details. The EmployeeID and NationalIDNumber columns are both candidate keys. In this case, the EmployeeID column might be the primary key, but either candidate key could be used. You will see later that some data types will lead to better performing systems when they are used as primary keys, but logically any candidate key can be nominated to be the primary key.
It might be necessary to combine multiple columns into a key before the key can be used to uniquely
identify a row. In formal database terminology, no candidate key is more important than any other
candidate key. However, when tables are correctly normalized, they will usually have only a single
candidate key that could be used as a primary key. Ideally, keys that are used as primary keys should not
change over time.
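The following sketch shows both cases, with illustrative data types:

-- EmployeeID is chosen as the primary key; NationalIDNumber remains a
-- candidate key, enforced with a UNIQUE constraint.
CREATE TABLE dbo.Employees
(
    EmployeeID       int           NOT NULL PRIMARY KEY,
    NationalIDNumber nvarchar(15)  NOT NULL UNIQUE,
    EmployeeName     nvarchar(100) NOT NULL
);

-- Where no single column is unique, combine columns into a composite key.
CREATE TABLE dbo.OrderLines
(
    OrderNumber int NOT NULL,
    LineNumber  int NOT NULL,
    CONSTRAINT PK_OrderLines PRIMARY KEY (OrderNumber, LineNumber)
);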

Natural vs. Surrogate Keys


Natural keys are formed from data within the table. A surrogate key is another form of key that you can
use as a unique identifier within a table, but it is not derived from “real” data.

For example, a Customer table might have a CustomerID or CustomerCode column that contains
numeric, GUID, or alphanumeric codes. The surrogate key would not be related to the other attributes of
a customer.

The use of surrogate keys is another subject that can lead to strong debate between database
professionals.

Foreign Keys
A foreign key is a key in one table that matches, or
references, a unique key in another table. Foreign
keys are used to create a relationship between
tables. A foreign key is a constraint because it
limits the data that can be held in the field to a
value that matches a field in the related table.

To create a foreign key, you should create a field of the same data type as the unique or primary key of another table. In the table definition, this field is then related to the table containing the unique or primary key. You can create a foreign key using either CREATE TABLE or ALTER TABLE.

For example, a CustomerOrders table might include a CustomerID column. A foreign key reference is
used to ensure that any CustomerID value that is entered in the CustomerOrders table does in fact exist
in the Customers table.

In SQL Server, the reference is only checked if the column that holds the foreign key value is not NULL.
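For example, the reference described above could be added to an existing table with ALTER TABLE (a sketch; the table names follow this example):

ALTER TABLE dbo.CustomerOrders
ADD CONSTRAINT FK_CustomerOrders_Customers
    FOREIGN KEY (CustomerID) REFERENCES dbo.Customers (CustomerID);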

Self-Referencing Tables

A table can hold a foreign key reference to itself. For example, an Employees table might contain a
ManagerID column. An employee's manager is also an employee; therefore, a foreign key reference can
be made from the ManagerID column of the Employees table to the EmployeeID column in the same
table.

Reference Checking

SQL Server cannot update or delete referenced keys unless you enable options that cascade the changes
to related tables. For example, you cannot change the ID for a customer when there are orders in a
CustomerOrders table that reference the customer's ID.

Tables might also include multiple foreign key references. For example, an Orders table might have
foreign keys that refer to both a Customers table and a Products table.

Terminology
Foreign keys are used to “enforce referential integrity.” Foreign keys are a form of constraint and will be
covered in more detail in a later module.

The ANSI SQL 2003 definition refers to self-referencing tables as having “recursive foreign keys.”

Working with System Tables


System tables are the tables provided with the SQL Server Database Engine. You should not modify system tables, nor query them directly; instead, use catalog views to get information from system tables.
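For example, the following queries, shown as a brief sketch, use catalog views rather than the underlying system tables:

-- User tables in the current database.
SELECT name, create_date, modify_date
FROM sys.tables
ORDER BY name;

-- Columns and their data types for every user table.
SELECT t.name AS TableName, c.name AS ColumnName, ty.name AS DataType
FROM sys.tables AS t
JOIN sys.columns AS c ON c.object_id = t.object_id
JOIN sys.types AS ty ON ty.user_type_id = c.user_type_id;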
System Tables in Earlier Versions

If you have worked with SQL Server 2000 or earlier versions, you might be expecting databases to contain a large number of system tables.

Users sometimes modified these system tables, and this caused issues when applying service packs or updates. More seriously, this could lead to unexpected behavior or failures if the data was not changed correctly. Also, users often took dependencies on the format of these system tables. That made it difficult for new versions of SQL Server to improve the table designs without breaking existing applications. As an example, when it was necessary to expand the syslogins table, a new sysxlogins table was added instead of changing the existing table.

In SQL Server 2005, system tables were hidden and replaced by a set of system views that show the
contents of the system tables. These views are permission-based and display data to a user only if the user
has appropriate permission.

System Tables in the msdb Database


SQL Server Agent uses the msdb database, primarily for organizing scheduled background tasks that are
known as “jobs.” A large number of system tables are still present in the msdb database. Again, while it is
acceptable to query these tables, they should not be directly modified. Unless the table is documented, no
dependency on its format should be taken when designing applications.

Designing for Concurrency


SQL Server uses locking to manage transaction
concurrency. When a transaction modifies data, it
acquires an appropriate lock to protect the data.
The lock resources refer to the resource being
locked by the transaction; for example, row, page,
or table. All locks held by transactions are released
when the transaction commits.

SQL Server minimizes the locking cost by acquiring locks at a level appropriate for a task. Locking at lower granularity levels, such as rows, increases concurrency, but has the overhead of maintaining a large number of locks to lock many rows. Locking at higher granularity levels, such as tables, decreases concurrency, because the entire table is inaccessible to other transactions. However, the overhead is less because fewer locks need to be maintained.

Designing for an OLTP System

The way in which your database is used will determine the best design for concurrency. You may need to
monitor your database over time to determine if the design is sufficient, and make alterations if locking
becomes a frequent problem. Your goal is to ensure transactions are as small and as fast as possible, and
less likely to block other transactions.

The higher the number of users in your database, the more locking you will have, because they are more
likely to simultaneously access the same row, table, and pages. The more locking you have, the lower the
performance of the system, because one user must wait for another user to finish their transaction, and
the application may temporarily freeze. You may also find that there are certain times of the day when
things slow down, such as afternoons, when staff return to the office after lunch. One option is to change
the transaction isolation level; however, this creates other potential problems, with logical errors in your
data. The better solution is to use normalization to separate the data as much as possible. While this
creates extra joins in your queries, it does help with concurrency.
If you have a table with 25 columns, and three users attempt to modify three different columns in the
same row, SQL Server takes a row lock for the first user, and the other two must wait for the first user to
complete. The third user must wait for the first two to complete. By splitting the table into three tables,
and separating the columns that are likely to be modified, each user will be modifying a different table,
without blocking the other users.
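A sketch of that vertical split, with illustrative names, might look like this:

CREATE TABLE dbo.Customer
(
    CustomerID   int           NOT NULL PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);

-- Frequently updated columns move to their own tables, sharing the key,
-- so updates to credit data do not block updates to contact data.
CREATE TABLE dbo.CustomerCredit
(
    CustomerID  int           NOT NULL PRIMARY KEY
        REFERENCES dbo.Customer (CustomerID),
    CreditLimit decimal(18,2) NOT NULL
);

CREATE TABLE dbo.CustomerContact
(
    CustomerID int          NOT NULL PRIMARY KEY
        REFERENCES dbo.Customer (CustomerID),
    Phone      nvarchar(25) NULL
);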

Designing for a Data Warehouse

In a data warehouse, you won't have lots of users making modifications, so locking won't be an issue.
Users will only read data from the data warehouse, so it is safe to denormalize the data, because one table
does not need to describe one entity.

Implementing Surrogate Keys


If a table has no natural key, you must use a
surrogate key. A surrogate key is not part of the
original data, but is added to uniquely identify the
row. IDENTITY and SEQUENCE both produce
unique surrogate keys, but are used in different
situations.

IDENTITY

The IDENTITY property creates a unique value, based on the seed and the increment values provided. These are normally sequential values.

As you learned in a previous lesson, a primary key column uniquely identifies each row. When designing tables, you will find that some entities have a natural primary key, such as CustomerCode, whereas Categories might not. Whether you decide against using a natural primary key, or use a surrogate key because there is no natural key, the IDENTITY property will create a sequence of unique numbers.
The IDENTITY property is used in a CREATE TABLE or ALTER TABLE statement, and requires two values:
the starting seed, and the increment.
In the following example, the first row in the Categories table will have a CategoryID value of 1. The next
row will have a value of 2:


IDENTITY Property
CREATE TABLE Categories
(
CategoryID int IDENTITY(1,1),
Category varchar(25) NOT NULL
);

The data type of the column is int, so the starting seed can be any value that the integer stores, but it is
common practice to start at 1, and increment by 1.

When inserting a record into a table with an identity column, if a transaction fails and is rolled back, then
the value that the row would have been assigned as the identity is not used for the next successful insert.

IDENTITY_INSERT

You can insert an explicit value into an identity column using SET IDENTITY_INSERT ON. The insert value cannot be a value that has already been used; however, you can use a value from a failed transaction that was not used, and keep the number sequence intact. You can also use the value from a row that has been deleted. Only one table in a session can have IDENTITY_INSERT set to ON at a time, so be sure to include SET IDENTITY_INSERT OFF after the insert statement has completed.

SET IDENTITY_INSERT Categories ON;

INSERT INTO Categories (CategoryID, Category)
VALUES (5, 'Cat Food');

SET IDENTITY_INSERT Categories OFF;

Deleting Rows from a Table with an Identity Column


After deleting all rows from a table with an identity column, the seed value of the next inserted row will
be the next number in sequence. If the last value was 237, then the next ID will be 238. There are two
ways of reseeding the table:

 Use TRUNCATE TABLE instead of DELETE FROM, though keep in mind that truncate is only
minimally logged in the transaction log:

TRUNCATE TABLE CATEGORIES;

 Run the DBCC CHECKIDENT command to reseed the table. The reseed value should be 0 to set the
first row back to 1:

DBCC CHECKIDENT('Categories', RESEED, 0)

@@IDENTITY vs. SCOPE_IDENTITY()

@@IDENTITY and SCOPE_IDENTITY() are very similar in function, but with a subtle difference. Both
return the identity value of the last inserted record. However, @@IDENTITY returns the last insert,
regardless of session, whereas SCOPE_IDENTITY() will return the value from the current session. If you
insert a row and need the identity value, use SCOPE_IDENTITY()—if you use @@IDENTITY after an
insert, and another session also inserts a new row, you might pick up that value, rather than your row
value.

INSERT INTO Categories (Category)
VALUES ('Dog Food');

SELECT SCOPE_IDENTITY() AS CategoryID;

CategoryID
----------------
6

SEQUENCE
The SEQUENCE object performs a similar function to IDENTITY, but has a lot more flexibility. You create a
SEQUENCE object at the database level, and values can be used by multiple tables. Whereas IDENTITY
creates a new value with each row insert, SEQUENCE returns a value when requested. This value does not
need to be inserted into a table:

CREATE SEQUENCE CategoryID AS int
START WITH 1
INCREMENT BY 1;

SEQUENCE is useful when you want control over the values you are inserting. In the above example, the
Categories table uses IDENTITY to generate the CategoryID values. Consider if there were two tables in
your database, one for MainCategories, and another for SubCategories, but you wanted the
CategoryID to be unique across both tables—you could use SEQUENCE.

NEXT VALUE FOR generates the next value.



NEXT VALUE FOR


INSERT INTO MainCategories (CategoryID, MainCategory)
VALUES
(NEXT VALUE FOR CategoryID, 'Food'),
(NEXT VALUE FOR CategoryID, 'Drink');
GO

INSERT INTO SubCategories (CategoryID, SubCategory)
VALUES
(NEXT VALUE FOR CategoryID, 'Cat Food'),
(NEXT VALUE FOR CategoryID, 'Wine');

Assuming both tables were empty, the results would be as follows:

Results of inserting rows using the SEQUENCE object


CategoryID MainCategory
----------- --------------
1 Food
2 Drink

CategoryID SubCategory
----------- -----------------
3 Cat Food
4 Wine

Restart the Sequence


You can restart your sequence at any time, though consider the consequences if you have a primary key
on the columns you are using the sequence values for, as values must be unique:

ALTER SEQUENCE CategoryID
RESTART WITH 1;

MINVALUE and MAXVALUE


The sequence can be limited by using the MINVALUE and MAXVALUE properties—when you reach the
MAXVALUE limit, you will receive an error.


MINVALUE and MAXVALUE


ALTER SEQUENCE CategoryID
MINVALUE 1
MAXVALUE 2000;

Note: If you want to know the next value in the sequence, you can run the code: SELECT
NEXT VALUE FOR CategoryID. However, each time this runs, the sequence value will increment,
even if you don’t use this value in an insert statement.

Question: Would it be reasonable to have columns called, for example, AddressLine1, AddressLine2, and AddressLine3 in a normalized design?

Lesson 2
Data Types
The most basic types of data that get stored in database systems are numbers, dates, and strings. There is
a range of data types that can be used for each of these. In this lesson, you will see the Microsoft-supplied
data types that you can use for numeric and date-related data. You will also see what NULL means and
how to work with it. In the next lesson, you will see how to work with string data types.

Lesson Objectives
After completing this lesson, you will be able to:

 Understand the purpose of data types.

 Use exact numeric data types.

 Use approximate numeric data types.

 Use date and time data types.

 Work with unique identifiers.

 Understand when to use NULL and NOT NULL.

 Create alias data types.

 Convert data between data types.

 Work with international character data.

Introduction to Data Types


Data types determine what can be stored in
locations within SQL Server, such as columns,
variables, and parameters. For example, a tinyint
column can only store whole numbers from 0 to
255. Data types also determine the types of values
that can be returned from expressions.

Constraining Values
Data types are a form of constraint that is placed
on the values that can be stored in a location. For
example, if you choose a numeric data type, you
will not be able to store text.

In addition to constraining the types of values that can be stored, data types also constrain the range of
values that can be stored. For example, if you choose a smallint data type, you can only store values
between –32,768 and +32,767.

Query Optimization

When SQL Server identifies that the value in a column is an integer, it might be able to generate an
entirely different and more efficient query plan than one where it identifies that the location is holding
text values.

The data type also determines which sorts of operations are permitted on that data and how those
operations work.

Self-Documenting Nature
Choosing an appropriate data type provides a level of self-documentation. If all values were stored in a
string value (which could potentially represent any type of value) or XML data types, you would probably
need to store documentation about what sort of values can be stored in the string locations.

Data Types
There are three basic sets of data types:

 System data types. SQL Server provides a large number of built-in (or intrinsic) data types. Examples
of these include integer, varchar, and date.
 Alias data types. Users can also define data types that provide alternate names for the system data
types and, potentially, further constrain them. These are known as alias data types. For example, you
could use an alias data type to define the name PhoneNumber as being equivalent to nvarchar(16).
Alias data types can help to provide consistency of data type usage across applications and databases.

 User-defined data types. By using managed code via SQL Server integration with the common
language runtime (CLR), you can create entirely new data types. There are two categories of these
CLR types. One category is system CLR data types, such as the geometry and geography spatial data
types. The other is user-defined CLR data types, which enable users to create their own data types.

Exact Numeric Data Types


Numeric data types can be exact or approximate.
Exact data types are the most common data types
used in business applications.

Integer Data Types

SQL Server offers a choice of integer data types that are used for storing whole numbers, based
upon the size of the storage location for each:

 tinyint is stored in a single byte (that is, 8 bits) and can be used to store the values 0 to 255.
Note that, unlike the other integer data types, tinyint cannot store any negative values.

 smallint is stored in 2 bytes (that is, 16 bits) and stores values from –32,768 to 32,767.

 int is stored in 4 bytes (that is, 32 bits) and stores values from –2,147,483,648 to 2,147,483,647. It is a
very commonly used data type. SQL Server uses the full word “integer” as a synonym for “int.”

 bigint is stored in 8 bytes (that is, 64 bits) and stores very large integer values. Although it is easy to
refer to a 64-bit value, it is hard to comprehend how large these values are. If you placed a value of
zero in a 64-bit integer location, and executed a loop to add one to the value, on most common
servers currently available, you would not reach the maximum value for many months.

Exact Fractional Data Types

SQL Server provides a range of data types for storing exact numeric values that include decimal places:

 decimal is an ANSI-compatible data type you use to specify the number of digits of precision and the
number of decimal places (referred to as the scale). A decimal(12,5) location can store up to 12
digits with up to five digits after the decimal point. You should use the decimal data type for
monetary or currency values in most systems, and any exact fractional values, such as sales quantities
(where part quantities can be sold) or weights.
 numeric is a data type that is functionally equivalent to decimal.

 money and smallmoney are data types that are specific to SQL Server and have been present since
the early days of the platform. They store currency values with a fixed precision of four
decimal places.

Note: Four is often the wrong number of decimal places for many monetary applications,
and the money and smallmoney data types are not standard data types. In general, use decimal
for monetary values.
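
To illustrate how precision and scale behave, here is a minimal sketch (the variable names are illustrative):

DECLARE @Weight decimal(12,5) = 1234567.12345;
SELECT @Weight AS StoredWeight;     -- 1234567.12345 (7 digits before the point, 5 after)

-- Values with more decimal places than the scale allows are rounded on assignment.
DECLARE @Rounded decimal(12,5) = 0.123456;
SELECT @Rounded AS RoundedValue;    -- 0.12346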

bit Data Type

bit is a data type that is stored in a single bit. The storage of the bit data type is optimized. If there are
eight or fewer bit columns in a table, they are stored in a single byte. bit values are commonly used to
store the equivalent of Boolean values in higher level languages.
Note that there is no literal string format for bit values in SQL Server. The string values TRUE and FALSE
can be converted to bit values, as can the integer values 1 and 0. TRUE is converted to 1 and FALSE is
converted to 0.
Higher level programming languages differ on how they store true values in Boolean columns. Some
languages store true values as 1; others store true values as -1. To avoid any chance of mismatch, in
general, when working with bits in applications, test for false values by using the following code:

IF (@InputValue = 0)

Test for positive values by using the following code:

IF (@InputValue <> 0)

This is preferable to testing for a value being equal to 1 because it will provide more reliable code.
bit, along with other data types, is also nullable, which can be a surprise to new users. That means that a
bit location can be in three states: NULL, 0, or 1. (Nullability is discussed in more detail later in this
module.)
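
The following short sketch illustrates these behaviors (the variable name is illustrative):

DECLARE @IsActive bit;

SET @IsActive = 'TRUE';        -- the string TRUE converts to 1
SELECT @IsActive AS IsActive;

SET @IsActive = NULL;          -- bit locations are nullable
SELECT @IsActive AS IsActive;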

To learn more about data types, see the Microsoft developer documentation:

Data Types (Transact-SQL)


http://go.microsoft.com/fwlink/?LinkID=233787

Approximate Numeric Data Types


SQL Server provides two approximate numeric
data types. They are used more commonly in
scientific applications than in business
applications. A common design error is to use the
float or real data types for storing business values
such as monetary values.

Approximate Numeric Values

The real data type is a 4-byte (that is, 32-bit) numeric value that is encoded by using ISO
standard floating-point encoding.

The float data type is a data type that is specific to
SQL Server and occupies either 4 or 8 bytes, enabling the storage of approximate values with a defined
scale. The scale values permitted are from 1 to 53 and the default scale is 53. Even though a range of
values is provided for in the syntax, the current SQL Server implementation of the float data type is that, if
the scale value is from 1 to 24, the scale is implemented as 24. For any larger value, a scale of 53 is used.
Common Errors

A very common error for new developers is to use approximate numeric data types to store values that
need to be stored exactly. This causes rounding and processing errors. A “code smell” for identifying
programs that new developers have written is a column of numbers that do not exactly add up to the
displayed totals. It is common for small rounding errors to creep into calculations; for example, a total that
is incorrect by 1 cent in dollar-based or euro-based currencies.
The inappropriate use of numeric data types can cause processing errors.

Look at the following code and decide how many times the PRINT statement would be executed:

How Many Times Is PRINT Executed?


DECLARE @Counter float;
SET @Counter = 0;
WHILE (@Counter <> 1.0) BEGIN
SET @Counter += 0.1;
PRINT @Counter;
END;

In fact, this query would never stop running, and would need to be cancelled.

After cancelling the query, if you looked at the output, you would see the following code:

Resultset from Previous Code Fragment


0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1.1
1.2
1.3
1.4
1.5
1.6
1.7

What has happened? The problem is that the value 0.1 cannot be stored exactly in a float or real data
type, so the termination value of the loop is never hit exactly. If a decimal value had been used instead,
the loop would have executed as expected.
Consider how you would write the answer to 1÷3 in decimal form. The answer isn't 0.3, it is 0.3333333
recurring. There is no way in decimal form to write 1÷3 as an exact decimal fraction. You have to
eventually settle for an approximate value.
The same problem occurs in binary fractions; it just occurs at different values—0.1 ends up being stored as
the equivalent of 0.099999 recurring. 0.1 in decimal form is a nonterminating fraction in binary. Therefore,
when you put the system in a loop adding 0.1 each time, the value never exactly equals 1.0, which can be
stored precisely.
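
For comparison, here is a minimal sketch of the same loop using the decimal data type; because 0.1 can
be stored exactly, it terminates after 10 iterations:

DECLARE @Counter decimal(3,1);
SET @Counter = 0;
WHILE (@Counter <> 1.0) BEGIN
SET @Counter += 0.1;
PRINT @Counter;
END;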

Date and Time Data Types


SQL Server supports a rich set of data types for
working with values that are related to dates and
times. SQL Server also provides a large number of
functions for working with dates and times.

date and time Data Types

The date data type complies with the ANSI
Structured Query Language (SQL) standard
definition for the Gregorian calendar. The default
string format is YYYY-MM-DD. This format is the
same as the ISO 8601 definition for DATE. date
has a range of values from 0001-01-01 to 9999-
12-31 with an accuracy of one day.

The time data type is aligned to the SQL standard form of hh:mm:ss, with optional decimal places up to
hh:mm:ss.nnnnnnn. Note that when you are defining the data type, you need to specify the number of
decimal places, such as time(4), if you do not want to use the default value of seven decimal places, or if
you want to save some storage space. The format that SQL Server uses is similar to the ISO 8601 definition
for TIME.
The ISO 8601 standard makes it possible to use 24:00:00 to represent midnight and to have a leap second
over 59. These are not supported in the SQL Server implementation.

The datetime2 data type is a combination of a date data type and a time data type.
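
A brief sketch of declaring these data types (the values are illustrative):

DECLARE @OrderDate date = '2016-01-25';
DECLARE @OrderTime time(0) = '14:30:00';
DECLARE @OrderMoment datetime2(3) = '2016-01-25 14:30:00.123';

SELECT @OrderDate AS OrderDate, @OrderTime AS OrderTime, @OrderMoment AS OrderMoment;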
datetime Data Type

The datetime data type is an older data type that has a smaller range of allowed dates and a lower
precision or accuracy. It is a commonly used data type, particularly in older Transact-SQL code. A common
error is not allowing for the 3 milliseconds accuracy of the data type. For example, using the datetime
data type, executing the following code would cause the value '20110101 00:00:00.000' to be stored:

DECLARE @When datetime;


SET @When = '20101231 23:59:59.999';

Another problem with the datetime data type is that the way it converts strings to dates is based on
language format settings. A value in the form “YYYYMMDD” will always be converted to the correct date,
but a value in the form “YYYY-MM-DD” might end up being interpreted as “YYYY-DD-MM,” depending
on the settings for the session.

It is important to understand that this behavior does not happen with the new date data type, so a string
that was in the form “YYYY-MM-DD” could be interpreted as two different dates by the date (and
datetime2) data type and the datetime data type. You should specifically check any of the formats that
you intend to use, or always use formats that cannot be misinterpreted. Another option that was
introduced in SQL Server 2012 can help. A series of functions that enable date and time values to be
created from component parts was introduced. For example, there is now a DATEFROMPARTS function
that you can use to create a date value from a year, a month, and a day.
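
For example, a minimal sketch using DATEFROMPARTS, which avoids string-format ambiguity entirely:

SELECT DATEFROMPARTS(2016, 6, 1) AS UnambiguousDate; -- always June 1, 2016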

Note: Be careful when working with string literal representations of dates and time, as they
can be interpreted in different ways depending on the location. For example, 01/06/2016 might
be June 1, 2016, or January 6, 2016.
You can solve this ambiguity by expressing dates and times according to the ISO standard, where
dates are represented as YYYY-MM-DD. In the previous example, this would be 2016-06-01.

Time Zones
The datetimeoffset data type is a combination of a datetime2 data type and a time zone offset. Note
that the data type is not aware of the time zone; it can simply store and retrieve time zone values.

Note that the time zone offset values extend for more than a full day (a range of –14:00 to +14:00). A
range of system functions has been provided for working with time zone values, and for all of the data
types related to dates and times.

For more information about date and time data types, see Microsoft Docs:

Date and Time Data Types and Functions (Transact-SQL)


https://aka.ms/Yw0h3v

For more information about using date and time data types, see Technet:

Using Date and Time Data


http://go.microsoft.com/fwlink/?LinkID=209249

Unique Identifiers
Globally unique identifiers (GUIDs) have become
common in application development. They are
used to provide a mechanism where any process
can generate a number and know that it will not
clash with a number that any other process has
generated.

GUIDs
Numbering systems have traditionally depended
on a central source for the next value in a
sequence, to make sure that no two processes use
the same value. GUIDs were introduced to avoid
the need for anyone to function as the “number
allocator.” Any process, on any system, can generate a value and know that, to an extremely high degree
of probability, it will not clash with a value generated by any other process, at any time.

This is achieved by using extremely large values. When discussing the bigint data type earlier, you learned
that the 64-bit bigint values were really large. GUIDs are 128-bit values. The magnitude of a 128-bit value
is well beyond our capabilities of comprehension.

uniqueidentifier Data Type

The uniqueidentifier data type in SQL Server is typically used to store GUIDs. Standard arithmetic
operators such as =, <> (or !=), <, >, <=, and >= are supported, in addition to NULL and NOT NULL
checks.
The IDENTITY property is used to automatically assign values to columns. (IDENTITY is discussed in
Module 3.) The IDENTITY property is not used with uniqueidentifier columns. New values are not
calculated by code in your process. They are calculated by calling system functions that generate a value
for you. In SQL Server, this function is the NEWID() function.
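
A minimal sketch of generating a GUID (the variable name is illustrative; each call returns a different
value):

DECLARE @RowID uniqueidentifier = NEWID();
SELECT @RowID AS GeneratedGuid;
-- Example output: 6F9619FF-8B86-D011-B42D-00C04FC964FF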

The random nature of GUIDs has also caused significant problems in current storage subsystems. SQL
Server 2005 introduced the NEWSEQUENTIALID() function to try to circumvent the randomness of the
values that the NEWID() function generated. However, the function does so at the expense of some
guarantee of uniqueness.

The usefulness of the NEWSEQUENTIALID() function is limited because the main reason for using GUIDs
is to enable other layers of code to generate the values, and know that they can just insert them into a
database without clashes. If you need to request a value from the database via NEWSEQUENTIALID(), it
usually would have been better to use an IDENTITY column instead.

A common development error is to store GUIDs in string values rather than in uniqueidentifier columns.

Note: Replication systems also commonly use uniqueidentifier columns. Replication is an
advanced topic that is beyond the scope of this course.

NULL and NOT NULL


Nullability determines whether or not a value must
be entered. For example, a column constraint
might allow NULLs, or not allow NULLs by
specifying NOT NULL. Allowing NULLs when
values should be entered is another common
design error.

NULL

NULL is a state of a column in a particular row,


rather than a type of value that is stored in a
column. You do not say that a value equals NULL;
you say that a value is NULL. This is why, in
Transact-SQL, you do not check whether a value is
NULL with the equality operator. For example, you would not write the following code:

WHERE Color = NULL;

Instead, you would write the following code:

WHERE Color IS NULL;

Common Errors
New developers often confuse NULL values with zero, blank (or space), zero-length strings, and so on. The
misunderstanding is exacerbated by other database engines that treat NULL and zero-length strings or
zeroes as identical. NULL indicates the absence of a value.

Careful consideration must be given to the nullability of a column. In addition to specifying a data type
for a column, you specify whether a value needs to be present. Often, this is referred to as whether a
column value is mandatory.

Look at the NULL and NOT NULL declarations in the following code sample and decide why each decision
might have been made:

NULL or NOT NULL?


CREATE TABLE Sales.Opportunity
(
OpportunityID int NOT NULL,
Requirements nvarchar(50) NOT NULL,
ReceivedDate date NOT NULL,
LikelyClosingDate date NULL,
SalesPersonID int NULL,
Rating int NOT NULL
);

For more information about allowing NULL values, see Technet:

Allowing Null Values


http://go.microsoft.com/fwlink/?LinkID=209251

You can set the default behavior for new columns using the ANSI NULL default. For details about how this
works, see MSDN:
SET ANSI_NULL_DFLT_ON (Transact-SQL)
http://go.microsoft.com/fwlink/?LinkID=233793

Alias Data Types


An alias data type is based on a system type, and
is useful when numerous tables within a database
share a similar column type. For example, a retail
database contains a table for Store, Supplier,
Customer, and Employee: each of these tables
contains address columns, including postal code.
You could create an alias data type to set the
required format of the post code column, and use
this type for all PostCode columns in each of the
tables. Create a new alias data type using the
CREATE TYPE command. This creates the user
type in the current database only:

CREATE TYPE dbo.PostalCode
FROM varchar(8);

After creating an alias data type, it is used in the same way as a system data type. The alias data type is
used as the data type when creating a column. In the following code, the PostCode column uses the new
PostalCode data type. There is no need to specify the width of the column because this was done when
the type was created.

Example of using an alias data type:

PostalCode Is an Alias Data Type


CREATE TABLE dbo.Store
(
StoreID int IDENTITY(1,1) PRIMARY KEY,
StoreName nvarchar(30) NOT NULL,
Address1 nvarchar(30) NOT NULL,
Address2 nvarchar(30) NULL,
City nvarchar(25) NOT NULL,
PostCode PostalCode NOT NULL
);

When declaring variables or parameters, the data type assignment is again used in exactly the same way
as system data types:

DECLARE @PostCode AS PostalCode;

To discover which alias data types have already been created within a database, query the sys.types
system view within the context of the relevant database.

Note: You can create alias data types in the model database, so every time a new database
is created, the user data types will automatically be created.

Converting Data Between Data Types


When converting between data types, there is
some choice in deciding which conversion
function to use, and some rules for others. The
CAST and CONVERT functions can be used for all
data types, whereas TRY_PARSE should only be
used when converting from string to date and/or
time, or for numeric values.

CAST and CONVERT

The CAST and CONVERT functions are similar;
however, you should use CAST if your code needs
to conform to SQL-92 standardization. CONVERT
is more useful for datetime conversions, as it offers
a range of formats for converting datetime expressions.

The CAST function accepts an expression for the first parameter then, after the AS keyword, the data type
to which the expression should be converted. The following example converts today’s date to a string.
There is no option to format the date, so using CONVERT would be a better choice:
Example of using CAST:

CAST
SELECT CAST(GETDATE() AS nvarchar(50)) AS DateToday;

DateToday
-----------------------
Jan 25 2016 3:21PM

CAST will try to convert one data type to another.

The following code passes a string with a whole number, so SQL Server can easily convert this to an
integer type:

CAST
SELECT CAST('93751' AS int) AS ConvertedString;

ConvertedString
-------------------
93751

If the string is changed to a decimal, an error will be thrown:

CAST
SELECT CAST('93751.3' AS int) AS ConvertedString;

Msg 245, Level 16, State 1, Line 4
Conversion failed when converting the varchar value '93751.3' to data type int.

In the following code, CONVERT accepts three parameters—the data type to which the expression should
be converted, the expression, and the datetime format for conversion:

CONVERT
SELECT CONVERT(nvarchar(50), GETDATE(), 106) AS DateToday;

DateToday
-----------------------
25 Jan 2016

PARSE

The structure of the PARSE function is similar to CAST; however, it accepts an optional parameter through
the USING keyword that enables you to set the culture of the expression. If no culture parameter is
provided, the function will use the language of the current session. PARSE should only be used for
converting strings to date/time or numbers, including money.
In the following example, the session language uses the British English culture, which uses the date
format DD/MM/YYYY. The US English date expression is in the American format MM/DD/YYYY, and is
parsed into the British English language:

PARSE
SET LANGUAGE 'British English';
SELECT PARSE('10/13/2015' AS datetime2 USING 'en-US') AS MyDate;

MyDate
-------------
2015-10-13 00:00:00.0000000

If the optional parameter is excluded, then the parser will try to convert the date, and throw an error:

PARSE
SET LANGUAGE 'British English';
SELECT PARSE('10/13/2015' AS datetime2) AS MyDate;

--------------
Msg 9819, Level 16, State 1, Line 7
Error converting string value '10/13/2015' into data type datetime2 using culture ''.

To find out which languages are present on an instance of SQL Server, run the following code:

SELECT * FROM sys.syslanguages

Use the alias column value in the SET LANGUAGE statement:

SET LANGUAGE 'Turkish';

TRY_CAST and TRY_CONVERT

TRY_CAST operates in much the same way as CAST, but will return NULL rather than an error, if the
expression cannot be cast into the intended data type.

The following query executes the code used in the above CAST example that failed, but this time returns
NULL:

TRY_CAST
SELECT TRY_CAST('93751.3' AS int) AS ConvertedString;

ConvertedString
----------------
NULL

This is useful for elegantly handling errors in your code, as a NULL is easier to deal with than an error
message.
It can also be used with the CASE statement, as per the following example:

TRY_CAST
SELECT
CASE
WHEN TRY_CAST('93751.3' AS int) IS NULL THEN 'FAIL'
ELSE 'SUCCESS'
END AS ConvertedString;

ConvertedString
----------------
FAIL

Just as TRY_CAST is similar to CAST, TRY_CONVERT works the same as CONVERT but returns NULL instead
of an error when an expression cannot be converted. It, too, can also be used in a CASE statement.

Example of using TRY_CONVERT:

TRY_CONVERT
SELECT
CASE
WHEN TRY_CONVERT(varchar(25), 93751.3) IS NULL THEN 'FAIL'
ELSE 'SUCCESS'
END AS ConvertedString;

ConvertedString
----------------
SUCCESS

TRY_PARSE
Following the format of the previous TRY functions, TRY_PARSE is identical to PARSE, but returns a NULL
instead of an error when an expression cannot be parsed.

When using TRY_PARSE and running the code sample that failed earlier, a NULL is returned, rather than
an error:

TRY_PARSE
SET LANGUAGE 'British English';
SELECT TRY_PARSE('10/13/2015' AS datetime2) AS MyDate;

MyDate
---------
NULL

If you use CAST, CONVERT or PARSE in your application code and the parser throws an error that you
haven’t handled, it may cause issues in the application. Use TRY_CAST, TRY_CONVERT, and TRY_PARSE
when a conversion might fail, and use the CASE statement to provide alternative values when a
conversion is not possible.

SQL Server Data Type Conversion Chart


http://aka.ms/xd82ey

Working with International Character Data


Traditionally, most computer systems stored one
character per byte. This only allowed for 256
different character values, which is not enough to
store characters from many languages.

Multibyte Character Issues


Asian languages, such as Chinese and Japanese,
need to store thousands of characters. You may
not have ever considered it, but how would you
type these characters on a keyboard? There are
two basic ways that this is accomplished. One
option is to have an English-like version of the
language that can be used for entry. Japanese has
a language form called Romaji that uses English-like characters for representing words. Chinese has a
form called Pinyin that is also somewhat English-like.

When a user types one of these forms, an input method editor displays a numbered list of candidate
characters; the user can enter the number beside the character to select the intended word. It might not
seem important to an English-speaking person but, given that the first option might mean “horse”, the
second option might be like a question mark, and the third option might mean “mother”, there is
definitely a need to select the correct option!

Character Groups
An alternate way to enter the characters is via radical groupings.

Character Group Example


DECLARE @Hello nvarchar(10);

SET @Hello = N'Hello';
SET @Hello = N'你好';
SET @Hello = N'こんにちは';

Note the second character, 好, of the Chinese string in the preceding code example. The left-hand part of
that character, 女, means “woman”. Rather than entering English-like characters (that could be quite
unfamiliar to the writers), users can select a group of characters based on what is known as a radical.

For this sort of keyboard entry to work, the characters must be in appropriate groups, not just stored as
one large sea of characters. An additional complexity is that the radicals themselves are also arranged in
groups.

Unicode
In the 1980s, work was done by a variety of researchers, to determine how many bytes are required to be
able to hold all characters from all languages, but also store them in their correct groupings. The answer
from all researchers was three bytes. You can imagine that three was not an ideal number for computing
and at the time, users were mostly working with 2 byte (that is, 16-bit) computer systems.
Unicode introduced a two-byte character set that attempts to fit the values from the 3 bytes into 2 bytes.
Inevitably, there had to be some trade-offs.

Unicode allows any combination of characters, which are drawn from any combination of languages, to
exist in a single document. There are multiple encodings for Unicode, including UTF-7, UTF-8, UTF-16, and
UTF-32. (UTF stands for Unicode Transformation Format.) SQL Server currently implements double-byte UTF-16 characters for its
Unicode implementation.
For string literal values, an N prefix on a string allows the entry of double-byte characters into the string,
rather than just single-byte characters. (N stands for “National” in National Character Set.)

When working with character strings, the LEN function returns the number of characters (Unicode or not)
whereas DATALENGTH returns the number of bytes.
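
A short sketch of the difference (the variable name is illustrative):

DECLARE @Greeting nvarchar(10) = N'你好';

SELECT LEN(@Greeting) AS CharacterCount,   -- 2 characters
DATALENGTH(@Greeting) AS ByteCount;        -- 4 bytes (2 bytes per UTF-16 character)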

Question: What would be a suitable data type for storing the value of a check box that can
be 0 for cleared, 1 for selected, or -1 for disabled?

Lesson 3
Working with Schemas
A schema is a namespace that allows objects within a database to be logically separated to make them
easier to manage. Objects may be separated according to the owner, according to their function, or any
other way that makes sense for a particular database.

Schemas were introduced with SQL Server 2005. They can be thought of as containers for objects such as
tables, views, and stored procedures. Schemas provide organization and structure when a database
includes large numbers of objects.

You can also assign security permissions at the schema level, rather than for individual objects that are
contained within the schemas. Doing this can greatly simplify the design of system security requirements.

Lesson Objectives
After completing this lesson, you will be able to:
 Describe the role of a schema.

 Understand object name resolution.

 Create schemas.

What Is a Schema?
Schemas are used to contain objects and to
provide a security boundary for the assignment of
permissions. In SQL Server, schemas are used as
containers for objects, rather like a folder is used
to hold files at the operating system level. Since
their introduction in SQL Server 2005, schemas can
be used to contain objects such as tables, stored
procedures, functions, types, and views. Schemas
form a part of the multipart naming convention
for objects. In SQL Server, an object is formally
referred to by a name of the form
Server.Database.Schema.Object.

Security Boundary

Schemas can be used to simplify the assignment of permissions. An example of applying permissions at
the schema level would be to assign the EXECUTE permission on a schema to a user. The user could then
execute all stored procedures within the schema. This simplifies the granting of permissions because there
is no need to set up individual permissions on each stored procedure.
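
For example, a minimal sketch of granting a schema-level permission (the schema and user names are
illustrative):

GRANT EXECUTE ON SCHEMA::Reporting TO ReportingUser;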

It is important to understand that schemas are not used to define physical storage locations for data, as
occurs in some other database engines.

Upgrading Older Applications

If you are upgrading applications from SQL Server 2000 and earlier versions, it is important to understand
that the naming convention changed when schemas were introduced. Previously, names were of the form
Server.Database.Owner.Object.

Objects still have owners, but the owner's name does not form a part of the multipart naming convention
from SQL Server 2005 onward. When upgrading databases from earlier versions, SQL Server will
automatically create a schema that has the same name as existing object owners, so that applications that
use multipart names will continue to work.

Object Name Resolution


It is important to use at least two-part names
when referring to objects in SQL Server code, such
as stored procedures, functions, and views.

Object Name Resolution


When object names are referred to in the code,
SQL Server must determine which underlying
objects are being referred to. For example,
consider the following statement:

SELECT ProductID, Name, Size FROM Product;

More than one Product table could exist in separate schemas of the same database. When single-part
names are used, SQL Server must then determine which Product table is being referred to.
Most users have default schemas assigned, but not all types of users have these. Default schemas are
assigned to users based on standard Windows® and SQL Server logins. You can also assign default
schemas to Windows groups in SQL Server 2012 and later. Users without default schemas are considered
to have the dbo schema as their default schema.

When locating an object, SQL Server will first check the user's default schema. If the object is not found,
SQL Server will then check the dbo schema to try to locate it.

It is important to include schema names when referring to objects, instead of depending upon schema
name resolution, such as in this modified version of the previous statement:

SELECT ProductID, Name, Size FROM Production.Product;

Apart from rare situations, using multipart names leads to more reliable code that does not depend upon
default schema settings.

Creating Schemas
Schemas are created by using the CREATE
SCHEMA command. This command can also
include the definition of objects to be created
within the schema at the time the schema is
created.

CREATE SCHEMA

Schemas have both names and owners. In the first
example shown on the slide, a schema named
Reporting is being created. It is owned by the user,
Terry. Although both schemas and the objects
contained in the schemas have owners and the
owners do not have to be the same, having
different owners for schemas and the objects contained within them can lead to complex security issues.

Object Creation at Schema Creation Time

Besides creating schemas, the CREATE SCHEMA statement can include options for object creation.
Although the code example that follows might appear to be three statements (CREATE SCHEMA,
CREATE TABLE, and GRANT), it is in fact a single statement. Both CREATE TABLE and GRANT are
options that are being applied to the CREATE SCHEMA statement.

Within the newly created KnowledgeBase schema, the Article table is being created and the SELECT
permission on the database is being granted to Salespeople.
Statements such as the second CREATE SCHEMA statement can lead to issues if the entire statement is
not executed together.

Object creation when the schema is created.

CREATE SCHEMA
CREATE SCHEMA Reporting
AUTHORIZATION Terry;
GO

CREATE SCHEMA KnowledgeBase
AUTHORIZATION Paul
CREATE TABLE Article
(
ArticleID int IDENTITY (1,1) PRIMARY KEY,
ArticleContents XML
)
GRANT SELECT TO SalesPeople;



Demonstration: Working with Schemas


In this demonstration, you will see how to:

 Create a schema.

 Create a schema with an included object.

 Drop a schema.

Demonstration Steps
1. Ensure that you have completed the previous demonstrations in this module.

2. On the taskbar, click Microsoft SQL Server Management Studio.

3. In the Connect to Server dialog box, in the Server box, type the URL of the Azure server <Server
Name>.database.windows.net (where <Server Name> is the name of the server you created).

4. In the Authentication list, click SQL Server Authentication.

5. In the User name box, type Student, and in the Password box, type Pa55w.rd, and then click
Connect.

6. On the File menu, point to Open, click Project/Solution.

7. In the Open Project dialog box, navigate to the D:\Demofiles\Mod02 folder, click Demo.ssmssln,
and then click Open.
8. In Solution Explorer, under Queries, double-click 2 - Schemas.sql.

9. In the Available Databases list, click AdventureWorksLT.

10. Select the code under the Step 2: Create a Schema comment, and then click Execute.

11. Select the code under the Step 3: Create a table using the new schema comment, and then click
Execute.

12. Select the code under the Step 4: Drop the schema comment, and then click Execute. Note that the
schema cannot be dropped while objects exist in it.

13. Select the code under the Step 5: Drop and the table and then the schema comment, and then
click Execute.

14. Leave SQL Server Management Studio open for the next demonstration.

Check Your Knowledge

Question

Which of the following objects cannot be stored in a schema?

Select the correct answer.

Table

Function

Database role

View

Stored procedure

Lesson 4
Creating and Altering Tables
Now that you understand the core concepts surrounding the design of tables, this lesson introduces you
to the Transact-SQL syntax that is used when defining, modifying, or dropping tables. Temporary tables
are a special form of table that can be used to hold temporary result sets. Computed columns are used to
create columns where the value held in the column is automatically calculated, either from expressions
involving other columns from the table, or from the execution of functions.

Lesson Objectives
After completing this lesson, you will be able to:

 Create tables.

 Drop tables.

 Alter tables.

 Use temporary tables.

 Work with computed columns.

Creating Tables
Tables are created by using the CREATE TABLE
statement. This statement is also used to define
the columns that are associated with the table,
and identify constraints such as primary and
foreign keys.
CREATE TABLE

When you create tables by using the CREATE
TABLE statement, make sure that you supply both
a schema name and a table name. If the schema
name is not specified, the table will be created in
the default schema of the user who is executing
the statement. This could lead to the creation of
scripts that are not robust, because they could generate different schema designs when different users
execute them.

Nullability

You should specify NULL or NOT NULL for each column in the table. SQL Server has defaults for this that
you can change via the ANSI_NULL_DEFAULT setting. Scripts should always be designed to be as reliable
as possible—specifying nullability in data definition language (DDL) scripts helps to improve script
reliability.

Primary Key

You can specify a primary key constraint beside the name of a column if only a single column is included
in the key. It must be included after the list of columns when more than one column is included in the
key.

In the following example, the SalesID value is only unique for each SalesRegisterID value:

Specifying a Primary Key


CREATE TABLE PetStore.SalesReceipt
(
SalesRegisterID int NOT NULL,
SalesID int NOT NULL,
CustomerID int NOT NULL,
SalesAmount decimal(18,2) NOT NULL,
PRIMARY KEY (SalesRegisterID, SalesID)
);

Primary keys are constraints and are more fully described, along with other constraints, later in this course.

Dropping Tables
The DROP TABLE statement is used to delete a
table from a database. If a table is referenced by a
foreign key constraint, it cannot be dropped.

When dropping a table, all permissions,
constraints, indexes, and triggers that are related
to the table are also dropped. Deletion is
permanent. SQL Server has no equivalent to the
Windows Recycle Bin—after the table is dropped,
it is permanently removed.

Code that references the table, such as code that
is contained within stored procedures, functions,
and views, is not dropped. This can lead to
“orphaned” code that refers to nonexistent objects. SQL Server 2008 introduced a set of dependency
views that can be used to locate code that references nonexistent objects. The details of both referenced
and referencing entities are available from the sys.sql_expression_dependencies view. Referenced and
referencing entities are also available separately from the sys.dm_sql_referenced_entities and
sys.dm_sql_referencing_entities dynamic management views. Views are discussed later in this course.

Using the DROP TABLE statement.

DROP
DROP TABLE PetStore.Owner;
GO
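
As a sketch of how the dependency views can then be used, the following query lists code that still
references the dropped table (the object name is illustrative):

SELECT OBJECT_NAME(referencing_id) AS ReferencingObject,
referenced_entity_name
FROM sys.sql_expression_dependencies
WHERE referenced_entity_name = N'Owner';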

Altering Tables
Altering a table is useful because permissions on
the table are retained, along with the data in the
table. If you drop and recreate the table with a
new definition, both the permissions on the table
and the data in the table are lost. However, if the
table is referenced by a foreign key, it cannot be
dropped, though it can be altered.

Tables are modified by using the ALTER TABLE
statement. You can use this statement to add or
drop columns and constraints, or to enable or
disable constraints and triggers. Constraints and
triggers are discussed in later modules.

Note that the syntax for adding and dropping columns is inconsistent. The word COLUMN is required for
DROP, but is not valid for ADD; it cannot even be included as an optional keyword. If the word COLUMN is
omitted in a DROP, SQL Server assumes that a constraint is being dropped.

In the following example, the PreferredName column is being added to the PetStore.Owner table. Then,
the PreferredName column is being dropped from the PetStore.Owner table. Note the difference in
syntax regarding the word COLUMN.
Use ALTER TABLE to add or delete columns.

ALTER TABLE
ALTER TABLE PetStore.Owner
ADD PreferredName nvarchar(30) NULL;
GO

ALTER TABLE PetStore.Owner
DROP COLUMN PreferredName;
GO

Demonstration: Working with Tables


In this demonstration, you will see how to:

 Create tables and alter tables.

 Drop tables.

Demonstration Steps
1. Ensure that you have completed the previous demonstrations in this module.

2. In SQL Server Management Studio, in Solution Explorer, under Queries, double-click 3 - Create
Tables.sql.

3. In the Available Databases list, click AdventureWorksLT.

4. Select the code under the Step 2: Create a table comment, and then click Execute.

5. Select the code under the Step 3: Alter the SalesLT.Courier table comment, and then click Execute.

6. Select the code under the Step 4: Drop the tables comment, and then click Execute.

7. Leave SQL Server Management Studio open for the next demonstration.

Temporary Tables
Temporary tables are used to hold temporary
result sets within a user's session. They are created
within the tempdb database and deleted
automatically when they go out of scope. This
typically occurs when the code in which they were
created completes or aborts, or when the session
ends. Temporary tables are very similar to other
tables, except that they are only visible to the
creator, and in the same scope (and subscopes)
within the session. Although temporary tables are
deleted automatically, you should explicitly delete
them when they are no longer needed, to reduce resource requirements on the
server. Temporary tables are often created in code by using the SELECT INTO statement.

A table is created as a temporary table if its name has a number sign (#) prefix. A global temporary table
is created if the name has a double number sign (##) prefix. Global temporary tables are visible to all
users and are not commonly used.
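
A minimal sketch of both kinds (the table and column names are illustrative):

-- Local temporary table: visible only to the session that creates it.
CREATE TABLE #CurrentOrders (OrderID int NOT NULL, OrderTotal decimal(18,2) NOT NULL);

-- Global temporary table: visible to all sessions.
CREATE TABLE ##SharedOrders (OrderID int NOT NULL);

-- Explicitly drop temporary tables when they are no longer needed.
DROP TABLE #CurrentOrders;
DROP TABLE ##SharedOrders;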

Passing Temporary Tables

Temporary tables are also often used to pass rowsets between stored procedures. For example, a
temporary table that is created in a stored procedure is visible to other stored procedures that are
executed from within the first procedure. Although this use is possible, it is not considered good practice
in general. It breaks common rules of abstraction for coding and also makes it more difficult to debug or
troubleshoot the nested procedures. SQL Server 2008 introduced table-valued parameters (TVPs) that can
provide an alternate mechanism for passing tables to stored procedures or functions. (TVPs are discussed
later in this course.)

The overuse of temporary tables is a common Transact-SQL coding error that often leads to performance
and resource issues. Extensive use of temporary tables can be an indicator of poor coding techniques,
often due to a lack of set-based logic design.

Demonstration: Working with Temporary Tables


In this demonstration, you will see how to:

 Create local temporary tables.

 Create global temporary tables.

 Access a global temporary table from another session.

Demonstration Steps
1. Ensure that you have completed the previous demonstrations in this module.
2. In SQL Server Management Studio, in Solution Explorer, under Queries, double-click 4 - Temporary
Tables.sql.

3. Right-click the query pane, point to Connection, and then click Change Connection.

4. In the Connect to Database Engine window dialog box, in the Server name box, type MIA-SQL, in
the Authentication box, select Windows Authentication, and then click Connect.

5. Select the code under the Step 1: Create a local temporary table comment, and then click Execute.

6. In Solution Explorer, under Queries, double-click 5 - Temporary Tables.sql.

7. Select the code under the Step 1: Select and execute the following query comment, and then click
Execute. Note that this session cannot access the local temporary table from the other session.
8. Switch to the 4 - Temporary Tables.sql pane.

9. Select the code under the Step 3: Create a global temporary table comment, and then click
Execute.
10. Switch to the 5 - Temporary Tables.sql pane.

11. Select the code under the Step 2: Select and execute the following query comment, and then click
Execute. Note that this session can access the global temporary table from the other session.
12. Switch to the 4 - Temporary Tables.sql pane.

13. Select the code under the Step 5: Drop the two temporary tables comment, and then click
Execute.
14. Leave SQL Server Management Studio open for the next demonstration.

Computed Columns
Computed columns are derived from other
columns or from the result of executing functions.

Computed columns were introduced in SQL Server
2000. In this example, the YearOfBirth column is
calculated by executing the DATEPART function
to extract the year from the DateOfBirth column
in the same table.

You can also see the word PERSISTED added to
the definition of the computed column. Persisted
computed columns were introduced in SQL Server
2005.

Defining a persisted computed column.

Computed Column
CREATE TABLE PetStore.Pet
(
PetID int IDENTITY (1,1) PRIMARY KEY,
PetName nvarchar(30) NOT NULL,
DateOfBirth date NOT NULL,
YearOfBirth AS DATEPART(year, DateOfBirth) PERSISTED
);
GO

A nonpersisted computed column is calculated every time a SELECT operation occurs on the column and
it does not consume space on disk. A persisted computed column is calculated when the data in the row
is inserted or updated and does consume space on the disk. The data in the column is then selected like
the data in any other column.

The difference between persisted and nonpersisted computed columns relates to when the computational
performance impact is exerted.

 Nonpersisted computed columns work best for data that is modified regularly, but rarely selected.

 Persisted computed columns work best for data that is modified rarely, but selected regularly.

In most business systems, data is read much more regularly than it is updated. For this reason, most
computed columns would perform best as persisted computed columns.

Demonstration: Working with Computed Columns


In this demonstration, you will see how to:

 Work with computed columns.

 Use PERSISTED columns.

Demonstration Steps
1. Ensure that you have completed the previous demonstrations in this module.

2. In SQL Server Management Studio, in Solution Explorer, under Queries, double-click 6 - Computed
Columns.sql.

3. Right-click the query pane, point to Connection, and then click Change Connection.

4. In the Connect to Database Engine window dialog box, in the Server name box, type the URL for
the Azure account, in the Authentication box, select SQL Server Authentication, in the Login box,
type Student, and in the Password box, type Pa55w.rd, and then click Connect.

5. In the Available Databases list, click AdventureWorksLT.


6. Select the code under the Step 2: Create a table with two computed columns comment, and then
click Execute.

7. Select the code under the Step 3: Populate the table with data comment, and then click Execute.
8. Select the code under the Step 4: Return the results from the SalesLT.SalesOrderDates table
comment, and then click Execute.

9. Select the code under the Step 5: Update a row in the SalesLT.SalesOrderDates table comment,
and then click Execute.

10. Select the code under the Step 6: Create a table with a computed column that is not persisted
comment, and then click Execute.

11. Select the code under the Step 7: Populate the table with data comment, and then click Execute.

12. Select the code under the Step 8 - Return the results from the SalesLT.TotalSales table comment,
and then click Execute.
13. Close SQL Server Management Studio without saving any changes.

Question: When creating a computed column, why is it good practice to include the
PERSISTED keyword? What are the consequences of excluding PERSISTED when the table has
several million records?

Lab: Designing and Implementing Tables


Scenario
A business analyst from your organization has given you a draft design for some new tables being added
to a database. You need to provide an improved schema design, based on good design practices. After
you have designed the schema and tables, you need to implement them in the TSQL database.

Objectives
After completing this lab, you will be able to:

 Choose an appropriate level of normalization for table data.

 Create a schema.

 Create tables.
Estimated Time: 45 minutes

Virtual machine: 20762C-MIA-SQL

User name: ADVENTUREWORKS\Student


Password: Pa55w.rd

Exercise 1: Designing Tables


Scenario
A business analyst from your organization has given you a first pass at a schema design for some new
tables being added to the TSQL database. You need to provide an improved schema design, based on
good design practices and an appropriate level of normalization. The business analyst was also confused
about when data should be nullable. You need to decide about nullability for each column in your
improved design.
The main tasks for this exercise are as follows:

1. Prepare the Environment

2. Review the Design


3. Improve the Design

 Task 1: Prepare the Environment


1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are both running, and then
log on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. Run Setup.cmd in the D:\Labfiles\Lab02\Starter folder as Administrator.

 Task 2: Review the Design


1. Open the Schema Design for Marketing Development Tables.docx from the
D:\Labfiles\Lab02\Starter folder.

2. Review the proposed structure for the new tables.

 Task 3: Improve the Design


1. Complete the Allow Nulls? column for each table.

2. Save your document.



3. Review the suggested solution in Schema Design for Marketing Development Tables.docx in the
D:\Labfiles\Lab02\Solution folder.
4. Close WordPad.

Results: After completing this exercise, you will have an improved schema and table design.

Exercise 2: Creating Schemas


Scenario
The new tables will be isolated in their own schema. You need to create the required schema called
DirectMarketing and assign ownership to the dbo user.

The main tasks for this exercise are as follows:

1. Create a Schema

 Task 1: Create a Schema


1. Using SSMS, connect to MIA-SQL using Windows Authentication.

2. Open Project.ssmssln from the D:\Labfiles\Lab02\Starter\Project folder.


3. In the Lab Exercise 2.sql file, write and execute a query to create the DirectMarketing schema, and
set the authorization to the dbo user.

Results: After completing this exercise, you will have a new schema in the database.

Exercise 3: Creating Tables


Scenario
You need to create the tables that you designed earlier in this lab. You should use appropriate nullability
for each column and each table should have a primary key. At this point, there is no need to create
CHECK or FOREIGN KEY constraints.

The main tasks for this exercise are as follows:

1. Create the Competitor Table

2. Create the TVAdvertisement Table

3. Create the CampaignResponse Table

 Task 1: Create the Competitor Table


1. In the Lab Exercise 3.sql file, write and execute a query to create the Competitor table that you
designed in Exercise 1 in the DirectMarketing schema.

2. In Object Explorer, verify that the new table exists.

 Task 2: Create the TVAdvertisement Table


1. In the Lab Exercise 3.sql file, write and execute a query to create the TVAdvertisement table that you
designed in Exercise 1 in the DirectMarketing schema.

2. Refresh Object Explorer and verify that the new table exists.

 Task 3: Create the CampaignResponse Table


1. In the Lab Exercise 3.sql file, write and execute a query to create the CampaignResponse table that
you designed in Exercise 1 in the DirectMarketing schema.

2. Refresh Object Explorer and verify that the new table exists.

3. Review the Computed text property of the ResponseProfit column.

4. Close SQL Server Management Studio without saving changes.

Results: After completing this exercise you will have created the Competitor, TVAdvertisement, and the
CampaignResponse tables. You will have created table columns with the appropriate NULL or NOT NULL
settings, and primary keys.

Question: When should a column be declared as nullable?



Module Review and Takeaways


Best Practice: All tables should have primary keys.
Foreign keys should be declared within the database in almost all circumstances. Developers
often suggest that the application will ensure referential integrity, but experience shows that this
is a poor option. Databases are often accessed by multiple applications, and bugs are also easy to
miss when they first start to occur.

Module 3
Advanced Table Designs
Contents:
Module Overview 3-1
Lesson 1: Partitioning Data 3-2

Lesson 2: Compressing Data 3-13

Lesson 3: Temporal Tables 3-19


Lab: Using Advanced Table Designs 3-27

Module Review and Takeaways 3-31

Module Overview
The physical design of a database can have a significant impact on the ability of the database to meet the
storage and performance requirements set out by the stakeholders. Designing a physical database
implementation includes planning the file groups, how to use partitioning to manage large tables, and
using compression to improve storage and performance. Temporal tables are a new feature in SQL
Server® 2016 and offer a straightforward solution to collecting changes to your data.

Objectives
At the end of this module, you will be able to:

 Describe the considerations for using partitioned tables in a SQL Server database.
 Plan for using data compression in a SQL Server database.

 Use temporal tables to store and query changes to your data.



Lesson 1
Partitioning Data
Databases that contain very large tables are often difficult to manage and might not scale well. This lesson
explains how you can use partitioning to overcome these problems, ensuring that databases remain
efficient, and can grow in a managed, orderly manner.

Lesson Objectives
At the end of this lesson, you will be able to:

 Understand common situations where partitioning can be applied.

 Use partition functions.

 Use partition schemes.

 Create a partitioned table.

 Add indexes to your partitions.


 Understand the SWITCH, MERGE, and SPLIT operations.

 Design a partition strategy.

Common Scenarios for Partitioning


You can use SQL Server to partition tables and indexes. A partitioned object is a table or index that is divided into smaller units based on the values in a particular column, called the partitioning key. Partitioning divides data horizontally—often called sharding—so partitions are formed of groups of rows. Before SQL Server 2016 Service Pack 1, partitioning was only available in the SQL Server Enterprise and Developer editions. In SQL Server 2016 Service Pack 1 and later, partitioning is available in all editions of SQL Server, and also in Azure™ SQL Database.

Partitioning improves manageability for large tables and indexes, particularly when you need to load large
volumes of data into tables, or remove data from tables. You can manipulate data in partitioned tables by
using a set of dedicated commands that enable you to merge, split and switch partitions. These
operations often only move metadata, rather than moving data—which makes tasks such as loading data
much faster. They are also less resource intensive than loading data by using INSERTs. In some cases,
partitioning can also improve query performance, by moving older data out, thereby reducing the volume
of data to be queried.

Common scenarios for partitioning a table include:


 Implementing a sliding window. In a sliding window scenario, you move data into and out of tables
based on date ranges. For example, in a data warehouse, you could partition large fact tables; you
could move data out of a table when it is older than one year and no longer of use to data analysts,
and then load newer data to replace it. You will learn more about sliding window scenarios in this
lesson.

 Enabling separate maintenance operations. You can perform maintenance operations, such as
rebuilding and reorganizing indexes on a partition-by-partition basis, which might be more efficient
than doing it for the whole table. This is particularly useful when only some of the partitions contain
data that changes—because there is no need to maintain indexes for partitions in which the data
doesn't change. For example, in a table called Orders, where only the current orders are updated, you
can create separate partitions for current orders and completed orders, and only rebuild indexes on
the partition that contains the current orders.

 Performing partial backups. You can use multiple filegroups to store partitioned tables and indexes.
If some partitions use read-only filegroups, you can use partial backups to back up only the primary
filegroups and the read/write filegroups; this is more efficient than backing up the entire database.
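
For example, a partial backup that skips read-only filegroups might look like the following sketch; the database name and backup path are illustrative:

BACKUP DATABASE … READ_WRITE_FILEGROUPS Transact-SQL Statement


-- Back up only the primary filegroup and the read/write filegroups
BACKUP DATABASE Sales
READ_WRITE_FILEGROUPS
TO DISK = 'D:\Backups\Sales_Partial.bak';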
You can use partitioning to create partitioned tables and partitioned indexes. When you create a
partitioned table, you do not create it on a filegroup, as you do a nonpartitioned table. Instead, you
create it on a partition scheme, which defines a filegroup or, more usually, a set of filegroups. In turn, a
partition scheme is based on a partition function, which defines the boundary values that will be used to
divide table data into partitions.

Partition Functions
When creating a partition function, you need to
first plan the column in the table that you will use
to partition the data. You should then decide
which values in that column will be the boundary
values. In tables that contain a datetime or smalldatetime column, it is common practice to use that column to partition the data, because you can then divide the table based on time intervals. For
example, you could partition a table that contains
order information by order date. You could then
maintain current orders in one partition, and
archive older orders into one or more additional
partitions.

The values that you choose for the partition function will have an effect on the size of each partition. For
example, in the OrderArchive table, you could choose boundary values that divide data based on yearly
intervals. The bigger the gap between the intervals, the bigger each partition will probably be. The
number of values that you include in the partition function determines the number of partitions in the
table. For example, if you include two boundary values, there will be three partitions in the table. The
additional partition is created to store the values outside of the second boundary.

Note: You cannot use the following data types in a partition function: text, ntext, image,
xml, timestamp, varchar(max), nvarchar(max), varbinary(max), alias data types, and CLR
user-defined data types.

To create a partition function, use the CREATE PARTITION FUNCTION Transact-SQL statement. You
must specify the name of the function, and the data type that the function will use. This should be the
same as the data type of the column that you will use as the partitioning key. You must also detail the
boundary values, and either RANGE LEFT or RANGE RIGHT to specify how to handle values in the table
that fall exactly on the boundary values:
 RANGE LEFT is the default value. The boundary value forms the upper boundary of a partition. For example, if a partition function used a boundary value of midnight on December 31, 2015, any values in the partitioned table's date column that were equal to or less than this date and time would be placed into the first partition, and all later values would be stored in the second partition.

 With RANGE RIGHT, the boundary value is the lower boundary of a partition. If you specified January 1, 2016 00:00:00, all dates on or after this date would go into one partition, and all earlier dates would go into another partition. This produces the same result as the RANGE LEFT example.

Note: The definition of the partition function does not include any objects, columns, or
filegroup storage information. This independence means you can reuse the function for as many
tables, indexes, or indexed views as you like. This is particularly useful for partitioning dates.

The following code example creates a partition function called YearlyPartitionFunction that specifies
three boundary values, and will therefore create four partitions:

CREATE PARTITION FUNCTION AS RANGE LEFT Transact-SQL Statement


CREATE PARTITION FUNCTION YearlyPartitionFunction (smalldatetime)
AS RANGE LEFT
FOR VALUES ('2013-12-31 00:00', '2014-12-31 00:00', '2015-12-31 00:00');
GO

The YearlyPartitionFunction in the preceding code example can be applied to any table. After partitioning has been added to a table (you will see this in a later topic), the value in the datetime column used for partitioning will determine which partition each row is stored in:

 Partition 1: from the earliest date up to and including 2013-12-31 00:00

 Partition 2: from 2014-01-01 00:00 up to and including 2014-12-31 00:00

 Partition 3: from 2015-01-01 00:00 up to and including 2015-12-31 00:00

 Partition 4: all dates from 2016-01-01 00:00 onwards



If the function used the RANGE RIGHT option, then the maximum values would become the minimum
values:

 Partition 1: from the earliest date up to and including 2013-12-30 00:00

 Partition 2: from 2013-12-31 00:00 up to and including 2014-12-30 00:00

 Partition 3: from 2014-12-31 00:00 up to and including 2015-12-30 00:00

 Partition 4: all dates from 2015-12-31 00:00 onwards

In this case, RANGE LEFT works better with the dates used, as each partition can then contain data for one
year. RANGE RIGHT would work better using the dates in the following code example.

The following code uses RANGE RIGHT to divide rows into annual partitions:

CREATE PARTITION FUNCTION AS RANGE RIGHT Transact-SQL Statement


CREATE PARTITION FUNCTION YearlyPartitionFunction (smalldatetime)
AS RANGE RIGHT
FOR VALUES ('2014-01-01 00:00', '2015-01-01 00:00', '2016-01-01 00:00');
GO

Note: A table or index can have a maximum of 15,000 partitions in SQL Server.

Partition Schemes
Partition schemes map table or index partitions to
filegroups. When planning a partition scheme,
think about the filegroups that your partitioned
table will use. By using multiple filegroups for your
partitioned table, you can separately back up
discrete parts of the table by backing up the
appropriate filegroup. It is common practice to
use one filegroup for each partition, but this is not
a requirement; you can use a single filegroup to
store all partitioned data, or map some partitions
to a single filegroup, and others to separate
filegroups.

For example, if you plan to store read-only data in a partitioned table, you might place all partitions that contain read-only data on the same filegroup, so you can manage that data together.
To create a partition scheme, use the CREATE PARTITION SCHEME Transact-SQL statement. You must
specify a name for the scheme, the partition function that it references, and the filegroups that it will use.

The following code example creates a scheme called OrdersByYear that references the function
PartitionByYearFunction and uses four filegroups, Orders1, Orders2, Orders3, and Orders4:

CREATE PARTITION SCHEME Transact-SQL Statement


CREATE PARTITION SCHEME OrdersByYear
AS PARTITION PartitionByYearFunction
TO (Orders1, Orders2, Orders3, Orders4);
GO

Note: You must create the partition function using the CREATE PARTITION FUNCTION
statement before you create your partition scheme.

Creating a Partitioned Table


After creating your partition function and scheme,
you are ready to partition your tables. When you
create an object, you can include an ON clause to instruct SQL Server where it should store the object. In practice, this clause is often omitted, and the object is stored on the default filegroup.
To create a partitioned table, you use the CREATE
TABLE statement with the ON clause, and specify
which partition scheme to use. The partition
scheme will then determine which filegroup each
row will be stored in. Your table must have an
appropriate partition key column for the scheme to be able to partition the data, and it must be the same
data type as specified when creating the function.

The following code example creates a table named Orders, which will use the OrdersByYear scheme:

Creating a Partitioned Table


CREATE TABLE Orders
(
OrderID int IDENTITY(1,1) PRIMARY KEY,
CustomerID int NOT NULL,
ShippingAddressID int NOT NULL,
BillingAddressID int NOT NULL,
OrderDate smalldatetime NOT NULL
)
ON OrdersByYear(OrderDate);
GO

If the partitioning key is a computed column, this column must be PERSISTED.
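
The following sketch illustrates this; it assumes a partition scheme named YearNumberScheme that is based on an int partition function:

Partitioning on a Persisted Computed Column


CREATE TABLE dbo.OrdersByYearNumber
(
    OrderID int IDENTITY(1,1) NOT NULL,
    OrderDate smalldatetime NOT NULL,
    -- A computed column used as the partitioning key must be PERSISTED
    OrderYear AS YEAR(OrderDate) PERSISTED
)
ON YearNumberScheme(OrderYear);
GO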



Partitioned Indexes
You create a partitioned index in much the same
way as a table using the ON clause, but a table
and its indexes can be partitioned using different
schemes. However, you must partition the
clustered index and table in the same way,
because the clustered index cannot be stored
separately from the table. If a table and all its
indexes are identically partitioned by using the
same partition scheme, then they are considered
to be aligned. When storage is aligned, both the
rows in a table and the indexes that depend on
these rows will be stored in the same filegroup.
Therefore, if a single partition is backed up or restored, both the data and indexes are kept together. An
index that is partitioned differently to its dependent table is considered nonaligned.

An index is partitioned by specifying a partition scheme in the ON clause.

The following example creates a nonclustered index on the OrderID column of the Orders table:

Create a Partitioned Nonclustered Index


CREATE NONCLUSTERED INDEX ixOrderID
ON dbo.Orders (OrderID) ON OrdersByYear(OrderDate);

Notice that, when you partition an index, you are not limited to using the columns in the index when
specifying the partitioning key. SQL Server includes the partitioning key in the definition of the index,
which means you can partition the index using the same scheme as the table.
The following code creates the Orders table with a partitioned clustered index on the OrderID and
OrderDate column of the Orders table:

Create a Table with a Partitioned Clustered Index


CREATE TABLE dbo.Orders
(
OrderID int IDENTITY(1,1),
OrderDate datetime NOT NULL,
CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderDate, OrderID)
)
ON OrdersByYear(OrderDate);

When an index is partitioned, you can rebuild or reorganize the entire index, or a single partition of an
index. The sys.dm_db_index_physical_stats dynamic management view (DMV) provides fragmentation
information for each partition, so you can see which partitions are most heavily fragmented. You can then
create a targeted defragmentation strategy based on this data.

The following code uses the sys.dm_db_index_physical_stats DMV to show fragmentation in each partition
of a table:

Show Fragmentation by Partition Using sys.dm_db_index_physical_stats


SELECT i.name, s.index_type_desc, s.partition_number, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats
(DB_ID(N'TSQL'), OBJECT_ID(N'dbo.Orders'), NULL, NULL, NULL) AS s
INNER JOIN sys.indexes AS i ON s.object_id = i.object_id AND s.index_id = i.index_id;
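
Based on the fragmentation results, you can then rebuild only the partitions that need it. The following sketch rebuilds a single partition of the ixOrderID index; the partition number is purely illustrative:

ALTER INDEX … REBUILD PARTITION Transact-SQL Statement


-- Rebuild just one partition of the index (REORGANIZE PARTITION = n is a lighter-weight alternative)
ALTER INDEX ixOrderID ON dbo.Orders
REBUILD PARTITION = 3;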

SWITCH, MERGE and SPLIT Operations


You can manipulate the partitions in a partitioned
table by performing SWITCH, MERGE and SPLIT
operations, and by using dedicated partitioning
functions and catalog views.

Switching Partitions
One of the major benefits of partitioned tables is
the ability to switch individual partitions in and
out of a partitioned table. By using switching, you
can archive data quickly and with minimal impact
on other database operations. This is because, if it
is configured correctly, switching usually only
involves swapping the metadata of two partitions
in different tables, not the actual data. Consequently, the operation has minimal effect on performance.
You can switch partitions between partitioned tables, or you can switch a partition from a partitioned
table to a nonpartitioned table.

Consider the following points when planning partition switching:

 Both the source partition (or table) and the destination partition (or table) must be in the same
filegroup, so you need to take account of this when planning filegroups for a database.
 The target partition (or table) must be empty; you cannot perform a SWITCH operation by using two
populated partitions.
 The two partitions or tables involved must have the same schema (columns, data types, and so on).
The rows that the partitions contain must also fall within exactly the same range of values for the
partitioning column; this ensures that you cannot switch rows with inappropriate values into a
partitioned table. You should use CHECK constraints to ensure that the partitioning column values are
valid for the partition being switched. For example, for a table that is partitioned by a date value, you
could create a CHECK constraint on the table that you are switching; this then checks that all values
fall between two specified dates.

Splitting Partitions
Because you need to maintain an empty partition to switch partitions, it is usually necessary to split an
existing partition to create a new empty partition that you can then use to switch data. To split a partition,
you first need to alter the partition scheme to specify the filegroup that the new partition will use (this
assumes that your solution maps partitions one-to-one with filegroups). When you alter the scheme, you
specify this filegroup as the next used filegroup, which means that it will automatically be used for the
new partition that you create when you perform the split operation.

The following code example adds the next used filegroup NewFilegroup to the OrderArchiveScheme
partition scheme:

ALTER PARTITION SCHEME Transact-SQL Statement


ALTER PARTITION SCHEME OrderArchiveScheme NEXT USED NewFilegroup;
GO

You can then alter the partition function to split the range and create a new partition.

The following code example adds a new partition by splitting the range:

ALTER PARTITION FUNCTION Transact-SQL Statement


ALTER PARTITION FUNCTION OrderArchiveFunction() SPLIT RANGE ('2012-07-01 00:00');

You can now switch the empty partition as required. To do this, you need the partition number, which you
can get by using the $PARTITION function, and specifying the value for which you want to identify the
partition.

The following code example switches the contents of the Orders table into the appropriate partition of the OrderArchive table:

ALTER TABLE Transact-SQL Statement


DECLARE @p int = $PARTITION.OrderArchiveFunction('2012-04-01 00:00');
ALTER TABLE Orders
SWITCH TO OrderArchive PARTITION @p;
GO

Merging Partitions
Merging partitions does the opposite of splitting a partition, because it removes a range boundary instead
of adding one.

The following code example merges two partitions:

ALTER PARTITION FUNCTION Transact-SQL Statement


ALTER PARTITION FUNCTION OrderArchiveFunction
MERGE RANGE ('2011-04-01 00:00');
GO

Designing Partition Strategies for Common Scenarios


You can use SWITCH, MERGE, and SPLIT to
implement a sliding window strategy for archiving
or purging data. For example, if you have a table
that stores current orders, you could periodically
SWITCH data that is older than three months out
of this table and into a staging table, and then
archive it. Managing old data by using a sliding
window is much more efficient than extracting
data from a table, because the SWITCH operation
does not require you to move the data.

A table that participates in a sliding window strategy typically includes:

 A partition function that uses a datetime column. To implement a sliding window, you should use
a datetime data type.

 Partitions that map to the appropriate time period. For example, if you want to SWITCH out one
month's data at a time, each partition should contain only the data for a single month. You specify
the time periods by defining the boundary values in the partition function.

 Empty partitions. Performing MERGE and SPLIT operations on empty partitions maintains the
number of partitions in the table, and makes the table easier to manage.

Sliding Windows Strategies


The way that you implement a sliding window depends on whether you use RANGE LEFT or RANGE
RIGHT in the partition function. If you use RANGE LEFT to create the partition function, you can create
and maintain a partitioned table as described in the following example:

1. Create a partitioned table with four partitions, each of which represents a period of one month.
Partition 1 contains the oldest data, partition 2 contains the current data, partition 3 is empty, and
partition 4 is empty. The table looks like this:

Partition 1: Oldest Data | Partition 2: Current Data | Partition 3: Empty | Partition 4: Empty
2. Switch out partition 1 to a staging table for purging or archiving. The table now looks like this:

Partition 1: Empty | Partition 2: Current Data | Partition 3: Empty | Partition 4: Empty
3. Merge the now empty partition 1 with partition 2. This partition now contains the oldest data. The
table now has three partitions, one populated and two empty. The empty middle partition (which was
partition 3) will be used to load the current data. The table now looks like this:

Partition 1: Oldest Data | Partition 2: Empty (load current data) | Partition 3: Empty
4. Split the other empty partition to return the table to the same state as it was in step 1.
If you use RANGE RIGHT to create the partition function, you can create and maintain a partitioned table
as described in the following example:

1. Create a partitioned table with four partitions, each of which represents a period of one month.
Partition 1 is empty, partition 2 contains the oldest data, partition 3 contains the current data, and
partition 4 is empty. The table looks like this:

Partition 1: Empty | Partition 2: Oldest Data | Partition 3: Current Data | Partition 4: Empty
2. Switch out partition 2 to a staging table. The table now looks like this:

Partition 1: Empty | Partition 2: Empty | Partition 3: Current Data | Partition 4: Empty
3. Merge the empty partitions 1 and 2. The table now has three partitions, and looks like this:

Partition 1: Empty | Partition 2: Oldest Data | Partition 3: Empty
4. Split partition 3 to return the table to the same state as it was in step 1.
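
The following sketch shows one complete cycle of a RANGE RIGHT sliding window in Transact-SQL; the function, scheme, table, and filegroup names, the partition number, and the boundary dates are all illustrative:

Sliding Window Cycle Transact-SQL Statements


-- 1. Switch the oldest populated partition out to an empty staging table.
ALTER TABLE dbo.Orders SWITCH PARTITION 2 TO dbo.Orders_Staging;
GO
-- 2. Merge away the boundary that the emptied partition used.
ALTER PARTITION FUNCTION MonthlyPartitionFunction() MERGE RANGE ('2015-01-01 00:00');
GO
-- 3. Nominate the next used filegroup, then split to create a new empty partition.
ALTER PARTITION SCHEME MonthlyPartitionScheme NEXT USED FG1;
ALTER PARTITION FUNCTION MonthlyPartitionFunction() SPLIT RANGE ('2016-01-01 00:00');
GO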

Considerations for Implementing a Sliding Window

When planning a sliding window strategy, consider the following points:

 Partition boundary values with RANGE LEFT. When partitioning on a column that uses the
datetime data type, you should choose the partition boundary value that you specify with RANGE
LEFT carefully. SQL Server performs explicit rounding of times in datetime values that can have
unexpected consequences. For example, if you create a partition function with a RANGE LEFT
boundary value of 2012-10-30 23:59:59.999, SQL Server will round this up to 2012-10-31
00:00:00.000; as a result, rows with the value of midnight will be added to the left partition instead
of the right. This could lead to inconsistencies because some rows for a particular date might be in a
different partition to the other rows with the same date. To avoid SQL Server performing rounding on
times in this way, specify the boundary value as 2012-10-30 23:59:59.997 instead of 2012-10-30
23:59:59.999; this will ensure that rows are added to partitions as expected. For the datetime2 and
datetimeoffset data types, you can specify a boundary of 2012-10-30 23:59:59.999 without
experiencing this problem. If you use RANGE RIGHT in the partition function, specifying a time value
of 00:00:00:000 will ensure that all rows for a single date are in the same partition, regardless of the
data type that you use.

 CHECK constraint. You must create a check constraint on the staging table to which you will switch
the partition containing the old data. The check constraint should ensure that both partitions contain
dates for exactly the same period, and that NULL values are not allowed.

The code example below adds a check constraint to the Orders_Staging table:

ALTER TABLE …WITH CHECK Transact-SQL Statement


ALTER TABLE Orders_Staging
WITH CHECK ADD CONSTRAINT CheckDates
CHECK (OrderDate >= '2010-07-01' and OrderDate < '2010-10-01' AND OrderDate IS NOT NULL);
GO

Demonstration: Creating a Partitioned Table


In this demonstration, you will see how to partition data.

Demonstration Steps
Creating a Partitioned Table
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are both running, and then
log on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. Run D:\Demofiles\Mod03\Setup.cmd as an administrator.

3. In the User Account Control dialog box, click Yes, and then if prompted with the question Do you
want to continue with this operation? type Y, then press Enter.

4. Start SQL Server Management Studio and connect to the MIA-SQL database engine instance using
Windows® authentication.

5. Open the Demo.ssmssln solution in the D:\Demofiles\Mod03\Demo folder.

6. In Solution Explorer, open the 1 - Partitioning.sql script file.

7. Select and execute the query under Step 1 to use the master database.

8. Select and execute the query under Step 2 to create four filegroups, and add a file to each filegroup.

9. Select and execute the query under Step 3 to switch to the AdventureWorks database.

10. Select and execute the query under Step 4 to create the partition function.

11. Select and execute the query under Step 5 to create the OrdersByYear partition scheme.

12. Select and execute the query under Step 6 to create the Sales.SalesOrderHeader_Partitioned table.

13. Select and execute the query under Step 7 to copy data into the
Sales.SalesOrderHeader_Partitioned table.

14. Select and execute the query under Step 8 to check the rows counts within each of the partitions.

15. Keep SQL Server Management Studio open for the next demonstration.

Check Your Knowledge

Question: What is the maximum number of partitions you can add to a table or index?

Select the correct answer.

 256

 1,000

 15,000

 256,000

Lesson 2
Compressing Data
SQL Server includes the ability to compress data in SQL Server databases. Compression reduces the space
required to store data and can improve performance for workloads that are I/O intensive. This lesson
describes the options for using SQL Server compression and its benefits. It also describes the
considerations for planning data compression.

Note: In versions of SQL Server before SQL Server 2016 Service Pack 1, compression was
only available in Enterprise edition. In SQL Server 2016 Service Pack 1 and later, compression is
available in all editions of SQL Server.

Lesson Objectives
At the end of this lesson, you will be able to:
 Describe the benefits of data compression.

 Add page compression to your databases.

 Use row compression on your tables.

 Add Unicode compression to your databases.

 Understand the considerations for compressing data.

Why Compress Data?


SQL Server data compression can help you to
achieve savings in storage space. For workloads
that require a lot of disk I/O activity, using
compression can improve performance. This
performance boost occurs because compressed
data requires fewer data pages for storage, so
queries that access compressed data require fewer
pages to be retrieved.
You can use SQL Server compression to compress
the following SQL Server objects:

 Tables
 Nonclustered indexes

 Indexed views

 Individual partitions in a partitioned table or index—each partition can be set to PAGE, ROW or
NONE

 Spatial indexes
In SQL Server, you can implement compression in two ways: page compression and row compression. You
can also implement Unicode compression for the nchar(n) and nvarchar(n) data types.

Note: You can see the compression state of the partitions in a partitioned table by
querying the data_compression column of the sys.partitions catalog view.
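
For example, the following query sketch lists the compression setting for each partition of a table; the table name dbo.Orders is illustrative:

Querying sys.partitions


SELECT OBJECT_NAME(p.object_id) AS table_name,
       p.index_id,
       p.partition_number,
       p.data_compression_desc
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.Orders');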

Page Compression
Page compression takes advantage of data redundancy to reclaim storage space. Each compressed page includes a structure called the compression information (CI) structure, which is stored below the page header and holds compression metadata.

Page compression compresses data in three ways:

1. Row compression. When you implement page compression, SQL Server automatically implements row compression; in other words, page compression incorporates row compression.

2. Prefix compression. SQL Server scans each compressed column to identify values that have a
common prefix. It then records the prefixes in the CI structure and assigns an identifier for each
prefix—which it then uses in each column to replace the shared prefixes. Because an identifier is
usually much smaller than the prefix that it replaces, SQL Server can potentially reclaim a considerable
amount of space. For example, imagine a set of parts in a product table that all have an identifier that
begins TFG00, followed by a number. If the Products table contains a large number of these products,
prefix compression would eliminate the redundant TFG00 values from the column, and replace them
with a smaller alias.

3. Dictionary compression. Dictionary compression works in a similar way to prefix compression, but
instead of just identifying prefixes, dictionary compression identifies entire repeated values, and
replaces them with an identifier that is stored in the CI structure. For example, in the products table,
there is a column called Color that contains values such as Blue, Red, and Green that are repeated
extensively throughout the column. Dictionary compression would replace each color value with an
identifier, and store each color value in the CI structure, along with its corresponding identifier.

SQL Server processes these three compression operations in the order shown in the preceding list.

Row Compression
Row compression saves space by changing the
way it stores fixed length data types. Instead of
storing them as fixed length types, row
compression stores them in variable length format.
For example, the integer data type normally takes
up four bytes of space. If you had a column that
used the integer data type, with row compression the amount of space that each value consumes would vary, depending on the values in the rows. A value of six
would only consume a single byte, whereas a
value of 6,000 would consume two bytes. Row
compression only works with certain types of data;
it does not affect variable length data types, or other data types including: xml, image, text, and ntext.

When you implement row compression for a table, SQL Server adds an extra four bits to each compressed
column to store the length of the stored value. However, this small increase in size is normally outweighed by the space saved. For NULL values, these four bits are the only space consumed.
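
Row and page compression are enabled by rebuilding the table or index with the DATA_COMPRESSION option. The following sketch is illustrative; the table name and partition numbers are assumptions:

ALTER TABLE … REBUILD WITH (DATA_COMPRESSION) Transact-SQL Statements


-- Apply row compression to an entire table
ALTER TABLE dbo.Orders
REBUILD WITH (DATA_COMPRESSION = ROW);
GO

-- Apply page compression to selected partitions of a partitioned table
ALTER TABLE dbo.Orders
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE ON PARTITIONS (1 TO 3));
GO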

Unicode Compression
SQL Server Unicode compression uses the
Standard Compression Scheme for Unicode (SCSU)
algorithm to extend the compression capabilities
to include the Unicode data types nchar(n) and
nvarchar(n). When you implement row or page
compression on an object, Unicode columns are
automatically compressed by using SCSU. Note
that nvarchar(MAX) columns cannot be
compressed with row compression, but can
benefit from page compression.

Unicode data types use more bytes to store the equivalent non-Unicode values. SQL Server uses
the UCS-2 Unicode encoding scheme, which uses two bytes to store each character. Non-Unicode data
types use only one byte per character. The SCSU algorithm essentially stores Unicode values as non-
Unicode values and converts them back to Unicode as required. This can yield significant storage savings,
with compression for English language Unicode data types yielding 50 percent savings.

Note: The compression ratio for Unicode compression varies between different languages,
particularly for languages whose alphabets contain significantly more characters. For example,
Unicode compression for the Japanese language yields just 15 percent savings.

Considerations for Compression


The benefits of using SQL Server compression
include the ability to reclaim storage space and
improved performance. However, compressing
and uncompressing data requires a significant
amount of server resources, particularly processor
resources. Consequently, you should plan
compression to ensure that the benefits of
compressing data will outweigh the costs.

Planning Compression to Reclaim Storage Space
The amount of storage space that you can reclaim
by using compression depends on the type of data
that you are compressing. Consider the following points when planning data compression:

 If your data includes a large number of fixed length data types, you can potentially benefit from row
compression. For the greatest benefit, a large number of the values should consume less space than
the data types allow in total. For example, the smallint data type consumes 2 bytes of space. If a
smallint column contains values that are less than 256, you can save a whole byte of data for each
value. However, if the majority of the values are greater than this, the benefit of compressing the data
is less, because the overall percentage of space saved is lower.

 If your data includes a large amount of redundant, repeating data, you might be able to benefit from
page compression. This applies to repeating prefixes, in addition to entire words or values.
 Whether or not you will benefit from Unicode compression depends on the language that your data
is written in.
You can use the stored procedure sp_estimate_data_compression_savings to obtain an estimation of
the savings that you could make by compressing a table. When you execute
sp_estimate_data_compression_savings, you supply the schema name, the table name, the index id if
this is included in the calculation, the partition id if the table is partitioned, and the type of compression
(ROW, PAGE, or NONE). sp_estimate_data_compression_savings takes a representative sample of the
table data and places it in tempdb, where it is compressed, and then supplies a result set that displays the
potential savings that you could make by compressing the table.

The following code estimates the potential space savings of implementing row compression in the
Internet.Orders table:

sp_estimate_data_compression_savings
USE Sales;
GO
EXEC sp_estimate_data_compression_savings 'Internet', 'Orders', NULL, NULL, 'ROW';
GO

Planning Compression to Improve Performance


Because compressed data uses fewer data pages, reading compressed data requires fewer page reads,
which in turn requires reduced disk I/O, the result of which can be improved performance. Compressed
data is stored on disk in a compressed state, and it remains in a compressed state when the data pages
are loaded into memory. Data is only uncompressed when it is required, for example, for join operations,
or when an application reads the data. As a result, SQL Server can fit more pages into memory than if the
data were uncompressed, which can potentially boost performance further. The benefits of reduced disk
I/O and more efficient in-memory page storage are greater for workloads that scan large amounts of
data, rather than for queries that return just a small subset of data.
To decide whether to implement compression, you must balance the performance improvements against
the cost, in terms of CPU resources, of compressing and uncompressing the data. For row compression,
this cost is typically a seven to 10 percent increase in CPU utilization. For page compression, this figure is
usually higher.

Two factors that can help you to assess the value of implementing compression for a table or index are:

1. The frequency of data change operations relative to other operations. The lower the percentage of
data change operations, such as updates, the greater the benefit of compression. Updates typically
require access to only a small part of the data, and so do not involve accessing a large number of
data pages.

2. The proportion of operations that involve a scan. The higher this value, the greater the benefit of
compression. Scans involve a large number of data pages, so you can improve performance
considerably if a significant percentage of the workload involves scans.

You can use the sys.dm_db_index_operational_stats dynamic management view (DMV) to obtain the
information required to assess the frequency of data change operations and the proportion of operations
that involve a scan.
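
The following query sketch summarizes scan and data change activity per partition; the table name dbo.Orders is illustrative:

Querying sys.dm_db_index_operational_stats


SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.index_id,
       s.partition_number,
       s.range_scan_count,
       s.singleton_lookup_count,
       s.leaf_insert_count + s.leaf_update_count + s.leaf_delete_count AS leaf_change_count
FROM sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL) AS s;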

Demonstration: Compressing Data


In this demonstration, you will see how to compress data.

Demonstration Steps
Compressing Data

1. In SQL Server Management Studio, in Solution Explorer, open the 2 - Compressing Data.sql script file.
2. Select and execute the query under Step 1 to use the AdventureWorks database.

3. Select and execute the query under Step 2 to run the sp_estimate_data_compression_savings
procedure against the Sales.SalesOrderDetail table.

4. Select and execute the query under Step 3 to add row compression to the Sales.SalesOrderDetail
table.

5. Select and execute the code under Step 4 to run the sp_estimate_data_compression_savings
procedure against the Sales.SalesOrderDetail table to see if the table can be further compressed.

6. Select and execute the query under Step 5 to rebuild indexes 1 and 3.

7. Select and execute the query under Step 6 to run the sp_estimate_data_compression_savings procedure against the Sales.SalesOrderDetail table to show how the size of the table has been reduced.

8. Keep SQL Server Management Studio open for the next demonstration.

Check Your Knowledge

Question: You have a Customers table with the following columns: Title, FirstName, MiddleInitial, LastName, Address1, Address2, City, PostalCode, Telephone, and Email. Which of the following options will give the best reduction in storage?

Select the correct answer.

 Add ROW compression to the table.

 Add PAGE compression to the table.

 Add Unicode compression to the table.

 Create a nonclustered index on the FirstName and LastName columns.

 None of the above.



Lesson 3
Temporal Tables
Most developers, at some point in their careers, have faced the problem of capturing and storing changed
data, including what was changed, and when it was changed. In addition to the Slowly Changing Dimension (SCD) component found in data warehousing, it is common to add triggers or custom code to
extract the changes and store this for future reference. The introduction of temporal tables means that
SQL Server can capture data change information automatically. These tables are also known as system-
versioned tables.

Lesson Objectives
After completing this lesson, you will be able to:

 Describe the benefits of using temporal tables in your database.

 Create temporal tables.

 Add system-versioning to an existing table.

 Understand considerations, including limitations, of using temporal tables.

 Use system-versioning with memory optimized tables.

What are System-Versioned Temporal Tables?


Temporal tables solve the issue of capturing and
storing changes to the data. Developers often
need to capture changes for auditing or reporting
purposes. The slowly changing dimension (SCD)
component in a data warehouse is a common way
to capture changes from the OLTP system, so that
data can be viewed and analyzed over periods of
time. However, in an OLTP system, a simpler
approach is often more appropriate. Temporal
tables solve this issue, and can also be queried
within the database.

With the temporal table feature, you can record all changes to your data. You create a system-versioned table either by creating a new table, or modifying an
existing table. When a new table is created, a pair of tables are created—one for the current data, and one
for the historical data. Two datetime2 columns are added to both the current and historical tables to
store the valid date range of the data: SysStartTime and SysEndTime. The current row will have a
SysEndTime value of 9999-12-31, and all records inserted within a single transaction will have the same
UTC time. When a row is updated, the data prior to the update is copied to the historical table, and the
SysEndTime column is set to the date that the data is changed. The row in the current table is then
updated to reflect the change.

For a relationship to be established with the historical table, the current table must have a primary key.
This also means you can indirectly query the historical table to see the full history for any given record.
The historical table can be named at the time of creation, or SQL Server will give it a default name.

Creating a Temporal Table


To create a new system-versioned table, use the
standard CREATE TABLE code with two datetime2
start and end time columns, declare these columns
as the dates for the PERIOD FOR SYSTEM_TIME,
and then specify system-versioning ON.

Create a new table and set the SYSTEM_VERSIONING feature ON.

Create Temporal Table


CREATE TABLE dbo.Employee
(
EmployeeID int NOT NULL PRIMARY KEY
CLUSTERED,
ManagerID int NULL,
FirstName varchar(50) NOT NULL,
LastName varchar(50) NOT NULL,
SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));

You must include the SysStartTime and SysEndTime columns, and the PERIOD FOR SYSTEM_TIME
which references these columns. You can change the name of these columns and change the references
to them in the PERIOD FOR SYSTEM_TIME parameters, as shown in the following code:

The Manager table in the following example has the date columns named as DateFrom and DateTo—these names are also used in the history table:

Change the Names of the Start and End System Columns


CREATE TABLE dbo.Manager
(
ManagerID int NOT NULL PRIMARY KEY CLUSTERED,
FirstName varchar(50) NOT NULL,
LastName varchar(50) NOT NULL,
DateFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
DateTo datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
PERIOD FOR SYSTEM_TIME (DateFrom, DateTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ManagerHistory));

If you want SQL Server to name the historical table, run the preceding code excluding the
(HISTORY_TABLE = dbo.EmployeeHistory) clause. Furthermore, if you include the HIDDEN keyword when
specifying the start and end time columns, they don’t appear in the results of a SELECT * FROM statement.
However, you can specify the start and end column names in the select list to include them.
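
For example, assuming the dbo.Manager table was created with the DateFrom and DateTo columns declared as HIDDEN:

Querying HIDDEN Period Columns


SELECT * FROM dbo.Manager;            -- the period columns are omitted

SELECT ManagerID, FirstName, LastName, DateFrom, DateTo
FROM dbo.Manager;                     -- the period columns are returned explicitly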

Adding System-Versioning to an Existing Table


An existing table can be converted to a system-
versioned table by:

 Adding the start and end datetime2 columns.

 Setting the system-versioning flag on.

The following code adds the system datetime2 columns to the Sales table. After these are created, the code alters the Sales table so that SYSTEM_VERSIONING is ON and data changes are stored. In this example, the history table has been named:

Make an Existing Table System-Versioned


ALTER TABLE dbo.Sales
ADD
    SysStartTime datetime2(0) GENERATED ALWAYS AS ROW START
        CONSTRAINT DF_SysStartTime DEFAULT SYSUTCDATETIME(),
    SysEndTime datetime2(0) GENERATED ALWAYS AS ROW END
        CONSTRAINT DF_SysEndTime DEFAULT CONVERT(datetime2(0), '9999-12-31 23:59:59'),
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime);
GO

ALTER TABLE dbo.Sales
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SalesHistory));

Temporal Table Considerations


There are a number of points that you should
consider before using a system-versioned table:

 The system date columns SysStartTime and SysEndTime must use the datetime2 data type.

 The current table must have a primary key, but the historical table cannot use constraints.

 To name the history table, you must specify the schema name as well as the table name.

 By default, the history table is PAGE compressed.

 The history table must reside in the same database as the current table.
 System-versioned tables are not compatible with FILETABLE or FILESTREAM features because SQL
Server cannot track changes that happen outside of itself.

 Columns with a BLOB data type, such as varchar(max) or image, can result in high storage
requirements because the history table will store the history values as the same type.

 INSERT and UPDATE statements cannot reference the SysStartTime or SysEndTime columns.

 You cannot modify data in the history table directly.



 You cannot truncate a system-versioned table. Turn SYSTEM_VERSIONING OFF to truncate the table.

 Merge replication is not supported.

For a full list of considerations and limitations, see Microsoft Docs:

Temporal Table Considerations and Limitations


http://aka.ms/vlehj6

System-Versioned Memory-Optimized Tables


In-Memory OLTP was introduced in SQL Server
2014 so that tables could exist in memory, giving
optimal performance by avoiding lock and latch contention, and by reading data from memory rather than disk.
These high performance tables are known as
memory-optimized tables.
All changes to a memory-optimized table are
stored in a transaction log that resides on disk.
This guarantees that the data is secure in the event
of a service restart. System-versioning can be
added to a memory-optimized table, but rather
than bloating the server memory with the data
changes stored in the history table, and possibly exceeding the RAM limit, changed data is stored on disk.
SQL Server also creates an internal memory-optimized staging table that sits between the current
memory-optimized table and the disk-based history table. This is covered in more detail later in this topic.

Create a Memory-Optimized Filegroup


To create a memory-optimized table with system-versioning, you must first ensure the database has a
filegroup allocated for memory-optimized data. To check for a filegroup, open SQL Server Management
Studio, and connect to the server. Right-click the database in Object Explorer and click Properties. Look
in the Filegroups tab to see if a file exists in the MEMORY OPTIMIZED DATA box. You can only have
one memory-optimized filegroup per database. If you need to add a filegroup, click Add, and enter the
name of the filegroup, then click OK. You can also use the following code example to create a new
memory-optimized filegroup:

Add a filegroup to your database to contain your memory-optimized tables.

Add Filegroup for Memory-Optimized Data


ALTER DATABASE AdventureWorks
ADD FILEGROUP MemoryOptimized CONTAINS MEMORY_OPTIMIZED_DATA;
GO

ALTER DATABASE AdventureWorks
ADD FILE (NAME = 'AdventureWorks_mod',
    FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\AdventureWorks_mod')
TO FILEGROUP MemoryOptimized;
GO

ALTER DATABASE AdventureWorks
SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON;
GO

Create a System-Versioned Memory-Optimized Table


Now that you have a filegroup for your memory-optimized data, you can create a memory-optimized
table with system-versioning. However, there are a few considerations you need to be aware of:

 The durability of the current temporal table must be SCHEMA_AND_DATA.

 The primary key must be a nonclustered index; a clustered index is not compatible.

 If SYSTEM_VERSIONING is changed from ON to OFF, the data in the staging buffer is moved to disk.

 The staging table uses the same schema as the current table, but also includes a bigint column to
guarantee that the rows moved to the internal buffer history are unique. This bigint column adds 8
bytes, thereby reducing the maximum row size to 8052 bytes.

 Staging tables are not visible in Object Explorer, but you can use the sys.internal_tables view to
acquire information about these objects.

 If you include the HIDDEN keyword when specifying the start and end time columns, they don’t
appear in the results of a SELECT * FROM statement. However, you can specify the start and end
column names in the select list to include them.
The following code creates a memory-optimized table with system-versioning enabled. The start and end
columns are HIDDEN:

Create a System-Versioned Memory-Optimized Table


CREATE TABLE dbo.Employee
(
EmployeeID int IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
Title varchar(4) NOT NULL,
FirstName nvarchar(25) NOT NULL,
LastName nvarchar(25) NOT NULL,
Email nvarchar(50) NOT NULL,
Telephone nvarchar(15) NULL,
DateOfBirth date NULL,
StartTime datetime2(0) GENERATED ALWAYS AS ROW START HIDDEN
CONSTRAINT DF_EmpStartTime DEFAULT SYSUTCDATETIME() NOT NULL,
EndTime datetime2(0) GENERATED ALWAYS AS ROW END HIDDEN
CONSTRAINT DF_EmpEndTime DEFAULT CONVERT(datetime2 (0), '9999-12-31 23:59:59')
NOT NULL,
PERIOD FOR SYSTEM_TIME (StartTime, EndTime)
)
WITH
(
MEMORY_OPTIMIZED = ON,
DURABILITY = SCHEMA_AND_DATA,
SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory)
);

Working with the System-Versioned Memory-Optimized Table


Query execution performance will still be fast, despite the history table residing on disk because of the
internal, auto-generated memory-optimized staging table that stores recent history. This means you can
run queries from natively compiled code. SQL Server regularly moves data from this staging table to the
disk-based history table using an asynchronous data flush task. The data flush aims to keep the internal
memory buffers at less than 10 percent of the memory consumption of the parent object. If the workload
is light, this runs every minute, but may be as frequent as every five seconds for a heavy workload.

Note: You can force a data flush to run by executing the sp_xtp_flush_temporal_history stored procedure, passing in the name of the schema and table: sys.sp_xtp_flush_temporal_history @schema_name, @object_name.

If you execute a SELECT query against a temporal table, all rows returned will be current data. The SELECT
clause is exactly the same query as you would use with a standard user table. To query data in your
temporal table for a given point in time, include the FOR SYSTEM_TIME clause with one of the five
subclauses for setting the datetime boundaries:
1. AS OF <date_time> accepts a single datetime parameter and returns the state of the data for the
specified point in time.
2. FROM <start_date_time> TO <end_date_time> returns all current and historical rows that were active at any point during the timespan, regardless of whether they became active before the start time or ceased to be active after the end time. The results include rows that became active precisely on the lower boundary defined by the FROM date; however, rows that became inactive exactly on the upper boundary defined by the TO date are excluded.

3. BETWEEN <start_date_time> AND <end_date_time> is identical to the FROM … TO subclause, except that it includes rows that became active on the upper time boundary.
4. CONTAINED IN (<start_date_time>, <end_date_time>) returns the values for all rows that were
opened and closed within the specified timespan. Rows that became active exactly on the lower
boundary, or became inactive exactly on the upper boundary, are included.

5. ALL returns all data from the current and historical tables with no restrictions.
The AS OF subclause returns the data at a given point in time. The following code uses the datetime2 format to return all employee rows that were active at 09:00 on a specific date—in this case, June 1, 2015.

Querying a Temporal Table Using the FOR SYSTEM_TIME AS OF Clause


SELECT * FROM dbo.Employee
FOR SYSTEM_TIME AS OF '2015-06-01 09:00:00';

Best Practice: If you want to return just historical data, use the CONTAINED IN subclause
for the best performance, as this only uses the history table for querying.
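
For example, the following sketch returns only row versions that both opened and closed within 2015; the boundary values are illustrative:

Querying a Temporal Table Using the FOR SYSTEM_TIME CONTAINED IN Clause


SELECT * FROM dbo.Employee
FOR SYSTEM_TIME CONTAINED IN ('2015-01-01 00:00:00', '2015-12-31 23:59:59');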

The FOR SYSTEM_TIME clause can be used to query both disk-based and memory-optimized temporal
tables. For more detailed information on querying temporal tables, see Microsoft Docs:

Querying Data in a System-Versioned Temporal Table

http://aka.ms/y1w3oq

Demonstration: Adding System-Versioning to an Existing Table


In this demonstration, you will see how:

 System-versioning can be added to an existing table.

 Changes to the data are stored in the temporal table.

Demonstration Steps
Adding System-Versioning to an Existing Table

1. In SQL Server Management Studio, in Solution Explorer, open the 3 - Temporal Tables.sql script file.

2. Select and execute the query under Step 1 to use the AdventureWorks database.

3. Select and execute the query under Step 2 Add the two date range columns, to add the two
columns, StartDate and EndDate, to the Person.Person table.

4. Select and execute the query under Step 2 Enable system-versioning, to alter the table and add
system-versioning.

5. In Object Explorer, expand Databases, expand AdventureWorks2016, right-click Tables, and then click Refresh.

6. In the list of tables, point out the Person.Person table. The name includes (System-Versioned).
7. Expand the Person.Person (System-Versioned) table node to display the history table. Point out the
name of the table includes (History).

8. Expand the Person.Person_History (History) node, and then expand Columns. Point out that the
column names are identical to the current table.

9. Select and execute the query under Step 4 to update the row in the Person.Person table for
BusinessEntityID 1704.

10. Select and execute the query under Step 5 to show the history of changes for BusinessEntityID 1704.

11. Close SQL Server Management Studio without saving anything.



Check Your Knowledge

Question: Which of the following statements is incorrect?

Select the correct answer.

 A temporal table must have two period columns, one for the start time, one for the end time.

 If you include the HIDDEN keyword when specifying the period columns on a temporal table, these columns cannot be included in a SELECT query.

 The FOR SYSTEM_TIME clause is used to query historical data.

 You can add system-versioning to a memory-optimized table.

 The history table for a system-versioned memory-optimized table is stored on disk.

Lab: Using Advanced Table Designs


Scenario
You are a database developer for Adventure Works who will be designing solutions using corporate
databases stored in SQL Server. You have been provided with a set of business requirements and will
implement partitioning to archive older data, and data compression to obtain optimal performance and
storage from your tables.

Objectives
After completing the lab exercises, you will be able to:

 Create partitioned tables.

 Compress data in tables.

Estimated Time: 30 minutes

Virtual machine: 20762C-MIA-SQL

User name: ADVENTUREWORKS\Student


Password: Pa55w.rd

Exercise 1: Partitioning Data


Scenario
You have created the tables for your business analyst, but believe that the solution could be improved by
implementing additional functionality. You will implement partitioned tables to move historical content to
an alternative partition.

The main tasks for this exercise are as follows:

1. Prepare the Lab Environment

2. Create the HumanResources Database

3. Implement a Partitioning Strategy

4. Test the Partitioning Strategy

 Task 1: Prepare the Lab Environment


1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are both running, and then
log on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. Run Setup.cmd in the D:\Labfiles\Lab03\Starter folder as Administrator.

 Task 2: Create the HumanResources Database


1. Using SSMS, connect to MIA-SQL using Windows authentication.

2. Open the project file D:\Labfiles\Lab03\Starter\Project\Project\Project.ssmssln, and the T-SQL script Lab Exercise 1.sql. Ensure that you are connected to the TSQL database.
3. Create the HumanResources database.

 Task 3: Implement a Partitioning Strategy


1. In Solution Explorer, open the query Lab Exercise 2.sql.

2. Create four filegroups for the HumanResources database: FG0, FG1, FG2, FG3.

3. Create a partition function named pfHumanResourcesDates in the HumanResources database to partition the data by dates.
4. Using the partition function, create a partition scheme named psHumanResources to use the four
filegroups.

5. Create a Timesheet table that will use the new partition scheme.
6. Insert some data into the Timesheet table.

 Task 4: Test the Partitioning Strategy


1. In Solution Explorer, open the query Lab Exercise 3.sql.

2. Type and execute a Transact-SQL SELECT statement that returns all of the rows from the Timesheet
table, along with the partition number for each row. You can use the $PARTITION function to
achieve this.

3. Type and execute a Transact-SQL statement to view partition metadata.


4. Create a staging table called Timesheet_Staging on FG1. This should be identical to the Timesheet
table.
5. Add a check constraint to the Timesheet_Staging table to ensure that the values in the
RegisteredStartTime column meet the following criteria:

o All values must be greater than or equal to 2011-10-01 00:00.

o All values must be less than 2012-01-01 00:00.


o No values can be NULL.

6. Type a Transact-SQL statement to switch out the data in the partition on the filegroup FG1 to the
table Timesheet_Staging. Use the $PARTITION function to retrieve the partition number.

7. View the metadata for the partitioned table again to see the changes, and then write and execute a
SELECT statement to view the rows in the Timesheet_Staging table.

8. Type a Transact-SQL statement to merge the first two partitions, using the value 2011-10-01 00:00.

9. View the metadata for the partitioned table again to see the changes.

10. Type a Transact-SQL statement to make FG1 the next used filegroup for the partition scheme.

11. Type a Transact-SQL statement to split the first empty partition, using the value 2012-07-01 00:00.

12. Type and execute a Transact-SQL statement to add two rows for the new period.

13. View the metadata for the partitioned table again to see the changes.

Results: At the end of this lab, the timesheet data will be partitioned to archive old data.

Exercise 2: Compressing Data


Scenario
The business analyst is satisfied with the partitioning concept you have applied to the Timesheet table. To
put the table into production, you will rework the partition to widen the time boundaries to
accommodate data over a number of years. After you have populated the Timesheet table, you will
decide which compression type to apply to each partition.

The main tasks for this exercise are as follows:

1. Create Timesheet Table for Compression

2. Analyze Storage Savings with Compression

3. Compress Partitions

 Task 1: Create Timesheet Table for Compression


1. In Solution Explorer, open the query Lab Exercise 4.sql.

2. Type and execute a T-SQL statement that drops the Payment.Timesheet table.

3. Type and execute a T-SQL statement that drops the psHumanResources partition scheme.

4. Type and execute a T-SQL statement that drops the pfHumanResourcesDates partition
function.

5. Type and execute a T-SQL statement that creates the pfHumanResourcesDates partition
function, using RANGE RIGHT for the values: 2012-12-31 00:00:00.000, 2014-12-31 00:00:00.000,
and 2016-12-31 00:00:00.000.
6. Type and execute a T-SQL statement that creates the psHumanResources partition
scheme, using the filegroups FG0, FG2, FG3, and FG1.

7. Type and execute a T-SQL statement that creates a Payment.Timesheet table on the
psHumanResources partition scheme.

8. Type and execute a T-SQL statement that inserts timesheet rows for staff working three shifts, over
the course of six years. Exclude weekend dates. (A sketch of the statements for steps 5 and 6 follows
this list.)
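
A sketch of the recreated partition function and scheme from steps 5 and 6:

RANGE RIGHT Partition Function Sketch

CREATE PARTITION FUNCTION pfHumanResourcesDates (datetime)
AS RANGE RIGHT FOR VALUES ('2012-12-31 00:00:00.000', '2014-12-31 00:00:00.000',
'2016-12-31 00:00:00.000');

CREATE PARTITION SCHEME psHumanResources
AS PARTITION pfHumanResourcesDates TO (FG0, FG2, FG3, FG1);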

 Task 2: Analyze Storage Savings with Compression


1. In Solution Explorer, open the query Lab Exercise 5.sql.
2. Type and execute a T-SQL SELECT statement to view the partition metadata for the
Payment.Timesheet table.
3. Type and execute a T-SQL statement to view the estimated savings when applying ROW and
PAGE compression on the Payment.Timesheet table (see the sketch after this list).
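
One way to estimate the savings is with the sp_estimate_data_compression_savings system stored procedure. A sketch, assuming the Payment.Timesheet table from the previous task:

Estimating Compression Savings Sketch

EXEC sp_estimate_data_compression_savings
@schema_name = 'Payment', @object_name = 'Timesheet',
@index_id = NULL, @partition_number = NULL, @data_compression = 'ROW';

EXEC sp_estimate_data_compression_savings
@schema_name = 'Payment', @object_name = 'Timesheet',
@index_id = NULL, @partition_number = NULL, @data_compression = 'PAGE';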

 Task 3: Compress Partitions


1. In Solution Explorer, open the query Lab Exercise 6.sql.
2. Type and execute a T-SQL statement to compress the Payment.Timesheet table. The partitions on
filegroups FG0 and FG1 should use ROW compression, and the partitions on filegroups FG2 and FG3
should use PAGE compression (see the sketch after this list).

3. Close SSMS without saving anything.
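
A sketch of the per-partition compression in step 2 follows. The partition numbers assume the scheme order FG0, FG2, FG3, FG1, so that partitions 1 and 4 sit on FG0 and FG1:

Per-Partition Compression Sketch

ALTER TABLE Payment.Timesheet
REBUILD PARTITION = ALL
WITH (
DATA_COMPRESSION = ROW ON PARTITIONS (1, 4),
DATA_COMPRESSION = PAGE ON PARTITIONS (2, 3)
);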

Results: At the end of this lab, the Timesheet table will be populated with six years of data, and will be
partitioned and compressed.

Question: Discuss scenarios that you have experienced where you think partitioning would
have been beneficial. Have you worked with databases that could have had older data
archived? Were the databases large enough to split the partitions across physical drives for
better performance, or to quicken the backup process? Furthermore, could any of this data
be compressed? Give reasons for your answers.

Module Review and Takeaways


In this module, you learned:

 How to plan and implement partitioning.

 How to apply data compression to reduce storage of your data, and increase query performance.

 The benefits of using temporal tables to record all changes to your data.

Best Practice: One of the disadvantages of partitioning is that it can be complicated to set
up. However, you can use the Developer Edition to replicate your production systems and test
your partitioning scenario before applying it in your live environment. As with any major
database changes, it is always recommended that you take a backup before applying these
changes.

Review Question(s)
Question: What are the advantages of using system-versioning versus a custom-built
application to store data changes?

Module 4
Ensuring Data Integrity Through Constraints
Contents:
Module Overview 4-1
Lesson 1: Enforcing Data Integrity 4-2

Lesson 2: Implementing Data Domain Integrity 4-6

Lesson 3: Implementing Entity and Referential Integrity 4-11


Lab: Ensuring Data Integrity Through Constraints 4-22

Module Review and Takeaways 4-25

Module Overview
The quality of data in your database largely determines the usefulness and effectiveness of applications
that rely on it—the success or failure of an organization or a business venture could depend on it.
Ensuring data integrity is a critical step in maintaining high-quality data.

You should enforce data integrity at all levels of an application from first entry or collection through
storage. Microsoft® SQL Server® data management software provides a range of features to simplify the
job.

Objectives
After completing this module, you will be able to:
 Describe the options for enforcing data integrity, and the levels at which they should be applied.

 Implement domain integrity through options such as check, unique, and default constraints.

 Implement referential integrity through primary and foreign key constraints.



Lesson 1
Enforcing Data Integrity
Data integrity refers to the consistency and accuracy of data that is stored in a database. An important
step in database planning is deciding the best way to enforce this.

Lesson Objectives
After completing this lesson, you will be able to:

 Explain how data integrity checks apply across different layers of an application.

 Describe the difference between domain and referential integrity.

 Explain the available options for enforcing each type of data integrity.

Data Integrity Across Application Layers

Application Levels
Applications often have a three-tier hierarchical
structure. This keeps related functionality together
and improves the maintainability of code, in
addition to improving the chance of code being
reusable. Common examples of application levels
are:
 User interface level.

 Middle tier (sometimes referred to as business


logic).
 Data tier.

You can implement data integrity checks at each of these levels.

User Interface Level


There are some advantages to enforcing integrity at the user interface level. Responsiveness for the
user may be better, because minor errors can be trapped before service requests are made to other layers of code.
Error messages may be clearer because the code is more aware of which user action caused the error.

The main disadvantage of enforcing integrity at the user interface level is that more than a single
application might have to work with the same underlying data, and each application might enforce the
rules differently. It is also likely to require more lines of code to enforce business rule changes than may
be required at the data tier.

Middle Tier
Many of the integrity checks implemented in code enforce business logic and
functional requirements, as opposed to checking nonfunctional aspects of the requirements, such as
whether the data is in the correct format. The middle tier is often where the bulk of those requirements
exist in code, because they can apply to more than one application. In addition, multiple user interfaces
often reuse the middle tier. Implementing integrity at this level helps to avoid different user interfaces
applying different rules and checks at the user interface level. At this level, the logic is still quite aware of

the functions that cause errors, so the error messages generated and returned to the user can still be quite
specific.
Integrity checks enforced only in the middle tier can also fail to protect the integrity of the
data, because of transactional inconsistencies due to optimistic locking, and race conditions
caused by multithreaded programming models. For example, it might seem easy
to check that a customer exists and then place an order for that customer. Consider, though, the
possibility that another user could remove the customer between the time that you check for the
customer's existence and the time that you record the order. The requirement for transactional
consistency leads to the necessity for relational integrity of the data elements, which is where the services
of a data layer become imperative.

Data Tier
The advantage of implementing integrity at the data tier is that upper layers cannot bypass it. In
particular, multiple applications accessing the data simultaneously cannot compromise its quality—there
may even be multiple users connecting through tools such as SQL Server Management Studio (SSMS). If
referential integrity is not enforced at the data tier level, all applications and users need to individually
apply all the rules and checks themselves to ensure that the data is correct.
One issue with implementing data integrity constraints at the data tier is the separation between
the data tier and the user actions that caused the errors to occur. Error messages can be precise in
describing an issue, yet difficult for an end user to understand, unless the programmer has
ensured that appropriate functional metadata is passed between the system tiers. The often cryptic
messages produced by the data tier have to be reprocessed by upper layers of code before being
presented to the end user.

Multiple Tiers
The correct solution in most situations involves applying rules and checks at multiple levels. However, the
challenge with this approach is in maintaining consistency between the rules and checks at different
application levels.

Types of Data Integrity


There are three basic forms of data integrity
commonly enforced in database applications:
domain integrity, entity integrity, and referential
integrity.

Domain Integrity
At the lowest level, SQL Server applies constraints
for a domain (or column) by limiting the choice of
data that can be entered, and whether nulls are
allowed. For example, if you only want whole
numbers to be entered, and don’t want alphabetic
characters, specify the INT (integer) data type.
Equally, assigning a TINYINT data type ensures
that only values from 0 to 255 can be stored in that column.

A check constraint can specify acceptable sets of data values; a default constraint can specify the value to
be supplied in the case of missing input.

Entity Integrity
Entity or table integrity ensures that each row within a table can be identified uniquely. This column (or
columns in the case of a composite key) is known as the table’s primary key. Whether the primary key
value can be changed or whether the whole row can be deleted depends on the level of integrity that is
required between the primary key and any other tables, based on referential integrity.

This is where the next level of integrity comes in, to ensure the changes are valid for a given relationship.

Referential Integrity
Referential integrity ensures that the relationships among the primary keys (in the referenced table) and
foreign keys (in the referencing tables) are maintained. You are not permitted to insert a value in the
referencing column that does not exist in the referenced column in the target table. A row in a referenced
table cannot be deleted, nor can the primary key be changed, if a foreign key refers to the row unless a
form of cascading action is permitted. You can define referential integrity relationships within the same
table or between separate tables.

As an example of referential integrity, you may have to ensure that an order cannot be placed for a
nonexistent customer.

Options for Enforcing Data Integrity


The table on the slide summarizes the mechanisms
that SQL Server provides for enforcing data
integrity.

Data Types
The first option for enforcing data integrity is to
ensure that only the correct type of data is stored
in a given column. For example, you cannot place
alphabetic characters into a column that has been
defined as storing integers.
The choice of a data type will also define the
permitted range of values that can be stored. For
example, the smallint data type can only contain values from –32,768 to 32,767.
For XML data (which is discussed in Module 14) XML schemas can be used to further constrain the data
that is held in the XML data type.

Null-ability Constraint
This determines whether a column can store a null value, or whether a value must be provided. This is
often referred to as whether a column is mandatory or not.

Default Values
If a column has been defined to not allow nulls, then a value must be provided whenever a new row is
inserted. With a default value, you can ignore the column during input and a specific value will be
inserted into the column when no value is supplied.

Check Constraint
Constraints are used to limit the permitted values in a column further than the limits that the data type,
null-ability and a default provides. For example, a tinyint column can have values from 0 to 255. You
might decide to further constrain the column so that only values between 1 and 9 are permitted.

You can also apply constraints at the table level and enforce relationships between the columns of a table.
For example, you might have a column that holds an order number, but it is not mandatory. You might
then add a constraint that specifies that the column must have a value if the Salesperson column also has
a value.
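
A hedged sketch of such a table-level constraint follows; the dbo.SalesOrder table and the OrderNumber and SalespersonID column names are assumptions for illustration:

Table-Level CHECK Constraint Sketch

ALTER TABLE dbo.SalesOrder
ADD CONSTRAINT CK_SalesOrder_OrderNumberWithSalesperson
CHECK (SalespersonID IS NULL OR OrderNumber IS NOT NULL);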

Triggers
Triggers are procedures, somewhat like stored procedures, that are executed whenever specific events
such as an INSERT or UPDATE occur. Triggers are executed automatically whenever a specific event
occurs. Within the trigger code, you can enforce more complex rules for integrity. Triggers are discussed
in Module 11.

Objects from Earlier Versions


Early versions of SQL Server supported objects called rules and defaults. Note that defaults were a type of
object and not the same as DEFAULT constraints. Defaults were separate objects that were then bound to
columns. They were reused across multiple columns.
These objects have been deprecated because they were not compliant with Structured Query Language
(SQL) standards. Code that is based on these objects should be replaced. In general, you should replace
rules with CHECK constraints and defaults with DEFAULT constraints.

Sequencing Activity
Put the following constraint types in order by numbering each to indicate the order of importance to
minimize constraint checking effort.

Steps

Specify data type.

Indicate column null-ability.

Indicate column default value.

Indicate a check constraint.

Write a trigger to control the column contents.

Lesson 2
Implementing Data Domain Integrity
Domain integrity limits the range and type of values that can be stored in a column. It is usually the most
important form of data integrity when first designing a database. If domain integrity is not enforced,
processing errors can occur when unexpected or out-of-range values are encountered.

Lesson Objectives
After completing this lesson, you will be able to:

 Describe how you can use data types to enforce basic domain integrity.

 Describe how column null-ability determines whether NULL is part of the valid domain of values for a column.

 Describe how you can use an additional DEFAULT constraint to provide a non-null default value for
the column.

 Describe how you can use CHECK constraints to enforce domain integrity beyond a null and default
value.

Data Types
Choosing an appropriate data type for each
column is one of the most important decisions
that you must make when you are designing a
table as part of a database. Data types were
discussed in detail in Module 2.
You can assign data types to a column by using
one of the following methods:

 Using SQL Server system data types.


 Creating alias data types that are based on
system data types.

 Creating user-defined data types based on


data types created within the Microsoft .NET Framework common language runtime (CLR).

System Data Types


SQL Server supplies a system-wide range of built-in data types. Choosing a data type determines both the
type of data that can be stored and the range of values that is permitted within the column.

Alias Data Types


Inconsistency between column data types can cause problems. The situation is exacerbated when more
than one person has designed the tables. For example, you may have several tables that store the weight
of a product that was sold. One column might be defined as decimal(18,3), another column might be
defined as decimal(12,2), and a third column might be defined as decimal(16,5). For consistency, alias data
types can create a data type called ProductWeight, and define it as decimal(18,3), which you can then
implement as the data type for all of the columns. This can lead to more consistent database designs.
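
A minimal sketch of creating and using such an alias data type follows (the NOT NULL setting and the dbo.Product table are assumptions):

Alias Data Type Sketch

CREATE TYPE dbo.ProductWeight
FROM decimal(18,3) NOT NULL;

CREATE TABLE dbo.Product
(
ProductID int NOT NULL,
Weight dbo.ProductWeight
);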

An additional advantage of alias data types is that code generation tools can create more consistent code
when the tools have the additional information about the data types that alias data types provide. For
example, you could decide to have a user interface design program that always displayed and/or
prompted for product weights in a specific way.

User-Defined Data Types


The addition of managed code to SQL Server 2005 made it possible to create entirely new data types.
Although alias data types are user-defined, they are still effectively subsets of the existing system data
types. With user-defined data types that are created in managed code, you can define not only the data
that is stored in a data type, but also the behavior of the data type. For example, you can design a JPEG
data type. Besides designing how it will store images, you can define it to be updated by calling a
predesigned method. Designing user-defined data types is discussed in more detail in Module 13.

DEFAULT Constraints
A DEFAULT constraint provides a value for a
column when no value is specified in the
statement that inserted the row. You can view the
existing definition of DEFAULT constraints by
querying the sys.default_constraints view.

DEFAULT Constraint
Sometimes a column is mandatory—that is, a
value must be provided. However, the application
or program that is inserting the row might not
provide a value. In this case, you may want to
apply a value to ensure that the row will be
inserted.
DEFAULT constraints are associated with a table column. They are used to provide a default value for the
column when the user does not supply a value. The value is retrieved from the evaluation of an expression
and the data type that the expression returns must be compatible with the data type of the column.

Nullable Columns and DEFAULT Constraint Coexistence


Without a DEFAULT constraint, if no value is provided for the column in the statement that inserted the
row, the column value would be set to NULL. If a NOT NULL constraint has been applied to the column,
the insertion would fail. With a DEFAULT constraint, the default value would be used instead. DEFAULT
constraints might be used to insert the current date or the identity of the user inserting the data into the
table.

Note: If the statement that inserted the row explicitly inserted NULL, the default value
would not be used.

Named Constraints
SQL Server does not require you to supply names for constraints that you create. If a name is not supplied,
SQL Server will automatically generate a name. However, the names that are generated are not very
intuitive. Therefore, it is generally considered a good idea to provide names for constraints as you create
them—and to do so using a naming standard.

A good example of why naming constraints is important is that, if a column needs to be deleted, you must
first remove any constraints that are associated with the column. Dropping a constraint requires you to
provide a name for the constraint that you are dropping. Having a consistent naming standard for
constraints helps you to know what that name is likely to be, rather than having to execute a query to find
the name. Locating the name of a constraint would involve querying the sys.sysconstraints compatibility
view, searching in Object Explorer, or selecting the relevant data from the
INFORMATION_SCHEMA.TABLE_CONSTRAINTS view.

The following code shows a named default constraint:

Named Default Constraint


CREATE TABLE dbo.SalesOrder
(
OpportunityID int,
ReceivedDate date NOT NULL
CONSTRAINT DF_SalesOrder_Date DEFAULT (SYSDATETIME()),
ProductID int NOT NULL,
SalespersonID int NOT NULL
);

CHECK Constraints
A CHECK constraint limits the values that a column
can accept by controlling the values that can be
put in the column.

After determining the data type, and whether it


can be null, you may want to further restrict the
values that can be placed into the column. For
example, you might decide that a varchar(7)
column must be five characters long if the first
character is the letter A.

More commonly, CHECK constraints are used as a


form of “sanity” check. For example, you might
decide that a salary needs to be within a certain
range, or a person’s age must be in the range 0 to 130.

Logical Expression
CHECK constraints work with any logical (Boolean) expression that can return TRUE, FALSE, or
UNKNOWN. Particular care must be given to any expression that could have a NULL return value. CHECK
constraints reject values for which the expression evaluates to FALSE, but they accept an UNKNOWN result,
which is what expressions involving NULL evaluate to.

Create table with a check constraint.

Check Constraint
CREATE TABLE Sales.Opportunity
(
OpportunityID int NOT NULL,
Requirements nvarchar(50) NOT NULL,
SalespersonID int NOT NULL,
Rating int NOT NULL
CONSTRAINT CK_Opportunity_Rating1to4
CHECK (Rating BETWEEN 1 AND 4)
);

For more information about column level constraints, see column_constraint (Transact SQL) in Microsoft
Docs:

column_constraint (Transact SQL)


http://aka.ms/b9ty9o

Table-Level CHECK Constraints


Apart from checking the value in a particular column, you can apply CHECK constraints at the table level
to check the relationship between the values in more than a single column from the same table. For
example, you could decide that the FromDate column should not have a larger value than the ToDate
column in the same row.
For more information about table-level constraints, see Microsoft Docs:

ALTER TABLE table_constraint


http://aka.ms/peaqqm

Demonstration: Data and Domain Integrity


In this demonstration, you will see how to:

 Enforce data and domain integrity.

Demonstration Steps
Enforce Data and Domain Integrity

1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running and then log
on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. Run D:\Demofiles\Mod04\Setup.cmd as an administrator.

3. In the User Account Control dialog box, click Yes.

4. On the taskbar, click Microsoft SQL Server Management Studio.

5. In the Connect to Server dialog box, in the Server name box, type MIA-SQL and then click
Connect.

6. On the File menu, point to Open, click Project/Solution.

7. In the Open Project dialog box, navigate to D:\Demofiles\Mod04, click Demo04.ssmssln, and then
click Open.

8. In Solution Explorer, expand the Queries folder, and double-click 21 - Demonstration 2A.sql.

9. Familiarize yourself with the requirement using the code below Step 1: Review the requirements
for a table design.
10. Place the pseudo code for your findings for the requirements below Step 2: Determine the data
types, null-ability, default and check constraints that should be put in place.

11. Highlight the code below Step 3: Check the outcome with this proposed solution, and click
Execute.

12. Highlight the code below Step 4: Execute statements to test the actions of the integrity
constraints, and click Execute.

13. Highlight the code below Step 5: INSERT rows that test the nullability and constraints, and click
Execute. Note the errors.

14. Highlight the code below Step 6: Query sys.sysconstraints to see the list of constraints, and click
Execute.

15. Highlight the code below Step 7: Explore system catalog views through the
INFORMATION_SCHEMA owner, and click Execute.
16. Close SQL Server Management Studio, without saving any changes.

Verify the correctness of the statement by placing a mark in the column to the right.

Statement: True or false? When you have a check constraint on a column, it is not worth having a NOT
NULL constraint, because any nulls will be filtered out by the check constraint.

Answer:

Lesson 3
Implementing Entity and Referential Integrity
It is important to be able identify rows within tables uniquely and to be able to establish relationships
across tables. For example, if you have to ensure that an individual can be identified as an existing
customer before an order can be placed, you can enforce this by using a combination of entity and
referential integrity.

Lesson Objectives
After completing this lesson, you will be able to:

 Explain how PRIMARY KEY constraints enforce entity integrity.

 Describe the use of the IDENTITY property for Primary Keys.

 Describe the use of sequences for primary keys.

 Describe how UNIQUE constraints are sometimes used instead of PRIMARY KEY constraints.

 Explain how FOREIGN KEY constraints enforce referential integrity.

 Describe how data changes cascade through relationships.

 Explain the common considerations for constraint checking.

PRIMARY KEY Constraints


PRIMARY KEY constraints were introduced in
Module 2, and are used to uniquely identify each
row in a table. They must be unique, not NULL,
and may involve multiple columns. SQL Server will
internally create an index to support the PRIMARY
KEY constraint.
Remember that the term “candidate key” is used
to describe the column or combination of columns
that could uniquely identify a row of data within a
table. None of the columns that are part of a
candidate key are permitted to be nullable.

In the following example, the OpportunityID


column has been chosen as the primary key.

As with other types of constraints, even though a name is not required when defining a PRIMARY KEY
constraint, it is preferable to choose a name for the constraint, rather than leaving SQL Server to do so.

The following code shows a primary key constraint:

Primary Key Constraint


CREATE TABLE Sales.Opportunity
(OpportunityID int NOT NULL
CONSTRAINT PK_Opportunity PRIMARY KEY,
Requirements nvarchar(50) NOT NULL,
ReceivedDate date NOT NULL,
SalespersonID int NULL,
Rating int NOT NULL
);

For more information about how to create primary keys, see Create Primary Keys in the SQL Server
Technical Documentation:

Create Primary Keys


http://aka.ms/ulm1g6

UNIQUE Constraints
A UNIQUE constraint indicates that the column or
combination of columns is unique. One row can
be NULL (if the column null-ability permits this).
SQL Server will internally create an index to
support the UNIQUE constraint.

For example, in Spain, all Spanish citizens over the


age of 14 receive a national identity document
called a Documento Nacional de Identidad (DNI).
It is a unique number in the format 99999999-X
where 9 is a digit and X is a letter used as a
checksum of the digits. People from other
countries who need a Spanish identification
number are given a Número de Identidad de Extranjero (NIE), which has a slightly different format of X-
99999999-X.

If you were storing a tax identifier for employees in Spain, you would store one of these values, include a
CHECK constraint to make sure that the value was in one of the two valid formats, and have a UNIQUE
constraint on the column that stores these values. Note that this may be unrelated to the fact that the
table has another unique identifier, such as EmployeeID, used as a primary key for the table.
As with other types of constraints, even though a name is not required when defining a UNIQUE
constraint, you should choose a name for the constraint rather than leaving SQL Server to do so.

The following code shows a unique constraint:

Unique Constraint
CREATE TABLE Sales.Opportunity
(
OpportunityID int NOT NULL
CONSTRAINT PK_Opportunity PRIMARY KEY,
Requirements nvarchar(50) NOT NULL
CONSTRAINT UQ_Opportunity_Requirements UNIQUE,
ReceivedDate date NOT NULL
);

NULL and UNIQUE


Although it is possible for a column that is required to be unique to be nullable, a NULL key value is only
valid for a single row. In practice, this means that nullable unique columns are rare. A UNIQUE constraint
ensures that no value, including NULL, appears in more than one row.

Create Unique Constraints


http://aka.ms/npf0a6

IDENTITY Constraints
It is common to need automatically generated
numbers for an integer primary key column. The
IDENTITY property on a database column indicates
that an INSERT statement will not provide the
value for the column; instead, SQL Server will
provide it automatically.

IDENTITY is a property that is typically associated


with int or bigint columns that provide
automated generation of values during insert
operations. You may be familiar with auto-
numbering systems or sequences in other
database engines. IDENTITY columns are not
identical to these, but you can use them to replace the functionality from those other database engines.

Adding the IDENTITY Qualifier to a Table Column


When you specify the identity property CustomerID INT IDENTITY (1, 10) as a qualifier for a numeric
column, you specify a seed (1, in this example) and an increment (10, in this example). The seed is the
starting value. The increment is how much the value goes up each time it is incremented. Both seed and
increment default to a value of 1 if they are not specified.

The following code adds the IDENTITY property to the OpportunityID column:

IDENTITY property
CREATE TABLE Sales.Opportunity
(
OpportunityID int NOT NULL IDENTITY(1,1),
Requirements nvarchar(50) NOT NULL,
ReceivedDate date NOT NULL,
SalespersonID int NULL
);

Adding Rows with Explicit Values for the Identity Column

Although explicit inserts are not normally permitted for columns that have an IDENTITY property, you can
explicitly insert values. You can do this by using a table setting option, SET IDENTITY_INSERT customer
ON. With this option, you can explicitly insert values into the column with the IDENTITY property within
the customer table. Remember to switch the automatic generation back on after you have inserted
exceptional rows.
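
A minimal sketch of an explicit insert, assuming a dbo.Customer table with an IDENTITY column named CustomerID and a CustomerName column:

IDENTITY_INSERT Sketch

SET IDENTITY_INSERT dbo.Customer ON;

-- Explicitly provide the identity value for an exceptional row
INSERT INTO dbo.Customer (CustomerID, CustomerName)
VALUES (1000, 'Contoso');

SET IDENTITY_INSERT dbo.Customer OFF;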

Note: Having the IDENTITY property on a column does not ensure that the column is
unique. Define a UNIQUE constraint to guarantee that values in the column will be unique.

Retrieving the Inserted Identity Value


After inserting a row into a table, you often have to know the value that was placed into the column with
the IDENTITY property. The syntax SELECT @@IDENTITY returns the last identity value that was used
within the session, in any scope. This can cause unexpected results when triggers perform inserts on another
table with an IDENTITY column as part of an INSERT statement.

For example, if you insert a row into a customer table, the customer might be assigned a new identity
value. However, if a trigger on the customer table caused an entry to be written into an audit logging
table, when inserts are performed, the @@IDENTITY variable would return the identity value from the
audit logging table, rather than the one from the customer table.

To deal with this effectively, the SCOPE_IDENTITY() function was introduced. It provides the last identity
value but only within the current scope. In the previous example, it would return the identity value from
the customer table.
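
The following sketch contrasts the two functions; the dbo.Customer table and its columns are assumptions:

SCOPE_IDENTITY Sketch

INSERT INTO dbo.Customer (CustomerName)
VALUES ('Fabrikam');

-- Identity generated by the statement above, in the current scope
SELECT SCOPE_IDENTITY() AS CustomerIdentity;

-- Last identity in the session, in any scope (could come from a trigger's insert)
SELECT @@IDENTITY AS LastIdentityInSession;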

Retrieving the Identities of a Multirow INSERT


Another complexity relates to multirow inserts. In this situation, you may want to retrieve the IDENTITY
column value for more than one row at a time. Typically, this would be implemented by the use of the
OUTPUT clause on the INSERT statement.
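
A sketch of capturing the identity values for a multirow insert with the OUTPUT clause (table and column names are assumptions):

OUTPUT Clause Sketch

INSERT INTO dbo.Customer (CustomerName)
OUTPUT inserted.CustomerID, inserted.CustomerName
VALUES ('Adventure Works'), ('Northwind Traders');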

IDENTITY (Property) (Transact-SQL)


http://aka.ms/ik8k5i

Working with Sequences

Sequences
Sequences are another way of creating values for
insertion into a column as sequential numbers.
However, unlike IDENTITY properties, sequences
are not tied to any specific table. This means that
you could use a single sequence to provide key
values for a group of tables.

Sequences can be cyclic. They can return to a low


value when a specified maximum value has been
exceeded.

In the example, a sequence called BookingID is


created in the Booking schema. This means that you can have the same name for different sequences
within different schemas. The sequence is defined as generating integer values. By default, sequences
generate bigint values.
Values from sequences are retrieved by using the NEXT VALUE FOR clause. In the example, the sequence
is being used to provide the default value for the FlightBookingID column in the Booking.FlightBooking
table.

Sequences are created by the CREATE SEQUENCE statement, modified by the ALTER SEQUENCE
statement, and deleted by the DROP SEQUENCE statement.

Other database engines provide sequence values, so the addition of sequence support in SQL Server 2012
and later versions can assist with migrating code to SQL Server from other database engines.
A range of sequence values can be retrieved in a single call via the sp_sequence_get_range system stored
procedure. There are also options to cache sets of sequence values to improve performance. However,
when a server failure occurs, the entire cached set of values is lost.
Values that are retrieved from the sequence are not available for reuse. This means that gaps can occur in
the set of sequence values.

The following code shows how to create and use a sequence object:

SEQUENCE
CREATE SEQUENCE Booking.BookingID AS INT
START WITH 20001
INCREMENT BY 10;
GO

CREATE TABLE Booking.FlightBooking
(
FlightBookingID INT NOT NULL PRIMARY KEY CLUSTERED DEFAULT
(NEXT VALUE FOR Booking.BookingID)
...

For more information about sequences, see Sequence Properties (General Page) in Microsoft Docs:

Sequence Properties (General Page)


http://aka.ms/dga6do
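
As a sketch, a range of values can be retrieved from the Booking.BookingID sequence shown above by using sp_sequence_get_range:

sp_sequence_get_range Sketch

DECLARE @first sql_variant, @last sql_variant;

EXEC sys.sp_sequence_get_range
@sequence_name = N'Booking.BookingID',
@range_size = 100,
@range_first_value = @first OUTPUT,
@range_last_value = @last OUTPUT;

SELECT @first AS FirstValue, @last AS LastValue;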

Demonstration: Sequences Demonstration


In this demonstration, you will see how to:

 Work with identity constraints, create a sequence, and use a sequence to provide key values for two
tables.

Demonstration Steps
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running and then log
on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. Run D:\Demofiles\Mod04\Setup.cmd as an administrator.

3. In the User Account Control dialog box, click Yes.

4. On the taskbar, click Microsoft SQL Server Management Studio.

5. In the Connect to Server dialog box, in the Server name box, type MIA-SQL and then click
Connect.

6. On the File menu, point to Open, click Project/Solution.

7. In the Open Project dialog box, navigate to D:\Demofiles\Mod04, click Demo04.ssmssln, and then
click Open.

8. In Solution Explorer, double-click the 31 - Demonstration 3A.sql script file.



9. Highlight the code below Step 1: Open a new query window to the tempdb database, and click
Execute.
10. Highlight the code below Step 2: Create the dbo.Opportunity table, and click Execute.

11. Highlight the code below Step 3: Populate the table with two rows, and click Execute.

12. Highlight the code below Step 4: Check the identity values added, and click Execute.

13. Highlight the code below Step 5: Try to insert a specific value for OpportunityID, and click
Execute. Note the error.

14. Highlight the code below Step 6: Add a row without a value for LikelyClosingDate, and click
Execute.

15. Highlight the code below Step 7: Query the table to see the value in the LikelyClosingDate
column, and click Execute.
16. Highlight the code below Step 8: Create 3 Tables with separate identity columns, and click
Execute.

17. Highlight the code below Step 9: Insert some rows into each table, and click Execute.
18. Highlight the code below Step 10: Query the 3 tables in a single view and note the overlapping
ID values, then click Execute.

19. Highlight the code below Step 11: Drop the tables, and click Execute.

20. Highlight the code below Step 12: Create a sequence to use with all 3 tables, and click Execute.

21. Highlight the code below Step 13: Recreate the tables using the sequence for default values, and
click Execute.

22. Highlight the code below Step 14: Reinsert the same data, and click Execute.

23. Highlight the code below Step 15: Note the values now appearing in the view, and click Execute.
24. Highlight the code below Step 16: Note that sequence values can be created on the fly, and click
Execute.

25. Highlight the code below Step 17: Re-execute the same code and note the sequence values, and
click Execute.

26. Highlight the code below Step 18: Note that when the same entry is used multiple times in a
SELECT statement, the same value is used, and click Execute.

27. Highlight the code below Step 19: Fetch a range of sequence values, and click Execute.

28. Close SQL Server Management Studio, without saving any changes.

FOREIGN KEY Constraints


The FOREIGN KEY constraint was introduced in
Module 2. Foreign keys are used to link two tables,
and create a relationship between them. As an
example, a relationship could ensure that a
customer exists for any order that is placed in
the orders table. The relationship is created
when you specify a foreign key in the orders table
referring to the primary key in the customers
table.

The column that the foreign key references must


be defined either as a primary key in the linked
table, or the column must have a unique
constraint. You cannot create a link to a NULL value, so the unique constraint must have a NOT NULL
constraint.
In Module 2, you also saw that the target table can be the same table, known as a self-referencing
relationship.

As with other types of constraints, even though a name is not required when defining a FOREIGN KEY
constraint, you should provide a name rather than leaving SQL Server to do so.
Defining a foreign key constraint.

Foreign Key Constraint


CREATE TABLE Sales.Opportunity
(
OpportunityID int NOT NULL CONSTRAINT PK_Opportunity PRIMARY KEY,
Requirements nvarchar(50) NOT NULL,
SalespersonID int NULL
CONSTRAINT FK_Opportunity_Salesperson
FOREIGN KEY REFERENCES Sales.Salesperson(SalespersonID)
);

WITH NOCHECK Option


When you add a FOREIGN KEY constraint to a column (or columns) in a table, SQL Server will check the
data that is already in the column to make sure that the reference to the target table is valid. However, if
you specify WITH NOCHECK, SQL Server does not apply the check to existing rows and will only check the
reference in future when rows are inserted or updated. The WITH NOCHECK option can be applied to
other types of constraints, too.
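
A sketch of applying WITH NOCHECK when adding the foreign key from the earlier example (assuming the constraint was not already created inline):

WITH NOCHECK Sketch

ALTER TABLE Sales.Opportunity WITH NOCHECK
ADD CONSTRAINT FK_Opportunity_Salesperson
FOREIGN KEY (SalespersonID) REFERENCES Sales.Salesperson (SalespersonID);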

Permission Requirement
Before you can place a FOREIGN KEY constraint on a table, you must have CREATE TABLE and ALTER
TABLE permissions.

Requiring the REFERENCES permission on the target table avoids the situation where another user could place a
reference to one of your tables, leaving you unable to drop or substantially change your own table until
the other user removed that reference. However, in terms of security, remember that providing
REFERENCES permission to a user on a table for which they do not have SELECT permission does not
totally prevent them from working out what the data in the table is. This might be done by a brute force
attempt that involves trying all possible values.

Note: Changes to the structure of the referenced column are limited while it is referenced
in a FOREIGN KEY. For example, you cannot change the size of the column when the relationship
is in place.

Note: The WITH NOCHECK option applies to FOREIGN KEY constraints, in addition to other constraints
defined on the table. It prevents the constraint from checking data that is already present. This
is useful if a constraint should be applied to all new records, but existing data does not have to
meet the criteria.

For more information about defining foreign keys, see Create Foreign Key Relationships in Microsoft Docs:

Create Foreign Key Relationships


http://aka.ms/l2m22c

Cascading Referential Integrity


The FOREIGN KEY constraint includes a facility to
propagate any change to a column value that is
part of a UNIQUE or PRIMARY KEY constraint to
the foreign key values that reference it. This
action is referred to as cascading referential
integrity.

By using cascading referential integrity, you can


define the actions that SQL Server takes when a
user tries to update or delete a key column (or
columns) to which a FOREIGN KEY constraint
makes reference.
The action to be taken is separately defined for
UPDATE and DELETE actions, and can have one of four values (a sketch follows the list):
1. NO ACTION is the default. For example, if you attempt to delete a customer and there are orders for
the customer, the deletion will fail.

2. CASCADE makes the required changes to the referencing tables. If the customer is being deleted, his
or her orders will also be deleted. If the customer primary key is being updated (although note that
this is undesirable), the customer key in the orders table will also be updated so that the orders still
refer to the correct customer.

3. SET DEFAULT causes the values in the columns in the referencing table to be set to their default
values. This provides more control than the SET NULL option, which always sets the values to NULL.

4. SET NULL causes the values in the columns in the referencing table to be nullified. For the customer
and orders example, this means that the orders would still exist, but they would not refer to any
customer.
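
A minimal sketch of a cascading delete for the customer and orders example follows; the table and column names are assumptions:

Cascading Delete Sketch

CREATE TABLE dbo.CustomerOrder
(
OrderID int NOT NULL PRIMARY KEY,
CustomerID int NOT NULL
CONSTRAINT FK_CustomerOrder_Customer
FOREIGN KEY REFERENCES dbo.Customer (CustomerID)
ON DELETE CASCADE
);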

Caution
Although cascading referential integrity is easy to set up, you should be careful when using it within
database designs.

For example, if you used the CASCADE option in the example above, would it really be okay for a
customer's orders to be deleted when you remove the customer? There might also be other tables that
reference the orders table (such as order details or invoices), and these rows would be removed too if
they had a cascade relationship set up.

Primary and Foreign Key Constraints


http://aka.ms/o5f11a

Considerations for Constraint Checking


There are a few considerations when you are
working with constraints.

Naming
Specify meaningful names for constraints rather
than leaving SQL Server to select a name. SQL
Server provides complicated system-generated
names. Often, you have to refer to constraints by
name. Therefore, it is better to have chosen them
yourself using a consistent naming convention.

Changing Constraints
You can add or drop constraints without
having to drop and recreate the underlying table.

You use the ALTER TABLE statement to add or drop constraints, and to enable or disable constraint
checking.

Error Checking in Applications


Even though you have specified constraints in your database layer, you may also want to check the same
logic in higher layers of code. Doing so will lead to more responsive systems because they will go through
fewer layers of code. It will also provide more meaningful errors to users because the code is closer to the
business-related logic that led to the errors. The challenge is in keeping the checks between different
layers consistent.

High-Performance Data Loading or Updates


When you are performing bulk loading or updates of data, you can often achieve better performance by
disabling CHECK and FOREIGN KEY constraints while performing the bulk operations and then re-
enabling them afterwards, rather than having them checked row by row during the bulk operation.
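
A sketch of this pattern, assuming a dbo.CustomerOrder table:

Disabling Constraints for Bulk Loads

ALTER TABLE dbo.CustomerOrder NOCHECK CONSTRAINT ALL;

-- ...perform the bulk load here...

-- Re-enable the constraints; WITH CHECK validates existing rows so the constraints are trusted again
ALTER TABLE dbo.CustomerOrder WITH CHECK CHECK CONSTRAINT ALL;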

Demonstration: Entity and Referential Integrity


In this demonstration, you will see how to:

 Define entity integrity for tables.

 Define referential integrity for tables.

 Define cascading actions to relax the default referential integrity constraint.

Demonstration Steps
Define entity integrity for a table, define referential integrity for a table, and define cascading
referential integrity actions for the constraint.

1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running, and then log
on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. Run D:\Demofiles\Mod04\Setup.cmd as an administrator.

3. In the User Account Control dialog box, click Yes.

4. On the taskbar, click Microsoft SQL Server Management Studio.


5. In the Connect to Server dialog box, in the Server name box, type MIA-SQL and then click
Connect.

6. On the File menu, point to Open, click Project/Solution.


7. In the Open Project dialog box, navigate to D:\Demofiles\Mod04, click Demo04.ssmssln, and then
click Open.

8. In Solution Explorer, double-click the 32 - Demonstration 3B.sql script file.


9. Highlight the code below Step 1: Open a new query window to tempdb, and click Execute.

10. Highlight the code below Step 2: Create the Customer and CustomerOrder tables, and click
Execute.
11. Highlight the code below Step 3: Select the list of customers, and click Execute.

12. Highlight the code below Step 4: Try to insert a CustomerOrder row for an invalid customer, and
click Execute. Note the error message.

13. Highlight the code below Step 5: Try to remove a customer that has an order, and click Execute.
Note the error message.

14. Highlight the code below Step 6: Replace it with a named constraint with cascade, and click
Execute.

15. Highlight the code below Step 7: Select the list of customer orders, try a delete again, and click
Execute.
16. Highlight the code below Step 8: Note how the cascade option caused the orders, and click
Execute.
17. Highlight the code below Step 9: Try to drop the referenced table and note the error, then click
Execute. Note the error message.

18. Close SQL Server Management Studio, without saving any changes.

Check Your Knowledge


Question

You want to set up a cascading referential integrity constraint between two tables that has the minimum
impact on queries that are only interested in current data rather than historic data. What would you use?

Select the correct answer.

ON DELETE CASCADE

ON DELETE RESTRICT

ON DELETE SET DEFAULT

ON DELETE SET NULL



Lab: Ensuring Data Integrity Through Constraints


Scenario
A table named Yield has recently been added to the Marketing schema within the database, but it has no
constraints in place. In this lab, you will implement the required constraints to ensure data integrity and, if
you have time, test that constraints work as specified.

Column Name           Data Type      Required  Validation Rule

OpportunityID         int            Yes       Part of the primary key

ProspectID            int            Yes       Part of the key—also, prospect must exist

DateRaised            datetime       Yes       Must be today’s date

Likelihood            bit            Yes

Rating                char(1)        Yes

EstimatedClosingDate  date           Yes

EstimatedRevenue      decimal(10,2)  Yes

Objectives
After completing this lab, you will be able to:

 Use the ALTER TABLE statement to adjust the constraints on existing tables.

 Create and test a DEFAULT constraint.


 Create and test a CHECK constraint.

 Create and test a UNIQUE constraint.

 Create and test a PRIMARY KEY constraint.

 Create and test a Referential Integrity FOREIGN KEY constraint.

 Create and test a CASCADING REFERENTIAL INTEGRITY constraint for a FOREIGN KEY and a PRIMARY
KEY.

Estimated Time: 30 Minutes

Virtual machine: 20762C-MIA-SQL

User name: ADVENTUREWORKS\Student

Password: Pa55w.rd

Exercise 1: Add Constraints


Scenario
You have been given the design for a table called DirectMarketing.Opportunity. You must alter the table
with the appropriate constraints, based upon the provided specifications.

The main tasks for this exercise are as follows:

1. Prepare the Lab Environment

2. Review Supporting Documentation

3. Alter the Direct Marketing Table

 Task 1: Prepare the Lab Environment


1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are both running, and then
log on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
2. On the taskbar, click File Explorer.

3. In File Explorer, navigate to the D:\Labfiles\Lab04\Starter folder, right-click the Setup.cmd file, and
then click Run as administrator.

4. In the User Account Control dialog box, click Yes, and then wait for the script to finish.

 Task 2: Review Supporting Documentation


 Review the table design requirements that were supplied in the scenario.

 Task 3: Alter the Direct Marketing Table


1. Work through the list of requirements and alter the table to make the columns required, based on
the requirements.

2. Work through the list of requirements and alter the table to make columns the primary key, based on
the requirements.

3. Work through the list of requirements and alter the table to make columns foreign keys, based on the
requirements.

4. Work through the list of requirements and alter the table to add DEFAULT constraints to columns,
based on the requirements (a sketch follows this list).
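
A hedged sketch of the kinds of ALTER TABLE statements involved follows; the constraint names and the referenced DirectMarketing.Prospect table are assumptions, and the lab answer key may differ:

ALTER TABLE Constraint Sketch

ALTER TABLE DirectMarketing.Opportunity
ADD CONSTRAINT PK_Opportunity PRIMARY KEY (OpportunityID, ProspectID);

ALTER TABLE DirectMarketing.Opportunity
ADD CONSTRAINT FK_Opportunity_Prospect
FOREIGN KEY (ProspectID) REFERENCES DirectMarketing.Prospect (ProspectID);

ALTER TABLE DirectMarketing.Opportunity
ADD CONSTRAINT DF_Opportunity_DateRaised DEFAULT (SYSDATETIME()) FOR DateRaised;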

Exercise 2: Test the Constraints


Scenario
You should now test each of the constraints that you designed to ensure that they work as expected.


The main tasks for this exercise are as follows:

1. Test the Data Types and Default Constraints

2. Test the Primary Key

3. Test to Ensure the Foreign Key is Working as Expected

 Task 1: Test the Data Types and Default Constraints


 Create a new query in the solution called ConstraintTesting.sql. Use this new connection to
AdventureWorks to insert a row into the Opportunity table, providing the following values in the
order of the columns as found within the table: [1,1,8,’A’,’12/12/2013’,123000.00].

 Task 2: Test the Primary Key


 Try to add the same row again to confirm that the primary key constraint is working to ensure entity
integrity—only unique rows can be added to the table.

 Task 3: Test to Ensure the Foreign Key is Working as Expected


 Try to add some data for a prospect that does not exist, to confirm that the foreign key constraint is
working to ensure referential integrity. Rows can only be added to the table when their foreign key
values already exist in the referenced Prospects table.

Results: After completing this exercise, you should have successfully tested your constraints.

Question: Why implement CHECK constraints if an application is already checking the input
data?

Question: What are some scenarios in which you might want to temporarily disable
constraint checking?

Verify the correctness of the statement by placing a mark in the column to the right.

Statement: True or false? A PRIMARY KEY and a UNIQUE constraint are doing the same thing using
different code words.

Answer:

Module Review and Takeaways


Best Practice: When you create a constraint on a column, if you do not specify a name for
the constraint, SQL Server will generate a unique name for it. However, you should always name
constraints to adhere to your naming conventions. This makes the constraints easier to identify
when they appear in error messages. They are also easier to alter because you don’t have to
remember any arbitrary numbers that SQL Server uses to make the names unique when it
generates constraint names automatically.

Review Question(s)
Question: Do you still need CHECK constraints if an application is already
checking the input data?

Question: What are some scenarios in which you might want to temporarily disable
constraint checking?
