Module 1
An Introduction to Database Development
Contents:
Module Overview
Lesson 1: Introduction to the SQL Server Platform
Module Overview
Before beginning to work with Microsoft® SQL Server® in either a development or an administration
role, it is important to understand the scope of the SQL Server platform. In particular, it is useful to
understand that SQL Server is not just a database engine—it is a complete platform for managing
enterprise data.
SQL Server provides a strong data platform for organizations of all sizes, in addition to a comprehensive
set of tools to make development easier and more robust.
Objectives
After completing this module, you will be able to:
Lesson 1
Introduction to the SQL Server Platform
Microsoft SQL Server data management software is a platform for developing business applications that
are data focused. Rather than being a single, monolithic application, SQL Server is structured as a series of
components. It is important to understand the use of each component.
You can install more than one copy of SQL Server on a server. Each copy is called an instance and you can
configure and manage them separately.
There are various editions of SQL Server, and each one has a different set of capabilities. It is important to
understand the target business cases for each, and how SQL Server has evolved through a series of
improving versions over many years. It is a stable and robust data management platform.
Lesson Objectives
After completing this lesson, you will be able to:
Describe the overall SQL Server platform.
Explain the role of each of the components that make up the SQL Server platform.
Database Engine
The storage engine manages access to data stored in the database, including how the data is physically
stored on disk, backups and restores, indexes, and more.
The query processor ensures that queries are formatted correctly; it plans how best to execute a query,
and then executes the query.
SQL Server is an integrated and enterprise-ready platform for data management that offers a low total
cost of ownership.
Enterprise Ready
SQL Server provides a very secure, robust, and stable relational database management system, but there is
much more to it than that. You can use SQL Server to manage organizational data and provide both
analysis of, and insights into, that data. Microsoft also provides other enterprise development
environments—for example, Visual Studio®—that have excellent integration and support for SQL Server.
The SQL Server Database Engine is one of the highest performing database engines available and
regularly features in the top tier of industry performance benchmarks. You can review industry
benchmarks and scores on the Transaction Processing Performance Council (TPC) website.
High Availability
Impressive performance is necessary, but not at the cost of availability. Many enterprises are now finding
it necessary to provide access to their data 24 hours a day, seven days a week. The SQL Server platform
was designed with the highest levels of availability in mind. As each version of the product has been
released, more capabilities have been added to minimize any potential downtime.
Security
Uppermost in the minds of enterprise managers is the requirement to secure organizational data. It is not
possible to retrofit security after an application or product has been created. From the outset, SQL Server
has been built with the highest levels of security as a goal. SQL Server includes encryption features such as
Always Encrypted, designed to protect sensitive data such as credit card numbers, and other personal
information.
Scalability
Organizations require data management capabilities for systems of all sizes. SQL Server scales from the
smallest needs, running on a single desktop, to the largest—a high-availability server farm—via a series of
editions that have increasing capabilities.
Cost of Ownership
Many competing database management systems are expensive both to purchase and to maintain. SQL
Server offers very low total cost of ownership. SQL Server tooling (both management and development)
builds on existing Windows® knowledge. Most users tend to quickly become familiar with the tools. The
productivity users can achieve when they use the various tools is enhanced by the high degree of
integration between them. For example, many of the SQL Server tools have links to launch and
preconfigure other SQL Server tools.
The server components are as follows:

SQL Server Database Engine: A relational database engine based on Structured Query Language (SQL). It is the core service for storing, processing, and securing data, and provides replication, full-text search, and tools for managing relational and XML data.

Analysis Services (SSAS): An online analytical processing (OLAP) engine that works with analytic cubes and supports data mining applications.

Reporting Services (SSRS): A reporting engine based on web services that provides a web portal and end-user reporting tools. It is also an extensible platform that you can use to develop report applications.

Integration Services (SSIS): Used to orchestrate the movement of data between SQL Server components and other external systems. Traditionally used for extract, transform, and load (ETL) operations.

Master Data Services (MDS): Provides tooling for managing master or reference data, and includes hierarchies, granular security, transactions, data versioning, and business rules, as well as an add-in for Excel® for managing data.

Data Quality Services (DQS): With DQS, you can build a knowledge base and use it to perform data quality tasks, including correction, enrichment, standardization, and de-duplication of data.

Replication: A set of technologies for copying and distributing data and database objects between multiple databases.
Alongside these server components, SQL Server provides the following management tools:
SQL Server Configuration Manager: Provides basic configuration management of services, client and server protocols, and client aliases.

SQL Server Profiler: A graphical user interface to monitor and assist in managing the performance of Database Engine and Analysis Services components.

Database Engine Tuning Advisor: Provides guidance on, and helps to create, optimal sets of indexes, indexed views, and partitions.

Data Quality Services Client: A graphical user interface that connects to a DQS server, and then provides data cleansing operations and monitoring of their performance.

SQL Server Data Tools (SSDT): An integrated development environment for developing business intelligence (BI) solutions utilizing SSAS, SSRS, and SSIS.
Multiple Instances
Applications may require SQL Server configurations that are inconsistent or incompatible with the
server requirements of other applications. Each instance of SQL Server can be configured
independently.
You might want to support application databases with different levels of service, particularly in
relation to availability. To meet different service level agreements (SLAs), you can create SQL Server
instances to separate workloads.
You can install different versions of SQL Server side by side, using multiple instances. This can assist when
testing upgrade scenarios or performing upgrades.
Default and Named Instances
One instance can be the default instance on a database server; this instance has no name. Connection
requests that are sent to a computer without specifying an instance name are connected to the default
instance. There is no requirement to have a default instance, because you can name every instance.
All other instances of SQL Server require an instance name, in addition to the server name, and are known
as “named” instances. Not all components of SQL Server can be installed in more than one instance. A
substantial change in SQL Server 2012 was the introduction of multiple instance support for SQL Server
Integration Services (SSIS).
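To see which instance a connection has reached, you can query the server properties. The following is a minimal sketch; SERVERPROPERTY('InstanceName') returns NULL when you are connected to the default instance:

Checking the Instance Name
SELECT @@SERVERNAME AS ServerAndInstance,
       SERVERPROPERTY('InstanceName') AS InstanceName;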
You do not have to install SSDT more than once. A single installation of the tools can manage and
configure all installed instances.
Developer: Includes all the capabilities of the Enterprise edition, licensed for use in development and test environments. It cannot be used as a production server.

Express: A free database edition that supports learning and building small desktop data-driven applications.

Azure® SQL Database: Helps you to build database applications on a scalable and robust cloud platform. This is the Azure version of SQL Server.
Early Versions
The earliest versions of SQL Server (1.0 and 1.1) were based on the OS/2 operating system.
Later Versions
Version 7.0 saw a significant rewrite of the product. Substantial advances were made to reduce the
administration workload for the product. OLAP Services, which later became Analysis Services, was
introduced.
SQL Server 2000 featured support for multiple instances and collations. It also introduced support for
data mining. SSRS was introduced after the product release as an add-on enhancement, along with
support for 64-bit processors.
SQL Server 2005 provided support for non-relational data that was stored and queried as XML, and SSMS
was released to replace several previous administrative tools. SSIS replaced a tool formerly known as Data
Transformation Services (DTS). Dynamic management views (DMVs) and functions were introduced to
provide detailed health monitoring, performance tuning, and troubleshooting. Also, substantial high-
availability improvements were included in the product, and database mirroring was introduced.
SQL Server 2008 provided AlwaysOn technologies to reduce potential downtime. Database compression
and encryption technologies were added. Specialized date-related and time-related data types were
introduced, including support for time zones within date/time data. Full-text indexing was integrated
directly within the database engine. (Previously, full-text indexing was based on interfaces to services at
the operating system level.) Additionally, a Windows PowerShell® provider for SQL Server was introduced.
SQL Server 2008 R2 added substantial enhancements to SSRS—the introduction of advanced analytic
capabilities with PowerPivot; support for managing reference data with the introduction of Master Data
Services; and the introduction of StreamInsight, with which users could query data that was arriving at
high speed, before storing the data in a database.
SQL Server 2012 introduced tabular data models into SSAS. New features included: FileTable, an
enhancement of FILESTREAM; Semantic Search, with which users could extract statistically relevant words;
and the ability to migrate BI projects into Microsoft Visual Studio 2010.
SQL Server 2014 included substantial performance gains from the introduction of in-memory tables and
native stored procedures. It also increased integration with Microsoft Azure.
SQL Server 2016 was a major release and added three important security features: Always Encrypted,
dynamic data masking, and row-level security. This version also included stretch database to archive data
in Microsoft Azure, Query Store to maintain a history of execution plans, PolyBase to connect to Hadoop
data, temporal tables, and support for R, plus in-memory enhancements and columnstore indexes.
Current Version
SQL Server 2017 includes many fixes and enhancements, including:
SQL Graph. Enables many-to-many relationships to be modelled more easily. Extensions to Transact-
SQL include new syntax to create tables as NODE or EDGE tables, and the MATCH keyword for
querying (see the sketch at the end of this list).
Adaptive query processing. A family of features that help queries to run more efficiently: batch
mode adaptive joins, batch mode memory grant feedback, and interleaved execution for
multi-statement table-valued functions.
Automatic database tuning. Allows query performance problems either to be fixed automatically, or
to be surfaced as insights so that fixes can be applied.
SQL Server Analysis Services. Includes several enhancements for tabular models.
Machine Learning. R Services is now known as SQL Server Machine Learning Services, and includes
support for Python as well as R.
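The following is a minimal sketch of the SQL Graph syntax, using hypothetical table names; it is illustrative rather than part of the course files:

SQL Graph
CREATE TABLE dbo.Person
(
    PersonID int NOT NULL PRIMARY KEY,
    PersonName nvarchar(50) NOT NULL
) AS NODE;

CREATE TABLE dbo.IsFriendOf AS EDGE;

-- MATCH traverses the graph: find everyone that Alice is friends with.
SELECT Friend.PersonName
FROM dbo.Person AS Person, dbo.IsFriendOf, dbo.Person AS Friend
WHERE MATCH(Person-(IsFriendOf)->Friend)
  AND Person.PersonName = N'Alice';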
Compatibility
Businesses can run databases at different compatibility levels on a single instance of SQL Server. Each
version of SQL Server can build and maintain databases created on previous versions of SQL Server. For
example, SQL Server 2016 can read and create databases at compatibility level 100; that is, databases
created on SQL Server 2008. The compatibility level specifies the supported features of the database. For
more information on compatibility levels, see Microsoft Docs:
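To check or change the compatibility level of a database, you can use code like the following sketch; the database name SalesDB is hypothetical:

Compatibility Level
-- List the compatibility level of every database on the instance.
SELECT name, compatibility_level
FROM sys.databases;

-- Run the hypothetical SalesDB database with SQL Server 2008 behavior.
ALTER DATABASE SalesDB
SET COMPATIBILITY_LEVEL = 100;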
Categorize Activity
Place each item into the appropriate category. Indicate your answer by writing the category number to
the right of each item.
Items
1 Database Engine
3 Enterprise
5 Connectivity
6 Developer
7 Replication
8 Profiler
9 Web
10 Integration Services
12 Standard
Lesson 2
SQL Server Database Development Tasks
Microsoft provides numerous tools and integrated development environments to support database
developers. This lesson investigates some of the common tasks undertaken by developers, how SQL Server
supports those tasks, and which tools you can use to complete them.
Lesson Objectives
After completing this lesson, you will be able to:
Indexes can also be added to tables to ensure good performance when querying the data. As the volume
of data grows, maintaining good performance becomes increasingly important.
Processing data programmatically
SQL Server encapsulates business logic through the use of stored procedures and functions. Rather than
build business logic into multiple applications, client applications can call stored procedures and functions
to perform data operations. This centralizes business logic, and makes it easier to maintain.
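The following is a minimal sketch of this pattern, using hypothetical schema, table, and column names; client applications would call the procedure instead of issuing the INSERT themselves:

Stored Procedure
CREATE PROCEDURE Sales.AddOrder
    @CustomerID int,
    @OrderDate date
AS
BEGIN
    SET NOCOUNT ON;
    -- Business logic lives here, in one place, rather than in every client.
    INSERT INTO Sales.CustomerOrders (CustomerID, OrderDate)
    VALUES (@CustomerID, @OrderDate);
END;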
SSIS supports more complex data processing in the form of extraction and transformation.
Data quality is maintained by placing constraints on columns in a table. For example, you can specify a
data type to restrict the type of data that can be stored. This constrains the column to only holding, for
example, integers, date and time, or character data types. Columns can be further constrained by
specifying the length of the data type, or whether or not it can be left empty (null).
Primary keys and unique keys ensure a value is unique amongst other rows in the table. You can also link
tables together by creating foreign keys.
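The following sketch, with hypothetical names, brings these data quality constraints together—data types, lengths, nullability, a primary key, and a foreign key:

Data Quality Constraints
CREATE TABLE Sales.Customers
(
    CustomerID int NOT NULL PRIMARY KEY,        -- unique, mandatory identifier
    CustomerName nvarchar(50) NOT NULL,         -- constrained type and length
    CreditLimit decimal(18,2) NULL              -- optional value
);

CREATE TABLE Sales.CustomerOrders
(
    OrderID int NOT NULL PRIMARY KEY,
    CustomerID int NOT NULL
        REFERENCES Sales.Customers (CustomerID) -- links the tables
);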
If a database is poorly designed, without using any of these data quality constraints, a developer might
have to inspect and resolve data issues—removing duplicate data or performing some kind of data
cleansing.
Securing data
These tasks are usually performed by a database administrator. However, developers often provide
guidance about who needs access to what data; the business requirements around availability and
scheduling of backups; and what form of encryption is required.
SSMS can connect to a variety of SQL Server services, including the Database Engine, Analysis Services,
Integration Services, and Reporting Services. SSMS uses the Visual Studio environment and will be familiar
to Visual Studio developers.
SSDT brings SQL Server functionality into Visual Studio. With SSDT, you can develop both on-premises
and cloud-based applications that use SQL Server components. You can work with .NET
Framework code and database-specific code, such as Transact-SQL, in the same environment. If you want
to change the database design, you do not have to leave Visual Studio and open SSMS; you can work with
the schema within SSDT.
SQL Operations Studio is a free, lightweight administration tool for SQL Server that runs on Windows,
Linux, and macOS. SQL Operations Studio offers many of the same features as SQL Server Management
Studio, and includes new features such as customizable dashboards that you can use to get an overview
of server performance. At the time of writing, SQL Operations Studio is in public preview, and features are
subject to change. You can download SQL Operations Studio from Microsoft.
SQL Server Profiler is a graphical user interface tool that is used to view the output of a SQL Trace. You
use SQL Server Profiler to monitor the performance of the Database Engine or Analysis Services by
capturing traces and saving them to a file or table. You can use Profiler to step through problem queries
to investigate the causes.
You can also use the saved trace to replicate the problem on a test server, making it easier to diagnose
the problem.
Profiler also supports auditing an instance of SQL Server. Audits record security-related actions so they
can be reviewed later.
Database Engine Tuning Advisor
The Database Engine Tuning Advisor analyzes databases and makes recommendations that you can use to
optimize performance. You can use it to select and create an optimal set of indexes, indexed views, or
table partitions. Common usage includes the following tasks:
Demonstration Steps
Use SSMS to Connect to an On-premises Instance of SQL Server
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running and then log
on to 20762C-MIA-SQL as AdventureWorks\Student with the password Pa55w.rd.
5. In the Connect to Server dialog box, ensure that Server type is set to Database Engine.
Note the use of IntelliSense while you are typing this query, and then on the toolbar, click Execute.
Note how the results can be returned.
5. In the Save File As dialog box, navigate to D:\Demofiles\Mod01, and then click Save. Note that this
saves the query to a file.
6. On the Results tab, right-click the cell for ProductID 1 (first row and first cell), and then click Save
Results As.
7. In the Save Grid Results dialog box, navigate to the D:\Demofiles\Mod01 folder.
8. In the File name box, type Demonstration2AResults, and then click Save. Note that this saves the
query results to a file.
9. On the Query menu, click Display Estimated Execution Plan. Note that SSMS can do more than just
execute queries.
10. On the Tools menu, click Options.
11. In the Options dialog box, expand Query Results, expand SQL Server, and then click General.
Review the available configuration options, and then click Cancel.
12. On the File menu, click Close.
3. In Solution Explorer, note the contents of Solution Explorer, and that by using a project or solution
you can save the state of the IDE. This means that any open connections, query windows, or Solution
Explorer panes will reopen in the state they were saved in.
1. In Object Explorer, click Connect and note the other SQL Server components to which connections
can be made.
2. On the File menu, point to New, and then click Database Engine Query to open a new connection.
3. In the Connect to Database Engine dialog box, in the Server name box, type MIA-SQL.
4. In the Authentication drop-down list, select Windows Authentication, and then click Connect.
5. In the Available Databases drop-down list on the toolbar, click tempdb. Note that this changes the
database against which the query is executed.
6. Right-click in the query window, point to Connection, and then click Change Connection. This will
reconnect the query to another instance of SQL Server.
7. In the Connect to Database Engine dialog box, click Cancel.
8. Close SSMS.
4. In the Add Connection dialog box, in the Server name box, type MIA-SQL.
5. In the Select or enter a database name drop-down list, click AdventureWorks, and then click OK.
6. In Server Explorer, expand Data Connections.
10. Note that you can view results, just as you can in SSMS.
Module 2
Designing and Implementing Tables
Contents:
Module Overview
Lesson 1: Designing Tables
Module Overview
In a relational database management system (RDBMS), user and system data is stored in tables. Each table
consists of a set of rows that describe entities and a set of columns that hold the attributes of an entity.
For example, a Customer table might have columns such as CustomerName and CreditLimit, and a row
for each customer. In Microsoft SQL Server® data management software, tables are contained within
schemas, which are similar in concept to the folders that contain files in an operating system. Designing
tables is one of the most important tasks that a database developer undertakes, because incorrect table
design leads to the inability to query the data efficiently.
After an appropriate design has been created, it is important to know how to correctly implement the
design.
Objectives
After completing this module, you will be able to:
Use schemas in your database designs to organize data, and manage object security.
Lesson 1
Designing Tables
The most important aspect of designing tables involves determining what data each column will hold. All
organizational data is held within database tables, so it is critical to store the data with an appropriate
structure.
The best practices for table and column design are often represented by a set of rules that are known as
“normalization” rules. In this lesson, you will learn the most important aspects of normalized table design,
along with the appropriate use of primary and foreign keys. In addition, you will learn to work with the
system tables that are supplied when SQL Server is installed.
Lesson Objectives
After completing this lesson, you will be able to:
What Is a Table?
Relational databases store data about entities in tables, and tables are defined by columns and rows.
Rows represent entities, and columns define the attributes of those entities. By default, the rows of a
table have no predefined order. Tables can also be used as a security boundary.
Tables
Columns define the information that is being held about each entity. For example, a Product table might
have columns such as ProductID, Size, Name, and UnitWeight. Each of these columns is defined by
using a specific data type. For example, the UnitWeight column of a product might be allocated a
decimal (18,3) data type.
Naming Conventions
There is strong disagreement within the industry over naming conventions for tables. The use of prefixes
(such as tblCustomer or tblProduct) is widely discouraged. Prefixes were commonly used in higher level
programming languages before the advent of strong data typing—that is, the use of strict data types
rather than generic data types—but are now rare. The main reason for this is that names should represent
the entities, not how they are stored. For example, during a maintenance operation, it might become
necessary to replace a table with a view, or vice versa. This could lead to views named tblProduct or
tblCustomer when trying to avoid breaking existing code.
Another area of strong disagreement relates to whether table names should be singular or plural. For
example, should a table that holds the details of a customer be called Customer or Customers?
Proponents of plural naming argue that the table holds the details of many customers; proponents of
singular naming say that it is common to expose these tables via object models in higher level languages,
and that the use of plural names complicates this process. Although we might have a Customers table, in
a high level language, we are likely to have a Customer object. SQL Server system tables and views have
plural names.
The argument is not likely to be resolved either way and is not a problem that is specific to the SQL
language. For example, an array of customers in a higher level language could sensibly be called
“Customers,” yet referring to a single customer via “Customers[49]” seems awkward. The most important
aspect of naming conventions is that you should adopt one that you can work with and apply
consistently.
Security
You can use tables as security boundaries because users can be assigned permissions at the table level.
However, note that SQL Server supports the assignment of permissions at the column level, in addition to
the table level; row-level security is available for tables in SQL Server. Row, column, and table security can
also be implemented by using a combination of views, stored procedures, and/or triggers.
Row Order
Tables are containers for rows but, by default, they do not define any order for the rows that they contain.
When users select rows from a table, they should only specify the order that the rows should be returned
in if the output order matters. SQL Server might have to expend additional sorting effort to return rows in
a given order, and it is important that this effort is only made when necessary.
Normalizing Data
Normalization is a systematic process that you can use to improve the design of databases.
Normalization
Codd introduced first normal form in 1970, followed by second normal form, and then third normal form
in 1971. Since that time, higher forms of normalization have been introduced by theorists, but most
database designs today are considered to be “normalized” if they are in third normal form.
Intentional Denormalization
Not all databases should be normalized. It is common to intentionally denormalize databases for
performance reasons or for ease of end-user analysis.
For example, dimensional models that are widely used in data warehouses (such as the data warehouses
that are commonly used with SQL Server Analysis Services) are intentionally designed not to be
normalized.
Tables might also be denormalized to avoid the need for time-consuming calculations or to minimize
physical database design constraints, such as locking.
Normalization
Although there is disagreement on the interpretation of these rules, there is general agreement on the
most common symptoms of violating them.
To adhere to the first normal form, you must eliminate repeating groups in individual tables. To do this,
you should create a separate table for each set of related data, and identify each set of related data by
using a primary key.
For example, a Product table should not include columns such as Supplier1, Supplier2, and Supplier3.
Column values should not include repeating groups. For example, a column should not contain a comma-
separated list of suppliers.
Duplicate rows should not exist in tables. You can use unique keys to avoid having duplicate rows. A
candidate key is a column or set of columns that you can use to uniquely identify a row in a table. An
alternate interpretation of first normal form rules would disallow the use of nullable columns.
For example, a second normal form error would be to hold the details of products that a supplier provides
in the same table as the details of the supplier's credit history. You should store these values in a separate
table.
To adhere to third normal form, eliminate fields that do not depend on the key.
Imagine a Sales table that has OrderNumber, ProductID, ProductName, SalesAmount, and SalesDate
columns. This table is not in third normal form. A candidate key for the table might be the OrderNumber
column. However, the ProductName column only depends on the ProductID column and not on the
candidate key, so the Sales table should be separated from a Products table, and probably linked to it by
ProductID.
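A minimal sketch of that separation, using hypothetical names—ProductName moves to a Products table because it depends only on ProductID:

Third Normal Form
CREATE TABLE Products
(
    ProductID int NOT NULL PRIMARY KEY,
    ProductName nvarchar(50) NOT NULL
);

CREATE TABLE Sales
(
    OrderNumber int NOT NULL PRIMARY KEY,
    ProductID int NOT NULL REFERENCES Products (ProductID),
    SalesAmount decimal(18,2) NOT NULL,
    SalesDate date NOT NULL
);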
Formal database terminology is precise, but can be hard to follow when it is first encountered. In the next
demonstration, you will see examples of common normalization errors.
Demonstration Steps
1. Ensure that the MT17B-WS2016-NAT, 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are
running, and then log on to 20762C-MIA-SQL as AdventureWorks\Student with the password
Pa55w.rd.
2. In File Explorer, navigate to D:\Demofiles\Mod02, right-click Setup.cmd, and then click Run as
administrator.
5. In the Connect to Server dialog box, connect to MIA-SQL, using Windows Authentication.
9. Select the code under the Step 1: Set AdventureWorks as the current database comment, and
then click Execute.
10. Select the code under the Step 2: Create a table for denormalizing comment, and then click
Execute.
11. Select the code under the Step 3: Alter the table to conform to third normal form comment, and
then click Execute.
12. Select the code under the Step 4: Drop and recreate the ProductList table comment, and then
click Execute.
13. Select the code under the Step 5: Populate the ProductList table comment, and then click Execute.
14. Close SQL Server Management Studio without saving any changes.
Primary Keys
A primary key is a form of constraint that uniquely identifies each row within a table. A candidate key is a
column or set of columns that you could use to identify a row uniquely—it is a candidate to be chosen as
the primary key. A primary key must be unique and cannot be NULL.
A surrogate key is an artificial key with no business meaning. For example, a Customer table might have
a CustomerID or CustomerCode column that contains numeric, GUID, or alphanumeric codes. The
surrogate key would not be related to the other attributes of a customer.
The use of surrogate keys is another subject that can lead to strong debate between database
professionals.
Foreign Keys
A foreign key is a key in one table that matches, or references, a unique key in another table. Foreign keys
are used to create a relationship between tables. A foreign key is a constraint because it limits the data
that can be held in the field to a value that matches a field in the related table.
For example, a CustomerOrders table might include a CustomerID column. A foreign key reference is
used to ensure that any CustomerID value that is entered in the CustomerOrders table does in fact exist
in the Customers table.
In SQL Server, the reference is only checked if the column that holds the foreign key value is not NULL.
Self-Referencing Tables
A table can hold a foreign key reference to itself. For example, an Employees table might contain a
ManagerID column. An employee's manager is also an employee; therefore, a foreign key reference can
be made from the ManagerID column of the Employees table to the EmployeeID column in the same
table.
Reference Checking
SQL Server cannot update or delete referenced keys unless you enable options that cascade the changes
to related tables. For example, you cannot change the ID for a customer when there are orders in a
CustomerOrders table that reference the customer's ID.
Tables might also include multiple foreign key references. For example, an Orders table might have
foreign keys that refer to both a Customers table and a Products table.
Terminology
Foreign keys are used to “enforce referential integrity.” Foreign keys are a form of constraint and will be
covered in more detail in a later module.
The ANSI SQL 2003 definition refers to self-referencing tables as having “recursive foreign keys.”
In SQL Server 2005, system tables were hidden and replaced by a set of system views that show the
contents of the system tables. These views are permission-based and display data to a user only if the user
has appropriate permission.
The way in which your database is used will determine the best design for concurrency. You may need to
monitor your database over time to determine if the design is sufficient, and make alterations if locking
becomes a frequent problem. Your goal is to ensure transactions are as small and as fast as possible, and
less likely to block other transactions.
The higher the number of users in your database, the more locking you will have, because they are more
likely to simultaneously access the same row, table, and pages. The more locking you have, the lower the
performance of the system, because one user must wait for another user to finish their transaction, and
the application may temporarily freeze. You may also find that there are certain times of the day when
things slow down, such as afternoons, when staff return to the office after lunch. One option is to change
the transaction isolation level; however, this creates other potential problems, with logical errors in your
data. The better solution is to use normalization to separate the data as much as possible. While this
creates extra joins in your queries, it does help with concurrency.
If you have a table with 25 columns, and three users attempt to modify three different columns in the
same row, SQL Server takes a row lock for the first user, and the other two must wait for the first user to
complete. The third user must wait for the first two to complete. By splitting the table into three tables,
and separating the columns that are likely to be modified, each user will be modifying a different table,
without blocking the other users.
In a data warehouse, you won't have lots of users making modifications, so locking won't be an issue.
Users will only read data from the data warehouse, so it is safe to denormalize the data, because one table
does not need to describe one entity.
IDENTITY
In the following example, the first row in the Categories table will have a CategoryID value of 1. The next
row will have a value of 2:
IDENTITY Property
CREATE TABLE Categories
(
CategoryID int IDENTITY(1,1),
Category varchar(25) NOT NULL
);
The data type of the column is int, so the starting seed can be any value that an int can store, but it is
common practice to start at 1 and increment by 1.
When you insert a record into a table that has an identity column and the transaction fails and is rolled
back, the value that the row would have been assigned is not reused for the next successful insert; this
leaves a gap in the sequence.
IDENTITY_INSERT
You can insert an explicit value into an identity column by using SET IDENTITY_INSERT ON. The value
cannot be one that has already been used; however, you can use a value from a failed transaction that
was never used, keeping the number sequence intact. You can also reuse the value from a row that has
been deleted. Only one table in a session can have IDENTITY_INSERT set to ON at a time, so be sure to
include SET IDENTITY_INSERT OFF after the insert statement has completed.
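A minimal sketch, reusing the Categories table from the earlier example; the value 5 is assumed to be unused:

IDENTITY_INSERT
SET IDENTITY_INSERT Categories ON;

-- An explicit column list is required while IDENTITY_INSERT is ON.
INSERT INTO Categories (CategoryID, Category)
VALUES (5, 'Beverages');

SET IDENTITY_INSERT Categories OFF;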
Use TRUNCATE TABLE instead of DELETE FROM, though keep in mind that truncate is only
minimally logged in the transaction log:
Run the DBCC CHECKIDENT command to reseed the table. The reseed value should be 0 to set the
first row back to 1:
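The following sketch shows both approaches against the Categories table; treat them as alternatives rather than one script:

Resetting an Identity Column
-- Alternative 1: TRUNCATE removes all rows and resets the identity to its seed.
TRUNCATE TABLE Categories;

-- Alternative 2: after DELETE FROM, reseed to 0 so the next insert receives 1.
-- DELETE FROM Categories;
-- DBCC CHECKIDENT ('Categories', RESEED, 0);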
@@IDENTITY and SCOPE_IDENTITY() are very similar in function, but with a subtle difference. Both
return the identity value of the last inserted record. However, @@IDENTITY returns the last insert,
regardless of session, whereas SCOPE_IDENTITY() will return the value from the current session. If you
insert a row and need the identity value, use SCOPE_IDENTITY()—if you use @@IDENTITY after an
insert, and another session also inserts a new row, you might pick up that value, rather than your row
value.
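A sketch of the kind of query being described, which would produce output like the sample that follows (the category value is hypothetical):

SCOPE_IDENTITY
INSERT INTO Categories (Category)
VALUES ('Pet Supplies');

-- Returns the identity value generated by this session and scope.
SELECT SCOPE_IDENTITY() AS CategoryID;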
CategoryID
----------------
6
SEQUENCE
The SEQUENCE object performs a similar function to IDENTITY, but has a lot more flexibility. You create a
SEQUENCE object at the database level, and values can be used by multiple tables. Whereas IDENTITY
creates a new value with each row insert, SEQUENCE returns a value when requested. This value does not
need to be inserted into a table:
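A minimal sketch: create a sequence and request a value from it on demand (the sequence name matches the later example):

SEQUENCE
CREATE SEQUENCE dbo.CategoryID AS int
START WITH 1
INCREMENT BY 1;

-- Request the next value without inserting it anywhere.
SELECT NEXT VALUE FOR dbo.CategoryID AS NextCategoryID;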
SEQUENCE is useful when you want control over the values you are inserting. In the above example, the
Categories table uses IDENTITY to generate the CategoryID values. Consider if there were two tables in
your database, one for MainCategories, and another for SubCategories, but you wanted the
CategoryID to be unique across both tables—you could use SEQUENCE.
CategoryID SubCategory
----------- -----------------
3 Cat Food
4 Wine
The sequence can be limited by using the MINVALUE and MAXVALUE properties—when you reach the
MAXVALUE limit, you will receive an error.
Note: If you want to know the next value in the sequence, you can run the code: SELECT
NEXT VALUE FOR CategoryID. However, each time this runs, the sequence value will increment,
even if you don’t use this value in an insert statement.
Lesson 2
Data Types
The most basic types of data that get stored in database systems are numbers, dates, and strings. There is
a range of data types that can be used for each of these. In this lesson, you will see the Microsoft-supplied
data types that you can use for numeric and date-related data. You will also see what NULL means and
how to work with it. In the next lesson, you will see how to work with string data types.
Lesson Objectives
After completing this lesson, you will be able to:
Constraining Values
Data types are a form of constraint that is placed on the values that can be stored in a location. For
example, if you choose a numeric data type, you will not be able to store text.
In addition to constraining the types of values that can be stored, data types also constrain the range of
values that can be stored. For example, if you choose a smallint data type, you can only store values
between –32,768 and +32,767.
Query Optimization
When SQL Server identifies that the value in a column is an integer, it might be able to generate an
entirely different, more efficient query plan than one where it identifies that the location is holding text
values.
The data type also determines which sorts of operations are permitted on that data and how those
operations work.
Self-Documenting Nature
Choosing an appropriate data type provides a level of self-documentation. If all values were stored in a
string value (which could potentially represent any type of value) or XML data types, you would probably
need to store documentation about what sort of values can be stored in the string locations.
Data Types
There are three basic sets of data types:
System data types. SQL Server provides a large number of built-in (or intrinsic) data types. Examples
of these include integer, varchar, and date.
Alias data types. Users can also define data types that provide alternate names for the system data
types and, potentially, further constrain them. These are known as alias data types. For example, you
could use an alias data type to define the name PhoneNumber as being equivalent to nvarchar(16).
Alias data types can help to provide consistency of data type usage across applications and databases.
User-defined data types. By using managed code via SQL Server integration with the common
language runtime (CLR), you can create entirely new data types. There are two categories of these
CLR types. One category is system CLR data types, such as the geometry and geography spatial data
types. The other is user-defined CLR data types, which enable users to create their own data types.
smallint is stored in 2 bytes (that is, 16 bits) and stores values from –32,768 to 32,767.
int is stored in 4 bytes (that is, 32 bits) and stores values from –2,147,483,648 to 2,147,483,647. It is a
very commonly used data type. SQL Server uses the full word “integer” as a synonym for “int.”
bigint is stored in 8 bytes (that is, 64 bits) and stores very large integer values. Although it is easy to
refer to a 64-bit value, it is hard to comprehend how large these values are. If you placed a value of
zero in a 64-bit integer location, and executed a loop to add one to the value, on most common
servers currently available, you would not reach the maximum value for many months.
SQL Server provides a range of data types for storing exact numeric values that include decimal places:
decimal is an ANSI-compatible data type you use to specify the number of digits of precision and the
number of decimal places (referred to as the scale). A decimal(12,5) location can store up to 12
digits with up to five digits after the decimal point. You should use the decimal data type for
monetary or currency values in most systems, and any exact fractional values, such as sales quantities
(where part quantities can be sold) or weights.
numeric is a data type that is functionally equivalent to decimal.
money and smallmoney are data types that are specific to SQL Server and have been present since
the early days of the platform. They were used to store currency values with a fixed precision of four
decimal places.
Note: Four is often the wrong number of decimal places for many monetary applications,
and the money and smallmoney data types are not standard data types. In general, use decimal
for monetary values.
bit is a data type that is stored in a single bit. The storage of the bit data type is optimized. If there are
eight or fewer bit columns in a table, they are stored in a single byte. bit values are commonly used to
store the equivalent of Boolean values in higher level languages.
Note that there is no literal string format for bit values in SQL Server. The string values TRUE and FALSE
can be converted to bit values, as can the integer values 1 and 0. TRUE is converted to 1 and FALSE is
converted to 0.
Higher level programming languages differ on how they store true values in Boolean columns. Some
languages store true values as 1; others store true values as -1. To avoid any chance of a mismatch, when
working with bits in applications, test for false values by using:
IF (@InputValue = 0)
and test for true values by using:
IF (@InputValue <> 0)
This is preferable to testing for a value being equal to 1 because it provides more reliable code.
bit, along with other data types, is also nullable, which can be a surprise to new users. That means that a
bit location can be in three states: NULL, 0, or 1. (Nullability is discussed in more detail later in this
module.)
A very common error for new developers is to use approximate numeric data types to store values that
need to be stored exactly. This causes rounding and processing errors. A “code smell” for identifying
programs that new developers have written is a column of numbers that do not exactly add up to the
displayed totals. It is common for small rounding errors to creep into calculations; for example, a total that
is incorrect by 1 cent in dollar-based or euro-based currencies.
The inappropriate use of numeric data types can cause processing errors.
Look at the following code and decide how many times the PRINT statement would be executed:
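A reconstruction of the kind of loop being described, using a float counter:

Approximate Numerics
DECLARE @Value float = 0.0;

-- 0.1 has no exact binary representation, so @Value never equals exactly 1.0.
WHILE (@Value <> 1.0)
BEGIN
    PRINT @Value;
    SET @Value = @Value + 0.1;
END;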
In fact, this query would never stop running, and would need to be cancelled.
After cancelling the query, if you looked at the output, you would see the following code:
What has happened? The problem is that the value 0.1 cannot be stored exactly in a float or real data
type, so the termination value of the loop is never hit exactly. If a decimal value had been used instead,
the loop would have executed as expected.
Consider how you would write the answer to 1÷3 in decimal form. The answer isn't 0.3; it is 0.3333333
recurring. There is no way in decimal form to write 1÷3 as an exact decimal fraction; you eventually have
to settle for an approximate value.
The same problem occurs in binary fractions; it just occurs at different values—0.1 ends up being stored as
the equivalent of 0.099999 recurring. 0.1 in decimal form is a nonterminating fraction in binary. Therefore,
when you put the system in a loop adding 0.1 each time, the value never exactly equals 1.0, which can be
stored precisely.
The time data type is aligned to the SQL standard form of hh:mm:ss, with optional decimal places up to
hh:mm:ss.nnnnnnn. Note that when you are defining the data type, you need to specify the number of
decimal places, such as time(4), if you do not want to use the default value of seven decimal places, or if
you want to save some storage space. The format that SQL Server uses is similar to the ISO 8601 definition
for TIME.
The ISO 8601 standard makes it possible to use 24:00:00 to represent midnight and to have a leap second
over 59. These are not supported in the SQL Server implementation.
The datetime2 data type is a combination of a date data type and a time data type.
datetime Data Type
The datetime data type is an older data type that has a smaller range of allowed dates and a lower
precision or accuracy. It is a commonly used data type, particularly in older Transact-SQL code. A common
error is not allowing for the 3 milliseconds accuracy of the data type. For example, using the datetime
data type, executing the following code would cause the value '20110101 00:00:00.000' to be stored:
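A sketch of the behavior being described—a literal just below midnight rounds up, because datetime is only accurate to approximately 3 milliseconds:

datetime Rounding
SELECT CAST('20101231 23:59:59.999' AS datetime) AS RoundedValue;
-- Returns: 2011-01-01 00:00:00.000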
Another problem with the datetime data type is that the way it converts strings to dates is based on
language format settings. A value in the form “YYYYMMDD” will always be converted to the correct date,
but a value in the form “YYYY-MM-DD” might end up being interpreted as “YYYY-DD-MM,” depending
on the settings for the session.
It is important to understand that this behavior does not happen with the new date data type, so a string
that was in the form “YYYY-MM-DD” could be interpreted as two different dates by the date (and
datetime2) data type and the datetime data type. You should specifically check any of the formats that
you intend to use, or always use formats that cannot be misinterpreted. Another option that was
introduced in SQL Server 2012 can help. A series of functions that enable date and time values to be
created from component parts was introduced. For example, there is now a DATEFROMPARTS function
that you can use to create a date value from a year, a month, and a day.
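For example, the following call always produces June 1, 2016, regardless of language or date format settings:

DATEFROMPARTS
SELECT DATEFROMPARTS(2016, 6, 1) AS UnambiguousDate;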
Note: Be careful when working with string literal representations of dates and time, as they
can be interpreted in different ways depending on the location. For example, 01/06/2016 might
be June 1, 2016, or the January 6, 2016.
You can solve this ambiguity by expressing dates and times according to the ISO standard, where
dates are represented as YYYY-MM-DD. In the previous example, this would be 2016-06-01.
Time Zones
The datetimeoffset data type is a combination of a datetime2 data type and a time zone offset. Note
that the data type is not aware of the time zone; it can simply store and retrieve time zone values.
Note that the time zone offset values extend for more than a full day (a range of –14:00 to +14:00). A
range of system functions has been provided for working with time zone values, and for all of the data
types related to dates and times.
For more information about date and time data types, see Microsoft Docs:
For more information about using date and time data types, see TechNet:
Unique Identifiers
Globally unique identifiers (GUIDs) have become common in application development. They are used to
provide a mechanism where any process can generate a number and know that it will not clash with a
number that any other process has generated.
GUIDs
Numbering systems have traditionally depended on a central source for the next value in a sequence, to
make sure that no two processes use the same value. GUIDs were introduced to avoid the need for
anyone to function as the “number allocator.” Any process, on any system, can generate a value and
know, to an extremely high degree of probability, that it will not clash with a value generated by any
other process across time and space.
This is achieved by using extremely large values. When discussing the bigint data type earlier, you learned
that the 64-bit bigint values were really large. GUIDs are 128-bit values. The magnitude of a 128-bit value
is well beyond our capabilities of comprehension.
The uniqueidentifier data type in SQL Server is typically used to store GUIDs. Standard arithmetic
operators such as =, <> (or !=), <, >, <=, and >= are supported, in addition to NULL and NOT NULL
checks.
The IDENTITY property is used to automatically assign values to columns. (IDENTITY is discussed in
Module 3.) The IDENTITY property is not used with uniqueidentifier columns. New values are not
calculated by code in your process. They are calculated by calling system functions that generate a value
for you. In SQL Server, this function is the NEWID() function.
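A minimal sketch, using hypothetical names—NEWID() can be called directly, or used as a column default:

NEWID
SELECT NEWID() AS GeneratedGuid;

CREATE TABLE dbo.Orders
(
    OrderID uniqueidentifier NOT NULL PRIMARY KEY DEFAULT NEWID(),
    OrderDate date NOT NULL
);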
The random nature of GUIDs has also caused significant problems in current storage subsystems. SQL
Server 2005 introduced the NEWSEQUENTIALID() function to try to circumvent the randomness of the
values that the NEWID() function generated. However, the function does so at the expense of some
guarantee of uniqueness.
The usefulness of the NEWSEQUENTIALID() function is limited because the main reason for using GUIDs
is to enable other layers of code to generate the values, and know that they can just insert them into a
database without clashes. If you need to request a value from the database via NEWSEQUENTIALID(), it
usually would have been better to use an IDENTITY column instead.
A common development error is to store GUIDs in string values rather than in uniqueidentifier columns.
NULL
Common Errors
New developers often confuse NULL values with zero, blank (or space), zero-length strings, and so on. The
misunderstanding is exacerbated by other database engines that treat NULL and zero-length strings or
zeroes as identical. NULL indicates the absence of a value.
Careful consideration must be given to the nullability of a column. In addition to specifying a data type
for a column, you specify whether a value needs to be present. Often, this is referred to as whether a
column value is mandatory.
Look at the NULL and NOT NULL declarations in the following code sample and decide why each decision
might have been made:
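A hedged sketch of the kind of declarations being discussed, with hypothetical names:

NULL and NOT NULL
CREATE TABLE dbo.Customers
(
    CustomerID int NOT NULL,            -- mandatory: every row needs a key
    CustomerName nvarchar(50) NOT NULL, -- mandatory: a customer must have a name
    FaxNumber nvarchar(20) NULL         -- optional: not every customer has a fax
);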
You can set the default behavior for new columns using the ANSI NULL default. For details about how this
works, see MSDN:
SET ANSI_NULL_DFLT_ON (Transact-SQL)
http://go.microsoft.com/fwlink/?LinkID=233793
After creating an alias data type, it is used in the same way as a system data type. The alias data type is
used as the data type when creating a column. In the following code, the PostCode column uses the new
PostalCode data type. There is no need to specify the width of the column because this was done when
the type was created.
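A hedged reconstruction, assuming the PostalCode type was created with a definition such as the following (the nvarchar(10) width is an assumption):

Alias Data Type
-- Assumed definition of the alias type.
CREATE TYPE PostalCode FROM nvarchar(10) NOT NULL;

-- The alias type is then used without specifying a width.
CREATE TABLE dbo.Addresses
(
    AddressID int NOT NULL PRIMARY KEY,
    PostCode PostalCode
);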
When declaring variables or parameters, the data type assignment is again used in exactly the same way
as system data types:
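For example, a variable declaration sketch using the same assumed type:

DECLARE @TargetPostCode PostalCode;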
To discover which alias data types have already been created within a database, query the sys.types
system view within the context of the relevant database.
Note: You can create alias data types in the model database, so every time a new database
is created, the user data types will automatically be created.
The CAST function accepts an expression for the first parameter then, after the AS keyword, the data type
to which the expression should be converted. The following example converts today’s date to a string.
There is no option to format the date, so using CONVERT would be a better choice:
Example of using CAST:
CAST
SELECT CAST(GETDATE() AS nvarchar(50)) AS DateToday;
DateToday
-----------------------
Jan 25 2016 3:21PM
The following code passes a string with a whole number, so SQL Server can easily convert this to an
integer type:
CAST
SELECT CAST('93751' AS int) AS ConvertedString;
ConvertedString
-------------------
93751
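However, if the string contains characters that cannot be interpreted as a whole number, such as a decimal point, the conversion fails with an error: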
CAST
SELECT CAST('93751.3' AS int) AS ConvertedString;
In the following code, CONVERT accepts three parameters—the data type to which the expression should
be converted, the expression, and the datetime format for conversion:
CONVERT
SELECT CONVERT(nvarchar(50), GETDATE(), 106) AS DateToday;
DateToday
-----------------------
25 Jan 2016
PARSE
The structure of the PARSE function is similar to CAST; however, it accepts an optional parameter through
the USING keyword that enables you to set the culture of the expression. If no culture parameter is
provided, the function will use the language of the current session. PARSE should only be used for
converting strings to date/time or numbers, including money.
In the following example, the session language uses the British English culture, which uses the date
format DD/MM/YYYY. The US English date expression is in the American format MM/DD/YYYY, and is
parsed into the British English language:
PARSE
SET LANGUAGE 'British English';
SELECT PARSE('10/13/2015' AS datetime2 USING 'en-US') AS MyDate;
MyDate
-------------
2015-10-13 00:00:00.0000000
If the optional culture parameter is excluded, the parser will try to convert the date by using the session
language, and throw an error:
PARSE
SET LANGUAGE 'British English';
SELECT PARSE('10/13/2015' AS datetime2) AS MyDate;
--------------
Msg 9819, Level 16, State 1, Line 7
Error converting string value '10/13/2015' into data type datetime2 using culture ''.
To find out which languages are present on an instance of SQL Server, run the following code:
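A sketch of one way to do this, by querying the sys.syslanguages compatibility view:

Installed Languages
SELECT name, alias, dateformat
FROM sys.syslanguages;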
TRY_CAST operates in much the same way as CAST, but will return NULL rather than an error, if the
expression cannot be cast into the intended data type.
The following query executes the code used in the above CAST example that failed, but this time returns
NULL:
TRY_CAST
SELECT TRY_CAST('93751.3' AS int) AS ConvertedString;
ConvertedString
----------------
NULL
This is useful for elegantly handling errors in your code, because a NULL is easier to deal with than an
error message.
It can also be used with the CASE statement, as per the following example:
TRY_CAST
SELECT
CASE
WHEN TRY_CAST('93751.3' AS int) IS NULL THEN 'FAIL'
ELSE 'SUCCESS'
END AS ConvertedString;
ConvertedString
----------------
FAIL
Just as TRY_CAST is similar to CAST, TRY_CONVERT works the same as CONVERT but returns NULL instead
of an error when an expression cannot be converted. It, too, can also be used in a CASE statement.
TRY_CONVERT
SELECT
CASE
WHEN TRY_CONVERT(varchar(25), 93751.3) IS NULL THEN 'FAIL'
ELSE 'SUCCESS'
END AS ConvertedString;
ConvertedString
----------------
SUCCESS
TRY_PARSE
Following the format of the previous TRY functions, TRY_PARSE is identical to PARSE, but returns a NULL
instead of an error when an expression cannot be parsed.
When you use TRY_PARSE to run the code sample that previously failed, a NULL is returned rather than
an error:
TRY_PARSE
SET LANGUAGE 'British English';
SELECT TRY_PARSE('10/13/2015' AS datetime2) AS MyDate;
MyDate
---------
NULL
If you use CAST, CONVERT, or PARSE in your application code and the parser throws an error that you
haven’t handled, it may cause issues in the application. Use TRY_CAST, TRY_CONVERT, and TRY_PARSE
when you need to handle conversion failures gracefully, and use the CASE statement to provide
alternative values when a conversion is not possible.
Users can enter the number beside the character to select the intended word. It might not seem
important to an English-speaking person but, given that the first option means “horse”, the second option
is like a question mark, and the third option means “mother”, there is definitely a need to select the
correct option!
Character Groups
An alternate way to enter the characters is via radical groupings.
Note the third character in the preceding code example. The left-hand part of that character, 女, means
“woman”. Rather than entering English-like characters (that could be quite unfamiliar to the writers),
select a group of characters based on what is known as a radical.
Note that the character representing “mother” is the first character on the second line. For this sort of
keyboard entry to work, the characters must be in appropriate groups, not just stored as one large sea of
characters. An additional complexity is that the radicals themselves are also in groups. In the screenshot,
you can see that the woman radical was part of the third group of radicals.
Unicode
In the 1980s, work was done by a variety of researchers, to determine how many bytes are required to be
able to hold all characters from all languages, but also store them in their correct groupings. The answer
from all researchers was three bytes. You can imagine that three was not an ideal number for computing
and at the time, users were mostly working with 2 byte (that is, 16-bit) computer systems.
Unicode introduced a two-byte character set that attempts to fit the values from those three bytes into two bytes.
Inevitably, there had to be some trade-offs.
Unicode allows any combination of characters, which are drawn from any combination of languages, to
exist in a single document. There are multiple encodings for Unicode, including UTF-7, UTF-8, UTF-16, and UTF-32. (UTF stands for Unicode Transformation Format.) SQL Server currently implements double-byte UTF-16 characters for its Unicode implementation.
For string literal values, an N prefix on a string allows the entry of double-byte characters into the string,
rather than just single-byte characters. (N stands for “National” in National Character Set.)
When working with character strings, the LEN function returns the number of characters (Unicode or not)
whereas DATALENGTH returns the number of bytes.
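For example, the following query (a minimal illustration) compares the two functions for a double-byte string; it returns a character count of 6 and a byte count of 12:

LEN and DATALENGTH
SELECT LEN(N'mother') AS CharacterCount,
       DATALENGTH(N'mother') AS ByteCount;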
Question: What would be a suitable data type for storing the value of a check box that can
be 0 for cleared, 1 for selected, or -1 for disabled?
Lesson 3
Working with Schemas
A schema is a namespace that allows objects within a database to be logically separated to make them
easier to manage. Objects may be separated according to their owner, according to their function, or in any other way that makes sense for a particular database.
Schemas were introduced with SQL Server 2005. They can be thought of as containers for objects such as
tables, views, and stored procedures. Schemas provide organization and structure when a database
includes large numbers of objects.
You can also assign security permissions at the schema level, rather than for individual objects that are
contained within the schemas. Doing this can greatly simplify the design of system security requirements.
Lesson Objectives
After completing this lesson, you will be able to:
Describe the role of a schema.
Create schemas.
What Is a Schema?
Schemas are used to contain objects and to
provide a security boundary for the assignment of
permissions. In SQL Server, schemas are used as
containers for objects, rather like a folder is used
to hold files at the operating system level. Since
their introduction in SQL Server 2005, schemas can
be used to contain objects such as tables, stored
procedures, functions, types, and views. Schemas
form a part of the multipart naming convention
for objects. In SQL Server, an object is formally
referred to by a name of the form
Server.Database.Schema.Object.
Security Boundary
Schemas can be used to simplify the assignment of permissions. An example of applying permissions at
the schema level would be to assign the EXECUTE permission on a schema to a user. The user could then
execute all stored procedures within the schema. This simplifies the granting of permissions because there
is no need to set up individual permissions on each stored procedure.
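The following statement is a sketch of this approach (the Sales schema and the Jamie user are illustrative names):

GRANT EXECUTE
GRANT EXECUTE ON SCHEMA::Sales TO Jamie;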
It is important to understand that schemas are not used to define physical storage locations for data, as
occurs in some other database engines.
If you are upgrading applications from SQL Server 2000 and earlier versions, it is important to understand
that the naming convention changed when schemas were introduced. Previously, names were of the form
Server.Database.Owner.Object.
Objects still have owners, but the owner's name does not form a part of the multipart naming convention
from SQL Server 2005 onward. When upgrading databases from earlier versions, SQL Server will
automatically create a schema that has the same name as existing object owners, so that applications that
use multipart names will continue to work.
More than one Product table could exist in separate schemas of the same database. When single-part
names are used, SQL Server must then determine which Product table is being referred to.
Most users have default schemas assigned, but not all types of users have them. Default schemas are assigned to users based on standard Windows® and SQL Server logins. From SQL Server 2012 onward, you can also assign default schemas to Windows groups. Users without default schemas are considered to have the dbo schema as their default schema.
When locating an object, SQL Server will first check the user's default schema. If the object is not found,
SQL Server will then check the dbo schema to try to locate it.
It is important to include schema names when referring to objects, instead of depending upon schema
name resolution, such as in this modified version of the previous statement:
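The previous statement is not reproduced here; the following hypothetical query shows the pattern, with the schema name included in a two-part name:

Two-Part Name
SELECT ProductID, Name
FROM Production.Product;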
Apart from rare situations, using multipart names leads to more reliable code that does not depend upon
default schema settings.
Creating Schemas
Schemas are created by using the CREATE
SCHEMA command. This command can also
include the definition of objects to be created
within the schema at the time the schema is
created.
CREATE SCHEMA
Besides creating schemas, the CREATE SCHEMA statement can include options for object creation.
Although the code example that follows might appear to be three statements (CREATE SCHEMA,
CREATE TABLE, and GRANT), it is in fact a single statement. Both CREATE TABLE and GRANT are
options that are being applied to the CREATE SCHEMA statement.
Within the newly created KnowledgeBase schema, the Article table is being created and the SELECT
permission on the database is being granted to Salespeople.
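The following sketch reconstructs such a statement from the description above (the Article column definitions are illustrative):

CREATE SCHEMA
CREATE SCHEMA KnowledgeBase AUTHORIZATION dbo
    CREATE TABLE Article
    (
        ArticleID int IDENTITY(1,1) PRIMARY KEY,
        Title nvarchar(100) NOT NULL
    )
    GRANT SELECT TO Salespeople;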
Compound CREATE SCHEMA statements such as this can lead to issues if the entire statement is not executed together.
CREATE SCHEMA
CREATE SCHEMA Reporting
AUTHORIZATION Terry;
Create a schema.
Drop a schema.
Demonstration Steps
1. Ensure that you have completed the previous demonstrations in this module.
3. In the Connect to Server dialog box, in the Server box, type the URL of the Azure server <Server
Name>.database.windows.net (where <Server Name> is the name of the server you created).
5. In the User name box, type Student, and in the Password box, type Pa55w.rd, and then click
Connect.
7. In the Open Project dialog box, navigate to the D:\Demofiles\Mod02 folder, click Demo.ssmssln,
and then click Open.
8. In Solution Explorer, under Queries, double-click 2 - Schemas.sql.
10. Select the code under the Step 2: Create a Schema comment, and then click Execute.
11. Select the code under the Step 3: Create a table using the new schema comment, and then click
Execute.
12. Select the code under the Step 4: Drop the schema comment, and then click Execute. Note that the
schema cannot be dropped while objects exist in it.
13. Select the code under the Step 5: Drop the table and then the schema comment, and then
click Execute.
14. Leave SQL Server Management Studio open for the next demonstration.
Lesson 4
Creating and Altering Tables
Now that you understand the core concepts surrounding the design of tables, this lesson introduces you
to the Transact-SQL syntax that is used when defining, modifying, or dropping tables. Temporary tables
are a special form of table that can be used to hold temporary result sets. Computed columns are used to
create columns where the value held in the column is automatically calculated, either from expressions
involving other columns from the table, or from the execution of functions.
Lesson Objectives
After completing this lesson, you will be able to:
Create tables
Drop tables
Alter tables
Creating Tables
Tables are created by using the CREATE TABLE
statement. This statement is also used to define
the columns that are associated with the table,
and identify constraints such as primary and
foreign keys.
CREATE TABLE
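A minimal sketch of the statement (the table and column definitions are illustrative):

CREATE TABLE PetStore.Owner
(
    OwnerID int IDENTITY(1,1) PRIMARY KEY,
    OwnerName nvarchar(50) NOT NULL,
    PhoneNumber nvarchar(20) NULL
);
GO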
Nullability
You should specify NULL or NOT NULL for each column in the table. SQL Server has defaults for this that
you can change via the ANSI_NULL_DEFAULT setting. Scripts should always be designed to be as reliable
as possible—specifying nullability in data definition language (DDL) scripts helps to improve script
reliability.
Primary Key
You can specify a primary key constraint beside the name of a column if only a single column is included
in the key. It must be included after the list of columns when more than one column is included in the
key.
In the following example, the SalesID value is only unique for each SalesRegisterID value:
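A sketch of such a table (the table name and extra columns are illustrative); because the key spans two columns, it is declared after the column list:

Composite Primary Key
CREATE TABLE PetStore.SalesReceipt
(
    SalesRegisterID int NOT NULL,
    SalesID int NOT NULL,
    SaleTotal decimal(10,2) NOT NULL,
    PRIMARY KEY (SalesRegisterID, SalesID)
);
GO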
Primary keys are constraints and are more fully described, along with other constraints, later in this course.
Dropping Tables
The DROP TABLE statement is used to delete a
table from a database. If a table is referenced by a
foreign key constraint, it cannot be dropped.
DROP
DROP TABLE PetStore.Owner;
GO
Altering Tables
Altering a table is useful because permissions on
the table are retained, along with the data in the
table. If you drop and recreate the table with a
new definition, both the permissions on the table
and the data in the table are lost. However, if the
table is referenced by a foreign key, it cannot be
dropped, though it can be altered.
Note that the syntax for adding and dropping columns is inconsistent. The word COLUMN is required for DROP, but is not valid for ADD, not even as an optional keyword. If the word COLUMN is omitted in a DROP, SQL Server assumes that a constraint is being dropped.
In the following example, the PreferredName column is being added to the PetStore.Owner table. Then,
the PreferredName column is being dropped from the PetStore.Owner table. Note the difference in
syntax regarding the word COLUMN.
Use ALTER TABLE to add or delete columns.
ALTER TABLE
ALTER TABLE PetStore.Owner
ADD PreferredName nvarchar(30) NULL;
GO

ALTER TABLE PetStore.Owner
DROP COLUMN PreferredName;
GO
Drop tables.
Demonstration Steps
1. Ensure that you have completed the previous demonstrations in this module.
2. In SQL Server Management Studio, in Solution Explorer, under Queries, double-click 3 - Create
Tables.sql.
4. Select the code under the Step 2: Create a table comment, and then click Execute.
5. Select the code under the Step 3: Alter the SalesLT.Courier table comment, and then click Execute.
6. Select the code under the Step 4: Drop the tables comment, and then click Execute.
7. Leave SQL Server Management Studio open for the next demonstration.
Temporary Tables
Temporary tables are used to hold temporary
result sets within a user's session. They are created
within the tempdb database and deleted
automatically when they go out of scope. This
typically occurs when the code in which they were
created completes or aborts. Temporary tables are
very similar to other tables, except that they are
only visible to the creator and in the same scope
(and subscopes) within the session. They are
automatically deleted when a session ends or
when they go out of scope. Although temporary
tables are deleted when they go out of scope, you
should explicitly delete them when they are no longer needed to reduce resource requirements on the
server. Temporary tables are often created in code by using the SELECT INTO statement.
A table is created as a temporary table if its name has a number sign (#) prefix. A global temporary table
is created if the name has a double number sign (##) prefix. Global temporary tables are visible to all
users and are not commonly used.
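For example, the following sketch (the query assumes the AdventureWorks sample database) creates, and then explicitly deletes, a local temporary table by using SELECT INTO:

Temporary Table
SELECT CustomerID, SUM(TotalDue) AS TotalSpend
INTO #CustomerTotals
FROM Sales.SalesOrderHeader
GROUP BY CustomerID;

-- Delete the table explicitly when it is no longer needed
DROP TABLE #CustomerTotals;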
Temporary tables are also often used to pass rowsets between stored procedures. For example, a
temporary table that is created in a stored procedure is visible to other stored procedures that are
executed from within the first procedure. Although this use is possible, it is not considered good practice
in general. It breaks common rules of abstraction for coding and also makes it more difficult to debug or
troubleshoot the nested procedures. SQL Server 2008 introduced table-valued parameters (TVPs) that can
provide an alternate mechanism for passing tables to stored procedures or functions. (TVPs are discussed
later in this course.)
The overuse of temporary tables is a common Transact-SQL coding error that often leads to performance
and resource issues. Extensive use of temporary tables can be an indicator of poor coding techniques,
often due to a lack of set-based logic design.
Demonstration Steps
1. Ensure that you have completed the previous demonstrations in this module.
2. In SQL Server Management Studio, in Solution Explorer, under Queries, double-click 4 - Temporary
Tables.sql.
3. Right-click the query pane, point to Connection, and then click Change Connection.
4. In the Connect to Database Engine dialog box, in the Server name box, type MIA-SQL, in the Authentication box, select Windows Authentication, and then click Connect.
5. Select the code under the Step 1: Create a local temporary table comment, and then click Execute.
7. Select the code under the Step 1: Select and execute the following query comment, and then click
Execute. Note that this session cannot access the local temporary table from the other session.
8. Switch to the 4 - Temporary Tables.sql pane.
9. Select the code under the Step 3: Create a global temporary table comment, and then click
Execute.
10. Switch to the 5 - Temporary Tables.sql pane.
11. Select the code under the Step 2: Select and execute the following query comment, and then click
Execute. Note that this session can access the global temporary table from the other session.
12. Switch to the 4 - Temporary Tables.sql pane.
13. Select the code under the Step 5: Drop the two temporary tables comment, and then click
Execute.
14. Leave SQL Server Management Studio open for the next demonstration.
Computed Columns
Computed columns are derived from other
columns or from the result of executing functions.
Computed Column
CREATE TABLE PetStore.Pet
(
PetID int IDENTITY (1,1) PRIMARY KEY,
PetName nvarchar(30) NOT NULL,
DateOfBirth date NOT NULL,
YearOfBirth AS DATEPART(year, DateOfBirth) PERSISTED
);
GO
A nonpersisted computed column is calculated every time a SELECT operation occurs on the column and
it does not consume space on disk. A persisted computed column is calculated when the data in the row
is inserted or updated and does consume space on the disk. The data in the column is then selected like
the data in any other column.
The difference between persisted and nonpersisted computed columns relates to when the computational
performance impact is exerted.
Nonpersisted computed columns work best for data that is modified regularly, but rarely selected.
Persisted computed columns work best for data that is modified rarely, but selected regularly.
In most business systems, data is read much more regularly than it is updated. For this reason, most
computed columns would perform best as persisted computed columns.
Demonstration Steps
1. Ensure that you have completed the previous demonstrations in this module.
2. In SQL Server Management Studio, in Solution Explorer, under Queries, double-click 6 - Computed
Columns.sql.
3. Right-click the query pane, point to Connection, and then click Change Connection.
4. In the Connect to Database Engine dialog box, in the Server name box, type the URL for the Azure account, in the Authentication box, select SQL Server Authentication, in the Login box, type Student, and in the Password box, type Pa55w.rd, and then click Connect.
7. Select the code under the Step 3: Populate the table with data comment, and then click Execute.
8. Select the code under the Step 4: Return the results from the SalesLT.SalesOrderDates table
comment, and then click Execute.
9. Select the code under the Step 5: Update a row in the SalesLT.SalesOrderDates table comment,
and then click Execute.
10. Select the code under the Step 6: Create a table with a computed column that is not persisted
comment, and then click Execute.
11. Select the code under the Step 7: Populate the table with data comment, and then click Execute.
12. Select the code under the Step 8 - Return the results from the SalesLT.TotalSales table comment,
and then click Execute.
13. Close SQL Server Management Studio without saving any changes.
Question: When creating a computed column, why is it good practice to include the
PERSISTED keyword? What are the consequences of excluding PERSISTED when the table has
several million records?
Objectives
After completing this lab, you will be able to:
Create a schema.
Create tables.
Estimated Time: 45 minutes
3. Review the suggested solution in Schema Design for Marketing Development Tables.docx in the
D:\Labfiles\Lab02\Solution folder.
4. Close WordPad.
Results: After completing this exercise, you will have an improved schema and table design.
1. Create a Schema
Results: After completing this exercise, you will have a new schema in the database.
2. Refresh Object Explorer and verify that the new table exists.
2. Refresh Object Explorer and verify that the new table exists.
Results: After completing this exercise, you will have created the Competitor, TVAdvertisement, and CampaignResponse tables. You will have created table columns with the appropriate NULL or NOT NULL settings, and primary keys.
Module 3
Advanced Table Designs
Contents:
Module Overview 3-1
Lesson 1: Partitioning Data 3-2
Module Overview
The physical design of a database can have a significant impact on the ability of the database to meet the
storage and performance requirements set out by the stakeholders. Designing a physical database
implementation includes planning the file groups, how to use partitioning to manage large tables, and
using compression to improve storage and performance. Temporal tables are a new feature in SQL
Server® 2016 and offer a straightforward solution to collecting changes to your data.
Objectives
At the end of this module, you will be able to:
Describe the considerations for using partitioned tables in a SQL Server database.
Plan for using data compression in a SQL Server database.
Lesson 1
Partitioning Data
Databases that contain very large tables are often difficult to manage and might not scale well. This lesson
explains how you can use partitioning to overcome these problems, ensuring that databases remain
efficient, and can grow in a managed, orderly manner.
Lesson Objectives
At the end of this lesson, you will be able to:
Partitioning improves manageability for large tables and indexes, particularly when you need to load large
volumes of data into tables, or remove data from tables. You can manipulate data in partitioned tables by
using a set of dedicated commands that enable you to merge, split and switch partitions. These
operations often only move metadata, rather than moving data—which makes tasks such as loading data
much faster. They are also less resource intensive than loading data by using INSERTs. In some cases,
partitioning can also improve query performance, by moving older data out, thereby reducing the volume
of data to be queried.
Enabling separate maintenance operations. You can perform maintenance operations, such as
rebuilding and reorganizing indexes on a partition-by-partition basis, which might be more efficient
than doing it for the whole table. This is particularly useful when only some of the partitions contain
data that changes—because there is no need to maintain indexes for partitions in which the data
doesn't change. For example, in a table called Orders, where only the current orders are updated, you
can create separate partitions for current orders and completed orders, and only rebuild indexes on
the partition that contains the current orders.
Performing partial backups. You can use multiple filegroups to store partitioned tables and indexes.
If some partitions use read-only filegroups, you can use partial backups to back up only the primary
filegroups and the read/write filegroups; this is more efficient than backing up the entire database.
You can use partitioning to create partitioned tables and partitioned indexes. When you create a
partitioned table, you do not create it on a filegroup, as you do a nonpartitioned table. Instead, you
create it on a partition scheme, which defines a filegroup or, more usually, a set of filegroups. In turn, a
partition scheme is based on a partition function, which defines the boundary values that will be used to
divide table data into partitions.
Partition Functions
When creating a partition function, you need to
first plan the column in the table that you will use
to partition the data. You should then decide
which values in that column will be the boundary
values. It is common practice in tables that contain
a datetime or smalldatetime column to use this
to partition data, because this means you can
divide the table based on time intervals. For
example, you could partition a table that contains
order information by order date. You could then
maintain current orders in one partition, and
archive older orders into one or more additional
partitions.
The values that you choose for the partition function will have an effect on the size of each partition. For
example, in the OrderArchive table, you could choose boundary values that divide data based on yearly
intervals. The bigger the gap between the intervals, the bigger each partition will probably be. The
number of values that you include in the partition function determines the number of partitions in the
table. For example, if you include two boundary values, there will be three partitions in the table. The
additional partition is created to store the values outside of the second boundary.
Note: You cannot use the following data types in a partition function: text, ntext, image,
xml, timestamp, varchar(max), nvarchar(max), varbinary(max), alias data types, and CLR
user-defined data types.
To create a partition function, use the CREATE PARTITION FUNCTION Transact-SQL statement. You
must specify the name of the function, and the data type that the function will use. This should be the
same as the data type of the column that you will use as the partitioning key. You must also detail the
boundary values, and either RANGE LEFT or RANGE RIGHT to specify how to handle values in the table
that fall exactly on the boundary values:
RANGE LEFT is the default value. This value forms the upper boundary of a partition. For example, if
a partition function used a boundary value of midnight on December 31, 2015, any values in the
partitioned table date column that were equal to or less than this date and time would be placed into
the partition. All values from January 1, 2016 00:00:00 would be stored in the second partition.
With RANGE RIGHT, the value is the lower boundary of the partition. If you specified January 1, 2016
00:00:00, all dates including and later than this date would go into one partition; all dates before this
would go into another partition. This produces the same result as the RANGE LEFT example.
Note: The definition of the partition function does not include any objects, columns, or
filegroup storage information. This independence means you can reuse the function for as many
tables, indexes, or indexed views as you like. This is particularly useful for partitioning dates.
The following code example creates a partition function called YearlyPartitionFunction that specifies
three boundary values, and will therefore create four partitions:
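A sketch of the statement (the boundary dates are illustrative):

CREATE PARTITION FUNCTION
CREATE PARTITION FUNCTION YearlyPartitionFunction (datetime)
AS RANGE LEFT
FOR VALUES ('2014-12-31 23:59:59', '2015-12-31 23:59:59', '2016-12-31 23:59:59');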
The YearlyPartitionFunction in the preceding code example can be applied to any table. After partitioning has been added to a table (you will see this in a later lesson), the value in the datetime column used for partitioning will determine which partition each row is stored in:
Partition Number   Minimum Value (exclusive)   Maximum Value (inclusive)
1                  (none)                      2014-12-31 23:59:59
2                  2014-12-31 23:59:59         2015-12-31 23:59:59
3                  2015-12-31 23:59:59         2016-12-31 23:59:59
4                  2016-12-31 23:59:59         (none)
If the function used the RANGE RIGHT option, then the maximum values would become the minimum
values:
Partition Number   Minimum Value (inclusive)   Maximum Value (exclusive)
1                  (none)                      2014-12-31 23:59:59
2                  2014-12-31 23:59:59         2015-12-31 23:59:59
3                  2015-12-31 23:59:59         2016-12-31 23:59:59
4                  2016-12-31 23:59:59         (none)
In this case, RANGE LEFT works better with the dates used, as each partition can then contain data for one
year. RANGE RIGHT would work better using the dates in the following code example.
The following code uses RANGE RIGHT to divide rows into annual partitions:
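A sketch of the statement (the function name and boundary dates are illustrative):

RANGE RIGHT
CREATE PARTITION FUNCTION YearlyPartitionFunctionRight (datetime)
AS RANGE RIGHT
FOR VALUES ('2015-01-01 00:00:00', '2016-01-01 00:00:00', '2017-01-01 00:00:00');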
Note: A table or index can have a maximum of 15,000 partitions in SQL Server.
Partition Schemes
Partition schemes map table or index partitions to
filegroups. When planning a partition scheme,
think about the filegroups that your partitioned
table will use. By using multiple filegroups for your
partitioned table, you can separately back up
discrete parts of the table by backing up the
appropriate filegroup. It is common practice to
use one filegroup for each partition, but this is not
a requirement; you can use a single filegroup to
store all partitioned data, or map some partitions
to a single filegroup, and others to separate
filegroups.
For example, if you plan to store read-only data in a partitioned table, you might place all partitions that contain read-only data on the same filegroup, so you can manage the data together.
To create a partition scheme, use the CREATE PARTITION SCHEME Transact-SQL statement. You must
specify a name for the scheme, the partition function that it references, and the filegroups that it will use.
The following code example creates a scheme called OrdersByYear that references the function
PartitionByYearFunction and uses four filegroups, Orders1, Orders2, Orders3, and Orders4:
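A sketch of the statement, based on the names given above:

CREATE PARTITION SCHEME
CREATE PARTITION SCHEME OrdersByYear
AS PARTITION PartitionByYearFunction
TO (Orders1, Orders2, Orders3, Orders4);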
Note: You must create the partition function using the CREATE PARTITION FUNCTION
statement before you create your partition scheme.
The following code example creates a table named Orders, which will use the OrdersByYear scheme:
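A sketch of the statement (the column definitions are illustrative):

Partitioned Table
CREATE TABLE Orders
(
    OrderID int NOT NULL,
    OrderDate datetime NOT NULL,
    CustomerID int NOT NULL
)
ON OrdersByYear (OrderDate);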
Partitioned Indexes
You create a partitioned index in much the same
way as a table using the ON clause, but a table
and its indexes can be partitioned using different
schemes. However, you must partition the
clustered index and table in the same way,
because the clustered index cannot be stored
separately from the table. If a table and all its
indexes are identically partitioned by using the
same partition scheme, then they are considered
to be aligned. When storage is aligned, both the
rows in a table and the indexes that depend on
these rows will be stored in the same filegroup.
Therefore, if a single partition is backed up or restored, both the data and indexes are kept together. An
index that is partitioned differently to its dependent table is considered nonaligned.
The following example creates a nonclustered index on the OrderID column of the Orders table:
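A sketch of the statement (the index name is illustrative):

Partitioned Index
CREATE NONCLUSTERED INDEX IX_Orders_OrderID
ON Orders (OrderID)
ON OrdersByYear (OrderDate);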
Notice that, when you partition an index, you are not limited to using the columns in the index when
specifying the partitioning key. SQL Server includes the partitioning key in the definition of the index,
which means you can partition the index using the same scheme as the table.
The following code creates the Orders table with a partitioned clustered index on the OrderID and
OrderDate column of the Orders table:
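A sketch of the statement (the constraint name and column definitions are illustrative):

Aligned Clustered Index
CREATE TABLE Orders
(
    OrderID int NOT NULL,
    OrderDate datetime NOT NULL,
    CustomerID int NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID, OrderDate)
)
ON OrdersByYear (OrderDate);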
When an index is partitioned, you can rebuild or reorganize the entire index, or a single partition of an
index. The sys.dm_db_index_physical_stats dynamic management view (DMV) provides fragmentation
information for each partition, so you can see which partitions are most heavily fragmented. You can then
create a targeted defragmentation strategy based on this data.
The following code uses the sys.dm_db_index_physical_stats DMV to show fragmentation in each partition
of a table:
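A sketch of such a query (the table name is illustrative):

sys.dm_db_index_physical_stats
SELECT index_id, partition_number, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Orders'), NULL, NULL, 'LIMITED')
ORDER BY partition_number;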
Switching Partitions
One of the major benefits of partitioned tables is
the ability to switch individual partitions in and
out of a partitioned table. By using switching, you
can archive data quickly and with minimal impact
on other database operations. This is because, if it
is configured correctly, switching usually only
involves swapping the metadata of two partitions
in different tables, not the actual data. Consequently, the operation has minimal effect on performance.
You can switch partitions between partitioned tables, or you can switch a partition from a partitioned
table to a nonpartitioned table.
Both the source partition (or table) and the destination partition (or table) must be in the same
filegroup, so you need to take account of this when planning filegroups for a database.
The target partition (or table) must be empty; you cannot perform a SWITCH operation by using two
populated partitions.
The two partitions or tables involved must have the same schema (columns, data types, and so on).
The rows that the partitions contain must also fall within exactly the same range of values for the
partitioning column; this ensures that you cannot switch rows with inappropriate values into a
partitioned table. You should use CHECK constraints to ensure that the partitioning column values are
valid for the partition being switched. For example, for a table that is partitioned by a date value, you
could create a CHECK constraint on the table that you are switching; this then checks that all values
fall between two specified dates.
Splitting Partitions
Because you need to maintain an empty partition to switch partitions, it is usually necessary to split an
existing partition to create a new empty partition that you can then use to switch data. To split a partition,
you first need to alter the partition scheme to specify the filegroup that the new partition will use (this
assumes that your solution maps partitions one-to-one with filegroups). When you alter the scheme, you
specify this filegroup as the next used filegroup, which means that it will automatically be used for the
new partition that you create when you perform the split operation.
The following code example adds the next used filegroup NewFilegroup to the OrderArchiveScheme
partition scheme:
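A sketch of the statement, based on the names given above:

NEXT USED
ALTER PARTITION SCHEME OrderArchiveScheme
NEXT USED NewFilegroup;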
You can then alter the partition function to split the range and create a new partition.
The following code example adds a new partition by splitting the range:
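A sketch of the statement (the function name and boundary value are illustrative):

SPLIT RANGE
ALTER PARTITION FUNCTION OrderArchiveFunction()
SPLIT RANGE ('2017-01-01 00:00:00');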
You can now switch the empty partition as required. To do this, you need the partition number, which you
can get by using the $PARTITION function, and specifying the value for which you want to identify the
partition.
The following code example switches a partition from the Orders table to the OrderArchive table:
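A sketch of the statement (the partition function name and the date used to identify the partition are illustrative):

SWITCH
ALTER TABLE Orders
SWITCH PARTITION $PARTITION.OrderArchiveFunction('2015-06-15')
TO OrderArchive PARTITION $PARTITION.OrderArchiveFunction('2015-06-15');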
Merging Partitions
Merging partitions does the opposite of splitting a partition, because it removes a range boundary instead
of adding one.
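For example, the following sketch (the function name and boundary value are illustrative) removes the boundary between the first two partitions:

MERGE RANGE
ALTER PARTITION FUNCTION OrderArchiveFunction()
MERGE RANGE ('2015-01-01 00:00:00');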
A partition function that uses a datetime column. To implement a sliding window, you should use
a datetime data type.
Partitions that map to the appropriate time period. For example, if you want to SWITCH out one
month's data at a time, each partition should contain only the data for a single month. You specify
the time periods by defining the boundary values in the partition function.
Empty partitions. Performing MERGE and SPLIT operations on empty partitions maintains the
number of partitions in the table, and makes the table easier to manage.
1. Create a partitioned table with four partitions, each of which represents a period of one month.
Partition 1 contains the oldest data, partition 2 contains the current data, partition 3 is empty, and
partition 4 is empty. The table looks like this:
Partition 1: Oldest data
Partition 2: Current data
Partition 3: Empty
Partition 4: Empty
4. Split the other empty partition to return the table to the same state as it was in step 1.
If you use RANGE RIGHT to create the partition function, you can create and maintain a partitioned table
as described in the following example:
1. Create a partitioned table with four partitions, each of which represents a period of one month.
Partition 1 is empty, partition 2 contains the oldest data, partition 3 contains the current data, and
partition 4 is empty. The table looks like this:
Partition 1: Empty
Partition 2: Oldest data
Partition 3: Current data
Partition 4: Empty
Partition boundary values with RANGE LEFT. When partitioning on a column that uses the
datetime data type, you should choose the partition boundary value that you specify with RANGE
LEFT carefully. SQL Server performs explicit rounding of times in datetime values that can have
unexpected consequences. For example, if you create a partition function with a RANGE LEFT
boundary value of 2012-10-30 23:59:59.999, SQL Server will round this up to 2012-10-31
00:00:00.000; as a result, rows with the value of midnight will be added to the left partition instead
of the right. This could lead to inconsistencies because some rows for a particular date might be in a
different partition to the other rows with the same date. To avoid SQL Server performing rounding on
times in this way, specify the boundary value as 2012-10-30 23:59:59.997 instead of 2012-10-30
23:59:59.999; this will ensure that rows are added to partitions as expected. For the datetime2 and
datetimeoffset data types, you can specify a boundary of 2012-10-30 23:59:59.999 without
experiencing this problem. If you use RANGE RIGHT in the partition function, specifying a time value
of 00:00:00:000 will ensure that all rows for a single date are in the same partition, regardless of the
data type that you use.
CHECK constraint. You must create a check constraint on the staging table to which you will switch
the partition containing the old data. The check constraint should ensure that both partitions contain
dates for exactly the same period, and that NULL values are not allowed.
The code example below adds a check constraint to the Orders_Staging table:
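A sketch of the statement (the column name and date range are illustrative):

CHECK Constraint
ALTER TABLE Orders_Staging
ADD CONSTRAINT CK_Orders_Staging_OrderDate
CHECK (OrderDate IS NOT NULL
       AND OrderDate >= '2016-01-01'
       AND OrderDate < '2016-02-01');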
Demonstration Steps
Creating a Partitioned Table
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are both running, and then
log on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
3. In the User Account Control dialog box, click Yes, and then if prompted with the question Do you
want to continue with this operation? type Y, then press Enter.
4. Start SQL Server Management Studio and connect to the MIA-SQL database engine instance using
Windows® authentication.
7. Select and execute the query under Step 1 to use the master database.
8. Select and execute the query under Step 2 to create four filegroups, and add a file to each filegroup.
9. Select and execute the query under Step 3 to switch to the AdventureWorks database.
10. Select and execute the query under Step 4 to create the partition function.
11. Select and execute the query under Step 5 to create the OrdersByYear partition scheme.
12. Select and execute the query under Step 6 to create the Sales.SalesOrderHeader_Partitioned table.
13. Select and execute the query under Step 7 to copy data into the
Sales.SalesOrderHeader_Partitioned table.
14. Select and execute the query under Step 8 to check the rows counts within each of the partitions.
15. Keep SQL Server Management Studio open for the next demonstration.
Question: What is the maximum number of partitions that a table or index can have in SQL Server?
256
1,000
15,000
256,000
Lesson 2
Compressing Data
SQL Server includes the ability to compress data in SQL Server databases. Compression reduces the space
required to store data and can improve performance for workloads that are I/O intensive. This lesson
describes the options for using SQL Server compression and its benefits. It also describes the
considerations for planning data compression.
Note: In versions of SQL Server before SQL Server 2016 Service Pack 1, compression was
only available in Enterprise edition. In SQL Server 2016 Service Pack 1 and later, compression is
available in all editions of SQL Server.
Lesson Objectives
At the end of this lesson, you will be able to:
Describe the benefits of data compression.
Tables
Nonclustered indexes
Indexed views
Individual partitions in a partitioned table or index—each partition can be set to PAGE, ROW or
NONE
Spatial indexes
In SQL Server, you can implement compression in two ways: page compression and row compression. You
can also implement Unicode compression for the nchar(n) and nvarchar(n) data types.
Note: You can see the compression state of the partitions in a partitioned table by
querying the data_compression column of the sys.partitions catalog view.
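For example (a sketch; the table name is illustrative):

sys.partitions
SELECT OBJECT_NAME(object_id) AS table_name,
       partition_number,
       data_compression_desc
FROM sys.partitions
WHERE object_id = OBJECT_ID('Sales.SalesOrderHeader');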
Page Compression
Page compression takes advantage of data
redundancy to reclaim storage space. Each
compressed page includes a
structure called the compression information (CI)
structure below the page header. The CI structure
is used to store compression metadata.
1. Row compression. SQL Server first applies row compression to the page (row compression is
described in the next topic).
2. Prefix compression. SQL Server scans each compressed column to identify values that have a
common prefix. It then records the prefixes in the CI structure and assigns an identifier for each
prefix—which it then uses in each column to replace the shared prefixes. Because an identifier is
usually much smaller than the prefix that it replaces, SQL Server can potentially reclaim a considerable
amount of space. For example, imagine a set of parts in a product table that all have an identifier that
begins TFG00, followed by a number. If the Products table contains a large number of these products,
prefix compression would eliminate the redundant TFG00 values from the column, and replace them
with a smaller alias.
3. Dictionary compression. Dictionary compression works in a similar way to prefix compression, but
instead of just identifying prefixes, dictionary compression identifies entire repeated values, and
replaces them with an identifier that is stored in the CI structure. For example, in the products table,
there is a column called Color that contains values such as Blue, Red, and Green that are repeated
extensively throughout the column. Dictionary compression would replace each color value with an
identifier, and store each color value in the CI structure, along with its corresponding identifier.
SQL Server processes these three compression operations in the order that they are shown in the previous
bullet list.
Row Compression
Row compression saves space by changing the
way it stores fixed length data types. Instead of
storing them as fixed length types, row
compression stores them in variable length format.
For example, the integer data type normally takes
up four bytes of space. If you had a column that
used the integer data type, the amount of space
that each row has in the column would vary,
depending on the values in the rows. A value of six
would only consume a single byte, whereas a
value of 6,000 would consume two bytes. Row
compression only works with certain types of data;
it does not affect variable length data types, or other data types including: xml, image, text, and ntext.
When you implement row compression for a table, SQL Server adds an extra four bits to each compressed
column to store the length of the data type. However, this small increase in size is normally outweighed
by the space saved. For NULL values, these four bits are the only space consumed.
Unicode Compression
SQL Server Unicode compression uses the
Standard Compression Scheme for Unicode (SCSU)
algorithm to extend the compression capabilities
to include the Unicode data types nchar(n) and
nvarchar(n). When you implement row or page
compression on an object, Unicode columns are
automatically compressed by using SCSU. Note
that nvarchar(MAX) columns cannot be
compressed with row compression, but can
benefit from page compression.
Note: The compression ratio for Unicode compression varies between different languages,
particularly for languages whose alphabets contain significantly more characters. For example,
Unicode compression for the Japanese language yields just 15 percent savings.
If your data includes a large number of fixed length data types, you can potentially benefit from row
compression. For the greatest benefit, a large number of the values should consume less space than
the data types allow in total. For example, the smallint data type consumes 2 bytes of space. If a
smallint column contains values that are less than 256, you can save a whole byte of data for each
value. However, if the majority of the values are greater than this, the benefit of compressing the data
is less, because the overall percentage of space saved is lower.
If your data includes a large amount of redundant, repeating data, you might be able to benefit from
page compression. This applies to repeating prefixes, in addition to entire words or values.
Whether or not you will benefit from Unicode compression depends on the language that your data
is written in.
You can use the stored procedure sp_estimate_data_compression_savings to obtain an estimation of
the savings that you could make by compressing a table. When you execute
sp_estimate_data_compression_savings, you supply the schema name, the table name, the index id if
this is included in the calculation, the partition id if the table is partitioned, and the type of compression
(ROW, PAGE, or NONE). sp_estimate_data_compression_savings takes a representative sample of the
table data and places it in tempdb, where it is compressed, and then supplies a result set that displays the
potential savings that you could make by compressing the table.
The following code estimates the potential space savings of implementing row compression in the
Internet.Orders table:
sp_estimate_data_compression_savings
USE Sales;
GO
EXEC sp_estimate_data_compression_savings 'Internet', 'Orders', NULL, NULL, 'ROW';
GO
The benefits of reduced I/O and more efficient in-memory page storage are greater for workloads that scan large amounts of data than for queries that return just a small subset of data.
To decide whether to implement compression, you must balance the performance improvements against
the cost, in terms of CPU resources, of compressing and uncompressing the data. For row compression,
this cost is typically a seven to 10 percent increase in CPU utilization. For page compression, this figure is
usually higher.
Two factors that can help you to assess the value of implementing compression for a table or index are:
1. The frequency of data change operations relative to other operations. The lower the percentage of
data change operations, such as updates, the greater the benefit of compression. Updates typically
require access to only a small part of the data, and so do not involve accessing a large number of
data pages.
2. The proportion of operations that involve a scan. The higher this value, the greater the benefit of
compression. Scans involve a large number of data pages, so you can improve performance
considerably if a significant percentage of the workload involves scans.
You can use the sys.dm_db_index_operational_stats dynamic management view (DMV) to obtain the
information required to assess the frequency of data change operations and the proportion of operations
that involve a scan.
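The following sketch (run in the target database) returns the raw scan and data change counts for each index and partition, from which the two proportions above can be derived:

sys.dm_db_index_operational_stats
SELECT OBJECT_NAME(ios.object_id) AS table_name,
       ios.index_id,
       ios.partition_number,
       ios.range_scan_count,
       ios.leaf_insert_count + ios.leaf_update_count + ios.leaf_delete_count AS change_count
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios;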
Demonstration Steps
Compressing Data
1. In SQL Server Management Studio, in Solution Explorer, open the 2 - Compressing Data.sql script file.
2. Select and execute the query under Step 1 to use the AdventureWorks database.
3. Select and execute the query under Step 2 to run the sp_estimate_data_compression_savings
procedure against the Sales.SalesOrderDetail table.
4. Select and execute the query under Step 3 to add row compression to the Sales.SalesOrderDetail
table.
5. Select and execute the code under Step 4 to run the sp_estimate_data_compression_savings
procedure against the Sales.SalesOrderDetail table to see if the table can be further compressed.
6. Select and execute the query under Step 5 to rebuild indexes 1 and 3.
8. Keep SQL Server Management Studio open for the next demonstration.
Lesson 3
Temporal Tables
Most developers, at some point in their careers, have faced the problem of capturing and storing changed
data, including what was changed, and when it was changed. In addition to the Slowly Changing
Dimension (SCD) component found in data warehousing, it is common to add triggers or custom code to
extract the changes and store this for future reference. The introduction of temporal tables means that
SQL Server can capture data change information automatically. These tables are also known as system-
versioned tables.
Lesson Objectives
After completing this lesson, you will be able to:
For a relationship to be established with the historical table, the current table must have a primary key.
This also means you can indirectly query the historical table to see the full history for any given record.
The historical table can be named at the time of creation, or SQL Server will give it a default name.
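For example, the following sketch (the table and column definitions are illustrative) creates a system-versioned Employee table and explicitly names its history table:

System-Versioned Table
CREATE TABLE dbo.Employee
(
    EmployeeID int NOT NULL PRIMARY KEY CLUSTERED,
    EmployeeName nvarchar(50) NOT NULL,
    SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));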
You must include the SysStartTime and SysEndTime columns, and the PERIOD FOR SYSTEM_TIME
which references these columns. You can change the name of these columns and change the references
to them in the PERIOD FOR SYSTEM_TIME parameters, as shown in the following code:
The Manager table in the following example has the date columns named as DateFrom and DateTo—these names are also used in the history table:
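(A sketch; the column definitions other than the period columns are illustrative.)

PERIOD FOR SYSTEM_TIME
CREATE TABLE dbo.Manager
(
    ManagerID int NOT NULL PRIMARY KEY CLUSTERED,
    ManagerName nvarchar(50) NOT NULL,
    DateFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    DateTo datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (DateFrom, DateTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ManagerHistory));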
If you want SQL Server to name the historical table, run the earlier Employee example excluding the (HISTORY_TABLE = dbo.EmployeeHistory) clause. Furthermore, if you include the HIDDEN keyword when
specifying the start and end time columns, they don’t appear in the results of a SELECT * FROM statement.
However, you can specify the start and end column names in the select list to include them.
The history table must reside in the same database as the current table.
System-versioned tables are not compatible with FILETABLE or FILESTREAM features because SQL
Server cannot track changes that happen outside of itself.
Columns with a BLOB data type, such as varchar(max) or image, can result in high storage
requirements because the history table will store the history values as the same type.
INSERT and UPDATE statements cannot reference the SysStartTime or SysEndTime columns.
You cannot truncate a system-versioned table. Turn SYSTEM_VERSIONING OFF to truncate the table.
The primary key must be a nonclustered index; a clustered index is not compatible.
If SYSTEM_VERSIONING is changed from ON to OFF, the data in the staging buffer is moved to disk. The staging table uses the same schema as the current table, but also includes a bigint column to guarantee that the rows moved from the staging buffer to the history table are unique. This bigint column adds 8 bytes, thereby reducing the maximum row size to 8,052 bytes.
Staging tables are not visible in Object Explorer, but you can use the sys.internal_tables view to
acquire information about these objects.
The following code creates a memory-optimized table with system-versioning enabled. The start and end
columns are HIDDEN:
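(A sketch; it assumes the database has a memory-optimized filegroup, and the table and column names are illustrative.)

Memory-Optimized System-Versioned Table
CREATE TABLE dbo.UserSession
(
    SessionID int NOT NULL PRIMARY KEY NONCLUSTERED,
    UserName nvarchar(50) NOT NULL,
    SysStartTime datetime2 GENERATED ALWAYS AS ROW START HIDDEN NOT NULL,
    SysEndTime datetime2 GENERATED ALWAYS AS ROW END HIDDEN NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA,
      SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.UserSessionHistory));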
If you execute a SELECT query against a temporal table, all rows returned will be current data. The SELECT
clause is exactly the same query as you would use with a standard user table. To query data in your
temporal table for a given point in time, include the FOR SYSTEM_TIME clause with one of the five
subclauses for setting the datetime boundaries:
1. AS OF <date_time> accepts a single datetime parameter and returns the state of the data for the
specified point in time.
2. FROM <start_date_time> TO <end_date_time> returns all current and historical rows that were active during the timespan, regardless of whether they were active before or after those times. The results will include rows that were active precisely on the lower boundary defined by the FROM date; however, it excludes rows that became inactive on the upper boundary defined by the TO date.
3. BETWEEN <start_date_time> AND <end_date_time> works in the same way as FROM ... TO, except that it also includes rows that became active on the upper boundary defined by the AND date.
4. CONTAINED IN (<start_date_time>, <end_date_time>) returns only rows that were both opened and closed within the specified timespan.
5. ALL returns all data from the current and historical tables with no restrictions.
The AS OF subclause returns the data at a given point in time. The following code uses the datetime2
format to return all employees on a specific date—in this case June 1, 2015—and active at 09:00.
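(A sketch, using the Employee table from the earlier example.)

AS OF
SELECT EmployeeID, EmployeeName
FROM dbo.Employee
FOR SYSTEM_TIME AS OF '2015-06-01 09:00:00';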
Best Practice: If you want to return just historical data, use the CONTAINED IN subclause
for the best performance, as this only uses the history table for querying.
The FOR SYSTEM_TIME clause can be used to query both disk-based and memory-optimized temporal
tables. For more detailed information on querying temporal tables, see Microsoft Docs:
http://aka.ms/y1w3oq
Demonstration Steps
Adding System-Versioning to an Existing Table
1. In SQL Server Management Studio, in Solution Explorer, open the 3 - Temporal Tables.sql script file.
2. Select and execute the query under Step 1 to use the AdventureWorks database.
3. Select and execute the query under Step 2 Add the two date range columns, to add the two
columns, StartDate and EndDate, to the Person.Person table.
4. Select and execute the query under Step 3 Enable system-versioning, to alter the table and add system-versioning.
5. In Object Explorer, expand Databases, expand AdventureWorks2016, right-click Tables, and then click Refresh.
6. In the list of tables, point out the Person.Person table. The name includes (System-Versioned).
7. Expand the Person.Person (System-Versioned) table node to display the history table. Point out the
name of the table included (History).
8. Expand the Person.Person_History (History) node, and then expand Columns. Point out that the
column names are identical to the current table.
9. Select and execute the query under Step 4 to update the row in the Person.Person table for
BusinessEntityID 1704.
10. Select and execute the query under Step 5 to show the history of changes for BusinessEntityID 1704.
Objectives
After completing the lab exercises, you will be able to:
2. Create four filegroups for the HumanResources database: FG0, FG1, FG2, FG3.
5. Create a Timesheet table that will use the new partition scheme.
6. Insert some data into the Timesheet table.
2. Type and execute a Transact-SQL SELECT statement that returns all of the rows from the Timesheet
table, along with the partition number for each row. You can use the $PARTITION function to
achieve this.
6. Type a Transact-SQL statement to switch out the data in the partition on the filegroup FG1 to the
table Timesheet_Staging. Use the $PARTITION function to retrieve the partition number.
7. View the metadata for the partitioned table again to see the changes, and then write and execute a
SELECT statement to view the rows in the Timesheet_Staging table.
8. Type a Transact-SQL statement to merge the first two partitions, using the value 2011-10-01 00:00.
9. View the metadata for the partitioned table again to see the changes.
10. Type a Transact-SQL statement to make FG1 the next used filegroup for the partition scheme.
11. Type a Transact-SQL statement to split the first empty partition, using the value 2012-07-01 00:00.
12. Type and execute a Transact-SQL statement to add two rows for the new period.
13. View the metadata for the partitioned table again to see the changes.
Results: At the end of this lab, the timesheet data will be partitioned to archive old data.
3. Compress Partitions
2. Type and execute a Transact-SQL statement that drops the Payment.Timesheet table.
3. Type and execute a Transact-SQL statement that drops the psHumanResources partition scheme.
4. Type and execute a Transact-SQL statement that drops the pfHumanResourcesDates partition function.
5. Type and execute a Transact-SQL statement that creates the pfHumanResourcesDates partition function, using RANGE RIGHT for the values: 2012-12-31 00:00:00.000, 2014-12-31 00:00:00.000, and 2016-12-31 00:00:00.000.
6. Type and execute a Transact-SQL statement that creates the psHumanResources partition scheme, using the filegroups FG0, FG2, FG3, and FG1.
7. Type and execute a Transact-SQL statement that creates a Payment.Timesheet table, using the psHumanResources partition scheme.
8. Type and execute a Transact-SQL statement that adds staff to three shifts, over the course of six years, excluding weekend dates.
Results: At the end of this lab, the Timesheet table will be populated with six years of data, and will be
partitioned and compressed.
Question: Discuss scenarios that you have experienced where you think partitioning would
have been beneficial. Have you worked with databases that could have had older data
archived? Were the databases large enough to split the partitions across physical drives for
better performance, or to quicken the backup process? Furthermore, could any of this data
be compressed? Give reasons for your answers.
How to apply data compression to reduce storage of your data, and increase query performance.
The benefits of using temporal tables to record all changes to your data.
Best Practice: One of the disadvantages of partitioning is that it can be complicated to set
up. However, you can use the Developer Edition to replicate your production systems and test
your partitioning scenario before applying it in your live environment. As with any major
database changes, it is always recommended that you take a backup before applying these
changes.
Review Question(s)
Question: What are the advantages of using system-versioning versus a custom-built
application to store data changes?
Module 4
Ensuring Data Integrity Through Constraints
Contents:
Module Overview 4-1
Lesson 1: Enforcing Data Integrity 4-2
Module Overview
The quality of data in your database largely determines the usefulness and effectiveness of applications
that rely on it—the success or failure of an organization or a business venture could depend on it.
Ensuring data integrity is a critical step in maintaining high-quality data.
You should enforce data integrity at all levels of an application from first entry or collection through
storage. Microsoft® SQL Server® data management software provides a range of features to simplify the
job.
Objectives
After completing this module, you will be able to:
Describe the options for enforcing data integrity, and the levels at which they should be applied.
Implement domain integrity through options such as check, unique, and default constraints.
Lesson 1
Enforcing Data Integrity
Data integrity refers to the consistency and accuracy of data that is stored in a database. An important
step in database planning is deciding the best way to enforce this.
Lesson Objectives
After completing this lesson, you will be able to:
Explain how data integrity checks apply across different layers of an application.
Explain the available options for enforcing each type of data integrity.
Application Levels
Applications often have a three-tier hierarchical
structure. This keeps related functionality together
and improves the maintainability of code, in
addition to improving the chance of code being
reusable. Common examples of application levels
are:
User interface level.
The main disadvantage of enforcing integrity at the user interface level is that more than a single
application might have to work with the same underlying data, and each application might enforce the
rules differently. It is also likely to require more lines of code to enforce business rule changes than may
be required at the data tier.
Middle Tier
Many integrity checks implemented in code exist to enforce business logic and functional requirements, as opposed to checking the nonfunctional aspects of the requirements, such as
whether the data is in the correct format. The middle tier is often where the bulk of those requirements
exist in code, because they can apply to more than one application. In addition, multiple user interfaces
often reuse the middle tier. Implementing integrity at this level helps to avoid different user interfaces
applying different rules and checks at the user interface level. At this level, the logic is still quite aware of
the functions that cause errors, so the error messages generated and returned to the user can still be quite
specific.
It is also possible for integrity checks enforced only in the middle tier to compromise the integrity of the
data through a mixture of transactional inconsistencies due to optimistic locking, and race conditions caused by the multithreaded nature of modern programming models. For example, it might seem easy
to check that a customer exists and then place an order for that customer. Consider, though, the
possibility that another user could remove the customer between the time that you check for the
customer's existence and the time that you record the order. The requirement for transactional
consistency leads to the necessity for relational integrity of the data elements, which is where the services
of a data layer become imperative.
Data Tier
The advantage of implementing integrity at the data tier is that upper layers cannot bypass it. In
particular, multiple applications accessing the data simultaneously cannot compromise its quality—there
may even be multiple users connecting through tools such as SQL Server Management Studio (SSMS). If
referential integrity is not enforced at the data tier level, all applications and users need to individually
apply all the rules and checks themselves to ensure that the data is correct.
One of the issues with implementing data integrity constraints at the data tier is the separation between
the user actions that caused the errors to occur, and the data tier. This can lead to error messages being
precise in describing an issue, but difficult for an end user to understand unless the programmer has
ensured that appropriate functional metadata is passed between the system tiers. Cryptic messages
produced by the data tier must be reprocessed by upper layers of code before they are presented to the
end user.
Multiple Tiers
The correct solution in most situations involves applying rules and checks at multiple levels. However, the
challenge with this approach is in maintaining consistency between the rules and checks at different
application levels.
Domain Integrity
At the lowest level, SQL Server applies constraints
for a domain (or column) by limiting the choice of
data that can be entered, and whether nulls are
allowed. For example, if you only want whole
numbers to be entered, and don’t want alphabetic
characters, specify the INT (integer) data type.
Equally, assigning a TINYINT data type ensures
that only values from 0 to 255 can be stored in that column.
A check constraint can specify an acceptable set of data values, and a default constraint can supply a
value when input is missing.
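For example, the following sketch (the table and column names are invented for illustration) combines a
data type choice, null-ability, a default, and a check constraint to enforce domain integrity on one column:
Domain Integrity Options
CREATE TABLE Sales.SpecialOffer
(
    SpecialOfferID int NOT NULL,
    -- tinyint already restricts the domain to 0 through 255
    DiscountPercent tinyint NOT NULL
        -- supplies 0 when no value is provided
        CONSTRAINT DF_SpecialOffer_DiscountPercent DEFAULT (0)
        -- narrows the domain further
        CONSTRAINT CK_SpecialOffer_DiscountPercent CHECK (DiscountPercent <= 50)
);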
Entity Integrity
Entity or table integrity ensures that each row within a table can be identified uniquely. This column (or
columns in the case of a composite key) is known as the table’s primary key. Whether the primary key
value can be changed or whether the whole row can be deleted depends on the level of integrity that is
required between the primary key and any other tables, based on referential integrity.
This is where the next level of integrity comes in, to ensure the changes are valid for a given relationship.
Referential Integrity
Referential integrity ensures that the relationships among the primary keys (in the referenced table) and
foreign keys (in the referencing tables) are maintained. You are not permitted to insert a value in the
referencing column that does not exist in the referenced column in the target table. A row in a referenced
table cannot be deleted, nor can the primary key be changed, if a foreign key refers to the row unless a
form of cascading action is permitted. You can define referential integrity relationships within the same
table or between separate tables.
As an example of referential integrity, you may have to ensure that an order cannot be placed for a
nonexistent customer.
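A minimal sketch of that rule follows; the table and column names are illustrative rather than taken from
a specific sample database:
Referential Integrity
CREATE TABLE Sales.Customer
(
    CustomerID int NOT NULL
        CONSTRAINT PK_Customer PRIMARY KEY
);

CREATE TABLE Sales.CustomerOrder
(
    OrderID int NOT NULL
        CONSTRAINT PK_CustomerOrder PRIMARY KEY,
    CustomerID int NOT NULL
        CONSTRAINT FK_CustomerOrder_Customer
        FOREIGN KEY REFERENCES Sales.Customer (CustomerID)
);
With these constraints in place, an INSERT into Sales.CustomerOrder that refers to a nonexistent
CustomerID fails with a constraint violation.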
Data Types
The first option for enforcing data integrity is to
ensure that only the correct type of data is stored
in a given column. For example, you cannot place
alphabetic characters into a column that has been
defined as storing integers.
The choice of a data type will also define the
permitted range of values that can be stored. For
example, the smallint data type can only contain values from –32,768 to 32,767.
For XML data (which is discussed in Module 14), XML schemas can be used to further constrain the data
that is held in the XML data type.
Null-ability Constraint
This determines whether a column can store a null value, or whether a value must be provided. This is
often referred to as whether a column is mandatory or not.
Default Values
If a column has been defined to not allow nulls, then a value must be provided whenever a new row is
inserted. With a default value, you can ignore the column during input and a specific value will be
inserted into the column when no value is supplied.
Check Constraint
Constraints are used to limit the permitted values in a column beyond the limits that the data type,
null-ability, and any default provide. For example, a tinyint column can have values from 0 to 255. You
might decide to constrain the column further so that only values between 1 and 9 are permitted.
You can also apply constraints at the table level and enforce relationships between the columns of a table.
For example, you might have a column that holds an order number, but it is not mandatory. You might
then add a constraint that specifies that the column must have a value if the Salesperson column also has
a value.
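A sketch of that table-level rule might look like the following (the table is hypothetical):
Table-Level CHECK Constraint
CREATE TABLE Sales.OrderAssignment
(
    OrderNumber int NULL,
    Salesperson nvarchar(50) NULL,
    -- If a salesperson is recorded, an order number must be recorded too
    CONSTRAINT CK_OrderAssignment_OrderForSalesperson
        CHECK (Salesperson IS NULL OR OrderNumber IS NOT NULL)
);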
Triggers
Triggers are procedures, somewhat like stored procedures, that SQL Server executes automatically
whenever a specific event, such as an INSERT or UPDATE, occurs. Within the trigger code, you can enforce
more complex rules for integrity, as the sketch that follows shows. Triggers are discussed in detail in
Module 11.
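As a brief sketch of the mechanism only (the tables are the illustrative ones from earlier, and a FOREIGN
KEY would normally enforce this particular rule), a trigger can roll back a change that violates a rule:
Trigger
CREATE TRIGGER TR_CustomerOrder_VerifyCustomer
ON Sales.CustomerOrder
AFTER INSERT
AS
BEGIN
    IF EXISTS (SELECT 1
               FROM inserted AS i
               WHERE NOT EXISTS (SELECT 1
                                 FROM Sales.Customer AS c
                                 WHERE c.CustomerID = i.CustomerID))
    BEGIN
        -- Undo the offending change and report an error
        ROLLBACK TRANSACTION;
        THROW 50001, N'Order refers to a nonexistent customer.', 1;
    END
END;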
Sequencing Activity
Put the following constraint types in order by numbering each to indicate the order of importance to
minimize constraint checking effort.
Steps
Indicate column null-ability.
Indicate column default value.
Indicate a check constraint.
Write a trigger to control the column contents.
Lesson 2
Implementing Data Domain Integrity
Domain integrity limits the range and type of values that can be stored in a column. It is usually the most
important form of data integrity when first designing a database. If domain integrity is not enforced,
processing errors can occur when unexpected or out-of-range values are encountered.
Lesson Objectives
After completing this lesson, you will be able to:
Describe how you can use data types to enforce basic domain integrity.
Describe how null-ability determines whether NULL is a valid value within a column's domain.
Describe how you can use an additional DEFAULT constraint to provide a non-null default value for
the column.
Describe how you can use CHECK constraints to enforce domain integrity beyond a null and default
value.
Data Types
Choosing an appropriate data type for each
column is one of the most important decisions
that you must make when you are designing a
table as part of a database. Data types were
discussed in detail in Module 2.
You can assign data types to a column by using
one of the following methods:
System data types that SQL Server supplies.
Alias data types that you define, based on system data types.
An additional advantage of alias data types is that code generation tools can create more consistent code
when the tools have the additional information about the data types that alias data types provide. For
example, you could decide to have a user interface design program that always displayed and/or
prompted for product weights in a specific way.
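For example (a sketch; the type and table names are assumptions for illustration), an alias data type can
standardize how product weights are declared across tables:
Alias Data Type
CREATE TYPE dbo.ProductWeight FROM decimal(10, 3) NOT NULL;
GO

CREATE TABLE dbo.Product
(
    ProductID int NOT NULL
        CONSTRAINT PK_Product PRIMARY KEY,
    Weight dbo.ProductWeight  -- null-ability is inherited from the alias type
);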
DEFAULT Constraints
A DEFAULT constraint provides a value for a
column when no value is specified in the
statement that inserted the row. You can view the
existing definition of DEFAULT constraints by
querying the sys.default_constraints view.
DEFAULT Constraint
Sometimes a column is mandatory—that is, a
value must be provided. However, the application
or program that is inserting the row might not
provide a value. In this case, you may want to
apply a value to ensure that the row will be
inserted.
DEFAULT constraints are associated with a table column. They are used to provide a default value for the
column when the user does not supply a value. The value is retrieved from the evaluation of an expression
and the data type that the expression returns must be compatible with the data type of the column.
Note: If the statement that inserted the row explicitly inserted NULL, the default value
would not be used.
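For example (a sketch with invented names), a named DEFAULT constraint that records the current date
when no value is supplied:
DEFAULT Constraint Example
CREATE TABLE Sales.Inquiry
(
    InquiryID int NOT NULL
        CONSTRAINT PK_Inquiry PRIMARY KEY,
    ReceivedDate date NOT NULL
        CONSTRAINT DF_Inquiry_ReceivedDate DEFAULT (CAST(SYSDATETIME() AS date))
);

-- ReceivedDate is omitted, so the default expression supplies it
INSERT Sales.Inquiry (InquiryID) VALUES (1);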
Named Constraints
SQL Server does not require you to supply names for constraints that you create. If a name is not supplied,
SQL Server will automatically generate a name. However, the names that are generated are not very
intuitive. Therefore, it is generally considered a good idea to provide names for constraints as you create
them—and to do so using a naming standard.
A good example of why naming constraints is important is that, if a column needs to be deleted, you must
first remove any constraints that are associated with the column. Dropping a constraint requires you to
provide a name for the constraint that you are dropping. Having a consistent naming standard for
constraints helps you to know what that name is likely to be, rather than having to execute a query to find
the name. Locating the name of a constraint would otherwise involve querying a system view such as
sys.sysconstraints, searching in Object Explorer, or selecting the relevant data from the
INFORMATION_SCHEMA.TABLE_CONSTRAINTS catalog view.
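For example, a query along these lines lists the constraints defined on a table (the schema and table
names here are placeholders):
Locating Constraint Names
SELECT CONSTRAINT_NAME, CONSTRAINT_TYPE
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
WHERE TABLE_SCHEMA = N'Sales'
  AND TABLE_NAME = N'Opportunity';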
CHECK Constraints
A CHECK constraint limits the values that a column
can accept.
Logical Expression
CHECK constraints work with any logical (Boolean) expression that can return TRUE, FALSE, or
UNKNOWN. Give particular care to any expression that could involve NULL. A CHECK constraint rejects
values for which the expression evaluates to FALSE, but accepts values for which it evaluates to
UNKNOWN, which is the result when NULL is involved.
Check Constraint
CREATE TABLE Sales.Opportunity
(
OpportunityID int NOT NULL,
Requirements nvarchar(50) NOT NULL,
SalespersonID int NOT NULL,
Rating int NOT NULL
CONSTRAINT CK_Opportunity_Rating1to4
CHECK (Rating BETWEEN 1 AND 4)
);
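These test statements (a sketch, using the table above) show the behavior: a value inside the range is
accepted, and a value outside the range is rejected. If the rating column permitted NULLs, an inserted
NULL would also be accepted, because the CHECK expression would evaluate to UNKNOWN rather than
FALSE.
Testing the CHECK Constraint
-- Succeeds: 3 falls within 1 to 4
INSERT Sales.Opportunity (OpportunityID, Requirements, SalespersonID, Rating)
VALUES (1, N'Initial requirements', 101, 3);

-- Fails: 9 falls outside 1 to 4, so the CHECK constraint rejects the row
INSERT Sales.Opportunity (OpportunityID, Requirements, SalespersonID, Rating)
VALUES (2, N'Further requirements', 101, 9);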
For more information about column level constraints, see column_constraint (Transact SQL) in Microsoft
Docs:
Demonstration Steps
Enforce Data and Domain Integrity
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running and then log
on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
5. In the Connect to Server dialog box, in the Server name box, type MIA-SQL and then click
Connect.
7. In the Open Project dialog box, navigate to D:\Demofiles\Mod04, click Demo04.ssmssln, and then
click Open.
8. In Solution Explorer, expand the Queries folder, and double-click 21 - Demonstration 2A.sql.
9. Familiarize yourself with the requirements by using the code below Step 1: Review the requirements
for a table design.
10. Place the pseudo code for your findings for the requirements below Step 2: Determine the data
types, null-ability, default and check constraints that should be put in place.
11. Highlight the code below Step 3: Check the outcome with this proposed solution, and click
Execute.
12. Highlight the code below Step 4: Execute statements to test the actions of the integrity
constraints, and click Execute.
13. Highlight the code below Step 5: INSERT rows that test the nullability and constraints, and click
Execute. Note the errors.
14. Highlight the code below Step 6: Query sys.sysconstraints to see the list of constraints, and click
Execute.
15. Highlight the code below Step 7: Explore system catalog views through the
INFORMATION_SCHEMA owner, and click Execute.
16. Close SQL Server Management Studio, without saving any changes.
Verify the correctness of the statement by placing a mark in the column to the right.
Statement: True or false? When you have a check constraint on a column, it is not worth having a NOT
NULL constraint, because any nulls will be filtered out by the check constraint.
Answer:
Lesson 3
Implementing Entity and Referential Integrity
It is important to be able to identify rows within tables uniquely and to be able to establish relationships
across tables. For example, if you have to ensure that an individual can be identified as an existing
customer before an order can be placed, you can enforce this by using a combination of entity and
referential integrity.
Lesson Objectives
After completing this lesson, you will be able to:
Describe how UNIQUE constraints are sometimes used instead of PRIMARY KEY constraints.
PRIMARY KEY Constraints
As with other types of constraints, even though a name is not required when defining a PRIMARY KEY
constraint, it is preferable to choose a name for the constraint, rather than leaving SQL Server to do so.
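For example (a sketch with illustrative names), a named composite PRIMARY KEY must be declared at the
table level because it spans two columns:
PRIMARY KEY Constraint
CREATE TABLE Sales.OrderDetail
(
    OrderID int NOT NULL,
    LineNumber int NOT NULL,
    ProductID int NOT NULL,
    CONSTRAINT PK_OrderDetail PRIMARY KEY (OrderID, LineNumber)
);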
For more information about how to create primary keys, see Create Primary Keys in the SQL Server
Technical Documentation:
UNIQUE Constraints
A UNIQUE constraint indicates that the column or
combination of columns must be unique. At most one
row can contain NULL (if the column null-ability permits this).
SQL Server will internally create an index to
support the UNIQUE constraint.
For example, if you were storing a tax identifier for employees in Spain (where two formats of identifier
are valid), you would include a CHECK constraint to make sure that the value was in one of the two valid
formats, and a UNIQUE constraint on the column that stores these values. Note that this may be unrelated to the fact that the
table has another unique identifier, such as EmployeeID, used as a primary key for the table.
As with other types of constraints, even though a name is not required when defining a UNIQUE
constraint, you should choose a name for the constraint rather than leaving SQL Server to do so.
Unique Constraint
CREATE TABLE Sales.Opportunity
(
OpportunityID int NOT NULL
CONSTRAINT PK_Opportunity PRIMARY KEY,
Requirements nvarchar(50) NOT NULL
CONSTRAINT UQ_Opportunity_Requirements UNIQUE,
ReceivedDate date NOT NULL
);
IDENTITY Constraints
It is common to need automatically generated
numbers for an integer primary key column. The
IDENTITY property on a database column indicates
that an INSERT statement will not provide the
value for the column; instead, SQL Server will
provide it automatically.
The following code adds the IDENTITY property to the OpportunityID column:
IDENTITY property
CREATE TABLE Sales.Opportunity
(
OpportunityID int NOT NULL IDENTITY(1,1),
Requirements nvarchar(50) NOT NULL,
ReceivedDate date NOT NULL,
SalespersonID int NULL
);
Although explicit inserts are not normally permitted for columns that have an IDENTITY property, you can
allow them by using a table setting option, such as SET IDENTITY_INSERT customer ON. With this option
set, you can explicitly insert values into the column with the IDENTITY property in the customer table.
Remember to set the option back to OFF, which resumes automatic value generation, after you have
inserted the exceptional rows.
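A sketch of that pattern (the customer table and its columns are hypothetical):
IDENTITY_INSERT
SET IDENTITY_INSERT dbo.Customer ON;

-- An explicit column list is required while IDENTITY_INSERT is ON
INSERT dbo.Customer (CustomerID, CustomerName)
VALUES (999, N'Migrated customer');

SET IDENTITY_INSERT dbo.Customer OFF;  -- resume automatic value generation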
Note: Having the IDENTITY property on a column does not ensure that the column is
unique. Define a UNIQUE constraint to guarantee that values in the column will be unique.
For example, if you insert a row into a customer table, the customer might be assigned a new identity
value. However, if a trigger on the customer table caused an entry to be written into an audit logging
table, when inserts are performed, the @@IDENTITY variable would return the identity value from the
audit logging table, rather than the one from the customer table.
To deal with this effectively, the SCOPE_IDENTITY() function was introduced. It provides the last identity
value but only within the current scope. In the previous example, it would return the identity value from
the customer table.
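A sketch of the safer pattern (the table is hypothetical):
SCOPE_IDENTITY
INSERT dbo.Customer (CustomerName)
VALUES (N'New customer');

-- Returns the identity value that the INSERT above generated, even if a
-- trigger inserted into an audit table that has its own IDENTITY column
SELECT SCOPE_IDENTITY() AS NewCustomerID;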
Sequences
Sequences are another way of creating values for
insertion into a column as sequential numbers.
However, unlike IDENTITY properties, sequences
are not tied to any specific table. This means that
you could use a single sequence to provide key
values for a group of tables.
Sequences are created by the CREATE SEQUENCE statement, modified by the ALTER SEQUENCE
statement, and deleted by the DROP SEQUENCE statement.
Other database engines provide sequence values, so the addition of sequence support in SQL Server 2012
and later versions can assist with migrating code to SQL Server from other database engines.
A range of sequence values can be retrieved in a single call via the sp_sequence_get_range system stored
procedure. There are also options to cache sets of sequence values to improve performance. However,
when a server failure occurs, the entire cached set of values is lost.
Values that are retrieved from the sequence are not available for reuse. This means that gaps can occur in
the set of sequence values.
The following code shows how to create and use a sequence object:
SEQUENCE
CREATE SEQUENCE Booking.BookingID AS INT
START WITH 20001
INCREMENT BY 10;
GO

-- Retrieve the next value from the sequence
SELECT NEXT VALUE FOR Booking.BookingID;
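Because a sequence is not tied to a table, it can supply key values to several tables through DEFAULT
constraints, along the lines the demonstration below uses. A sketch (the table is illustrative):
Sequence As a Default Value
CREATE TABLE Booking.FlightBooking
(
    BookingID int NOT NULL
        CONSTRAINT PK_FlightBooking PRIMARY KEY
        CONSTRAINT DF_FlightBooking_BookingID
            DEFAULT (NEXT VALUE FOR Booking.BookingID),
    FlightNumber nvarchar(10) NOT NULL
);

-- BookingID is omitted; the sequence supplies the next value
INSERT Booking.FlightBooking (FlightNumber) VALUES (N'QF001');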
For more information about sequences, see Sequence Properties (General Page) in Microsoft Docs:
Work with identity constraints, create a sequence, and use a sequence to provide key values for two
tables.
Demonstration Steps
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running and then log
on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
5. In the Connect to Server dialog box, in the Server name box, type MIA-SQL and then click
Connect.
7. In the Open Project dialog box, navigate to D:\Demofiles\Mod04, click Demo04.ssmssln, and then
click Open.
9. Highlight the code below Step 1: Open a new query window to the tempdb database, and click
Execute.
10. Highlight the code below Step 2: Create the dbo.Opportunity table, and click Execute.
11. Highlight the code below Step 3: Populate the table with two rows, and click Execute.
12. Highlight the code below Step 4: Check the identity values added, and click Execute.
13. Highlight the code below Step 5: Try to insert a specific value for OpportunityID, and click
Execute. Note the error.
14. Highlight the code below Step 6: Add a row without a value for LikelyClosingDate, and click
Execute.
15. Highlight the code below Step 7: Query the table to see the value in the LikelyClosingDate
column, and click Execute.
16. Highlight the code below Step 8: Create 3 Tables with separate identity columns, and click
Execute.
17. Highlight the code below Step 9: Insert some rows into each table, and click Execute.
18. Highlight the code below Step 10: Query the 3 tables in a single view and note the overlapping
ID values, then click Execute.
19. Highlight the code below Step 11: Drop the tables, and click Execute.
20. Highlight the code below Step 12: Create a sequence to use with all 3 tables, and click Execute.
21. Highlight the code below Step 13: Recreate the tables using the sequence for default values, and
click Execute.
22. Highlight the code below Step 14: Reinsert the same data, and click Execute.
23. Highlight the code below Step 15: Note the values now appearing in the view, and click Execute.
24. Highlight the code below Step 16: Note that sequence values can be created on the fly, and click
Execute.
25. Highlight the code below Step 17: Re-execute the same code, note that new sequence values are
provided, and click Execute.
26. Highlight the code below Step 18: Note that when the same entry is used multiple times in a
SELECT statement, the same value is used, and click Execute.
27. Highlight the code below Step 19: Fetch a range of sequence values, and click Execute.
28. Close SQL Server Management Studio, without saving any changes.
FOREIGN KEY Constraints
As with other types of constraints, even though a name is not required when defining a FOREIGN KEY
constraint, you should provide a name rather than leaving SQL Server to do so.
Defining a foreign key constraint.
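For example (a sketch; the tables are the illustrative ones used earlier), the same named FOREIGN KEY
constraint can also be added to an existing table by using ALTER TABLE:
ALTER TABLE Sales.CustomerOrder
ADD CONSTRAINT FK_CustomerOrder_Customer
    FOREIGN KEY (CustomerID) REFERENCES Sales.Customer (CustomerID);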
Permission Requirement
Before you can place a FOREIGN KEY constraint on a table, you must have CREATE TABLE or ALTER
TABLE permission on the referencing table, and REFERENCES permission on the referenced table.
The REFERENCES permission on the target table avoids the situation where another user could place a
reference to one of your tables, leaving you unable to drop or substantially change your own table until
the other user removed that reference. However, in terms of security, remember that providing
REFERENCES permission to a user on a table for which they do not have SELECT permission does not
totally prevent them from working out what the data in the table is. This might be done by a brute force
attempt that involves trying all possible values.
Note: Changes to the structure of the referenced column are limited while it is referenced
in a FOREIGN KEY. For example, you cannot change the size of the column when the relationship
is in place.
Note: The WITH NOCHECK option applies to FOREIGN KEY constraints, in addition to CHECK
constraints defined on the table. It prevents the constraint from checking data that is already present.
This is useful if a constraint should be applied to all new records, but existing data does not have to
meet the criteria.
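For example (a sketch; the Quantity column is hypothetical), adding a constraint without validating rows
that already exist:
WITH NOCHECK
-- Existing rows are not checked; new and updated rows must satisfy the constraint
ALTER TABLE Sales.CustomerOrder WITH NOCHECK
ADD CONSTRAINT CK_CustomerOrder_Quantity CHECK (Quantity > 0);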
For more information about defining foreign keys, see Create Foreign Key Relationships in Microsoft Docs:
1. NO ACTION, the default, rejects the change: an error is raised and the DELETE or UPDATE is rolled
back.
2. CASCADE makes the required changes to the referencing tables. If the customer is being deleted, his
or her orders will also be deleted. If the customer primary key is being updated (although note that
this is undesirable), the customer key in the orders table will also be updated so that the orders still
refer to the correct customer.
3. SET DEFAULT causes the values in the columns in the referencing table to be set to their default
values. This provides more control than the SET NULL option, which always sets the values to NULL.
4. SET NULL causes the values in the columns in the referencing table to be nullified. For the customer
and orders example, this means that the orders would still exist, but they would not refer to any
customer.
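A sketch of how a cascading action is declared (again using the illustrative tables from earlier, dropping
the plain constraint and re-adding it with a cascading action):
Cascading Action
ALTER TABLE Sales.CustomerOrder
DROP CONSTRAINT FK_CustomerOrder_Customer;

ALTER TABLE Sales.CustomerOrder
ADD CONSTRAINT FK_CustomerOrder_Customer
    FOREIGN KEY (CustomerID) REFERENCES Sales.Customer (CustomerID)
    ON DELETE CASCADE;  -- deleting a customer now also deletes that customer's orders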
Caution
Although cascading referential integrity is easy to set up, you should be careful when using it within
database designs.
For example, if you used the CASCADE option in the example above, would it really be okay for a
customer's orders to be removed whenever you remove the customer? There might also be other tables
that reference the orders table (such as order details or invoices), and their rows would in turn be
removed if they had a cascade relationship set up.
Naming
Specify meaningful names for constraints rather
than leaving SQL Server to select a name. SQL
Server provides complicated system-generated
names. Often, you have to refer to constraints by
name. Therefore, it is better to have chosen them
yourself using a consistent naming convention.
Changing Constraints
You can create, alter, or drop constraints without
having to drop and recreate the underlying table.
You use the ALTER TABLE statement to add or drop constraints; to change an existing constraint's
definition, you drop it and then add it again with the new definition.
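For example (a sketch, using the Sales.Opportunity table from earlier), adding and then dropping a
constraint by name:
Changing Constraints
-- Add a constraint to an existing table
ALTER TABLE Sales.Opportunity
ADD CONSTRAINT CK_Opportunity_ReceivedDate
    CHECK (ReceivedDate <= CAST(SYSDATETIME() AS date));

-- Drop the constraint by name
ALTER TABLE Sales.Opportunity
DROP CONSTRAINT CK_Opportunity_ReceivedDate;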
Demonstration Steps
Define entity integrity for a table, define referential integrity for a table, and define cascading
referential integrity actions for the constraint.
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running, and then log
on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
10. Highlight the code below Step 2: Create the Customer and CustomerOrder tables, and click
Execute.
11. Highlight the code below Step 3: Select the list of customers, and click Execute.
12. Highlight the code below Step 4: Try to insert a CustomerOrder row for an invalid customer, and
click Execute. Note the error message.
13. Highlight the code below Step 5: Try to remove a customer that has an order, and click Execute.
Note the error message.
14. Highlight the code below Step 6: Replace it with a named constraint with cascade, and click
Execute.
15. Highlight the code below Step 7: Select the list of customer orders, try a delete again, and click
Execute.
16. Highlight the code below Step 8: Note how the cascade option caused the orders to be deleted, and
click Execute.
17. Highlight the code below Step 9: Try to drop the referenced table and note the error, then click
Execute. Note the error message.
18. Close SQL Server Management Studio, without saving any changes.
(Lab worksheet: for each scenario, choose between ON DELETE CASCADE and ON DELETE RESTRICT;
then complete the table design worksheet with the columns Column Name, Data Type, Required, and
Validation Rule.)
Objectives
After completing this lab, you will be able to:
Use the ALTER TABLE statement to adjust the constraints on existing tables.
Create and test a CASCADING REFERENTIAL INTEGRITY constraint for a FOREIGN KEY and a PRIMARY
KEY.
Virtual machine: 20762C-MIA-SQL
User name: ADVENTUREWORKS\Student
Password: Pa55w.rd
3. In File Explorer, navigate to the D:\Labfiles\Lab04\Starter folder, right-click the Setup.cmd file, and
then click Run as administrator.
4. In the User Account Control dialog box, click Yes, and then wait for the script to finish.
2. Work through the list of requirements and alter the table to make columns the primary key, based on
the requirements.
3. Work through the list of requirements and alter the table to make columns foreign keys, based on the
requirements.
4. Work through the list of requirements and alter the table to add DEFAULT constraints to columns,
based on the requirements.
Results: After completing this exercise, you should have successfully tested your constraints.
Question: Why implement CHECK constraints if an application is already checking the input
data?
Question: What are some scenarios in which you might want to temporarily disable
constraint checking?
Verify the correctness of the statement by placing a mark in the column to the right.
Statement: True or false? A PRIMARY KEY and a UNIQUE constraint are doing the same thing using
different code words.
Answer:
Review Question(s)
Question: Do you need CHECK constraints if an application is already checking the input
data?
Question: What are some scenarios in which you might want to temporarily disable
constraint checking?