SQL Data Warehouse
SQL Data Warehouse is a cloud-based Enterprise Data Warehouse (EDW) that uses Massively Parallel
Processing (MPP) to quickly run complex queries across petabytes of data. Use SQL Data Warehouse as a key
component of a big data solution. Import big data into SQL Data Warehouse with simple PolyBase T-SQL
queries, and then use the power of MPP to run high-performance analytics. As you integrate and analyze, the
data warehouse will become the single version of truth your business can count on for insights.
In a cloud data solution, data is ingested into big data stores from a variety of sources. Once in a big data store,
Hadoop, Spark, and machine learning algorithms prepare and train the data. When the data is ready for complex
analysis, SQL Data Warehouse uses PolyBase to query the big data stores. PolyBase uses standard T-SQL
queries to bring the data into SQL Data Warehouse.
SQL Data Warehouse stores data in relational tables with columnar storage. This format significantly reduces
the data storage costs, and improves query performance. Once data is stored in SQL Data Warehouse, you can
run analytics at massive scale. Compared to traditional database systems, analysis queries finish in seconds
instead of minutes, or hours instead of days.
The analysis results can go to worldwide reporting databases or applications. Business analysts can then gain
insights to make well-informed business decisions.
Next steps
Explore Azure SQL Data Warehouse architecture
Quickly create a SQL Data Warehouse
Load sample data.
Explore Videos
Or look at some of these other SQL Data Warehouse Resources.
Search Blogs
Submit a feature request
Search Customer Advisory Team blogs
Create a support ticket
Search MSDN forum
Search Stack Overflow forum
Azure SQL Data Warehouse - Massively parallel
processing (MPP) architecture
Learn how Azure SQL Data Warehouse combines massively parallel processing (MPP) with Azure storage to
achieve high performance and scalability.
SQL Data Warehouse uses a node-based architecture. Applications connect and issue T-SQL commands to a
Control node, which is the single point of entry for the data warehouse. The Control node runs the MPP engine
which optimizes queries for parallel processing, and then passes operations to Compute nodes to do their work in
parallel. The Compute nodes store all user data in Azure Storage and run the parallel queries. The Data Movement
Service (DMS) is a system-level internal service that moves data across the nodes as necessary to run queries in
parallel and return accurate results.
With decoupled storage and compute, SQL Data Warehouse can:
Size compute power independently of your storage needs.
Grow or shrink compute power without moving data.
Pause compute capacity while leaving data intact, so you only pay for storage.
Resume compute capacity during operational hours.
Azure storage
SQL Data Warehouse uses Azure storage to keep your user data safe. Since your data is stored and managed by
Azure storage, SQL Data Warehouse charges separately for your storage consumption. The data itself is sharded
into distributions to optimize the performance of the system. You can choose which sharding pattern to use to
distribute the data when you define the table. SQL Data Warehouse supports these sharding patterns:
Hash
Round Robin
Replicate
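For illustration, a minimal sketch of how each sharding pattern is declared when you create a table (table and column names here are examples, not taken from this article):

--Hash-distributed: rows are assigned to distributions by hashing a distribution column
CREATE TABLE dbo.FactSales
( ProductKey int NOT NULL, OrderDateKey int NOT NULL, SalesAmount money NOT NULL )
WITH ( DISTRIBUTION = HASH(ProductKey), CLUSTERED COLUMNSTORE INDEX );

--Round robin: rows are spread evenly across distributions with no distribution column
CREATE TABLE dbo.StageSales
( ProductKey int NOT NULL, SalesAmount money NOT NULL )
WITH ( DISTRIBUTION = ROUND_ROBIN, HEAP );

--Replicate: a full copy of the table is cached on each Compute node
CREATE TABLE dbo.DimProduct
( ProductKey int NOT NULL, ProductName nvarchar(50) NOT NULL )
WITH ( DISTRIBUTION = REPLICATE, CLUSTERED INDEX (ProductKey) );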
Control node
The Control node is the brain of the data warehouse. It is the front end that interacts with all applications and
connections. The MPP engine runs on the Control node to optimize and coordinate parallel queries. When you
submit a T-SQL query to SQL Data Warehouse, the Control node transforms it into queries that run against each
distribution in parallel.
Compute nodes
The Compute nodes provide the computational power. Distributions map to Compute nodes for processing. As
you pay for more compute resources, SQL Data Warehouse re-maps the distributions to the available Compute
nodes. The number of compute nodes ranges from 1 to 60, and is determined by the service level for the data
warehouse.
Each Compute node has a node ID that is visible in system views. You can see the Compute node ID by looking for
the node_id column in system views whose names begin with sys.pdw_nodes. For a list of these system views, see
MPP system views.
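As a quick check, you can list the nodes from any connection; a minimal sketch:

SELECT * FROM sys.dm_pdw_nodes;   -- returns one row per Control and Compute node, including its node ID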
Data Movement Service
Data Movement Service (DMS ) is the data transport technology that coordinates data movement between the
Compute nodes. Some queries require data movement to ensure the parallel queries return accurate results. When
data movement is required, DMS ensures the right data gets to the right location.
Distributions
A distribution is the basic unit of storage and processing for parallel queries that run on distributed data. When
SQL Data Warehouse runs a query, the work is divided into 60 smaller queries that run in parallel. Each of the 60
smaller queries runs on one of the data distributions. Each Compute node manages one or more of the 60
distributions. A data warehouse with maximum compute resources has one distribution per Compute node. A data
warehouse with minimum compute resources has all the distributions on one compute node.
Hash-distributed tables
A hash distributed table can deliver the highest query performance for joins and aggregations on large tables.
To shard data into a hash-distributed table, SQL Data Warehouse uses a hash function to deterministically assign
each row to one distribution. In the table definition, one of the columns is designated as the distribution column.
The hash function uses the values in the distribution column to assign each row to a distribution.
The following diagram illustrates how a full (non-distributed) table gets stored as a hash-distributed table.
Each row belongs to one distribution.
A deterministic hash algorithm assigns each row to one distribution.
The number of table rows per distribution varies as shown by the different sizes of tables.
There are performance considerations for the selection of a distribution column, such as distinctness, data skew,
and the types of queries that run on the system.
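One way to check for data skew after choosing a distribution column is to compare the row counts per distribution; a minimal sketch (the table name is an example):

DBCC PDW_SHOWSPACEUSED("dbo.FactSales");   -- rows and space used for each of the 60 distributions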
Replicated Tables
A replicated table provides the fastest query performance for small tables.
A table that is replicated caches a full copy of itself on each compute node. Consequently, replicating a table
removes the need to transfer data among compute nodes before a join or aggregation. Replicated tables are best
utilized with small tables. Extra storage is required, and there is additional overhead incurred when writing data,
which makes replication impractical for large tables.
The following diagram shows a replicated table. For SQL Data Warehouse, the replicated table is cached on the
first distribution on each compute node.
Next steps
Now that you know a bit about SQL Data Warehouse, learn how to quickly create a SQL Data Warehouse and
load sample data. If you are new to Azure, you may find the Azure glossary helpful as you encounter new
terminology. Or look at some of these other SQL Data Warehouse Resources.
Customer success stories
Blogs
Feature requests
Videos
Customer Advisory Team blogs
Create support ticket
MSDN forum
Stack Overflow forum
Twitter
Data Warehouse Units (DWUs) and compute Data
Warehouse Units (cDWUs)
This article gives recommendations on choosing the ideal number of data warehouse units (DWUs, cDWUs) to optimize price and
performance, and explains how to change the number of units.
NOTE
Azure SQL Data Warehouse Gen2 recently added additional scale capabilities to support compute tiers as low as 100
cDWU. Existing Gen1 data warehouses that require the lower compute tiers can now upgrade to Gen2, at no additional cost, in the
regions where Gen2 is available. If your region is not yet supported, you can still upgrade to a
supported region. For more information, see Upgrade to Gen2.
In T-SQL, the SERVICE_OBJECTIVE setting determines the service level and the performance tier for your data
warehouse.
--Gen1
CREATE DATABASE myElasticSQLDW
WITH
( SERVICE_OBJECTIVE = 'DW1000'
)
;
--Gen2
CREATE DATABASE myComputeSQLDW
(Edition = 'Datawarehouse'
,SERVICE_OBJECTIVE = 'DW1000c'
)
;
Permissions
Changing the data warehouse units requires the permissions described in ALTER DATABASE.
Built-in roles for Azure resources such as SQL DB Contributor and SQL Server Contributor can change DWU
settings.
NOTE
This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will
continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM
compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install
Azure PowerShell.
To change the DWUs or cDWUs, use the Set-AzSqlDatabase PowerShell cmdlet. The following example sets the
service level objective to DW1000 for the database MySQLDW that is hosted on server MyServer.
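The cmdlet call itself was dropped in extraction; a minimal sketch using the names from the text above (the resource group name is an assumption):

Set-AzSqlDatabase -ResourceGroupName "ResourceGroup1" -ServerName "MyServer" `
    -DatabaseName "MySQLDW" -RequestedServiceObjectiveName "DW1000"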
For more information, see PowerShell cmdlets for SQL Data Warehouse
T-SQL
With T-SQL you can view the current DWU or cDWU settings, change the settings, and check the progress.
To change the DWUs or cDWUs:
1. Connect to the master database associated with your logical SQL Database server.
2. Use the ALTER DATABASE T-SQL statement. The following example sets the service level objective to
DW1000 for the database MySQLDW.
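The statement itself was dropped in extraction; a minimal sketch of the documented ALTER DATABASE syntax:

ALTER DATABASE MySQLDW
MODIFY (SERVICE_OBJECTIVE = 'DW1000')
;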
REST APIs
To change the DWUs, use the Create or Update Database REST API. The following example sets the service level
objective to DW1000 for the database MySQLDW, which is hosted on server MyServer. The server is in an Azure
resource group named ResourceGroup1.
PUT https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Sql/servers/{server-name}/databases/{database-name}?api-version=2014-04-01-preview HTTP/1.1
Content-Type: application/json; charset=UTF-8

{
    "properties": {
        "requestedServiceObjectiveName": "DW1000"
    }
}
For more REST API examples, see REST APIs for SQL Data Warehouse.
SELECT *
FROM sys.databases
;
To check the progress of a change, you can also query a DMV that returns information about various management
operations on your SQL Data Warehouse, such as the operation and the state of the operation, which is either
IN_PROGRESS or COMPLETED.
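The DMV referenced here was lost in extraction; it is presumably sys.dm_operation_status, queried while connected to the master database. A minimal sketch:

SELECT operation, state_desc, percent_complete, start_time
FROM sys.dm_operation_status
WHERE major_resource_id = 'MySQLDW'
ORDER BY start_time DESC;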
Next steps
To learn more about managing performance, see Resource classes for workload management and Memory and
concurrency limits.
Cheat sheet for Azure SQL Data Warehouse
This cheat sheet provides helpful tips and best practices for building your Azure SQL Data Warehouse solutions.
Before you get started, learn more about each step in detail by reading Azure SQL Data Warehouse Workload
Patterns and Anti-Patterns, which explains what SQL Data Warehouse is and what it is not.
The following graphic shows the process of designing a data warehouse:
Data migration
First, load your data into Azure Data Lake Storage or Azure Blob storage. Next, use PolyBase to load your data into
SQL Data Warehouse in a staging table. Use the following configuration:
DESIGN RECOMMENDATION
Indexing Heap
Partitioning None
Learn more about data migration, data loading, and the Extract, Load, and Transform (ELT) process.
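A minimal sketch of that staging-table load, using CREATE TABLE AS SELECT (CTAS) over a PolyBase external table; all object names here are examples:

-- Assumes an external data source, file format, and external table (ext.Sales) are already defined
-- over your files in Azure Blob storage or Azure Data Lake Storage
CREATE TABLE dbo.SalesStaging
WITH ( DISTRIBUTION = ROUND_ROBIN, HEAP )
AS SELECT * FROM ext.Sales;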
Replicated
Great fit for: small dimension tables in a star schema with less than 2 GB of storage after compression (~5x compression).
Watch out if: many write transactions are on the table (such as insert, upsert, delete, update); you change Data Warehouse Units (DWU) provisioning frequently; you only use 2-3 columns but your table has many columns; you index a replicated table.
Tips:
Start with Round Robin, but aspire to a hash distribution strategy to take advantage of a massively parallel
architecture.
Make sure that common hash keys have the same data format.
Don't distribute on a varchar column.
Dimension tables with a common hash key to a fact table with frequent join operations can be hash distributed.
Use sys.dm_pdw_nodes_db_partition_stats to analyze any skewness in the data.
Use sys.dm_pdw_request_steps to analyze data movement behind queries, and to monitor the time that broadcast and
shuffle operations take. This is helpful for reviewing your distribution strategy (see the sketch below).
Learn more about replicated tables and distributed tables.
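A minimal sketch of checking the data movement steps for a given query (the request ID is an example; get it from sys.dm_pdw_exec_requests):

SELECT step_index, operation_type, distribution_type, row_count, total_elapsed_time
FROM sys.dm_pdw_request_steps
WHERE request_id = 'QID1234'
ORDER BY step_index;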
Clustered index
Great fit for: tables with up to 100 million rows; large tables (more than 100 million rows) with only 1-2 columns heavily used.
Watch out if: it is used on a replicated table; you have complex queries involving multiple join and GROUP BY operations; you make updates on the indexed columns (it takes memory).
Clustered columnstore index (CCI) (default)
Great fit for: large tables (more than 100 million rows).
Watch out if: it is used on a replicated table; you make massive update operations on your table; you overpartition your table (row groups do not span across different distribution nodes and partitions).
Tips:
On top of a clustered index, you might want to add a nonclustered index to a column heavily used for filtering.
Be careful how you manage the memory on a table with CCI. When you load data, you want the user (or the
query) to benefit from a large resource class. Make sure to avoid trimming and creating many small
compressed row groups.
On Gen2, CCI tables are cached locally on the compute nodes to maximize performance.
For CCI, slow performance can happen due to poor compression of your row groups. If this occurs, rebuild or
reorganize your CCI (see the sketch at the end of this section). You want at least 100,000 rows per compressed row group. The ideal is 1 million rows in
a row group.
Based on the incremental load frequency and size, you want to automate when you reorganize or rebuild your
indexes. Spring cleaning is always helpful.
Be strategic when you want to trim a row group. How large are the open row groups? How much data do you
expect to load in the coming days?
Learn more about indexes.
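A minimal sketch of the rebuild and reorganize operations mentioned above (table and index names are examples):

-- Recompress all row groups in a clustered columnstore index; run from a large resource class
ALTER INDEX ALL ON dbo.FactSales REBUILD;

-- Or force open (delta store) row groups into compressed columnstore format
ALTER INDEX cci_FactSales ON dbo.FactSales REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);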
Partitioning
You might partition your table when you have a large fact table (greater than 1 billion rows). In 99 percent of cases,
the partition key should be based on date.
With staging tables that require ELT, you can also benefit from partitioning, because it facilitates data lifecycle management.
Be careful not to overpartition your data, especially on a clustered columnstore index.
Learn more about partitions.
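A minimal sketch of a date-partitioned fact table with monthly boundaries (names and boundary values are examples):

CREATE TABLE dbo.FactOrders
( ProductKey int NOT NULL, OrderDateKey int NOT NULL, SalesAmount money NOT NULL )
WITH
( DISTRIBUTION = HASH(ProductKey),
  CLUSTERED COLUMNSTORE INDEX,
  PARTITION ( OrderDateKey RANGE RIGHT FOR VALUES (20190101, 20190201, 20190301) )
);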
Incremental load
If you're going to incrementally load your data, first make sure that you allocate larger resource classes to loading
your data. This is particularly important when loading into tables with clustered columnstore indexes. See resource
classes for further details.
We recommend using PolyBase and ADF V2 for automating your ELT pipelines into SQL Data Warehouse.
For a large batch of updates in your historical data, consider using a CTAS to write the data you want to keep in a
table rather than using INSERT, UPDATE, and DELETE.
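A minimal sketch of that CTAS pattern (object names are examples): write the rows you want to keep into a new table, then swap it in with a rename.

CREATE TABLE dbo.FactOrders_new
WITH ( DISTRIBUTION = HASH(ProductKey), CLUSTERED COLUMNSTORE INDEX )
AS
SELECT * FROM dbo.FactOrders WHERE OrderDateKey >= 20180101;   -- keep only the rows you want

RENAME OBJECT dbo.FactOrders TO FactOrders_old;
RENAME OBJECT dbo.FactOrders_new TO FactOrders;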
Maintain statistics
Until auto-statistics are generally available, SQL Data Warehouse requires manual maintenance of statistics. It's
important to update statistics as significant changes happen to your data. This helps optimize your query plans. If
you find that it takes too long to maintain all of your statistics, be more selective about which columns have
statistics.
You can also define the frequency of the updates. For example, you might want to update date columns, where new
values might be added, on a daily basis. You gain the most benefit by having statistics on columns involved in joins,
columns used in the WHERE clause, and columns found in GROUP BY.
Learn more about statistics.
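A minimal sketch of creating and refreshing statistics on the columns that matter most (names are examples):

CREATE STATISTICS stats_OrderDateKey ON dbo.FactOrders (OrderDateKey);   -- sampled by default
UPDATE STATISTICS dbo.FactOrders (stats_OrderDateKey);                   -- refresh after significant data changes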
Resource class
SQL Data Warehouse uses resource groups as a way to allocate memory to queries. If you need more memory to
improve query or loading speed, you should allocate higher resource classes. On the flip side, using larger resource
classes impacts concurrency. You want to take that into consideration before moving all of your users to a large
resource class.
If you notice that queries take too long, check that your users do not run in large resource classes. Large resource
classes consume many concurrency slots. They can cause other queries to queue up.
Finally, by using Gen2 of SQL Data Warehouse, each resource class gets 2.5 times more memory than Gen1.
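For the static resource classes, allocation is done by adding a user to one of the built-in database roles; a minimal sketch (the user name is an example):

EXEC sp_addrolemember 'largerc', 'LoadUser';    -- give LoadUser more memory per query
EXEC sp_droprolemember 'largerc', 'LoadUser';   -- return LoadUser to the default (smallrc)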
Learn more about how to work with resource classes and concurrency.
Best practices for Azure SQL Data Warehouse
This article is a collection of best practices to help you achieve optimal performance from your Azure SQL Data
Warehouse. Some of the concepts in this article are basic and easy to explain; other concepts are more advanced,
and we just scratch the surface here. The purpose of this article is to give you some basic guidance and to
raise awareness of important areas to focus on as you build your data warehouse. Each section introduces you to a
concept and then points you to more detailed articles that cover the concept in more depth.
If you are just getting started with Azure SQL Data Warehouse, do not let this article overwhelm you. The
sequence of the topics is mostly in the order of importance. If you start by focusing on the first few concepts, you'll
be in good shape. As you get more familiar and comfortable with using SQL Data Warehouse, come back and look
at a few more concepts. It won't take long for everything to make sense.
For loading guidance, see Guidance for loading data.
Maintain statistics
Unlike SQL Server, which automatically detects and creates or updates statistics on columns, SQL Data
Warehouse requires manual maintenance of statistics. While we do plan to change this in the future, for now you
will want to maintain your statistics to ensure that the SQL Data Warehouse plans are optimized. The plans
created by the optimizer are only as good as the available statistics. Creating sampled statistics on every
column is an easy way to get started with statistics. It's equally important to update statistics as significant
changes happen to your data. A conservative approach may be to update your statistics daily or after each load.
There are always trade-offs between performance and the cost to create and update statistics. If you find it is
taking too long to maintain all of your statistics, you may want to try to be more selective about which columns
have statistics or which columns need frequent updating. For example, you might want to update date columns,
where new values may be added, daily. You will gain the most benefit by having statistics on columns
involved in joins, columns used in the WHERE clause and columns found in GROUP BY.
See also Manage table statistics, CREATE STATISTICS, UPDATE STATISTICS
Do not over-partition
While partitioning data can be very effective for maintaining your data through partition switching or optimizing
scans with partition elimination, having too many partitions can slow down your queries. Often a high
granularity partitioning strategy that works well on SQL Server may not work well on SQL Data Warehouse.
Having too many partitions can also reduce the effectiveness of clustered columnstore indexes if each partition has
fewer than 1 million rows. Keep in mind that behind the scenes, SQL Data Warehouse partitions your data for you
into 60 databases, so if you create a table with 100 partitions, this actually results in 6000 partitions. Each
workload is different so the best advice is to experiment with partitioning to see what works best for your
workload. Consider lower granularity than what may have worked for you in SQL Server. For example, consider
using weekly or monthly partitions rather than daily partitions.
See also Table partitioning
Other resources
Also see our Troubleshooting article for common issues and solutions.
If you didn't find what you were looking for in this article, try using the "Search for docs" on the left side of this
page to search all of the Azure SQL Data Warehouse documents. The Azure SQL Data Warehouse Forum is a
place for you to ask questions to other users and to the SQL Data Warehouse Product Group. We actively monitor
this forum to ensure that your questions are answered either by another user or one of us. If you prefer to ask your
questions on Stack Overflow, we also have an Azure SQL Data Warehouse Stack Overflow Forum.
Finally, please do use the Azure SQL Data Warehouse Feedback page to make feature requests. Adding your
requests or up-voting other requests really helps us prioritize features.
SQL Data Warehouse capacity limits
Maximum values allowed for various components of Azure SQL Data Warehouse.
Workload management
CATEGORY                      DESCRIPTION                                MAXIMUM
Data Warehouse Units (DWU)    Max DWU for a single SQL Data Warehouse    Gen1: DW6000; Gen2: DW30000c
Database objects
CATEGORY DESCRIPTION MAXIMUM
Loads
CATEGORY DESCRIPTION MAXIMUM
Queries
CATEGORY DESCRIPTION MAXIMUM
Metadata
SYSTEM VIEW                            MAXIMUM ROWS
sys.dm_pdw_component_health_alerts     10,000
sys.dm_pdw_dms_cores                   100
sys.dm_pdw_dms_workers                 Total number of DMS workers for the most recent 1000 SQL requests.
sys.dm_pdw_errors                      10,000
sys.dm_pdw_exec_requests               10,000
sys.dm_pdw_exec_sessions               10,000
sys.dm_pdw_request_steps               Total number of steps for the most recent 1000 SQL requests that are stored in sys.dm_pdw_exec_requests.
sys.dm_pdw_os_event_logs               10,000
sys.dm_pdw_sql_requests                The most recent 1000 SQL requests that are stored in sys.dm_pdw_exec_requests.
Next steps
For recommendations on using SQL Data Warehouse, see the Cheat Sheet.
SQL Data Warehouse Frequently asked questions
General
Q. What does SQL DW offer for data security?
A. SQL DW offers several solutions for protecting data such as TDE and auditing. For more information, see
Security.
Q. Where can I find out which legal or business standards SQL DW is compliant with?
A. Visit the Microsoft Compliance page for the various compliance offerings by product, such as SOC and ISO. First
choose by compliance title, then expand Azure in the Microsoft in-scope cloud services section on the right side of
the page to see which Azure services are compliant.
Q. Can I connect Power BI?
A. Yes! Though Power BI supports direct query with SQL DW, it's not intended for a large number of users or real-
time data. For production use of Power BI, we recommend using Power BI on top of Azure Analysis Services or
Analysis Services IaaS.
Q. What are SQL Data Warehouse Capacity Limits?
A. See our current capacity limits page.
Q. Why is my Scale/Pause/Resume taking so long?
A. A variety of factors can influence the time for compute management operations. A common case for long
running operations is transactional rollback. When a scale or pause operation is initiated, all incoming sessions are
blocked and queries are drained. In order to leave the system in a stable state, transactions must be rolled back
before an operation can commence. The greater the number of transactions and the larger their log size, the longer the
operation is stalled while it restores the system to a stable state.
User support
Q. I have a feature request, where do I submit it?
A. If you have a feature request, submit it on our UserVoice page.
Q. How can I do x?
A. For help in developing with SQL Data Warehouse, you can ask questions on our Stack Overflow page.
Q. How do I submit a support ticket?
A. Support Tickets can be filed through Azure portal.
Loading
Q. What client drivers do you support?
A. Driver support for SQL DW can be found on the Connection Strings page.
Q: What file formats are supported by PolyBase with SQL Data Warehouse?
A: ORC, RC, Parquet, and flat delimited text.
Q: What can I connect to from SQL DW using PolyBase?
A: Azure Data Lake Store and Azure Storage Blobs
Q: Is computation pushdown possible when connecting to Azure Storage Blobs or ADLS?
A: No. SQL DW PolyBase only interacts with the storage components.
Q: Can I connect to HDI?
A: HDI can use either ADLS or WASB as the HDFS layer. If you have either as your HDFS layer, then you can load
that data into SQL DW. However, you cannot push down computation to the HDI instance.
Next steps
For more information on SQL Data Warehouse as a whole, see our Overview page.
Azure SQL Data Warehouse release notes
This article summarizes the new features and improvements in the recent releases of Azure SQL Data Warehouse
(Azure SQL DW ). The article also lists notable content updates that aren't directly related to the release but
published in the same time frame. For improvements to other Azure services, see Service updates.
Use the date identified to confirm which release has been applied to your Azure SQL DW.
September 2019
SERVICE IMPROVEMENTS DETAILS
Azure Private Link (Preview) With Azure Private Link, you can create a private endpoint in
your Virtual Network (VNet) and map it to your Azure SQL
DW. These resources are then accessible over a private IP
address in your VNet, enabling connectivity from on-premises
through Azure ExpressRoute private peering and/or VPN
gateway. Overall, this simplifies the network configuration by
not requiring you to open it up to public IP addresses. This
also enables protection against data exfiltration risks. For more
details, see overview and SQL DW documentation.
Data Discovery & Classification (GA) Data discovery and classification feature is now Generally
Available. This feature provides advanced capabilities for
discovering, classifying, labeling & protecting sensitive
data in your databases.
Azure Advisor one-click Integration SQL Data Warehouse now directly integrates with Azure
Advisor recommendations in the overview blade along with
providing a one-click experience. You can now discover
recommendations in the overview blade instead of navigating
to the Azure advisor blade. Find out more about
recommendations here.
Read Committed Snapshot Isolation (Preview) You can use ALTER DATABASE to enable or disable snapshot
isolation for a user database. To avoid impact to your current
workload, you may want to set this option during database
maintenance window or wait until there is no other active
connection to the database. For more information, see Alter
database set options.
Additional T-SQL support The T-SQL language surface area for SQL Data Warehouse has
been extended to include support for:
- FORMAT (Transact-SQL)
- TRY_PARSE (Transact-SQL)
- TRY_CAST (Transact-SQL)
- TRY_CONVERT (Transact-SQL)
- sys.user_token (Transact-SQL)
July 2019
SERVICE IMPROVEMENTS DETAILS
Materialized View (Preview) A Materialized View persists the data returned from the view
definition query and automatically gets updated as data
changes in the underlying tables. It improves the performance
of complex queries (typically queries with joins and
aggregations) while offering simple maintenance operations.
For more information, see:
- CREATE MATERIALIZED VIEW AS SELECT (Transact-SQL)
- ALTER MATERIALIZED VIEW (Transact-SQL)
- T-SQL statements supported in Azure SQL Data Warehouse
Additional T-SQL support The T-SQL language surface area for SQL Data Warehouse has
been extended to include support for:
- AT TIME ZONE (Transact-SQL)
- STRING_AGG (Transact-SQL)
Result set caching (Preview) DBCC commands added to manage the previously announced
result set cache. For more information, see:
- DBCC DROPRESULTSETCACHE (Transact-SQL)
- DBCC SHOWRESULTCACHESPACEUSED (Transact-SQL)
Also see the new result_set_cache column in
sys.dm_pdw_exec_requests that shows when an executed
query used the result set cache.
May 2019
SERVICE IMPROVEMENTS DETAILS
Dynamic data masking (Preview) Dynamic Data Masking (DDM) prevents unauthorized access
to your sensitive data in your data warehouse by obfuscating
it on-the-fly in the query results, based on the masking rules
you define. For more information, see SQL Database dynamic
data masking.
Workload importance now Generally Available Workload Management Classification and Importance provide
the ability to influence the run order of queries. For more
information on workload importance, see the Classification
and Importance overview articles in the documentation. Check
out the CREATE WORKLOAD CLASSIFIER doc as well.
Additional T-SQL support The T-SQL language surface area for SQL Data Warehouse has
been extended to include support for:
- TRIM
JSON functions Business analysts can now use familiar T-SQL language to
query and manipulate documents that are formatted as JSON
data using the following new JSON functions in Azure SQL Data Warehouse:
- ISJSON
- JSON_VALUE
- JSON_QUERY
- JSON_MODIFY
- OPENJSON
Result set caching (Preview) Result-set caching enables instant query response times while
reducing time-to-insight for business analysts and reporting
users. For more information, see:
- ALTER DATABASE (Transact-SQL)
- ALTER DATABASE SET Options (Transact SQL)
- SET RESULT SET CACHING (Transact-SQL)
- SET Statement (Transact-SQL)
- sys.databases (Transact-SQL)
Ordered clustered columnstore index (Preview) Columnstore is a key enabler for storing and efficiently
querying large amounts of data. For each table, it divides the
incoming data into Row Groups and each column of a Row
Group forms a Segment on a disk. Ordered clustered
columnstore indexes further optimize query execution by
enabling efficient segment elimination. For more information,
see:
- CREATE TABLE (Azure SQL Data Warehouse)
- CREATE COLUMNSTORE INDEX (Transact-SQL).
March 2019
SERVICE IMPROVEMENTS DETAILS
Data Discovery & Classification Data Discovery & Classification is now available in public
preview for Azure SQL Data Warehouse. It’s critical to protect
sensitive data and the privacy of your customers. As your
business and customer data assets grow, it becomes
unmanageable to discover, classify, and protect your data. The
data discovery and classification feature that we’re introducing
natively with Azure SQL Data Warehouse helps make
protecting your data more manageable. The overall benefits of
this capability are:
• Meeting data privacy standards and regulatory compliance
requirements.
• Restricting access to and hardening the security of data
warehouses containing highly sensitive data.
• Monitoring and alerting on anomalous access to sensitive
data.
• Visualization of sensitive data in a central dashboard on the
Azure portal.
Improved accuracy for DWU used and CPU portal metrics SQL Data Warehouse significantly enhances metric accuracy in
the Azure portal. This release includes a fix to the CPU and
DWU Used metric definition to properly reflect your workload
across all compute nodes. Before this fix, metric values were
being underreported. Expect to see an increase in the DWU
used and CPU metrics in the Azure portal.
Row Level Security We introduced Row-level Security capability back in Nov 2017.
We’ve now extended this support to external tables as well.
Additionally, we’ve added support for calling non-deterministic
functions in the inline table-valued functions (inline TVFs)
required for defining a security filter predicate. This addition
allows you to specify IS_ROLEMEMBER(), USER_NAME() etc. in
the security filter predicate. For more information, please see
the examples in the Row-level Security documentation.
Additional T-SQL Support The T-SQL language surface area for SQL Data Warehouse has
been extended to include support for STRING_SPLIT (Transact-
SQL).
Documentation improvements
DOCUMENTATION IMPROVEMENTS DETAILS
January 2019
Service improvements
SERVICE IMPROVEMENTS DETAILS
Data Movement Enhancements for PartitionMove and BroadcastMove In Azure SQL Data Warehouse Gen2, data movement
steps of type ShuffleMove use instant data movement techniques. For more information, see the performance
enhancements blog. With this release, PartitionMove and BroadcastMove are now powered by the same instant data
movement techniques. User queries that use these types of data movement steps will run with improved performance.
No code change is required to take advantage of these performance improvements.
Documentation improvements
DOCUMENTATION IMPROVEMENTS DETAILS
none
December 2018
Service improvements
SERVICE IMPROVEMENTS DETAILS
Virtual Network Service Endpoints Generally Available This release includes general availability of Virtual Network
(VNet) Service Endpoints for Azure SQL Data Warehouse in all
Azure regions. VNet Service Endpoints enable you to isolate
connectivity to your logical server from a given subnet or set
of subnets within your virtual network. The traffic to Azure
SQL Data Warehouse from your VNet will always stay within
the Azure backbone network. This direct route will be
preferred over any specific routes that take Internet traffic
through virtual appliances or on-premises. No additional
billing is charged for virtual network access through service
endpoints. Current pricing model for Azure SQL Data
Warehouse applies as is.
Using Polybase you can also import data into Azure SQL Data
Warehouse from Azure Storage secured to VNet. Similarly,
exporting data from Azure SQL Data Warehouse to Azure
Storage secured to VNet is also supported via Polybase.
Automatic Performance Monitoring (Preview) Query Store is now available in Preview for Azure SQL Data
Warehouse. Query Store is designed to help you with query
performance troubleshooting by tracking queries, query plans,
runtime statistics, and query history to help you monitor the
activity and performance of your data warehouse. Query Store
is a set of internal stores and Dynamic Management Views
(DMVs) that allow you to:
Lower Compute Tiers for Azure SQL Data Warehouse Gen2 Azure SQL Data Warehouse Gen2 now supports lower
compute tiers. Customers can experience Azure SQL Data Warehouse's leading performance, flexibility, and security
features starting with 100 cDWU (Data Warehouse Units) and scale to 30,000 cDWU in minutes. Starting mid-December
2018, customers can benefit from Gen2 performance and flexibility with lower compute tiers in supported regions,
with the rest of the regions available during 2019.
Columnstore Background Merge By default, Azure SQL Data Warehouse (Azure SQL DW) stores
data in columnar format, with micro-partitions called
rowgroups. Sometimes, due to memory constraints at index
build or data load time, the rowgroups may be compressed
with less than the optimal size of one million rows. Rowgroups
may also become fragmented due to deletes. Small or
fragmented rowgroups result in higher memory consumption,
as well as inefficient query execution. With this release of
Azure SQL DW, the columnstore background maintenance
task merges small compressed rowgroups to create larger
rowgroups to better utilize memory and speed up query
execution.
October 2018
Service improvements
SERVICE IMPROVEMENTS DETAILS
DevOps for Data Warehousing The highly requested feature for SQL Data Warehouse (SQL
DW) is now in preview with the support for SQL Server Data
Tool (SSDT) in Visual Studio! Teams of developers can now
collaborate over a single, version-controlled codebase and
quickly deploy changes to any instance in the world.
Interested in joining? This feature is available for preview
today! You can register by visiting the SQL Data Warehouse
Visual Studio SQL Server Data Tools (SSDT) - Preview
Enrollment form. Given the high demand, we are managing
acceptance into preview to ensure the best experience for our
customers. Once you sign up, our goal is to confirm your
status within seven business days.
Row Level Security Generally Available Azure SQL Data Warehouse (SQL DW) now supports row level
security (RLS) adding a powerful capability to secure your
sensitive data. With the introduction of RLS, you can
implement security policies to control access to rows in your
tables, as in who can access what rows. RLS enables this fine-
grained access control without having to redesign your data
warehouse. RLS simplifies the overall security model as the
access restriction logic is located in the database tier itself
rather than away from the data in another application. RLS
also eliminates the need to introduce views to filter out rows
for access control management. There is no additional cost for
this enterprise-grade security feature for all our customers.
Advanced Advisors Advanced tuning for Azure SQL Data Warehouse (SQL DW)
just got simpler with additional data warehouse
recommendations and metrics. There are additional advanced
performance recommendations through Azure Advisor at your
disposal, including:
Advanced tuning with integrated advisors Advanced tuning for Azure SQL Data Warehouse (SQL DW)
just got simpler with additional data warehouse
recommendations and metrics and a redesign of the portal
overview blade that provides an integrated experience with
Azure Advisor and Azure Monitor.
Accelerated Database Recovery (ADR) Azure SQL Data Warehouse Accelerated Database Recovery
(ADR) is now in Public Preview. ADR is a new SQL Server database engine feature that greatly improves database
availability, especially in the presence of long running transactions, by completely redesigning the current
recovery process from the ground up.
The primary benefits of ADR are fast and consistent database
recovery and instantaneous transaction rollback.
Azure Monitor diagnostics logs SQL Data Warehouse (SQL DW) now enables enhanced
insights into analytical workloads by integrating directly with
Azure Monitor diagnostic logs. This new capability enables
developers to analyze workload behavior over an extended
time period and make informed decisions on query
optimization or capacity management. We have now
introduced an external logging process through Azure
Monitor diagnostic logs that provide additional insights into
your data warehouse workload. With a single click of a button,
you are now able to configure diagnostic logs for historical
query performance troubleshooting capabilities using Log
Analytics. Azure Monitor diagnostic logs support customizable
retention periods by saving the logs to a storage account for
auditing purposes, the capability to stream logs to Event Hubs for
near real-time telemetry insights, and the ability to analyze
logs using Log Analytics with log queries. Diagnostic logs
consist of telemetry views of your data warehouse equivalent
to the most commonly used performance troubleshooting
DMVs for SQL Data Warehouse. For this initial release, we
have enabled views for the following system dynamic
management views:
• sys.dm_pdw_exec_requests
• sys.dm_pdw_request_steps
• sys.dm_pdw_dms_workers
• sys.dm_pdw_waits
• sys.dm_pdw_sql_requests
Columnstore memory management As the number of compressed column store row groups
increases, the memory required to manage the internal
column segment metadata for those rowgroups increases. As
a result, query performance and queries executed against
some of the Columnstore Dynamic Management Views
(DMVs) can degrade. Improvements have been made in this release
to optimize the size of the internal metadata for these cases,
leading to improved experience and performance for such
queries.
Azure Data Lake Storage Gen2 integration (GA) Azure SQL Data Warehouse (SQL DW) now has native
integration with Azure Data Lake Storage Gen2. Customers
can now load data using external tables from ABFS into SQL
DW. This functionality enables customers to integrate with
their data lakes in Data Lake Storage Gen2.
Next steps
Create a SQL Data Warehouse
More information
Blog - Azure SQL Data Warehouse
Customer Advisory Team blogs
Customer success stories
Stack Overflow forum
Twitter
Videos
Azure glossary
Upgrade your data warehouse to Gen2
Microsoft is helping drive down the entry-level cost of running a data warehouse. Lower compute tiers capable of
handling demanding queries are now available for Azure SQL Data Warehouse. Read the full announcement
Lower compute tier support for Gen2. The new offering is available in the regions noted in the table below. For
supported regions, existing Gen1 data warehouses can be upgraded to Gen2 through either:
The automatic upgrade process: Automatic upgrades don't start as soon as the service is available in a
region. When automatic upgrades start in a specific region, individual DW upgrades will take place during your
selected maintenance schedule.
Self-upgrade to Gen2: You can control when to upgrade by doing a self-upgrade to Gen2. If your region is not
yet supported, you can restore from a restore point directly to a Gen2 instance in a supported region.
China East * *
China North * *
Germany Central * *
Self-upgrade to Gen2
You can choose to self-upgrade by following these steps on an existing Gen1 data warehouse. If you choose to self-
upgrade, you must complete it before the automatic upgrade process begins in your region. Doing so ensures that
you avoid any risk of the automatic upgrades causing a conflict.
There are two options when conducting a self-upgrade. You can either upgrade your current data warehouse in-
place or you can restore a Gen1 data warehouse into a Gen2 instance.
Upgrade in-place - This option will upgrade your existing Gen1 data warehouse to Gen2. The upgrade
process will involve a brief drop in connectivity (approximately 5 min) as we restart your data warehouse.
Once your data warehouse has been restarted, it will be fully available for use. If you experience issues
during the upgrade, open a support request and reference “Gen2 upgrade” as the possible cause.
Upgrade from restore point - Create a user-defined restore point on your current Gen1 data warehouse and
then restore directly to a Gen2 instance. The existing Gen1 data warehouse will stay in place. Once the
restore has been completed, your Gen2 data warehouse will be fully available for use. Once you have run all
testing and validation processes on the restored Gen2 instance, the original Gen1 instance can be deleted.
Step 1: From the Azure portal, create a user-defined restore point.
Step 2: When restoring from a user-defined restore point, set the "performance Level" to your preferred
Gen2 tier.
You may experience a period of degradation in performance while the upgrade process continues to upgrade the
data files in the background. The total time for the performance degradation will vary dependent on the size of
your data files.
To expedite the background data migration process, you can immediately force data movement by running Alter
Index rebuild on all primary columnstore tables you'd be querying at a larger SLO and resource class.
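A minimal sketch of forcing that rebuild on one table (the table name is an example); run it from a larger resource class:

ALTER INDEX ALL ON dbo.FactOrders REBUILD;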
NOTE
Alter Index rebuild is an offline operation and the tables will not be available until the rebuild completes.
If you encounter any issues with your data warehouse, create a support request and reference “Gen2 upgrade” as
the possible cause.
For more information, see Upgrade to Gen2.
Next steps
Upgrade steps
Maintenance windows
Resource health monitor
Review Before you begin a migration
Upgrade in-place and upgrade from a restore point
Create a user-defined restore point
Learn How to restore to Gen2
Open a SQL Data Warehouse support request
Quickstart: Create and query an Azure SQL Data
Warehouse in the Azure portal
Quickly create and query an Azure SQL Data Warehouse by using the Azure portal.
If you don't have an Azure subscription, create a free account before you begin.
NOTE
Creating a SQL Data Warehouse may result in a new billable service. For more information, see SQL Data Warehouse
pricing.
Server name Any globally unique name For valid server names, see Naming
rules and restrictions.
Server admin login Any valid name For valid login names, see Database
Identifiers.
5. Click Select.
6. Click Performance level to specify the performance configuration for the data warehouse.
7. For this tutorial, select Gen2. The slider, by default, is set to DW1000c. Try moving it up and down to see
how it works.
8. Click Apply.
9. Now that you've completed the SQL Data Warehouse form, click Create to provision the database.
Provisioning takes a few minutes.
10. On the toolbar, click Notifications to monitor the deployment process.
NOTE
SQL Data Warehouse communicates over port 1433. If you are trying to connect from within a corporate network,
outbound traffic over port 1433 might not be allowed by your network's firewall. If so, you cannot connect to your Azure
SQL Database server unless your IT department opens port 1433.
1. After the deployment completes, select All services from the left-hand menu. Select Databases, select the
star next to SQL data warehouses to add SQL data warehouses to your favorites.
2. Select SQL data warehouses from the left-hand menu and then click mySampleDataWarehouse on
the SQL data warehouses page. The overview page for your database opens, showing you the fully
qualified server name (such as mynewserver-20180430.database.windows.net) and provides options
for further configuration.
3. Copy this fully qualified server name for use to connect to your server and its databases in this and other
quick starts. To open server settings, click the server name.
5. The Firewall settings page for the SQL Database server opens.
6. To add your current IP address to a new firewall rule, click Add client IP on the toolbar. A firewall rule can
open port 1433 for a single IP address or a range of IP addresses.
7. Click Save. A server-level firewall rule is created for your current IP address opening port 1433 on the
logical server.
8. Click OK and then close the Firewall settings page.
You can now connect to the SQL server and its data warehouses using this IP address. The connection works
from SQL Server Management Studio or another tool of your choice. When you connect, use the ServerAdmin
account you created previously.
IMPORTANT
By default, access through the SQL Database firewall is enabled for all Azure services. Click OFF on this page and then click
Save to disable the firewall for all Azure services.
Server name The fully qualified server name Here's an example: mynewserver-
20180430.database.windows.net.
Login The server admin account Account that you specified when you
created the server.
Password The password for your server admin account The password that you specified when you created the server.
3. Click Execute. The query results show two databases: master and mySampleDataWarehouse.
4. To look at some data, use the following command to see the number of customers with last name of
Adams that have three children at home. The results list six customers.
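The queries themselves were lost in extraction; a minimal sketch of what they look like, assuming the AdventureWorksDW-based sample database (table and column names are assumptions):

SELECT * FROM sys.databases;

SELECT LastName, FirstName
FROM dbo.DimCustomer
WHERE LastName = 'Adams' AND NumberChildrenAtHome = 3;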
Clean up resources
You're being charged for data warehouse units and data stored in your data warehouse. These compute and storage
resources are billed separately.
If you want to keep the data in storage, you can pause compute when you aren't using the data warehouse. By
pausing compute, you're only charged for data storage. You can resume compute whenever you're ready to
work with the data.
If you want to remove future charges, you can delete the data warehouse.
Follow these steps to clean up resources you no longer need.
1. Sign in to the Azure portal, click on your data warehouse.
2. To pause compute, click the Pause button. When the data warehouse is paused, you see a Resume button.
To resume compute, click Resume.
3. To remove the data warehouse so you aren't charged for compute or storage, click Delete.
4. To remove the SQL server you created, click mynewserver-20180430.database.windows.net in the
previous image, and then click Delete. Be careful with this deletion, since deleting the server also deletes
all databases assigned to the server.
5. To remove the resource group, click myResourceGroup, and then click Delete resource group.
Next steps
You've now created a data warehouse, created a firewall rule, connected to your data warehouse, and run a few
queries. To learn more about Azure SQL Data Warehouse, continue to the tutorial for loading data.
Load data into a SQL Data Warehouse
Quickstart: Create and query an Azure SQL Data
Warehouse with Azure PowerShell
NOTE
Creating a SQL Data Warehouse may result in a new billable service. For more information, see SQL Data Warehouse pricing.
NOTE
This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will
continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM
compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure
PowerShell.
Sign in to Azure
Sign in to your Azure subscription using the Connect-AzAccount command and follow the on-screen directions.
Connect-AzAccount
Get-AzSubscription
If you need to use a different subscription than the default, run Set-AzContext.
Create variables
Define variables for use in the scripts in this quickstart.
# The data center and resource name for your resources
$resourcegroupname = "myResourceGroup"
$location = "WestEurope"
# The logical server name: Use a random value or replace with your own value (don't capitalize)
$servername = "server-$(Get-Random)"
# Set an admin name and password for your database
# The sign-in information for the server
$adminlogin = "ServerAdmin"
$password = "ChangeYourAdminPassword1"
# The ip address range that you want to allow to access your server - change as appropriate
$startip = "0.0.0.0"
$endip = "0.0.0.0"
# The database name
$databasename = "mySampleDataWarehouse"
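The steps that create the resource group, the logical server, and the firewall rule were dropped in extraction; a minimal sketch using the variables above and standard Az cmdlets:

# Create a resource group
New-AzResourceGroup -Name $resourcegroupname -Location $location

# Create a logical SQL server with the admin credentials defined above
New-AzSqlServer -ResourceGroupName $resourcegroupname `
    -ServerName $servername `
    -Location $location `
    -SqlAdministratorCredentials $(New-Object -TypeName System.Management.Automation.PSCredential `
        -ArgumentList $adminlogin, $(ConvertTo-SecureString -String $password -AsPlainText -Force))

# Create a server firewall rule for the IP range defined above
New-AzSqlServerFirewallRule -ResourceGroupName $resourcegroupname `
    -ServerName $servername `
    -FirewallRuleName "AllowedIPs" `
    -StartIpAddress $startip -EndIpAddress $endip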
New-AzSqlDatabase `
-ResourceGroupName $resourcegroupname `
-ServerName $servername `
-DatabaseName $databasename `
-Edition "DataWarehouse" `
-RequestedServiceObjectiveName "DW100c" `
-CollationName "SQL_Latin1_General_CP1_CI_AS" `
-MaxSizeBytes 10995116277760
Clean up resources
Other quickstart tutorials in this collection build upon this quickstart.
TIP
If you plan to continue on to work with later quickstart tutorials, don't clean up the resources created in this quickstart. If you
don't plan to continue, use the following steps to delete all resources created by this quickstart in the Azure portal.
Next steps
You've now created a data warehouse, created a firewall rule, connected to your data warehouse, and run a few
queries. To learn more about Azure SQL Data Warehouse, continue to the tutorial for loading data.
Load data into a SQL Data Warehouse
Quickstart: Pause and resume compute for an Azure
SQL Data Warehouse in the Azure portal
Use the Azure portal to pause compute in Azure SQL Data Warehouse to save costs. Resume compute when you
are ready to use the data warehouse.
If you don't have an Azure subscription, create a free account before you begin.
Pause compute
To save costs, you can pause and resume compute resources on-demand. For example, if you won't be using the
database during the night and on weekends, you can pause it during those times, and resume it during the day.
You won't be charged for compute resources while the database is paused. However, you will continue to be
charged for storage.
Follow these steps to pause a SQL Data Warehouse.
1. Click SQL databases in the left page of the Azure portal.
2. Select mySampleDataWarehouse from the SQL databases page. This opens the data warehouse.
3. On the mySampleDataWarehouse page, notice Status is Online.
4. To pause the data warehouse, click the Pause button.
5. A confirmation question appears asking if you want to continue. Click Yes.
6. Wait a few moments, and then notice the Status is Pausing.
7. When the pause operation is complete, the status is Paused and the option button is Start.
8. The compute resources for the data warehouse are now offline. You won't be charged for compute until you
resume the service.
Resume compute
Follow these steps to resume a SQL Data Warehouse.
1. Click SQL databases in the left page of the Azure portal.
2. Select mySampleDataWarehouse from the SQL databases page. This opens the data warehouse.
3. On the mySampleDataWarehouse page, notice Status is Paused.
4. To resume the data warehouse, click Start.
5. A confirmation question appears asking if you want to start. Click Yes.
6. Notice the Status is Resuming.
7. When the data warehouse is back online, the status is Online and the option button is Pause.
8. The compute resources for the data warehouse are now online and you can use the service. Charges for
compute have resumed.
Clean up resources
You are being charged for data warehouse units and the data stored in your data warehouse. These compute and
storage resources are billed separately.
If you want to keep the data in storage, pause compute.
If you want to remove future charges, you can delete the data warehouse.
Follow these steps to clean up resources as you desire.
1. Sign in to the Azure portal, and click on your data warehouse.
2. To pause compute, click the Pause button. When the data warehouse is paused, you see a Start button. To
resume compute, click Start.
3. To remove the data warehouse so you are not charged for compute or storage, click Delete.
4. To remove the SQL server you created, click mynewserver-20171113.database.windows.net, and then
click Delete. Be careful with this deletion, since deleting the server also deletes all databases assigned to
the server.
5. To remove the resource group, click myResourceGroup, and then click Delete resource group.
Next steps
You have now paused and resumed compute for your data warehouse. To learn more about Azure SQL Data
Warehouse, continue to the tutorial for loading data.
Load data into a SQL Data Warehouse
Quickstart: Pause and resume compute in Azure SQL
Data Warehouse with Azure PowerShell
Use PowerShell to pause compute in Azure SQL Data Warehouse to save costs. Resume compute when you are
ready to use the data warehouse.
If you don't have an Azure subscription, create a free account before you begin.
This quickstart assumes you already have a SQL Data Warehouse that you can pause and resume. If you need to
create one, you can use Create and Connect - portal to create a data warehouse called
mySampleDataWarehouse.
Log in to Azure
Log in to your Azure subscription using the Connect-AzAccount command and follow the on-screen directions.
Connect-AzAccount
Get-AzSubscription
If you need to use a different subscription than the default, run Set-AzContext.
Pause compute
To save costs, you can pause and resume compute resources on-demand. For example, if you are not using the
database during the night and on weekends, you can pause it during those times, and resume it during the day.
There is no charge for compute resources while the database is paused. However, you continue to be charged for
storage.
To pause a database, use the Suspend-AzSqlDatabase cmdlet. The following example pauses a data warehouse
named mySampleDataWarehouse hosted on a server named newserver-20171113. The server is in an Azure
resource group named myResourceGroup.
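The cmdlet call itself was lost in extraction; a minimal sketch using the names from the text:

Suspend-AzSqlDatabase -ResourceGroupName "myResourceGroup" `
    -ServerName "newserver-20171113" -DatabaseName "mySampleDataWarehouse"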
As a variation, this next example retrieves the database into the $database object. It then pipes the object to Suspend-
AzSqlDatabase. The results are stored in the object $resultDatabase. The final command shows the results.
Resume compute
To start a database, use the Resume-AzSqlDatabase cmdlet. The following example starts a database named
mySampleDataWarehouse hosted on a server named newserver-20171113. The server is in an Azure resource
group named myResourceGroup.
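The example itself was lost in extraction; a minimal sketch using the names from the text:

Resume-AzSqlDatabase -ResourceGroupName "myResourceGroup" `
    -ServerName "newserver-20171113" -DatabaseName "mySampleDataWarehouse"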
As a variation, this next example retrieves the database into the $database object. It then pipes the object to Resume-
AzSqlDatabase and stores the results in $resultDatabase. The final command shows the results.
$database = Get-AzSqlDatabase -ResourceGroupName "ResourceGroup1" `
    -ServerName "Server01" -DatabaseName "Database02"
$resultDatabase = $database | Resume-AzSqlDatabase
$resultDatabase
Clean up resources
You are being charged for data warehouse units and data stored in your data warehouse. These compute and storage
resources are billed separately.
If you want to keep the data in storage, pause compute.
If you want to remove future charges, you can delete the data warehouse.
Follow these steps to clean up resources as you desire.
1. Sign in to the Azure portal, and click on your data warehouse.
2. To pause compute, click the Pause button. When the data warehouse is paused, you see a Start button. To
resume compute, click Start.
3. To remove the data warehouse so you are not charged for compute or storage, click Delete.
4. To remove the SQL server you created, click mynewserver-20171113.database.windows.net, and then
click Delete. Be careful with this deletion, since deleting the server also deletes all databases assigned to the
server.
5. To remove the resource group, click myResourceGroup, and then click Delete resource group.
Next steps
You have now paused and resumed compute for your data warehouse. To learn more about Azure SQL Data
Warehouse, continue to the tutorial for loading data.
Load data into a SQL Data Warehouse
Quickstart: Scale compute in Azure SQL Data
Warehouse in the Azure portal
8/18/2019 • 2 minutes to read • Edit Online
Scale compute in Azure SQL Data Warehouse in the Azure portal. Scale out compute for better performance, or
scale back compute to save costs.
If you don't have an Azure subscription, create a free account before you begin.
NOTE
Your data warehouse must be online to scale.
Scale compute
SQL Data Warehouse compute resources can be scaled by increasing or decreasing data warehouse units. The
Create and Connect - portal quickstart created mySampleDataWarehouse
and initialized it with 400 DWUs. The following steps adjust the DWUs for mySampleDataWarehouse.
To change data warehouse units:
1. Click SQL data warehouses in the left pane of the Azure portal.
2. Select mySampleDataWarehouse from the SQL data warehouses page. The data warehouse opens.
3. Click Scale.
4. In the Scale panel, move the slider left or right to change the DWU setting.
Quickstart: Scale compute in Azure SQL Data Warehouse with Azure PowerShell
Scale compute in Azure SQL Data Warehouse using Azure PowerShell. Scale out compute for better performance,
or scale back compute to save costs.
If you don't have an Azure subscription, create a free account before you begin.
This quickstart assumes you already have a SQL Data Warehouse that you can scale. If you need to create one, use
Create and Connect - portal to create a data warehouse called mySampleDataWarehouse.
Log in to Azure
Log in to your Azure subscription using the Connect-AzAccount command and follow the on-screen directions.
Connect-AzAccount
Get-AzSubscription
If you need to use a different subscription than the default, run Set-AzContext.
Scale compute
In SQL Data Warehouse, you can increase or decrease compute resources by adjusting data warehouse units. The
Create and Connect - portal created mySampleDataWarehouse and initialized it with 400 DWUs. The following
steps adjust the DWUs for mySampleDataWarehouse.
To change data warehouse units, use the Set-AzSqlDatabase PowerShell cmdlet. The following example sets the
data warehouse units to DW300c for the database mySampleDataWarehouse, which is hosted in the resource
group myResourceGroup on the server mynewserver-20180430.
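A minimal sketch of that command; the -RequestedServiceObjectiveName parameter carries the DWU setting:
Set-AzSqlDatabase -ResourceGroupName "myResourceGroup" -ServerName "mynewserver-20180430" `
    -DatabaseName "mySampleDataWarehouse" -RequestedServiceObjectiveName "DW300c"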
You can see the Status of the database in the output. In this case, you can see that this database is online. When
you run this command, you should receive a Status value of Online, Pausing, Resuming, Scaling, or Paused.
To see the status by itself, use the following command:
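A sketch that retrieves the database object and prints only its Status property:
$database = Get-AzSqlDatabase -ResourceGroupName "myResourceGroup" -ServerName "mynewserver-20180430" `
    -DatabaseName "mySampleDataWarehouse"
$database.Status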
Next steps
You have now learned how to scale compute for your data warehouse. To learn more about Azure SQL Data
Warehouse, continue to the tutorial for loading data.
Load data into a SQL Data Warehouse
Quickstart: Scale compute in Azure SQL Data
Warehouse using T-SQL
8/18/2019 • 3 minutes to read • Edit Online
Scale compute in Azure SQL Data Warehouse using T-SQL and SQL Server Management Studio (SSMS ). Scale
out compute for better performance, or scale back compute to save costs.
If you don't have an Azure subscription, create a free account before you begin.
Server name: The fully qualified server name. Here's an example: mynewserver-20171113.database.windows.net.
Login: The server admin account. The account that you specified when you created the server.
Password: The password for your server admin account. This is the password that you specified when you created the server.
3. Click Connect. The Object Explorer window opens in SSMS.
4. In Object Explorer, expand Databases. Then expand mySampleDatabase to view the objects in your new
database.
View service objective
The service objective setting contains the number of data warehouse units for the data warehouse.
To view the current data warehouse units for your data warehouse:
1. Under the connection to mynewserver-20171113.database.windows.net, expand System Databases.
2. Right-click master and select New Query. A new query window opens.
3. Run the following query to select from the sys.database_service_objectives dynamic management view.
SELECT
db.name [Database]
, ds.edition [Edition]
, ds.service_objective [Service Objective]
FROM
sys.database_service_objectives ds
JOIN
sys.databases db ON ds.database_id = db.database_id
WHERE
db.name = 'mySampleDataWarehouse'
Scale compute
In SQL Data Warehouse, you can increase or decrease compute resources by adjusting data warehouse units. The
Create and Connect - portal created mySampleDataWarehouse and initialized it with 400 DWUs. The following
steps adjust the DWUs for mySampleDataWarehouse.
To change data warehouse units:
1. Right-click master and select New Query.
2. Use the ALTER DATABASE T-SQL statement to modify the service objective. Run the following query to
change the service objective to DW300.
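A minimal sketch of that statement; run it against the master database. The statement returns immediately while
the scale operation itself completes asynchronously, which is why the WHILE loop that follows polls
sys.dm_operation_status until the operation is no longer in progress:
ALTER DATABASE mySampleDataWarehouse
MODIFY (SERVICE_OBJECTIVE = 'DW300');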
WHILE
(
    SELECT TOP 1 state_desc
    FROM sys.dm_operation_status
    WHERE
        1=1
        AND resource_type_desc = 'Database'
        AND major_resource_id = 'MySampleDataWarehouse'
        AND operation = 'ALTER DATABASE'
    ORDER BY start_time DESC
) = 'IN_PROGRESS'
BEGIN
    RAISERROR('Scale operation in progress', 0, 0) WITH NOWAIT;
    WAITFOR DELAY '00:00:05';
END
PRINT 'Complete';
SELECT *
FROM
sys.dm_operation_status
WHERE
resource_type_desc = 'Database'
AND
major_resource_id = 'MySampleDataWarehouse'
Next steps
You've now learned how to scale compute for your data warehouse. To learn more about Azure SQL Data
Warehouse, continue to the tutorial for loading data.
Load data into a SQL Data Warehouse
Quickstart: Create a workload classifier using T-SQL
8/18/2019 • 2 minutes to read • Edit Online
In this quickstart, you'll quickly create a workload classifier with high importance for the CEO of your organization.
This workload classifier will allow CEO queries to take precedence over other queries with lower importance in the
queue.
If you don't have an Azure subscription, create a free account before you begin.
NOTE
Creating a SQL Data Warehouse may result in a new billable service. For more information, see SQL Data Warehouse pricing.
Prerequisites
This quickstart assumes you already have a SQL Data Warehouse and that you have CONTROL DATABASE
permissions. If you need to create one, use Create and Connect - portal to create a data warehouse called
mySampleDataWarehouse.
Create user
Create user, "TheCEO", in mySampleDataWarehouse
Clean up resources
DROP WORKLOAD CLASSIFIER [wgcTheCEO];
DROP USER [TheCEO];
You're being charged for data warehouse units and data stored in your data warehouse. These compute and
storage resources are billed separately.
If you want to keep the data in storage, you can pause compute when you aren't using the data warehouse. By
pausing compute, you're only charged for data storage. When you're ready to work with the data, resume
compute.
If you want to remove future charges, you can delete the data warehouse.
Follow these steps to clean up resources.
1. Sign in to the Azure portal and select your data warehouse.
2. To pause compute, select the Pause button. When the data warehouse is paused, you see a Start button. To
resume compute, select Start.
3. To remove the data warehouse so you're not charged for compute or storage, select Delete.
4. To remove the SQL server you created, select mynewserver-20180430.database.windows.net, and then
select Delete. Be careful with this deletion, since deleting the server also deletes
all databases assigned to the server.
5. To remove the resource group, select myResourceGroup, and then select Delete resource group.
Next steps
You've now created a workload classifier. Run a few queries as TheCEO to see how they perform. See
sys.dm_pdw_exec_requests to view queries and the importance assigned.
For more information about Azure SQL Data Warehouse workload management, see Workload Importance
and Workload Classification.
See the how-to articles to Configure Workload Importance and how to Manage and monitor Workload
Management.
Secure a database in SQL Data Warehouse
3/18/2019 • 4 minutes to read • Edit Online
This article walks through the basics of securing your Azure SQL Data Warehouse database. In particular, this
article gets you started with resources for limiting access, protecting data, and monitoring activities on a
database.
Connection security
Connection Security refers to how you restrict and secure connections to your database using firewall rules and
connection encryption.
Firewall rules are used by both the server and the database to reject connection attempts from IP addresses that
have not been explicitly whitelisted. To allow connections from your application or client machine's public IP
address, you must first create a server-level firewall rule using the Azure portal, REST API, or PowerShell. As a
best practice, you should restrict the IP address ranges allowed through your server firewall as much as possible.
To access Azure SQL Data Warehouse from your local computer, ensure the firewall on your network and local
computer allows outgoing communication on TCP port 1433.
SQL Data Warehouse uses server-level firewall rules. It does not support database-level firewall rules. For more
information, see Azure SQL Database firewall, sp_set_firewall_rule.
Connections to your SQL Data Warehouse are encrypted by default. Attempts to modify connection settings to
disable encryption are ignored.
Authentication
Authentication refers to how you prove your identity when connecting to the database. SQL Data Warehouse
currently supports SQL Server Authentication with a username and password, and with Azure Active Directory.
When you created the logical server for your database, you specified a "server admin" login with a username and
password. Using these credentials, you can authenticate to any database on that server as the database owner, or
"dbo" through SQL Server Authentication.
However, as a best practice, your organization’s users should use a different account to authenticate. This way you
can limit the permissions granted to the application and reduce the risks of malicious activity in case your
application code is vulnerable to a SQL injection attack.
To create a SQL Server Authenticated user, connect to the master database on your server with your server
admin login and create a new server login. Additionally, it is a good idea to create a user in the master database
for Azure SQL Data Warehouse users. Creating a user in master allows a user to log in using tools like SSMS
without specifying a database name. It also allows them to use the object explorer to view all databases on a SQL
server.
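A minimal sketch of those steps, run against the master database; ApplicationLogin and ApplicationUser match
the example that follows, and the password is a placeholder:
-- Connect to master, create a login, and optionally create a matching user in master
CREATE LOGIN ApplicationLogin WITH PASSWORD = '<Strong password here>';
CREATE USER ApplicationUser FOR LOGIN ApplicationLogin;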
Then, connect to your SQL Data Warehouse database with your server admin login and create a database user
based on the server login you created.
-- Connect to SQL DW database and create a database user
CREATE USER ApplicationUser FOR LOGIN ApplicationLogin;
To give a user permission to perform additional operations such as creating logins or creating new databases,
assign the user to the Loginmanager and dbmanager roles in the master database. For more information on these
additional roles and authenticating to a SQL Database, see Managing databases and logins in Azure SQL
Database. For more information, see Connecting to SQL Data Warehouse By Using Azure Active Directory
Authentication.
Authorization
Authorization refers to what you can do within an Azure SQL Data Warehouse database. Authorization privileges
are determined by role memberships and permissions. As a best practice, you should grant users the least
privileges necessary. To manage roles, you can use the following stored procedures:
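A sketch of the typical calls, assuming the ApplicationUser account from this article; sp_addrolemember and
sp_droprolemember add and remove role members:
-- Add ApplicationUser to the built-in read and write roles
EXEC sp_addrolemember 'db_datareader', 'ApplicationUser';
EXEC sp_addrolemember 'db_datawriter', 'ApplicationUser';
-- Remove ApplicationUser from a role
EXEC sp_droprolemember 'db_datawriter', 'ApplicationUser';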
The server admin account you are connecting with is a member of db_owner, which has authority to do anything
within the database. Save this account for deploying schema upgrades and other management operations. Use
the "ApplicationUser" account with more limited permissions to connect from your application to the database
with the least privileges needed by your application.
There are ways to further limit what a user can do with Azure SQL Data Warehouse:
Granular Permissions let you control which operations you can do on individual columns, tables, views,
schemas, procedures, and other objects in the database. Use granular permissions to have the most control
and grant the minimum permissions necessary.
Database roles other than db_datareader and db_datawriter can be used to create more powerful application
user accounts or less powerful management accounts. The built-in fixed database roles provide an easy way to
grant permissions, but can result in granting more permissions than are necessary.
Stored procedures can be used to limit the actions that can be taken on the database.
The following example grants read access to a user-defined schema.
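A minimal sketch, assuming a hypothetical schema named Sales and the ApplicationUser account from earlier:
GRANT SELECT ON SCHEMA::Sales TO ApplicationUser;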
Managing databases and logical servers from the Azure portal or using the Azure Resource Manager API is
controlled by your portal user account's role assignments. For more information, see Role-based access control in
Azure portal.
Encryption
Azure SQL Data Warehouse Transparent Data Encryption (TDE ) helps protect against the threat of malicious
activity by encrypting and decrypting your data at rest. When you encrypt your database, associated backups and
transaction log files are encrypted without requiring any changes to your applications. TDE encrypts the storage
of an entire database by using a symmetric key called the database encryption key.
In SQL Database, the database encryption key is protected by a built-in server certificate. The built-in server
certificate is unique for each SQL Database server. Microsoft automatically rotates these certificates at least every
90 days. The encryption algorithm used by SQL Data Warehouse is AES -256. For a general description of TDE,
see Transparent Data Encryption.
You can encrypt your database using the Azure portal or T-SQL.
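For T-SQL, a minimal sketch follows; run it while connected to the master database on your server, substituting
your own database name:
ALTER DATABASE mySampleDataWarehouse SET ENCRYPTION ON;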
Next steps
For details and examples on connecting to your SQL Data Warehouse with different protocols, see Connect to
SQL Data Warehouse.
Advanced data security for Azure SQL Database
10/27/2019 • 3 minutes to read • Edit Online
Advanced data security is a unified package for advanced SQL security capabilities. It includes functionality for
discovering and classifying sensitive data, surfacing and mitigating potential database vulnerabilities, and
detecting anomalous activities that could indicate a threat to your database. It provides a single go-to location for
enabling and managing these capabilities.
Overview
Advanced data security (ADS ) provides a set of advanced SQL security capabilities, including data discovery &
classification, vulnerability assessment, and Advanced Threat Protection.
Data discovery & classification provides capabilities built into Azure SQL Database for discovering, classifying,
labeling & protecting the sensitive data in your databases. It can be used to provide visibility into your database
classification state, and to track the access to sensitive data within the database and beyond its borders.
Vulnerability assessment is an easy to configure service that can discover, track, and help you remediate
potential database vulnerabilities. It provides visibility into your security state, and includes actionable steps to
resolve security issues, and enhance your database fortifications.
Advanced Threat Protection detects anomalous activities indicating unusual and potentially harmful attempts
to access or exploit your database. It continuously monitors your database for suspicious activities, and
provides immediate security alerts on potential vulnerabilities, SQL injection attacks, and anomalous database
access patterns. Advanced Threat Protection alerts provide details of the suspicious activity and recommend
action on how to investigate and mitigate the threat.
Enable SQL ADS once to enable all of these included features. With one click, you can enable ADS for all
databases on your SQL Database server or managed instance. Enabling or managing ADS settings requires
belonging to the SQL security manager role, SQL database admin role or SQL server admin role.
ADS pricing aligns with Azure Security Center standard tier, where each protected SQL Database server or
managed instance is counted as one node. Newly protected resources qualify for a free trial of Security Center
standard tier. For more information, see the Azure Security Center pricing page.
1. Enable ADS
Enable ADS by navigating to Advanced Data Security under the Security heading for your SQL Database
server or managed instance. To enable ADS for all databases on the database server or managed instance, click
Enable Advanced Data Security on the server.
NOTE
A storage account is automatically created and configured to store your Vulnerability Assessment scan results. If you've
already enabled ADS for another server in the same resource group and region, then the existing storage account is used.
NOTE
The cost of ADS is aligned with Azure Security Center standard tier pricing per node, where a node is the entire SQL
Database server or managed instance. You are thus paying only once for protecting all databases on the database server or
managed instance with ADS. You can try ADS out initially with a free trial.
Next steps
Learn more about data discovery & classification
Learn more about vulnerability assessment
Learn more about Advanced Threat Protection
Learn more about Azure security center
Azure SQL Database and SQL Data Warehouse data
discovery & classification
9/16/2019 • 6 minutes to read • Edit Online
Data discovery & classification provides advanced capabilities built into Azure SQL Database for discovering,
classifying, labeling & protecting the sensitive data in your databases.
Discovering and classifying your most sensitive data (business, financial, healthcare, personally identifiable data
(PII), and so on.) can play a pivotal role in your organizational information protection stature. It can serve as
infrastructure for:
Helping meet data privacy standards and regulatory compliance requirements.
Various security scenarios, such as monitoring (auditing) and alerting on anomalous access to sensitive data.
Controlling access to and hardening the security of databases containing highly sensitive data.
Data discovery & classification is part of the Advanced Data Security (ADS ) offering, which is a unified package
for advanced SQL security capabilities. Data discovery & classification can be accessed and managed via the
central SQL ADS portal.
NOTE
This document relates to Azure SQL Database and Azure SQL Data Warehouse. For simplicity, SQL Database is used when
referring to both SQL Database and SQL Data Warehouse. For SQL Server (on premises), see SQL Data Discovery and
Classification.
3. The Overview tab includes a summary of the current classification state of the database, including a
detailed list of all classified columns, which you can also filter to view only specific schema parts,
information types and labels. If you haven’t yet classified any columns, skip to step 5.
4. To download a report in Excel format, click on the Export option in the top menu of the window.
5. To begin classifying your data, click on the Classification tab at the top of the window.
6. The classification engine scans your database for columns containing potentially sensitive data and
provides a list of recommended column classifications. To view and apply classification
recommendations:
To view the list of recommended column classifications, click on the recommendations panel at the
bottom of the window:
Review the list of recommendations – to accept a recommendation for a specific column, check the
checkbox in the left column of the relevant row. You can also mark all recommendations as accepted
by checking the checkbox in the recommendations table header.
To apply the selected recommendations, click on the blue Accept selected recommendations
button.
7. You can also manually classify columns as an alternative, or in addition, to the recommendation-based
classification:
Click on Add classification in the top menu of the window.
In the context window that opens, select the schema > table > column that you want to classify, and
the information type and sensitivity label. Then click on the blue Add classification button at the
bottom of the context window.
8. To complete your classification and persistently label (tag) the database columns with the new classification
metadata, click on Save in the top menu of the window.
NOTE
When using T-SQL to manage labels, there is no validation that labels added to a column exist in the organizational
information protection policy (the set of labels that appear in the portal recommendations). It is therefore up to you to
validate this.
Permissions
The following built-in roles can read the data classification of an Azure SQL database: Owner, Reader,
Contributor, SQL Security Manager, and User Access Administrator.
The following built-in roles can modify the data classification of an Azure SQL database: Owner, Contributor,
and SQL Security Manager.
Next steps
Learn more about advanced data security.
Consider configuring Azure SQL Database Auditing for monitoring and auditing access to your classified
sensitive data.
SQL Vulnerability Assessment service helps you
identify database vulnerabilities
7/26/2019 • 5 minutes to read • Edit Online
SQL Vulnerability Assessment is an easy to configure service that can discover, track, and help you remediate
potential database vulnerabilities. Use it to proactively improve your database security.
Vulnerability Assessment is part of the advanced data security (ADS ) offering, which is a unified package for
advanced SQL security capabilities. Vulnerability Assessment can be accessed and managed via the central SQL
ADS portal.
NOTE
Vulnerability Assessment is supported for Azure SQL Database, Azure SQL Managed Instance and Azure SQL Data
Warehouse. For simplicity, SQL Database is used in this article when referring to any of these managed database services.
6. Set up periodic recurring scans
Navigate to the Vulnerability Assessment settings to turn on Periodic recurring scans. This configures
Vulnerability Assessment to automatically run a scan on your database once per week. A scan result summary will
be sent to the email address(es) you provide.
7. Export an assessment report
Click Export Scan Results to create a downloadable Excel report of your scan result. This report contains a
summary tab that displays a summary of the assessment, including all failed checks. It also includes a Results tab
containing the full set of results from the scan, including all checks that were run and the result details for each.
8. View scan history
Click Scan History in the VA pane to view a history of all scans previously run on this database. Select a
particular scan in the list to view the detailed results of that scan.
Vulnerability Assessment can now be used to monitor that your database maintains a high level of security at all
times, and that your organizational policies are met. If compliance reports are required, VA reports can be helpful
to facilitate the compliance process.
IMPORTANT
The PowerShell Azure Resource Manager module is still supported by Azure SQL Database, but all future development is for
the Az.Sql module. For these cmdlets, see AzureRM.Sql. The arguments for the commands in the Az module and in the
AzureRm modules are substantially identical.
You can use Azure PowerShell cmdlets to programmatically manage your vulnerability assessments. The
supported cmdlets are:
Update-AzSqlDatabaseVulnerabilityAssessmentSetting
Updates the vulnerability assessment settings of a database
Get-AzSqlDatabaseVulnerabilityAssessmentSetting
Returns the vulnerability assessment settings of a database
Clear-AzSqlDatabaseVulnerabilityAssessmentSetting
Clears the vulnerability assessment settings of a database
Set-AzSqlDatabaseVulnerabilityAssessmentRuleBaseline
Sets the vulnerability assessment rule baseline.
Get-AzSqlDatabaseVulnerabilityAssessmentRuleBaseline
Gets the vulnerability assessment rule baseline for a given rule.
Clear-AzSqlDatabaseVulnerabilityAssessmentRuleBaseline
Clears the vulnerability assessment rule baseline. First set the baseline before using this cmdlet to clear it.
Start-AzSqlDatabaseVulnerabilityAssessmentScan
Triggers the start of a vulnerability assessment scan
Get-AzSqlDatabaseVulnerabilityAssessmentScanRecord
Gets all vulnerability assessment scan record(s) associated with a given database.
Convert-AzSqlDatabaseVulnerabilityAssessmentScan
Converts vulnerability assessment scan results to an Excel file
For a script example, see Azure SQL Vulnerability Assessment PowerShell support.
Next steps
Learn more about advanced data security
Learn more about data discovery & classification
Advanced Threat Protection for Azure SQL Database
7/26/2019 • 4 minutes to read • Edit Online
Advanced Threat Protection for Azure SQL Database and SQL Data Warehouse detects anomalous activities
indicating unusual and potentially harmful attempts to access or exploit databases.
Advanced Threat Protection is part of the Advanced data security (ADS ) offering, which is a unified package for
advanced SQL security capabilities. Advanced Threat Protection can be accessed and managed via the central SQL
ADS portal.
NOTE
This topic applies to Azure SQL server, and to both SQL Database and SQL Data Warehouse databases that are created on
the Azure SQL server. For simplicity, SQL Database is used when referring to both SQL Database and SQL Data Warehouse.
2. Click a specific alert to get additional details and actions for investigating this threat and remediating future
threats.
For example, SQL injection is one of the most common Web application security issues on the Internet that
is used to attack data-driven applications. Attackers take advantage of application vulnerabilities to inject
malicious SQL statements into application entry fields, breaching or modifying data in the database. For
SQL Injection alerts, the alert’s details include the vulnerable SQL statement that was exploited.
Auditing for Azure SQL Database and SQL Data Warehouse tracks database events and writes them to an audit
log in your Azure storage account, Log Analytics workspace or Event Hubs. Auditing also:
Helps you maintain regulatory compliance, understand database activity, and gain insight into
discrepancies and anomalies that could indicate business concerns or suspected security violations.
Enables and facilitates adherence to compliance standards, although it doesn't guarantee compliance. For
more information about Azure programs that support standards compliance, see the Azure Trust Center
where you can find the most current list of SQL Database compliance certifications.
NOTE
This topic applies to Azure SQL server, and to both SQL Database and SQL Data Warehouse databases that are created on
the Azure SQL server. For simplicity, SQL Database is used when referring to both SQL Database and SQL Data Warehouse.
NOTE
This article was recently updated to use the term Azure Monitor logs instead of Log Analytics. Log data is still stored in a
Log Analytics workspace and is still collected and analyzed by the same Log Analytics service. We are updating the
terminology to better reflect the role of logs in Azure Monitor. See Azure Monitor terminology changes for details.
IMPORTANT
Audit logs are written to Append Blobs in Azure Blob storage on your Azure subscription.
All storage kinds (v1, v2, blob) are supported.
All storage replication configurations are supported.
Premium storage is currently not supported.
Storage in VNet is currently not supported.
Storage behind a Firewall is currently not supported
NOTE
You should avoid enabling both server blob auditing and database blob auditing together, unless:
You want to use a different storage account or retention period for a specific database.
You want to audit event types or categories for a specific database that differ from the rest of the databases on
the server. For example, you might have table inserts that need to be audited only for a specific database.
Otherwise, we recommend that you enable only server-level blob auditing and leave the database-level auditing
disabled for all databases.
3. If you prefer to set up a server auditing policy, you can select the View server settings link on the
database auditing page. You can then view or modify the server auditing settings. Server auditing policies
apply to all existing and newly created databases on this server.
4. If you prefer to enable auditing on the database level, switch Auditing to ON.
If server auditing is enabled, the database-configured audit will exist side-by-side with the server audit.
5. New - You now have multiple options for configuring where audit logs will be written. You can write logs
to an Azure storage account, to a Log Analytics workspace for consumption by Azure Monitor logs, or to
an event hub. You can configure any combination of these options, and audit
logs will be written to each.
WARNING
Enabling auditing to Log Analytics will incur cost based on ingestion rates. Please be aware of the associated cost
with using this option, or consider storing the audit logs in an Azure storage account.
6. To configure writing audit logs to a storage account, select Storage and open Storage details. Select the
Azure storage account where logs will be saved, and then select the retention period; logs older than that
period are deleted. Then click OK.
IMPORTANT
The default value for retention period is 0 (unlimited retention). You can change this value by moving the
Retention (Days) slider in Storage settings when configuring the storage account for auditing.
If you change retention period from 0 (unlimited retention) to any other value, please note that retention will
only apply to logs written after retention value was changed (logs written during the period when retention was
set to unlimited are preserved, even after retention is enabled)
7. To configure writing audit logs to a Log Analytics workspace, select Log Analytics (Preview) and open
Log Analytics details. Select or create the Log Analytics workspace where logs will be written and then
click OK.
8. To configure writing audit logs to an event hub, select Event Hub (Preview) and open Event Hub
details. Select the event hub where logs will be written and then click OK. Be sure that the event hub is in
the same region as your database and server.
9. Click Save.
10. If you want to customize the audited events, you can do this via PowerShell cmdlets or the REST API.
11. After you've configured your auditing settings, you can turn on the new threat detection feature and
configure emails to receive security alerts. When you use threat detection, you receive proactive alerts on
anomalous database activities that can indicate potential security threats. For more information, see
Getting started with threat detection.
IMPORTANT
Enabling auditing on a paused Azure SQL Data Warehouse is not possible. To enable it, un-pause the Data Warehouse.
WARNING
Enabling auditing on a server that has an Azure SQL Data Warehouse on it will result in the Data Warehouse being
resumed and re-paused, which may incur billing charges.
Clicking View dashboard at the top of the Audit records page will open a dashboard displaying audit
logs info, where you can drill down into Security Insights, Access to Sensitive Data and more. This
dashboard is designed to help you gain security insights for your data. You can also customize the time
range and search query.
Alternatively, you can also access the audit logs from Log Analytics blade. Open your Log Analytics
workspace and under General section, click Logs. You can start with a simple query, such as: search
"SQLSecurityAuditEvents" to view the audit logs. From here, you can also use Azure Monitor logs to run
advanced searches on your audit log data. Azure Monitor logs gives you real-time operational insights
using integrated search and custom dashboards to readily analyze millions of records across all your
workloads and servers. For additional useful information about Azure Monitor logs search language and
commands, see Azure Monitor logs search reference.
If you chose to write audit logs to Event Hub:
To consume audit logs data from Event Hub, you will need to set up a stream to consume events and write
them to a target. For more information, see Azure Event Hubs Documentation.
Audit logs in Event Hub are captured in the body of Apache Avro events and stored using JSON formatting
with UTF -8 encoding. To read the audit logs, you can use Avro Tools or similar tools that process this format.
If you chose to write audit logs to an Azure storage account, there are several methods you can use to view the
logs:
Audit logs are aggregated in the account you chose during setup. You can explore audit logs by using a tool
such as Azure Storage Explorer. In Azure storage, auditing logs are saved as a collection of blob files within
a container named sqldbauditlogs. For further details about the hierarchy of the storage folder, naming
conventions, and log format, see the SQL Database Audit Log Format.
Use the Azure portal. Open the relevant database. At the top of the database's Auditing page, click View
audit logs.
Audit records opens, from which you'll be able to view the logs.
You can view specific dates by clicking Filter at the top of the Audit records page.
You can switch between audit records that were created by the server audit policy and the database
audit policy by toggling Audit Source.
You can view only SQL injection related audit records by checking Show only audit records for
SQL injections checkbox.
Use the system function sys.fn_get_audit_file (T-SQL ) to return the audit log data in tabular format. For
more information on using this function, see sys.fn_get_audit_file.
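A sketch of such a query; the storage URL is a placeholder for the container and folder hierarchy described in
the audit log format documentation:
SELECT *
FROM sys.fn_get_audit_file(
    'https://<storage account>.blob.core.windows.net/sqldbauditlogs/<server>/<database>/', DEFAULT, DEFAULT);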
Use Merge Audit Files in SQL Server Management Studio (starting with SSMS 17):
1. From the SSMS menu, select File > Open > Merge Audit Files.
2. The Add Audit Files dialog box opens. Select one of the Add options to choose whether to merge
audit files from a local disk or import them from Azure Storage. You are required to provide your
Azure Storage details and account key.
3. After all files to merge have been added, click OK to complete the merge operation.
4. The merged file opens in SSMS, where you can view and analyze it, as well as export it to an XEL or
CSV file, or to a table.
Use Power BI. You can view and analyze audit log data in Power BI. For more information and to access a
downloadable template, see Analyze audit log data in Power BI.
Download log files from your Azure Storage blob container via the portal or by using a tool such as Azure
Storage Explorer.
After you have downloaded a log file locally, double-click the file to open, view, and analyze the logs in
SSMS.
You can also download multiple files simultaneously via Azure Storage Explorer. To do so, right-click a
specific subfolder and select Save as to save in a local folder.
Additional methods:
After downloading several files or a subfolder that contains log files, you can merge them locally as
described in the SSMS Merge Audit Files instructions described previously.
View blob auditing logs programmatically:
Query Extended Events Files by using PowerShell.
Production practices
Auditing geo-replicated databases
With geo-replicated databases, when you enable auditing on the primary database the secondary database will
have an identical auditing policy. It is also possible to set up auditing on the secondary database by enabling
auditing on the secondary server, independently from the primary database.
Server-level (recommended): Turn on auditing on both the primary server as well as the secondary server
- the primary and secondary databases will each be audited independently based on their respective server-
level policy.
Database-level: Database-level auditing for secondary databases can only be configured from Primary
database auditing settings.
Auditing must be enabled on the primary database itself, not the server.
After auditing is enabled on the primary database, it will also become enabled on the secondary
database.
IMPORTANT
With database-level auditing, the storage settings for the secondary database will be identical to those of
the primary database, causing cross-regional traffic. We recommend that you enable only server-level
auditing, and leave the database-level auditing disabled for all databases.
Storage key regeneration
In production, you are likely to refresh your storage keys periodically. When writing audit logs to Azure storage,
you need to resave your auditing policy when refreshing your keys. The process is as follows:
1. Open Storage Details. In the Storage Access Key box, select Secondary, and click OK. Then click Save
at the top of the auditing configuration page.
2. Go to the storage configuration page and regenerate the primary access key.
3. Go back to the auditing configuration page, switch the storage access key from secondary to primary, and
then click OK. Then click Save at the top of the auditing configuration page.
4. Go back to the storage configuration page and regenerate the secondary access key (in preparation for the
next key's refresh cycle).
Additional Information
For details about the log format, hierarchy of the storage folder and naming conventions, see the Blob
Audit Log Format Reference.
IMPORTANT
Azure SQL Database Audit stores 4000 characters of data for character fields in an audit record. When the
statement or the data_sensitivity_information values returned from an auditable action contain more than 4000
characters, any data beyond the first 4000 characters will be truncated and not audited.
Audit logs are written to Append Blobs in an Azure Blob storage on your Azure subscription:
Premium Storage is currently not supported by Append Blobs.
Storage in VNet is currently not supported.
The default auditing policy includes all actions and the following set of action groups, which will audit all
the queries and stored procedures executed against the database, as well as successful and failed logins:
BATCH_COMPLETED_GROUP
SUCCESSFUL_DATABASE_AUTHENTICATION_GROUP
FAILED_DATABASE_AUTHENTICATION_GROUP
You can configure auditing for different types of actions and action groups using PowerShell, as described
in the Manage SQL database auditing using Azure PowerShell section.
When using AAD Authentication, failed login records will not appear in the SQL audit log. To view failed
login audit records, you need to visit the Azure Active Directory portal, which logs details of these events.
NOTE
The linked samples are on an external public repository and are provided 'as is', without warranty, and are not supported
under any Microsoft support program/service.
Azure SQL Database and Azure SQL Data
Warehouse IP firewall rules
9/30/2019 • 11 minutes to read • Edit Online
NOTE
This article applies to Azure SQL servers, and to both Azure SQL Database and Azure SQL Data Warehouse databases on
an Azure SQL server. For simplicity, SQL Database is used to refer to both SQL Database and SQL Data Warehouse.
IMPORTANT
This article does not apply to Azure SQL Database Managed Instance. For information about network configuration, see
Connect your application to Azure SQL Database Managed Instance.
When you create a new Azure SQL server named mysqlserver, for example, the SQL Database firewall blocks all
access to the public endpoint for the server (which is accessible at mysqlserver.database.windows.net).
IMPORTANT
SQL Data Warehouse only supports server-level IP firewall rules. It doesn't support database-level IP firewall rules.
NOTE
For information about portable databases in the context of business continuity, see Authentication requirements for
disaster recovery.
NOTE
To access SQL Database from your local computer, ensure that the firewall on your network and local computer allow
outgoing communication on TCP port 1433.
IMPORTANT
This option configures the firewall to allow all connections from Azure, including connections from the subscriptions of
other customers. If you select this option, make sure that your login and user permissions limit access to authorized users
only.
IMPORTANT
Database-level IP firewall rules can only be created and managed by using Transact-SQL.
To improve performance, server-level IP firewall rules are temporarily cached at the database level. To refresh the
cache, see DBCC FLUSHAUTHCACHE.
TIP
You can use SQL Database Auditing to audit server-level and database-level firewall changes.
TIP
For a tutorial, see Create a DB using the Azure portal.
2. Select Add client IP on the toolbar to add the IP address of the computer that you're using, and then
select Save. A server-level IP firewall rule is created for your current IP address.
The following example reviews the existing rules, enables a range of IP addresses on the server Contoso, and
deletes an IP firewall rule:
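A sketch of the review and set steps, run in the master database; the IP range shown is illustrative, and the
delete step appears after the next paragraph:
SELECT * FROM sys.firewall_rules ORDER BY name;
EXECUTE sp_set_firewall_rule @name = N'ContosoFirewallRule',
    @start_ip_address = '192.168.1.1', @end_ip_address = '192.168.1.10';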
To delete a server-level IP firewall rule, execute the sp_delete_firewall_rule stored procedure. The following
example deletes the rule ContosoFirewallRule:
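A minimal sketch, again run in the master database:
EXECUTE sp_delete_firewall_rule @name = N'ContosoFirewallRule';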
NOTE
This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will
continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM
compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install
Azure PowerShell.
IMPORTANT
The PowerShell Azure Resource Manager module is still supported by Azure SQL Database, but all development is now for
the Az.Sql module. For these cmdlets, see AzureRM.Sql. The arguments for the commands in the Az and AzureRm modules
are substantially identical.
TIP
For PowerShell examples in the context of a quickstart, see Create DB - PowerShell and Create a single database and
configure a SQL Database server-level IP firewall rule using PowerShell.
az sql server firewall-rule list Server Lists the IP firewall rules on a server
az sql server firewall-rule show Server Shows the detail of an IP firewall rule
Next steps
Confirm that your corporate network environment allows inbound communication from the compute IP
address ranges (including SQL ranges) that are used by the Azure datacenters. You might have to add those
IP addresses to the allow list. See Microsoft Azure datacenter IP ranges.
For a quickstart about creating a server-level IP firewall rule, see Create an Azure SQL database.
For help with connecting to an Azure SQL database from open-source or third-party applications, see Client
quickstart code samples to SQL Database.
For information about additional ports that you may need to open, see the "SQL Database: Outside vs inside"
section of Ports beyond 1433 for ADO.NET 4.5 and SQL Database
For an overview of Azure SQL Database security, see Securing your database.
Private Link for Azure SQL Database and Data
Warehouse (Preview)
9/17/2019 • 7 minutes to read • Edit Online
Private Link allows you to connect to various PaaS services in Azure via a private endpoint. For a list of PaaS
services that support Private Link functionality, go to the Private Link Documentation page. A private endpoint is a
private IP address within a specific VNet and Subnet.
IMPORTANT
This article applies to Azure SQL server, and to both SQL Database and SQL Data Warehouse databases that are created on
the Azure SQL server. For simplicity, SQL Database is used when referring to both SQL Database and SQL Data Warehouse.
This article does not apply to a managed instance deployment in Azure SQL Database.
4. After approval or rejection, the list will reflect the appropriate state along with the response text.
When Telnet connects successfully, you'll see a blank screen at the command window.
The output shows that Psping could ping the private IP address associated with the PEC.
Check connectivity using Nmap
Nmap (Network Mapper) is a free and open-source tool used for network discovery and security auditing. For
more information and the download link, visit https://nmap.org. You can use this tool to ensure that the private
endpoint is listening for connections on port 1433.
Run Nmap as follows by providing the address range of the subnet that hosts the private endpoint.
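As a sketch, assuming the private endpoint's subnet is 10.1.1.0/24 (substitute your own address range), run the
following from a command prompt:
nmap -n -sP 10.1.1.0/24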
The result shows that one IP address is up, which corresponds to the IP address of the private endpoint.
Check Connectivity using SQL Server Management Studio (SSMS )
The last step is to use SSMS to connect to the SQL Database. After you connect to the SQL Database using SSMS,
verify that you're connecting from the private IP address of the Azure VM by running the following query:
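A sketch of such a query:
SELECT client_net_address FROM sys.dm_exec_connections WHERE session_id = @@SPID;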
NOTE
In preview, connections to private endpoint only support Proxy as the connection policy
Next steps
For an overview of Azure SQL Database security, see Securing your database
For an overview of Azure SQL Database connectivity, see Azure SQL Connectivity Architecture
Use virtual network service endpoints and rules for
database servers
10/2/2019 • 13 minutes to read • Edit Online
Virtual network rules are one firewall security feature that controls whether the database server for your single
databases and elastic pool in Azure SQL Database or for your databases in SQL Data Warehouse accepts
communications that are sent from particular subnets in virtual networks. This article explains why the virtual
network rule feature is sometimes your best option for securely allowing communication to your Azure SQL
Database and SQL Data Warehouse.
IMPORTANT
This article applies to Azure SQL server, and to both SQL Database and SQL Data Warehouse databases that are created on
the Azure SQL server. For simplicity, SQL Database is used when referring to both SQL Database and SQL Data Warehouse.
This article does not apply to a managed instance deployment in Azure SQL Database because it does not have a service
endpoint associated with it.
To create a virtual network rule, there must first be a virtual network service endpoint for the rule to reference.
NOTE
In some cases the Azure SQL Database and the VNet-subnet are in different subscriptions. In these cases you must ensure
the following configurations:
Both subscriptions must be in the same Azure Active Directory tenant.
The user has the required permissions to initiate operations, such as enabling service endpoints and adding a VNet-
subnet to the given Server.
Both subscriptions must have the Microsoft.Sql provider registered.
Limitations
For Azure SQL Database, the virtual network rules feature has the following limitations:
In the firewall for your SQL Database, each virtual network rule references a subnet. All these referenced
subnets must be hosted in the same geographic region that hosts the SQL Database.
Each Azure SQL Database server can have up to 128 ACL entries for any given virtual network.
Virtual network rules apply only to Azure Resource Manager virtual networks; and not to classic
deployment model networks.
Turning ON virtual network service endpoints to Azure SQL Database also enables the endpoints for the
MySQL and PostgreSQL Azure services. However, with endpoints ON, attempts to connect from the
endpoints to your MySQL or PostgreSQL instances may fail.
The underlying reason is that MySQL and PostgreSQL likely do not have a virtual network rule
configured. You must configure a virtual network rule for Azure Database for MySQL and PostgreSQL
before the connection will succeed.
On the firewall, IP address ranges do apply to the following networking items, but virtual network rules do
not:
Site-to-Site (S2S ) virtual private network (VPN )
On-premises via ExpressRoute
Considerations when using Service Endpoints
When using service endpoints for Azure SQL Database, review the following considerations:
Outbound to Azure SQL Database Public IPs is required: Network Security Groups (NSGs) must be
opened to Azure SQL Database IPs to allow connectivity. You can do this by using NSG Service Tags for Azure
SQL Database.
ExpressRoute
If you are using ExpressRoute from your premises, for public peering or Microsoft peering, you will need to
identify the NAT IP addresses that are used. For public peering, each ExpressRoute circuit by default uses two NAT
IP addresses applied to Azure service traffic when the traffic enters the Microsoft Azure network backbone. For
Microsoft peering, the NAT IP address(es) that are used are either customer provided or are provided by the
service provider. To allow access to your service resources, you must allow these public IP addresses in the
resource IP firewall setting. To find your public peering ExpressRoute circuit IP addresses, open a support ticket
with ExpressRoute via the Azure portal. Learn more about NAT for ExpressRoute public and Microsoft peering.
To allow communication from your circuit to Azure SQL Database, you must create IP network rules for the public
IP addresses of your NAT.
NOTE
This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will
continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM
compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure
PowerShell.
IMPORTANT
The PowerShell Azure Resource Manager module is still supported by Azure SQL Database, but all future development is for
the Az.Sql module. For these cmdlets, see AzureRM.Sql. The arguments for the commands in the Az module and in the
AzureRm modules are substantially identical.
Connect-AzAccount
Select-AzSubscription -SubscriptionId your-subscriptionId
Set-AzSqlServer -ResourceGroupName your-database-server-resourceGroup -ServerName your-SQL-servername -AssignIdentity
NOTE
If you have a general-purpose v1 or blob storage account, you must first upgrade to v2 using this guide.
For known issues with Azure Data Lake Storage Gen2, please refer to this guide.
2. Under your storage account, navigate to Access Control (IAM ), and click Add role assignment. Assign
Storage Blob Data Contributor RBAC role to your Azure SQL Server hosting your Azure SQL Data
Warehouse which you've registered with Azure Active Directory (AAD ) as in step#1.
NOTE
Only members with Owner privilege can perform this step. For various built-in roles for Azure resources, refer to this
guide.
CREATE DATABASE SCOPED CREDENTIAL msi_cred WITH IDENTITY = 'Managed Service Identity';
NOTE
There is no need to specify SECRET with Azure Storage access key because this mechanism uses Managed
Identity under the covers.
IDENTITY name should be 'Managed Service Identity' for PolyBase connectivity to work with Azure
Storage account secured to VNet.
c. Create external data source with abfss:// scheme for connecting to your general-purpose v2 storage
account using PolyBase:
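A minimal sketch, assuming a hypothetical container and storage account name, and the msi_cred credential
created above:
CREATE EXTERNAL DATA SOURCE ext_datasource_with_abfss
WITH (TYPE = HADOOP,
      LOCATION = 'abfss://<container>@<storage account>.dfs.core.windows.net',
      CREDENTIAL = msi_cred);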
NOTE
If you already have external tables associated with general-purpose v1 or blob storage account, you
should first drop those external tables and then drop corresponding external data source. Then create
external data source with abfss:// scheme connecting to general-purpose v2 storage account as above and
re-create all the external tables using this new external data source. You could use Generate and Publish
Scripts Wizard to generate create-scripts for all the external tables for ease.
For more information on abfss:// scheme, refer to this guide.
For more information on CREATE EXTERNAL DATA SOURCE, refer to this guide.
NOTE
If you intend to add a service endpoint to the VNet firewall rules of your Azure SQL Database server, first ensure that service
endpoints are turned On for the subnet.
If service endpoints are not turned on for the subnet, the portal asks you to enable them. Click the Enable button on the
same blade on which you add the rule.
PowerShell alternative
A PowerShell script can also create virtual network rules. The crucial cmdlet is New-AzSqlServerVirtualNetworkRule.
If interested, see PowerShell to create a Virtual Network service endpoint and rule for Azure SQL Database.
Prerequisites
You must already have a subnet that is tagged with the particular Virtual Network service endpoint type name
relevant to Azure SQL Database.
The relevant endpoint type name is Microsoft.Sql.
If your subnet might not be tagged with the type name, see Verify your subnet is an endpoint.
IMPORTANT
If you leave the control set to ON, your Azure SQL Database server accepts communication from any subnet inside
the Azure boundary i.e. originating from one of the IP addresses that is recognized as those within ranges defined for
Azure data centers. Leaving the control set to ON might be excessive access from a security point of view. The
Microsoft Azure Virtual Network service endpoint feature, in coordination with the virtual network rule feature of
SQL Database, together can reduce your security surface area.
TIP
You must include the correct Address prefix for your subnet. You can find the value in the portal. Navigate All
resources > All types > Virtual networks. The filter displays your virtual networks. Click your virtual network, and
then click Subnets. The ADDRESS RANGE column has the Address prefix you need.
6. Click the OK button near the bottom of the pane.
7. See the resulting virtual network rule on the firewall pane.
NOTE
The following statuses or states apply to the rules:
Ready: Indicates that the operation that you initiated has Succeeded.
Failed: Indicates that the operation that you initiated has Failed.
Deleted: Only applies to the Delete operation, and indicates that the rule has been deleted and no longer applies.
InProgress: Indicates that the operation is in progress. The old rule applies while the operation is in this state.
Related articles
Azure virtual network service endpoints
Azure SQL Database server-level and database-level firewall rules
The virtual network rule feature for Azure SQL Database became available in late September 2017.
Next steps
Use PowerShell to create a virtual network service endpoint, and then a virtual network rule for Azure SQL
Database.
Virtual Network Rules: Operations with REST APIs
Authenticate to Azure SQL Data Warehouse
4/3/2019 • 2 minutes to read • Edit Online
Learn how to authenticate to Azure SQL Data Warehouse by using Azure Active Directory (AAD ) or SQL Server
authentication.
To connect to SQL Data Warehouse, you must pass in security credentials for authentication purposes. Upon
establishing a connection, certain connection settings are configured as part of establishing your query session.
For more information on security and how to enable connections to your data warehouse, see Secure a database
in SQL Data Warehouse.
SQL authentication
To connect to SQL Data Warehouse, you must provide the following information:
Fully qualified servername
Specify SQL authentication
Username
Password
Default database (optional)
By default your connection connects to the master database and not your user database. To connect to your user
database, you can choose to do one of two things:
Specify the default database when registering your server with the SQL Server Object Explorer in SSDT,
SSMS, or in your application connection string. For example, include the InitialCatalog parameter for an
ODBC connection.
Highlight the user database before creating a session in SSDT.
NOTE
The Transact-SQL statement USE MyDatabase; is not supported for changing the database for a connection. For guidance
connecting to SQL Data Warehouse with SSDT, refer to the Query with Visual Studio article.
NOTE
Azure Active Directory is still relatively new and has some limitations. To ensure that Azure Active Directory is a good fit for
your environment, see Azure AD features and limitations, specifically the Additional considerations.
Configuration steps
Follow these steps to configure Azure Active Directory authentication.
1. Create and populate an Azure Active Directory
2. Optional: Associate or change the active directory that is currently associated with your Azure Subscription
3. Create an Azure Active Directory administrator for Azure SQL Data Warehouse.
4. Configure your client computers
5. Create contained database users in your database mapped to Azure AD identities
6. Connect to your data warehouse by using Azure AD identities
Currently Azure Active Directory users are not shown in SSDT Object Explorer. As a workaround, view the users
in sys.database_principals.
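For example, a query along these lines lists the Azure AD principals in the current database (type E is an external user, X is an external group):
SELECT name, type, type_desc
FROM sys.database_principals
WHERE type IN ('E', 'X');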
Find the details
The steps to configure and use Azure Active Directory authentication are nearly identical for Azure SQL
Database and Azure SQL Data Warehouse. Follow the detailed steps in the topic Connecting to SQL Database
or SQL Data Warehouse By Using Azure Active Directory Authentication.
Create custom database roles and add users to the roles. Then grant granular permissions to the roles. For
more information, see Getting Started with Database Engine Permissions.
Next steps
To start querying your data warehouse with Visual Studio and other applications, see Query with Visual Studio.
Use Azure Active Directory Authentication for
authentication with SQL
8/21/2019 • 9 minutes to read • Edit Online
Azure Active Directory authentication is a mechanism of connecting to Azure SQL Database, Managed Instance,
and SQL Data Warehouse by using identities in Azure Active Directory (Azure AD ).
NOTE
This topic applies to Azure SQL server, and to both SQL Database and SQL Data Warehouse databases that are created on
the Azure SQL server. For simplicity, SQL Database is used when referring to both SQL Database and SQL Data Warehouse.
With Azure AD authentication, you can centrally manage the identities of database users and other Microsoft services. Central ID management provides a single place to manage database users and simplifies permission management. Benefits include the following:
It provides an alternative to SQL Server authentication.
Helps stop the proliferation of user identities across database servers.
Allows password rotation in a single place.
Customers can manage database permissions using external (Azure AD ) groups.
It can eliminate storing passwords by enabling integrated Windows authentication and other forms of
authentication supported by Azure Active Directory.
Azure AD authentication uses contained database users to authenticate identities at the database level.
Azure AD supports token-based authentication for applications connecting to SQL Database.
Azure AD authentication supports ADFS (domain federation) or native user/password authentication for a
local Azure Active Directory without domain synchronization.
Azure AD supports connections from SQL Server Management Studio that use Active Directory Universal
Authentication, which includes Multi-Factor Authentication (MFA). MFA includes strong authentication with a
range of easy verification options — phone call, text message, smart cards with pin, or mobile app notification.
For more information, see SSMS support for Azure AD MFA with SQL Database and SQL Data Warehouse.
Azure AD supports similar connections from SQL Server Data Tools (SSDT) that use Active Directory
Interactive Authentication. For more information, see Azure Active Directory support in SQL Server Data Tools
(SSDT).
NOTE
Connecting to SQL Server running on an Azure VM is not supported using an Azure Active Directory account. Use a domain
Active Directory account instead.
The configuration steps include the following procedures to configure and use Azure Active Directory
authentication.
1. Create and populate Azure AD.
2. Optional: Associate or change the active directory that is currently associated with your Azure Subscription.
3. Create an Azure Active Directory administrator for the Azure SQL Database server, the Managed Instance, or
the Azure SQL Data Warehouse.
4. Configure your client computers.
5. Create contained database users in your database mapped to Azure AD identities.
6. Connect to your database by using Azure AD identities.
NOTE
To learn how to create and populate Azure AD, and then configure Azure AD with Azure SQL Database, Managed Instance,
and SQL Data Warehouse, see Configure Azure AD with Azure SQL Database.
Trust architecture
The following high-level diagram summarizes the solution architecture of using Azure AD authentication with
Azure SQL Database. The same concepts apply to SQL Data Warehouse. To support Azure AD native user
password, only the Cloud portion and Azure AD/Azure SQL Database is considered. To support Federated
authentication (or user/password for Windows credentials), the communication with ADFS block is required. The
arrows indicate communication pathways.
The following diagram indicates the federation, trust, and hosting relationships that allow a client to connect to a
database by submitting a token. The token is authenticated by an Azure AD, and is trusted by the database.
Customer 1 can represent an Azure Active Directory with native users or an Azure AD with federated users.
Customer 2 represents a possible solution that includes imported users, in this example coming from a federated Azure Active Directory where ADFS is synchronized with Azure Active Directory. It's important to understand
that access to a database using Azure AD authentication requires that the hosting subscription is associated to the
Azure AD. The same subscription must be used to create the SQL Server hosting the Azure SQL Database or SQL
Data Warehouse.
Administrator structure
When using Azure AD authentication, there are two Administrator accounts for the SQL Database server and
Managed Instance; the original SQL Server administrator and the Azure AD administrator. The same concepts
apply to SQL Data Warehouse. Only the administrator based on an Azure AD account can create the first Azure
AD contained database user in a user database. The Azure AD administrator login can be an Azure AD user or an
Azure AD group. When the administrator is a group account, it can be used by any group member, enabling
multiple Azure AD administrators for the SQL Server instance. Using a group account as an administrator enhances
manageability by allowing you to centrally add and remove group members in Azure AD without changing the
users or permissions in SQL Database. Only one Azure AD administrator (a user or group) can be configured at
any time.
Permissions
To create new users, you must have the ALTER ANY USER permission in the database. The ALTER ANY USER
permission can be granted to any database user. The ALTER ANY USER permission is also held by the server administrator accounts, by database users with the CONTROL ON DATABASE or ALTER ON DATABASE permission for that database, and by members of the db_owner database role.
To create a contained database user in Azure SQL Database, Managed Instance, or SQL Data Warehouse, you
must connect to the database or instance using an Azure AD identity. To create the first contained database user,
you must connect to the database by using an Azure AD administrator (who is the owner of the database). This is
demonstrated in Configure and manage Azure Active Directory authentication with SQL Database or SQL Data
Warehouse. Any Azure AD authentication is only possible if the Azure AD admin was created for Azure SQL
Database or SQL Data Warehouse server. If the Azure Active Directory admin was removed from the server,
existing Azure Active Directory users created previously inside SQL Server can no longer connect to the database
using their Azure Active Directory credentials.
Grant the db_owner role directly to the individual Azure AD user to mitigate the CREATE DATABASE
SCOPED CREDENTIAL issue.
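A sketch of that grant, assuming a contained Azure AD user already exists in the database (the user name is a placeholder):
-- The Azure AD user name below is a placeholder
ALTER ROLE db_owner ADD MEMBER [aaduser@contoso.com];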
These system functions return NULL values when executed under Azure AD principals:
SUSER_ID()
SUSER_NAME(<admin ID>)
SUSER_SNAME(<admin SID>)
SUSER_ID(<admin name>)
SUSER_SID(<admin name>)
Managed Instances
Azure AD server principals (logins) and users are supported as a preview feature for Managed Instances.
Setting Azure AD server principals (logins) mapped to an Azure AD group as database owner is not supported
in Managed Instances.
An extension of this is that when a group is added as part of the dbcreator server role, users from this
group can connect to the Managed Instance and create new databases, but will not be able to access the
database. This is because the new database owner is SA, and not the Azure AD user. This issue does not
manifest if the individual user is added to the dbcreator server role.
SQL Agent management and job execution are supported for Azure AD server principals (logins).
Database backup and restore operations can be executed by Azure AD server principals (logins).
Auditing of all statements related to Azure AD server principals (logins) and authentication events is
supported.
Dedicated administrator connection for Azure AD server principals (logins) which are members of sysadmin
server role is supported.
Supported through SQLCMD Utility and SQL Server Management Studio.
Logon triggers are supported for logon events coming from Azure AD server principals (logins).
Service Broker and DB mail can be setup using an Azure AD server principal (login).
Next steps
To learn how to create and populate Azure AD, and then configure Azure AD with Azure SQL Database or
Azure SQL Data Warehouse, see Configure and manage Azure Active Directory authentication with SQL
Database, Managed Instance, or SQL Data Warehouse.
For a tutorial of using Azure AD server principals (logins) with Managed Instances, see Azure AD server
principals (logins) with Managed Instances
For an overview of access and control in SQL Database, see SQL Database access and control.
For an overview of logins, users, and database roles in SQL Database, see Logins, users, and database roles.
For more information about database principals, see Principals.
For more information about database roles, see Database roles.
For syntax on creating Azure AD server principals (logins) for Managed Instances, see CREATE LOGIN.
For more information about firewall rules in SQL Database, see SQL Database firewall rules.
Controlling and granting database access to SQL
Database and SQL Data Warehouse
8/14/2019 • 12 minutes to read • Edit Online
After firewall rules configuration, you can connect to Azure SQL Database and SQL Data Warehouse as one of
the administrator accounts, as the database owner, or as a database user in the database.
NOTE
This topic applies to Azure SQL server, and to SQL Database and SQL Data Warehouse databases created on the Azure SQL
server. For simplicity, SQL Database is used when referring to both SQL Database and SQL Data Warehouse.
TIP
For a tutorial, see Secure your Azure SQL Database. This tutorial does not apply to Azure SQL Database Managed
Instance.
Server admin
When you create an Azure SQL server, you must designate a Server admin login. SQL server creates
that account as a login in the master database. This account connects using SQL Server authentication
(user name and password). Only one of these accounts can exist.
NOTE
To reset the password for the server admin, go to the Azure portal, click SQL Servers, select the server from the
list, and then click Reset Password.
IMPORTANT
It is recommended that you always use the latest version of Management Studio to remain synchronized with updates to
Microsoft Azure and SQL Database. Update SQL Server Management Studio.
In addition to the server-level administrative roles discussed previously, SQL Database provides two restricted
administrative roles in the master database to which user accounts can be added that grant permissions to either
create databases or manage logins.
Database creators
One of these administrative roles is the dbmanager role. Members of this role can create new databases. To use
this role, you create a user in the master database and then add the user to the dbmanager database role. To
create a database, the user must be a user based on a SQL Server login in the master database or contained
database user based on an Azure Active Directory user.
1. Using an administrator account, connect to the master database.
2. Create a SQL Server authentication login, using the CREATE LOGIN statement. Sample statement:
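-- Illustrative sample; the login name and password are placeholders
CREATE LOGIN Mary WITH PASSWORD = '<strong_password>';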
To improve performance, logins (server-level principals) are temporarily cached at the database level. To
refresh the authentication cache, see DBCC FLUSHAUTHCACHE.
3. In the master database, create a user by using the CREATE USER statement. The user can be an Azure
Active Directory authentication contained database user (if you have configured your environment for
Azure AD authentication), or a SQL Server authentication contained database user, or a SQL Server
authentication user based on a SQL Server authentication login (created in the previous step.) Sample
statements:
CREATE USER [[email protected]] FROM EXTERNAL PROVIDER; -- To create a user with Azure Active Directory
CREATE USER Ann WITH PASSWORD = '<strong_password>'; -- To create a SQL Database contained database
user
CREATE USER Mary FROM LOGIN Mary; -- To create a SQL Server user based on a SQL Server authentication
login
4. Add the new user to the dbmanager database role in master using the ALTER ROLE statement. Sample statement:
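-- Illustrative sample; Mary is the user created in the previous step
ALTER ROLE dbmanager ADD MEMBER Mary;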
NOTE
The dbmanager role is a database role in the master database, so you can only add a database user to the dbmanager role. You cannot add a server-level login to a database-level role.
5. If necessary, configure a firewall rule to allow the new user to connect. (The new user might be covered by
an existing firewall rule.)
Now the user can connect to the master database and can create new databases. The account creating the
database becomes the owner of the database.
Login managers
The other administrative role is the login manager role. Members of this role can create new logins in the master
database. If you wish, you can complete the same steps (create a login and user, and add a user to the
loginmanager role) to enable a user to create new logins in the master. Usually logins are not necessary as
Microsoft recommends using contained database users, which authenticate at the database-level instead of using
users based on logins. For more information, see Contained Database Users - Making Your Database Portable.
Non-administrator users
Generally, non-administrator accounts do not need access to the master database. Create contained database
users at the database level using the CREATE USER (Transact-SQL ) statement. The user can be an Azure Active
Directory authentication contained database user (if you have configured your environment for Azure AD
authentication), or a SQL Server authentication contained database user, or a SQL Server authentication user
based on a SQL Server authentication login (created in the previous step.) For more information, see Contained
Database Users - Making Your Database Portable.
To create users, connect to the database, and execute statements similar to the following examples:
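-- Illustrative samples; the names and password are placeholders
CREATE USER [aaduser@contoso.com] FROM EXTERNAL PROVIDER; -- Azure AD contained database user
CREATE USER ApplicationUser WITH PASSWORD = '<strong_password>'; -- SQL authentication contained database user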
Initially, only one of the administrators or the owner of the database can create users. To authorize additional
users to create new users, grant that selected user the ALTER ANY USER permission, by using a statement such as:
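-- Illustrative sample; DatabaseUser is a placeholder
GRANT ALTER ANY USER TO DatabaseUser;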
To give additional users full control of the database, make them a member of the db_owner fixed database role.
In Azure SQL Database use the ALTER ROLE statement.
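For example (the user name is a placeholder):
-- Illustrative sample
ALTER ROLE db_owner ADD MEMBER DatabaseUser;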
NOTE
One common reason to create a database user based on a SQL Database server login is for users that need access to
multiple databases. Since contained database users are individual entities, each database maintains its own user and its own
password. This can cause overhead as the user must then remember each password for each database, and it can become
untenable when having to change multiple passwords for many databases. However, when using SQL Server Logins and
high availability (active geo-replication and failover groups), the SQL Server logins must be set manually at each server.
Otherwise, the database user will no longer be mapped to the server login after a failover occurs, and will not be able to
access the database post failover. For more information on configuring logins for geo-replication, please see Configure and
manage Azure SQL Database security for geo-restore or failover.
Permissions
There are over 100 permissions that can be individually granted or denied in SQL Database. Many of these
permissions are nested. For example, the UPDATE permission on a schema includes the UPDATE permission on
each table within that schema. As in most permission systems, the denial of a permission overrides a grant.
Because of the nested nature and the number of permissions, it can take careful study to design an appropriate
permission system to properly protect your database. Start with the list of permissions at Permissions (Database
Engine) and review the poster size graphic of the permissions.
Considerations and restrictions
When managing logins and users in SQL Database, consider the following:
You must be connected to the master database when executing the CREATE/ALTER/DROP DATABASE
statements.
The database user corresponding to the Server admin login cannot be altered or dropped.
US-English is the default language of the Server admin login.
Only the administrators (Server admin login or Azure AD administrator) and the members of the
dbmanager database role in the master database have permission to execute the CREATE DATABASE and
DROP DATABASE statements.
You must be connected to the master database when executing the CREATE/ALTER/DROP LOGIN statements.
However using logins is discouraged. Use contained database users instead.
To connect to a user database, you must provide the name of the database in the connection string.
Only the server-level principal login and the members of the loginmanager database role in the master
database have permission to execute the CREATE LOGIN , ALTER LOGIN , and DROP LOGIN statements.
When executing the CREATE/ALTER/DROP LOGIN and CREATE/ALTER/DROP DATABASE statements in an ADO.NET
application, using parameterized commands is not allowed. For more information, see Commands and
Parameters.
When executing the CREATE/ALTER/DROP DATABASE and CREATE/ALTER/DROP LOGIN statements, each of these
statements must be the only statement in a Transact-SQL batch. Otherwise, an error occurs. For example,
the following Transact-SQL checks whether the database exists. If it exists, a DROP DATABASE statement is
called to remove the database. Because the DROP DATABASE statement is not the only statement in the batch,
executing the following Transact-SQL statement results in an error.
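-- Illustrative sample; fails in Azure SQL Database because DROP DATABASE is not the only statement in the batch
IF EXISTS (SELECT [name] FROM sys.databases WHERE [name] = 'database_name')
DROP DATABASE database_name;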
When executing the CREATE USER statement with the FOR/FROM LOGIN option, it must be the only statement
in a Transact-SQL batch.
When executing the ALTER USER statement with the WITH LOGIN option, it must be the only statement in a
Transact-SQL batch.
To CREATE/ALTER/DROP a user requires the ALTER ANY USER permission on the database.
When the owner of a database role tries to add or remove another database user to or from that database
role, the following error may occur: User or role 'Name' does not exist in this database. This error
occurs because the user is not visible to the owner. To resolve this issue, grant the role owner the
VIEW DEFINITION permission on the user.
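A sketch of that grant; both names below are placeholders:
GRANT VIEW DEFINITION ON USER::[Member_User] TO [Role_Owner];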
Next steps
To learn more about firewall rules, see Azure SQL Database Firewall.
For an overview of all the SQL Database security features, see SQL security overview.
For a tutorial, see Secure your Azure SQL Database.
For information about views and stored procedures, see Creating views and stored procedures
For information about granting access to a database object, see Granting Access to a Database Object
Using Multi-factor AAD authentication with Azure
SQL Database and Azure SQL Data Warehouse
(SSMS support for MFA)
8/25/2019 • 5 minutes to read • Edit Online
Azure SQL Database and Azure SQL Data Warehouse support connections from SQL Server Management
Studio (SSMS ) using Active Directory Universal Authentication. This article discusses the differences between the
various authentication options, and also the limitations associated with using Universal Authentication.
Download the latest SSMS - On the client computer, download the latest version of SSMS, from Download
SQL Server Management Studio (SSMS ).
For all the features discussed in this article, use at least the July 2017 release, version 17.2. The most recent connection dialog box should look similar to the following image:
There are also two non-interactive authentication methods, which can be used in many different applications (ADO.NET, JDBC, ODBC, etc.). These two methods never result in pop-up dialog boxes:
Active Directory - Password
Active Directory - Integrated
The interactive method that also supports Azure Multi-Factor Authentication (MFA) is:
Active Directory - Universal with MFA
Azure MFA helps safeguard access to data and applications while meeting user demand for a simple sign-in
process. It delivers strong authentication with a range of easy verification options (phone call, text message, smart
cards with pin, or mobile app notification), allowing users to choose the method they prefer. Interactive MFA with
Azure AD can result in a pop-up dialog box for validation.
For a description of Multi-Factor Authentication, see Multi-Factor Authentication. For configuration steps, see
Configure Azure SQL Database multi-factor authentication for SQL Server Management Studio.
Azure AD domain name or tenant ID parameter
Beginning with SSMS version 17, users that are imported into the current Active Directory from other Azure
Active Directories as guest users, can provide the Azure AD domain name, or tenant ID when they connect. Guest
users include users invited from other Azure ADs, Microsoft accounts such as outlook.com, hotmail.com, live.com,
or other accounts like gmail.com. This information allows Active Directory Universal with MFA
Authentication to identify the correct authenticating authority. This option is also required to support Microsoft
accounts (MSA) such as outlook.com, hotmail.com, live.com, or non-MSA accounts. All these users who want to
be authenticated using Universal Authentication must enter their Azure AD domain name or tenant ID. This
parameter represents the current Azure AD domain name/tenant ID the Azure Server is linked with. For example,
if Azure Server is associated with Azure AD domain contosotest.onmicrosoft.com where user
[email protected] is hosted as an imported user from Azure AD domain
contosodev.onmicrosoft.com , the domain name required to authenticate this user is contosotest.onmicrosoft.com .
When the user is a native user of the Azure AD linked to Azure Server, and is not an MSA account, no domain
name or tenant ID is required. To enter the parameter (beginning with SSMS version 17.2), in the Connect to
Database dialog box, complete the dialog box, selecting Active Directory - Universal with MFA authentication,
click Options, complete the User name box, and then click the Connection Properties tab. Check the AD
domain name or tenant ID box, and provide authenticating authority, such as the domain name
(contosotest.onmicrosoft.com ) or the GUID of the tenant ID.
If you are running SSMS 18.x or later, the AD domain name or tenant ID is no longer needed for guest users, because SSMS 18.x or later recognizes it automatically.
Azure AD business to business support
Azure AD users supported for Azure AD B2B scenarios as guest users (see What is Azure B2B collaboration) can
connect to SQL Database and SQL Data Warehouse only as part of members of a group created in current Azure
AD and mapped manually using the Transact-SQL CREATE USER statement in a given database. For example, if
[email protected] is invited to Azure AD contosotest (with the Azure AD domain contosotest.onmicrosoft.com), an
Azure AD group, such as usergroup must be created in the Azure AD that contains the [email protected] member.
Then, this group must be created for a specific database (that is, MyDatabase) by Azure AD SQL admin or Azure
AD DBO by executing a Transact-SQL CREATE USER [usergroup] FROM EXTERNAL PROVIDER statement. After the
database user is created, then the user [email protected] can log in to MyDatabase using the SSMS authentication
option Active Directory – Universal with MFA support. The usergroup, by default, has only the connect permission; any further data access needs to be granted in the normal way. Note that user [email protected] as a
guest user must check the box and add the AD domain name contosotest.onmicrosoft.com in the SSMS
Connection Property dialog box. The AD domain name or tenant ID option is only supported for the
Universal with MFA connection options, otherwise it is greyed out.
Next steps
For configuration steps, see Configure Azure SQL Database multi-factor authentication for SQL Server
Management Studio.
Grant others access to your database: SQL Database Authentication and Authorization: Granting Access
Make sure others can connect through the firewall: Configure an Azure SQL Database server-level firewall rule
using the Azure portal
Configure and manage Azure Active Directory authentication with SQL Database or SQL Data Warehouse
Microsoft SQL Server Data-Tier Application Framework (17.0.0 GA)
SQLPackage.exe
Import a BACPAC file to a new Azure SQL Database
Export an Azure SQL database to a BACPAC file
C# interface IUniversalAuthProvider Interface
When using Active Directory - Universal with MFA authentication, ADAL tracing is available beginning with
SSMS 17.3. Off by default, you can turn on ADAL tracing by using the Tools, Options menu, under Azure
Services, Azure Cloud, ADAL Output Window Trace Level, followed by enabling Output in the View
menu. The traces are available in the output window when selecting Azure Active Directory option.
Azure SQL Database and SQL Data Warehouse
access control
7/26/2019 • 4 minutes to read • Edit Online
To provide security, Azure SQL Database and SQL Data Warehouse control access with firewall rules limiting
connectivity by IP address, authentication mechanisms requiring users to prove their identity, and authorization
mechanisms limiting users to specific actions and data.
IMPORTANT
For an overview of the SQL Database security features, see SQL security overview. For a tutorial, see Secure your Azure SQL
Database. For an overview of SQL Data Warehouse security features, see SQL Data Warehouse security overview
Authentication
SQL Database supports two types of authentication:
SQL Authentication:
This authentication method uses a username and password. When you created the SQL Database server
for your database, you specified a "server admin" login with a username and password. Using these
credentials, you can authenticate to any database on that server as the database owner, or "dbo."
Azure Active Directory Authentication:
This authentication method uses identities managed by Azure Active Directory and is supported for
managed and integrated domains. If you want to use Azure Active Directory Authentication, you must
create another server admin called the "Azure AD admin," which is allowed to administer Azure AD users
and groups. This admin can also perform all operations that a regular server admin can. See Connecting to
SQL Database By Using Azure Active Directory Authentication for a walkthrough of how to create an Azure
AD admin to enable Azure Active Directory Authentication.
The Database Engine closes connections that remain idle for more than 30 minutes. The connection must log in
again before it can be used. Continuously active connections to SQL Database require reauthorization (performed
by the database engine) at least every 10 hours. The database engine attempts reauthorization using the originally
submitted password and no user input is required. For performance reasons, when a password is reset in SQL
Database, the connection is not reauthenticated, even if the connection is reset due to connection pooling. This is
different from the behavior of on-premises SQL Server. If the password has been changed since the connection
was initially authorized, the connection must be terminated and a new connection made using the new password.
A user with the KILL DATABASE CONNECTION permission can explicitly terminate a connection to SQL Database by
using the KILL command.
User accounts can be created in the master database and can be granted permissions in all databases on the
server, or they can be created in the database itself (called contained users). For information on creating and
managing logins, see Manage logins. Use contained databases to enhance portability and scalability. For more
information on contained users, see Contained Database Users - Making Your Database Portable, CREATE USER
(Transact-SQL ), and Contained Databases.
As a best practice, your application should use a dedicated account to authenticate -- this way you can limit the
permissions granted to the application and reduce the risks of malicious activity in case your application code is
vulnerable to a SQL injection attack. The recommended approach is to create a contained database user, which
allows your app to authenticate directly to the database.
Authorization
Authorization refers to what a user can do within an Azure SQL Database, and this is controlled by your user
account's database role memberships and object-level permissions. As a best practice, you should grant users the
least privileges necessary. The server admin account you are connecting with is a member of db_owner, which has
authority to do anything within the database. Save this account for deploying schema upgrades and other
management operations. Use the "ApplicationUser" account with more limited permissions to connect from your
application to the database with the least privileges needed by your application. For more information, see Manage
logins.
Typically, only administrators need access to the master database. Routine access to each user database should be
through non-administrator contained database users created in each database. When you use contained database
users, you do not need to create logins in the master database. For more information, see Contained Database
Users - Making Your Database Portable.
You should familiarize yourself with the following features that can be used to limit or elevate permissions:
Impersonation and module-signing can be used to securely elevate permissions temporarily.
Row-Level Security can be used to limit which rows a user can access.
Data Masking can be used to limit exposure of sensitive data.
Stored procedures can be used to limit the actions that can be taken on the database.
Next steps
For an overview of the SQL Database security features, see SQL security overview.
To learn more about firewall rules, see Firewall rules.
To learn about users and logins, see Manage logins.
For a discussion of proactive monitoring, see Database Auditing and SQL Database threat detection.
For a tutorial, see Secure your Azure SQL Database.
Column-level Security
4/2/2019 • 2 minutes to read • Edit Online
Column-Level Security (CLS ) enables customers to control access to database table columns based on the user's
execution context or their group membership.
CLS simplifies the design and coding of security in your application. CLS enables you to implement restrictions on
column access to protect sensitive data. For example, ensuring that specific users can access only certain columns
of a table pertinent to their department. The access restriction logic is located in the database tier rather than away
from the data in another application tier. The database applies the access restrictions every time that data access is
attempted from any tier. This restriction makes your security system more reliable and robust by reducing the
surface area of your overall security system. In addition, CLS also eliminates the need for introducing views to filter
out columns for imposing access restrictions on the users.
You could implement CLS with the GRANT T-SQL statement. With this mechanism, both SQL and Azure Active
Directory (AAD ) authentication are supported.
Syntax
GRANT <permission> [ ,...n ] ON
[ OBJECT :: ][ schema_name ]. object_name [ ( column [ ,...n ] ) ]
TO <database_principal> [ ,...n ]
[ WITH GRANT OPTION ]
[ AS <database_principal> ]
<permission> ::=
SELECT
| UPDATE
<database_principal> ::=
Database_user
| Database_role
| Database_user_mapped_to_Windows_User
| Database_user_mapped_to_Windows_Group
Example
The following example shows how to restrict ‘TestUser’ from accessing the ‘SSN’ column of the ‘Membership’ table:
Create ‘Membership’ table with SSN column used to store social security numbers:
CREATE TABLE Membership
(MemberID int IDENTITY,
FirstName varchar(100) NULL,
SSN char(9) NOT NULL,
LastName varchar(100) NOT NULL,
Phone varchar(12) NULL,
Email varchar(100) NULL);
Allow ‘TestUser’ to access all columns except for SSN column that has sensitive data:
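-- Grant access to every column except SSN
GRANT SELECT ON Membership (MemberID, FirstName, LastName, Phone, Email) TO TestUser;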
Queries executed as ‘TestUser’ will fail if they include the SSN column:
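SELECT * FROM Membership;
-- Fails for TestUser with a permission-denied error on the SSN column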
Use Cases
Some examples of how CLS is being used today:
A financial services firm allows only account managers to have access to customer social security numbers
(SSN ), phone numbers, and other personally identifiable information (PII).
A health care provider allows only doctors and nurses to have access to sensitive medical records while not
allowing members of the billing department to view this data.
Dynamic data masking for Azure SQL Database and
Data Warehouse
10/16/2019 • 3 minutes to read • Edit Online
SQL Database dynamic data masking limits sensitive data exposure by masking it to non-privileged users.
Dynamic data masking helps prevent unauthorized access to sensitive data by enabling customers to designate
how much of the sensitive data to reveal with minimal impact on the application layer. It’s a policy-based security
feature that hides the sensitive data in the result set of a query over designated database fields, while the data in
the database is not changed.
For example, a service representative at a call center may identify callers by several digits of their credit card
number, but those data items should not be fully exposed to the service representative. A masking rule can be
defined that masks all but the last four digits of any credit card number in the result set of any query. As another
example, an appropriate data mask can be defined to protect personally identifiable information (PII) data, so that a
developer can query production environments for troubleshooting purposes without violating compliance
regulations.
Two of the available masking methods are:
Credit card: a masking method that exposes the last four digits of the designated fields and adds a constant string as a prefix in the form of a credit card. Example output: XXXX-XXXX-XXXX-1234
Custom text: a masking method that exposes the first and last characters and adds a custom padding string in the middle. If the original string is shorter than the exposed prefix and suffix, only the padding string is used. Example output: prefix[padding]suffix
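As an illustration, a masking rule such as the credit card method can also be defined with T-SQL; the table and column names below are hypothetical:
-- Hypothetical table and column; mask all but the last four digits
ALTER TABLE dbo.Customers
ALTER COLUMN CreditCardNumber ADD MASKED WITH (FUNCTION = 'partial(0,"XXXX-XXXX-XXXX-",4)');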
Set up dynamic data masking for your database using REST API
See Operations for Azure SQL Database.
Transparent data encryption for SQL Database and
Data Warehouse
10/25/2019 • 10 minutes to read • Edit Online
Transparent data encryption (TDE ) helps protect Azure SQL Database, Azure SQL Managed Instance, and Azure
Data Warehouse against the threat of malicious offline activity by encrypting data at rest. It performs real-time
encryption and decryption of the database, associated backups, and transaction log files at rest without requiring
changes to the application. By default, TDE is enabled for all newly deployed Azure SQL databases. TDE cannot be
used to encrypt the logical master database in SQL Database. The master database contains objects that are
needed to perform the TDE operations on the user databases.
TDE needs to be manually enabled for older databases of Azure SQL Database, Azure SQL Managed Instance, or
Azure SQL Data Warehouse. Managed Instance databases created through restore inherit encryption status from
the source database.
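As a minimal sketch, TDE can be turned on for such a database with T-SQL; the database name below is a placeholder, and the state check assumes the sys.dm_database_encryption_keys view:
-- Enable TDE for an existing database (the database name is a placeholder)
ALTER DATABASE [MyOlderDatabase] SET ENCRYPTION ON;
-- Check progress; encryption_state = 3 means the database is encrypted
SELECT database_id, encryption_state FROM sys.dm_database_encryption_keys;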
Transparent data encryption encrypts the storage of an entire database by using a symmetric key called the
database encryption key. This database encryption key is protected by the transparent data encryption protector.
The protector is either a service-managed certificate (service-managed transparent data encryption) or an
asymmetric key stored in Azure Key Vault (Bring Your Own Key). You set the transparent data encryption protector
at the server level for Azure SQL Database and Data Warehouse, and instance level for Azure SQL Managed
Instance. The term server refers both to server and instance throughout this document, unless stated differently.
On database startup, the encrypted database encryption key is decrypted and then used for decryption and re-
encryption of the database files in the SQL Server Database Engine process. Transparent data encryption performs
real-time I/O encryption and decryption of the data at the page level. Each page is decrypted when it's read into
memory and then encrypted before being written to disk. For a general description of transparent data encryption,
see Transparent data encryption.
SQL Server running on an Azure virtual machine also can use an asymmetric key from Key Vault. The
configuration steps are different from using an asymmetric key in SQL Database and SQL Managed Instance. For
more information, see Extensible key management by using Azure Key Vault (SQL Server).
When you export a transparent data encryption-protected database, the exported content of the database isn't
encrypted. This exported content is stored in un-encrypted BACPAC files. Be sure to protect the BACPAC files
appropriately and enable transparent data encryption after import of the new database is finished.
For example, if the BACPAC file is exported from an on-premises SQL Server instance, the imported content of the
new database isn't automatically encrypted. Likewise, if the BACPAC file is exported to an on-premises SQL Server
instance, the new database also isn't automatically encrypted.
The one exception is when you export to and from a SQL database. Transparent data encryption is enabled in the
new database, but the BACPAC file itself still isn't encrypted.
Transparent Data Encryption (TDE) with Azure Key Vault integration allows you to encrypt the Database Encryption Key (DEK) with a customer-managed asymmetric key called the TDE Protector. This is also generally referred to as
Bring Your Own Key (BYOK) support for Transparent Data Encryption. In the BYOK scenario, the TDE Protector is
stored in a customer-owned and managed Azure Key Vault, Azure’s cloud-based external key management
system. The TDE Protector can be generated by the key vault or transferred to the key vault from an on-prem
HSM device. The TDE DEK, which is stored on the boot page of a database, is encrypted and decrypted by the
TDE Protector stored in Azure Key Vault, which it never leaves. SQL Database needs to be granted permissions to
the customer-owned key vault to decrypt and encrypt the DEK. If permissions of the logical SQL server to the key
vault are revoked, a database will be inaccessible, connections will be denied and all data is encrypted. For Azure
SQL Database, the TDE protector is set at the logical SQL server level and is inherited by all databases associated
with that server. For Azure SQL Managed Instance, the TDE protector is set at the instance level and it is inherited
by all encrypted databases on that instance. The term server refers both to server and instance throughout this
document, unless stated differently.
NOTE
Transparent Data Encryption with Azure Key Vault integration (Bring Your Own Key) for Azure SQL Database Managed
Instance is in preview.
With TDE with Azure Key Vault integration, users can control key management tasks including key rotations, key
vault permissions, key backups, and enable auditing/reporting on all TDE protectors using Azure Key Vault
functionality. Key Vault provides central key management, leverages tightly monitored hardware security modules
(HSMs), and enables separation of duties between management of keys and data to help meet compliance with
security policies.
TDE with Azure Key Vault integration provides the following benefits:
Increased transparency and granular control with the ability to self-manage the TDE protector
Ability to revoke permissions at any time to render database inaccessible
Central management of TDE protectors (along with other keys and secrets used in other Azure services) by
hosting them in Key Vault
Separation of key and data management responsibilities within the organization, to support separation of
duties
Greater trust from your own clients, since Key Vault is designed so that Microsoft does not see or extract any
encryption keys.
Support for key rotation
IMPORTANT
For those using service-managed TDE who would like to start using Key Vault, TDE remains enabled during the process of
switching over to a TDE protector in Key Vault. There is no downtime nor re-encryption of the database files. Switching from
a service-managed key to a Key Vault key only requires re-encryption of the database encryption key (DEK), which is a fast
and online operation.
How does TDE with Azure Key Vault integration support work
NOTE
This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will
continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM
compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure
PowerShell.
IMPORTANT
The PowerShell Azure Resource Manager module is still supported by Azure SQL Database, but all future development is for
the Az.Sql module. For these cmdlets, see AzureRM.Sql. The arguments for the commands in the Az module and in the
AzureRm modules are substantially identical.
When TDE is first configured to use a TDE protector from Key Vault, the server sends the DEK of each TDE -
enabled database to Key Vault for a wrap key request. Key Vault returns the encrypted database encryption key,
which is stored in the user database.
IMPORTANT
It is important to note that once a TDE Protector is stored in Azure Key Vault, it never leaves the Azure Key Vault.
The server can only send key operation requests to the TDE protector key material within Key Vault, and never accesses or
caches the TDE protector. The Key Vault administrator has the right to revoke Key Vault permissions of the server at any
point, in which case all connections to the database are denied.
Guidelines for configuring TDE with Azure Key Vault
General Guidelines
Ensure Azure Key Vault and Azure SQL Database/Managed Instance are going to be in the same tenant. Cross-
tenant key vault and server interactions are not supported.
If you are planning a tenant move, TDE with AKV will have to be reconfigured; learn more about moving resources.
When configuring TDE with Azure Key Vault, it is important to consider the load placed on the key vault by
repeated wrap/unwrap operations. For example, since all databases associated with a SQL Database server use
the same TDE protector, a failover of that server will trigger as many key operations against the vault as there
are databases in the server. Based on our experience and documented key vault service limits, we recommend
associating at most 500 Standard / General Purpose or 200 Premium / Business Critical databases with one
Azure Key Vault in a single subscription to ensure consistently high availability when accessing the TDE
protector in the vault.
Recommended: Keep a copy of the TDE protector on premises. This requires an HSM device to create a TDE
Protector locally and a key escrow system to store a local copy of the TDE Protector. Learn how to transfer a
key from a local HSM to Azure Key Vault.
Guidelines for configuring Azure Key Vault
Create a key vault with soft-delete and purge protection enabled to protect from data loss in case of accidental key or key vault deletion. You must enable the “soft-delete” property on the key vault via CLI or PowerShell (this option is not yet available from the AKV portal, but it is required by Azure SQL).
Soft deleted resources are retained for a set period of time, 90 days unless they are recovered or purged.
The recover and purge actions have their own permissions associated in a key vault access policy.
Set a resource lock on the key vault to control who can delete this critical resource and help to prevent
accidental or unauthorized deletion. Learn more about resource locks
Grant the SQL Database server access to the key vault using its Azure Active Directory (Azure AD ) Identity.
When using the Portal UI, the Azure AD identity gets automatically created and the key vault access
permissions are granted to the server. Using PowerShell to configure TDE with BYOK, the Azure AD
identity must be created and completion should be verified. See Configure TDE with BYOK and Configure
TDE with BYOK for Managed Instance for detailed step-by-step instructions when using PowerShell.
NOTE
If the Azure AD Identity is accidentally deleted or the server’s permissions are revoked using the key vault’s
access policy or inadvertently by moving the server to a different tenant, the server loses access to the key vault,
and TDE encrypted databases will be inaccessible and logons are denied until the logical server’s Azure AD Identity
and permissions have been restored.
When using firewalls and virtual networks with Azure Key Vault, you must allow trusted Microsoft services
to bypass this firewall. Choose YES.
NOTE
If TDE encrypted SQL databases lose access to the key vault because they cannot bypass the firewall, the databases
will be inaccessible, and logons are denied until firewall bypass permissions have been restored.
Enable auditing and reporting on all encryption keys: Key Vault provides logs that are easy to inject into
other security information and event management (SIEM ) tools. Operations Management Suite (OMS )
Log Analytics is one example of a service that is already integrated.
To ensure high availability of encrypted databases, configure each SQL Database server with two Azure
Key Vaults that reside in different regions.
Guidelines for configuring the TDE Protector (asymmetric key)
Create your encryption key locally on a local HSM device. Ensure this is an asymmetric, RSA 2048 or RSA
HSM 2048 key so it is storable in Azure Key Vault.
Escrow the key in a key escrow system.
Import the encryption key file (.pfx, .byok, or .backup) to Azure Key Vault.
NOTE
For testing purposes, it is possible to create a key with Azure Key Vault, however this key cannot be escrowed,
because the private key can never leave the key vault. Always back up and escrow keys used to encrypt production
data, as the loss of the key (accidental deletion in key vault, expiration etc.) results in permanent data loss.
If you use a key with an expiration date, implement an expiration warning system to rotate the key before it expires. Once the key expires, the encrypted databases lose access to their TDE Protector, become inaccessible, and all logons are denied until a new key has been created and selected as the default TDE Protector for the logical SQL server.
Ensure the key is enabled and has permissions to perform get, wrap key, and unwrap key operations.
Create an Azure Key Vault key backup before using the key in Azure Key Vault for the first time. Learn more
about the Backup-AzKeyVaultKey command.
Create a new backup whenever any changes are made to the key (for example, add ACLs, add tags, add key
attributes).
Keep previous versions of the key in the key vault when rotating keys, so older database backups can be
restored. When the TDE Protector is changed for a database, old backups of the database are not updated
to use the latest TDE Protector. Each backup needs the TDE Protector it was created with at restore time.
Key rotations can be performed following the instructions at Rotate the Transparent Data Encryption
Protector Using PowerShell.
Keep all previously used keys in Azure Key Vault after changing back to service-managed keys. This ensures
database backups can be restored with the TDE protectors stored in Azure Key Vault. TDE protectors
created with Azure Key Vault have to be maintained until all stored backups have been created with service-
managed keys.
Make recoverable backup copies of these keys using Backup-AzKeyVaultKey.
To remove a potentially compromised key during a security incident without the risk of data loss, follow the
steps at Remove a potentially compromised key.
Guidelines for monitoring the TDE with Azure Key Vault configuration
If the logical SQL server loses access to the customer-managed TDE protector in Azure Key Vault, the database
will deny all connections and appear inaccessible in the Azure portal. The most common causes for this are:
Key vault accidentally deleted or behind a firewall
Key vault key accidentally deleted, disabled or expired
The logical SQL Server instance AppId accidentally deleted
Key specific permissions for logical SQL Server instance AppId revoked
NOTE
The database will self-heal and become online automatically if the access to the customer-managed TDE Protector is
restored within 48 hours. If the database is inaccessible due to an intermittent networking outage, there is no action
required and the databases will come back online automatically.
For more information about troubleshooting existing configurations see Troubleshoot TDE
To monitor database state and to enable alerting for loss of TDE Protector access, configure the following
Azure features:
Azure Resource Health. An inaccessible database that has lost access to the TDE Protector will show as
"Unavailable" after the first connection to the database has been denied.
Activity Log when access to the TDE protector in the customer-managed key vault fails, entries are
added to the activity log. Creating alerts for these events will enable you to reinstate access as soon as
possible.
Action Groups can be defined to send you notifications and alerts based on your preferences, e.g.
Email/SMS/Push/Voice, Logic App, Webhook, ITSM, or Automation Runbook.
Before enabling TDE with customer managed keys in Azure Key Vault for a SQL Database Geo-DR scenario, it is
important to create and maintain two Azure Key Vaults with identical contents in the same regions that will be
used for SQL Database geo-replication. “Identical contents” specifically means that both key vaults must contain
copies of the same TDE Protector(s) so that both servers have access to the TDE Protectors used by all databases.
Going forward, both key vaults must be kept in sync: after key rotation they must contain the same copies of the TDE Protectors and maintain old versions of keys used for log files or backups, the TDE Protectors must keep the same key properties, and the key vaults must maintain the same access permissions for SQL.
Follow the steps in Active geo-replication overview to test and trigger a failover, which should be done on a
regular basis to confirm the access permissions for SQL to both key vaults have been maintained.
Backup and Restore
Once a database is encrypted with TDE using a key from Key Vault, any generated backups are also encrypted
with the same TDE Protector.
To restore a backup encrypted with a TDE Protector from Key Vault, make sure that the key material is still in the
original vault under the original key name. When the TDE Protector is changed for a database, old backups of the
database are not updated to use the latest TDE Protector. Therefore, we recommend that you keep all old
versions of the TDE Protector in Key Vault, so database backups can be restored.
If a key that might be needed for restoring a backup is no longer in its original key vault, the following error
message is returned: "Target server <Servername> does not have access to all AKV Uris created between
<Timestamp #1> and <Timestamp #2>. Please retry operation after restoring all AKV Uris."
To mitigate this, run the Get-AzSqlServerKeyVaultKey cmdlet to return the list of keys from Key Vault that were
added to the server (unless they were deleted by a user). To ensure all backups can be restored, make sure the
target server for the backup has access to all of these keys.
Get-AzSqlServerKeyVaultKey `
-ServerName <LogicalServerName> `
-ResourceGroup <SQLDatabaseResourceGroupName>
To learn more about backup recovery for SQL Database, see Recover an Azure SQL database. To learn more
about backup recovery for SQL Data Warehouse, see Recover an Azure SQL Data Warehouse.
Additional consideration for backed up log files: Backed up log files remain encrypted with the original TDE
Encryptor, even if the TDE Protector was rotated and the database is now using a new TDE Protector. At restore
time, both keys will be needed to restore the database. If the log file is using a TDE Protector stored in Azure Key
Vault, this key will be needed at restore time, even if the database has been changed to use service-managed TDE
in the meantime.
Designing a PolyBase data loading strategy for Azure
SQL Data Warehouse
7/28/2019 • 6 minutes to read • Edit Online
Traditional SMP data warehouses use an Extract, Transform and Load (ETL ) process for loading data. Azure SQL
Data Warehouse is a massively parallel processing (MPP ) architecture that takes advantage of the scalability and
flexibility of compute and storage resources. Utilizing an Extract, Load, and Transform (ELT) process can take
advantage of MPP and eliminate resources needed to transform the data prior to loading. While SQL Data
Warehouse supports many loading methods, including non-PolyBase options such as bcp and the SQL BulkCopy API, the fastest and most scalable way to load data is through PolyBase. PolyBase is a technology that accesses
external data stored in Azure Blob storage or Azure Data Lake Store via the T-SQL language.
What is ELT?
Extract, Load, and Transform (ELT) is a process by which data is extracted from a source system, loaded into a data
warehouse and then transformed.
The basic steps for implementing a PolyBase ELT for SQL Data Warehouse are:
1. Extract the source data into text files.
2. Land the data into Azure Blob storage or Azure Data Lake Store.
3. Prepare the data for loading.
4. Load the data into SQL Data Warehouse staging tables using PolyBase.
5. Transform the data.
6. Insert the data into production tables.
For a loading tutorial, see Use PolyBase to load data from Azure blob storage to Azure SQL Data Warehouse.
For more information, see Loading patterns blog.
The following table maps Parquet data types to SQL data types:
PARQUET DATA TYPE SQL DATA TYPE
tinyint tinyint
smallint smallint
int int
bigint bigint
boolean bit
double float
float real
double money
double smallmoney
string nchar
string nvarchar
string char
string varchar
binary binary
binary varbinary
timestamp date
timestamp smalldatetime
timestamp datetime2
timestamp datetime
timestamp time
date date
decimal decimal
2. Land the data into Azure Blob storage or Azure Data Lake Store
To land the data in Azure storage, you can move it to Azure Blob storage or Azure Data Lake Store. In either
location, the data should be stored in text files. PolyBase can load from either location.
Tools and services you can use to move data to Azure Storage:
Azure ExpressRoute service enhances network throughput, performance, and predictability. ExpressRoute is a
service that routes your data through a dedicated private connection to Azure. ExpressRoute connections do
not route data through the public internet. The connections offer more reliability, faster speeds, lower latencies,
and higher security than typical connections over the public internet.
The AzCopy utility moves data to Azure Storage over the public internet. This works if your data sizes are less than 10 TB. To perform loads on a regular basis with AzCopy, test the network speed to see if it is acceptable.
Azure Data Factory (ADF ) has a gateway that you can install on your local server. Then you can create a
pipeline to move data from your local server up to Azure Storage. To use Data Factory with SQL Data
Warehouse, see Load data into SQL Data Warehouse.
4. Load the data into SQL Data Warehouse staging tables using
PolyBase
It is best practice to load data into a staging table. Staging tables allow you to handle errors without interfering
with the production tables. A staging table also gives you the opportunity to use SQL Data Warehouse MPP for
data transformations before inserting the data into production tables.
Options for loading with PolyBase
To load data with PolyBase, you can use any of these loading options:
PolyBase with T-SQL works well when your data is in Azure Blob storage or Azure Data Lake Store. It gives you the most control over the loading process, but also requires you to define external data objects (a sketch of these objects appears after this list). The other methods define these objects behind the scenes as you map source tables to destination tables. To orchestrate T-SQL loads, you can use Azure Data Factory, SSIS, or Azure Functions.
PolyBase with SSIS works well when your source data is in SQL Server, either SQL Server on-premises or in
the cloud. SSIS defines the source to destination table mappings, and also orchestrates the load. If you already
have SSIS packages, you can modify the packages to work with the new data warehouse destination.
PolyBase with Azure Data Factory (ADF ) is another orchestration tool. It defines a pipeline and schedules jobs.
PolyBase with Azure Databricks transfers data from a SQL Data Warehouse table to a Databricks dataframe
and/or writes data from a Databricks dataframe to a SQL Data Warehouse table using PolyBase.
Non-PolyBase loading options
If your data is not compatible with PolyBase, you can use bcp or the SQLBulkCopy API. bcp loads directly to SQL
Data Warehouse without going through Azure Blob storage, and is intended only for small loads. Note that the load
performance of these options is significantly slower than PolyBase.
Next steps
For loading guidance, see Guidance for loading data.
Best practices for loading data into Azure SQL Data
Warehouse
8/8/2019 • 6 minutes to read • Edit Online
Recommendations and performance optimizations for loading data into Azure SQL Data Warehouse.
-- Connect to master
CREATE LOGIN LoaderRC20 WITH PASSWORD = 'a123STRONGpassword!';
Connect to the data warehouse and create a user. The following code assumes you are connected to the database
called mySampleDataWarehouse. It shows how to create a user called LoaderRC20 and give the user CONTROL
permission on the database. It then adds the user as a member of the staticrc20 database role.
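The code itself isn't reproduced above; a hedged sketch of those steps, using the names from the surrounding text, is:

-- Connect to mySampleDataWarehouse
CREATE USER LoaderRC20 FOR LOGIN LoaderRC20;
GRANT CONTROL ON DATABASE::[mySampleDataWarehouse] TO LoaderRC20;
EXEC sp_addrolemember 'staticrc20', 'LoaderRC20';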
To run a load with the resources of the staticrc20 resource class, sign in as LoaderRC20 and run the load.
Run loads under static rather than dynamic resource classes. Using the static resource classes guarantees the
same resources regardless of your data warehouse units. If you use a dynamic resource class, the resources vary
according to your service level. For dynamic classes, a lower service level means you probably need to use a
larger resource class for your loading user.
User_A and user_B are now locked out from the other dept’s schema.
CREATE DATABASE SCOPED CREDENTIAL my_credential WITH IDENTITY = 'my_identity', SECRET = 'key1'
ALTER DATABASE SCOPED CREDENTIAL my_credential WITH IDENTITY = 'my_identity', SECRET = 'key2'
Rowgroup quality is determined by the number of rows in a rowgroup. Increasing the available memory can
maximize the number of rows a columnstore index compresses into each rowgroup. Use these methods to
improve compression rates and query performance for columnstore indexes.
The trim_reason_desc column tells whether the rowgroup was trimmed (trim_reason_desc = NO_TRIM means there was
no trimming and the row group is of optimal quality). The following trim reasons indicate premature trimming of the
rowgroup; a query sketch for inspecting them appears after this list:
BULKLOAD: This trim reason is used when the incoming batch of rows for the load had fewer than 1 million
rows. The engine will create compressed row groups if more than 100,000 rows are being inserted (as
opposed to inserting into the delta store) but sets the trim reason to BULKLOAD. In this scenario, consider
increasing your batch load to include more rows. Also, reevaluate your partitioning scheme to ensure it is not
too granular as row groups cannot span partition boundaries.
MEMORY_LIMITATION: To create row groups with 1 million rows, a certain amount of working memory is
required by the engine. When available memory of the loading session is less than the required working
memory, row groups get prematurely trimmed. The following sections explain how to estimate memory
required and allocate more memory.
DICTIONARY_SIZE: This trim reason indicates that rowgroup trimming occurred because there was at least
one string column with wide and/or high cardinality strings. The dictionary size is limited to 16 MB in memory
and once this limit is reached the row group is compressed. If you do run into this situation, consider isolating
the problematic column into a separate table.
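As referenced above, a hedged query sketch for finding prematurely trimmed rowgroups; it assumes the rowgroup physical stats DMV available in SQL Data Warehouse, so verify the name against your instance:

-- Lists compressed rowgroups that were trimmed before reaching ~1 million rows.
SELECT object_id,
       partition_number,
       row_group_id,
       total_rows,
       trim_reason_desc
FROM sys.dm_pdw_nodes_db_column_store_row_group_physical_stats
WHERE trim_reason_desc <> 'NO_TRIM'
ORDER BY total_rows ASC;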
Next steps
To find more ways to improve performance in SQL Data Warehouse, see the Performance overview.
Design decisions and coding techniques for SQL
Data Warehouse
7/24/2019 • 2 minutes to read • Edit Online
Take a look through these development articles to better understand key design decisions, recommendations,
and coding techniques for SQL Data Warehouse.
Next steps
For more reference information, see SQL Data Warehouse T-SQL statements.
Development best practices for Azure SQL Data
Warehouse
7/24/2019 • 6 minutes to read • Edit Online
This article describes guidance and best practices as you develop your data warehouse solution.
Maintain statistics
Ensure you update your statistics daily or after each load. There are always trade-offs between performance and
the cost to create and update statistics. If you find it is taking too long to maintain all of your statistics, you may
want to try to be more selective about which columns have statistics or which columns need frequent updating. For
example, you might want to update date columns, where new values may be added, daily. You will gain the most
benefit by having statistics on columns involved in joins, columns used in the WHERE clause and
columns found in GROUP BY.
See also Manage table statistics, CREATE STATISTICS, UPDATE STATISTICS
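As an illustration, a hedged sketch of creating and refreshing statistics on a join column (the table and column names are placeholders):

-- Create single-column statistics on a frequently joined column.
CREATE STATISTICS stats_FactSales_CustomerKey ON dbo.FactSales (CustomerKey);

-- Refresh all statistics on the table after a daily load.
UPDATE STATISTICS dbo.FactSales;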
Do not over-partition
While partitioning data can be very effective for maintaining your data through partition switching or optimizing
scans with partition elimination, having too many partitions can slow down your queries. A high-granularity
partitioning strategy that works well on SQL Server may not work well on SQL Data Warehouse.
Having too many partitions can also reduce the effectiveness of clustered columnstore indexes if each partition has
fewer than 1 million rows. Keep in mind that behind the scenes, SQL Data Warehouse partitions your data for you
into 60 databases, so if you create a table with 100 partitions, this actually results in 6000 partitions. Each workload
is different so the best advice is to experiment with partitioning to see what works best for your workload. Consider
lower granularity than what may have worked for you in SQL Server. For example, consider using weekly or
monthly partitions rather than daily partitions.
See also Table partitioning
The materialized views in Azure SQL Data Warehouse provide a low maintenance method for complex analytical
queries to get fast performance without any query change. This article discusses the general guidance on using
materialized views.
View definition: Stored in the Azure data warehouse for both standard views and materialized views.
View content: For a standard view, generated each time the view is used. For a materialized view, pre-processed and
stored in the Azure data warehouse during view creation, and updated as data is added to the underlying tables.
Common scenarios
Materialized views are typically used in following scenarios:
Need to improve the performance of complex analytical queries against large data in size
Complex analytical queries typically use more aggregation functions and table joins, causing more compute-heavy
operations such as shuffles and joins in query execution. That's why those queries take longer to complete,
especially on large tables. Users can create materialized views for the data returned from the common computations
of queries, so there's no recomputation needed when this data is needed by queries, allowing lower compute cost
and faster query response.
Need faster performance with no or minimum query changes
Schema and query changes in data warehouses are typically kept to a minimum to support regular ETL operations
and reporting. People can use materialized views for query performance tuning, if the cost incurred by the views
can be offset by the gain in query performance. In comparison to other tuning options such as scaling and statistics
management, it's a much less impactful production change to create and maintain a materialized view and its
potential performance gain is also higher.
Creating or maintaining materialized views does not impact the queries running against the base tables.
The query optimizer can automatically use the deployed materialized views without direct view reference in a
query. This capability reduces the need for query change in performance tuning.
Need different data distribution strategy for faster query performance
Azure data warehouse is a distributed massively parallel processing (MPP) system. Data in a data warehouse table
is distributed across 60 distributions using one of three distribution strategies (hash, round_robin, or replicated).
The data distribution is specified at table creation time and stays unchanged until the table is dropped. A
materialized view, which is stored on disk, supports hash and round_robin data distributions. Users can choose a data
distribution that is different from the base tables but optimal for the performance of queries that use the views
most.
Design guidance
Here is the general guidance on using materialized views to improve query performance:
Design for your workload
Before you begin to create materialized views, it's important to have a deep understanding of your workload in
terms of query patterns, importance, frequency, and the size of resulting data.
Users can run EXPLAIN WITH_RECOMMENDATIONS <SQL_statement> for the materialized views
recommended by the query optimizer. Since these recommendations are query-specific, a materialized view that
benefits a single query may not be optimal for other queries in the same workload. Evaluate these
recommendations with your workload needs in mind. The ideal materialized views are those that benefit the
workload's performance.
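As an illustration, a hedged sketch of requesting recommendations for a single query (the table and column names are placeholders):

-- Returns the estimated plan along with materialized view recommendations for this query.
EXPLAIN WITH_RECOMMENDATIONS
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactInternetSales
GROUP BY ProductKey;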
Be aware of the tradeoff between faster queries and the cost
For each materialized view, there's a data storage cost and a cost for maintaining the view. As data changes in base
tables, the size of the materialized view increases and its physical structure also changes. To avoid query
performance degradation, each materialized view is maintained separately by the data warehouse engine. The
maintenance workload gets higher when the number of materialized views and base table changes increase. Users
should check if the cost incurred from all materialized views can be offset by the query performance gain.
You can run this query to list the materialized views in a database:
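A hedged sketch of such a query, assuming materialized views appear in sys.views with a clustered index entry in sys.indexes:

-- Materialized views appear in sys.views and carry a clustered index (index_id < 2).
SELECT V.name AS materialized_view, V.object_id
FROM sys.views V
JOIN sys.indexes I ON V.object_id = I.object_id AND I.index_id < 2;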
-- Query 1 would benefit from having a materialized view created with this SELECT statement
SELECT A, SUM(B)
FROM T
GROUP BY A
-- Query 2 would benefit from having a materialized view created with this SELECT statement
SELECT C, SUM(D)
FROM T
GROUP BY C
Example
This example uses a TPC-DS-like query that finds customers who spend more money via the catalog channel than in
stores, and identifies the preferred customers and their country of origin. The query selects the TOP 100 records from
the UNION of three sub-SELECT statements involving SUM() and GROUP BY.
WITH year_total AS (
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt+ss_ext_sales_price, 0)/2)
year_total
,'s' sale_type
FROM customer
,store_sales
,date_dim
WHERE c_customer_sk = ss_customer_sk
AND ss_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
UNION ALL
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt+cs_ext_sales_price, 0)/2)
year_total
,'c' sale_type
FROM customer
,catalog_sales
,date_dim
WHERE c_customer_sk = cs_bill_customer_sk
AND cs_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
UNION ALL
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt+ws_ext_sales_price, 0)/2)
year_total
,'w' sale_type
FROM customer
,web_sales
,date_dim
WHERE c_customer_sk = ws_bill_customer_sk
AND ws_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
)
SELECT TOP 100
t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_birth_country
FROM year_total t_s_firstyear
,year_total t_s_secyear
,year_total t_c_firstyear
,year_total t_c_secyear
,year_total t_w_firstyear
,year_total t_w_secyear
WHERE t_s_secyear.customer_id = t_s_firstyear.customer_id
AND t_s_firstyear.customer_id = t_c_secyear.customer_id
AND t_s_firstyear.customer_id = t_c_firstyear.customer_id
AND t_s_firstyear.customer_id = t_w_firstyear.customer_id
AND t_s_firstyear.customer_id = t_w_secyear.customer_id
AND t_s_firstyear.sale_type = 's'
AND t_c_firstyear.sale_type = 'c'
AND t_w_firstyear.sale_type = 'w'
AND t_s_secyear.sale_type = 's'
AND t_c_secyear.sale_type = 'c'
AND t_w_secyear.sale_type = 'w'
AND t_s_firstyear.dyear+0 = 1999
AND t_s_secyear.dyear+0 = 1999+1
AND t_c_firstyear.dyear+0 = 1999
AND t_c_secyear.dyear+0 = 1999+1
AND t_w_firstyear.dyear+0 = 1999
AND t_w_secyear.dyear+0 = 1999+1
AND t_s_firstyear.year_total > 0
AND t_c_firstyear.year_total > 0
AND t_w_firstyear.year_total > 0
AND CASE WHEN t_c_firstyear.year_total > 0 THEN t_c_secyear.year_total / t_c_firstyear.year_total ELSE NULL
END
> CASE WHEN t_s_firstyear.year_total > 0 THEN t_s_secyear.year_total / t_s_firstyear.year_total ELSE
NULL END
AND CASE WHEN t_c_firstyear.year_total > 0 THEN t_c_secyear.year_total / t_c_firstyear.year_total ELSE NULL
END
> CASE WHEN t_w_firstyear.year_total > 0 THEN t_w_secyear.year_total / t_w_firstyear.year_total ELSE
NULL END
ORDER BY t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_birth_country
OPTION ( LABEL = 'Query04-af359846-253-3');
Check the query's estimated execution plan. There are 18 shuffle and 17 join operations, which take more time to
execute. Now let's create one materialized view for each of the three sub-SELECT statements.
GO
CREATE materialized view nbViewWS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt+ws_ext_sales_price, 0)/2)
year_total
, count_big(*) AS cb
FROM dbo.customer
,dbo.web_sales
,dbo.date_dim
WHERE c_customer_sk = ws_bill_customer_sk
AND ws_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
Check the execution plan of the original query again. Now the number of joins changes from 17 to 5 and there's no
shuffle anymore. Click the Filter operation icon in the plan; its Output List shows that the data is read from the
materialized views instead of the base tables.
With materialized views, the same query runs much faster without any code change.
Next steps
For more development tips, see SQL Data Warehouse development overview.
Performance tuning with ordered clustered
columnstore index
10/18/2019 • 6 minutes to read • Edit Online
When users query a columnstore table in Azure SQL Data Warehouse, the optimizer checks the minimum and
maximum values stored in each segment. Segments that are outside the bounds of the query predicate aren't read
from disk to memory. A query can get faster performance if the number of segments to read and their total size are
small.
NOTE
In an ordered CCI table, new data resulting from DML or data loading operations are not automatically sorted. Users can
REBUILD the ordered CCI to sort all data in the table. In Azure SQL Data Warehouse, the columnstore index REBUILD is an
offline operation. For a partitioned table, the REBUILD is done one partition at a time. Data in the partition that is being
rebuilt is "offline" and unavailable until the REBUILD is complete for that partition.
Query performance
A query's performance gain from an ordered CCI depends on the query patterns, the size of data, how well the
data is sorted, the physical structure of segments, and the DWU and resource class chosen for the query execution.
Users should review all these factors before choosing the ordering columns when designing an ordered CCI table.
Queries with all of the following patterns typically run faster with ordered CCI:
1. The queries have equality, inequality, or range predicates.
2. The predicate columns and the ordered CCI columns are the same.
3. The predicate columns are used in the same order as the column ordinal of the ordered CCI columns.
In this example, table T1 has a clustered columnstore index ordered in the sequence of Col_C, Col_B, and Col_A.
The performance of query 1 can benefit more from ordered CCI than the other 3 queries.
-- Query #1:
SELECT * FROM T1 WHERE Col_C = 'c' AND Col_B = 'b' AND Col_A = 'a';
-- Query #2
SELECT * FROM T1 WHERE Col_B = 'b' AND Col_C = 'c' AND Col_A = 'a';
-- Query #3
SELECT * FROM T1 WHERE Col_B = 'b' AND Col_A = 'a';
-- Query #4
SELECT * FROM T1 WHERE Col_A = 'a' AND Col_C = 'c';
CREATE TABLE Table1 WITH (DISTRIBUTION = HASH(c1), CLUSTERED COLUMNSTORE INDEX ORDER(c1) )
AS SELECT * FROM ExampleTable
OPTION (MAXDOP 1);
Pre-sort the data by the sort key(s) before loading it into Azure SQL Data Warehouse tables.
Here is an example of an ordered CCI table distribution that has zero segment overlap when following the above
recommendations. The ordered CCI table is created in a DWU1000c database via CTAS from a 20GB heap table
using MAXDOP 1 and xlargerc. The CCI is ordered on a BIGINT column with no duplicates.
Examples
A. To check for ordered columns and order ordinal:
B. To change column ordinal, add or remove columns from the order list, or to change from CCI to
ordered CCI:
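The example statements aren't shown above; hedged sketches of both (the ordinal column in sys.index_columns and the table names are assumptions to verify against your instance) might be:

-- A. Check for ordered columns and their order ordinal.
SELECT OBJECT_NAME(c.object_id) AS table_name,
       c.name AS column_name,
       i.column_store_order_ordinal
FROM sys.index_columns i
JOIN sys.columns c
  ON i.object_id = c.object_id AND i.column_id = c.column_id
WHERE i.column_store_order_ordinal <> 0;

-- B. Change the column ordinal or the order list by re-creating the table with CTAS.
CREATE TABLE Table1_copy
WITH ( DISTRIBUTION = HASH(c1), CLUSTERED COLUMNSTORE INDEX ORDER(c2, c1) )
AS SELECT * FROM Table1
OPTION (MAXDOP 1);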
Next steps
For more development tips, see SQL Data Warehouse development overview.
Performance tuning with result set caching
10/17/2019 • 2 minutes to read • Edit Online
When result set caching is enabled, Azure SQL Data Warehouse automatically caches query results in the user
database for repetitive use. This allows subsequent query executions to get results directly from the persisted cache
so recomputation is not needed. Result set caching improves query performance and reduces compute resource
usage. In addition, queries that use cached result sets do not use any concurrency slots and thus do not count against
existing concurrency limits. For security, users can only access the cached results if they have the same data access
permissions as the users who created the cached results.
Key commands
Turn ON/OFF result set caching for a user database
Turn ON/OFF result set caching for a session
Check the size of cached result set
Clean up the cache
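The commands themselves aren't listed above; hedged sketches of each (the database name is a placeholder, and the ALTER DATABASE statement is run while connected to master) might be:

-- Turn result set caching ON for a user database (run in master).
ALTER DATABASE mySampleDataWarehouse SET RESULT_SET_CACHING ON;

-- Turn result set caching ON or OFF for the current session.
SET RESULT_SET_CACHING ON;

-- Check the size of the cached result sets.
DBCC SHOWRESULTCACHESPACEUSED;

-- Clean up the cache by removing all cached result sets.
DBCC DROPRESULTSETCACHE;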
Next steps
For more development tips, see SQL Data Warehouse development overview.
Designing tables in Azure SQL Data Warehouse
9/26/2019 • 10 minutes to read • Edit Online
Learn key concepts for designing tables in Azure SQL Data Warehouse.
To show the organization of the tables in SQL Data Warehouse, you could use fact, dim, and int as prefixes to the
table names. The following table shows some of the schema and table names for WideWorldImportersDW.
Table persistence
Tables store data either permanently in Azure Storage, temporarily in Azure Storage, or in a data store external to
the data warehouse.
Regular table
A regular table stores data in Azure Storage as part of the data warehouse. The table and the data persist
regardless of whether a session is open. This example creates a regular table with two columns.
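The example itself isn't reproduced here; a minimal sketch of such a table (column names are placeholders) might be:

-- A regular table persists in Azure Storage independent of the session.
CREATE TABLE MyTable (col1 int, col2 int);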
Temporary table
A temporary table only exists for the duration of the session. You can use a temporary table to prevent other
users from seeing temporary results and also to reduce the need for cleanup. Temporary tables utilize local
storage to offer fast performance. For more information, see Temporary tables.
External table
An external table points to data located in Azure Storage blob or Azure Data Lake Store. When used in
conjunction with the CREATE TABLE AS SELECT statement, selecting from an external table imports data into
SQL Data Warehouse. External tables are therefore useful for loading data. For a loading tutorial, see Use
PolyBase to load data from Azure blob storage.
Data types
SQL Data Warehouse supports the most commonly used data types. For a list of the supported data types, see
the data types section in the CREATE TABLE statement reference. For guidance on using data types, see
Data types.
Distributed tables
A fundamental feature of SQL Data Warehouse is the way it can store and operate on tables across distributions.
SQL Data Warehouse supports three methods for distributing data, round-robin (default), hash and replicated.
Hash-distributed tables
A hash distributed table distributes rows based on the value in the distribution column. A hash distributed table
is designed to achieve high performance for queries on large tables. There are several factors to consider when
choosing a distribution column.
For more information, see Design guidance for distributed tables.
Replicated tables
A replicated table has a full copy of the table available on every Compute node. Queries run fast on replicated
tables since joins on replicated tables do not require data movement. Replication requires extra storage, though,
and is not practical for large tables.
For more information, see Design guidance for replicated tables.
Round-robin tables
A round-robin table distributes table rows evenly across all distributions. The rows are distributed randomly.
Loading data into a round-robin table is fast. However, queries can require more data movement than the other
distribution methods.
For more information, see Design guidance for distributed tables.
Common distribution methods for tables
The table category often determines which option to choose for distributing the table.
Dimension: Use replicated for smaller tables. If tables are too large to store on each Compute node, use
hash-distributed.
Staging: Use round-robin for the staging table. The load with CTAS is fast. Once the data is in the staging table, use
INSERT...SELECT to move the data to production tables.
Table partitions
A partitioned table stores and performs operations on the table rows according to data ranges. For example, a
table could be partitioned by day, month, or year. You can improve query performance through partition
elimination, which limits a query scan to data within a partition. You can also maintain the data through partition
switching. Since the data in SQL Data Warehouse is already distributed, too many partitions can slow query
performance. For more information, see Partitioning guidance. When switching into table partitions that
are not empty, consider using the TRUNCATE_TARGET option in your ALTER TABLE statement if the existing
data is to be truncated. The following code switches the transformed daily data into the SalesFact table, overwriting
any existing data.
ALTER TABLE SalesFact_DailyFinalLoad SWITCH PARTITION 256 TO SalesFact PARTITION 256 WITH (TRUNCATE_TARGET =
ON);
Columnstore indexes
By default, SQL Data Warehouse stores a table as a clustered columnstore index. This form of data storage
achieves high data compression and query performance on large tables. The clustered columnstore index is
usually the best choice, but in some cases a clustered index or a heap is the appropriate storage structure. A heap
table can be especially useful for loading transient data, such as a staging table which is transformed into a final
table.
For a list of columnstore features, see What's new for columnstore indexes. To improve columnstore index
performance, see Maximizing rowgroup quality for columnstore indexes.
Statistics
The query optimizer uses column-level statistics when it creates the plan for executing a query. To improve query
performance, it's important to have statistics on individual columns, especially columns used in query joins.
Creating statistics happens automatically. However, updating statistics does not happen automatically. Update
statistics after a significant number of rows are added or changed. For example, update statistics after a load. For
more information, see Statistics guidance.
CREATE TABLE: Creates an empty table by defining all the table columns and options.
CREATE EXTERNAL TABLE: Creates an external table. The definition of the table is stored in SQL Data Warehouse.
The table data is stored in Azure Blob storage or Azure Data Lake Store.
CREATE TABLE AS SELECT: Populates a new table with the results of a select statement. The table columns and data
types are based on the select statement results. To import data, this statement can select from an external table.
CREATE EXTERNAL TABLE AS SELECT: Creates a new external table by exporting the results of a select statement to
an external location. The location is either Azure Blob storage or Azure Data Lake Store.
DBCC PDW_SHOWSPACEUSED('dbo.FactInternetSales');
However, using DBCC commands can be quite limiting. Dynamic management views (DMVs) show more detail
than DBCC commands. Start by creating this view.
SELECT
distribution_policy_name
, SUM(row_count) as table_type_row_count
, SUM(reserved_space_GB) as table_type_reserved_space_GB
, SUM(data_space_GB) as table_type_data_space_GB
, SUM(index_space_GB) as table_type_index_space_GB
, SUM(unused_space_GB) as table_type_unused_space_GB
FROM dbo.vTableSizes
GROUP BY distribution_policy_name
;
SELECT
index_type_desc
, SUM(row_count) as table_type_row_count
, SUM(reserved_space_GB) as table_type_reserved_space_GB
, SUM(data_space_GB) as table_type_data_space_GB
, SUM(index_space_GB) as table_type_index_space_GB
, SUM(unused_space_GB) as table_type_unused_space_GB
FROM dbo.vTableSizes
GROUP BY index_type_desc
;
Next steps
After creating the tables for your data warehouse, the next step is to load data into the table. For a loading
tutorial, see Loading data to SQL Data Warehouse.
CREATE TABLE AS SELECT (CTAS) in Azure SQL
Data Warehouse
10/24/2019 • 9 minutes to read • Edit Online
This article explains the CREATE TABLE AS SELECT (CTAS ) T-SQL statement in Azure SQL Data Warehouse for
developing solutions. The article also provides code examples.
SELECT *
INTO [dbo].[FactInternetSales_new]
FROM [dbo].[FactInternetSales]
SELECT...INTO doesn't allow you to change either the distribution method or the index type as part of the
operation. You create [dbo].[FactInternetSales_new] by using the default distribution type of ROUND_ROBIN,
and the default table structure of CLUSTERED COLUMNSTORE INDEX.
With CTAS, on the other hand, you can specify both the distribution of the table data as well as the table structure
type. To convert the previous example to CTAS:
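The converted statement isn't shown above; a hedged sketch that makes the default distribution and index explicit might be:

CREATE TABLE [dbo].[FactInternetSales_new]
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT * FROM [dbo].[FactInternetSales];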
NOTE
If you're only trying to change the index in your CTAS operation, and the source table is hash distributed, maintain the
same distribution column and data type. This avoids cross-distribution data movement during the operation, which is more
efficient.
Now you want to create a new copy of this table, with a Clustered Columnstore Index, so you can take advantage
of the performance of Clustered Columnstore tables. You also want to distribute this table on ProductKey,
because you're anticipating joins on this column and want to avoid data movement during joins on ProductKey.
Lastly, you also want to add partitioning on OrderDateKey, so you can quickly delete old data by dropping old
partitions. Here is the CTAS statement, which copies your old table into a new table.
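The statement itself isn't reproduced here; a hedged sketch (the partition boundary values are placeholders) might be:

CREATE TABLE [dbo].[FactInternetSales_new]
WITH
(
    DISTRIBUTION = HASH([ProductKey]),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION ( [OrderDateKey] RANGE RIGHT FOR VALUES (20000101, 20010101) )
)
AS
SELECT * FROM [dbo].[FactInternetSales];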
Finally, you can rename your tables, to swap in your new table and then drop your old table.
RENAME OBJECT FactInternetSales TO FactInternetSales_old;
RENAME OBJECT FactInternetSales_new TO FactInternetSales;
TIP
Try to think "CTAS first." Solving a problem by using CTAS is generally a good approach, even if you're writing more data as
a result.
The original query might have looked something like this example:
UPDATE acs
SET [TotalSalesAmount] = [fis].[TotalSalesAmount]
FROM [dbo].[AnnualCategorySales] AS acs
JOIN (
SELECT [EnglishProductCategoryName]
, [CalendarYear]
, SUM([SalesAmount]) AS [TotalSalesAmount]
FROM [dbo].[FactInternetSales] AS s
JOIN [dbo].[DimDate] AS d ON s.[OrderDateKey] = d.[DateKey]
JOIN [dbo].[DimProduct] AS p ON s.[ProductKey] = p.[ProductKey]
JOIN [dbo].[DimProductSubCategory] AS u ON p.[ProductSubcategoryKey] = u.
[ProductSubcategoryKey]
JOIN [dbo].[DimProductCategory] AS c ON u.[ProductCategoryKey] = c.
[ProductCategoryKey]
WHERE [CalendarYear] = 2004
GROUP BY
[EnglishProductCategoryName]
, [CalendarYear]
) AS fis
ON [acs].[EnglishProductCategoryName] = [fis].[EnglishProductCategoryName]
AND [acs].[CalendarYear] = [fis].[CalendarYear];
SQL Data Warehouse doesn't support ANSI joins in the FROM clause of an UPDATE statement, so you can't use
the previous example without modifying it.
You can use a combination of a CTAS and an implicit join to replace the previous example:
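The CTAS itself isn't shown above; a hedged sketch of the pattern, reusing the tables and columns from the UPDATE example (the ISNULL defaults and the follow-up step are assumptions), might be:

-- Build the aggregate with implicit (comma) joins; ISNULL keeps the key columns NOT NULL.
CREATE TABLE [dbo].[AnnualCategorySales_Upsert]
WITH ( DISTRIBUTION = ROUND_ROBIN )
AS
SELECT ISNULL(c.[EnglishProductCategoryName], 'Unknown') AS [EnglishProductCategoryName]
     , ISNULL(d.[CalendarYear], 0)                       AS [CalendarYear]
     , SUM(s.[SalesAmount])                              AS [TotalSalesAmount]
FROM [dbo].[FactInternetSales]      AS s
   , [dbo].[DimDate]                AS d
   , [dbo].[DimProduct]             AS p
   , [dbo].[DimProductSubCategory]  AS u
   , [dbo].[DimProductCategory]     AS c
WHERE s.[OrderDateKey]          = d.[DateKey]
  AND s.[ProductKey]            = p.[ProductKey]
  AND p.[ProductSubcategoryKey] = u.[ProductSubcategoryKey]
  AND u.[ProductCategoryKey]    = c.[ProductCategoryKey]
  AND d.[CalendarYear]          = 2004
GROUP BY c.[EnglishProductCategoryName]
       , d.[CalendarYear];
-- The result can then be used to update or rebuild [dbo].[AnnualCategorySales],
-- for example by swapping tables with RENAME OBJECT.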
You might think you should migrate this code to CTAS, and you'd be correct. However, there's a hidden issue
here.
The following code doesn't yield the same result:
Notice that the column "result" carries forward the data type and nullability values of the expression. Carrying the
data type forward can lead to subtle variances in values if you aren't careful.
Try this example:
SELECT result,result*@d
from result;
SELECT result,result*@d
from ctas_r;
The value stored for result is different. As the persisted value in the result column is used in other expressions, the
error becomes even more significant.
This is important for data migrations. Even though the second query is arguably more accurate, there's a
problem. The data would be different compared to the source system, and that leads to questions of integrity in
the migration. This is one of those rare cases where the "wrong" answer is actually the right one!
The reason we see a disparity between the two results is due to implicit type casting. In the first example, the table
defines the column definition. When the row is inserted, an implicit type conversion occurs. In the second
example, there is no implicit type conversion as the expression defines the data type of the column.
Notice also that the column in the second example has been defined as a NULLable column, whereas in the first
example it has not. When the table was created in the first example, column nullability was explicitly defined. In
the second example, it was left to the expression, and by default would result in a NULL definition.
To resolve these issues, you must explicitly set the type conversion and nullability in the SELECT portion of the
CTAS statement. You can't set these properties in 'CREATE TABLE'. The following example demonstrates how to
fix the code:
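The corrected code isn't reproduced above; a hedged sketch, assuming decimal and float variables like those implied by the surrounding discussion, might be:

DECLARE @d decimal(7,2) = 85.455
,       @f float(24)    = 85.455;

CREATE TABLE ctas_r
WITH ( DISTRIBUTION = ROUND_ROBIN )
AS
-- CAST fixes the data type; ISNULL fixes the nullability (NOT NULL).
SELECT ISNULL(CAST(@d * @f AS decimal(7,2)), 0) AS result;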
NOTE
For the nullability to be correctly set, it's vital to use ISNULL and not COALESCE. COALESCE is not a deterministic function,
and so the result of the expression will always be NULLable. ISNULL is different. It's deterministic. Therefore, when the
second part of the ISNULL function is a constant or a literal, the resulting value will be NOT NULL.
Ensuring the integrity of your calculations is also important for table partition switching. Imagine you have this
table defined as a fact table:
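The table definition isn't shown above; a hedged reconstruction, inferring the columns from the CTAS that follows, might be:

CREATE TABLE [dbo].[Sales]
(
    [date]      INT    NOT NULL
,   [product]   INT    NOT NULL
,   [store]     INT    NOT NULL
,   [quantity]  INT    NOT NULL
,   [price]     MONEY  NOT NULL
,   [amount]    MONEY  NOT NULL
)
WITH
(   DISTRIBUTION = HASH([product])
,   PARTITION ( [date] RANGE RIGHT FOR VALUES (20000101, 20010101) )
);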
However, the amount field is a calculated expression. It isn't part of the source data.
To create your partitioned dataset, you might want to use the following code:
CREATE TABLE [dbo].[Sales_in]
WITH
( DISTRIBUTION = HASH([product])
, PARTITION ( [date] RANGE RIGHT FOR VALUES
(20000101,20010101
)
)
)
AS
SELECT
[date]
, [product]
, [store]
, [quantity]
, [price]
, [quantity]*[price] AS [amount]
FROM [stg].[source]
OPTION (LABEL = 'CTAS : Partition IN table : Create');
The query would run perfectly well. The problem comes when you try to do the partition switch. The table
definitions don't match. To make the table definitions match, modify the CTAS to add an ISNULL function to
preserve the column's nullability attribute.
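The modified CTAS isn't reproduced here; a hedged sketch that mirrors the earlier statement with the calculated column wrapped in CAST and ISNULL might be:

-- (Drop or rename the earlier Sales_in before re-creating it.)
CREATE TABLE [dbo].[Sales_in]
WITH
(   DISTRIBUTION = HASH([product])
,   PARTITION ( [date] RANGE RIGHT FOR VALUES (20000101, 20010101) )
)
AS
SELECT
    [date]
,   [product]
,   [store]
,   [quantity]
,   [price]
,   ISNULL(CAST([quantity] * [price] AS MONEY), 0) AS [amount]
FROM [stg].[source]
OPTION (LABEL = 'CTAS : Partition IN table : Create');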
You can see that type consistency and maintaining nullability properties on a CTAS is an engineering best
practice. It helps to maintain integrity in your calculations, and also ensures that partition switching is possible.
CTAS is one of the most important statements in SQL Data Warehouse. Make sure you thoroughly understand it.
See the CTAS documentation.
Next steps
For more development tips, see the development overview.
Table data types in Azure SQL Data Warehouse
7/24/2019 • 2 minutes to read • Edit Online
Recommendations for defining table data types in Azure SQL Data Warehouse.
UNSUPPORTED DATA TYPE    WORKAROUND
geometry                 varbinary
geography                varbinary
hierarchyid              nvarchar(4000)
image                    varbinary
text                     varchar
ntext                    nvarchar
xml                      varchar
user-defined type        Convert back to the native data type when possible.
Next steps
For more information on developing tables, see Table Overview.
Guidance for designing distributed tables in Azure
SQL Data Warehouse
7/24/2019 • 9 minutes to read • Edit Online
Recommendations for designing hash-distributed and round-robin distributed tables in Azure SQL Data
Warehouse.
This article assumes you are familiar with data distribution and data movement concepts in SQL Data
Warehouse. For more information, see Azure SQL Data Warehouse - Massively Parallel Processing (MPP )
architecture.
Since identical values always hash to the same distribution, the data warehouse has built-in knowledge of the
row locations. SQL Data Warehouse uses this knowledge to minimize data movement during queries, which
improves query performance.
Hash-distributed tables work well for large fact tables in a star schema. They can have very large numbers of
rows and still achieve high performance. There are, of course, some design considerations that help you to get
the performance the distributed system is designed to provide. Choosing a good distribution column is one such
consideration that is described in this article.
Consider using a hash-distributed table when:
The table size on disk is more than 2 GB.
The table has frequent insert, update, and delete operations.
Round-robin distributed
A round-robin distributed table distributes table rows evenly across all distributions. The assignment of rows to
distributions is random. Unlike hash-distributed tables, rows with equal values are not guaranteed to be assigned
to the same distribution.
As a result, the system sometimes needs to invoke a data movement operation to better organize your data
before it can resolve a query. This extra step can slow down your queries. For example, joining a round-robin
table usually requires reshuffling the rows, which is a performance hit.
Consider using the round-robin distribution for your table in the following scenarios:
When getting started as a simple starting point since it is the default
If there is no obvious joining key
If there is no good candidate column for hash distributing the table
If the table does not share a common join key with other tables
If the join is less significant than other joins in the query
When the table is a temporary staging table
The tutorial Load New York taxicab data to Azure SQL Data Warehouse gives an example of loading data into a
round-robin staging table.
Choosing a distribution column is an important design decision since the values in this column determine how
the rows are distributed. The best choice depends on several factors, and usually involves tradeoffs. However, if
you don't choose the best column the first time, you can use CREATE TABLE AS SELECT (CTAS ) to re-create the
table with a different distribution column.
Choose a distribution column that does not require updates
You cannot update a distribution column unless you delete the row and insert a new row with the updated
values. Therefore, select a column with static values.
Choose a distribution column with data that distributes evenly
For best performance, all of the distributions should have approximately the same number of rows. When one or
more distributions have a disproportionate number of rows, some distributions finish their portion of a parallel
query before others. Since the query can't complete until all distributions have finished processing, each query is
only as fast as the slowest distribution.
Data skew means the data is not distributed evenly across the distributions
Processing skew means that some distributions take longer than others when running parallel queries. This
can happen when the data is skewed.
To balance the parallel processing, select a distribution column that:
Has many unique values. The column can have some duplicate values. However, all rows with the same
value are assigned to the same distribution. Since there are 60 distributions, the column should have at least
60 unique values. Usually the number of unique values is much greater.
Does not have NULLs, or has only a few NULLs. For an extreme example, if all values in the column are
NULL, all the rows are assigned to the same distribution. As a result, query processing is skewed to one
distribution, and does not benefit from parallel processing.
Is not a date column. All data for the same date lands in the same distribution. If several users are all
filtering on the same date, then only 1 of the 60 distributions does all the processing work.
Choose a distribution column that minimizes data movement
To get the correct query result, queries might move data from one Compute node to another. Data movement
commonly happens when queries have joins and aggregations on distributed tables. Choosing a distribution
column that helps minimize data movement is one of the most important strategies for optimizing performance
of your SQL Data Warehouse.
To minimize data movement, select a distribution column that:
Is used in JOIN , GROUP BY , DISTINCT , OVER , and HAVING clauses. When two large fact tables have frequent
joins, query performance improves when you distribute both tables on one of the join columns. When a table
is not used in joins, consider distributing the table on a column that is frequently in the GROUP BY clause.
Is not used in WHERE clauses. This could narrow the query to not run on all the distributions.
Is not a date column. WHERE clauses often filter by date. When this happens, all the processing could run on
only a few distributions.
What to do when none of the columns are a good distribution column
If none of your columns have enough distinct values for a distribution column, you can create a new column as a
composite of one or more values. To avoid data movement during query execution, use the composite
distribution column as a join column in queries.
Once you design a hash-distributed table, the next step is to load data into the table. For loading guidance, see
Loading overview.
select *
from dbo.vTableSizes
where two_part_name in
(
select two_part_name
from dbo.vTableSizes
where row_count > 0
group by two_part_name
having (max(row_count * 1.000) - min(row_count * 1.000))/max(row_count * 1.000) >= .10
)
order by two_part_name, row_count
;
Next steps
To create a distributed table, use one of these statements:
CREATE TABLE (Azure SQL Data Warehouse)
CREATE TABLE AS SELECT (Azure SQL Data Warehouse
Primary key, foreign key, and unique key in Azure
SQL Data Warehouse
9/26/2019 • 3 minutes to read • Edit Online
Learn about table constraints in Azure SQL Data Warehouse, including primary key, foreign key, and unique key.
Table constraints
Azure SQL Data Warehouse supports these table constraints:
PRIMARY KEY is only supported when NONCLUSTERED and NOT ENFORCED are both used.
UNIQUE constraint is only supported when NOT ENFORCED is used.
FOREIGN KEY constraint is not supported in Azure SQL Data Warehouse.
Remarks
Having a primary key and/or unique key allows the data warehouse engine to generate an optimal execution plan for
a query. All values in a primary key column or a unique constraint column should be unique.
After creating a table with a primary key or unique constraint in Azure data warehouse, users need to make sure all
values in those columns are unique. A violation of that may cause queries to return inaccurate results. This
example shows how a query may return inaccurate results if the primary key or unique constraint column includes
duplicate values.
-- Create table t1
CREATE TABLE t1 (a1 INT NOT NULL, b1 INT) WITH (DISTRIBUTION = ROUND_ROBIN)
-- Run this query. No primary key or unique constraint. 4 rows returned. Correct result.
SELECT a1, COUNT(*) AS total FROM t1 GROUP BY a1
/*
a1 total
----------- -----------
1 2
2 1
3 1
4 1
(4 rows affected)
*/
/*
a1 total
a1 total
----------- -----------
2 1
4 1
1 1
3 1
1 1
(5 rows affected)
*/
/*
a1 total
----------- -----------
2 1
4 1
1 1
3 1
1 1
(5 rows affected)
*/
/*
a1 b1
----------- -----------
2 200
3 300
4 400
0 1000
1 100
(5 rows affected)
*/
/*
a1 total
----------- -----------
2 1
3 1
4 1
0 1
1 1
(5 rows affected)
*/
/*
a1 total
----------- -----------
2 1
3 1
4 1
0 1
1 1
(5 rows affected)
*/
Examples
Create a data warehouse table with a primary key:
CREATE TABLE mytable (c1 INT PRIMARY KEY NONCLUSTERED NOT ENFORCED, c2 INT);
Next steps
After creating the tables for your data warehouse, the next step is to load data into the table. For a loading tutorial,
see Loading data to SQL Data Warehouse.
Indexing tables in SQL Data Warehouse
7/24/2019 • 14 minutes to read • Edit Online
Recommendations and examples for indexing tables in Azure SQL Data Warehouse.
Index types
SQL Data Warehouse offers several indexing options including clustered columnstore indexes, clustered indexes
and nonclustered indexes, and a non-index option also known as heap.
To create a table with an index, see the CREATE TABLE (Azure SQL Data Warehouse) documentation.
There are a few scenarios where clustered columnstore may not be a good option:
Columnstore tables do not support varchar(max), nvarchar(max) and varbinary(max). Consider heap or
clustered index instead.
Columnstore tables may be less efficient for transient data. Consider heap and perhaps even temporary
tables.
Small tables with fewer than 60 million rows. Consider heap tables.
Heap tables
When you are temporarily landing data in SQL Data Warehouse, you may find that using a heap table makes
the overall process faster. This is because loads to heaps are faster than to index tables and in some cases the
subsequent read can be done from cache. If you are loading data only to stage it before running more
transformations, loading the table to heap table is much faster than loading the data to a clustered columnstore
table. In addition, loading data to a temporary table loads faster than loading a table to permanent storage.
For small lookup tables with fewer than 60 million rows, heap tables often make sense. Clustered columnstore tables
begin to achieve optimal compression once there are more than 60 million rows.
To create a heap table, simply specify HEAP in the WITH clause:
CREATE TABLE myTable
(
id int NOT NULL,
lastName varchar(20),
zipCode varchar(6)
)
WITH ( HEAP );
Now that you have created the view, run this query to identify tables with row groups with less than 100K rows.
Of course, you may want to increase the threshold of 100K if you are looking for more optimal segment quality.
SELECT *
FROM [dbo].[vColumnstoreDensity]
WHERE COMPRESSED_rowgroup_rows_AVG < 100000
OR INVISIBLE_rowgroup_rows_AVG < 100000
Once you have run the query you can begin to look at the data and analyze your results. This table explains what
to look for in your row group analysis.
[table_partition_count] If the table is partitioned, then you may expect to see higher
Open row group counts. Each partition in the distribution
could in theory have an open row group associated with it.
Factor this into your analysis. A small table that has been
partitioned could be optimized by removing the partitioning
altogether as this would improve compression.
[row_count_total] Total row count for the table. For example, you can use this
value to calculate percentage of rows in the compressed
state.
[row_count_per_distribution_MAX] If all rows are evenly distributed this value would be the
target number of rows per distribution. Compare this value
with the compressed_rowgroup_count.
[COMPRESSED_rowgroup_rows_MIN] Use this in conjunction with the AVG and MAX columns to
understand the range of values for the row groups in your
columnstore. A low number over the load threshold
(102,400 per partition aligned distribution) suggests that
optimizations are available in the data load
[COMPRESSED_rowgroup_rows_MAX] As above
[OPEN_rowgroup_count] Open row groups are normal. One would reasonably expect
one OPEN row group per table distribution (60). Excessive
numbers suggest data loading across partitions. Double
check the partitioning strategy to make sure it is sound
[OPEN_rowgroup_rows_MIN] Open groups indicate that data is either being trickle loaded
into the table or that the previous load spilled over
remaining rows into this row group. Use the MIN, MAX, and AVG
columns to see how much data is sitting in OPEN row groups.
For small tables, it could be 100% of all the data, in which
case run ALTER INDEX REBUILD to force the data to
columnstore.
[OPEN_rowgroup_rows_MAX] As above
[OPEN_rowgroup_rows_AVG] As above
[CLOSED_rowgroup_count] The number of closed row groups should be low if any are
seen at all. Closed row groups can be converted to
compressed row groups using the ALTER INDEX ...
REORGANIZE command. However, this is not normally
required. Closed groups are automatically converted to
columnstore row groups by the background "tuple mover"
process.
[CLOSED_rowgroup_rows_MIN] Closed row groups should have a very high fill rate. If the fill
rate for a closed row group is low, then further analysis of
the columnstore is required.
[CLOSED_rowgroup_rows_MAX] As above
[CLOSED_rowgroup_rows_AVG] As above
Step 2: Rebuild clustered columnstore indexes with higher resource class user
Sign in as the user from step 1 (e.g. LoadUser), which is now using a higher resource class, and execute the
ALTER INDEX statements. Be sure that this user has ALTER permission to the tables where the index is being
rebuilt. These examples show how to rebuild the entire columnstore index or how to rebuild a single partition.
On large tables, it is more practical to rebuild indexes a single partition at a time.
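The statements themselves aren't shown above; hedged sketches of both forms (the table name and partition number are placeholders) might be:

-- Rebuild the entire clustered columnstore index.
ALTER INDEX ALL ON [dbo].[FactInternetSales] REBUILD;

-- Rebuild the index for a single partition only.
ALTER INDEX ALL ON [dbo].[FactInternetSales] REBUILD PARTITION = 5;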
Alternatively, instead of rebuilding the index, you could copy the table to a new table using CTAS. Which way is
best? For large volumes of data, CTAS is usually faster than ALTER INDEX. For smaller volumes of data, ALTER
INDEX is easier to use and won't require you to swap out the table.
Rebuilding an index in SQL Data Warehouse is an offline operation. For more information about rebuilding
indexes, see the ALTER INDEX REBUILD section in Columnstore Indexes Defragmentation, and ALTER INDEX.
Step 3: Verify clustered columnstore segment quality has improved
Rerun the query that identified tables with poor segment quality and verify that segment quality has improved. If
segment quality did not improve, it could be that the rows in your table are extra wide. Consider using a higher
resource class or DWU when rebuilding your indexes.
-- Step 1: Select the partition of data and write it out to a new table using CTAS
CREATE TABLE [dbo].[FactInternetSales_20000101_20010101]
WITH ( DISTRIBUTION = HASH([ProductKey])
, CLUSTERED COLUMNSTORE INDEX
, PARTITION ( [OrderDateKey] RANGE RIGHT FOR VALUES
(20000101,20010101
)
)
)
AS
SELECT *
FROM [dbo].[FactInternetSales]
WHERE [OrderDateKey] >= 20000101
AND [OrderDateKey] < 20010101
;
Next steps
For more information about developing tables, see Developing tables.
Using IDENTITY to create surrogate keys in Azure
SQL Data Warehouse
7/26/2019 • 5 minutes to read • Edit Online
Recommendations and examples for using the IDENTITY property to create surrogate keys on tables in Azure
SQL Data Warehouse.
SELECT *
FROM dbo.T1;
DBCC PDW_SHOWSPACEUSED('dbo.T1');
In the preceding example, two rows landed in distribution 1. The first row has the surrogate value of 1 in column
C1 , and the second row has the surrogate value of 61. Both of these values were generated by the IDENTITY
property. However, the allocation of the values is not contiguous. This behavior is by design.
Skewed data
The range of values for the data type is spread evenly across the distributions. If a distributed table suffers from
skewed data, then the range of values available to the datatype can be exhausted prematurely. For example, if all
the data ends up in a single distribution, then effectively the table has access to only one-sixtieth of the values of
the data type. For this reason, the IDENTITY property is limited to INT and BIGINT data types only.
SELECT..INTO
When an existing IDENTITY column is selected into a new table, the new column inherits the IDENTITY property,
unless one of the following conditions is true:
The SELECT statement contains a join.
Multiple SELECT statements are joined by using UNION.
The IDENTITY column is listed more than one time in the SELECT list.
The IDENTITY column is part of an expression.
If any one of these conditions is true, the column is created NOT NULL instead of inheriting the IDENTITY
property.
CREATE TABLE AS SELECT
CREATE TABLE AS SELECT (CTAS ) follows the same SQL Server behavior that's documented for SELECT..INTO.
However, you can't specify an IDENTITY property in the column definition of the CREATE TABLE part of the
statement. You also can't use the IDENTITY function in the SELECT part of the CTAS. To populate a table, you need
to use CREATE TABLE to define the table followed by INSERT..SELECT to populate it.
SELECT *
FROM dbo.T1
;
Loading data
The presence of the IDENTITY property has some implications to your data-loading code. This section highlights
some basic patterns for loading data into tables by using IDENTITY.
To load data into a table and generate a surrogate key by using IDENTITY, create the table and then use
INSERT..SELECT or INSERT..VALUES to perform the load.
The following example highlights the basic pattern:
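The example isn't reproduced above; a hedged sketch of the pattern (table, column, and source names are placeholders) might be:

-- Create the target table with an IDENTITY column to generate the surrogate key.
CREATE TABLE dbo.T1
(   C1 INT IDENTITY(1,1) NOT NULL
,   C2 INT NULL
)
WITH ( DISTRIBUTION = HASH(C2), CLUSTERED COLUMNSTORE INDEX );

-- Load the table without listing the IDENTITY column; values are generated during the insert.
INSERT INTO dbo.T1 (C2)
SELECT C2
FROM   dbo.StageT1;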
SELECT *
FROM dbo.T1
;
DBCC PDW_SHOWSPACEUSED('dbo.T1');
NOTE
It's not possible to use CREATE TABLE AS SELECT currently when loading data into a table with an IDENTITY column.
For more information on loading data, see Designing Extract, Load, and Transform (ELT) for Azure SQL Data
Warehouse and Loading best practices.
System views
You can use the sys.identity_columns catalog view to identify a column that has the IDENTITY property.
To help you better understand the database schema, this example shows how to integrate sys.identity_columns with
other system catalog views:
SELECT sm.name
, tb.name
, co.name
, CASE WHEN ic.column_id IS NOT NULL
THEN 1
ELSE 0
END AS is_identity
FROM sys.schemas AS sm
JOIN sys.tables AS tb ON sm.schema_id = tb.schema_id
JOIN sys.columns AS co ON tb.object_id = co.object_id
LEFT JOIN sys.identity_columns AS ic ON co.object_id = ic.object_id
AND co.column_id = ic.column_id
WHERE sm.name = 'dbo'
AND tb.name = 'T1'
;
Limitations
The IDENTITY property can't be used:
When the column data type is not INT or BIGINT
When the column is also the distribution key
When the table is an external table
The following related functions are not supported in SQL Data Warehouse:
IDENTITY ()
@@IDENTITY
SCOPE_IDENTITY
IDENT_CURRENT
IDENT_INCR
IDENT_SEED
Common tasks
This section provides some sample code you can use to perform common tasks when you work with IDENTITY
columns.
Column C1 is the IDENTITY in all the following tasks.
Find the highest allocated value for a table
Use the MAX() function to determine the highest value allocated for a distributed table:
SELECT MAX(C1)
FROM dbo.T1
Next steps
Table overview
CREATE TABLE (Transact-SQL ) IDENTITY (Property)
DBCC CHECKIDENT
Partitioning tables in SQL Data Warehouse
7/24/2019 • 10 minutes to read • Edit Online
Recommendations and examples for using table partitions in Azure SQL Data Warehouse.
Sizing partitions
While partitioning can be used to improve performance in some scenarios, creating a table with too many
partitions can hurt performance under some circumstances. These concerns are especially true for clustered
columnstore tables. For partitioning to be helpful, it is important to understand when to use partitioning and the
number of partitions to create. There is no hard and fast rule as to how many partitions are too many; it depends on
your data and how many partitions you are loading simultaneously. A successful partitioning scheme usually has
tens to hundreds of partitions, not thousands.
When creating partitions on clustered columnstore tables, it is important to consider how many rows belong
to each partition. For optimal compression and performance of clustered columnstore tables, a minimum of 1
million rows per distribution and partition is needed. Before partitions are created, SQL Data Warehouse already
divides each table into 60 distributed databases. Any partitioning added to a table is in addition to the
distributions created behind the scenes. Using this example, if the sales fact table contained 36 monthly
partitions, and given that SQL Data Warehouse has 60 distributions, then the sales fact table should contain 60
million rows per month, or 2.1 billion rows when all months are populated. If a table contains fewer than the
recommended minimum number of rows per partition, consider using fewer partitions in order to increase the
number of rows per partition. For more information, see the Indexing article, which includes queries that can
assess the quality of cluster columnstore indexes.
Partition switching
SQL Data Warehouse supports partition splitting, merging, and switching. Each of these functions is executed
using the ALTER TABLE statement.
To switch partitions between two tables, you must ensure that the partitions align on their respective boundaries
and that the table definitions match. As check constraints are not available to enforce the range of values in a
table, the source table must contain the same partition boundaries as the target table. If the partition boundaries
are not the same, the partition switch will fail as the partition metadata will not be synchronized.
How to split a partition that contains data
The most efficient method to split a partition that already contains data is to use a CTAS statement. If the
partitioned table is a clustered columnstore, then the table partition must be empty before it can be split.
The following example creates a partitioned columnstore table. It inserts one row into each partition:
CREATE TABLE [dbo].[FactInternetSales]
(
[ProductKey] int NOT NULL
, [OrderDateKey] int NOT NULL
, [CustomerKey] int NOT NULL
, [PromotionKey] int NOT NULL
, [SalesOrderNumber] nvarchar(20) NOT NULL
, [OrderQuantity] smallint NOT NULL
, [UnitPrice] money NOT NULL
, [SalesAmount] money NOT NULL
)
WITH
( CLUSTERED COLUMNSTORE INDEX
, DISTRIBUTION = HASH([ProductKey])
, PARTITION ( [OrderDateKey] RANGE RIGHT FOR VALUES
(20000101
)
)
)
;
The following query finds the row count by using the sys.partitions catalog view:
Msg 35346, Level 15, State 1, Line 44 SPLIT clause of ALTER PARTITION statement failed because the partition
is not empty. Only empty partitions can be split in when a columnstore index exists on the table. Consider
disabling the columnstore index before issuing the ALTER PARTITION statement, then rebuilding the
columnstore index after ALTER PARTITION is complete.
However, you can use CTAS to create a new table to hold the data.
CREATE TABLE dbo.FactInternetSales_20000101
WITH ( DISTRIBUTION = HASH(ProductKey)
, CLUSTERED COLUMNSTORE INDEX
, PARTITION ( [OrderDateKey] RANGE RIGHT FOR VALUES
(20000101
)
)
)
AS
SELECT *
FROM FactInternetSales
WHERE 1=2
;
As the partition boundaries are aligned, a switch is permitted. This will leave the source table with an empty
partition that you can subsequently split.
All that is left is to align the data to the new partition boundaries using CTAS , and then switch the data back into
the main table.
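The individual statements are not reproduced here. A sketch of the whole sequence, assuming partition 2 holds the data
being split and 20010101 is the new boundary (both assumptions, since the example above only defines the 20000101
boundary), might look like this:
-- 1. Switch the full partition out to the empty, aligned table created above.
ALTER TABLE [dbo].[FactInternetSales] SWITCH PARTITION 2 TO [dbo].[FactInternetSales_20000101] PARTITION 2;
-- 2. The partition in the source table is now empty, so it can be split.
ALTER TABLE [dbo].[FactInternetSales] SPLIT RANGE (20010101);
-- 3. Align the switched-out data to the new boundaries with CTAS, then switch it back in.
CREATE TABLE [dbo].[FactInternetSales_20000101_20010101]
WITH ( DISTRIBUTION = HASH([ProductKey])
     , CLUSTERED COLUMNSTORE INDEX
     , PARTITION ( [OrderDateKey] RANGE RIGHT FOR VALUES (20000101, 20010101) )
     )
AS
SELECT *
FROM   [dbo].[FactInternetSales_20000101]
WHERE  [OrderDateKey] >= 20000101 AND [OrderDateKey] < 20010101
;
ALTER TABLE [dbo].[FactInternetSales_20000101_20010101] SWITCH PARTITION 2 TO [dbo].[FactInternetSales] PARTITION 2;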
Once you have completed the movement of the data, it is a good idea to refresh the statistics on the target table.
Updating statistics ensures the statistics accurately reflect the new distribution of the data in their respective
partitions.
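For example, assuming the main table from this example is the target:
UPDATE STATISTICS [dbo].[FactInternetSales];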
Load new data into partitions that contain data in one step
Loading data into partitions with partition switching is a convenient way to stage new data in a table that is not
visible to users and then switch it into the target table. It can be challenging on busy systems to deal with the locking
contention associated with partition switching. To clear out the existing data in a partition, an ALTER TABLE used
to be required to switch out the data. Then another ALTER TABLE was required to switch in the new data. In SQL
Data Warehouse, the TRUNCATE_TARGET option is supported in the ALTER TABLE command. With TRUNCATE_TARGET
the ALTER TABLE command overwrites existing data in the partition with new data. Below is an example which
uses CTAS to create a new table with the existing data, inserts new data, then switches all the data back into the
target table, overwriting the existing data.
CREATE TABLE [dbo].[FactInternetSales_NewSales]
WITH ( DISTRIBUTION = HASH([ProductKey])
, CLUSTERED COLUMNSTORE INDEX
, PARTITION ( [OrderDateKey] RANGE RIGHT FOR VALUES
(20000101,20010101
)
)
)
AS
SELECT *
FROM [dbo].[FactInternetSales]
WHERE [OrderDateKey] >= 20000101
AND [OrderDateKey] < 20010101
;
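The remaining steps are not reproduced here. As a sketch, after the new rows have been inserted into
FactInternetSales_NewSales, the switch back into the target overwrites the existing data in that partition; the
partition number and aligned boundaries are assumptions:
-- New rows for the period would be inserted into [dbo].[FactInternetSales_NewSales] here.
ALTER TABLE [dbo].[FactInternetSales_NewSales] SWITCH PARTITION 2 TO [dbo].[FactInternetSales] PARTITION 2
WITH (TRUNCATE_TARGET = ON);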
A related pattern generates the partition boundaries dynamically. The loop below assumes a temporary table,
#partitions, holding one row per new boundary value (columns seq_no and ptn_no), plus variables for the target schema
and table name; those pieces are not reproduced here, so the declarations shown are illustrative.
DECLARE @i INT = 1                                    -- loop counter
,       @c INT = (SELECT COUNT(*) FROM #partitions)   -- number of boundaries to add
,       @p NVARCHAR(20)                               -- current boundary value
,       @q NVARCHAR(4000)                             -- dynamic SQL statement
,       @s NVARCHAR(128) = N'dbo'                     -- target schema (assumed)
,       @t NVARCHAR(128) = N'FactInternetSales'       -- target table (assumed)
;
WHILE @i <= @c
BEGIN
    SET @p = (SELECT ptn_no FROM #partitions WHERE seq_no = @i);
    SET @q = (SELECT N'ALTER TABLE '+@s+N'.'+@t+N' SPLIT RANGE ('+@p+N');');
    -- PRINT @q;
    EXECUTE sp_executesql @q;
    SET @i+=1;
END
-- Code clean-up
DROP TABLE #partitions;
With this approach the code in source control remains static and the partitioning boundary values are allowed to
be dynamic; evolving with the warehouse over time.
Next steps
For more information about developing tables, see the articles on Table Overview.
Design guidance for using replicated tables in Azure
SQL Data Warehouse
7/24/2019 • 7 minutes to read • Edit Online
This article gives recommendations for designing replicated tables in your SQL Data Warehouse schema. Use
these recommendations to improve query performance by reducing data movement and query complexity.
Prerequisites
This article assumes you are familiar with data distribution and data movement concepts in SQL Data
Warehouse. For more information, see the architecture article.
As part of table design, understand as much as possible about your data and how the data is queried. For
example, consider these questions:
How large is the table?
How often is the table refreshed?
Do I have fact and dimension tables in a data warehouse?
Replicated tables work well for dimension tables in a star schema. Dimension tables are typically joined to fact
tables which are distributed differently than the dimension table. Dimensions are usually of a size that makes it
feasible to store and maintain multiple copies. Dimensions store descriptive data that changes slowly, such as
customer name and address, and product details. The slowly changing nature of the data leads to less
maintenance of the replicated table.
Consider using a replicated table when:
The table size on disk is less than 2 GB, regardless of the number of rows. To find the size of a table, you can
use the DBCC PDW_SHOWSPACEUSED command: DBCC PDW_SHOWSPACEUSED('ReplTableCandidate') .
The table is used in joins that would otherwise require data movement. When joining tables that are not
distributed on the same column, such as a hash-distributed table to a round-robin table, data movement is
required to complete the query. If one of the tables is small, consider a replicated table. We recommend using
replicated tables instead of round-robin tables in most cases. To view data movement operations in query
plans, use sys.dm_pdw_request_steps. The BroadcastMoveOperation is the typical data movement operation
that can be eliminated by using a replicated table.
Replicated tables may not yield the best query performance when:
The table has frequent insert, update, and delete operations. These data manipulation language (DML )
operations require a rebuild of the replicated table. Rebuilding frequently can cause slower performance.
The data warehouse is scaled frequently. Scaling a data warehouse changes the number of Compute nodes,
which incurs rebuilding the replicated table.
The table has a large number of columns, but data operations typically access only a small number of columns.
In this scenario, instead of replicating the entire table, it might be more effective to distribute the table, and
then create an index on the frequently accessed columns. When a query requires data movement, SQL Data
Warehouse only moves data for the requested columns.
For example, in the following query against a wide DimProduct table, SQL Data Warehouse only needs to move data for
the two referenced columns:
SELECT EnglishProductName
FROM DimProduct
WHERE EnglishDescription LIKE '%frame%comfortable%'
When DimDate and DimSalesTerritory were re-created as round-robin tables, the resulting query plan contained multiple
broadcast move operations. When the same tables were re-created as replicated tables and the query was run again, the
resulting plan was much shorter and contained no broadcast moves.
Performance considerations for modifying replicated tables
SQL Data Warehouse implements a replicated table by maintaining a master version of the table. It copies the
master version to one distribution database on each Compute node. When there is a change, SQL Data
Warehouse first updates the master table. Then it rebuilds the tables on each Compute node. A rebuild of a
replicated table includes copying the table to each Compute node and then building the indexes. For example, a
replicated table on a DW400 has 5 copies of the data: a master copy and a full copy on each Compute node. All
data is stored in distribution databases. SQL Data Warehouse uses this model to support faster data modification
statements and flexible scaling operations.
Rebuilds are required after:
Data is loaded or modified
The data warehouse is scaled to a different level
Table definition is updated
Rebuilds are not required after:
Pause operation
Resume operation
The rebuild does not happen immediately after data is modified. Instead, the rebuild is triggered the first time a
query selects from the table. The query that triggered the rebuild reads immediately from the master version of
the table while the data is asynchronously copied to each Compute node. Until the data copy is complete,
subsequent queries will continue to use the master version of the table. If any activity happens against the
replicated table that forces another rebuild, the data copy is invalidated and the next select statement will trigger
data to be copied again.
Use indexes conservatively
Standard indexing practices apply to replicated tables. SQL Data Warehouse rebuilds each replicated table index
as part of the rebuild. Only use indexes when the performance gain outweighs the cost of rebuilding the indexes.
Batch data loads
When loading data into replicated tables, try to minimize rebuilds by batching loads together. Perform all the
batched loads before running select statements.
For example, this load pattern loads data from four sources and invokes four rebuilds.
Load from source 1.
Select statement triggers rebuild 1.
Load from source 2.
Select statement triggers rebuild 2.
Load from source 3.
Select statement triggers rebuild 3.
Load from source 4.
Select statement triggers rebuild 4.
For example, this load pattern loads data from four sources, but only invokes one rebuild.
Load from source 1.
Load from source 2.
Load from source 3.
Load from source 4.
Select statement triggers rebuild.
Rebuild a replicated table after a batch load
To ensure consistent query execution times, consider forcing a rebuild of the replicated tables after a batch load.
Otherwise, the first query will still use data movement to complete the query.
This query uses the sys.pdw_replicated_table_cache_state DMV to list the replicated tables that have been
modified, but not rebuilt.
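The query is not reproduced here; a sketch using that DMV could look like this:
SELECT t.[name] AS [ReplicatedTable]
FROM   sys.tables t
JOIN   sys.pdw_replicated_table_cache_state c ON c.[object_id] = t.[object_id]
JOIN   sys.pdw_table_distribution_properties p ON p.[object_id] = t.[object_id]
WHERE  c.[state] = 'NotReady'
AND    p.[distribution_policy_desc] = 'REPLICATE'
;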
To trigger a rebuild, run the following statement on each table in the preceding output.
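The statement itself is not shown above. Because the rebuild is triggered the first time a query selects from the
table (as described earlier), a minimal trigger is simply a select; the table name below is a placeholder for each
table returned by the previous query:
SELECT TOP 1 * FROM [dbo].[ReplTableCandidate];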
Next steps
To create a replicated table, use one of these statements:
CREATE TABLE (Azure SQL Data Warehouse)
CREATE TABLE AS SELECT (Azure SQL Data Warehouse)
For an overview of distributed tables, see distributed tables.
Table statistics in Azure SQL Data Warehouse
7/24/2019 • 14 minutes to read • Edit Online
Recommendations and examples for creating and updating query-optimization statistics on tables in Azure SQL
Data Warehouse.
If your data warehouse does not have AUTO_CREATE_STATISTICS configured, we recommend you enable this
property by running the following command:
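A sketch of the command (the database name is a placeholder):
ALTER DATABASE [YourDataWarehouseName] SET AUTO_CREATE_STATISTICS ON;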
NOTE
Statistics are not created automatically on temporary or external tables.
Automatic creation of statistics is done synchronously so you may incur slightly degraded query performance if
your columns are missing statistics. The time to create statistics for a single column depends on the size of the
table. To avoid measurable performance degradation, especially in performance benchmarking, you should
ensure stats have been created first by executing the benchmark workload before profiling the system.
NOTE
The creation of stats will be logged in sys.dm_pdw_exec_requests under a different user context.
When automatic statistics are created, they will take the form: WA_Sys<8 digit column id in Hex>_<8 digit table
id in Hex>. You can view stats that have already been created by running the DBCC SHOW_STATISTICS
command:
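The command syntax is not shown at this point in the text; it takes the table and the target described below (the same
form appears later in this article):
DBCC SHOW_STATISTICS ([<schema_name>.<table_name>], <target>)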
The table_name is the name of the table that contains the statistics to display. This cannot be an external table.
The target is the name of the target index, statistics, or column for which to display statistics information.
Updating statistics
One best practice is to update statistics on date columns each day as new dates are added. Each time new rows
are loaded into the data warehouse, new load dates or transaction dates are added. These change the data
distribution and make the statistics out of date. Conversely, statistics on a country/region column in a customer
table might never need to be updated, because the distribution of values doesn’t generally change. Assuming the
distribution is constant between customers, adding new rows to the table isn't going to change the data
distribution. However, if your data warehouse only contains one country/region and you bring in data from a
new country/region, resulting in data from multiple countries/regions being stored, then you need to update
statistics on the country/region column.
The following are recommendations for updating statistics:
Sampling: with less than 1 billion rows, use default sampling (20 percent); with more than 1 billion rows, use a
sampling rate of two percent.
One of the first questions to ask when you're troubleshooting a query is, "Are the statistics up to date?"
This question is not one that can be answered by the age of the data. An up-to-date statistics object might be old
if there's been no material change to the underlying data. When the number of rows has changed substantially,
or there is a material change in the distribution of values for a column, then it's time to update statistics.
There is no dynamic management view to determine if data within the table has changed since the last time
statistics were updated. Knowing the age of your statistics can provide you with part of the picture. You can use
the following query to determine the last time your statistics were updated on each table.
NOTE
If there is a material change in the distribution of values for a column, you should update statistics regardless of the last
time they were updated.
SELECT
sm.[name] AS [schema_name],
tb.[name] AS [table_name],
co.[name] AS [stats_column_name],
st.[name] AS [stats_name],
STATS_DATE(st.[object_id],st.[stats_id]) AS [stats_last_updated_date]
FROM
sys.objects ob
JOIN sys.stats st
ON ob.[object_id] = st.[object_id]
JOIN sys.stats_columns sc
ON st.[stats_id] = sc.[stats_id]
AND st.[object_id] = sc.[object_id]
JOIN sys.columns co
ON sc.[column_id] = co.[column_id]
AND sc.[object_id] = co.[object_id]
JOIN sys.types ty
ON co.[user_type_id] = ty.[user_type_id]
JOIN sys.tables tb
ON co.[object_id] = tb.[object_id]
JOIN sys.schemas sm
ON tb.[schema_id] = sm.[schema_id]
WHERE
st.[user_created] = 1;
Date columns in a data warehouse, for example, usually need frequent statistics updates. Each time new rows
are loaded into the data warehouse, new load dates or transaction dates are added. These change the data
distribution and make the statistics out of date. Conversely, statistics on a gender column in a customer table
might never need to be updated. Assuming the distribution is constant between customers, adding new rows to
the table isn't going to change the data distribution. However, if your data warehouse contains only one
gender and a new requirement results in multiple genders, then you need to update statistics on the gender
column.
For more information, see general guidance for Statistics.
For example, to create filtered statistics on a range of values in col1:
CREATE STATISTICS stats_col1 ON table1(col1) WHERE col1 > '20000101' AND col1 < '20001231';
NOTE
For the query optimizer to consider using filtered statistics when it chooses the distributed query plan, the query must fit
inside the definition of the statistics object. Using the previous example, the query's WHERE clause needs to specify col1
values between 20000101 and 20001231.
NOTE
The histogram, which is used to estimate the number of rows in the query result, is only available for the first column listed
in the statistics object definition.
IF @create_type IS NULL
BEGIN
SET @create_type = 1;
END;
IF @sample_pct IS NULL
BEGIN;
SET @sample_pct = 20;
END;
DECLARE @i INT = 1
, @t INT = (SELECT COUNT(*) FROM #stats_ddl)
, @s NVARCHAR(4000) = N''
;
WHILE @i <= @t
BEGIN
SET @s=(SELECT create_stat_ddl FROM #stats_ddl WHERE seq_nmbr = @i);
PRINT @s
EXEC sp_executesql @s
SET @i+=1;
END
To create statistics on all columns in the table using the defaults, execute the stored procedure with the default
create type.
To create statistics on all columns in the table using a fullscan, call the procedure with the corresponding create
type.
To create sampled statistics on all columns in the table, pass create type 3 and the sample percent. The calls
sketched below use a 20 percent sample rate.
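The procedure calls themselves are not reproduced above. As a hedged sketch, assuming the procedure shown in fragments
earlier is named prc_sqldw_create_stats and takes (@create_type, @sample_pct), with create types 1, 2, and 3
corresponding to default, fullscan, and sampled statistics:
EXEC [dbo].[prc_sqldw_create_stats] 1, NULL;   -- defaults
EXEC [dbo].[prc_sqldw_create_stats] 2, NULL;   -- fullscan
EXEC [dbo].[prc_sqldw_create_stats] 3, 20;     -- sampled at 20 percent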
By updating specific statistics objects, you can minimize the time and resources required to manage statistics.
This requires some thought to choose the best statistics objects to update.
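As a minimal sketch, reusing the stats_col1 statistics object created on dbo.table1 in the earlier filtered-statistics
example, updating just that object looks like this:
UPDATE STATISTICS [dbo].[table1] ([stats_col1]);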
Update all statistics on a table
A simple method for updating all the statistics objects on a table is:
For example:
UPDATE STATISTICS dbo.table1;
The UPDATE STATISTICS statement is easy to use. Just remember that it updates all statistics on the table, and
therefore might perform more work than is necessary. If performance is not an issue, this is the easiest and most
complete way to guarantee that statistics are up to date.
NOTE
When updating all statistics on a table, SQL Data Warehouse does a scan to sample the table for each statistics object. If
the table is large and has many columns and many statistics, it might be more efficient to update individual statistics based
on need.
For an implementation of an UPDATE STATISTICS procedure, see Temporary Tables. The implementation method
is slightly different from the preceding CREATE STATISTICS procedure, but the result is the same.
For the full syntax, see Update Statistics.
Statistics metadata
There are several system views and functions that you can use to find information about statistics. For example,
you can see if a statistics object might be out of date by using the stats-date function to see when statistics were
last created or updated.
Catalog views for statistics
These system views provide information about statistics:
sys.stats_columns - one row for each column in the statistics object; links back to sys.columns.
DBCC SHOW_STATISTICS([<schema_name>.<table_name>],<stats_name>)
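For example, again using the stats_col1 statistics object from earlier in this article:
DBCC SHOW_STATISTICS ([dbo].[table1], stats_col1);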
Next steps
To further improve query performance, see Monitor your workload.
Temporary tables in SQL Data Warehouse
7/24/2019 • 4 minutes to read • Edit Online
This article contains essential guidance for using temporary tables and highlights the principles of session level
temporary tables. Using the information in this article can help you modularize your code, improving both
reusability and ease of maintenance of your code.
Temporary tables can also be created with a CTAS using exactly the same approach:
CREATE TABLE #stats_ddl
WITH
(
DISTRIBUTION = HASH([seq_nmbr])
, HEAP
)
AS
(
SELECT
sm.[name] AS [schema_name]
, tb.[name] AS [table_name]
, st.[name] AS [stats_name]
, st.[has_filter] AS [stats_is_filtered]
, ROW_NUMBER()
OVER(ORDER BY (SELECT NULL)) AS [seq_nmbr]
, QUOTENAME(sm.[name])+'.'+QUOTENAME(tb.[name]) AS [two_part_name]
, QUOTENAME(DB_NAME())+'.'+QUOTENAME(sm.[name])+'.'+QUOTENAME(tb.[name]) AS [three_part_name]
FROM sys.objects AS ob
JOIN sys.stats AS st ON ob.[object_id] = st.[object_id]
JOIN sys.stats_columns AS sc ON st.[stats_id] = sc.[stats_id]
AND st.[object_id] = sc.[object_id]
JOIN sys.columns AS co ON sc.[column_id] = co.[column_id]
AND sc.[object_id] = co.[object_id]
JOIN sys.tables AS tb ON co.[object_id] = tb.[object_id]
JOIN sys.schemas AS sm ON tb.[schema_id] = sm.[schema_id]
WHERE 1=1
AND st.[user_created] = 1
GROUP BY
sm.[name]
, tb.[name]
, st.[name]
, st.[filter_definition]
, st.[has_filter]
)
;
NOTE
CTAS is a powerful command and has the added advantage of being efficient in its use of transaction log space.
For coding consistency, it is a good practice to use this pattern for both tables and temporary tables. It is also a
good idea to use DROP TABLE to remove temporary tables when you have finished with them in your code. In
stored procedure development, it is common to see the drop commands bundled together at the end of a
procedure to ensure these objects are cleaned up.
IF @sample_pct IS NULL
BEGIN;
SET @sample_pct = 20;
END;
At this stage, the only action that has occurred is the creation of a stored procedure that generates a temporary
table, #stats_ddl, with DDL statements. This stored procedure drops #stats_ddl if it already exists to ensure it does
not fail if run more than once within a session. However, since there is no DROP TABLE at the end of the stored
procedure, when the stored procedure completes, it leaves the created table so that it can be read outside of the
stored procedure. In SQL Data Warehouse, unlike other SQL Server databases, it is possible to use the
temporary table outside of the procedure that created it. SQL Data Warehouse temporary tables can be used
anywhere inside the session. This can lead to more modular and manageable code as in the following example:
DECLARE @i INT = 1
, @t INT = (SELECT COUNT(*) FROM #stats_ddl)
, @s NVARCHAR(4000) = N''
WHILE @i <= @t
BEGIN
SET @s=(SELECT update_stats_ddl FROM #stats_ddl WHERE seq_nmbr = @i);
PRINT @s
EXEC sp_executesql @s
SET @i+=1;
END
Next steps
To learn more about developing tables, see the Table Overview.
Using T-SQL loops in SQL Data Warehouse
7/24/2019 • 2 minutes to read • Edit Online
Tips for using T-SQL loops and replacing cursors in Azure SQL Data Warehouse for developing solutions.
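The article's worked example is not included above. As a hedged sketch of the pattern used throughout this guide,
replace a cursor with a numbered temporary work list and a WHILE loop; the table and statement built here are
illustrative:
-- Build a numbered work list instead of opening a cursor.
CREATE TABLE #tbl
WITH ( DISTRIBUTION = ROUND_ROBIN )
AS
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Sequence
,      [name]
,      'UPDATE STATISTICS '+QUOTENAME([name]) AS sql_code
FROM   sys.tables
;
-- Iterate over the work list with a simple WHILE loop.
DECLARE @nbr_statements INT = (SELECT COUNT(*) FROM #tbl)
,       @i INT = 1
;
WHILE @i <= @nbr_statements
BEGIN
    DECLARE @sql_code NVARCHAR(4000) = (SELECT sql_code FROM #tbl WHERE Sequence = @i);
    EXEC sp_executesql @sql_code;
    SET @i +=1;
END
DROP TABLE #tbl;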
Next steps
For more development tips, see development overview.
Using stored procedures in SQL Data Warehouse
7/24/2019 • 2 minutes to read • Edit Online
Tips for implementing stored procedures in Azure SQL Data Warehouse for developing solutions.
What to expect
SQL Data Warehouse supports many of the T-SQL features that are used in SQL Server. More importantly, there
are scale-out specific features that you can use to maximize the performance of your solution.
However, to maintain the scale and performance of SQL Data Warehouse there are also some features and
functionality that have behavioral differences and others that are not supported.
Stored procedures can nest up to eight levels deep. For example, executing a procedure from a session starts at nest
level one:
EXEC prc_nesting
If the stored procedure also makes another EXEC call, the nest level increases to two.
Note, SQL Data Warehouse does not currently support @@NESTLEVEL. You need to track the nest level. It is
unlikely for you to exceed the eight nest level limit, but if you do, you need to rework your code to fit the nesting
levels within this limit.
INSERT..EXECUTE
SQL Data Warehouse does not permit you to consume the result set of a stored procedure with an INSERT
statement. However, there is an alternative approach you can use. For an example, see the article on temporary
tables.
Limitations
There are some aspects of Transact-SQL stored procedures that are not implemented in SQL Data Warehouse.
They are:
temporary stored procedures
numbered stored procedures
extended stored procedures
CLR stored procedures
encryption option
replication option
table-valued parameters
read-only parameters
default parameters
execution contexts
return statement
Next steps
For more development tips, see development overview.
Using transactions in SQL Data Warehouse
7/24/2019 • 5 minutes to read • Edit Online
Tips for implementing transactions in Azure SQL Data Warehouse for developing solutions.
What to expect
As you would expect, SQL Data Warehouse supports transactions as part of the data warehouse workload.
However, to ensure the performance of SQL Data Warehouse is maintained at scale some features are limited
when compared to SQL Server. This article highlights the differences and lists the others.
Transaction size
A single data modification transaction is limited in size. The limit is applied per distribution. Therefore, the total
allocation can be calculated by multiplying the limit by the distribution count. To approximate the maximum
number of rows in the transaction divide the distribution cap by the total size of each row. For variable length
columns, consider taking an average column length rather than using the maximum size.
In the table below the following assumptions have been made:
An even distribution of data has occurred
The average row length is 250 bytes
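As an illustration of the arithmetic (the actual cap per distribution depends on the service level, as summarized
below): if the cap per distribution were 1 GB, a 250-byte average row gives roughly 1,073,741,824 / 250, or about 4.3
million rows per distribution, and with 60 distributions roughly 258 million rows per transaction.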
For each Gen2 and Gen1 service level (DWU), the corresponding tables list the cap per distribution (GB), the number of
distributions, the maximum transaction size (GB), the number of rows per distribution, and the maximum rows per
transaction. (The table values are not reproduced here.)
The transaction size limit is applied per transaction or operation. It is not applied across all concurrent
transactions. Therefore each transaction is permitted to write this amount of data to the log.
To optimize and minimize the amount of data written to the log, please refer to the Transactions best practices
article.
WARNING
The maximum transaction size can only be achieved for HASH or ROUND_ROBIN distributed tables where the spread of the
data is even. If the transaction is writing data in a skewed fashion to the distributions then the limit is likely to be reached
prior to the maximum transaction size.
Transaction state
SQL Data Warehouse uses the XACT_STATE () function to report a failed transaction using the value -2. This value
means the transaction has failed and is marked for rollback only.
NOTE
The use of -2 by the XACT_STATE function to denote a failed transaction represents different behavior to SQL Server. SQL
Server uses the value -1 to represent an uncommittable transaction. SQL Server can tolerate some errors inside a
transaction without it having to be marked as uncommittable. For example SELECT 1/0 would cause an error but not force
a transaction into an uncommittable state. SQL Server also permits reads in the uncommittable transaction. However, SQL
Data Warehouse does not let you do this. If an error occurs inside a SQL Data Warehouse transaction it will automatically
enter the -2 state and you will not be able to make any further select statements until the statement has been rolled back.
It is therefore important to check your application code to see whether it uses XACT_STATE(), as you may need to make
code modifications.
For example, in SQL Server you might see a transaction that looks like the following:
SET NOCOUNT ON;
DECLARE @xact_state smallint = 0;
BEGIN TRAN
BEGIN TRY
DECLARE @i INT;
SET @i = CONVERT(INT,'ABC');
END TRY
BEGIN CATCH
SET @xact_state = XACT_STATE();
IF @@TRANCOUNT > 0
BEGIN
ROLLBACK TRAN;
PRINT 'ROLLBACK';
END
END CATCH;
IF @@TRANCOUNT >0
BEGIN
PRINT 'COMMIT';
COMMIT TRAN;
END
In SQL Data Warehouse, the transaction must be rolled back before the error information is read, so the same logic is
rewritten as follows:
BEGIN TRAN
BEGIN TRY
DECLARE @i INT;
SET @i = CONVERT(INT,'ABC');
END TRY
BEGIN CATCH
SET @xact_state = XACT_STATE();
IF @@TRANCOUNT > 0
BEGIN
PRINT 'ROLLBACK';
ROLLBACK TRAN;
END
-- The ERROR_* functions can be read here, after the rollback (for example, PRINT ERROR_MESSAGE()).
END CATCH;
IF @@TRANCOUNT >0
BEGIN
PRINT 'COMMIT';
COMMIT TRAN;
END
The expected behavior is now observed. The error in the transaction is managed and the ERROR_* functions
provide values as expected.
All that has changed is that the ROLLBACK of the transaction had to happen before the read of the error
information in the CATCH block.
Error_Line() function
It is also worth noting that SQL Data Warehouse does not implement or support the ERROR_LINE () function. If
you have this in your code, you need to remove it to be compliant with SQL Data Warehouse. Use query labels in
your code instead to implement equivalent functionality. For more details, see the LABEL article.
Limitations
SQL Data Warehouse does have a few other restrictions that relate to transactions.
They are as follows:
No distributed transactions
No nested transactions permitted
No save points allowed
No named transactions
No marked transactions
No support for DDL such as CREATE TABLE inside a user-defined transaction
Next steps
To learn more about optimizing transactions, see Transactions best practices. To learn about other SQL Data
Warehouse best practices, see SQL Data Warehouse best practices.
Optimizing transactions in Azure SQL Data
Warehouse
7/24/2019 • 9 minutes to read • Edit Online
Learn how to optimize the performance of your transactional code in Azure SQL Data Warehouse while
minimizing risk for long rollbacks.
NOTE
Minimally logged operations can participate in explicit transactions. As all changes in allocation structures are tracked, it is
possible to roll back minimally logged operations.
NOTE
Internal data movement operations (such as BROADCAST and SHUFFLE) are not affected by the transaction safety limit.
Clustered columnstore index, batch size >= 102,400 rows per partition-aligned distribution: minimal logging
Clustered columnstore index, batch size < 102,400 rows per partition-aligned distribution: full logging
It is worth noting that any writes to update secondary or non-clustered indexes will always be fully logged
operations.
IMPORTANT
SQL Data Warehouse has 60 distributions. Therefore, assuming all rows are evenly distributed and landing in a single
partition, your batch will need to contain 6,144,000 rows or larger to be minimally logged when writing to a Clustered
Columnstore Index. If the table is partitioned and the rows being inserted span partition boundaries, then you will need
6,144,000 rows per partition boundary assuming even data distribution. Each partition in each distribution must
independently exceed the 102,400 row threshold for the insert to be minimally logged into the distribution.
Loading data into a non-empty table with a clustered index can often contain a mixture of fully logged and
minimally logged rows. A clustered index is a balanced tree (b-tree) of pages. If the page being written to already
contains rows from another transaction, then these writes will be fully logged. However, if the page is empty then
the write to that page will be minimally logged.
Optimizing deletes
DELETE is a fully logged operation. If you need to delete a large amount of data in a table or a partition, it often
makes more sense to SELECT the data you wish to keep, which can be run as a minimally logged operation. To
select the data, create a new table with CTAS. Once created, use RENAME to swap out your old table with the
newly created table.
--Step 01. Create a new table, selecting only the records we want to keep (PromotionKey 2)
CREATE TABLE [dbo].[FactInternetSales_d]
WITH
( CLUSTERED COLUMNSTORE INDEX
, DISTRIBUTION = HASH([ProductKey])
, PARTITION ( [OrderDateKey] RANGE RIGHT
FOR VALUES ( 20000101, 20010101, 20020101, 20030101, 20040101,
20050101
, 20060101, 20070101, 20080101, 20090101, 20100101,
20110101
, 20120101, 20130101, 20140101, 20150101, 20160101,
20170101
, 20180101, 20190101, 20200101, 20210101, 20220101,
20230101
, 20240101, 20250101, 20260101, 20270101, 20280101,
20290101
)
)
)
AS
SELECT *
FROM [dbo].[FactInternetSales]
WHERE [PromotionKey] = 2
OPTION (LABEL = 'CTAS : Delete')
;
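The rename step described above is not shown; a sketch of swapping the new table in for the old one:
--Step 02. Swap the newly created table in for the original table.
RENAME OBJECT [dbo].[FactInternetSales] TO [FactInternetSales_old];
RENAME OBJECT [dbo].[FactInternetSales_d] TO [FactInternetSales];
--Step 03. Drop the old table once it is no longer needed.
DROP TABLE [dbo].[FactInternetSales_old];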
Optimizing updates
UPDATE is a fully logged operation. If you need to update a large number of rows in a table or a partition, it can
often be far more efficient to use a minimally logged operation such as CTAS to do so.
In the example below a full table update has been converted to a CTAS so that minimal logging is possible.
In this case, we are retrospectively adding a discount amount to the sales in the table:
--Step 01. Create a new table containing the "Update".
CREATE TABLE [dbo].[FactInternetSales_u]
WITH
( CLUSTERED INDEX ([ProductKey])
, DISTRIBUTION = HASH([ProductKey])
, PARTITION ( [OrderDateKey] RANGE RIGHT
FOR VALUES ( 20000101, 20010101, 20020101, 20030101, 20040101,
20050101
, 20060101, 20070101, 20080101, 20090101, 20100101,
20110101
, 20120101, 20130101, 20140101, 20150101, 20160101,
20170101
, 20180101, 20190101, 20200101, 20210101, 20220101,
20230101
, 20240101, 20250101, 20260101, 20270101, 20280101,
20290101
)
)
)
AS
SELECT
[ProductKey]
, [OrderDateKey]
, [DueDateKey]
, [ShipDateKey]
, [CustomerKey]
, [PromotionKey]
, [CurrencyKey]
, [SalesTerritoryKey]
, [SalesOrderNumber]
, [SalesOrderLineNumber]
, [RevisionNumber]
, [OrderQuantity]
, [UnitPrice]
, [ExtendedAmount]
, [UnitPriceDiscountPct]
, ISNULL(CAST(5 as float),0) AS [DiscountAmount]
, [ProductStandardCost]
, [TotalProductCost]
, ISNULL(CAST(CASE WHEN [SalesAmount] <=5 THEN 0
ELSE [SalesAmount] - 5
END AS MONEY),0) AS [SalesAmount]
, [TaxAmt]
, [Freight]
, [CarrierTrackingNumber]
, [CustomerPONumber]
FROM [dbo].[FactInternetSales]
OPTION (LABEL = 'CTAS : Update')
;
NOTE
Re-creating large tables can benefit from using SQL Data Warehouse workload management features. For more information,
see Resource classes for workload management.
This procedure maximizes code reuse and keeps the partition switching example more compact.
The following code demonstrates the steps mentioned previously to achieve a full partition switching routine.
--Create a partitioned aligned table and update the data in the select portion of the CTAS
IF OBJECT_ID('[dbo].[FactInternetSales_in]') IS NOT NULL
BEGIN
DROP TABLE [dbo].[FactInternetSales_in]
END
CREATE TABLE #t
WITH ( DISTRIBUTION = ROUND_ROBIN
, HEAP
)
AS
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS seq_nmbr
, SalesOrderNumber
, SalesOrderLineNumber
FROM dbo.FactInternetSales
WHERE [OrderDateKey] BETWEEN 20010101 and 20011231
;
SELECT COUNT(*)
FROM dbo.FactInternetSales f
The best scenario is to let in-flight data modification transactions complete prior to pausing or scaling SQL Data
Warehouse. However, this scenario might not always be practical. To mitigate the risk of a long rollback, consider
one of the following options:
Rewrite long running operations using CTAS
Break the operation into chunks; operating on a subset of the rows
Next steps
See Transactions in SQL Data Warehouse to learn more about isolation levels and transactional limits. For an
overview of other Best Practices, see SQL Data Warehouse Best Practices.
Using user-defined schemas in SQL Data Warehouse
7/24/2019 • 3 minutes to read • Edit Online
Tips for using T-SQL user-defined schemas in Azure SQL Data Warehouse for developing solutions.
NOTE
SQL Data Warehouse does not support cross database queries of any kind. Consequently, data warehouse implementations
that leverage this pattern will need to be revised.
Recommendations
These are recommendations for consolidating workloads, security, domain and functional boundaries by using
user-defined schemas:
1. Use one SQL Data Warehouse database to run your entire data warehouse workload
2. Consolidate your existing data warehouse environment to use one SQL Data Warehouse database
3. Leverage user-defined schemas to provide the boundary previously implemented using databases.
If user-defined schemas have not been used previously then you have a clean slate. Simply use the old database
name as the basis for your user-defined schemas in the SQL Data Warehouse database.
If schemas have already been used then you have a few options:
1. Remove the legacy schema names and start fresh
2. Retain the legacy schema names by pre-pending the legacy schema name to the table name
3. Retain the legacy schema names by implementing views over the table in an extra schema to re-create the old
schema structure.
NOTE
On first inspection option 3 may seem like the most appealing option. However, the devil is in the detail. Views are read only
in SQL Data Warehouse. Any data or table modification would need to be performed against the base table. Option 3 also
introduces a layer of views into your system. You might want to give this some additional thought if you are using views in
your architecture already.
Examples:
Implement user-defined schemas based on database names
CREATE SCHEMA [stg]; -- stg previously database name for staging database
GO
CREATE SCHEMA [edw]; -- edw previously database name for the data warehouse
GO
CREATE TABLE [stg].[customer] -- create staging tables in the stg schema
( CustKey BIGINT NOT NULL
, ...
);
GO
CREATE TABLE [edw].[customer] -- create data warehouse tables in the edw schema
( CustKey BIGINT NOT NULL
, ...
);
Retain legacy schema names by pre-pending them to the table name. Use schemas for the workload boundary.
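The example for this option is not reproduced above. A sketch, assuming the legacy schema names were stg and edw as in
the previous example, with the schemas created above now marking the workload boundary:
CREATE TABLE [stg].[stg_customer] -- legacy schema name pre-pended to the table name
( CustKey BIGINT NOT NULL
, ...
);
GO
CREATE TABLE [edw].[edw_customer]
( CustKey BIGINT NOT NULL
, ...
);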
Next steps
For more development tips, see development overview.
Assigning variables in Azure SQL Data Warehouse
7/24/2019 • 2 minutes to read • Edit Online
Tips for assigning T-SQL variables in Azure SQL Data Warehouse for developing solutions.
DECLARE @v int = 0
;
You can also use DECLARE to set more than one variable at a time. You cannot use SELECT or UPDATE to do the
following:
DECLARE @v INT = (SELECT TOP 1 c_customer_sk FROM Customer where c_last_name = 'Smith')
, @v1 INT = (SELECT TOP 1 c_customer_sk FROM Customer where c_last_name = 'Jones')
;
You cannot initialize and use a variable in the same DECLARE statement. To illustrate the point, the following
example is not allowed since @p1 is both initialized and used in the same DECLARE statement. The following
example gives an error.
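The failing example is not reproduced above; a hedged illustration (the variable names and query are ours):
DECLARE @p1 INT = 0
,       @p2 INT = (SELECT COUNT(*) FROM sys.types WHERE is_user_defined = @p1) -- fails: @p1 is initialized and used in the same DECLARE
;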
You can only set one variable at a time with SET. However, compound operators are permissible.
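For example, reusing the variable declared earlier:
SET @v = @v + 1;   -- one variable per SET statement
SET @v += 1;       -- compound operators are permitted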
Limitations
You cannot use UPDATE for variable assignment.
Next steps
For more development tips, see development overview.
Views in Azure SQL Data Warehouse
10/24/2019 • 2 minutes to read • Edit Online
Views can be used in a number of different ways to improve the quality of your solution.
Azure SQL Data Warehouse supports standard and materialized views. Both are virtual tables created with
SELECT expressions and presented to queries as logical tables. Views encapsulate the complexity of common data
computation and add an abstraction layer to computation changes so there's no need to rewrite queries.
Standard view
A standard view computes its data each time when the view is used. There's no data stored on disk. People
typically use standard views as a tool that helps organize the logical objects and queries in a database. To use a
standard view, a query needs to make direct reference to it. For more information, see the CREATE VIEW
documentation.
Views in SQL Data Warehouse are stored as metadata only. Consequently, the following options are not available:
There is no schema binding option
Base tables cannot be updated through the view
Views cannot be created over temporary tables
There is no support for the EXPAND / NOEXPAND hints
There are no indexed views in SQL Data Warehouse
Standard views can be utilized to enforce performance optimized joins between tables. For example, a view can
incorporate a redundant distribution key as part of the joining criteria to minimize data movement. Another benefit
of a view could be to force a specific query or joining hint. Using views in this manner guarantees that joins are
always performed in an optimal fashion avoiding the need for users to remember the correct construct for their
joins.
Materialized view
A materialized view pre-computes, stores, and maintains its data in Azure SQL Data Warehouse just like a table.
There's no recomputation needed each time when a materialized view is used. As the data gets loaded into base
tables, Azure SQL Data Warehouse synchronously refreshes the materialized views. The query optimizer
automatically uses deployed materialized views to improve query performance even if the views are not
referenced in the query. Queries benefiting most from materialized views are complex queries (typically queries
with joins and aggregations) on large tables that produce small result set.
For details on the materialized view syntax and other requirements, refer to CREATE MATERIALIZED VIEW AS
SELECT.
For query tuning guidance, check Performance tuning with materialized views.
Example
A common application pattern is to re-create tables using CREATE TABLE AS SELECT (CTAS ) followed by an
object renaming pattern whilst loading data. The following example adds new date records to a date dimension.
Note how a new table, DimDate_New, is first created and then renamed to replace the original version of the table.
CREATE TABLE dbo.DimDate_New
WITH (DISTRIBUTION = ROUND_ROBIN
, CLUSTERED INDEX (DateKey ASC)
)
AS
SELECT *
FROM dbo.DimDate AS prod
UNION ALL
SELECT *
FROM dbo.DimDate_stg AS stg;
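The rename itself is not shown above; a sketch:
RENAME OBJECT dbo.DimDate TO DimDate_old;
RENAME OBJECT dbo.DimDate_New TO DimDate;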
However, this approach can result in tables appearing and disappearing from a user's view as well as "table does
not exist" error messages. Views can be used to provide users with a consistent presentation layer whilst the
underlying objects are renamed. By providing access to data through views, users do not need visibility to the
underlying tables. This layer provides a consistent user experience while ensuring that the data warehouse
designers can evolve the data model. Being able to evolve the underlying tables means designers can use CTAS to
maximize performance during the data loading process.
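As a minimal sketch (the view name is ours), exposing the dimension through a view means the rename of the underlying
tables is invisible to anyone querying the view:
CREATE VIEW dbo.vDimDate
AS
SELECT * FROM dbo.DimDate;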
Next steps
For more development tips, see SQL Data Warehouse development overview.
Dynamic SQL in SQL Data Warehouse
7/24/2019 • 2 minutes to read • Edit Online
Tips for using dynamic SQL in Azure SQL Data Warehouse for developing solutions.
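The article's example is not included above; a minimal sketch of building a statement as a string and executing it
(object and variable names are ours):
DECLARE @table_name NVARCHAR(128) = N'DimCustomer';
DECLARE @sql NVARCHAR(4000) = N'SELECT COUNT(*) FROM dbo.' + QUOTENAME(@table_name) + N';';
EXEC sp_executesql @sql;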
NOTE
Statements executed as dynamic SQL will still be subject to all TSQL validation rules.
Next steps
For more development tips, see development overview.
Group by options in SQL Data Warehouse
7/24/2019 • 3 minutes to read • Edit Online
Tips for implementing group by options in Azure SQL Data Warehouse for developing solutions.
SELECT [SalesTerritoryCountry]
, [SalesTerritoryRegion]
, SUM(SalesAmount) AS TotalSalesAmount
FROM dbo.factInternetSales s
JOIN dbo.DimSalesTerritory t ON s.SalesTerritoryKey = t.SalesTerritoryKey
GROUP BY ROLLUP (
[SalesTerritoryCountry]
, [SalesTerritoryRegion]
)
;
To replace GROUPING SETS, the same principle applies. You only need to create UNION ALL sections for the
aggregation levels you want to see.
Cube options
It is possible to create a GROUP BY WITH CUBE using the UNION ALL approach. The problem is that the code
can quickly become cumbersome and unwieldy. To mitigate this, you can use this more advanced approach.
Let's use the example above.
The first step is to define the 'cube' that defines all the levels of aggregation that we want to create. It is important
to take note of the CROSS JOIN of the two derived tables. This generates all the levels for us. The rest of the code
is really there for formatting.
CREATE TABLE #Cube
WITH
( DISTRIBUTION = ROUND_ROBIN
, LOCATION = USER_DB
)
AS
WITH GrpCube AS
(SELECT CAST(ISNULL(Country,'NULL')+','+ISNULL(Region,'NULL') AS NVARCHAR(50)) as 'Cols'
, CAST(ISNULL(Country+',','')+ISNULL(Region,'') AS NVARCHAR(50)) as 'GroupBy'
, ROW_NUMBER() OVER (ORDER BY Country) as 'Seq'
FROM ( SELECT 'SalesTerritoryCountry' as Country
UNION ALL
SELECT NULL
) c
CROSS JOIN ( SELECT 'SalesTerritoryRegion' as Region
UNION ALL
SELECT NULL
) r
)
SELECT Cols
, CASE WHEN SUBSTRING(GroupBy,LEN(GroupBy),1) = ','
THEN SUBSTRING(GroupBy,1,LEN(GroupBy)-1)
ELSE GroupBy
END AS GroupBy --Remove Trailing Comma
,Seq
FROM GrpCube;
DECLARE
@SQL NVARCHAR(4000)
,@Columns NVARCHAR(4000)
,@GroupBy NVARCHAR(4000)
,@i INT = 1
,@nbr INT = 0
;
CREATE TABLE #Results
(
[SalesTerritoryCountry] NVARCHAR(50)
,[SalesTerritoryRegion] NVARCHAR(50)
,[TotalSalesAmount] MONEY
)
WITH
( DISTRIBUTION = ROUND_ROBIN
, LOCATION = USER_DB
)
;
The third step is to loop over our cube of columns performing the aggregation. The query will run once for every
row in the #Cube temporary table and store the results in the #Results temp table.
SET @nbr =(SELECT MAX(Seq) FROM #Cube);
WHILE @i<=@nbr
BEGIN
SET @Columns = (SELECT Cols FROM #Cube where seq = @i);
SET @GroupBy = (SELECT GroupBy FROM #Cube where seq = @i);
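    -- (Sketch) The remainder of the loop body is not reproduced above. A minimal completion builds the
    -- INSERT...SELECT as dynamic SQL from @Columns and @GroupBy, runs it, and moves to the next row of #Cube.
    SET @SQL = N'INSERT INTO #Results
                 SELECT ' + @Columns + N'
                 ,      SUM(SalesAmount) AS TotalSalesAmount
                 FROM   dbo.factInternetSales s
                 JOIN   dbo.DimSalesTerritory t ON s.SalesTerritoryKey = t.SalesTerritoryKey '
                 + CASE WHEN @GroupBy <> N'' THEN N'GROUP BY ' + @GroupBy ELSE N'' END;
    EXEC sp_executesql @SQL;
    SET @i += 1;
END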
Lastly, you can return the results by simply reading from the #Results temporary table
SELECT *
FROM #Results
ORDER BY 1,2,3
;
By breaking the code up into sections and generating a looping construct, the code becomes more manageable
and maintainable.
Next steps
For more development tips, see development overview.
Using labels to instrument queries in Azure SQL Data
Warehouse
7/24/2019 • 2 minutes to read • Edit Online
Tips for using labels to instrument queries in Azure SQL Data Warehouse for developing solutions.
SELECT *
FROM sys.tables
OPTION (LABEL = 'My Query Label')
;
The last line tags the string 'My Query Label' to the query. This tag is particularly helpful since the label is query-
able through the DMVs. Querying for labels provides a mechanism for locating problem queries and helping to
identify progress through an ELT run.
A good naming convention really helps. For example, starting the label with PROJECT, PROCEDURE,
STATEMENT, or COMMENT helps to uniquely identify the query among all the code in source control.
The following query uses a dynamic management view to search by label.
SELECT *
FROM sys.dm_pdw_exec_requests r
WHERE r.[label] = 'My Query Label'
;
NOTE
It is essential to put square brackets or double quotes around the word label when querying. Label is a reserved word and
causes an error when it is not delimited.
Next steps
For more development tips, see development overview.
Workload management with resource classes in
Azure SQL Data Warehouse
10/8/2019 • 16 minutes to read • Edit Online
Guidance for using resource classes to manage memory and concurrency for queries in your Azure SQL Data
Warehouse.
RESOURCE CLASS   MEMORY ALLOCATION   MAXIMUM CONCURRENT QUERIES
smallrc          3%                  32
mediumrc         10%                 10
largerc          22%                 4
xlargerc         70%                 1
NOTE
Users or groups defined as Active Directory admin are also service administrators.
NOTE
SELECT statements on dynamic management views (DMVs) or other system views are not governed by any of the
concurrency limits. You can monitor the system regardless of the number of queries executing on it.
Concurrency slots
Concurrency slots are a convenient way to track the resources available for query execution. They are like
tickets that you purchase to reserve seats at a concert because seating is limited. The total number of
concurrency slots per data warehouse is determined by the service level. Before a query can start executing, it
must be able to reserve enough concurrency slots. When a query finishes, it releases its concurrency slots.
A query running with 10 concurrency slots can access 5 times more compute resources than a query
running with 2 concurrency slots.
If each query requires 10 concurrency slots and there are 40 concurrency slots, then only 4 queries can run
concurrently.
Only resource governed queries consume concurrency slots. System queries and some trivial queries don't
consume any slots. The exact number of concurrency slots consumed is determined by the query's resource
class.
To decrease the resource class, use sp_droprolemember, as shown below. If 'loaduser' is not a member of any other resource
classes, they go into the default smallrc resource class with a 3% memory grant.
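A sketch of the role membership calls, using the 'loaduser' name and resource class roles from the surrounding text:
EXEC sp_addrolemember 'largerc', 'loaduser';    -- assign the user to a larger resource class
EXEC sp_droprolemember 'largerc', 'loaduser';   -- decrease the resource class; the user falls back to smallrc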
Recommendations
We recommend creating a user that is dedicated to running a specific type of query or load operation. Give
that user a permanent resource class instead of changing the resource class on a frequent basis. Static resource
classes afford greater overall control on the workload, so we suggest using static resource classes before
considering dynamic resource classes.
Resource classes for load users
CREATE TABLE uses clustered columnstore indexes by default. Compressing data into a columnstore index is a
memory-intensive operation, and memory pressure can reduce the index quality. Memory pressure can lead to
needing a higher resource class when loading data. To ensure loads have enough memory, you can create a
user that is designated for running loads and assign that user to a higher resource class.
The memory needed to process loads efficiently depends on the nature of the table loaded and the data size.
For more information on memory requirements, see Maximizing rowgroup quality.
Once you have determined the memory requirement, choose whether to assign the load user to a static or
dynamic resource class.
Use a static resource class when table memory requirements fall within a specific range. Loads run with
appropriate memory. When you scale the data warehouse, the loads do not need more memory. By using a
static resource class, the memory allocations stay constant. This consistency conserves memory and allows
more queries to run concurrently. We recommend that new solutions use the static resource classes first as
these provide greater control.
Use a dynamic resource class when table memory requirements vary widely. Loads might require more
memory than the current DWU or cDWU level provides. Scaling the data warehouse adds more memory to
load operations, which allows loads to perform faster.
Resource classes for queries
Some queries are compute-intensive and some aren't.
Choose a dynamic resource class when queries are complex, but don't need high concurrency. For example,
generating daily or weekly reports is an occasional need for resources. If the reports are processing large
amounts of data, scaling the data warehouse provides more memory to the user's existing resource class.
Choose a static resource class when resource expectations vary throughout the day. For example, a static
resource class works well when the data warehouse is queried by many people. When scaling the data
warehouse, the amount of memory allocated to the user doesn't change. Consequently, more queries can
be executed in parallel on the system.
Proper memory grants depend on many factors, such as the amount of data queried, the nature of the table
schemas, and various joins, select, and group predicates. In general, allocating more memory allows queries to
complete faster, but reduces the overall concurrency. If concurrency is not an issue, over-allocating memory
does not harm throughput.
To tune performance, use different resource classes. The next section gives a stored procedure that helps you
figure out the best resource class.
Usage example
Syntax:
EXEC dbo.prc_workload_management_by_DWU @DWU VARCHAR(7), @SCHEMA_NAME VARCHAR(128), @TABLE_NAME VARCHAR(128)
1. @DWU: Either provide a NULL parameter to extract the current DWU from the DW DB or provide any
supported DWU in the form of 'DW100c'
2. @SCHEMA_NAME: Provide a schema name of the table
3. @TABLE_NAME: Provide a table name of the interest
Examples executing this stored proc:
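The calls themselves are not reproduced above; sketches using the parameters described:
EXEC dbo.prc_workload_management_by_DWU 'DW100c', 'dbo', 'Table1';   -- specify a supported DWU explicitly
EXEC dbo.prc_workload_management_by_DWU NULL, 'dbo', 'Table1';       -- NULL extracts the current DWU from the database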
The following statement creates Table1 that is used in the preceding examples.
CREATE TABLE Table1 (a int, b varchar(50), c decimal (18,10), d char(10), e varbinary(15), f float, g
datetime, h date);
-------------------------------------------------------------------------------
-- Dropping prc_workload_management_by_DWU procedure if it exists.
-------------------------------------------------------------------------------
IF EXISTS (SELECT * FROM sys.objects WHERE type = 'P' AND name = 'prc_workload_management_by_DWU')
DROP PROCEDURE dbo.prc_workload_management_by_DWU
GO
-------------------------------------------------------------------------------
-- Creating prc_workload_management_by_DWU.
-------------------------------------------------------------------------------
CREATE PROCEDURE dbo.prc_workload_management_by_DWU
(@DWU VARCHAR(7),
@SCHEMA_NAME VARCHAR(128),
@TABLE_NAME VARCHAR(128)
)
AS
IF @DWU IS NULL
BEGIN
-- Selecting proper DWU for the current DB if not specified.
, column_count*1048576*8 AS column_size
, short_string_column_count*1048576*32 AS short_string_size,
(long_string_column_count*16777216) AS long_string_size
FROM base
UNION
SELECT CASE WHEN COUNT(*) = 0 THEN 'EMPTY' END as schema_name
,CASE WHEN COUNT(*) = 0 THEN 'EMPTY' END as table_name
,CASE WHEN COUNT(*) = 0 THEN 0 END as table_overhead
,CASE WHEN COUNT(*) = 0 THEN 0 END as column_size
,CASE WHEN COUNT(*) = 0 THEN 0 END as short_string_size
Next steps
For more information about managing database users and security, see Secure a database in SQL Data
Warehouse. For more information about how larger resource classes can improve clustered columnstore index
quality, see Memory optimizations for columnstore compression.
Azure SQL Data Warehouse workload classification
7/5/2019 • 3 minutes to read • Edit Online
This article explains the SQL Data Warehouse workload classification process of assigning a resource class and
importance to incoming requests.
Classification
Workload management classification allows workload policies to be applied to requests through assigning
resource classes and importance.
While there are many ways to classify data warehousing workloads, the simplest and most common classification
is load and query. You load data with insert, update, and delete statements. You query the data using selects. A
data warehousing solution will often have a workload policy for load activity, such as assigning a higher resource
class with more resources. A different workload policy could apply to queries, such as lower importance compared
to load activities.
You can also subclassify your load and query workloads. Subclassification gives you more control of your
workloads. For example, query workloads can consist of cube refreshes, dashboard queries or ad-hoc queries. You
can classify each of these query workloads with different resource classes or importance settings. Load can also
benefit from subclassification. Large transformations can be assigned to larger resource classes. Higher
importance can be used to ensure key sales data is loaded before weather data or a social data feed.
Not all statements are classified as they do not require resources or need importance to influence execution.
DBCC commands, BEGIN, COMMIT, and ROLLBACK TRANSACTION statements are not classified.
Classification process
Classification in SQL Data Warehouse is achieved today by assigning users to a role that has a corresponding
resource class assigned to it using sp_addrolemember. The ability to characterize requests beyond a login to a
resource class is limited with this capability. A richer method for classification is now available with the CREATE
WORKLOAD CLASSIFIER syntax. With this syntax, SQL Data Warehouse users can assign importance and a
resource class to requests.
NOTE
Classification is evaluated on a per request basis. Multiple requests in a single session can be classified differently.
Classification precedence
As part of the classification process, precedence is in place to determine which resource class is assigned.
Classification based on a database user takes precedence over role membership. For example, suppose you create a
classifier that maps the UserA database user to the mediumrc resource class, and another classifier that maps the
RoleA database role (of which UserA is a member) to the largerc resource class. The classifier that maps the database
user to the mediumrc resource class will take precedence over the classifier that maps the RoleA database role to the
largerc resource class.
If a user is a member of multiple roles with different resource classes assigned or matched in multiple classifiers,
the user is given the highest resource class assignment. This behavior is consistent with existing resource class
assignment behavior.
System classifiers
Workload classification has system workload classifiers. The system classifiers map existing resource class role
memberships to resource class resource allocations with normal importance. System classifiers can't be dropped.
To view system classifiers, you can run the below query:
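The query is not reproduced here; a sketch against the workload-management catalog view:
SELECT * FROM sys.workload_management_workload_classifiers;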
Next steps
For more information on creating a classifier, see the CREATE WORKLOAD CLASSIFIER (Transact-SQL ).
See the Quickstart on how to create a workload classifier Create a workload classifier.
See the how -to articles to Configure Workload Importance and how to manage and monitor Workload
Management.
See sys.dm_pdw_exec_requests to view queries and the importance assigned.
Azure SQL Data Warehouse workload importance
7/5/2019 • 2 minutes to read • Edit Online
This article explains how workload importance can influence the order of execution for SQL Data Warehouse
requests.
Importance
Business needs can require data warehousing workloads to be more important than others. Consider a scenario
where mission critical sales data is loaded before the fiscal period close. Data loads for other sources such as
weather data don't have strict SLAs. Setting high importance for a request to load sales data and low importance
to a request to load weather data ensures the sales data load gets first access to resources and completes quicker.
Importance levels
There are five levels of importance: low, below_normal, normal, above_normal, and high. Requests that don't set
importance are assigned the default level of normal. Requests that have the same importance level have the same
scheduling behavior that exists today.
Importance scenarios
Beyond the basic importance scenario described above with sales and weather data, there are other scenarios
where workload importance helps meet data processing and querying needs.
Locking
Access to locks for read and write activity is one area of natural contention. Activities such as partition switching
or RENAME OBJECT require elevated locks. Without workload importance, SQL Data Warehouse optimizes for
throughput. Optimizing for throughput means that when running and queued requests have the same locking
needs and resources are available, the queued requests can bypass requests with higher locking needs that arrived
in the request queue earlier. Once workload importance is applied to requests with higher locking needs, requests
with higher importance will be run before requests with lower importance.
Consider the following example:
Q1 is actively running and selecting data from SalesFact. Q2 is queued waiting for Q1 to complete. It was
submitted at 9am and is attempting to partition switch new data into SalesFact. Q3 is submitted at 9:01am and
wants to select data from SalesFact.
If Q2 and Q3 have the same importance and Q1 is still executing, Q3 will begin executing. Q2 will continue to
wait for an exclusive lock on SalesFact. If Q2 has higher importance than Q3, Q3 will wait until Q2 is finished
before it can begin execution.
Non-uniform requests
Another scenario where importance can help meet querying demands is when requests with different resource
classes are submitted. As was previously mentioned, under the same importance, SQL Data Warehouse optimizes
for throughput. When mixed size requests (such as smallrc or mediumrc) are queued, SQL Data Warehouse will
choose the earliest arriving request that fits within the available resources. If workload importance is applied, the
highest importance request is scheduled next.
Consider the following example on DW500c:
Q1, Q2, Q3, and Q4 are running smallrc queries. Q5 is submitted with the mediumrc resource class at 9am. Q6 is
submitted with smallrc resource class at 9:01am.
Because Q5 is mediumrc, it requires two concurrency slots. Q5 needs to wait for two of the running queries to
complete. However, when one of the running queries (Q1-Q4) completes, Q6 is scheduled immediately because
the resources exist to execute the query. If Q5 has higher importance than Q6, Q6 waits until Q5 is running before
it can begin executing.
Next steps
For more information on creating a classifier, see CREATE WORKLOAD CLASSIFIER (Transact-SQL).
For more information about SQL Data Warehouse workload classification, see Workload Classification.
See the Quickstart Create workload classifier for how to create a workload classifier.
See the how-to articles to Configure Workload Importance and how to Manage and monitor Workload Management.
See sys.dm_pdw_exec_requests to view queries and the importance assigned.
Memory and concurrency limits for Azure SQL Data
Warehouse
10/8/2019 • 3 minutes to read • Edit Online
View the memory and concurrency limits allocated to the various performance levels and resource classes in
Azure SQL Data Warehouse. For more information, and to apply these capabilities to your workload
management plan, see Resource classes for workload management.
PERFORMANCE LEVEL | COMPUTE NODES | DISTRIBUTIONS PER COMPUTE NODE | MEMORY PER DATA WAREHOUSE (GB)
DW100c | 1 | 60 | 60
DW200c | 1 | 60 | 120
DW300c | 1 | 60 | 180
DW400c | 1 | 60 | 240
DW500c | 1 | 60 | 300
DW1000c | 2 | 30 | 600
DW1500c | 3 | 20 | 900
DW2000c | 4 | 15 | 1200
DW2500c | 5 | 12 | 1500
DW3000c | 6 | 10 | 1800
DW5000c | 10 | 6 | 3000
DW6000c | 12 | 5 | 3600
DW7500c | 15 | 4 | 4500
DW10000c | 20 | 3 | 6000
DW15000c | 30 | 2 | 9000
DW30000c | 60 | 1 | 18000
The maximum service level is DW30000c, which has 60 Compute nodes and one distribution per Compute node.
For example, a 600 TB data warehouse at DW30000c processes approximately 10 TB per Compute node.
Concurrency maximums
To ensure each query has enough resources to execute efficiently, SQL Data Warehouse tracks resource
utilization by assigning concurrency slots to each query. The system puts queries into a queue based on
importance and concurrency slots. Queries wait in the queue until enough concurrency slots are available.
Importance and concurrency slots determine CPU prioritization. For more information, see Analyze your workload.
Static resource classes
The following table shows the maximum concurrent queries and concurrency slots for each static resource class.
SERVICE LEVEL | MAXIMUM CONCURRENT QUERIES | CONCURRENCY SLOTS AVAILABLE | SLOTS USED BY STATICRC10 | STATICRC20 | STATICRC30 | STATICRC40 | STATICRC50 | STATICRC60 | STATICRC70 | STATICRC80
DW100c | 4 | 4 | 1 | 2 | 4 | 4 | 4 | 4 | 4 | 4
DW200c | 8 | 8 | 1 | 2 | 4 | 8 | 8 | 8 | 8 | 8
DW300c | 12 | 12 | 1 | 2 | 4 | 8 | 8 | 8 | 8 | 8
DW400c | 16 | 16 | 1 | 2 | 4 | 8 | 16 | 16 | 16 | 16
DW500c | 20 | 20 | 1 | 2 | 4 | 8 | 16 | 16 | 16 | 16
DW1000c | 32 | 40 | 1 | 2 | 4 | 8 | 16 | 32 | 32 | 32
DW1500c | 32 | 60 | 1 | 2 | 4 | 8 | 16 | 32 | 32 | 32
DW2000c | 48 | 80 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 64
DW2500c | 48 | 100 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 64
DW3000c | 64 | 120 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 64
The following table shows the equivalent values for the dynamic resource classes (smallrc, mediumrc, largerc, and xlargerc).
SERVICE LEVEL | MAXIMUM CONCURRENT QUERIES | CONCURRENCY SLOTS AVAILABLE | SLOTS USED BY SMALLRC | MEDIUMRC | LARGERC | XLARGERC
DW100c | 4 | 4 | 1 | 1 | 1 | 2
DW200c | 8 | 8 | 1 | 1 | 1 | 5
DW300c | 12 | 12 | 1 | 1 | 2 | 8
DW400c | 16 | 16 | 1 | 1 | 3 | 11
DW500c | 20 | 20 | 1 | 2 | 4 | 14
DW1000c | 32 | 40 | 1 | 4 | 8 | 28
DW1500c | 32 | 60 | 1 | 6 | 13 | 42
DW2000c | 32 | 80 | 2 | 8 | 17 | 56
DW2500c | 32 | 100 | 3 | 10 | 22 | 70
DW3000c | 32 | 120 | 3 | 12 | 26 | 84
When there are not enough concurrency slots free to start query execution, queries are queued and executed
based on importance. If importance is equivalent, queries are executed on a first-in, first-out basis. As
queries finish and the number of running queries and used slots falls below the limits, SQL Data Warehouse releases queued
queries.
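Resource classes in SQL Data Warehouse are pre-defined database roles, so a user is typically assigned to one with sp_addrolemember. A minimal sketch follows; the LoadUser login is a hypothetical example:

EXEC sp_addrolemember 'staticrc20', 'LoadUser';
-- Or assign a dynamic resource class instead:
EXEC sp_addrolemember 'mediumrc', 'LoadUser';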
Next steps
To learn more about how to leverage resource classes to optimize your workload further, see the following articles:
Resource classes for workload management
Analyzing your workload
Manageability and monitoring with Azure SQL Data
Warehouse
1/30/2019 • 2 minutes to read • Edit Online
Take a look through what's available to help you manage and monitor SQL Data Warehouse. The following articles
highlight ways to optimize performance and usage of your data warehouse.
Overview
Learn about compute management and elasticity
Understand what metrics and logs are available in the Azure portal
Learn about backup and restore capabilities
Learn about built-in intelligence and recommendations
Learn about maintenance periods and what is available to minimize downtime of your data warehouse
Find common troubleshooting guidance
Next steps
For how-to guides, see Monitor and tune your data warehouse.
Manage compute in Azure SQL Data Warehouse
8/18/2019 • 6 minutes to read • Edit Online
Learn about managing compute resources in Azure SQL Data Warehouse. Lower costs by pausing the data
warehouse, or scale the data warehouse to meet performance demands.
Scaling compute
You can scale out or scale back compute by adjusting the data warehouse units setting for your data warehouse.
Loading and query performance can increase linearly as you add more data warehouse units.
For scale-out steps, see the Azure portal, PowerShell, or T-SQL quickstarts. You can also perform scale-out
operations with a REST API.
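For reference, the T-SQL form of a scale operation is a single statement run against the master database; in the sketch below, the data warehouse name and target service level are examples only:

ALTER DATABASE mySampleDataWarehouse
MODIFY (SERVICE_OBJECTIVE = 'DW1000c');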
To perform a scale operation, SQL Data Warehouse first kills all incoming queries and then rolls back
transactions to ensure a consistent state. Scaling only occurs once the transaction rollback is complete. For a
scale operation, the system detaches the storage layer from the Compute nodes, adds Compute nodes, and then
reattaches the storage layer to the Compute layer. Each data warehouse is stored as 60 distributions, which are
evenly distributed to the Compute nodes. Adding more Compute nodes adds more compute power. As the
number of Compute nodes increases, the number of distributions per compute node decreases, providing more
compute power for your queries. Likewise, decreasing data warehouse units reduces the number of Compute
nodes, which reduces the compute resources for queries.
The following table shows how the number of distributions per Compute node changes as the data warehouse
units change. DWU6000 provides 60 Compute nodes and achieves much higher query performance than
DWU100.
DATA WAREHOUSE UNITS | # OF COMPUTE NODES | # OF DISTRIBUTIONS PER NODE
100 | 1 | 60
200 | 2 | 30
300 | 3 | 20
400 | 4 | 15
500 | 5 | 12
600 | 6 | 10
1000 | 10 | 6
1200 | 12 | 5
1500 | 15 | 4
2000 | 20 | 3
3000 | 30 | 2
6000 | 60 | 1
Permissions
Scaling the data warehouse requires the permissions described in ALTER DATABASE. Pause and Resume
require the SQL DB Contributor permission, specifically Microsoft.Sql/servers/databases/action.
Next steps
See the how-to guide to manage compute. Another aspect of managing compute resources is allocating
different compute resources for individual queries. For more information, see Resource classes for workload
management.
Monitoring resource utilization and query activity in
Azure SQL Data Warehouse
9/27/2019 • 2 minutes to read • Edit Online
Azure SQL Data Warehouse provides a rich monitoring experience within the Azure portal to surface insights to
your data warehouse workload. The Azure portal is the recommended tool when monitoring your data warehouse
as it provides configurable retention periods, alerts, recommendations, and customizable charts and dashboards
for metrics and logs. The portal also enables you to integrate with other Azure monitoring services, such as
Operations Management Suite (OMS) and Azure Monitor (logs), to provide a holistic monitoring experience for
not only your data warehouse but also your entire Azure analytics platform. This documentation describes what
monitoring capabilities are available to optimize and manage your analytics platform with SQL Data Warehouse.
Resource utilization
The following metrics are available in the Azure portal for SQL Data Warehouse. These metrics are surfaced
through Azure Monitor.
METRIC | DESCRIPTION | AGGREGATION TYPE
CPU percentage | CPU utilization across all nodes for the data warehouse | Maximum
Cache hit percentage | (cache hits / cache miss) * 100, where cache hits is the sum of all columnstore segment hits in the local SSD cache and cache miss is the columnstore segment misses in the local SSD cache, summed across all nodes | Maximum
Query activity
For a programmatic experience when monitoring SQL Data Warehouse via T-SQL, the service provides a set of
Dynamic Management Views (DMVs). These views are useful when actively troubleshooting and identifying
performance bottlenecks with your workload.
To view the list of DMVs that SQL Data Warehouse provides, refer to this documentation.
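For example, a common starting point (a sketch, not an exhaustive query) is to list the most recent active requests from the sys.dm_pdw_exec_requests DMV:

SELECT TOP 10 request_id, [status], submit_time, total_elapsed_time, command
FROM sys.dm_pdw_exec_requests
WHERE [status] NOT IN ('Completed', 'Failed', 'Cancelled')
ORDER BY submit_time DESC;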
Next steps
The following how-to guides describe common scenarios and use cases when monitoring and managing your
data warehouse:
Monitor your data warehouse workload with DMVs
Backup and restore in Azure SQL Data Warehouse
10/22/2019 • 6 minutes to read • Edit Online
Learn how to use backup and restore in Azure SQL Data Warehouse. Use data warehouse restore points to
recover or copy your data warehouse to a previous state in the primary region. Use data warehouse geo-
redundant backups to restore to a different geographical region.
-- Returns the most recent backup or restore operation recorded for the data warehouse
select top 1 *
from sys.pdw_loader_backup_runs
order by run_id desc
;
IMPORTANT
If you delete a logical SQL server instance, all databases that belong to the instance are also deleted and cannot be
recovered. You cannot restore a deleted server.
NOTE
If you require a shorter RPO for geo-backups, vote for this capability here. You can also create a user-defined restore point
and restore from the newly created restore point to a new data warehouse in a different region. Once you have restored,
you have the data warehouse online and can pause it indefinitely to save compute costs. The paused database incurs storage
charges at the Azure Premium Storage rate. Should you need an active copy of the data warehouse, you can resume it, which
should take only a few minutes.
Geo-redundant restore
You can restore your data warehouse to any region supporting SQL Data Warehouse at your chosen performance
level.
NOTE
To perform a geo-redundant restore you must not have opted out of this feature.
Next steps
For more information about disaster planning, see Business continuity overview
SQL Data Warehouse Recommendations
3/15/2019 • 2 minutes to read • Edit Online
This article describes the recommendations served by SQL Data Warehouse through Azure Advisor.
SQL Data Warehouse provides recommendations to ensure your data warehouse is consistently optimized for
performance. Data warehouse recommendations are tightly integrated with Azure Advisor to provide you with
best practices directly within the Azure portal. SQL Data Warehouse analyzes the current state of your data
warehouse, collects telemetry, and surfaces recommendations for your active workload on a daily cadence. The
supported data warehouse recommendation scenarios are outlined below along with how to apply recommended
actions.
If you have any feedback on the SQL Data Warehouse Advisor or run into any issues, reach out to
[email protected].
Click here to check your recommendations today! Currently this feature is applicable to Gen2 data warehouses
only.
Data Skew
Data skew can cause additional data movement or resource bottlenecks when running your workload. The
following documentation describes how to identify data skew and prevent it from happening by selecting an
optimal distribution key.
Identify and remove skew
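As a quick check (a sketch only; the table name is a hypothetical example), DBCC PDW_SHOWSPACEUSED reports rows and space per distribution, which makes skew across the 60 distributions easy to spot:

DBCC PDW_SHOWSPACEUSED('dbo.FactInternetSales');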
No or Outdated Statistics
Having suboptimal statistics can severely impact query performance as it can cause the SQL Data Warehouse
query optimizer to generate suboptimal query plans. The following documentation describes the best practices
around creating and updating statistics:
Creating and updating table statistics
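For illustration, creating and refreshing statistics is a single statement each; the table and column names below are hypothetical examples:

-- Create single-column statistics on a commonly joined or filtered column
CREATE STATISTICS stats_CustomerKey ON dbo.DimCustomer (CustomerKey);

-- Refresh all statistics on the table after significant data changes
UPDATE STATISTICS dbo.DimCustomer;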
To see the list of tables impacted by these recommendations, run the following T-SQL script. Advisor continuously
runs the same T-SQL script to generate these recommendations.
Replicate Tables
For replicated table recommendations, Advisor detects table candidates based on the following physical
characteristics:
Replicated table size
Number of columns
Table distribution type
Number of partitions
Advisor continuously leverages workload-based heuristics such as table access frequency, rows returned on
average, and thresholds around data warehouse size and activity to ensure high-quality recommendations are
generated.
The following describes workload-based heuristics you may find in the Azure portal for each replicated table
recommendation:
Scan avg - the average percent of rows that were returned from the table for each table access over the past
seven days
Frequent read, no update - indicates that the table has not been updated in the past seven days while showing
access activity
Read/update ratio - the ratio of how frequent the table was accessed relative to when it gets updated over the
past seven days
Activity - measures the usage based on access activity. This compares the table access activity relative to the
average table access activity across the data warehouse over the past seven days.
Currently, Advisor will show at most four replicated table candidates at once, limited to tables with clustered
columnstore indexes, prioritizing those with the highest activity.
IMPORTANT
The replicated table recommendation is not foolproof and does not take into account data movement operations. We are
working on adding this as a heuristic, but in the meantime you should always validate your workload after applying the
recommendation. Please contact [email protected] if you discover replicated table recommendations that
cause your workload to regress. To learn more about replicated tables, visit the following documentation.
Troubleshooting Azure SQL Data Warehouse
8/18/2019 • 4 minutes to read • Edit Online
Connecting
Issue: Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. (Microsoft SQL Server, Error: 18456)
Resolution: This error occurs when an AAD user tries to connect to the master database, but does not have a user in master. To correct this issue, either specify the SQL Data Warehouse you wish to connect to at connection time or add the user to the master database. See the Security overview article for more details.

Issue: The server principal "MyUserName" is not able to access the database "master" under the current security context. Cannot open user default database. Login failed. Login failed for user 'MyUserName'. (Microsoft SQL Server, Error: 916)
Resolution: This error occurs when an AAD user tries to connect to the master database, but does not have a user in master. To correct this issue, either specify the SQL Data Warehouse you wish to connect to at connection time or add the user to the master database. See the Security overview article for more details.

Issue: CTAIP error
Resolution: This error can occur when a login has been created on the SQL server master database, but not in the SQL Data Warehouse database. If you encounter this error, take a look at the Security overview article. This article explains how to create a login and user on master, and then how to create a user in the SQL Data Warehouse database.

Issue: Blocked by firewall
Resolution: Azure SQL databases are protected by server and database level firewalls to ensure only known IP addresses have access to a database. The firewalls are secure by default, which means that you must explicitly enable an IP address or range of addresses before you can connect. To configure your firewall for access, follow the steps in Configure server firewall access for your client IP in the Provisioning instructions.

Issue: Cannot connect with tool or driver
Resolution: SQL Data Warehouse recommends using SSMS, SSDT for Visual Studio, or sqlcmd to query your data. For more information on drivers and connecting to SQL Data Warehouse, see the Drivers for Azure SQL Data Warehouse and Connect to Azure SQL Data Warehouse articles.
Tools
Issue: Visual Studio object explorer is missing AAD users
Resolution: This is a known issue. As a workaround, view the users in sys.database_principals. See Authentication to Azure SQL Data Warehouse to learn more about using Azure Active Directory with SQL Data Warehouse.

Issue: Manual scripting, using the scripting wizard, or connecting via SSMS is slow, not responding, or producing errors
Resolution: Ensure that users have been created in the master database. In scripting options, also make sure that the engine edition is set as "Microsoft Azure SQL Data Warehouse Edition" and the engine type is "Microsoft Azure SQL Database".

Issue: Generate scripts fails in SSMS
Resolution: Generating a script for SQL Data Warehouse fails if the option "Generate script for dependent objects" is set to "True". As a workaround, users must manually go to Tools -> Options -> SQL Server Object Explorer -> Generate script for dependent objects and set it to False.
Performance
Issue: Query performance troubleshooting
Resolution: If you are trying to troubleshoot a particular query, start with Learning how to monitor your queries.

Issue: Poor query performance and plans are often a result of missing statistics
Resolution: The most common cause of poor performance is lack of statistics on your tables. See Maintaining table statistics for details on how to create statistics and why they are critical to your performance.

Issue: How to implement best practices
Resolution: The best place to start learning how to improve query performance is the SQL Data Warehouse best practices article.

Issue: How to improve performance with scaling
Resolution: Sometimes the solution to improving performance is to simply add more compute power to your queries by scaling your SQL Data Warehouse.

Issue: Poor query performance as a result of poor index quality
Resolution: Sometimes queries can slow down because of poor columnstore index quality. See this article for more information and how to rebuild indexes to improve segment quality.
System management
Issue: Msg 40847: Could not perform the operation because server would exceed the allowed Database Transaction Unit quota of 45000.
Resolution: Either reduce the DWU of the database you are trying to create or request a quota increase.

Issue: Investigating space utilization
Resolution: See Table sizes to understand the space utilization of your system.

Issue: Help with managing tables
Resolution: See the Table overview article for help with managing your tables. This article also includes links to more detailed topics like Table data types, Distributing a table, Indexing a table, Partitioning a table, Maintaining table statistics, and Temporary tables.

Issue: Transparent data encryption (TDE) progress bar is not updating in the Azure portal
Resolution: You can view the state of TDE via PowerShell.

Issue: DELETE and UPDATE limitations
Resolution: See UPDATE workarounds, DELETE workarounds, and Using CTAS to work around unsupported UPDATE and DELETE syntax.

Issue: Stored procedure limitations
Resolution: See Stored procedure limitations to understand some of the limitations of stored procedures.

Issue: UDFs do not support SELECT statements
Resolution: This is a current limitation of our UDFs. See CREATE FUNCTION for the syntax we support.
Next steps
For more help in finding solution to your issue, here are some other resources you can try.
Blogs
Feature requests
Videos
CAT team blogs
Create support ticket
MSDN forum
Stack Overflow forum
Twitter
Use maintenance schedules to manage service
updates and maintenance
10/4/2019 • 4 minutes to read • Edit Online
Maintenance schedules are now available in all Azure SQL Data Warehouse regions. The maintenance schedule
feature integrates the Service Health Planned Maintenance Notifications, Resource Health Check Monitor, and the
Azure SQL Data Warehouse maintenance scheduling service.
You use maintenance scheduling to choose a time window when it's convenient to receive new features, upgrades,
and patches. You choose a primary and a secondary maintenance window within a seven-day period. To use this
feature you will need to identify a primary and secondary window within separate day ranges.
For example, you can schedule a primary window of Saturday 22:00 to Sunday 01:00, and then schedule a
secondary window of Wednesday 19:00 to 22:00. If SQL Data Warehouse can't perform maintenance during your
primary maintenance window, it will try the maintenance again during your secondary maintenance window.
Service maintenance could occur during both the primary and secondary windows. To ensure rapid completion of
all maintenance operations, DW400(c) and lower data warehouse tiers could complete maintenance outside of a
designated maintenance window.
All newly created Azure SQL Data Warehouse instances will have a system-defined maintenance schedule applied
during deployment. The schedule can be edited as soon as deployment is complete.
Each maintenance window can be between three and eight hours. Maintenance can occur at any time within the
window. When maintenance starts, all active sessions will be canceled and non-committed transactions will be
rolled back. You should expect multiple brief losses in connectivity as the service deploys new code to your data
warehouse. You'll be notified immediately after your data warehouse maintenance is completed.
All maintenance operations should finish within the scheduled maintenance windows. No maintenance will take
place outside the specified maintenance windows without prior notification. If your data warehouse is paused
during a scheduled maintenance, it will be updated during the resume operation.
NOTE
In the event we are required to deploy a time-critical update, advance notification times may be significantly reduced.
If you received an advance notification that maintenance will take place, but SQL Data Warehouse can't perform
maintenance during that time, you'll receive a cancellation notification. Maintenance will then resume during the
next scheduled maintenance period.
All active maintenance events appear in the Service Health - Planned Maintenance section. The Service Health
history includes a full record of past events. You can monitor maintenance via the Azure Service Health check
portal dashboard during an active event.
Maintenance schedule availability
Even if maintenance scheduling isn't available in your selected region, you can view and edit your maintenance
schedule at any time. When maintenance scheduling becomes available in your region, the identified schedule will
immediately become active on your data warehouse.
4. Identify the preferred day range for your primary maintenance window by using the options at the top of
the page. This selection determines if your primary window will occur on a weekday or over the weekend.
Your selection will update the drop-down values. During preview, some regions might not yet support the
full set of available Day options.
5. Choose your preferred primary and secondary maintenance windows by using the drop-down list boxes:
Day: Preferred day to perform maintenance during the selected window.
Start time: Preferred start time for the maintenance window.
Time window: Preferred duration of your time window.
The Schedule summary area at the bottom of the blade is updated based on the values that you selected.
6. Select Save. A message appears, confirming that your new schedule is now active.
If you're saving a schedule in a region that doesn't support maintenance scheduling, the following message
appears. Your settings are saved and become active when the feature becomes available in your selected
region.
Next steps
Learn more about creating, viewing, and managing alerts by using Azure Monitor.
Learn more about webhook actions for log alert rules.
Learn more Creating and managing Action Groups.
Learn more about Azure Service Health.
Integrate other services with SQL Data Warehouse
5/17/2019 • 2 minutes to read • Edit Online
In addition to its core functionality, SQL Data Warehouse enables users to integrate with many of the other
services in Azure. Some of these services include:
Power BI
Azure Data Factory
Azure Machine Learning
Azure Stream Analytics
SQL Data Warehouse continues to integrate with more services across Azure, and more Integration partners.
Power BI
Power BI integration allows you to combine the compute power of SQL Data Warehouse with the dynamic
reporting and visualization of Power BI. Power BI integration currently includes:
Direct Connect: A more advanced connection with logical pushdown against SQL Data Warehouse.
Pushdown provides faster analysis on a larger scale.
Open in Power BI: The 'Open in Power BI' button passes instance information to Power BI for a simplified way
to connect.
For more information, see Integrate with Power BI, or the Power BI documentation.
This article shows you how to create and populate Azure AD, and then use Azure AD with Azure SQL Database,
managed instance, and SQL Data Warehouse. For an overview, see Azure Active Directory Authentication.
NOTE
This article applies to Azure SQL server, and to both SQL Database and SQL Data Warehouse databases that are created on
the Azure SQL server. For simplicity, SQL Database is used when referring to both SQL Database and SQL Data Warehouse.
IMPORTANT
Connecting to SQL Server running on an Azure VM is not supported using an Azure Active Directory account. Use a
domain Active Directory account instead.
NOTE
This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will
continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM
compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure
PowerShell.
IMPORTANT
The PowerShell Azure Resource Manager module is still supported by Azure SQL Database, but all future development is for
the Az.Sql module. For these cmdlets, see AzureRM.Sql. The arguments for the commands in the Az module and in the
AzureRm modules are substantially identical.
NOTE
Users that are not based on an Azure AD account (including the Azure SQL server administrator account), cannot create
Azure AD-based users, because they do not have permission to validate proposed database users with the Azure AD.
Your managed instance needs permissions to read Azure AD to successfully accomplish tasks such as
authentication of users through security group membership or creation of new users. For this to work, you need
to grant permissions to managed instance to read Azure AD. There are two ways to do it: from the Azure portal or
by using PowerShell. The following steps describe both methods.
1. In the Azure portal, in the upper-right corner, select your connection to drop down a list of possible Active
Directories.
2. Choose the correct Active Directory as the default Azure AD.
This step links the subscription associated with Active Directory to the managed instance, making sure that
the same subscription is used for both Azure AD and the managed instance.
3. Navigate to managed instance and select one that you want to use for Azure AD integration.
4. Select the banner on top of the Active Directory admin page and grant permission to the current user. If
you are logged in as Global/Company administrator in Azure AD, you can do it from the Azure portal or
using PowerShell with the script below.
# Gives Azure Active Directory read permission to a Service Principal representing the managed
instance.
# Can be executed only by a "Company Administrator", "Global Administrator", or "Privileged Role
Administrator" type of user.
5. After the operation is successfully completed, the following notification will show up in the top-right
corner:
6. Now you can choose your Azure AD admin for your managed instance. For that, on the Active Directory
admin page, select Set admin command.
7. In the AAD admin page, search for a user, select the user or group to be an administrator, and then select
Select.
The Active Directory admin page shows all members and groups of your Active Directory. Users or groups
that are grayed out can't be selected because they aren't supported as Azure AD administrators. See the
list of supported admins in Azure AD Features and Limitations. Role-based access control (RBAC ) applies
only to the Azure portal and isn't propagated to SQL Server.
8. At the top of the Active Directory admin page, select Save.
The process of changing the administrator may take several minutes. Then the new administrator appears
in the Active Directory admin box.
After provisioning an Azure AD admin for your managed instance, you can begin to create Azure AD server
principals (logins) (public preview) with the CREATE LOGIN syntax. For more information, see managed
instance Overview.
TIP
To later remove an Admin, at the top of the Active Directory admin page, select Remove admin, and then select Save.
The following command provisions an Azure AD administrator group named DBAs for the managed instance
named ManagedInstance01. This server is associated with resource group ResourceGroup01.
The following command removes the Azure AD administrator for the managed instance named
ManagedInstanceName01 associated with the resource group ResourceGroup01.
COMMAND DESCRIPTION
az sql mi ad-admin create Provisions an Azure Active Directory administrator for SQL
managed instance. (Must be from the current subscription)
COMMAND DESCRIPTION
az sql mi ad-admin delete Removes an Azure Active Directory administrator for SQL
managed instance.
az sql mi ad-admin update Updates the Active Directory administrator for a SQL
managed instance.
The following two procedures show you how to provision an Azure Active Directory administrator for your Azure
SQL server in the Azure portal and by using PowerShell.
Azure portal
1. In the Azure portal, in the upper-right corner, select your connection to drop down a list of possible Active
Directories. Choose the correct Active Directory as the default Azure AD. This step links the subscription-
associated Active Directory with Azure SQL server making sure that the same subscription is used for both
Azure AD and SQL Server. (The Azure SQL server can be hosting either Azure SQL Database or Azure SQL Data Warehouse.)
5. In the Add admin page, search for a user, select the user or group to be an administrator, and then select
Select. (The Active Directory admin page shows all members and groups of your Active Directory. Users
or groups that are grayed out cannot be selected because they are not supported as Azure AD
administrators. (See the list of supported admins in the Azure AD Features and Limitations section of
Use Azure Active Directory Authentication for authentication with SQL Database or SQL Data
Warehouse.) Role-based access control (RBAC ) applies only to the portal and is not propagated to SQL
Server.
6. At the top of the Active Directory admin page, select SAVE.
The process of changing the administrator may take several minutes. Then the new administrator appears in the
Active Directory admin box.
NOTE
When setting up the Azure AD admin, the new admin name (user or group) cannot already be present in the virtual master
database as a SQL Server authentication user. If present, the Azure AD admin setup will fail, rolling back its creation and
indicating that such an admin (name) already exists. Since such a SQL Server authentication user is not part of the Azure
AD, any effort to connect to the server using Azure AD authentication fails.
To later remove an Admin, at the top of the Active Directory admin page, select Remove admin, and then
select Save.
PowerShell for Azure SQL Database and Azure SQL Data Warehouse
To run PowerShell cmdlets, you need to have Azure PowerShell installed and running. For detailed information,
see How to install and configure Azure PowerShell. To provision an Azure AD admin, execute the following Azure
PowerShell commands:
Connect-AzAccount
Select-AzSubscription
Cmdlets used to provision and manage Azure AD admin for Azure SQL Database and Azure SQL Data
Warehouse:
Use PowerShell command get-help to see more information for each of these commands. For example,
get-help Set-AzSqlServerActiveDirectoryAdministrator .
PowerShell examples for Azure SQL Database and Azure SQL Data Warehouse
The following script provisions an Azure AD administrator group named DBA_Group (object ID
40b79501-b343-44ed-9ce7-da4c8cc7353f ) for the demo_server server in a resource group named Group-23:
The DisplayName input parameter accepts either the Azure AD display name or the User Principal Name. For
example, DisplayName="John Smith" and DisplayName="[email protected]" . For Azure AD groups only the Azure
AD display name is supported.
NOTE
The Azure PowerShell command Set-AzSqlServerActiveDirectoryAdministrator does not prevent you from
provisioning Azure AD admins for unsupported users. An unsupported user can be provisioned, but can not connect to a
database.
The following example uses the optional ObjectID:
NOTE
The Azure AD ObjectID is required when the DisplayName is not unique. To retrieve the ObjectID and DisplayName
values, use the Active Directory section of Azure Classic Portal, and view the properties of a user or group.
The following example returns information about the current Azure AD admin for Azure SQL server:
NOTE
You can also provision an Azure Active Directory Administrator by using the REST APIs. For more information, see Service
Management REST API Reference and Operations for Azure SQL Database.
CLI for Azure SQL Database and Azure SQL Data Warehouse
You can also provision an Azure AD admin by calling the following CLI commands:
COMMAND DESCRIPTION
az sql server ad-admin create Provisions an Azure Active Directory administrator for Azure
SQL server or Azure SQL Data Warehouse. (Must be from the
current subscription)
az sql server ad-admin delete Removes an Azure Active Directory administrator for Azure
SQL server or Azure SQL Data Warehouse.
az sql server ad-admin list Returns information about an Azure Active Directory
administrator currently configured for the Azure SQL server
or Azure SQL Data Warehouse.
az sql server ad-admin update Updates the Active Directory administrator for an Azure SQL
server or Azure SQL Data Warehouse.
Azure Active Directory authentication requires database users to be created as contained database users. A
contained database user based on an Azure AD identity, is a database user that does not have a login in the
master database, and which maps to an identity in the Azure AD directory that is associated with the database.
The Azure AD identity can be either an individual user account or a group. For more information about contained
database users, see Contained Database Users- Making Your Database Portable.
NOTE
Database users (with the exception of administrators) cannot be created using the Azure portal. RBAC roles are not
propagated to SQL Server, SQL Database, or SQL Data Warehouse. Azure RBAC roles are used for managing Azure
Resources, and do not apply to database permissions. For example, the SQL Server Contributor role does not grant
access to connect to the SQL Database or SQL Data Warehouse. The access permission must be granted directly in the
database using Transact-SQL statements.
WARNING
Special characters like colon : or ampersand & when included as user names in the T-SQL CREATE LOGIN and CREATE
USER statements are not supported.
To create an Azure AD -based contained database user (other than the server administrator that owns the
database), connect to the database with an Azure AD identity, as a user with at least the ALTER ANY USER
permission. Then use the following Transact-SQL syntax:
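The general form of that syntax is shown below; Azure_AD_principal_name is a placeholder, as described next:

CREATE USER [Azure_AD_principal_name] FROM EXTERNAL PROVIDER;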
Azure_AD_principal_name can be the user principal name of an Azure AD user or the display name for an Azure
AD group.
Examples: To create a contained database user representing an Azure AD federated or managed domain user:
CREATE USER [[email protected]] FROM EXTERNAL PROVIDER;
CREATE USER [[email protected]] FROM EXTERNAL PROVIDER;
To create a contained database user representing an Azure AD or federated domain group, provide the display
name of a security group:
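For example (the group name Nurses is a hypothetical Azure AD security group display name):

CREATE USER [Nurses] FROM EXTERNAL PROVIDER;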
To create a contained database user representing an application that connects using an Azure AD token:
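For example (appName is a hypothetical Azure AD application display name):

CREATE USER [appName] FROM EXTERNAL PROVIDER;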
NOTE
This command requires that SQL access Azure AD (the "external provider") on behalf of the logged-in user. Sometimes,
circumstances will arise that cause Azure AD to return an exception back to SQL. In these cases, the user will see SQL error
33134, which should contain the AAD-specific error message. Most of the time, the error will say that access is denied, or
that the user must enroll in MFA to access the resource, or that access between first-party applications must be handled via
preauthorization. In the first two cases, the issue is usually caused by Conditional Access policies that are set in the user's
AAD tenant: they prevent the user from accessing the external provider. Updating the CA policies to allow access to the
application '00000002-0000-0000-c000-000000000000' (the application ID of the AAD Graph API) should resolve the
issue. In the case that the error says access between first-party applications must be handled via preauthorization, the issue
is because the user is signed in as a service principal. The command should succeed if it is executed by a user instead.
TIP
You cannot directly create a user from an Azure Active Directory other than the Azure Active Directory that is associated
with your Azure subscription. However, members of other Active Directories that are imported users in the associated
Active Directory (known as external users) can be added to an Active Directory group in the tenant Active Directory. By
creating a contained database user for that AD group, the users from the external Active Directory can gain access to SQL
Database.
For more information about creating contained database users based on Azure Active Directory identities, see
CREATE USER (Transact-SQL ).
NOTE
Removing the Azure Active Directory administrator for Azure SQL server prevents any Azure AD authentication user from
connecting to the server. If necessary, unusable Azure AD users can be dropped manually by a SQL Database administrator.
NOTE
If you receive a Connection Timeout Expired, you may need to set the TransparentNetworkIPResolution parameter of
the connection string to false. For more information, see Connection timeout issue with .NET Framework 4.6.1 -
TransparentNetworkIPResolution.
When you create a database user, that user receives the CONNECT permission and can connect to that database
as a member of the PUBLIC role. Initially the only permissions available to the user are any permissions granted
to the PUBLIC role, or any permissions granted to any Azure AD groups that they are a member of. Once you
provision an Azure AD -based contained database user, you can grant the user additional permissions, the same
way as you grant permission to any other type of user. Typically grant permissions to database roles, and add
users to roles. For more information, see Database Engine Permission Basics. For more information about special
SQL Database roles, see Managing Databases and Logins in Azure SQL Database. A federated domain user
account that is imported into a managed domain as an external user, must use the managed domain identity.
NOTE
Azure AD users are marked in the database metadata with type E (EXTERNAL_USER) and for groups with type X
(EXTERNAL_GROUPS). For more information, see sys.database_principals.
IMPORTANT
Support for Azure Active Directory authentication is available with SQL Server 2016 Management Studio and SQL Server
Data Tools in Visual Studio 2015. The August 2016 release of SSMS also includes support for Active Directory Universal
Authentication, which allows administrators to require Multi-Factor Authentication using a phone call, text message, smart
cards with pin, or mobile app notification.
4. Select the Options button, and on the Connection Properties page, in the Connect to database box,
type the name of the user database you want to connect to. (See the graphic in the previous option.)
// Azure Active Directory integrated authentication: no credentials appear in the connection string.
string ConnectionString = @"Data Source=n9lxnyuzhv.database.windows.net; Authentication=Active Directory Integrated; Initial Catalog=testdb;";
SqlConnection conn = new SqlConnection(ConnectionString);
conn.Open();
The connection string keyword Integrated Security=True is not supported for connecting to Azure SQL
Database. When making an ODBC connection, you will need to remove spaces and set Authentication to
'ActiveDirectoryIntegrated'.
Active Directory password authentication
To connect to a database using an Azure AD user name and password, the Authentication keyword must be set
to Active Directory Password. The connection string must contain User ID/UID and Password/PWD keywords
and values. The following C# code sample uses ADO.NET.
// Azure Active Directory password authentication: supply the Azure AD user name and password.
string ConnectionString = @"Data Source=n9lxnyuzhv.database.windows.net; Authentication=Active Directory Password; Initial Catalog=testdb; [email protected]; PWD=MyPassWord!";
SqlConnection conn = new SqlConnection(ConnectionString);
conn.Open();
Learn more about Azure AD authentication methods using the demo code samples available at Azure AD
Authentication GitHub Demo.
Azure AD token
This authentication method allows middle-tier services to connect to Azure SQL Database or Azure SQL Data
Warehouse by obtaining a token from Azure Active Directory (AAD ). It enables sophisticated scenarios including
certificate-based authentication. You must complete four basic steps to use Azure AD token authentication:
1. Register your application with Azure Active Directory and get the client ID for your code.
2. Create a database user representing the application. (Completed earlier in step 6.)
3. Create a certificate on the client computer that runs the application.
4. Add the certificate as a key for your application.
Sample connection string:
For more information, see SQL Server Security Blog. For information about adding a certificate, see Get started
with certificate-based authentication in Azure Active Directory.
sqlcmd
The following statements connect using version 13.1 of sqlcmd, which is available from the Download Center.
NOTE
sqlcmd with the -G command does not work with system identities, and requires a user principal login.
sqlcmd -S Target_DB_or_DW.testsrv.database.windows.net -G
sqlcmd -S Target_DB_or_DW.testsrv.database.windows.net -U [email protected] -P MyAADPassword -G -l 30
Next steps
For an overview of access and control in SQL Database, see SQL Database access and control.
For an overview of logins, users, and database roles in SQL Database, see Logins, users, and database roles.
For more information about database principals, see Principals.
For more information about database roles, see Database roles.
For more information about firewall rules in SQL Database, see SQL Database firewall rules.
Conditional Access (MFA) with Azure SQL Database
and Data Warehouse
7/26/2019 • 2 minutes to read • Edit Online
Azure SQL Database, Managed Instance, and SQL Data Warehouse support Microsoft Conditional Access.
NOTE
This topic applies to Azure SQL server, and to both SQL Database and SQL Data Warehouse databases that are created on
the Azure SQL server. For simplicity, SQL Database is used when referring to both SQL Database and SQL Data Warehouse.
The following steps show how to configure SQL Database to enforce a Conditional Access policy.
Prerequisites
You must configure your SQL Database or SQL Data Warehouse to support Azure Active Directory
authentication. For specific steps, see Configure and manage Azure Active Directory authentication with SQL
Database or SQL Data Warehouse.
When multi-factor authentication is enabled, you must connect with a supported tool, such as the latest SSMS.
For more information, see Configure Azure SQL Database multi-factor authentication for SQL Server
Management Studio.
Next steps
For a tutorial, see Secure your Azure SQL Database.
PowerShell: Create a Virtual Service endpoint and
VNet rule for SQL
8/6/2019 • 12 minutes to read • Edit Online
Virtual network rules are one firewall security feature that controls whether the database server for your single
databases and elastic pool in Azure SQL Database or for your databases in SQL Data Warehouse accepts
communications that are sent from particular subnets in virtual networks.
IMPORTANT
This article applies to Azure SQL server, and to both SQL Database and SQL Data Warehouse databases that are created on
the Azure SQL server. For simplicity, SQL Database is used when referring to both SQL Database and SQL Data Warehouse.
This article does not apply to a managed instance deployment in Azure SQL Database because it does not have a service
endpoint associated with it.
This article provides and explains a PowerShell script that takes the following actions:
1. Creates a Microsoft Azure Virtual Service endpoint on your subnet.
2. Adds the endpoint to the firewall of your Azure SQL Database server, to create a virtual network rule.
Your motivations for creating a rule are explained in: Virtual Service endpoints for Azure SQL Database.
TIP
If all you need is to assess or add the Virtual Service endpoint type name for SQL Database to your subnet, you can skip
ahead to our more direct PowerShell script.
NOTE
This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will
continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM
compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure
PowerShell.
IMPORTANT
The PowerShell Azure Resource Manager module is still supported by Azure SQL Database, but all future development is for
the Az.Sql module. For these cmdlets, see AzureRM.Sql. The arguments for the commands in the Az module and in the
AzureRm modules are substantially identical.
Major cmdlets
This article emphasizes the New-AzSqlServerVirtualNetworkRule cmdlet that adds the subnet endpoint to the
access control list (ACL ) of your Azure SQL Database server, thereby creating a rule.
The following list shows the sequence of other major cmdlets that you must run to prepare for your call to New-
AzSqlServerVirtualNetworkRule. In this article, these calls occur in script 3 "Virtual network rule":
1. New-AzVirtualNetworkSubnetConfig: Creates a subnet object.
2. New-AzVirtualNetwork: Creates your virtual network, giving it the subnet.
3. Set-AzVirtualNetworkSubnetConfig: Assigns a Virtual Service endpoint to your subnet.
4. Set-AzVirtualNetwork: Persists updates made to your virtual network.
5. New-AzSqlServerVirtualNetworkRule: After your subnet is an endpoint, adds your subnet as a virtual network
rule, into the ACL of your Azure SQL Database server.
This cmdlet offers the parameter -IgnoreMissingVNetServiceEndpoint, starting in Azure RM
PowerShell Module version 5.1.1.
NOTE
Please ensure that service endpoints are turned on for the VNet/subnet that you want to add to your server;
otherwise, creation of the VNet firewall rule will fail.
IMPORTANT
Before you run this script, you can edit the values, if you like. For example, if you already have a resource group, you might
want to edit your resource group name as the assigned value.
Your subscription name should be edited into the script.
$yesno = Read-Host 'Do you need to log into Azure (only one time per powershell.exe session)? [yes/no]';
if ('yes' -eq $yesno) { Connect-AzAccount; }
###########################################################
## Assignments to variables used by the later scripts. ##
###########################################################
$SubscriptionName = 'yourSubscriptionName';
Select-AzSubscription -SubscriptionName $SubscriptionName;
$ResourceGroupName = 'RG-YourNameHere';
$Region = 'westcentralus';
$VNetName = 'myVNet';
$SubnetName = 'mySubnet';
$VNetAddressPrefix = '10.1.0.0/16';
$SubnetAddressPrefix = '10.1.1.0/24';
$VNetRuleName = 'myFirstVNetRule-ForAcl';
$SqlDbServerName = 'mysqldbserver-forvnet';
$SqlDbAdminLoginName = 'ServerAdmin';
$SqlDbAdminLoginPassword = 'ChangeYourAdminPassword1';
$ServiceEndpointTypeName_SqlDb = 'Microsoft.Sql'; # Endpoint type name used by the later scripts.
Script 2: Prerequisites
This script prepares for the next script, where the endpoint action is. This script creates for you the following listed
items, but only if they do not already exist. You can skip script 2 if you are sure these items already exist:
Azure resource group
Azure SQL Database server
PowerShell script 2 source code
$gottenResourceGroup = $null;
$gottenResourceGroup = Get-AzResourceGroup `
    -Name $ResourceGroupName `
    -ErrorAction SilentlyContinue;
if ($null -eq $gottenResourceGroup)
{
    Write-Host "Creating your missing Resource Group - $ResourceGroupName.";
    $gottenResourceGroup = New-AzResourceGroup `
        -Name $ResourceGroupName `
        -Location $Region;
    $gottenResourceGroup;
}
else { Write-Host "Good, your Resource Group already exists - $ResourceGroupName."; }
$gottenResourceGroup = $null;
###########################################################
## Ensure your Azure SQL Database server already exists. ##
###########################################################
Write-Host "Check whether your Azure SQL Database server already exists.";
$sqlDbServer = $null;
$sqlDbServer = Get-AzSqlServer `
-ResourceGroupName $ResourceGroupName `
-ServerName $SqlDbServerName `
-ErrorAction SilentlyContinue;
if ($null -eq $sqlDbServer)
{
    Write-Host "Gather the credentials necessary to next create an Azure SQL Database server.";
$sqlAdministratorCredentials = New-Object `
-TypeName System.Management.Automation.PSCredential `
-ArgumentList `
$SqlDbAdminLoginName, `
$(ConvertTo-SecureString `
-String $SqlDbAdminLoginPassword `
-AsPlainText `
-Force `
);
$sqlDbServer = New-AzSqlServer `
-ResourceGroupName $ResourceGroupName `
-ServerName $SqlDbServerName `
-Location $Region `
-SqlAdministratorCredentials $sqlAdministratorCredentials;
$sqlDbServer;
}
else { Write-Host "Good, your Azure SQL Database server already exists - $SqlDbServerName."; }
$sqlAdministratorCredentials = $null;
$sqlDbServer = $null;
$subnet = New-AzVirtualNetworkSubnetConfig `
-Name $SubnetName `
-AddressPrefix $SubnetAddressPrefix `
-ServiceEndpoint $ServiceEndpointTypeName_SqlDb;
$vnet = New-AzVirtualNetwork `
-Name $VNetName `
-AddressPrefix $VNetAddressPrefix `
-Subnet $subnet `
-ResourceGroupName $ResourceGroupName `
-Location $Region;
###########################################################
## Create a Virtual Service endpoint on the subnet. ##
###########################################################
$vnet = Set-AzVirtualNetworkSubnetConfig `
-Name $SubnetName `
-AddressPrefix $SubnetAddressPrefix `
-VirtualNetwork $vnet `
-ServiceEndpoint $ServiceEndpointTypeName_SqlDb;
Write-Host "Persist the updates made to the virtual network > subnet.";
$vnet = Set-AzVirtualNetwork `
-VirtualNetwork $vnet;
###########################################################
## Add the Virtual Service endpoint Id as a rule, ##
## into SQL Database ACLs. ##
###########################################################
$vnet = Get-AzVirtualNetwork `
-ResourceGroupName $ResourceGroupName `
-Name $VNetName;
$subnet = Get-AzVirtualNetworkSubnetConfig `
-Name $SubnetName `
-VirtualNetwork $vnet;
Write-Host "Add the subnet .Id as a rule, into the ACLs for your Azure SQL Database server.";
$vnetRuleObject1 = New-AzSqlServerVirtualNetworkRule `
-ResourceGroupName $ResourceGroupName `
-ServerName $SqlDbServerName `
-VirtualNetworkRuleName $VNetRuleName `
-VirtualNetworkSubnetId $subnet.Id;
$vnetRuleObject1;
$vnetRuleObject2 = Get-AzSqlServerVirtualNetworkRule `
-ResourceGroupName $ResourceGroupName `
-ServerName $SqlDbServerName `
-VirtualNetworkRuleName $VNetRuleName;
$vnetRuleObject2;
Script 4: Clean-up
This final script deletes the resources that the previous scripts created for the demonstration. However, the script
asks for confirmation before it deletes the following:
Azure SQL Database server
Azure Resource Group
You can run script 4 any time after script 1 completes.
PowerShell script 4 source code
######### Script 4 ########################################
## Clean-up phase A: Unconditional deletes. ##
## ##
## 1. The test rule is deleted from SQL DB ACL. ##
## 2. The test endpoint is deleted from the subnet. ##
## 3. The test virtual network is deleted. ##
###########################################################
Remove-AzSqlServerVirtualNetworkRule `
-ResourceGroupName $ResourceGroupName `
-ServerName $SqlDbServerName `
-VirtualNetworkRuleName $VNetRuleName `
-ErrorAction SilentlyContinue;
$vnet = Get-AzVirtualNetwork `
-ResourceGroupName $ResourceGroupName `
-Name $VNetName;
Remove-AzVirtualNetworkSubnetConfig `
-Name $SubnetName `
-VirtualNetwork $vnet;
Write-Host "Delete the virtual network (thus also deletes the subnet).";
Remove-AzVirtualNetwork `
-Name $VNetName `
-ResourceGroupName $ResourceGroupName `
-ErrorAction SilentlyContinue;
###########################################################
## Clean-up phase B: Conditional deletes. ##
## ##
## These might have already existed, so user might ##
## want to keep. ##
## ##
## 1. Azure SQL Database server ##
## 2. Azure resource group ##
###########################################################
$yesno = Read-Host 'CAUTION !: Do you want to DELETE your Azure SQL Database server AND your Resource Group?
[yes/no]';
if ('yes' -eq $yesno)
{
Write-Host "Remove the Azure SQL DB server.";
Remove-AzSqlServer `
-ServerName $SqlDbServerName `
-ResourceGroupName $ResourceGroupName `
-ErrorAction SilentlyContinue;
Remove-AzResourceGroup `
-Name $ResourceGroupName `
-ErrorAction SilentlyContinue;
}
else
{
Write-Host "Skipped over the DELETE of SQL Database and resource group.";
}
[C:\WINDOWS\system32\]
0 >> C:\Demo\PowerShell\sql-database-vnet-service-endpoint-powershell-s1-variables.ps1
Do you need to log into Azure (only one time per powershell.exe session)? [yes/no]: yes
Environment : AzureCloud
Account : [email protected]
TenantId : 11111111-1111-1111-1111-111111111111
SubscriptionId : 22222222-2222-2222-2222-222222222222
SubscriptionName : MySubscriptionName
CurrentStorageAccount :
[C:\WINDOWS\system32\]
0 >> C:\Demo\PowerShell\sql-database-vnet-service-endpoint-powershell-s2-prerequisites.ps1
Check whether your Resource Group already exists.
Creating your missing Resource Group - RG-YourNameHere.
ResourceGroupName : RG-YourNameHere
Location : westcentralus
ProvisioningState : Succeeded
Tags :
ResourceId : /subscriptions/22222222-2222-2222-2222-222222222222/resourceGroups/RG-YourNameHere
ResourceGroupName : RG-YourNameHere
ServerName : mysqldbserver-forvnet
Location : westcentralus
SqlAdministratorLogin : ServerAdmin
SqlAdministratorPassword :
ServerVersion : 12.0
Tags :
Identity :
[C:\WINDOWS\system32\]
0 >> C:\Demo\PowerShell\sql-database-vnet-service-endpoint-powershell-s3-vnet-rule.ps1
Define a subnet 'mySubnet', to be given soon to a virtual network.
Create a virtual network 'myVNet'. Give the subnet to the virtual network that we created.
WARNING: The output object type of this cmdlet will be modified in a future release.
Assign a Virtual Service endpoint 'Microsoft.Sql' to the subnet.
Persist the updates made to the virtual network > subnet.
[C:\WINDOWS\system32\]
0 >> C:\Demo\PowerShell\sql-database-vnet-service-endpoint-powershell-s4-clean-up.ps1
Delete the rule from the SQL DB ACL.
ResourceGroupName : RG-YourNameHere
ServerName : mysqldbserver-forvnet
Location : westcentralus
SqlAdministratorLogin : ServerAdmin
SqlAdministratorPassword :
ServerVersion : 12.0
Tags :
Identity :
IMPORTANT
Before you run this script, you must edit the values assigned to the $-variables, near the top of the script.
### 1. LOG into to your Azure account, needed only once per PS session. Assign variables.
$yesno = Read-Host 'Do you need to log into Azure (only one time per powershell.exe session)? [yes/no]';
if ('yes' -eq $yesno) { Connect-AzAccount; }
$SubscriptionName = 'yourSubscriptionName';
Select-AzSubscription -SubscriptionName "$SubscriptionName";
$ResourceGroupName = 'yourRGName';
$VNetName = 'yourVNetName';
$SubnetName = 'yourSubnetName';
$SubnetAddressPrefix = 'Obtain this value from the Azure portal.'; # Looks roughly like: '10.0.0.0/24'
$ServiceEndpointTypeName_SqlDb = 'Microsoft.Sql'; # Endpoint type name referenced in the loops and the Set call below.
### 2. Search for your virtual network, and then for your subnet.
$vnet = Get-AzVirtualNetwork -ResourceGroupName $ResourceGroupName -Name $VNetName;
$subnet = $null;
for ($nn=0; $nn -lt $vnet.Subnets.Count; $nn++)
{
$subnet = $vnet.Subnets[$nn];
if ($subnet.Name -eq $SubnetName)
{ break; }
$subnet = $null;
}
$endpointMsSql = $null;
for ($nn=0; $nn -lt $subnet.ServiceEndpoints.Count; $nn++)
{
$endpointMsSql = $subnet.ServiceEndpoints[$nn];
if ($endpointMsSql.Service -eq $ServiceEndpointTypeName_SqlDb)
{
$endpointMsSql;
break;
}
$endpointMsSql = $null;
}
### 4. Add a Virtual Service endpoint of type name 'Microsoft.Sql', on your subnet.
$vnet = Set-AzVirtualNetworkSubnetConfig `
-Name $SubnetName `
-AddressPrefix $SubnetAddressPrefix `
-VirtualNetwork $vnet `
-ServiceEndpoint $ServiceEndpointTypeName_SqlDb;
Actual output
The following block displays our actual feedback (with cosmetic edits).
<# Our output example (with cosmetic edits), when the subnet was already tagged:
Do you need to log into Azure (only one time per powershell.exe session)? [yes/no]: no
Environment : AzureCloud
Account : [email protected]
TenantId : 11111111-1111-1111-1111-111111111111
SubscriptionId : 22222222-2222-2222-2222-222222222222
SubscriptionName : MySubscriptionName
CurrentStorageAccount :
ProvisioningState : Succeeded
Service : Microsoft.Sql
Locations : {westcentralus}
#>
Required Permissions
To enable Transparent Data Encryption (TDE ), you must be an administrator or a member of the dbmanager role.
Enabling Encryption
To enable TDE for a SQL Data Warehouse, follow the steps below:
1. Open the database in the Azure portal
2. In the database blade, click the Settings button
3. Select the Transparent data encryption option
4. Select the On setting
5. Select Save
Disabling Encryption
To disable TDE for a SQL Data Warehouse, follow the steps below:
1. Open the database in the Azure portal
2. In the database blade, click the Settings button
3. Select the Transparent data encryption option
4. Select the Off setting
5. Select Save
Encryption DMVs
Encryption can be confirmed with the following DMVs:
sys.databases
sys.dm_pdw_nodes_database_encryption_keys
Get started with Transparent Data Encryption (TDE)
5/6/2019 • 2 minutes to read • Edit Online
Required Permissions
To enable Transparent Data Encryption (TDE ), you must be an administrator or a member of the dbmanager role.
Enabling Encryption
Follow these steps to enable TDE for a SQL Data Warehouse:
1. Connect to the master database on the server hosting the database using a login that is an administrator or a
member of the dbmanager role in the master database
2. Execute the following statement to encrypt the database.
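For example, assuming your data warehouse is named mySampleDataWarehouse, the statement might look like this:

```sql
ALTER DATABASE [mySampleDataWarehouse] SET ENCRYPTION ON;
```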
Disabling Encryption
Follow these steps to disable TDE for a SQL Data Warehouse:
1. Connect to the master database using a login that is an administrator or a member of the dbmanager role in
the master database
2. Execute the following statement to decrypt the database.
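For example, again assuming a data warehouse named mySampleDataWarehouse:

```sql
ALTER DATABASE [mySampleDataWarehouse] SET ENCRYPTION OFF;
```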
NOTE
A paused SQL Data Warehouse must be resumed before making changes to the TDE settings.
Verifying Encryption
To verify encryption status for a SQL Data Warehouse, follow the steps below:
1. Connect to the master or instance database using a login that is an administrator or a member of the
dbmanager role in the master database
2. Execute the following statement to check the encryption state of each database.
SELECT
[name],
[is_encrypted]
FROM
sys.databases;
Encryption DMVs
sys.databases
sys.dm_pdw_nodes_database_encryption_keys
Tutorial: Load New York Taxicab data to Azure SQL
Data Warehouse
8/18/2019 • 17 minutes to read • Edit Online
This tutorial uses PolyBase to load New York Taxicab data from a public Azure blob to Azure SQL Data
Warehouse. The tutorial uses the Azure portal and SQL Server Management Studio (SSMS ) to:
Create a data warehouse in the Azure portal
Set up a server-level firewall rule in the Azure portal
Connect to the data warehouse with SSMS
Create a user designated for loading data
Create external tables for data in Azure blob storage
Use the CTAS T-SQL statement to load data into your data warehouse
View the progress of data as it is loading
Create statistics on the newly loaded data
If you don't have an Azure subscription, create a free account before you begin.
Server name Any globally unique name For valid server names, see Naming
rules and restrictions.
Server admin login Any valid name For valid login names, see Database
Identifiers.
5. Click Select.
6. Click Performance level to specify whether the data warehouse is Gen1 or Gen2, and the number of
data warehouse units.
7. For this tutorial, select Gen2 of SQL Data Warehouse. The slider is set to DW1000c by default. Try
moving it up and down to see how it works.
8. Click Apply.
9. In the SQL Data Warehouse page, select a collation for the blank database. For this tutorial, use the
default value. For more information about collations, see Collations
10. Now that you have completed the SQL Database form, click Create to provision the database.
Provisioning takes a few minutes.
11. On the toolbar, click Notifications to monitor the deployment process.
NOTE
SQL Data Warehouse communicates over port 1433. If you are trying to connect from within a corporate network,
outbound traffic over port 1433 might not be allowed by your network's firewall. If so, you cannot connect to your Azure
SQL Database server unless your IT department opens port 1433.
1. After the deployment completes, click SQL databases from the left-hand menu and then click
mySampleDatabase on the SQL databases page. The overview page for your database opens,
showing you the fully qualified server name (such as mynewserver-20180430.database.windows.net)
and provides options for further configuration.
2. Copy this fully qualified server name for use to connect to your server and its databases in subsequent
quick starts. Then click on the server name to open server settings.
4. Click Show firewall settings. The Firewall settings page for the SQL Database server opens.
5. Click Add client IP on the toolbar to add your current IP address to a new firewall rule. A firewall rule
can open port 1433 for a single IP address or a range of IP addresses.
6. Click Save. A server-level firewall rule is created for your current IP address opening port 1433 on the
logical server.
7. Click OK and then close the Firewall settings page.
You can now connect to the SQL server and its data warehouses using this IP address. The connection works
from SQL Server Management Studio or another tool of your choice. When you connect, use the ServerAdmin
account you created previously.
IMPORTANT
By default, access through the SQL Database firewall is enabled for all Azure services. Click OFF on this page and then click
Save to disable the firewall for all Azure services.
Server name The fully qualified server name The name should be something like
this: mynewserver-
20180430.database.windows.net.
Login The server admin account This is the account that you
specified when you created the
server.
Password The password for your server admin This is the password that you
account specified when you created the
server.
3. Click Connect. The Object Explorer window opens in SSMS.
4. In Object Explorer, expand Databases. Then expand System databases and master to view the objects
in the master database. Expand mySampleDatabase to view the objects in your new database.
3. Click Execute.
4. Right-click mySampleDataWarehouse, and choose New Query. A new query Window opens.
5. Enter the following T-SQL commands to create a database user named LoaderRC20 for the LoaderRC20
login. The second line grants the new user CONTROL permissions on the new data warehouse. These
permissions are similar to making the user the owner of the database. The third line adds the new user as
a member of the staticrc20 resource class.
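For example, assuming the data warehouse is named mySampleDataWarehouse, the commands might look like this:

```sql
-- Create the database user for the login, grant it CONTROL, and assign its resource class.
CREATE USER LoaderRC20 FOR LOGIN LoaderRC20;
GRANT CONTROL ON DATABASE::[mySampleDataWarehouse] TO LoaderRC20;
EXEC sp_addrolemember 'staticrc20', 'LoaderRC20';
```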
6. Click Execute.
2. Enter the fully qualified server name, and enter LoaderRC20 as the Login. Enter your password for
LoaderRC20.
3. Click Connect.
4. When your connection is ready, you will see two server connections in Object Explorer. One connection
as ServerAdmin and one connection as LoaderRC20.
Create external tables for the sample data
You are ready to begin the process of loading data into your new data warehouse. This tutorial shows you how
to use external tables to load New York City taxi cab data from an Azure storage blob. For future reference, to
learn how to get your data to Azure blob storage or to load it directly from your source into SQL Data
Warehouse, see the loading overview.
Run the following SQL scripts to specify information about the data you wish to load. This information includes
where the data is located, the format of the contents of the data, and the table definition for the data.
1. In the previous section, you logged into your data warehouse as LoaderRC20. In SSMS, right-click your
LoaderRC20 connection and select New Query. A new query window appears.
2. Compare your query window to the previous image. Verify your new query window is running as
LoaderRC20 and performing queries on your MySampleDataWarehouse database. Use this query
window to perform all of the loading steps.
3. Create a master key for the MySampleDataWarehouse database. You only need to create a master key
once per database.
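For example, the statement can be as simple as:

```sql
CREATE MASTER KEY;
```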
4. Run the following CREATE EXTERNAL DATA SOURCE statement to define the location of the Azure
blob. This is the location of the external taxi cab data. To run a command that you have appended to the
query window, highlight the commands you wish to run and click Execute.
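The data source name, container, and storage account below are placeholders; substitute the values from the tutorial when you run the statement. It might look something like this:

```sql
-- Hypothetical external data source pointing to a public Azure blob container.
CREATE EXTERNAL DATA SOURCE NYTExternalData
WITH (
    TYPE = Hadoop,
    LOCATION = 'wasbs://<container>@<storage-account>.blob.core.windows.net/'
);
```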
5. Run the following CREATE EXTERNAL FILE FORMAT T-SQL statement to specify formatting
characteristics and options for the external data file. This statement specifies the external data is stored as
text and the values are separated by the pipe ('|') character. The external file is compressed with Gzip.
CREATE EXTERNAL FILE FORMAT uncompressedcsv
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = ',',
STRING_DELIMITER = '',
DATE_FORMAT = '',
USE_TYPE_DEFAULT = False
)
);
CREATE EXTERNAL FILE FORMAT compressedcsv
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS ( FIELD_TERMINATOR = '|',
STRING_DELIMITER = '',
DATE_FORMAT = '',
USE_TYPE_DEFAULT = False
),
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
);
6. Run the following CREATE SCHEMA statement to create a schema for your external file format. The
schema provides a way to organize the external tables you are about to create.
7. Create the external tables. The table definitions are stored in SQL Data Warehouse, but the tables
reference data that is stored in Azure blob storage. Run the following T-SQL commands to create several
external tables that all point to the Azure blob we defined previously in our external data source.
8. In Object Explorer, expand mySampleDataWarehouse to see the list of external tables you just created.
Load the data into your data warehouse
This section uses the external tables you just defined to load the sample data from Azure Storage Blob to SQL
Data Warehouse.
NOTE
This tutorial loads the data directly into the final table. In a production environment, you will usually use CREATE TABLE AS
SELECT to load into a staging table. While data is in the staging table you can perform any necessary transformations. To
append the data in the staging table to a production table, you can use the INSERT...SELECT statement. For more
information, see Inserting data into a production table.
The script uses the CREATE TABLE AS SELECT (CTAS ) T-SQL statement to load the data from Azure Storage
Blob into new tables in your data warehouse. CTAS creates a new table based on the results of a select
statement. The new table has the same columns and data types as the results of the select statement. When the
select statement selects from an external table, SQL Data Warehouse imports the data into a relational table in
the data warehouse.
1. Run the following script to load the data into new tables in your data warehouse.
CREATE TABLE [dbo].[Date]
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
)
AS SELECT * FROM [ext].[Date]
OPTION (LABEL = 'CTAS : Load [dbo].[Date]')
;
CREATE TABLE [dbo].[Geography]
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT * FROM [ext].[Geography]
OPTION (LABEL = 'CTAS : Load [dbo].[Geography]')
;
CREATE TABLE [dbo].[HackneyLicense]
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
)
AS SELECT * FROM [ext].[HackneyLicense]
OPTION (LABEL = 'CTAS : Load [dbo].[HackneyLicense]')
;
CREATE TABLE [dbo].[Medallion]
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
)
AS SELECT * FROM [ext].[Medallion]
OPTION (LABEL = 'CTAS : Load [dbo].[Medallion]')
;
CREATE TABLE [dbo].[Time]
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
)
AS SELECT * FROM [ext].[Time]
OPTION (LABEL = 'CTAS : Load [dbo].[Time]')
;
CREATE TABLE [dbo].[Weather]
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
)
AS SELECT * FROM [ext].[Weather]
OPTION (LABEL = 'CTAS : Load [dbo].[Weather]')
;
CREATE TABLE [dbo].[Trip]
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
)
AS SELECT * FROM [ext].[Trip]
OPTION (LABEL = 'CTAS : Load [dbo].[Trip]')
;
2. View your data as it loads. You’re loading several GBs of data and compressing it into highly performant
clustered columnstore indexes. Run the following query, which uses dynamic management views (DMVs),
to show the status of the load. After starting the query, grab a coffee and a snack while SQL Data
Warehouse does some heavy lifting.
SELECT
r.command,
s.request_id,
r.status,
count(distinct input_name) as nbr_files,
sum(s.bytes_processed)/1024/1024/1024.0 as gb_processed
FROM
sys.dm_pdw_exec_requests r
INNER JOIN sys.dm_pdw_dms_external_work s
ON r.request_id = s.request_id
WHERE
r.[label] = 'CTAS : Load [dbo].[Date]' OR
r.[label] = 'CTAS : Load [dbo].[Geography]' OR
r.[label] = 'CTAS : Load [dbo].[HackneyLicense]' OR
r.[label] = 'CTAS : Load [dbo].[Medallion]' OR
r.[label] = 'CTAS : Load [dbo].[Time]' OR
r.[label] = 'CTAS : Load [dbo].[Weather]' OR
r.[label] = 'CTAS : Load [dbo].[Trip]'
GROUP BY
r.command,
s.request_id,
r.status
ORDER BY
nbr_files desc,
gb_processed desc;
4. Enjoy seeing your data nicely loaded into your data warehouse.
Authenticate using managed identities to load (optional)
Loading using PolyBase and authenticating through managed identities is the most secure mechanism and
enables you to leverage VNet Service Endpoints with Azure storage.
Prerequisites
1. Install Azure PowerShell using this guide.
2. If you have a general-purpose v1 or blob storage account, you must first upgrade to general-purpose v2
using this guide.
3. You must have Allow trusted Microsoft services to access this storage account turned on under Azure
Storage account Firewalls and Virtual networks settings menu. Refer to this guide for more information.
Steps
1. In PowerShell, register your SQL Database server with Azure Active Directory (AAD ):
Connect-AzAccount
Select-AzSubscription -SubscriptionId your-subscriptionId
Set-AzSqlServer -ResourceGroupName your-database-server-resourceGroup -ServerName your-database-
servername -AssignIdentity
NOTE
If you have a general-purpose v1 or blob storage account, you must first upgrade to v2 using this guide.
2. Under your storage account, navigate to Access Control (IAM ), and click Add role assignment. Assign
Storage Blob Data Contributor RBAC role to your SQL Database server.
NOTE
Only members with Owner privilege can perform this step. For various built-in roles for Azure resources, refer to
this guide.
CREATE DATABASE SCOPED CREDENTIAL msi_cred WITH IDENTITY = 'Managed Service Identity';
NOTE
There is no need to specify SECRET with Azure Storage access key because this mechanism uses
Managed Identity under the covers.
IDENTITY name should be 'Managed Service Identity' for PolyBase connectivity to work with Azure
Storage account.
b. Create the External Data Source specifying the Database Scoped Credential with the Managed
Service Identity.
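For example, for an Azure Storage Gen2 account, the statement might look something like this (the data source name, container, and account name are placeholders):

```sql
-- Hypothetical external data source that authenticates with the managed identity credential created above.
CREATE EXTERNAL DATA SOURCE ext_datasource_with_msi
WITH (
    TYPE = Hadoop,
    LOCATION = 'abfss://<container>@<storage-account>.dfs.core.windows.net',
    CREDENTIAL = msi_cred
);
```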
c. Query as normal using external tables.
Refer to the following documentation if you'd like to set up virtual network service endpoints for SQL Data
Warehouse.
Clean up resources
You are being charged for compute resources and data that you loaded into your data warehouse. These are
billed separately.
If you want to keep the data in storage, you can pause compute when you aren't using the data warehouse.
By pausing compute, you will only be charged for data storage, and you can resume the compute whenever
you are ready to work with the data.
If you want to remove future charges, you can delete the data warehouse.
Follow these steps to clean up resources as you desire.
1. Log in to the Azure portal, click on your data warehouse.
2. To pause compute, click the Pause button. When the data warehouse is paused, you will see a Start
button. To resume compute, click Start.
3. To remove the data warehouse so you won't be charged for compute or storage, click Delete.
4. To remove the SQL server you created, click mynewserver-20180430.database.windows.net in the
previous image, and then click Delete. Be careful with this as deleting the server will delete all databases
assigned to the server.
5. To remove the resource group, click myResourceGroup, and then click Delete resource group.
Next steps
In this tutorial, you learned how to create a data warehouse and create a user for loading data. You created
external tables to define the structure for data stored in Azure Storage Blob, and then used the PolyBase
CREATE TABLE AS SELECT statement to load data into your data warehouse.
You did these things:
Created a data warehouse in the Azure portal
Set up a server-level firewall rule in the Azure portal
Connected to the data warehouse with SSMS
Created a user designated for loading data
Created external tables for data in Azure Storage Blob
Used the CTAS T-SQL statement to load data into your data warehouse
Viewed the progress of data as it is loading
Created statistics on the newly loaded data
Advance to the development overview to learn how to migrate an existing database to SQL Data Warehouse.
Design decisions to migrate an existing database to SQL Data Warehouse
Load Contoso Retail data to Azure SQL Data
Warehouse
7/5/2019 • 8 minutes to read • Edit Online
In this tutorial, you learn to use PolyBase and T-SQL commands to load two tables from the Contoso Retail data
into Azure SQL Data Warehouse.
In this tutorial you will:
1. Configure PolyBase to load from Azure blob storage
2. Load public data into your database
3. Perform optimizations after the load is finished.
IMPORTANT
If you choose to make your Azure blob storage containers public, remember that as the data owner you will incur data egress charges when data leaves the data center.
--DimProduct
CREATE EXTERNAL TABLE [asb].DimProduct (
[ProductKey] [int] NOT NULL,
[ProductLabel] [nvarchar](255) NULL,
[ProductName] [nvarchar](500) NULL,
[ProductDescription] [nvarchar](400) NULL,
[ProductSubcategoryKey] [int] NULL,
[Manufacturer] [nvarchar](50) NULL,
[BrandName] [nvarchar](50) NULL,
[ClassID] [nvarchar](10) NULL,
[ClassName] [nvarchar](20) NULL,
[StyleID] [nvarchar](10) NULL,
[StyleName] [nvarchar](20) NULL,
[ColorID] [nvarchar](10) NULL,
[ColorName] [nvarchar](20) NOT NULL,
[Size] [nvarchar](50) NULL,
[SizeRange] [nvarchar](50) NULL,
[SizeUnitMeasureID] [nvarchar](20) NULL,
[Weight] [float] NULL,
[WeightUnitMeasureID] [nvarchar](20) NULL,
[UnitOfMeasureID] [nvarchar](10) NULL,
[UnitOfMeasureName] [nvarchar](40) NULL,
[StockTypeID] [nvarchar](10) NULL,
[StockTypeName] [nvarchar](40) NULL,
[UnitCost] [money] NULL,
[UnitPrice] [money] NULL,
[AvailableForSaleDate] [datetime] NULL,
[StopSaleDate] [datetime] NULL,
[Status] [nvarchar](7) NULL,
[ImageURL] [nvarchar](150) NULL,
[ProductURL] [nvarchar](150) NULL,
[ETLLoadID] [int] NULL,
[LoadDate] [datetime] NULL,
[UpdateDate] [datetime] NULL
)
WITH
(
LOCATION='/DimProduct/'
, DATA_SOURCE = AzureStorage_west_public
, FILE_FORMAT = TextFileFormat
, REJECT_TYPE = VALUE
, REJECT_VALUE = 0
)
;
--FactOnlineSales
CREATE EXTERNAL TABLE [asb].FactOnlineSales
(
[OnlineSalesKey] [int] NOT NULL,
[DateKey] [datetime] NOT NULL,
[StoreKey] [int] NOT NULL,
[ProductKey] [int] NOT NULL,
[PromotionKey] [int] NOT NULL,
[CurrencyKey] [int] NOT NULL,
[CustomerKey] [int] NOT NULL,
[SalesOrderNumber] [nvarchar](20) NOT NULL,
[SalesOrderLineNumber] [int] NULL,
[SalesQuantity] [int] NOT NULL,
[SalesAmount] [money] NOT NULL,
[ReturnQuantity] [int] NOT NULL,
[ReturnAmount] [money] NULL,
[DiscountQuantity] [int] NULL,
[DiscountAmount] [money] NULL,
[TotalCost] [money] NOT NULL,
[UnitCost] [money] NULL,
[UnitPrice] [money] NULL,
[ETLLoadID] [int] NULL,
[LoadDate] [datetime] NULL,
[UpdateDate] [datetime] NULL
)
WITH
(
LOCATION='/FactOnlineSales/'
, DATA_SOURCE = AzureStorage_west_public
, FILE_FORMAT = TextFileFormat
, REJECT_TYPE = VALUE
, REJECT_VALUE = 0
)
;
SELECT GETDATE();
GO
For more information on maintaining columnstore indexes, see the manage columnstore indexes article.
6. Optimize statistics
It's best to create single-column statistics immediately after a load. If you know certain columns aren't going to be
in query predicates, you can skip creating statistics on those columns. If you create single-column statistics on
every column, it might take a long time to rebuild all the statistics.
If you decide to create single-column statistics on every column of every table, you can use the stored procedure
code sample prc_sqldw_create_stats in the statistics article.
The following example is a good starting point for creating statistics. It creates single-column statistics on each
column in the dimension table, and on each joining column in the fact tables. You can always add single or multi-
column statistics to other fact table columns later on.
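For example, the statements might look something like the following. The statistics names are illustrative, and the schema is assumed to be dbo; adjust both to match your CTAS statements.

```sql
-- Single-column statistics on a dimension key and on the fact table columns that join to it.
CREATE STATISTICS [stat_DimProduct_ProductKey] ON [dbo].[DimProduct] ([ProductKey]);
CREATE STATISTICS [stat_FactOnlineSales_ProductKey] ON [dbo].[FactOnlineSales] ([ProductKey]);
CREATE STATISTICS [stat_FactOnlineSales_StoreKey] ON [dbo].[FactOnlineSales] ([StoreKey]);
```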
Next steps
To load the full data set, run the example load the full Contoso Retail Data Warehouse from the Microsoft SQL
Server Samples repository.
For more development tips, see SQL Data Warehouse development overview.
Load data from Azure Data Lake Storage to SQL
Data Warehouse
8/9/2019 • 7 minutes to read • Edit Online
Use PolyBase external tables to load data from Azure Data Lake Storage into Azure SQL Data Warehouse.
Although you can run ad hoc queries on data stored in Data Lake Storage, we recommend importing the data into
the SQL Data Warehouse for best performance.
Create database objects required to load from Data Lake Storage.
Connect to a Data Lake Storage directory.
Load data into Azure SQL Data Warehouse.
If you don't have an Azure subscription, create a free account before you begin.
Create a credential
To access your Data Lake Storage account, you will need to create a Database Master Key to encrypt your
credential secret used in the next step. You then create a Database Scoped Credential. When authenticating using
service principals, the Database Scoped Credential stores the service principal credentials set up in AAD. You can
also use the storage account key in the Database Scoped Credential for Gen2.
To connect to Data Lake Storage using service principals, you must first create an Azure Active Directory
Application, create an access key, and grant the application access to the Data Lake Storage account. For
instructions, see Authenticate to Azure Data Lake Storage Using Active Directory.
-- A: Create a Database Master Key.
-- Only necessary if one does not already exist.
-- Required to encrypt the credential secret in the next step.
-- For more information on Master Key: https://msdn.microsoft.com/library/ms174382.aspx?f=255&MSPPError=-2147217396
CREATE MASTER KEY;
-- B: Create a database scoped credential.
-- It should look something like this when authenticating using service principals:
CREATE DATABASE SCOPED CREDENTIAL ADLSCredential
WITH
IDENTITY = '536540b4-4239-45fe-b9a3-629f97591c0c@https://login.microsoftonline.com/42f988bf-85f1-41af-
91ab-2d2cd011da47/oauth2/token',
SECRET = 'BjdIlmtKp4Fpyh9hIvr8HJlUida/seM5kQ3EpLAmeDI='
;
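-- C: Create an external data source that uses the credential above.
-- The account name below is a placeholder; substitute your own Data Lake Storage Gen1 account.
-- The data source name matches the DATA_SOURCE referenced by the external table that follows.
CREATE EXTERNAL DATA SOURCE AzureDataLakeStorage
WITH (
    TYPE = HADOOP,
    LOCATION = 'adl://<yourdatalakestore>.azuredatalakestore.net',
    CREDENTIAL = ADLSCredential
);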
-- DimProduct
CREATE EXTERNAL TABLE [dbo].[DimProduct_external] (
[ProductKey] [int] NOT NULL,
[ProductLabel] [nvarchar](255) NULL,
[ProductName] [nvarchar](500) NULL
)
WITH
(
LOCATION='/DimProduct/'
, DATA_SOURCE = AzureDataLakeStorage
, FILE_FORMAT = TextFileFormat
, REJECT_TYPE = VALUE
, REJECT_VALUE = 0
)
;
Optimize statistics
It is best to create single-column statistics immediately after a load. There are some choices for statistics. For
example, if you create single-column statistics on every column it might take a long time to rebuild all the statistics.
If you know certain columns are not going to be in query predicates, you can skip creating statistics on those
columns.
If you decide to create single-column statistics on every column of every table, you can use the stored procedure
code sample prc_sqldw_create_stats in the statistics article.
The following example is a good starting point for creating statistics. It creates single-column statistics on each
column in the dimension table, and on each joining column in the fact tables. You can always add single or multi-
column statistics to other fact table columns later on.
Achievement unlocked!
You have successfully loaded data into Azure SQL Data Warehouse. Great job!
Next steps
In this tutorial, you created external tables to define the structure for data stored in Data Lake Storage Gen1, and
then used the PolyBase CREATE TABLE AS SELECT statement to load data into your data warehouse.
You did these things:
Created database objects required to load from Data Lake Storage Gen1.
Connected to a Data Lake Storage Gen1 directory.
Loaded data into Azure SQL Data Warehouse.
Loading data is the first step to developing a data warehouse solution using SQL Data Warehouse. Check out our
development resources.
Learn how to develop tables in SQL Data Warehouse
Tutorial: Load data to Azure SQL Data Warehouse
8/18/2019 • 31 minutes to read • Edit Online
This tutorial uses PolyBase to load the WideWorldImportersDW data warehouse from Azure Blob storage to
Azure SQL Data Warehouse. The tutorial uses the Azure portal and SQL Server Management Studio (SSMS ) to:
Create a data warehouse in the Azure portal
Set up a server-level firewall rule in the Azure portal
Connect to the data warehouse with SSMS
Create a user designated for loading data
Create external tables that use Azure blob as the data source
Use the CTAS T-SQL statement to load data into your data warehouse
View the progress of data as it is loading
Generate a year of data in the date dimension and sales fact tables
Create statistics on the newly loaded data
If you don't have an Azure subscription, create a free account before you begin.
Server name Any globally unique name For valid server names, see Naming
rules and restrictions.
Server admin login Any valid name For valid login names, see Database
Identifiers.
8. Click Apply.
9. In the SQL Data Warehouse page, select a collation for the blank database. For this tutorial, use the default
value. For more information about collations, see Collations
10. Now that you have completed the SQL Database form, click Create to provision the database. Provisioning
takes a few minutes.
11. On the toolbar, click Notifications to monitor the deployment process.
1. After the deployment completes, click SQL databases from the left-hand menu and then click SampleDW
on the SQL databases page. The overview page for your database opens, showing you the fully qualified
server name (such as sample-svr.database.windows.net) and provides options for further configuration.
2. Copy this fully qualified server name for use to connect to your server and its databases in subsequent
quick starts. To open the server settings, click the server name.
4. Click Show firewall settings. The Firewall settings page for the SQL Database server opens.
5. To add your current IP address to a new firewall rule, click Add client IP on the toolbar. A firewall rule can
open port 1433 for a single IP address or a range of IP addresses.
6. Click Save. A server-level firewall rule is created for your current IP address opening port 1433 on the
logical server.
7. Click OK and then close the Firewall settings page.
You can now connect to the SQL server and its data warehouses using this IP address. The connection works from
SQL Server Management Studio or another tool of your choice. When you connect, use the serveradmin account
you created previously.
IMPORTANT
By default, access through the SQL Database firewall is enabled for all Azure services. Click OFF on this page and then click
Save to disable the firewall for all Azure services.
Server name The fully qualified server name For example, sample-
svr.database.windows.net is a fully
qualified server name.
Login The server admin account This is the account that you specified
when you created the server.
Password The password for your server admin This is the password that you
account specified when you created the
server.
3. Click Connect. The Object Explorer window opens in SSMS.
4. In Object Explorer, expand Databases. Then expand System databases and master to view the objects in
the master database. Expand SampleDW to view the objects in your new database.
2. In the query window, enter these T-SQL commands to create a login and user named LoaderRC60,
substituting your own password for 'a123STRONGpassword!'.
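For example, the commands might look like this when run against the master database:

```sql
CREATE LOGIN LoaderRC60 WITH PASSWORD = 'a123STRONGpassword!';
CREATE USER LoaderRC60 FOR LOGIN LoaderRC60;
```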
3. Click Execute.
4. Right-click SampleDW, and choose New Query. A new query Window opens.
5. Enter the following T-SQL commands to create a database user named LoaderRC60 for the LoaderRC60
login. The second line grants the new user CONTROL permissions on the new data warehouse. These
permissions are similar to making the user the owner of the database. The third line adds the new user as a
member of the staticrc60 resource class.
6. Click Execute.
2. Compare your query window to the previous image. Verify your new query window is running as
LoaderRC60 and performing queries on your SampleDW database. Use this query window to perform all
of the loading steps.
3. Create a master key for the SampleDW database. You only need to create a master key once per database.
4. Run the following CREATE EXTERNAL DATA SOURCE statement to define the location of the Azure blob.
This is the location of the external worldwide importers data. To run a command that you have appended to
the query window, highlight the commands you wish to run and click Execute.
5. Run the following CREATE EXTERNAL FILE FORMAT T-SQL statement to specify the formatting
characteristics and options for the external data file. This statement specifies the external data is stored as
text and the values are separated by the pipe ('|') character.
CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH
(
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS
(
FIELD_TERMINATOR = '|',
USE_TYPE_DEFAULT = FALSE
)
);
6. Run the following CREATE SCHEMA statements to create a schema for your external file format. The ext
schema provides a way to organize the external tables you are about to create. The wwi schema organizes
the standard tables that will contain the data.
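For example, the statements might look like this:

```sql
CREATE SCHEMA ext;
GO
CREATE SCHEMA wwi;
GO
```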
7. Create the external tables. The table definitions are stored in SQL Data Warehouse, but the tables reference
data that is stored in Azure blob storage. Run the following T-SQL commands to create several external
tables that all point to the Azure blob you defined previously in the external data source.
8. In Object Explorer, expand SampleDW to see the list of external tables you created.
The script uses the CREATE TABLE AS SELECT (CTAS ) T-SQL statement to load the data from Azure Storage
Blob into new tables in your data warehouse. CTAS creates a new table based on the results of a select statement.
The new table has the same columns and data types as the results of the select statement. When the select
statement selects from an external table, SQL Data Warehouse imports the data into a relational table in the data
warehouse.
This script does not load data into the wwi.dimension_Date and wwi.fact_Sale tables. These tables are generated in a later step so that they contain a sizeable number of rows.
1. Run the following script to load the data into new tables in your data warehouse.
2. View your data as it loads. You’re loading several GBs of data and compressing it into highly performant
clustered columnstore indexes. Open a new query window on SampleDW, and run the following query to
show the status of the load. After starting the query, grab a coffee and a snack while SQL Data Warehouse
does some heavy lifting.
SELECT
r.command,
s.request_id,
r.status,
count(distinct input_name) as nbr_files,
sum(s.bytes_processed)/1024/1024/1024 as gb_processed
FROM
sys.dm_pdw_exec_requests r
INNER JOIN sys.dm_pdw_dms_external_work s
ON r.request_id = s.request_id
WHERE
r.[label] = 'CTAS : Load [wwi].[dimension_City]' OR
r.[label] = 'CTAS : Load [wwi].[dimension_Customer]' OR
r.[label] = 'CTAS : Load [wwi].[dimension_Employee]' OR
r.[label] = 'CTAS : Load [wwi].[dimension_PaymentMethod]' OR
r.[label] = 'CTAS : Load [wwi].[dimension_StockItem]' OR
r.[label] = 'CTAS : Load [wwi].[dimension_Supplier]' OR
r.[label] = 'CTAS : Load [wwi].[dimension_TransactionType]' OR
r.[label] = 'CTAS : Load [wwi].[fact_Movement]' OR
r.[label] = 'CTAS : Load [wwi].[fact_Order]' OR
r.[label] = 'CTAS : Load [wwi].[fact_Purchase]' OR
r.[label] = 'CTAS : Load [wwi].[fact_StockHolding]' OR
r.[label] = 'CTAS : Load [wwi].[fact_Transaction]'
GROUP BY
r.command,
s.request_id,
r.status
ORDER BY
nbr_files desc,
gb_processed desc;
3. View all system queries.
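For example, you can list all requests with a query against the same requests DMV used above:

```sql
SELECT * FROM sys.dm_pdw_exec_requests;
```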
4. Enjoy seeing your data nicely loaded into your data warehouse.
Create tables and procedures to generate the Date and Sales tables
This section creates the wwi.dimension_Date and wwi.fact_Sale tables. It also creates stored procedures that can
generate millions of rows in the wwi.dimension_Date and wwi.fact_Sale tables.
1. Create the dimension_Date and fact_Sale tables.
CREATE TABLE [wwi].[dimension_Date]
(
[Date] [datetime] NOT NULL,
[Day Number] [int] NOT NULL,
[Day] [nvarchar](10) NOT NULL,
[Month] [nvarchar](10) NOT NULL,
[Short Month] [nvarchar](3) NOT NULL,
[Calendar Month Number] [int] NOT NULL,
[Calendar Month Label] [nvarchar](20) NOT NULL,
[Calendar Year] [int] NOT NULL,
[Calendar Year Label] [nvarchar](10) NOT NULL,
[Fiscal Month Number] [int] NOT NULL,
[Fiscal Month Label] [nvarchar](20) NOT NULL,
[Fiscal Year] [int] NOT NULL,
[Fiscal Year Label] [nvarchar](10) NOT NULL,
[ISO Week Number] [int] NOT NULL
)
WITH
(
DISTRIBUTION = REPLICATE,
CLUSTERED INDEX ([Date])
);
CREATE TABLE [wwi].[fact_Sale]
(
[Sale Key] [bigint] IDENTITY(1,1) NOT NULL,
[City Key] [int] NOT NULL,
[Customer Key] [int] NOT NULL,
[Bill To Customer Key] [int] NOT NULL,
[Stock Item Key] [int] NOT NULL,
[Invoice Date Key] [date] NOT NULL,
[Delivery Date Key] [date] NULL,
[Salesperson Key] [int] NOT NULL,
[WWI Invoice ID] [int] NOT NULL,
[Description] [nvarchar](100) NOT NULL,
[Package] [nvarchar](50) NOT NULL,
[Quantity] [int] NOT NULL,
[Unit Price] [decimal](18, 2) NOT NULL,
[Tax Rate] [decimal](18, 3) NOT NULL,
[Total Excluding Tax] [decimal](18, 2) NOT NULL,
[Tax Amount] [decimal](18, 2) NOT NULL,
[Profit] [decimal](18, 2) NOT NULL,
[Total Including Tax] [decimal](18, 2) NOT NULL,
[Total Dry Items] [int] NOT NULL,
[Total Chiller Items] [int] NOT NULL,
[Lineage Key] [int] NOT NULL
)
WITH
(
DISTRIBUTION = HASH ( [WWI Invoice ID] ),
CLUSTERED COLUMNSTORE INDEX
)
INSERT [wwi].[dimension_Date] (
[Date], [Day Number], [Day], [Month], [Short Month], [Calendar Month Number], [Calendar Month
Label], [Calendar Year], [Calendar Year Label], [Fiscal Month Number], [Fiscal Month Label], [Fiscal
Year], [Fiscal Year Label], [ISO Week Number]
)
SELECT
CAST(CAST(monthnum AS VARCHAR(2)) + '/' + CAST([days] AS VARCHAR(3)) + '/' + CAST(@year AS
CHAR(4)) AS DATE) AS [Date]
,DAY(CAST(CAST(monthnum AS VARCHAR(2)) + '/' + CAST([days] AS VARCHAR(3)) + '/' + CAST(@year AS
CHAR(4)) AS DATE)) AS [Day Number]
,CAST(DATENAME(day, CAST(CAST(monthnum AS VARCHAR(2)) + '/' + CAST([days] AS VARCHAR(3)) + '/'
+ CAST(@year AS CHAR(4)) AS DATE)) AS NVARCHAR(10)) AS [Day]
,CAST(DATENAME(month, CAST(CAST(monthnum AS VARCHAR(2)) + '/' + CAST([days] AS VARCHAR(3)) + '/'
+ CAST(@year as char(4)) AS DATE)) AS nvarchar(10)) AS [Month]
,CAST(SUBSTRING(DATENAME(month, CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as
varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)), 1, 3) AS nvarchar(3)) AS [Short Month]
,MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as
char(4)) AS DATE)) AS [Calendar Month Number]
,CAST(N'CY' + CAST(YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) +
'/' + CAST(@year as char(4)) AS DATE)) AS nvarchar(4)) + N'-' + SUBSTRING(DATENAME(month,
CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS
DATE)), 1, 3) AS nvarchar(10)) AS [Calendar Month Label]
,YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as
char(4)) AS DATE)) AS [Calendar Year]
,CAST(N'CY' + CAST(YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) +
'/' + CAST(@year as char(4)) AS DATE)) AS nvarchar(4)) AS nvarchar(10)) AS [Calendar Year Label]
,CASE WHEN MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' +
CAST(@year as char(4)) AS DATE)) IN (11, 12)
THEN MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year
as char(4)) AS DATE)) - 10
ELSE MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year
as char(4)) AS DATE)) + 2 END AS [Fiscal Month Number]
,CAST(N'FY' + CAST(CASE WHEN MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as
varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) IN (11, 12)
THEN YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as
char(4)) AS DATE)) + 1
ELSE YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as
char(4)) AS DATE)) END AS nvarchar(4)) + N'-' + SUBSTRING(DATENAME(month, CAST(CAST(monthnum as
varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)), 1, 3) AS
nvarchar(20)) AS [Fiscal Month Label]
,CASE WHEN MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' +
CAST(@year as char(4)) AS DATE)) IN (11, 12)
THEN YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as
char(4)) AS DATE)) + 1
ELSE YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as
char(4)) AS DATE)) END AS [Fiscal Year]
,CAST(N'FY' + CAST(CASE WHEN MONTH(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as
varchar(3)) + '/' + CAST(@year as char(4)) AS DATE)) IN (11, 12)
THEN YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as
char(4)) AS DATE)) + 1
ELSE YEAR(CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' + CAST(@year as
char(4)) AS DATE))END AS nvarchar(4)) AS nvarchar(10)) AS [Fiscal Year Label]
, DATEPART(ISO_WEEK, CAST(CAST(monthnum as varchar(2)) + '/' + CAST([days] as varchar(3)) + '/' +
CAST(@year as char(4)) AS DATE)) AS [ISO Week Number]
FROM #month m
CROSS JOIN #days d
WHERE d.days <= m.numofdays
INSERT [wwi].[fact_Sale] (
[City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key],
[Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], Package, Quantity, [Unit
Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], Profit, [Total Including Tax], [Total Dry
Items], [Total Chiller Items], [Lineage Key]
)
SELECT TOP(@VariantNumberOfSalesPerDay)
[City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], @DateCounter,
DATEADD(day, 1, @DateCounter), [Salesperson Key], [WWI Invoice ID], [Description], Package, Quantity,
[Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], Profit, [Total Including Tax], [Total
Dry Items], [Total Chiller Items], [Lineage Key]
FROM [wwi].[seed_Sale]
WHERE
--[Sale Key] > @StartingSaleKey and /* IDENTITY DOES NOT WORK THE SAME IN SQLDW AND CAN'T
USE THIS METHOD FOR VARIANT */
[Invoice Date Key] >=cast(@YEAR AS CHAR(4)) + '-01-01'
ORDER BY [Sale Key];
END;
EXEC [wwi].[InitialSalesDataPopulation]
2. Run this procedure to populate wwi.fact_Sale with 100,000 rows per day for each day in the year 2000.
3. The data generation in the previous step might take a while as it progresses through the year. To see which
day the current process is on, open a new query and run this SQL command:
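One way to do this, assuming the generator inserts rows in date order, is to check the most recent invoice date written to the fact table:

```sql
SELECT MAX([Invoice Date Key]) FROM [wwi].[fact_Sale];
```

The following queries return one row from each dimension table, which you can also use to spot-check the loaded data.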
```sql
SELECT TOP 1 * FROM [wwi].[dimension_City];
SELECT TOP 1 * FROM [wwi].[dimension_Customer];
SELECT TOP 1 * FROM [wwi].[dimension_Date];
SELECT TOP 1 * FROM [wwi].[dimension_Employee];
SELECT TOP 1 * FROM [wwi].[dimension_PaymentMethod];
SELECT TOP 1 * FROM [wwi].[dimension_StockItem];
SELECT TOP 1 * FROM [wwi].[dimension_Supplier];
SELECT TOP 1 * FROM [wwi].[dimension_TransactionType];
```
IF @create_type IS NULL
BEGIN
SET @create_type = 1;
END;
IF @create_type NOT IN (1,2,3)
BEGIN
THROW 151000,'Invalid value for @create_type parameter. Valid range 1 (default), 2 (fullscan) or 3 (sample).',1;
END;
IF @sample_pct IS NULL
BEGIN;
SET @sample_pct = 20;
END;
DECLARE @i INT = 1
, @t INT = (SELECT COUNT(*) FROM #stats_ddl)
, @s NVARCHAR(4000) = N''
;
WHILE @i <= @t
BEGIN
SET @s=(SELECT create_stat_ddl FROM #stats_ddl WHERE seq_nmbr = @i);
PRINT @s
EXEC sp_executesql @s
SET @i+=1;
END
2. Run this command to create statistics on all columns of all tables in the data warehouse.
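For example, assuming the procedure above was created as dbo.prc_sqldw_create_stats with the @create_type and @sample_pct parameters shown, the call might look like this:

```sql
-- 1 = default statistics; @sample_pct is ignored unless sampling is requested.
EXEC [dbo].[prc_sqldw_create_stats] @create_type = 1, @sample_pct = NULL;
```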
Clean up resources
You are being charged for compute resources and data that you loaded into your data warehouse. These are billed
separately.
Follow these steps to clean up resources as you desire.
1. Sign in to the Azure portal, click on your data warehouse.
2. If you want to keep the data in storage, you can pause compute when you aren't using the data warehouse.
By pausing compute, you will only be charged for data storage, and you can resume the compute whenever
you are ready to work with the data. To pause compute, click the Pause button. When the data warehouse is
paused, you will see a Start button. To resume compute, click Start.
3. If you want to remove future charges, you can delete the data warehouse. To remove the data warehouse so
you won't be charged for compute or storage, click Delete.
4. To remove the SQL server you created, click sample-svr.database.windows.net in the previous image,
and then click Delete. Be careful with this as deleting the server will delete all databases assigned to the
server.
5. To remove the resource group, click SampleRG, and then click Delete resource group.
Next steps
In this tutorial, you learned how to create a data warehouse and create a user for loading data. You created
external tables to define the structure for data stored in Azure Storage Blob, and then used the PolyBase CREATE
TABLE AS SELECT statement to load data into your data warehouse.
You did these things:
Created a data warehouse in the Azure portal
Set up a server-level firewall rule in the Azure portal
Connected to the data warehouse with SSMS
Created a user designated for loading data
Created external tables for data in Azure Storage Blob
Used the CTAS T-SQL statement to load data into your data warehouse
Viewed the progress of data as it is loading
Created statistics on the newly loaded data
Advance to the development overview to learn how to migrate an existing database to SQL Data Warehouse.
Design decisions to migrate an existing database to SQL Data Warehouse
Connect to Azure SQL Data Warehouse
7/24/2019 • 2 minutes to read • Edit Online
NOTE
Consider setting the connection timeout to 300 seconds to allow your connection to survive short periods of unavailability.
Server=tcp:{your_server}.database.windows.net,1433;Database={your_database};User ID={your_user_name};Password=
{your_password_here};Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;
JDBC connection string example
jdbc:sqlserver://yourserver.database.windows.net:1433;database=yourdatabase;user={your_user_name};password=
{your_password_here};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;lo
ginTimeout=30;
Connection settings
SQL Data Warehouse standardizes some settings during connection and object creation. These settings cannot be
overridden and include:
ANSI_NULLS ON
QUOTED_IDENTIFIERS ON
DATEFORMAT mdy
DATEFIRST 7
Next steps
To connect and query with Visual Studio, see Query with Visual Studio. To learn more about authentication
options, see Authentication to Azure SQL Data Warehouse.
Connection strings for Azure SQL Data Warehouse
7/24/2019 • 2 minutes to read • Edit Online
You can connect to SQL Data Warehouse with several different application protocols such as ADO.NET, ODBC, PHP, and JDBC. Below are some examples of connection strings for each protocol. You can also use the Azure
portal to build your connection string. To build your connection string using the Azure portal, navigate to your
database blade, under Essentials click on Show database connection strings.
NOTE
Consider setting the connection timeout to 300 seconds in order to allow the connection to survive short periods of
unavailability.
Next steps
To start querying your data warehouse with Visual Studio and other applications, see Query with Visual Studio.
Getting started with Visual Studio 2019 for SQL Data
Warehouse
10/17/2019 • 2 minutes to read • Edit Online
Visual Studio 2019 SQL Server Data Tools (SSDT) is a single tool allowing you to do the following:
Connect, query, and develop applications for SQL Data Warehouse
Leverage an object explorer to visually explore all objects in your data model including tables, views, stored
procedures, and more
Generate T-SQL data definition language (DDL ) scripts for your objects
Develop your data warehouse using a state-based approach with SSDT Database Projects
Integrate your database project with source control systems such as Git with Azure DevOps Repos
Set up continuous integration and deployment pipelines with automation servers such as Azure DevOps
NOTE
Currently Visual Studio SSDT Database Projects is in preview. To receive periodic updates on this feature, please vote on
UserVoice.
Next steps
Now that you have the latest version of SSDT, you're ready to connect to your SQL Data Warehouse.
Source Control Integration for Azure SQL Data
Warehouse
8/28/2019 • 2 minutes to read • Edit Online
This tutorial outlines how to integrate your SQL Server Data Tools (SSDT) database project with source control.
Source control integration is the first step in building your continuous integration and deployment pipeline with
SQL Data Warehouse.
2. Open Visual Studio and connect to your Azure DevOps organization and project from step 1 by selecting
“Manage Connections”
3. Clone your Azure Repo repository from your project to your local machine
3. In Team Explorer in Visual Studio, commit all your changes to your local Git repository
4. Now that you have the changes committed locally in the cloned repository, sync and push your changes to
your Azure Repo repository in your Azure DevOps project.
Validation
1. Verify changes have been pushed to your Azure Repo by updating a table column in your database project
from Visual Studio SQL Server Data Tools (SSDT)
2. Commit and push the change from your local repository to your Azure Repo
3. Verify the change has been pushed in your Azure Repo repository
4. (Optional) Use Schema Compare and update the changes to your target data warehouse using SSDT to
ensure the object definitions in your Azure Repo repository and local repository reflect your data warehouse
Next steps
Developing for Azure SQL Data Warehouse
Continuous integration and deployment for Azure
SQL Data Warehouse
8/29/2019 • 2 minutes to read • Edit Online
This simple tutorial outlines how to integrate your SQL Server Data Tools (SSDT) database project with Azure
DevOps and leverage Azure Pipelines to set up continuous integration and deployment. This tutorial is the second
step in building your continuous integration and deployment pipeline with SQL Data Warehouse.
NOTE
SSDT is currently in preview, so you will need to use a self-hosted agent. The Microsoft-hosted agents will be
updated in the next few months.
3. Edit your YAML file to use the proper pool of your agent. Your YAML file should look something like this:
At this point, you have a simple environment where any check-in to your source control repository master branch
should automatically trigger a successful Visual Studio build of your database project. Validate the automation is
working end to end by making a change in your local database project and checking in that change to your master
branch.
Next steps
Explore Azure SQL Data Warehouse architecture
Quickly create a SQL Data Warehouse
Load sample data.
Explore Videos
Connect to SQL Data Warehouse with Visual Studio
and SSDT
8/18/2019 • 2 minutes to read • Edit Online
Use Visual Studio to query Azure SQL Data Warehouse in just a few minutes. This method uses the SQL Server
Data Tools (SSDT) extension in Visual Studio 2019.
Prerequisites
To use this tutorial, you need:
An existing SQL Data Warehouse. To create one, see Create a SQL Data Warehouse.
SSDT for Visual Studio. If you have Visual Studio, you probably already have this. For installation instructions
and options, see Installing Visual Studio and SSDT.
The fully qualified SQL server name. To find this, see Connect to SQL Data Warehouse.
4. Run the query. To do this, click the green arrow or use the following shortcut: CTRL + SHIFT + E .
5. Look at the query results. In this example, the FactInternetSales table has 60398 rows.
Next steps
Now that you can connect and query, try visualizing the data with PowerBI.
To configure your environment for Azure Active Directory authentication, see Authenticate to SQL Data
Warehouse.
Connect to SQL Data Warehouse with SQL Server
Management Studio (SSMS)
8/18/2019 • 2 minutes to read • Edit Online
Use SQL Server Management Studio (SSMS ) to connect to and query Azure SQL Data Warehouse.
Prerequisites
To use this tutorial, you need:
An existing SQL Data Warehouse. To create one, see Create a SQL Data Warehouse.
SQL Server Management Studio (SSMS ) installed. Install SSMS for free if you don't already have it.
The fully qualified SQL server name. To find this, see Connect to SQL Data Warehouse.
4. Run the query. To do this, click Execute or use the following shortcut: F5 .
5. Look at the query results. In this example, the FactInternetSales table has 60398 rows.
Next steps
Now that you can connect and query, try visualizing the data with PowerBI.
To configure your environment for Azure Active Directory authentication, see Authenticate to SQL Data
Warehouse.
Connect to SQL Data Warehouse with sqlcmd
7/24/2019 • 2 minutes to read • Edit Online
Use sqlcmd command-line utility to connect to and query an Azure SQL Data Warehouse.
1. Connect
To get started with sqlcmd, open the command prompt and enter sqlcmd followed by the connection string for
your SQL Data Warehouse database. The connection string requires the following parameters:
Server (-S ): Server in the form <Server Name>.database.windows.net
Database (-d): Database name.
Enable Quoted Identifiers (-I ): Quoted identifiers must be enabled to connect to a SQL Data Warehouse
instance.
To use SQL Server Authentication, you need to add the username/password parameters:
User (-U ): Server user in the form <User>
Password (-P ): Password associated with the user.
For example, your connection string might look like the following:
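(The server, database, and user names below are the same sample values used in the batch example later in this article.)
sqlcmd -S MySqlDw.database.windows.net -d Adventure_Works -U myuser -P myP@ssword -I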
To use Azure Active Directory Integrated authentication, you need to add the Azure Active Directory parameters:
Azure Active Directory Authentication (-G): use Azure Active Directory for authentication
For example, your connection string might look like the following:
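(Using the same sample server and database, with -G for Azure Active Directory authentication.)
sqlcmd -S MySqlDw.database.windows.net -d Adventure_Works -G -I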
NOTE
You need to enable Azure Active Directory Authentication to authenticate using Active Directory.
2. Query
After connection, you can issue any supported Transact-SQL statements against the instance. In this example,
queries are submitted in interactive mode.
These next examples show how you can run your queries in batch mode using the -Q option or piping your SQL
to sqlcmd.
sqlcmd -S MySqlDw.database.windows.net -d Adventure_Works -U myuser -P myP@ssword -I -Q "SELECT name FROM
sys.tables;"
Next steps
See sqlcmd documentation for more about details about the options available in sqlcmd.
Analyze your workload in Azure SQL Data
Warehouse
7/5/2019 • 2 minutes to read • Edit Online
Resource Classes
SQL Data Warehouse provides resource classes to assign system resources to queries. For more information on
resource classes, see Resource classes & workload management. Queries will wait if the resource class assigned to
a query needs more resources than are currently available.
The following query shows which role each user is assigned to.
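For example, a query like the following lists role memberships using the standard catalog views; the filter on role names containing 'rc' is illustrative and targets the resource class roles.

```sql
SELECT r.name AS role_name, m.name AS member_name
FROM sys.database_role_members AS rm
JOIN sys.database_principals AS r ON rm.role_principal_id = r.principal_id
JOIN sys.database_principals AS m ON rm.member_principal_id = m.principal_id
WHERE r.name LIKE '%rc%';
```

The next query, against sys.dm_pdw_waits, shows requests that are waiting, along with their session details and timing.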
SELECT w.[wait_id]
, w.[session_id]
, w.[type] AS Wait_type
, w.[object_type]
, w.[object_name]
, w.[request_id]
, w.[request_time]
, w.[acquire_time]
, w.[state]
, w.[priority]
, SESSION_ID() AS Current_session
, s.[status] AS Session_status
, s.[login_name]
, s.[query_count]
, s.[client_id]
, s.[sql_spid]
, r.[command] AS Request_command
, r.[label]
, r.[status] AS Request_status
, r.[submit_time]
, r.[start_time]
, r.[end_compile_time]
, r.[end_time]
, DATEDIFF(ms,r.[submit_time],r.[start_time]) AS Request_queue_time_ms
, DATEDIFF(ms,r.[start_time],r.[end_compile_time]) AS Request_compile_time_ms
, DATEDIFF(ms,r.[end_compile_time],r.[end_time]) AS Request_execution_time_ms
, r.[total_elapsed_time]
FROM sys.dm_pdw_waits w
JOIN sys.dm_pdw_exec_sessions s ON w.[session_id] = s.[session_id]
JOIN sys.dm_pdw_exec_requests r ON w.[request_id] = r.[request_id]
WHERE w.[session_id] <> SESSION_ID()
;
The sys.dm_pdw_resource_waits DMV shows the wait information for a given query. Resource wait time measures
the time waiting for resources to be provided. Signal wait time is the time it takes for the underlying SQL servers
to schedule the query onto the CPU.
SELECT [session_id]
, [type]
, [object_type]
, [object_name]
, [request_id]
, [request_time]
, [acquire_time]
, DATEDIFF(ms,[request_time],[acquire_time]) AS acquire_duration_ms
, [concurrency_slots_used] AS concurrency_slots_reserved
, [resource_class]
, [wait_id] AS queue_position
FROM sys.dm_pdw_resource_waits
WHERE [session_id] <> SESSION_ID()
;
You can also use the sys.dm_pdw_resource_waits DMV to calculate how many concurrency slots have been granted.
SELECT SUM([concurrency_slots_used]) as total_granted_slots
FROM sys.[dm_pdw_resource_waits]
WHERE [state] = 'Granted'
AND [resource_class] is not null
AND [session_id] <> session_id()
;
The sys.dm_pdw_wait_stats DMV can be used for historic trend analysis of waits.
SELECT w.[pdw_node_id]
, w.[wait_name]
, w.[max_wait_time]
, w.[request_count]
, w.[signal_time]
, w.[completed_count]
, w.[wait_time]
FROM sys.dm_pdw_wait_stats w
;
Next steps
For more information about managing database users and security, see Secure a database in SQL Data
Warehouse. For more information about how larger resource classes can improve clustered columnstore index
quality, see Rebuilding indexes to improve segment quality.
Manage and monitor workload importance in Azure
SQL Data Warehouse
7/5/2019 • 2 minutes to read • Edit Online
Manage and monitor request level importance in Azure SQL Data Warehouse using DMVs and catalog views.
Monitor importance
Monitor importance using the new importance column in the sys.dm_pdw_exec_requests dynamic management
view. The below monitoring query shows submit time and start time for queries. Review the submit time and start
time along with importance to see how importance influenced scheduling.
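For example, a query like this one shows each request's importance alongside its timing; the exact column list is illustrative.

```sql
SELECT s.login_name, r.status, r.importance, r.submit_time, r.start_time
FROM sys.dm_pdw_exec_requests AS r
JOIN sys.dm_pdw_exec_sessions AS s ON r.session_id = s.session_id
WHERE r.resource_class IS NOT NULL
ORDER BY r.submit_time;
```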
To look further into how queries are being scheduled, use the catalog views.
SELECT *
FROM sys.workload_management_workload_classifiers
WHERE classifier_id > 12
To simplify troubleshooting misclassification, we recommend removing resource class role mappings as you create workload classifiers. Query sys.database_role_members to find existing resource class role memberships, and run sp_droprolemember for each member name returned from the corresponding resource class. Below is an example of checking for existence before dropping a workload classifier:
IF EXISTS (SELECT 1 FROM sys.workload_management_workload_classifiers WHERE name = 'ExecReportsClassifier')
DROP WORKLOAD CLASSIFIER ExecReportsClassifier;
GO
Next steps
For more information on Classification, see Workload Classification.
For more information on Importance, see Workload Importance
Go to Configure Workload Importance
Configure workload importance in Azure SQL Data
Warehouse
7/5/2019 • 2 minutes to read • Edit Online
Setting importance in the SQL Data Warehouse allows you to influence the scheduling of queries. Queries with
higher importance will be scheduled to run before queries with lower importance. To assign importance to
queries, you need to create a workload classifier.
To create a workload classifier for a user running ad hoc queries with lower importance, run:
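For example, assuming an ad hoc user named adhocuser who should run in the smallrc resource class with below-normal importance, the statement might look like this:

```sql
CREATE WORKLOAD CLASSIFIER [AdhocQueriesClassifier]
WITH (
    WORKLOAD_GROUP = 'smallrc',
    MEMBERNAME = 'adhocuser',
    IMPORTANCE = below_normal
);
```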
Next Steps
For more information about workload management, see Workload Classification
For more information on Importance, see Workload Importance
Go to Manage and monitor Workload Importance
Optimize performance by upgrading SQL Data
Warehouse
8/18/2019 • 6 minutes to read • Edit Online
Upgrade Azure SQL Data Warehouse to the latest generation of Azure hardware and storage architecture.
Why upgrade?
You can now seamlessly upgrade to the SQL Data Warehouse Compute Optimized Gen2 tier in the Azure portal
for supported regions. If your region does not support self-upgrade, you can upgrade to a supported region or
wait for self-upgrade to be available in your region. Upgrade now to take advantage of the latest generation of
Azure hardware and enhanced storage architecture including faster performance, higher scalability, and unlimited
columnar storage.
Applies to
This upgrade applies to Compute Optimized Gen1 tier data warehouses in supported regions.
COMPUTE OPTIMIZED GEN1 TIER    COMPUTE OPTIMIZED GEN2 TIER
DW100                          DW100c
DW200                          DW200c
DW300                          DW300c
DW400                          DW400c
DW500                          DW500c
DW600                          DW500c
DW1000                         DW1000c
DW1200                         DW1000c
DW1500                         DW1500c
DW2000                         DW2000c
DW3000                         DW3000c
DW6000                         DW6000c
NOTE
Suggested performance levels are not a direct conversion. For example, we recommend going from DW600 to DW500c.
NOTE
Migration from Gen1 to Gen2 through the Azure portal is permanent. There is not a process for returning to Gen1.
NOTE
Azure SQL Data Warehouse must be running to migrate to Gen2.
Modified to:
NOTE
-RequestedServiceObjectiveName "DW300" is changed to -RequestedServiceObjectiveName "DW300c"
Modified to:
NOTE
SERVICE_OBJECTIVE = 'DW300' is changed to SERVICE_OBJECTIVE = 'DW300c'
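For example, the modified T-SQL change of service objective would look like the following sketch (MySQLDW is a placeholder database name; run the statement while connected to the master database):
ALTER DATABASE MySQLDW
MODIFY ( SERVICE_OBJECTIVE = 'DW300c' );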
3. Ensure your workload has completed running and quiesced before upgrading. You'll experience downtime
for a few minutes before your data warehouse is back online as a Compute Optimized Gen2 tier data
warehouse. Select Upgrade:
The first step of the upgrade process goes through the scale operation ("Upgrading - Offline") where all
sessions will be killed, and connections will be dropped.
The second step of the upgrade process is data migration ("Upgrading - Online"). Data migration is an
online trickle background process. This process slowly moves columnar data from the old storage
architecture to the new storage architecture using a local SSD cache. During this time, your data
warehouse will be online for querying and loading. Your data will be available to query regardless of
whether it has been migrated or not. The data migration happens at varying rates depending on your data
size, your performance level, and the number of your columnstore segments.
5. Optional Recommendation: Once the scaling operation is complete, you can speed up the data
migration background process. You can force data movement by running Alter Index rebuild on all primary
columnstore tables you'd be querying at a larger SLO and resource class. This operation is offline
compared to the trickle background process, which can take hours to complete depending on the number
and sizes of your tables. However, once complete, data migration will be much quicker due to the new
enhanced storage architecture with high-quality rowgroups.
NOTE
Alter Index rebuild is an offline operation and the tables will not be available until the rebuild completes.
The following query generates the required Alter Index Rebuild commands to expedite data migration:
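A sketch of such a generator (built on the sys.indexes, sys.tables, and sys.schemas catalog views; review the generated statements before running them under a large resource class):
SELECT 'ALTER INDEX [' + i.name + '] ON [' + s.name + '].[' + t.name + '] REBUILD;' AS rebuild_command
FROM sys.indexes AS i
JOIN sys.tables AS t ON i.object_id = t.object_id
JOIN sys.schemas AS s ON t.schema_id = s.schema_id
WHERE i.type_desc = 'CLUSTERED COLUMNSTORE'
;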
5. For User-Defined Restore Points, select a Restore point or Create a new user-defined restore point.
Choose a server in a Gen2 supported geographic region.
Restore from an Azure geographical region using PowerShell
NOTE
This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will
continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM
compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure
PowerShell.
NOTE
You can perform a geo-restore to Gen2! To do so, specify a Gen2 ServiceObjectiveName (e.g. DW1000c) as an optional
parameter.
Connect-AzAccount
Get-AzSubscription
Select-AzSubscription -SubscriptionName "<Subscription_name>"
# Recover database
$GeoRestoredDatabase = Restore-AzSqlDatabase -FromGeoBackup -ResourceGroupName "<YourResourceGroupName>" -ServerName "<YourTargetServer>" -TargetDatabaseName "<NewDatabaseName>" -ResourceId $GeoBackup.ResourceID -ServiceObjectiveName "<YourTargetServiceLevel>"
NOTE
To configure your database after the restore has completed, see Configure your database after recovery.
The recovered database will be TDE-enabled if the source database is TDE-enabled.
If you experience any issues with your data warehouse, create a support request and reference “Gen2 upgrade” as
the possible cause.
Next steps
Your upgraded data warehouse is online. To take advantage of the enhanced architecture, see Resource classes for
Workload Management.
Use Azure Functions to manage compute resources
in Azure SQL Data Warehouse
3/15/2019 • 4 minutes to read • Edit Online
This tutorial uses Azure Functions to manage compute resources for a data warehouse in Azure SQL Data
Warehouse.
In order to use Azure Function App with SQL Data Warehouse, you must create a Service Principal Account with
contributor access under the same subscription as your data warehouse instance.
Once you've deployed the template, you should find three new resources: a free Azure App Service Plan, a
consumption-based Function App plan, and a storage account that handles the logging and the operations queue.
Continue reading the other sections to see how to modify the deployed functions to fit your need.
3. Currently the value displayed should say either %ScaleDownTime% or %ScaleUpTime%. These values
indicate the schedule is based on values defined in your Application Settings. For now, you can ignore this
value and change the schedule to your preferred time based on the next steps.
4. In the schedule area, add the CRON expression that reflects how often you want the SQL Data Warehouse to be
scaled up.
The value of schedule is a CRON expression that includes these six fields:
For example, "0 30 9 * * 1 -5" would reflect a trigger every weekday at 9:30am. For more information, visit
Azure Functions schedule examples.
3. Change the value of ServiceLevelObjective to the level you would like and hit save. This value is the
compute level that your data warehouse instance will scale to based on the schedule defined in the
Integrate section.
2. Click on the sliding toggle for the corresponding triggers you would like to enable.
3. Navigate to the Integrate tabs for the respective triggers to change their schedule.
NOTE
The functional difference between the scaling triggers and the pause/resume triggers is the message that is sent to
the queue. For more information, see Add a new trigger function.
3. Name your function and set your schedule. The image shows how one may trigger their function every
Saturday at midnight (late Friday evening).
4. Copy the content of index.js from one of the other trigger functions.
Complex scheduling
This section briefly demonstrates what is necessary to get more complex scheduling of pause, resume, and scaling
capabilities.
Example 1:
Daily scale up at 8am to DW600 and scale down at 8pm to DW200.
Example 2:
Daily scale up at 8am to DW1000, scale down once to DW600 at 4pm, and scale down at 10pm to DW200.
Example 3:
Scale up at 8am to DW1000 and scale down once to DW600 at 4pm on weekdays. Pause at 11pm on Friday and resume
at 7am on Monday morning.
Next steps
Learn more about timer trigger Azure functions.
Checkout the SQL Data Warehouse samples repository.
Monitor workload - Azure portal
8/22/2019 • 2 minutes to read • Edit Online
This article describes how to use the Azure portal to monitor your workload. This includes setting up Azure
Monitor Logs to investigate query execution and workload trends using log analytics for Azure SQL Data
Warehouse.
Prerequisites
Azure subscription: If you don't have an Azure subscription, create a free account before you begin.
Azure SQL Data Warehouse: We will be collecting logs for a SQL Data Warehouse. If you don't have a SQL
Data Warehouse provisioned, see the instructions in Create a SQL Data Warehouse.
Logs can be emitted to Azure Storage, Stream Analytics, or Log Analytics. For this tutorial, select Log Analytics.
Next steps
Now that you have set up and configured Azure monitor logs, customize Azure dashboards to share across your
team.
Monitor your workload using DMVs
8/25/2019 • 8 minutes to read • Edit Online
This article describes how to use Dynamic Management Views (DMVs) to monitor your workload. This includes
investigating query execution in Azure SQL Data Warehouse.
Permissions
To query the DMVs in this article, you need either VIEW DATABASE STATE or CONTROL permission. Usually
granting VIEW DATABASE STATE is the preferred permission as it is much more restrictive.
Monitor connections
All logins to SQL Data Warehouse are logged to sys.dm_pdw_exec_sessions. This DMV contains the last 10,000
logins. The session_id is the primary key and is assigned sequentially for each new logon.
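For example, a sketch that lists recent sessions other than your own:
SELECT *
FROM sys.dm_pdw_exec_sessions
WHERE [status] <> 'Closed'
AND session_id <> SESSION_ID()
;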
NOTE
Stored procedures use multiple Request IDs. Request IDs are assigned in sequential order.
Here are steps to follow to investigate query execution plans and times for a particular query.
STEP 1: Identify the query you wish to investigate
From the preceding query results, note the Request ID of the query that you would like to investigate.
Queries in the Suspended state can be queued due to a large number of active running queries. These queries
also appear in the sys.dm_pdw_waits waits query with a type of UserConcurrencyResourceType. For information
on concurrency limits, see Performance tiers or Resource classes for workload management. Queries can also
wait for other reasons such as for object locks. If your query is waiting for a resource, see Investigating queries
waiting for resources further down in this article.
To simplify the lookup of a query in the sys.dm_pdw_exec_requests table, use LABEL to assign a comment to your
query that can be looked up in the sys.dm_pdw_exec_requests view.
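For example (the label text is arbitrary):
-- Run a query tagged with a label
SELECT *
FROM sys.tables
OPTION ( LABEL = 'My Query' );

-- Find the query by its label
SELECT request_id, [status], submit_time, total_elapsed_time
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'My Query'
;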
When a DSQL plan is taking longer than expected, the cause can be a complex plan with many DSQL steps or
just one step taking a long time. If the plan has many steps with several move operations, consider optimizing your
table distributions to reduce data movement. The Table distribution article explains why data must be moved to
solve a query and explains some distribution strategies to minimize data movement.
To investigate further details about a single step, check the operation_type column of the long-running query step and
note the Step Index:
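-- Sketch: list the distributed steps for the request identified in Step 1.
-- Replace QID#### with your Request ID.
SELECT step_index, operation_type, [status], total_elapsed_time, row_count
FROM sys.dm_pdw_request_steps
WHERE request_id = 'QID####'
ORDER BY step_index
;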
Proceed with Step 3a for SQL operations: OnOperation, RemoteOperation, ReturnOperation.
Proceed with Step 3b for Data Movement operations: ShuffleMoveOperation, BroadcastMoveOperation,
TrimMoveOperation, PartitionMoveOperation, MoveOperation, CopyOperation.
STEP 3a: Investigate SQL on the distributed databases
Use the Request ID and the Step Index to retrieve details from sys.dm_pdw_sql_requests, which contains
execution information of the query step on all of the distributed databases.
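A sketch of that lookup (QID#### and the step index are placeholders taken from the earlier steps):
SELECT *
FROM sys.dm_pdw_sql_requests
WHERE request_id = 'QID####'
AND step_index = 2
;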
When the query step is running, DBCC PDW_SHOWEXECUTIONPLAN can be used to retrieve the SQL Server
estimated plan from the SQL Server plan cache for the step running on a particular distribution.
-- Find the SQL Server execution plan for a query running on a specific SQL Data Warehouse Compute or Control node.
-- Replace distribution_id and spid with values from previous query.
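-- Sketch: the distribution_id and spid values below are placeholders.
DBCC PDW_SHOWEXECUTIONPLAN(1, 78);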
-- Find the information about all the workers completing a Data Movement Step.
-- Replace request_id and step_index with values from Step 1 and 3.
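-- Sketch, assuming the sys.dm_pdw_dms_workers DMV; QID#### and the step index are placeholders.
SELECT *
FROM sys.dm_pdw_dms_workers
WHERE request_id = 'QID####'
AND step_index = 2
ORDER BY total_elapsed_time DESC
;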
Check the total_elapsed_time column to see if a particular distribution is taking significantly longer than
others for data movement.
For the long-running distribution, check the rows_processed column to see if the number of rows being moved
from that distribution is significantly larger than others. If so, this finding might indicate skew of your
underlying data.
If the query is running, DBCC PDW_SHOWEXECUTIONPLAN can be used to retrieve the SQL Server estimated
plan from the SQL Server plan cache for the currently running SQL Step within a particular distribution.
-- Find the SQL Server estimated plan for a query running on a specific SQL Data Warehouse Compute or Control node.
-- Replace distribution_id and spid with values from previous query.
-- Find queries
-- Replace request_id with value from Step 1.
SELECT waits.session_id,
waits.request_id,
requests.command,
requests.status,
requests.start_time,
waits.type,
waits.state,
waits.object_type,
waits.object_name
FROM sys.dm_pdw_waits waits
JOIN sys.dm_pdw_exec_requests requests
ON waits.request_id=requests.request_id
WHERE waits.request_id = 'QID####'
ORDER BY waits.object_name, waits.object_type, waits.state;
If the query is actively waiting on resources from another query, then the state will be AcquireResources. If the
query has all the required resources, then the state will be Granted.
Monitor tempdb
Tempdb is used to hold intermediate results during query execution. High utilization of the tempdb database can
lead to slow query performance. Each node in Azure SQL Data Warehouse has approximately 1 TB of raw space
for tempdb. Below are tips for monitoring tempdb usage and for decreasing tempdb usage in your queries.
Monitoring tempdb with views
To monitor tempdb usage, first install the microsoft.vw_sql_requests view from the Microsoft Toolkit for SQL
Data Warehouse. You can then execute the following query to see the tempdb usage per node for all executed
queries:
-- Monitor tempdb
SELECT
sr.request_id,
ssu.session_id,
ssu.pdw_node_id,
sr.command,
sr.total_elapsed_time,
es.login_name AS 'LoginName',
DB_NAME(ssu.database_id) AS 'DatabaseName',
(es.memory_usage * 8) AS 'MemoryUsage (in KB)',
(ssu.user_objects_alloc_page_count * 8) AS 'Space Allocated For User Objects (in KB)',
(ssu.user_objects_dealloc_page_count * 8) AS 'Space Deallocated For User Objects (in KB)',
(ssu.internal_objects_alloc_page_count * 8) AS 'Space Allocated For Internal Objects (in KB)',
(ssu.internal_objects_dealloc_page_count * 8) AS 'Space Deallocated For Internal Objects (in KB)',
CASE es.is_user_process
WHEN 1 THEN 'User Session'
WHEN 0 THEN 'System Session'
END AS 'SessionType',
es.row_count AS 'RowCount'
FROM sys.dm_pdw_nodes_db_session_space_usage AS ssu
INNER JOIN sys.dm_pdw_nodes_exec_sessions AS es ON ssu.session_id = es.session_id AND ssu.pdw_node_id = es.pdw_node_id
INNER JOIN sys.dm_pdw_nodes_exec_connections AS er ON ssu.session_id = er.session_id AND ssu.pdw_node_id = er.pdw_node_id
INNER JOIN microsoft.vw_sql_requests AS sr ON ssu.session_id = sr.spid AND ssu.pdw_node_id = sr.pdw_node_id
WHERE DB_NAME(ssu.database_id) = 'tempdb'
AND es.session_id <> @@SPID
AND es.login_name <> 'sa'
ORDER BY sr.request_id;
If you have a query that is consuming a large amount of memory or have received an error message related to
allocation of tempdb, it could be due to a very large CREATE TABLE AS SELECT (CTAS) or INSERT SELECT
statement that is failing in the final data movement operation. This can usually be identified as a
ShuffleMove operation in the distributed query plan right before the final INSERT SELECT. Use
sys.dm_pdw_request_steps to monitor ShuffleMove operations, as shown below.
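For example, a sketch that surfaces ShuffleMove steps for a given request (QID#### is a placeholder Request ID):
SELECT request_id, step_index, operation_type, row_count, total_elapsed_time
FROM sys.dm_pdw_request_steps
WHERE request_id = 'QID####'
AND operation_type = 'ShuffleMoveOperation'
ORDER BY total_elapsed_time DESC
;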
The most common mitigation is to break your CTAS or INSERT SELECT statement into multiple load statements
so the data volume will not exceed the 1 TB per node tempdb limit. You can also scale your data warehouse to a
larger size, which spreads tempdb across more nodes and reduces the usage on each individual node.
In addition to CTAS and INSERT SELECT statements, large, complex queries running with insufficient memory
can spill into tempdb causing queries to fail. Consider running with a larger resource class to avoid spilling into
tempdb.
Monitor memory
Memory can be the root cause for slow performance and out of memory issues. Consider scaling your data
warehouse if you find SQL Server memory usage reaching its limits during query execution.
The following query returns SQL Server memory usage and memory pressure per node:
-- Memory consumption
SELECT
pc1.cntr_value as Curr_Mem_KB,
pc1.cntr_value/1024.0 as Curr_Mem_MB,
(pc1.cntr_value/1048576.0) as Curr_Mem_GB,
pc2.cntr_value as Max_Mem_KB,
pc2.cntr_value/1024.0 as Max_Mem_MB,
(pc2.cntr_value/1048576.0) as Max_Mem_GB,
pc1.cntr_value * 100.0/pc2.cntr_value AS Memory_Utilization_Percentage,
pc1.pdw_node_id
FROM
-- pc1: current memory
sys.dm_pdw_nodes_os_performance_counters AS pc1
-- pc2: total memory allowed for this SQL instance
JOIN sys.dm_pdw_nodes_os_performance_counters AS pc2
ON pc1.object_name = pc2.object_name AND pc1.pdw_node_id = pc2.pdw_node_id
WHERE
pc1.counter_name = 'Total Server Memory (KB)'
AND pc2.counter_name = 'Target Server Memory (KB)'
-- Monitor rollback
SELECT
SUM(CASE WHEN t.database_transaction_next_undo_lsn IS NOT NULL THEN 1 ELSE 0 END),
t.pdw_node_id,
nod.[type]
FROM sys.dm_pdw_nodes_tran_database_transactions t
JOIN sys.dm_pdw_nodes nod ON t.pdw_node_id = nod.pdw_node_id
GROUP BY t.pdw_node_id, nod.[type]
Next steps
For more information about DMVs, see System views.
How to monitor the Gen2 cache
3/12/2019 • 2 minutes to read • Edit Online
The Gen2 storage architecture automatically tiers your most frequently queried columnstore segments in a cache
residing on NVMe based SSDs designed for Gen2 data warehouses. Greater performance is realized when your
queries retrieve segments that are residing in the cache. This article describes how to monitor and troubleshoot
slow query performance by determining whether your workload is optimally leveraging the Gen2 cache.
Select the metrics button and fill in the Subscription, Resource group, Resource type, and Resource name of
your data warehouse.
The key metrics for troubleshooting the Gen2 cache are Cache hit percentage and Cache used percentage.
Configure the Azure metric chart to display these two metrics.
Cache hit and used percentage
The matrix below describes scenarios based on the values of the cache metrics:
Scenario 1: You are optimally using your cache. Troubleshoot other areas which may be slowing down your
queries.
Scenario 2: Your current working data set cannot fit into the cache which causes a low cache hit percentage due to
physical reads. Consider scaling up your performance level and rerun your workload to populate the cache.
Scenario 3: It is likely that your query is running slow due to reasons unrelated to the cache. Troubleshoot other
areas which may be slowing down your queries. You can also consider scaling down your instance to reduce your
cache size to save costs.
Scenario 4: You had a cold cache, which could be the reason why your query was slow. Consider rerunning your
query, as your working dataset should now be in the cache.
Important: If the cache hit percentage or cache used percentage is not updating after rerunning your
workload, your working set may already be residing in memory. Note that only clustered columnstore tables
are cached.
Next steps
For more information on general query performance tuning, see Monitor query execution.
User-defined restore points
8/18/2019 • 2 minutes to read • Edit Online
In this article, you learn to create a new user-defined restore point for Azure SQL Data Warehouse using
PowerShell and Azure portal.
$SubscriptionName="<YourSubscriptionName>"
$ResourceGroupName="<YourResourceGroupName>"
$ServerName="<YourServerNameWithoutURLSuffixSeeNote>" # Without database.windows.net
$DatabaseName="<YourDatabaseName>"
$Label = "<YourRestorePointLabel>"
Connect-AzAccount
Get-AzSubscription
Select-AzSubscription -SubscriptionName $SubscriptionName
Next steps
Restore an existing data warehouse
Restore a deleted data warehouse
Restore from a geo-backup data warehouse
Restore an existing Azure SQL Data Warehouse
8/18/2019 • 2 minutes to read • Edit Online
In this article, you learn to restore an existing SQL Data Warehouse through Azure portal and PowerShell:
NOTE
This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will
continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM
compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure
PowerShell.
Verify your DTU capacity. Each SQL Data Warehouse is hosted by a SQL server (for example,
myserver.database.windows.net) which has a default DTU quota. Verify the SQL server has enough remaining
DTU quota for the database being restored. To learn how to calculate DTU needed or to request more DTU, see
Request a DTU quota change.
Connect-AzAccount
Get-AzSubscription
Select-AzSubscription -SubscriptionName $SubscriptionName
4. Select either Automatic Restore Points or User-Defined Restore Points. If the data warehouse doesn't
have any automatic restore points, wait a few hours or create a user-defined restore point before restoring.
For User-Defined Restore Points, select an existing one or create a new one. For Server, you can pick a
logical server in a different resource group and region or create a new one. After providing all the
parameters, click Review + Restore.
Next Steps
Restore a deleted data warehouse
Restore from a geo-backup data warehouse
Restore a deleted Azure SQL Data Warehouse
7/23/2019 • 2 minutes to read • Edit Online
In this article, you learn to restore a deleted SQL Data Warehouse using Azure portal and PowerShell:
Verify your DTU capacity. Each SQL Data Warehouse is hosted by a SQL server (for example,
myserver.database.windows.net) which has a default DTU quota. Verify that the SQL server has enough remaining
DTU quota for the database being restored. To learn how to calculate DTU needed or to request more DTU, see
Request a DTU quota change.
Connect-AzAccount
Get-AzSubscription
Select-AzSubscription -SubscriptionName $SubscriptionName
# Use the following command to restore deleted data warehouse to a different logical server
#$RestoredDatabase = Restore-AzSqlDatabase -FromDeletedDatabaseBackup -DeletionDate $DeletedDatabase.DeletionDate -ResourceGroupName $TargetResourceGroupName -ServerName $TargetServerName -TargetDatabaseName $NewDatabaseName -ResourceId $DeletedDatabase.ResourceID
4. Select the deleted SQL Data Warehouse that you want to restore.
5. Specify a new Database name and click OK
Next Steps
Restore an existing data warehouse
Restore from a geo-backup data warehouse
Geo-restore Azure SQL Data Warehouse
7/23/2019 • 2 minutes to read • Edit Online
In this article, you learn to restore your data warehouse from a geo-backup through Azure portal and PowerShell.
Verify your DTU capacity. Each SQL Data Warehouse is hosted by a SQL server (for example,
myserver.database.windows.net) which has a default DTU quota. Verify that the SQL server has enough remaining
DTU quota for the database being restored. To learn how to calculate DTU needed or to request more DTU, see
Request a DTU quota change.
NOTE
You can perform a geo-restore to Gen2! To do so, specify a Gen2 ServiceObjectiveName (e.g. DW1000c) as an optional
parameter.
Connect-AzAccount
Get-AzSubscription
Select-AzSubscription -SubscriptionName $SubscriptionName
Get-AzureSqlDatabase -ServerName $ServerName
The recovered database will be TDE-enabled if the source database is TDE-enabled.
3. Fill out the information requested in the Basics tab and click Next: Additional settings.
4. For Use existing data parameter, select Backup and select the appropriate backup from the scroll down
options. Click Review + Create.
5. Once the data warehouse has been restored, check that the Status is Online.
Next Steps
Restore an existing data warehouse
Restore a deleted data warehouse
Analyze data with Azure Machine Learning
5/17/2019 • 3 minutes to read • Edit Online
This tutorial uses Azure Machine Learning to build a predictive machine learning model based on data stored in
Azure SQL Data Warehouse. Specifically, this builds a targeted marketing campaign for Adventure Works, the
bike shop, by predicting if a customer is likely to buy a bike or not.
Prerequisites
To step through this tutorial, you need:
A SQL Data Warehouse pre-loaded with AdventureWorksDW sample data. To provision this, see Create a SQL
Data Warehouse and choose to load the sample data. If you already have a data warehouse but do not have
sample data, you can load sample data manually.
SELECT [CustomerKey]
,[GeographyKey]
,[CustomerAlternateKey]
,[MaritalStatus]
,[Gender]
,cast ([YearlyIncome] as int) as SalaryYear
,[TotalChildren]
,[NumberChildrenAtHome]
,[EnglishEducation]
,[EnglishOccupation]
,[HouseOwnerFlag]
,[NumberCarsOwned]
,[CommuteDistance]
,[Region]
,[Age]
,[BikeBuyer]
FROM [dbo].[vTargetMail]
5. Then, click Launch column selector in the Properties pane. Select the BikeBuyer column as the column to
predict.
4. Score the model
Now, we will test how the model performs on test data. We will compare the algorithm of our choice with a
different algorithm to see which performs better.
1. Drag Score Model module into the canvas and connect it to Train Model and Split Data modules.
2. Drag the Two-Class Bayes Point Machine into the experiment canvas. We will compare how this algorithm
performs in comparison to the Two-Class Boosted Decision Tree.
3. Copy and Paste the modules Train Model and Score Model in the canvas.
4. Drag the Evaluate Model module into the canvas to compare the two algorithms.
5. Run the experiment.
6. Click the output port at the bottom of the Evaluate Model module and click Visualize.
The metrics provided are the ROC curve, precision-recall diagram and lift curve. Looking at these metrics, we can
see that the first model performed better than the second one. To look at what the first model predicted, click
the output port of the Score Model module and click Visualize.
You will see two more columns added to your test dataset.
Scored Probabilities: the likelihood that a customer is a bike buyer.
Scored Labels: the classification done by the model – bike buyer (1) or not (0). The probability threshold for
labeling is set to 50% and can be adjusted.
Comparing the column BikeBuyer (actual) with the Scored Labels (prediction), you can see how well the model
has performed. As next steps, you can use this model to make predictions for new customers and publish this
model as a web service or write results back to SQL Data Warehouse.
Next steps
To learn more about building predictive machine learning models, refer to Introduction to Machine Learning on
Azure.
Get started quickly with Fivetran and SQL Data
Warehouse
5/17/2019 • 2 minutes to read • Edit Online
This quickstart describes how to set up a new Fivetran user to work with Azure SQL Data Warehouse. The article
assumes that you have an existing instance of SQL Data Warehouse.
Set up a connection
1. Find the fully qualified server name and database name that you use to connect to SQL Data Warehouse.
If you need help finding this information, see Connect to Azure SQL Data Warehouse.
2. In the setup wizard, choose whether to connect your database directly or by using an SSH tunnel.
If you choose to connect directly to your database, you must create a firewall rule to allow access. This is the
simplest and most secure method.
If you choose to connect by using an SSH tunnel, Fivetran connects to a separate server on your network.
The server provides an SSH tunnel to your database. You must use this method if your database is in an
inaccessible subnet on a virtual network.
3. Add the IP address 52.0.2.4 to your server-level firewall to allow incoming connections to your SQL Data
Warehouse instance from Fivetran.
For more information, see Create a server-level firewall rule.
CONTROL permission is required to create database-scoped credentials that are used when a user loads
files from Azure Blob storage by using PolyBase.
3. Add a suitable resource class to the Fivetran user. The resource class you use depends on the memory that's
required to create a columnstore index. For example, integrations with products like Marketo and Salesforce
require a higher resource class because of the large number of columns and the larger volume of data the
products use. A higher resource class requires more memory to create columnstore indexes.
We recommend that you use static resource classes. You can start with the staticrc20 resource class. The
staticrc20 resource class allocates 200 MB for each user, regardless of the performance level you use. If
columnstore indexing fails at the initial resource class level, increase the resource class.
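A sketch of creating the user and assigning the resource class (the user name and password are placeholders, and the example assumes a contained database user for Fivetran):
-- Create a contained database user for Fivetran (placeholder credentials)
CREATE USER fivetran_user WITH PASSWORD = '<StrongPassword>';

-- Add the user to a static resource class
EXEC sp_addrolemember 'staticrc20', 'fivetran_user';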
For more information, read about memory and concurrency limits and resource classes.
Sign in to Fivetran
To sign in to Fivetran, enter the credentials that you use to access SQL Data Warehouse:
Host (your server name).
Port.
Database.
User (the user name should be fivetran@server_name where server_name is part of your Azure host URI:
server_name.database.windows.net).
Password.
Striim Azure SQL DW Marketplace Offering Install
Guide
5/17/2019 • 2 minutes to read • Edit Online
This quickstart assumes that you already have a pre-existing instance of SQL Data Warehouse.
Search for Striim in the Azure Marketplace, and select the Striim for Data Integration to SQL Data Warehouse
(Staged) option
Configure the Striim VM with specified properties, noting down the Striim cluster name, password, and admin
password
Once deployed, click on <VM Name>-masternode in the Azure portal, click Connect, and copy the Login using VM
local account
Execute the following commands to move the JDBC jar file into Striim’s lib directory, and restart the server:
1. sudo su
2. cd /tmp
3. mv sqljdbc42.jar /opt/striim/lib
4. systemctl stop striim-node
5. systemctl stop striim-dbms
6. systemctl start striim-dbms
7. systemctl start striim-node
Now, open your favorite browser and navigate to <DNS Name>:9080
Log in with the username and the password you set up in the Azure portal, and select your preferred wizard to get
started, or go to the Apps page to start using the drag and drop UI
Use Azure Stream Analytics with SQL Data
Warehouse
5/17/2019 • 2 minutes to read • Edit Online
Azure Stream Analytics is a fully managed service providing low-latency, highly available, scalable complex event
processing over streaming data in the cloud. You can learn the basics by reading Introduction to Azure Stream
Analytics. You can then learn how to create an end-to-end solution with Stream Analytics by following the Get
started using Azure Stream Analytics tutorial.
In this article, you will learn how to use your Azure SQL Data Warehouse database as an output sink for your
Stream Analytics jobs.
Prerequisites
First, run through the following steps in the Get started using Azure Stream Analytics tutorial.
1. Create an Event Hub input
2. Configure and start event generator application
3. Provision a Stream Analytics job
4. Specify job input and query
Then, create an Azure SQL Data Warehouse database
Step 4
Click the check button to add this job output and to verify that Stream Analytics can successfully connect to the
database.
When the connection to the database succeeds, you will see a notification in the portal. You can click Test to test the
connection to the database.
Next steps
For an overview of integration, see SQL Data Warehouse integration overview.
For more development tips, see SQL Data Warehouse development overview.
Visualize data with Power BI
5/24/2019 • 2 minutes to read • Edit Online
This tutorial shows you how to use Power BI to connect to SQL Data Warehouse and create a few basic
visualizations.
Prerequisites
To step through this tutorial, you need:
A SQL Data Warehouse pre-loaded with the AdventureWorksDW database. To provision a data warehouse,
see Create a SQL Data Warehouse and choose to load the sample data. If you already have a data warehouse
but do not have sample data, you can load WideWorldImportersDW.
2. Create a report
You are now ready to use Power BI to analyze your AdventureWorksDW sample data. To perform the analysis,
AdventureWorksDW has a view called AggregateSales. This view contains a few of the key metrics for analyzing
the sales of the company.
1. To create a map of sales amount according to postal code, in the right-hand fields pane, click the
AggregateSales view to expand it. Click the PostalCode and SalesAmount columns to select them.
Power BI automatically recognized geographic data and put it in a map for you.
2. This step creates a bar graph that shows amount of sales per customer income. To create the bar graph, go
to the expanded AggregateSales view. Click the SalesAmount field. Drag the Customer Income field to the
left and drop it into Axis.
You now have a report that shows three different visualizations of the data.
You can save your progress at any time by clicking File and selecting Save.
This article lists common troubleshooting techniques around connecting to your SQL Data Warehouse.
Check service availability
Check for paused or scaling operation
Check your firewall settings
Check your VNet/Service Endpoint settings
Check for the latest drivers
Check your connection string
Intermittent connection issues
Common error messages
The status of your SQL Data Warehouse will be shown here. If the service isn't showing as Available, check
further steps.
If your Resource health shows that your data warehouse is paused or scaling, follow the guidance to resume your
data warehouse.
If you see that your service is paused or scaling, check whether it falls within your maintenance schedule. On the
portal, in your SQL Data Warehouse Overview, you'll see the elected maintenance schedule.
Otherwise, check with your IT administrator to verify that this maintenance isn't a scheduled event. To resume the
SQL Data Warehouse, follow the steps outlined here.
Server=tcp:{your_server}.database.windows.net,1433;Database={your_database};User ID={your_user_name};Password={your_password_here};Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;
jdbc:sqlserver://yourserver.database.windows.net:1433;database=yourdatabase;user={your_user_name};password={your_password_here};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;
You can change the default database collation from the Azure portal when you create a new Azure SQL Data
Warehouse database. This capability makes it even easier to create a new database using one of the 3800
supported database collations for SQL Data Warehouse. Collations provide the locale, code page, sort order and
character sensitivity rules for character-based data types. Once chosen, all columns and expressions requiring
collation information inherit the chosen collation from the database setting. The default inheritance can be
overridden by explicitly stating a different collation for a character-based data type.
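For example, a column-level override might look like the following sketch (table and column names are illustrative):
CREATE TABLE dbo.CollationDemo
(
    id INT NOT NULL,
    code VARCHAR(20) COLLATE SQL_Latin1_General_CP1_CS_AS
)
WITH ( DISTRIBUTION = ROUND_ROBIN );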
Changing collation
To change the default collation, you simply update the Collation field in the provisioning experience.
For example, if you wanted to change the default collation to case sensitive, you would change the Collation
from SQL_Latin1_General_CP1_CI_AS to SQL_Latin1_General_CP1_CS_AS.
Links to the documentation for T-SQL language elements supported in Azure SQL Data Warehouse.
Core elements
syntax conventions
object naming rules
reserved keywords
collations
comments
constants
data types
EXECUTE
expressions
KILL
IDENTITY property workaround
PRINT
USE
Operators
+ (Add)
+ (String Concatenation)
- (Negative)
- (Subtract)
* (Multiply)
/ (Divide)
Modulo
Functions
@@DATEFIRST
@@ERROR
@@LANGUAGE
@@SPID
@@TRANCOUNT
@@VERSION
ABS
ACOS
ASCII
ASIN
ATAN
ATN2
BINARY_CHECKSUM
CASE
CAST and CONVERT
CEILING
CHAR
CHARINDEX
CHECKSUM
COALESCE
COL_NAME
COLLATIONPROPERTY
CONCAT
COS
COT
COUNT
COUNT_BIG
CUME_DIST
CURRENT_TIMESTAMP
CURRENT_USER
DATABASEPROPERTYEX
DATALENGTH
DATEADD
DATEDIFF
DATEFROMPARTS
DATENAME
DATEPART
DATETIME2FROMPARTS
DATETIMEFROMPARTS
DATETIMEOFFSETFROMPARTS
DAY
DB_ID
DB_NAME
DEGREES
DENSE_RANK
DIFFERENCE
EOMONTH
ERROR_MESSAGE
ERROR_NUMBER
ERROR_PROCEDURE
ERROR_SEVERITY
ERROR_STATE
EXP
FIRST_VALUE
FLOOR
GETDATE
GETUTCDATE
HAS_DBACCESS
HASHBYTES
INDEXPROPERTY
ISDATE
ISNULL
ISNUMERIC
LAG
LAST_VALUE
LEAD
LEFT
LEN
LOG
LOG10
LOWER
LTRIM
MAX
MIN
MONTH
NCHAR
NTILE
NULLIF
OBJECT_ID
OBJECT_NAME
OBJECTPROPERTY
OBJECTPROPERTYEX
ODBC scalar functions
OVER clause
PARSENAME
PATINDEX
PERCENTILE_CONT
PERCENTILE_DISC
PERCENT_RANK
PI
POWER
QUOTENAME
RADIANS
RAND
RANK
REPLACE
REPLICATE
REVERSE
RIGHT
ROUND
ROW_NUMBER
RTRIM
SCHEMA_ID
SCHEMA_NAME
SERVERPROPERTY
SESSION_USER
SIGN
SIN
SMALLDATETIMEFROMPARTS
SOUNDEX
SPACE
SQL_VARIANT_PROPERTY
SQRT
SQUARE
STATS_DATE
STDEV
STDEVP
STR
STUFF
SUBSTRING
SUM
SUSER_SNAME
SWITCHOFFSET
SYSDATETIME
SYSDATETIMEOFFSET
SYSTEM_USER
SYSUTCDATETIME
TAN
TERTIARY_WEIGHTS
TIMEFROMPARTS
TODATETIMEOFFSET
TYPE_ID
TYPE_NAME
TYPEPROPERTY
UNICODE
UPPER
USER
USER_NAME
VAR
VARP
YEAR
XACT_STATE
Transactions
transactions
Diagnostic sessions
CREATE DIAGNOSTICS SESSION
Procedures
sp_addrolemember
sp_columns
sp_configure
sp_datatype_info_90
sp_droprolemember
sp_execute
sp_executesql
sp_fkeys
sp_pdw_add_network_credentials
sp_pdw_database_encryption
sp_pdw_database_encryption_regenerate_system_keys
sp_pdw_log_user_data_masking
sp_pdw_remove_network_credentials
sp_pkeys
sp_prepare
sp_spaceused
sp_special_columns_100
sp_sproc_columns
sp_statistics
sp_tables
sp_unprepare
SET statements
SET ANSI_DEFAULTS
SET ANSI_NULL_DFLT_OFF
SET ANSI_NULL_DFLT_ON
SET ANSI_NULLS
SET ANSI_PADDING
SET ANSI_WARNINGS
SET ARITHABORT
SET ARITHIGNORE
SET CONCAT_NULL_YIELDS_NULL
SET DATEFIRST
SET DATEFORMAT
SET FMTONLY
SET IMPLICIT_TRANSACTIONS
SET LOCK_TIMEOUT
SET NUMERIC_ROUNDABORT
SET QUOTED_IDENTIFIER
SET ROWCOUNT
SET TEXTSIZE
SET TRANSACTION ISOLATION LEVEL
SET XACT_ABORT
Next steps
For more reference information, see T-SQL statements in Azure SQL Data Warehouse, and System views in Azure
SQL Data Warehouse.
T-SQL statements supported in Azure SQL Data
Warehouse
7/24/2019 • 2 minutes to read • Edit Online
Links to the documentation for T-SQL statements supported in Azure SQL Data Warehouse.
Query statements
SELECT
WITH common_table_expression
EXCEPT and INTERSECT
EXPLAIN
FROM
Using PIVOT and UNPIVOT
GROUP BY
HAVING
ORDER BY
OPTION
UNION
WHERE
TOP
Aliasing
Search condition
Subqueries
Security statements
Permissions: GRANT, DENY, REVOKE
ALTER AUTHORIZATION
ALTER CERTIFICATE
ALTER DATABASE ENCRYPTION KEY
ALTER LOGIN
ALTER MASTER KEY
ALTER ROLE
ALTER USER
BACKUP CERTIFICATE
CLOSE MASTER KEY
CREATE CERTIFICATE
CREATE DATABASE ENCRYPTION KEY
CREATE LOGIN
CREATE MASTER KEY
CREATE ROLE
CREATE USER
DROP CERTIFICATE
DROP DATABASE ENCRYPTION KEY
DROP LOGIN
DROP MASTER KEY
DROP ROLE
DROP USER
OPEN MASTER KEY
Next steps
For more reference information, see T-SQL language elements in Azure SQL Data Warehouse, and System views
in Azure SQL Data Warehouse.
System views supported in Azure SQL Data
Warehouse
8/16/2019 • 2 minutes to read • Edit Online
Links to the documentation for the system views supported in Azure SQL Data Warehouse.
NOTE
To use these views, insert ‘pdw_nodes_’ into the name, as shown in the following table:
DMV NAME IN SQL DATA WAREHOUSE SQL SERVER TRANSACT-SQL ARTICLE
sys.dm_pdw_nodes_db_column_store_row_group_physical_stats sys.dm_db_column_store_row_group_physical_stats
sys.dm_pdw_nodes_db_column_store_row_group_operational_stats sys.dm_db_column_store_row_group_operational_stats
sys.dm_pdw_nodes_db_file_space_usage sys.dm_db_file_space_usage
sys.dm_pdw_nodes_db_index_usage_stats sys.dm_db_index_usage_stats
sys.dm_pdw_nodes_db_partition_stats sys.dm_db_partition_stats
sys.dm_pdw_nodes_db_session_space_usage sys.dm_db_session_space_usage
sys.dm_pdw_nodes_db_task_space_usage sys.dm_db_task_space_usage
sys.dm_pdw_nodes_exec_background_job_queue sys.dm_exec_background_job_queue
sys.dm_pdw_nodes_exec_background_job_queue_stats sys.dm_exec_background_job_queue_stats
sys.dm_pdw_nodes_exec_cached_plans sys.dm_exec_cached_plans
sys.dm_pdw_nodes_exec_connections sys.dm_exec_connections
sys.dm_pdw_nodes_exec_procedure_stats sys.dm_exec_procedure_stats
sys.dm_pdw_nodes_exec_query_memory_grants sys.dm_exec_query_memory_grants
sys.dm_pdw_nodes_exec_query_optimizer_info sys.dm_exec_query_optimizer_info
sys.dm_pdw_nodes_exec_query_resource_semaphores sys.dm_exec_query_resource_semaphores
sys.dm_pdw_nodes_exec_query_stats sys.dm_exec_query_stats
sys.dm_pdw_nodes_exec_requests sys.dm_exec_requests
sys.dm_pdw_nodes_exec_sessions sys.dm_exec_sessions
sys.dm_pdw_nodes_io_pending_io_requests sys.dm_io_pending_io_requests
sys.dm_pdw_nodes_io_virtual_file_stats sys.dm_io_virtual_file_stats
sys.dm_pdw_nodes_os_buffer_descriptors sys.dm_os_buffer_descriptors
sys.dm_pdw_nodes_os_child_instances sys.dm_os_child_instances
sys.dm_pdw_nodes_os_cluster_nodes sys.dm_os_cluster_nodes
sys.dm_pdw_nodes_os_dispatcher_pools sys.dm_os_dispatcher_pools
sys.dm_pdw_nodes_os_hosts sys.dm_os_hosts
sys.dm_pdw_nodes_os_latch_stats sys.dm_os_latch_stats
sys.dm_pdw_nodes_os_memory_brokers sys.dm_os_memory_brokers
sys.dm_pdw_nodes_os_memory_cache_clock_hands sys.dm_os_memory_cache_clock_hands
sys.dm_pdw_nodes_os_memory_cache_counters sys.dm_os_memory_cache_counters
sys.dm_pdw_nodes_os_memory_cache_entries sys.dm_os_memory_cache_entries
sys.dm_pdw_nodes_os_memory_cache_hash_tables sys.dm_os_memory_cache_hash_tables
sys.dm_pdw_nodes_os_memory_clerks sys.dm_os_memory_clerks
sys.dm_pdw_nodes_os_memory_nodes sys.dm_os_memory_nodes
sys.dm_pdw_nodes_os_memory_objects sys.dm_os_memory_objects
sys.dm_pdw_nodes_os_memory_pools sys.dm_os_memory_pools
sys.dm_pdw_nodes_os_nodes sys.dm_os_nodes
sys.dm_pdw_nodes_os_performance_counters sys.dm_os_performance_counters
sys.dm_pdw_nodes_os_process_memory sys.dm_os_process_memory
sys.dm_pdw_nodes_os_schedulers sys.dm_os_schedulers
sys.dm_pdw_nodes_os_sys_info sys.dm_os_sys_info
sys.dm_pdw_nodes_os_sys_memory sys.dm_os_sys_memory
sys.dm_pdw_nodes_os_tasks sys.dm_os_tasks
sys.dm_pdw_nodes_os_threads sys.dm_os_threads
sys.dm_pdw_nodes_os_virtual_address_dump sys.dm_os_virtual_address_dump
sys.dm_pdw_nodes_os_wait_stats sys.dm_os_wait_stats
sys.dm_pdw_nodes_os_waiting_tasks sys.dm_os_waiting_tasks
sys.dm_pdw_nodes_os_workers sys.dm_os_workers
sys.dm_pdw_nodes_tran_active_snapshot_database_transactions sys.dm_tran_active_snapshot_database_transactions
sys.dm_pdw_nodes_tran_active_transactions sys.dm_tran_active_transactions
sys.dm_pdw_nodes_tran_commit_table sys.dm_tran_commit_table
sys.dm_pdw_nodes_tran_current_snapshot sys.dm_tran_current_snapshot
sys.dm_pdw_nodes_tran_current_transaction sys.dm_tran_current_transaction
sys.dm_pdw_nodes_tran_database_transactions sys.dm_tran_database_transactions
sys.dm_pdw_nodes_tran_locks sys.dm_tran_locks
sys.dm_pdw_nodes_tran_session_transactions sys.dm_tran_session_transactions
sys.dm_pdw_nodes_tran_top_version_generators sys.dm_tran_top_version_generators
Next steps
For more reference information, see T-SQL statements in Azure SQL Data Warehouse, and T-SQL language
elements in Azure SQL Data Warehouse.
PowerShell cmdlets and REST APIs for SQL Data
Warehouse
5/6/2019 • 2 minutes to read • Edit Online
Many SQL Data Warehouse administration tasks can be managed using either Azure PowerShell cmdlets or REST
APIs. Below are some examples of how to use PowerShell commands to automate common tasks in your SQL
Data Warehouse. For some good REST examples, see the article Manage scalability with REST.
NOTE
This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will
continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM
compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure
PowerShell.
Connect-AzAccount
Get-AzSubscription
Select-AzSubscription -SubscriptionName "MySubscription"
As a variation, this example pipes the retrieved object to Suspend-AzSqlDatabase. As a result, the database is paused.
The final command shows the results.
As a variation, this example retrieves a database named "Database02" from a server named "Server01" that is
contained in a resource group named "ResourceGroup1." It pipes the retrieved object to Resume-AzSqlDatabase.
NOTE
Note that if your server is foo.database.windows.net, use "foo" as the -ServerName in the PowerShell cmdlets.
Next steps
For more PowerShell examples, see:
Create a SQL Data Warehouse using PowerShell
Database restore
For other tasks which can be automated with PowerShell, see Azure SQL Database Cmdlets. Note that not all
Azure SQL Database cmdlets are supported for Azure SQL Data Warehouse. For a list of tasks which can be
automated with REST, see Operations for Azure SQL Database.
REST APIs for Azure SQL Data Warehouse
6/3/2019 • 2 minutes to read • Edit Online
Scale compute
To change the data warehouse units, use the Create or Update Database REST API. The following example sets the
data warehouse units to DW1000 for the database MySQLDW, which is hosted on server MyServer. The server is
in an Azure resource group named ResourceGroup1.
PATCH https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Sql/servers/{server-name}/databases/{database-name}?api-version=2014-04-01-preview HTTP/1.1
Content-Type: application/json; charset=UTF-8
{
    "properties": {
        "requestedServiceObjectiveName": "DW1000"
    }
}
Pause compute
To pause a database, use the Pause Database REST API. The following example pauses a database named
Database02 hosted on a server named Server01. The server is in an Azure resource group named
ResourceGroup1.
POST https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Sql/servers/{server-name}/databases/{database-name}/pause?api-version=2014-04-01-preview HTTP/1.1
Resume compute
To start a database, use the Resume Database REST API. The following example starts a database named
Database02 hosted on a server named Server01. The server is in an Azure resource group named
ResourceGroup1.
POST https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Sql/servers/{server-name}/databases/{database-name}/resume?api-version=2014-04-01-preview HTTP/1.1
GET https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Sql/servers/{server-name}/databases/{database-name}/maintenanceWindows/current?maintenanceWindowName=current&api-version=2017-10-01-preview HTTP/1.1
PUT https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Sql/servers/{server-name}/databases/{database-name}/maintenanceWindows/current?maintenanceWindowName=current&api-version=2017-10-01-preview HTTP/1.1
{
    "properties": {
        "timeRanges": [
            {
                "dayOfWeek": "Saturday",
                "startTime": "00:00",
                "duration": "08:00"
            },
            {
                "dayOfWeek": "Wednesday",
                "startTime": "00:00",
                "duration": "08:00"
            }
        ]
    }
}
Next steps
For more information, see Manage compute.
How to create a support ticket for SQL Data
Warehouse
3/15/2019 • 2 minutes to read • Edit Online
If you are having any issues with your SQL Data Warehouse, create a support ticket so the engineering support
team can assist you.
3. On the Help + Support blade, click New support request and fill out the Basics blade.
Select your Azure support plan.
Billing, quota, and subscription management support are available at all support levels.
Break-fix support is provided through Developer, Standard, Professional Direct, or Premier
support. Break-fix issues are problems experienced by customers while using Azure where there
is a reasonable expectation that Microsoft caused the problem.
Developer mentoring and advisory services are available at the Professional Direct and
Premier support levels.
If you have a Premier support plan, you can also report SQL Data Warehouse related issues on
the Microsoft Premier online portal. See Azure support plans to learn more about the various
support plans, including scope, response times, pricing, etc. For frequently asked questions about
Azure support, see Azure support FAQs.
4. Fill out the Problem blade.
NOTE
By default, each SQL server (for example, myserver.database.windows.net) has a DTU quota of 45,000. This
quota is simply a safety limit. You can increase your quota by creating a support ticket and selecting Quota as
the request type. To calculate your DTU needs, multiply 7.5 by the total DWU needed. For example, to host two
DW6000s on one SQL server you need 7.5 × 12,000 = 90,000 DTU, so you should request a DTU quota of 90,000. You
can view your current DTU consumption from the SQL server blade in the portal. Both paused and unpaused databases
count toward the DTU quota.
Watch the latest Azure SQL Data Warehouse videos to learn about new capabilities and performance
improvements.
To get started, select the overview video below to learn about the new updates to Azure SQL Data Warehouse.
Also, learn how Modern Data Warehouse patterns can be used to tackle real world scenarios such as cybercrime.
To create your end-to-end data warehouse solution, choose from a wide variety of industry-leading tools. This
article highlights Microsoft partner companies with official business intelligence (BI) solutions supporting Azure
SQL Data Warehouse.
Next Steps
To learn more about some of our other partners, see Data Integration partners and Data Management partners.
SQL Data Warehouse data integration partners
5/17/2019 • 3 minutes to read • Edit Online
To create your data warehouse solution, choose from a wide variety of industry-leading tools. This article
highlights Microsoft partner companies with official data integration solutions supporting Azure SQL Data
Warehouse.
2. Informatica PowerCenter
PowerCenter is a metadata-driven data integration platform that jumpstarts and accelerates data integration
projects in order to deliver data to the business more quickly than manual hand coding. It serves as the
foundation for your data integration investments.
To create your data warehouse solution, choose from a wide variety of industry-leading tools. This article
highlights Microsoft partner companies with data management tools and solutions supporting Azure SQL Data
Warehouse.
Next Steps
To learn more about other partners, see Business Intelligence partners and Data Integration partners.