Practical AI in SQL Server 2025: A Vector Demo Database For You

Today, I have the honor and pleasure of debuting a new presentation for MSSQLTips: A Practical Introduction to Vector Search in SQL Server 2025 (you can watch the recording here too). To accompany that new presentation, I opted to create a new demo database instead of retrofitting one of my existing demo databases. And I’m sharing it with you so you don’t have to go through the headache of taking an existing database and creating vector embeddings.

RecipesDemoDB

Background about the Database

This new database was built with SQL Server 2025 CTP 2.1. Backed up using ZSTD-high compression, it weighs in at around 16GB, striped across 8 backup files.
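If it helps, restoring the striped backup will look something like the sketch below. Treat it as a starting point only – the backup file names, logical file names, and target paths here are placeholders, so run RESTORE FILELISTONLY against the stripes first and adjust to match what it reports and where you want the files to land.

-- Confirm the logical file names inside the striped backup
RESTORE FILELISTONLY
FROM DISK = N'D:\Backups\RecipesDemoDB_1.bak', DISK = N'D:\Backups\RecipesDemoDB_2.bak',
     DISK = N'D:\Backups\RecipesDemoDB_3.bak', DISK = N'D:\Backups\RecipesDemoDB_4.bak',
     DISK = N'D:\Backups\RecipesDemoDB_5.bak', DISK = N'D:\Backups\RecipesDemoDB_6.bak',
     DISK = N'D:\Backups\RecipesDemoDB_7.bak', DISK = N'D:\Backups\RecipesDemoDB_8.bak';
GO

-- Restore the database from all 8 stripes, moving the files to your own paths
RESTORE DATABASE RecipesDemoDB
FROM DISK = N'D:\Backups\RecipesDemoDB_1.bak', DISK = N'D:\Backups\RecipesDemoDB_2.bak',
     DISK = N'D:\Backups\RecipesDemoDB_3.bak', DISK = N'D:\Backups\RecipesDemoDB_4.bak',
     DISK = N'D:\Backups\RecipesDemoDB_5.bak', DISK = N'D:\Backups\RecipesDemoDB_6.bak',
     DISK = N'D:\Backups\RecipesDemoDB_7.bak', DISK = N'D:\Backups\RecipesDemoDB_8.bak'
WITH MOVE N'RecipesDemoDB' TO N'D:\SQLData\RecipesDemoDB.mdf',         -- logical names are placeholders;
     MOVE N'RecipesDemoDB_log' TO N'D:\SQLLogs\RecipesDemoDB_log.ldf', -- use what FILELISTONLY reports
     STATS = 10;
GO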

The dbo.recipes table contains just under 500k recipes and weighs in at about 2GB. This data was sourced from Kaggle and is a dump of recipes from food.com.

Next, there are other tables under the vectors schema that contain the vector embeddings. The naming scheme is such that each table corresponds to the same-named column in dbo.recipes, e.g. dbo.recipes.description -> vectors.recipes_description. There is also one table called recipes_other_cols, which is a JSON concatenation of some of the shorter columns from dbo.recipes – name, servings, and serving_size. Each of the vectors.* tables also has a vector index. All of the vector data is about 22-23GB, bringing the total database to about 24-25GB in full.

And finally, there are a few example stored procedures with KNN and ANN code examples. I would also suggest checking out my Practical Intro to Vector Search repo, which has some other demo code.

You’ll still need to have Ollama set up and make a few changes to match your own environment. Make sure you use the same embedding model that I did (nomic-embed-text) so any vector embeddings you subsequently create match.
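To give you an idea of where this leads, here's a rough sketch of a KNN-style similarity query against the demo database. Treat it as illustrative only: it assumes an external model named ollama_nomic_embed_text has been created for your Ollama endpoint, and it assumes the vector tables carry a recipe_id column pointing back to dbo.recipes plus an embedding column – check the actual column names and the included stored procedures for the real thing.

-- Turn a natural language question into a "question vector"
DECLARE @question_vector VECTOR(768);   -- nomic-embed-text produces 768 dimensions
SELECT @question_vector = AI_GENERATE_EMBEDDINGS(
    N'a quick chicken dinner ready in under 30 minutes' USE MODEL ollama_nomic_embed_text);

-- KNN similarity search: which recipe descriptions sit closest to the question vector?
SELECT TOP (10)
    r.name,
    r.description,
    VECTOR_DISTANCE('cosine', v.embedding, @question_vector) AS distance
FROM vectors.recipes_description AS v
JOIN dbo.recipes AS r
    ON r.recipe_id = v.recipe_id          -- assumed key/column names
ORDER BY distance;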

Lastly, there is also a sub-folder on the demo-dbs repo that has all of the different “steps” I took to create the various tables and generate the vector embeddings.

Why Should I Use this Database? Creating Vector Embeddings Takes Time

I am running a Lenovo ThinkPad P14s with an NVIDIA GeForce RTX 3080 GPU connected via Thunderbolt 3 to an external GPU enclosure. For the ~500k recipes and 5 or 6 embedding tables, the entire process took a full weekend. I don’t have an exact time, because I’d kick off one table to process, come back later or the next day, validate the data, then run the next one. So yeah, it took a while, which is why I thought I’d share this database to save time for others.

Wrapping Up

If you decide to start using this demo database for your own learning and testing of vector search, I’d love to hear about it. And if you write any interesting demo code that you’d be willing to share, that’d be amazing as well! As always, please let me know if you run into any quirks or have any feedback.

Happy learning – thanks for reading!

Practical AI in SQL Server 2025: Introducing Vector Search – Pt 1

SQL Server 2025 is on the horizon, and one of its new capabilities is support for Vector Search.

To start off, I’ll confess that I was an initial skeptic about AI and its practical uses. I’m not a fan of overused blanket terms/technology and clickbait fads… NoSQL will eliminate all SQL databases… Cloud means the end of on-prem data centers… blah, blah, blah. But I’ve taken time to dig deeper into Vector Search, its role under the umbrella of AI, and I now believe it has practical, legitimate value.

Help Me Find Something

SQL (Structured Query Language) is all about querying data to find what we’re looking for. WHERE predicates enable us to do keyword searches, sometimes with wildcards, other times with ranges, and sometimes with crazy programmatic logic.

But this is not how we humans converse on a daily basis. If I’m talking to Deborah, I’m not going to ask her questions using formal SQL Syntax.

SELECT *
FROM dbo.dinner_recipes
WHERE ingredient_list LIKE '%chicken%'
AND cooking_time_min <= 30
AND rating > 4.0;

Instead, I’ll communicate with her using “natural” language.

“Hey Deborah. I see we have some chicken in the fridge. What can we make for dinner with that, in under 30 minutes, that’s something we’ve liked when we’ve cooked it before?”

And through the use of natural language, I’m not constrained with how I ask my questions. This gives my questioning abilities far greater power and flexibility.

“I see we have an onion, broccoli, and cheese in the fridge, plus some sourdough bread. Any ideas on what we can do with any of those things to accompany the chicken?”

So what if I could utilize the flexibility of natural language to query the data in a database? That is where vector search plays a key role in changing our data querying and retrieval methodology.

A High Level Breakdown First

Let’s pretend I have an application with search functionality. I’d argue that most applications’ searching capabilities are constrained and limited. We can only search via specific keywords, maybe with wildcards, ranges of values, and other fairly strict criteria.

But what if that application is upgraded with vector search capabilities? How would that work?

First, I could type a natural language question into an application. That application will encode my question into a vector; I’ll call it a “question vector” for conversation’s sake. Beforehand, I will have also encoded a subset of my human-readable data into vectors; I’ll refer to that set of vectors as my “dataset vectors.”

My question vector is then compared against my dataset vectors. Vector search will find the set of dataset vectors that are very close to my question vector. This is known as a Similarity Search. That result set of dataset vectors represents the information that answers my original question. The data behind those dataset vectors is then passed to a Large Language Model (LLM), which translates and summarizes the results into something that’s human consumable.
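To make that similarity-search step a little more concrete, here's a rough, hypothetical T-SQL sketch of what it could look like in SQL Server 2025. The table, columns, and model name are all made up for illustration – the real syntax gets covered in my other posts.

-- Encode the natural language question into a "question vector"
DECLARE @question_vector VECTOR(768);
SELECT @question_vector = AI_GENERATE_EMBEDDINGS(
    N'a chicken dinner ready in under 30 minutes that we have liked before'
    USE MODEL my_embedding_model);

-- Similarity search: find the dataset vectors closest to the question vector
SELECT TOP (10) r.recipe_name
FROM dbo.dinner_recipe_vectors AS v
JOIN dbo.dinner_recipes AS r
    ON r.recipe_id = v.recipe_id
ORDER BY VECTOR_DISTANCE('cosine', v.embedding, @question_vector);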

In a nutshell, I’ve just described an AI chatbot like ChatGPT. Yes, there are other details, nuances, and extra pieces of the puzzle, but this is a high-level introduction first.

How Can This Help Me Today?

The thing to really understand about this methodology is that it is very useful for analysis of existing data, particularly when looking for data that is “similar to X” as opposed to data that is “equal to X”.

What I’d like to encourage you to do is think creatively about the data in your databases today. What questions would you like to ask of your data that you cannot easily ask now with traditional query predicates?

Would it be helpful if you could ask of your data…

Are there any common trends throughout?

Are there any anomalies or interesting outliers?

Do we have any data that is similar to this other data?

How About a Real Business Example?

Let’s pretend you have a support ticket system. You regularly get customer tickets for various issues. Each ticket has free text, like the original customer’s complaint write-up, maybe copies of customer/support e-mail exchanges, and internal notes added by support personnel and other internal employees.

If I’m a new support person and I get a ticket about Error 5920, it’s fairly easy to do a traditional keyword search on “Error 5920.” This will help me to find prior tickets and learn how it was resolved by others in the past.

Now, what if I have a customer-written description of an error scenario? Maybe it has some steps to reproduce, but no explicit error message. How can I search that? I might try to cherry-pick some keywords to search with, but that can be hit or miss.

Now, if my ticket data was also vectorized and I had vector search capabilities, I could now start asking natural language questions. “I have a customer that clicked this button on this page, and it resulted in this unexpected behavior… has anyone else reported this for build version 3.12 or later?” While any search has the potential to yield false positives, this now has greater possibility to zero in on exactly what you need.

If I have vector search, I can also do bigger picture analysis of data.

Find common issues:
For all tickets logged in the last 30 days, are there any common issues that were reported repeatedly?

Find regressions:
Compare tickets from Build 3.12, 3.5, and 4.0. Are there any new common issues that appeared in 3.12, did not in 3.5, but are now back in 4.0?

Find interesting issues:
For Build 4.0, were there any tickets with much lengthier working notes that might indicate a very complex issue was encountered?

Find anomalies:
Instead of common issues, what issues were reported five times or less, in the last 60 days?

I would strongly argue that these are all very powerful questions to ask of your data, and they can be answered rapidly using the power of Vector Search.

Have an Open Mind & Think Creatively

There’s been a lot of hype and noise around AI. A lot of it is fluff. But I hope that you now have a better idea why I believe that Vector Search can be an extremely powerful tool to more quickly gain key insights into your data in ways that were not reasonably possible before. I would encourage you to think deeper about how these ideas can benefit you in your applications and day-to-day.

Stay tuned for Part 2, when I dive deeper into what a Vector is, and how Vector Search works.

Thanks for reading!

P.S.

Want to give SQL Server 2025 and Vector Search a try? Go download the Public Preview now and you can use my guide to get Ollama setup on your local machine too.

Practical AI in SQL Server 2025: Ollama Quick Start

With the announcement of SQL Server 2025 Public Preview, hopefully you are interested in test driving Vector Search.

Microsoft has already posted demo code, but it’s only for OpenAI on Azure. Many of us are wondering about running things locally instead. So I thought I’d share a step-by-step guide to getting Ollama set up and running locally on my laptop. End-to-end, these instructions should take less than 30 minutes to complete.

If you’re on the Docker bandwagon, jump over to Anthony Nocentino’s fast start with Docker Compose instead!

My Setup

My laptop is a Thinkpad P14s Gen 5 AMD, with a Ryzen 7 PRO 8840 HS, 64GB of RAM, and 4TB of storage and running Windows 11 Pro. I also have a Razer Core X eGPU with an NVidia RTX 3070 connected to my laptop.

I run VMware Workstation on my laptop, so I have multiple SQL Server VMs, including a fresh one for SQL Server 2025. I want Ollama to use my eGPU, but because VMware Workstation does not support GPU passthrough (to my knowledge), I have to install Ollama on my base OS. As such, you’ll see some extra steps here that can be skipped if you’re just doing everything either locally or within a VM.

You’ll need:

  • The Ollama for Windows installer
  • nginx for Windows
  • mkcert

Ollama Setup

First, just run the exe & click through the wizard. That’s easy!

Open a PowerShell prompt and run this:

ollama run llama3.2

This will download the llama3.2 model and begin an interactive session after it completes setup of the model. You can start typing and have a conversation with the model!


Once you’re done, you can exit interactive mode (type /bye), but Ollama should still remain running in the background. Verify this by checking your System Tray for Ollama.

Now let’s confirm connectivity with this test. Run this in the same PowerShell prompt; you should get back a short “Ollama is running” response:

curl http://localhost:11434

Next, you’ll want to pull an embedding model down. There are several that you can choose from: I’m choosing nomic-embed-text.

ollama pull nomic-embed-text

Now let’s test again. Run this in your PowerShell Prompt:

Invoke-RestMethod  -Uri http://localhost:11434/api/embeddings `
     -Method Post  `
     -Body '{"model": "nomic-embed-text", "prompt": "This is a test sentence to embed."}' `
     -ContentType 'application/json'

NOTE: If you’re browsing other embedding models like mxbai-embed-large, you may see that there’s a curl REST API example command. When I was first playing around, I kept getting a “405 Method Not Allowed” error with that example command. This is because curl in PowerShell is an alias for Invoke-WebRequest, not the actual curl binary that the (Linux-oriented) doc page assumes. So I just used Invoke-RestMethod instead.

Next, we must address one annoyance that I have with Ollama – it only listens on HTTP, not HTTPS. SQL Server can only communicate over HTTPS (and rightfully so). To resolve this, we must set up a reverse proxy. This is where nginx and mkcert come into play.

mkcert Setup

Again in your PowerShell Prompt:

mkcert -install

dir C:\Users\<YOUR USERNAME>\AppData\Local\mkcert\rootCA.pem


Next we’ll create a certificate for our Windows machine. My IP address is 192.168.1.160.

mkcert <YOUR IP ADDRESS> localhost 127.0.0.1 ::1


Go to your nginx folder, create a new subfolder called certs, and copy the two new .pem files there.


Next we’ll set up nginx. Just unzip the download and switch to that folder. We’ll now modify \conf\nginx.conf within the nginx folder.

In the default nginx.conf file, there will be a large segment that starts with http and is enclosed by { } braces. Replace all of that with this (BE SURE TO MAKE CHANGES FOR YOUR ENV):

http {
    server {
        listen 443 ssl;
        server_name <YOUR IP ADDRESS HERE>;

        ssl_certificate      ../certs/<YOUR CERT FILENAME HERE>.pem;
        ssl_certificate_key  ../certs/<YOUR CERT FILENAME HERE>-key.pem;

        location / {
            proxy_pass http://localhost:11434;
            proxy_http_version 1.1;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}

Save and start nginx.

nginx

NOTE: This will run nginx actively within the prompt session you have open. To kill it, you’ll want to run the following command in another session.

nginx -s stop

Later, if you want to start nginx and run it in the background:

start nginx

Now let’s do a quick test again. The first request will go directly to Ollama, the second will go through nginx to Ollama, and the third will hit the embeddings API through nginx:

curl http://localhost:11434

curl https://localhost

Invoke-RestMethod -Uri https://localhost/api/embeddings `
   -Method Post `
   -Body '{"model": "nomic-embed-text", "prompt": "This is a test sentence to embed."}' `
   -ContentType 'application/json'

Note the differences:
* The first goes to the original HTTP URI with the Ollama port specified.
* The second goes to HTTPS but does not need the Ollama port specified, because the nginx reverse proxy is listening on 443 and passing traffic to 11434.
* And finally, the third hits the REST API via HTTPS and creates a vector embedding.

VMware Workstation Setup

Next, let’s switch to our VM and get remote access going.

First, we must import the root certificate that we created earlier to our VM.

  1. Copy C:\Users\<YOUR USERNAME>\AppData\Local\mkcert\rootCA.pem to somewhere on your VM.
  2. Open the Windows Certificate Manager (certmgr.msc)
    • Start -> “Manage computer certificates”
  3. Navigate to Trusted Root Certification Authorities -> Certificates
  4. Right click Certificates -> Import
  5. Navigate to your copied rootCA.pem (you’ll want to change the file type filter so that *.pem files are visible)
  6. Complete the wizard


Now let’s test from within the VM:

curl https://192.168.1.160

Invoke-RestMethod -Uri https://192.168.1.160/api/embeddings `
  -Method Post `
  -Body '{"model": "nomic-embed-text", "prompt": "This is a test sentence to embed."}' `
  -ContentType 'application/json'

Test within SQL Server 2025

Finally, let’s confirm that this all works within SQL Server!

First, some setup stuff.

-----
-- SETUP
USE master;
GO

-- Enable rest endpoint
sp_configure 'external rest endpoint enabled', 1
GO
RECONFIGURE WITH OVERRIDE;
GO

-- Turn on Trace Flags for Vector Search
DBCC TRACEON (466, 474, 13981, -1) 
GO

-- Check trace flags status
DBCC TraceStatus
GO

Now we’ll use some new T-SQL to create an External Model. This will communicate with Ollama’s HTTPS REST API endpoint that we took the time to create earlier.

-----
-- Create external model
USE CookbookDemo
GO

DROP EXTERNAL MODEL ollama_nomic_embed_text
GO

CREATE EXTERNAL MODEL ollama_nomic_embed_text
WITH (
    LOCATION = 'https://192.168.1.160:443/api/embed',
    API_FORMAT = 'Ollama',
    MODEL_TYPE = EMBEDDINGS,
    MODEL = 'nomic-embed-text'
);
GO

-- Verify
SELECT *
FROM sys.external_models;
GO

Now we will create some vector embeddings using the new T-SQL function AI_GENERATE_EMBEDDINGS().

The below code uses my CookbookDemo database. Of course, you’re welcome to use AdventureWorks or any other database of your choosing, but just make sure to change the final example SELECT statement.

-----
-- Test!
SELECT AI_GENERATE_EMBEDDINGS(
        N'this is some text that I want to turn into a vector embedding' USE MODEL ollama_nomic_embed_text
    ) AS GeneratedEmbedding;


-----
-- Run against a table with some data
SELECT 
    TOP 5000
    RecipeName,
    Ingredients,
    AI_GENERATE_EMBEDDINGS(
            Ingredients USE MODEL ollama_nomic_embed_text
        ) AS Ingredients_Embeddings
FROM dbo.Recipes_Flat;


Note that Ollama is using my external GPU (GPU 1 – 3D) to process the Ingredients records into Vector Embeddings! It took about 3 minutes to generate 5000 embeddings with my hardware. I do plan on looking into tweaking Ollama and my setup to improve performance. But hey, this works!

What Next?

Hopefully this quick start guide helped you get Ollama up and running, so you can start playing around now.

If you want to keep diving deeper, you’ll want to create another table with a Vector datatype, and store a foreign key plus the embedding value. Then you can start using vector search T-SQL to do some KNN searching, or create a DiskANN Index to do some ANN search. Have no idea what I just said?!? Don’t worry – I’ll be writing more blog posts to walk you through all of this!
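If you’d like a head start on that, here’s a minimal sketch under a few assumptions: a RecipeID key column on dbo.Recipes_Flat (hypothetical), the nomic-embed-text model (768 dimensions), and plain KNN searching via VECTOR_DISTANCE – no DiskANN index yet.

-- A table that stores a foreign key plus the embedding value
CREATE TABLE dbo.Recipes_Ingredients_Vectors
(
    RecipeID              INT         NOT NULL PRIMARY KEY,  -- points back to dbo.Recipes_Flat
    Ingredients_Embedding VECTOR(768) NOT NULL               -- nomic-embed-text = 768 dimensions
);
GO

-- Populate it with embeddings generated by the external model
INSERT INTO dbo.Recipes_Ingredients_Vectors (RecipeID, Ingredients_Embedding)
SELECT TOP 5000
    RecipeID,
    AI_GENERATE_EMBEDDINGS(Ingredients USE MODEL ollama_nomic_embed_text)
FROM dbo.Recipes_Flat;
GO

-- KNN search: which recipes have ingredients closest to my question?
DECLARE @q VECTOR(768);
SELECT @q = AI_GENERATE_EMBEDDINGS(
    N'a quick weeknight meal with chicken and broccoli' USE MODEL ollama_nomic_embed_text);

SELECT TOP (10)
    r.RecipeName,
    r.Ingredients,
    VECTOR_DISTANCE('cosine', v.Ingredients_Embedding, @q) AS distance
FROM dbo.Recipes_Ingredients_Vectors AS v
JOIN dbo.Recipes_Flat AS r
    ON r.RecipeID = v.RecipeID
ORDER BY distance;
GO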

And don’t forget to check out my other blogs on Vector Search too!

Thanks for reading!

Combat Database Bloat with Data Virtualization

One of SQL Server 2022’s new features is something called Data Virtualization. It enables T-SQL to directly query files that reside in Azure object storage or S3-compatible object storage. In my opinion, it’s one of those underrated capabilities that many have glossed over since SQL Server 2022’s release. But I strongly believe it is insanely useful, and you should take a few minutes to learn more!

Data Bloat Anyone?

Ask yourself: do you have a database with a large volume of data that will never change again? Sales order history data is a generic but very relatable example. Once an order is placed, it might be amended, but after a few months, it’s highly probable that it’ll never change again. We’ll pretend for the sake of our discussion that after 3 months, that data is considered set in stone.

Can We PLEASE Archive/Delete That Old Data?

Who has begged their business colleagues to let them de-bloat a database by archiving off and deleting ancient data from a primary production database? I see your raised hands and raise mine too. There are many practical reasons to de-bloat a database, from performance to maintenance. But then there are also the business folks who say “no, we can’t, because we MIGHT need to report on it.” Might… So like my stubborn beer belly fat, that data bloat remains.

Meet Parquet (… not butter)

We should all know what a Comma Separated Values (CSV) flat file is. Because of its flat, singular nature, it’s not very efficient to query, because the entirety of the file must be scanned, consumed, processed, etc. But what if we had a different file format whose entirety did not need to be fully processed when querying? Wouldn’t it be awesome to have a flat file format that could offer column elimination, row elimination, and compression for our data and queries? Let me introduce you to Parquet.

Pretend your sales order history table has 30 different columns of data. And let’s say you were to export all of your sales order history into individual CSVs per calendar year. If you wanted to query, say, all sales in May, June, and July of 2021 and get a SUM of total sales, you’d have to consume the entirety of the 2021 CSV file. But with a Parquet file, metadata is utilized to let you zero in on the rows for May, June, and July only. And instead of all 30 columns, you can retrieve the sales date and sales amount ONLY, and not consume the other 28 columns! That amounts to tremendous savings from a workload perspective!

Data Virtualization

So how does SQL Server 2022 come into play here? First, we create something called an External Data Source, which in this case will point to S3 object storage somewhere. Then we combine it with an External File Format for our Parquet files. The third piece of the solution is CREATE EXTERNAL TABLE AS SELECT (aka CETAS).

The end result is that each Parquet file becomes available to us within T-SQL as an External Table entity. So I can write T-SQL to query a Parquet file, and my code will look identical to querying a regular table.
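Before the CETAS statement below will work, the External Data Source and External File Format need to exist. Here's a minimal sketch of that one-time setup – the S3 endpoint, bucket, and access keys are placeholders for your environment, and it assumes the PolyBase feature is installed:

-- One-time instance configuration
EXEC sp_configure 'polybase enabled', 1;
EXEC sp_configure 'allow polybase export', 1;   -- required for CETAS to write files
RECONFIGURE;
GO

-- Credential for the S3-compatible object storage (requires a database master key)
CREATE DATABASE SCOPED CREDENTIAL s3_dv_credential
WITH IDENTITY = 'S3 Access Key',
     SECRET = '<AccessKeyID>:<SecretKey>';
GO

-- Point an External Data Source at the bucket
CREATE EXTERNAL DATA SOURCE cetas_demo
WITH (
    LOCATION = 's3://<your-s3-endpoint>:<port>/<your-bucket>',
    CREDENTIAL = s3_dv_credential
);
GO

-- Describe the file format we'll be reading and writing
CREATE EXTERNAL FILE FORMAT parquet_file_format_object
WITH (FORMAT_TYPE = PARQUET);
GO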

CREATE EXTERNAL TABLE parquet.SalesOrderHistory_2021
WITH (
	LOCATION = '/SalesOrderHistory_2021.parquet', 
	DATA_SOURCE = cetas_demo, 
	FILE_FORMAT = parquet_file_format_object
)
AS 
SELECT 
	[SalesID], 
	[SalesDate],
	[ProductID],
	[Email],
	[PhoneNumber],
	[OtherColumns]
FROM dbo.SalesOrderHistory   -- the original (pre-split) sales history table
WHERE SalesDate >= '2021-01-01' AND SalesDate < '2022-01-01';
GO

SELECT SalesID, SalesDate
FROM parquet.SalesOrderHistory_2021
WHERE SalesDate >= '2021-05-01' AND SalesDate < '2021-08-01'

Now you may have noticed that we have a singular table for 2021 data, i.e. one table per Parquet file. You might be thinking: will that be a problem if I have many years’ worth of data, and thus many Parquet files? This is where another old T-SQL trick comes into play – partitioned views. I can virtualize multiple Parquet files via multiple External Tables, but overlay them all with a single partitioned view, which then lets me query everything as I once did.

CREATE OR ALTER VIEW dbo.SalesOrderHistory
AS
SELECT 
	[SalesID], [SalesDate], [ProductID], [Email], [PhoneNumber], [OtherColumns]
FROM parquet.SalesOrderHistory_2020
UNION ALL
SELECT 
	[SalesID], [SalesDate], [ProductID], [Email], [PhoneNumber], [OtherColumns]
FROM parquet.SalesOrderHistory_2021
UNION ALL
SELECT 
	[SalesID], [SalesDate], [ProductID], [Email], [PhoneNumber], [OtherColumns]
FROM parquet.SalesOrderHistory_2022
UNION ALL
SELECT 
	[SalesID], [SalesDate], [ProductID], [Email], [PhoneNumber], [OtherColumns]
FROM parquet.SalesOrderHistory_2023
UNION ALL
SELECT 
	[SalesID], [SalesDate], [ProductID], [Email], [PhoneNumber], [OtherColumns]
FROM parquet.SalesOrderHistory_2024
UNION ALL
SELECT 
	[SalesID], [SalesDate], [ProductID], [Email], [PhoneNumber], [OtherColumns]
FROM dbo.SalesOrderHistory_Current
GO

SELECT SalesID, SalesDate
FROM dbo.SalesOrderHistory
WHERE SalesDate >= '2021-05-01' AND SalesDate < '2021-08-01'

Be sure to look closely at the final SELECT in the partitioned view. That’s a reference to a traditional SQL Server table! What this means is that instead of having a dbo.SalesOrderHistory table that contains everything, including new orders, I’ve physically partitioned the data out into 5 Parquet files, virtualized via External Tables, AND still included all current sales order data as well. So existing T-SQL queries against dbo.SalesOrderHistory (written before the Parquet split) have no idea and do NOT need to be changed!

You Mentioned Azure & S3… That Means I Need to Cloud, Right?

Whenever people (myself included, for the longest time) hear “S3” and/or “object storage,” they typically mentally default to AWS S3 Object Storage. But let’s break that apart. In brief, “object storage” is essentially a storage architecture geared towards unstructured data like images, videos, and documents. “S3” is Amazon’s “Simple Storage Service,” which I’d loosely describe as Amazon’s implementation approach for object storage. The good news here is that you’re not just stuck with Amazon. S3-compatible object storage is also a thing. This is where other vendors have their own object storage implementations but leverage the Amazon S3 API.

But my data is on-prem and staying that way. This is where another common misconception comes in. Because “Amazon S3 object storage” has become such a common term, many do not realize that S3-compatible on-prem storage is a thing! And yes, this is where Pure Storage comes into the picture with our FlashBlade series. You can have all of this awesomeness on-prem without having to consider compromising with the cloud.

I Want To See This in Action!

Hopefully you’re chomping at the bit to give this a go. I’d like to leave you with three resources to help you get jump-started:

  1. Watch… A New Solution for Data Archiving using Data Virtualization
    This is my Pure Storage vendor session from PASS Summit 2023, that covers everything end to end with demos!
  2. Learn… Microsoft’s Introduction to SQL Server 2022 Data Virtualization learning path
    Go learn straight from Microsoft!
  3. Try… Pure Storage’s Modern Storage Platforms for SQL Server workshop
    Regardless of whether you’re a Pure Storage customer or not, you can review Module 4 of this workshop.
    And if you are a Pure Storage customer WITH a FlashBlade, use this workshop to try it out for yourself!
    And if you are a Pure Storage customer but do not have a FlashBlade, you can contact your Account Team and request a Test Drive voucher for this workshop. This will give you access to a virtual lab with hardware, to work through the lab workshop.

Thanks for reading!

Rapidly Recovering from SQL Server Data Mishaps with FlashArray Snapshots

Beginning of a Bad Day

Almost all of us have experienced a time when someone incorrectly modifies or deletes a subset of data in a database. And usually the only recourse is to run a database restore to get the data that we need back.

But… what if we cannot simply run a RESTORE operation over the entire database? Maybe the accident occurred a few hours prior and you cannot roll everything back, so you need to do a side-by-side RESTORE.

But… what if the database in question is absolutely massive and will take hours to restore? And what if your server doesn’t have enough available storage to fit a second copy of that database at all?

FlashArray Snapshots to the Rescue!

If you’re fortunate to have your SQL Servers backed by Pure Storage, you can solve this quickly with minimal headache. The key is FlashArray snapshots.

This entire process does NOT require application consistent snapshots either. You can utilize crash consistent snapshots (that do NOT require VSS and do NOT stun your SQL Server when taken), and hopefully you’re already taking these today on your FlashArray. Because of how our snapshots work behind the scenes, you will not consume additional capacity on your storage array either. Best of all, it does not matter if your database was 50GB or 50TB – all of these actions will be nearly instantaneous.

I Need to Execute an Object-Level Restore RIGHT NOW!

Here is a step-by-step video walkthrough. Watch and pause it along the way, to emulate the steps I am doing.

Here are the functional steps (starting at 5m 23s):

  1. Add new volume(s) to your SQL Server, of the exact same size as the existing volume(s) that contain your database’s data and log files.
  2. In the FlashArray Protection Groups UI, select a snapshot and find the volume(s) you wish to clone – aka “Copy Snapshot.” Overwrite the volume(s) created in step 1 with the member volume(s) of the Protection Group snapshot.
  3. In the Windows OS – Disk Management – find the newly added volume(s)/disk(s) and Set Online.
  4. Attach the newly cloned database files in SQL Server (with a different database name, of course).
  5. Use T-SQL to insert/update/copy from your restored database back to your main database (see the sketch after this list).
  6. Clean-up: drop the restored database, set the volume(s)/disk(s) offline in Windows, and delete the disks from your VM.
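For steps 4 and 5, a minimal T-SQL sketch might look like the below – the database names, file paths, table, and key range are all placeholders for your own situation:

-- Step 4: attach the cloned data/log files under a different database name
CREATE DATABASE SalesDB_Restored
ON (FILENAME = N'E:\RestoredData\SalesDB.mdf'),
   (FILENAME = N'F:\RestoredLog\SalesDB_log.ldf')
FOR ATTACH;
GO

-- Step 5: copy the mangled rows from the restored copy back into production
INSERT INTO SalesDB.dbo.Orders (OrderID, CustomerID, OrderTotal)
SELECT OrderID, CustomerID, OrderTotal
FROM SalesDB_Restored.dbo.Orders
WHERE OrderID BETWEEN 100000 AND 100500;   -- the range that was accidentally deleted
GO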

Assumptions & Pre-Requisites:

  • Your SQL Server is a VMware VM – this will work with bare metal as well, but with a slight adjustment to the steps
  • Your VMware VM is using vVols – this will work with RDMs as well, but with a slight adjustment to the steps. VMFS will also work, but you’ll need a slightly different set of steps which I’ll cover in a future blog
  • The volumes containing your SQL Server data and log files are being snapshotted by FlashArray on a pre-scheduled basis, all as part of a Protection Group

Conclusion

If you’re NOT in the midst of an emergency right now, I’d encourage you to practice this on a non-Prod server. Once you grasp the steps and workflow, you’ll see how easy it is and can now add this tool/technique to your arsenal for the next time someone accidentally mangles some data in your database.

Thanks for reading!

Career Thoughts: Never Would I Ever…

Today, Nov 1st, marks my 3-year anniversary at Pure Storage. And this milestone has put me into an introspective mood.

What Do I Want to Be When I Grow Up?

When I studied Computer Science at Marquette, I figured I’d wind up becoming a software developer of some sort. Or maybe a systems administrator. My first job out of college was as a generalist web developer & sys admin, so I got exposed to a ton of different things.

Then I specialized into SQL Server… first as a Jr. Ops DBA, but then I shifted solely into being a T-SQL developer. I thrived there and figured I’d remain in that realm for the rest of my career.

Then after a decade-plus, I wound up finding myself back in an Ops DBA role. I never really thought I’d pivot back that way since, at the time, I barely even understood SQL Server clustering options, much less had ever set up or managed an FCI or AG outside of a training class. Even so, after a few roles mixing Dev DBA and Ops DBA, I figured I’d remain solely in the SQL Server tech realm for the rest of my career.

Then I got a call (or DM really)… and landed my first role as a Sales Engineer/Solutions Engineer for SentryOne. I never imagined I’d ever find myself in some kind of “sales” type role at all. And boy was it foreign to me. But others saw potential in me which is why I got recruited, and they were right, and I thrived there. And I figured I may remain in that realm the rest of my career.

Then 3 years ago, I joined Pure Storage. What was foreign to me about Pure is that I was never a hardware guy. I still remember sitting in “intro to storage” sessions at SQL Saturdays, just trying to wrap my brain around what the hell latency and IOPS really meant. iSCSI & Fibre Channel were foreign to me as well. And I had only rudimentary knowledge of VMware, virtualization, HA, and DR concepts and strategies.

And now, 3 years later, these are all things I talk about on a daily basis. Next week, I’m debuting a new session that I’m calling A Practical Deep Dive into I/O for the T-SQL Performance Tuner. It’s actually geared towards “Andy 5 years ago” who was definitely a T-SQL Perf Tuner but still not totally a hardware or I/O stack person. I’m still amazed at how far I’ve grown. And I’m still learning new stuff all the time!

TL;DR – Never Say Never…

The point of all of this, and of this blog post, is to share how I’ve found myself in completely new realms throughout my career. And though today you might think “I never could or would do that,” you really don’t know what life has in store for you. But if you have an open heart, an open mind, and an eagerness to learn, you absolutely can.

Thanks for reading.