The Diskspd Storage Performance Tool
DiskSpd is a highly customizable I/O load generator tool that can be used to run storage performance
tests against files, partitions, or physical disks. DiskSpd can generate a wide variety of disk request
patterns for use in analyzing and diagnosing storage performance issues, without running a full end-to-
end workload. You can simulate SQL Server I/O activity or more complex, changing access patterns,
returning detailed XML output for use in automated results analysis.
This document corresponds to version 2.0.17 of DiskSpd.
Contents
1 Vision for the DiskSpd tool
1.1 Acknowledgments
2 The DiskSpd command and parameters
2.1 DiskSpd basic parameters
2.2 DiskSpd event parameters
2.3 ETW parameters for use with DiskSpd.exe
2.4 Size conventions for DiskSpd parameters
3 Customizing DiskSpd tests
3.1 Display usage information
3.2 Set a test duration
3.3 Control caching
3.4 Set random or sequential access hints
3.5 Block Size
3.6 Test random I/O
3.7 Test sequential I/O
3.8 Perform a write test
3.9 Set a base target offset
3.10 Set the maximum target offset
3.11 Limit the total number of threads
3.12 Access from multiple threads & thread stride
3.13 Number of outstanding I/O requests
3.14 Balance queues
3.15 Alternatively, specify “think” time and I/O Bursts
3.16 Rate limits
3.17 Use completion routines instead of I/O completion ports
3.18 Set CPU affinity
3.19 Create test files automatically
3.20 Separate buffers for Read and Write operations
3.21 Include performance counters
3.22 Display a progress indicator
3.23 Control the initial state of the random number generator
3.24 Run DiskSpd in verbose mode
3.25 Use named events to synchronize testing
3.26 Use an XML file to provide DiskSpd parameters
4 Canceling a test run (CTRL+C)
5 Analyzing DiskSpd test results
5.1 Latency
5.2 IOPs statistics
6 XML results processing
7 Sample command lines
8 Future Improvements
8.1 Verification of written data
8.2 Dynamic warm-up
1 Vision for the DiskSpd tool
The DiskSpd tool provides the functionality needed to generate a wide variety of disk request patterns,
helpful in diagnosis and analysis of storage-based performance issues. For example, it can be used to
simulate SQL Server I/O activity and more complex patterns of access which change over time. It enables
the user to analyze storage performance without running a full end-to-end workload.
DiskSpd presents results both as a text summary and in a detailed XML form suitable for automated
result analysis.
1.1 Acknowledgments
Previous versions of DiskSpd were developed under Jim Gray by Peter Kukol from Microsoft Research,
results from which can be found here: http://research.microsoft.com/barc/Sequential_IO/.
A binary release of DiskSpd can be obtained here: http://aka.ms/diskspd
DiskSpd is open source (MIT License) and can be found here: https://github.com/microsoft/diskspd
2 The DiskSpd command and parameters
2.1 DiskSpd basic parameters

Parameter Description
-? Displays usage information for DiskSpd.
-ag Group affinity - affinitize threads in a round-robin manner
across Processor Groups, starting at group 0.
This is the default. Use -n to disable affinity.
-ag#,#[,#,...] Advanced CPU affinity - affinitize threads round-robin to the
CPUs provided. The g# notation specifies Processor Groups
for the following CPU core #s. Multiple Processor Groups may
be specified, and groups/cores may be repeated. If no group
is specified, 0 is assumed.
Additional groups/processors may be added, comma
separated, or on separate parameters.
Examples:
-a0,1,2 and -ag0,0,1,2 are equivalent.
-ag0,0,1,2,g1,0,1,2 specifies the first three cores in groups 0
and 1. -ag0,0,1,2 -ag1,0,1,2 is an equivalent way of specifying
the same pattern with two -ag# arguments.
-b<size>[K|M|G] Block size in bytes or KiB, MiB, or GiB (default = 64K)
-B<offset>[K|M|G|b] Base target offset in bytes or KiB, MiB, GiB, or blocks from the
beginning of the target (default offset = zero)
-c<size>[K|M|G|b] Create files of the specified size. Size can be stated in bytes or
KiB, MiB, GiB, or blocks.
-C<seconds> Cool down time in seconds - continued duration of the test
load after measurements are complete (default = zero
seconds).
-D<milliseconds> Capture IOPs higher-order statistics in intervals of
<milliseconds>. These are per-thread per-target: text output
provides IOPs standard deviation, XML provides the full IOPs
time series in addition (default = 1000ms or 1 second).
-d<seconds> Duration of measurement period in seconds, not including
cool-down or warm-up time (default = 10 seconds).
-f<size>[K|M|G|b] Target size - use only the first <size> bytes or KiB, MiB, GiB or
blocks of the specified targets, for example to test only the
first sectors of a disk.
-f<rst> Open file with one or more additional access hints specified
to the operating system:
r : the FILE_FLAG_RANDOM_ACCESS hint
s : the FILE_FLAG_SEQUENTIAL_SCAN hint
t : the FILE_ATTRIBUTE_TEMPORARY hint
Note that these hints are generally only applicable to cached
IO.
-F<count> Total number of threads. Conflicts with -t, the option to set
the number of threads per file.
-g<bytes per ms> Throughput per-thread per-target is throttled to the given
number of bytes per millisecond. This option is incompatible
with completion routines (-x).
-h Deprecated but still honored; see -Sh.
-i<count> Number of IOs (burst size) to issue before pausing. Must be
specified in combination with -j.
-j<milliseconds> Pause in milliseconds before issuing a burst of IOs. Must be
specified in combination with -i.
-I<priority> Set IO priority to <priority>. Available values are: 1-very low,
2-low, 3-normal (default).
-l Use large pages for IO buffers.
-L Measure latency statistics. Full per-thread per-target
distributions are available with XML result output.
-n Disable default affinity (-a).
-o<count> Number of outstanding I/O requests per target per thread (1
= synchronous I/O, unless more than one thread is specified
by using -F) (default = 2).
-p Start asynchronous (overlapped) I/O operations with the
same offset. Only applicable with 2 or more outstanding I/O
requests per thread (-o2 or greater)
-P<count> Print a progress dot after each <count> (default = 65536)
completed I/O operations, counted separately by each
thread.
-r<alignment>[K|M|G|b] Random I/O aligned to the specified number of <alignment>
bytes or KiB, MiB, GiB, or blocks. Overrides -s.
-R[text|xml] Display test results in either text or XML format (default:
text).
-s[i]<size>[K|M|G|b] Sequential stride size, offset between subsequent I/O
operations in bytes or KiB, MiB, GiB, or blocks. Ignored if -r
specified (default access = sequential, default stride = block
size).
By default each thread tracks its own sequential offset. If the
optional interlocked (i) qualifier is used, a single interlocked
offset is shared between all threads operating on a given
target so that the threads cooperatively issue a single
sequential pattern of access to the target.
-S[bhruw] This flag modifies the caching and write-through modes for
the test target. Any non-conflicting combination of modifiers
can be specified (-Sbu conflicts; -Shw specifies w twice) and
is order independent (-Suw and -Swu are equivalent).
By default, caching is on and write-through is not specified.
-S No modifying flags specified: disable software caching.
Deprecated but still honored; see -Su.
This opens the target with the FILE_FLAG_NO_BUFFERING
flag. This is included in -Sh.
-Sb Enable software cache (default, explicitly stated).
Can be combined with w.
-Sh Disable both software caching and hardware write caching.
This opens the target with the FILE_FLAG_NO_BUFFERING
and FILE_FLAG_WRITE_THROUGH flags, and is equivalent to -
Suw.
-Sr Disable local caching for remote filesystems. This leaves the
remote system’s cache enabled.
Can be combined with w.
-Su Disable software caching, for unbuffered IO.
This opens the target with the FILE_FLAG_NO_BUFFERING
flag. This option is equivalent to -S with no modifiers. Can be
combined with w.
-Sw Enable write-through IO.
This opens the target with the FILE_FLAG_WRITE_THROUGH
flag. This can be combined with either buffered (-Sw or -Sbw)
or unbuffered IO (-Suw). It is included in -Sh.
Note: SATA HDDs will generally not honor write-through
intent on individual IOs. Devices with persistent write caches
- certain enterprise flash drives, and most storage arrays -
will complete write-through writes when the write is stable in
cache. In both cases, -S / -Su and -Sh / -Suw will see
equivalent behavior.
-t<count> Number of threads per target. Conflicts with -F, which
specifies the total number of threads.
-T<offset>[K|M|G|b] Stride size between I/O operations performed on the same
target by different threads, in bytes or KiB, MiB, GiB, or blocks
(default stride size = 0; starting offset = base file offset +
(<thread number> * <offset>)). Meaningful only when the
number of threads per target > 1.
-v Verbose mode
-w<percentage> Percentage of write requests to issue (default = 0, 100%
read). The following are equivalent and result in a 100% read-
only workload: omitting -w, specifying -w with no percentage,
and -w0.
IMPORTANT: a write test will destroy existing data without
warning.
-W<seconds> Warmup time - duration of the test before measurements
start (default = 5 seconds).
-x Use I/O completion routines instead of I/O completion ports
for cases specifying more than one IO per thread (see -o) [1].
Unless there is a specific reason to explore differences in the
completion model, this should generally be left at the default.
-X<filepath> Use an XML file for configuring the workload. Cannot be used
with other parameters. XML output <Profile> block is a
template. See the diskspd.xsd file for details.
-z[seed] Set random seed to specified integer value. With no -z,
seed=0. With plain -z, seed is based on system run time.
-Z Zero the per-thread I/O buffers. Relevant for write tests. By
default, the buffers are filled with a repeating pattern (0, 1, 2,
..., 255, 0, 1, ...)
-Z<size>[K|M|G|b] Separate read and write buffers, and initialize a per-target
write source buffer sized to the specified number of bytes or
KiB, MiB, GiB, or blocks. This write source buffer is initialized
with random data, and per-IO write data is selected from it at
4-byte granularity.

[1] See the Synchronous and Asynchronous I/O topic here for more details:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365683(v=vs.85).aspx
-Z<size>[K|M|G|b],<file> Same, but using a file as the source of data to fill the write
source buffers.
2.2 DiskSpd event parameters

Parameter Description
-ys<eventname> Signals event <eventname> before starting the actual run (no
warmup). Creates a notification event if <eventname> does
not exist.
-yf<eventname> Signals event <eventname> after the test run completes (no
cooldown). Creates a notification event if <eventname> does
not exist.
-yr<eventname> Waits on event <eventname> before starting the test run
(including warmup). Creates a notification event if
<eventname> does not exist.
-yp<eventname> Stops the run when event <eventname> is set. CTRL+C is
bound to this event. Creates a notification event if
<eventname> does not exist.
-ye<eventname> Sets event <eventname> and quits.
2.3 ETW parameters for use with DiskSpd.exe

Parameter Description
-e<q|c|s> Use query perf timer (qpc), cycle count, or system timer
respectively (default = q, query perf timer (qpc))
-ep Use paged memory for the NT Kernel Logger (default = non-
paged memory).
-ePROCESS Capture process start and end events.
-eTHREAD Capture thread start and end events.
-eIMAGE_LOAD Capture image load events.
-eDISK_IO Capture physical disk I/O events.
-eMEMORY_PAGE_FAULTS Capture all page fault events.
-eMEMORY_HARD_FAULTS Capture hard fault events.
-eNETWORK Capture TCP/IP, UDP/IP send and receive events.
-eREGISTRY Capture registry call events.
2.4 Size conventions for DiskSpd parameters

Fractional sizes with decimal points such as 10.5 are not allowed.
3.3 Control caching

File access must be for numbers of bytes that are integer multiples of the volume's sector size.
For example, if the sector size is 512 bytes, an application can request reads and writes of 512,
1024, or 2048 bytes, but not of 335, 981, or 7171 bytes.
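As an illustrative sketch (not part of DiskSpd), the alignment rule above can be expressed as a simple check; the helper name and the 512-byte default sector size are assumptions for the example:

```python
def is_valid_io_size(nbytes, sector_size=512):
    """Hypothetical helper: unbuffered I/O requires request sizes that
    are positive integer multiples of the volume's sector size."""
    return nbytes > 0 and nbytes % sector_size == 0

# the sizes from the text: 512/1024/2048 are valid, 335/981/7171 are not
print([n for n in (512, 1024, 2048, 335, 981, 7171) if is_valid_io_size(n)])
# -> [512, 1024, 2048]
```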
The -S or -Su parameter is equivalent to using FILE_FLAG_NO_BUFFERING on the Win32 CreateFile API.
The -Sh parameter disables both software caching and hardware write caching, and has the same
constraints that apply to disabling software caching. It is equivalent to using both
FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH on the Win32 CreateFile API. The
combination of -Suw is equivalent.
Write-through may be independently specified with -Sw. Stated in isolation, this is the same as
explicitly using -Sbw for cached write-through.
SATA HDDs will generally not honor write-through intent on individual IOs. Devices with persistent
write caches - certain enterprise flash drives, and most storage arrays - will complete write-through
writes when the write is stable in cache. In both cases, -S / -Su and -Sh / -Suw will see equivalent
behavior. In general, and barring bugs, when write-through affects write IO performance it indicates
that normal writes are not in stable caches and data may be lost on hardware or power events.
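The modifier-to-flag mapping described above can be sketched as follows. This is an illustrative model, not DiskSpd source; it covers only the b/u/h/w modifiers, and the flag values are the documented Win32 constants:

```python
# Win32 CreateFile flag values
FILE_FLAG_NO_BUFFERING  = 0x20000000
FILE_FLAG_WRITE_THROUGH = 0x80000000

def create_file_flags(modifiers):
    """Sketch of how -S modifiers map to CreateFile flags:
    bare -S ("") and "u" disable software caching, "h" is shorthand
    for "uw", and "w" requests write-through (combinable with b or u)."""
    flags = 0
    if modifiers == "" or "u" in modifiers or "h" in modifiers:
        flags |= FILE_FLAG_NO_BUFFERING
    if "w" in modifiers or "h" in modifiers:
        flags |= FILE_FLAG_WRITE_THROUGH
    return flags

# -Sh is equivalent to -Suw; bare -S is equivalent to -Su;
# -Sw / -Sbw is cached write-through
assert create_file_flags("h") == create_file_flags("uw")
assert create_file_flags("") == create_file_flags("u")
assert create_file_flags("bw") == FILE_FLAG_WRITE_THROUGH
```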
The -Sr parameter is specific to running tests over remote filesystems such as SMB, and disables the
local cache while leaving the remote system’s cache enabled. This can be useful when investigating the
remote filesystem’s wire transport performance and behavior, allowing a wire-limited read test to occur
without needing extremely fast storage subsystems. This can be combined with w (-Srw) to specify
write-through on the server.
[2] https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx
In conventional use, this allows a spill file to be created on the assumption that it will be quickly deleted,
avoiding all writes. This may be useful in focused performance tests for similar reasons.
3.9 Set a base target offset
To omit the beginning of a target during testing, use the -B parameter to specify a base target offset.
The default offset is zero. No I/O operations will be performed between the start of the target and the
base offset.
- random operations (-r) are issued at base + <random offset>
- sequential operations (-s) wrap back to the base offset when they pass the end of the target
For example, the following command runs a 10-second test on physical drive 1 with a block size of 8KiB,
and skips the first 10 blocks (80KiB):
diskspd -b8K -B10b #1
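The offset arithmetic can be sketched in a few lines. This is an illustrative model only; the target size is a made-up value for the example:

```python
import random

KIB = 1024
block = 8 * KIB              # -b8K
base = 10 * block            # -B10b: 10 blocks = 80 KiB
target_size = 1024 * KIB     # hypothetical 1 MiB target, for illustration

# random I/O (-r): offsets are base + a block-aligned random offset
span = (target_size - base) // block
rand_off = base + random.randrange(span) * block
assert base <= rand_off < target_size

# sequential I/O: wrap back to the base offset past the end of the target
def next_sequential(offset):
    offset += block
    return base if offset + block > target_size else offset

assert base == 80 * KIB
assert next_sequential(target_size - block) == base
```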
[Figure: multiple threads issuing I/O against a single target file]
All the parameters used are also explained in Figure 4.

[Figure 4. Parameters: base file offset (B), block size (b), stride size (s) and offset between
threads (T)]
In the previous example, while the pattern is suggestive, there is no interlock between the threads to
maintain a strictly sequential pattern of access to the storage (see further discussion in Section 3.7). It is
possible due to thread scheduling that the pattern could separate over time, with one or more threads
falling behind or racing ahead of their peers.
A second use case for thread stride is to create multiple spatially separated sequential streams on a
target:
diskspd -c3G -t3 -T1G -b4K -s c:\testfile
This pattern will create a 3GiB file and three threads, with each thread starting I/O at succeeding 1GiB
intervals.
Thread 1: 0, 4KiB, 8KiB, …
Thread 2: 1GiB, 1GiB+4KiB, 1GiB+8KiB, …
Thread 3: 2GiB, 2GiB+4KiB, 2GiB+8KiB, …
Thread stride need not be a multiple of sequential stride (or vice versa). When the end of file is
encountered, access wraps back to the beginning at an offset such that each thread will reproduce the
same IO offsets on its next sweep through the target. In the earlier examples each thread will loop back
to 0 (zero). Consider the following counter-example:
diskspd -c3G -t3 -T13K -b4K -s c:\testfile
In this case, the second thread will loop back to offset 1K and then produce 5K and 9K before
returning to 13K and continuing through the file again.
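The wrap-around arithmetic from this counter-example can be sketched as follows. This is an illustrative model of the documented behavior, not DiskSpd source; the wrap-to offset is computed as the thread's start offset modulo the block size, which reproduces both this example and the earlier wrap-to-zero case:

```python
KIB = 1024
file_size = 3 * 1024 * 1024 * KIB   # -c3G
block = 4 * KIB                     # -b4K
thread_stride = 13 * KIB            # -T13K
start = 1 * thread_stride           # the second thread (thread number 1)

def next_offset(offset):
    """Advance sequentially; wrap to an offset that reproduces the
    same I/O offsets on the next sweep (start modulo block size)."""
    offset += block
    if offset + block > file_size:
        offset = start % block      # 13K mod 4K = 1K
    return offset

# walk from the thread's start until it wraps, then take three more I/Os
offset = start
while offset >= start:              # exits at the wrapped offset
    offset = next_offset(offset)
seq = [offset]
for _ in range(3):
    offset = next_offset(offset)
    seq.append(offset)
assert seq == [1 * KIB, 5 * KIB, 9 * KIB, 13 * KIB]
```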
At the time the test starts, each thread issues its initial batch of I/O up to the limit created by -o. Then,
by default, as completion for one I/O operation is received another is issued to replace it in the queue.
For sequential I/O, by default the new operation will be issued with respect to the most recent I/O
operation started within the same thread. Figure 5 shows an example of this behavior with 3
outstanding I/Os per thread (-o3) and a sequential stride equal to the block size (-s). The next I/O
operation will start at the offset immediately after I/O #3, which is marked with a dashed line.
[Figure 5: three outstanding sequential I/Os (-o3) against a file; the next I/O will be issued at the
offset immediately following I/O 3]
The -p option creates a very specific pattern perhaps most suitable for cache stress, and its effect should
be carefully considered before use in a test.
while(bRun)
{
GetQueuedCompletionStatus //wait on IO Completion Port
CalculateNextOffset
RestartIOOperation
}
I/O Completion Ports have low overhead and are therefore convenient to use in I/O performance
measurements.
In the case of synchronous access (-o1), DiskSpd uses a different approach. Because only one I/O
operation is running at any time, I/O Completion Ports are not needed. Instead, I/O operations are
executed in a loop, as demonstrated by the following pseudo-code:
SetThreadIdealProcessor
AffinitizeThread
CreateIOCompletionPort
WaitForASignalToStart
while(bRun)
{
ReadFile
CalculateNextOffset
SetFilePointer
}
In both cases, the bRun global variable is used by the main thread to inform the worker threads how
long they should work. The main thread of the I/O request generator works in the following manner:
OpenFiles
CreateThreads
StartTimer
SendStartSignal
Sleep(duration)
bRun = false
StopTimer
WaitForThreadsToCleanUp
SendResultsToResultParser
3.17 Use completion routines instead of I/O completion ports
As stated earlier, DiskSpd by default uses I/O completion ports to refill outstanding operation queues.
However, Completion Routines can also be used. The -x parameter instructs DiskSpd to use I/O
completion routines instead of I/O completion ports.
When using completion routines, the next I/O is dispatched from within the completion routine, as
opposed to returning to a single master loop as with I/O completion ports.
For example, the following command creates two 100 MiB files, c:\test1 and d:\test2, and runs a 20-
second read test on both files:
diskspd -c100M -d20 c:\test1 d:\test2
When creating files, DiskSpd ensures that the valid data length as tracked by the filesystem is the same
as the size of the file prior to starting test operations. If possible an optimized fast path is taken using
the Win32 SetFileValidData API; however, if the test is run in a security context which does not have
access to the SeManageVolumePrivilege that API requires, the file must be written through once prior to
test operations, which may take significant time for large files. See the Win32 SetFileValidData API
reference [3] for more information. Administrative contexts generally have access to this privilege.
DiskSpd will display a warning if the slow path is taken.
WARNING: Could not set privileges for setting valid file size; will use a slower method of preparing
the file
IMPORTANT: the optimized fast path for extending valid data length may expose previously written but
logically deleted content from the storage subsystem. Ensure that if this path is used, either:
- test files are not accessible by unauthorized users
- the storage subsystem provides protection for previously deleted data
[3] SetFileValidData: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365544(v=vs.85).aspx
allows for many distinct block-sized patterns to be chosen from a small source buffer. For instance, an
8KiB write source buffer used for a 4KiB block test provides (8KiB - 4KiB)/4 = 1024 potentially unique
blocks.
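The arithmetic above can be checked directly. This is an illustrative sketch that mirrors the formula in the text; the 8 KiB buffer size is the text's own example value:

```python
KIB = 1024
source_buffer = 8 * KIB   # example -Z8K write source buffer
block = 4 * KIB           # -b4K test
granularity = 4           # per-IO write data is selected at 4-byte granularity

# number of potentially unique block-sized patterns, per the text's formula
candidates = (source_buffer - block) // granularity
assert candidates == 1024
```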
By default, write source buffers will be filled with random data. With the -Z<size>[K|M|G|b],<file>
form, a sample file can be provided whose initial contents (from byte zero) are used to initialize the
write source buffers. If the file is smaller than the desired buffer size, its content is repeated into the
buffer until the buffer is filled. Each write source buffer filled with random data will be distinct, but
each buffer filled from a sample file will have the same content.
By default, ETW data will be timestamped using the high-resolution query performance counter [4].
This can be adjusted as follows:
-eq : Query performance counter [default]
-ec : CPU cycle counter
-es : System time
The following command instructs DiskSpd to save trace data for physical disk I/O events and registry
events:
diskspd -eDISK_IO -eREGISTRY testfile.dat
The test will return data similar to the following:
ETW:
----
Disk I/O
Read: 128
Write: 28
Registry
NtCreateKey: 1
NtDeleteKey: 0
NtDeleteValueKey: 0
NtEnumerateKey: 0
NtEnumerateValueKey: 0
NtFlushKey: 0
KcbDump/create: 0
NtOpenKey: 222
NtQueryKey: 118
NtQueryMultipleValueKey: 0
NtQueryValueKey: 229
NtSetInformationKey: 0
NtSetValueKey: 0
Allocated Buffers: 7
LOST EVENTS: 0
LOST LOG BUFFERS: 0
LOST REAL TIME BUFFERS: 203
[4] Please see the discussion of the WNODE_HEADER ClientContext member at the following location
for more information on timestamp tradeoffs:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364160(v=vs.85).aspx
3.22 Display a progress indicator
A progress indicator can be helpful, especially for long-running tests and debugging purposes. By
default, no progress indicator is displayed, as it can affect performance. If you want to see one, use
the -P parameter to specify the number of completed I/O operations after which DiskSpd will print a
dot in the progress indicator. For example, -P10000 adds a dot after every 10,000 completed I/O
operations. The number of completed I/O operations is calculated independently by each thread.
signaled right before the measurements start; an event provided with -yf (for example,
-yfMyTestFinishedEvent) sends a notification right after measurements are completed.
5 Analyzing DiskSpd test results

DiskSpd provides per-thread per-target statistics on data read and written by each thread in terms of
total bytes, bandwidth and IOPs, in addition to any requested NT Kernel Logger events.
DiskSpd also provides a per-processor CPU utilization summary. It is currently limited to providing
CPU usage only for the Processor Group its main thread is created on. To collect CPU utilization on
systems with multiple Processor Groups, please use alternate mechanisms. This may be addressed in
a future update.
PowerShell code for processing batches of XML results into tabular form is provided in the Examples
(Section 6).
There are two more advanced sets of statistics which DiskSpd can optionally collect:
- latency, specified with the -L parameter
- IOPs statistics, specified with the -D parameter
5.1 Latency
Latency is gathered in per-thread per-target histograms at high precision. In text output, this results in
per-thread per-target per-operation-type average latency, along with a summary table of per-percentile
latencies, as in the following text output example:
%-ile | Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
min | 0.086 | 0.199 | 0.086
25th | 0.351 | 0.371 | 0.362
50th | 5.214 | 0.978 | 1.128
75th | 7.454 | 1.434 | 7.014
90th | 8.412 | 7.671 | 8.092
95th | 8.513 | 8.393 | 8.455
99th | 17.406 | 9.024 | 16.098
3-nines | 34.938 | 25.804 | 34.360
4-nines | 38.058 | 35.514 | 38.058
5-nines | 38.058 | 35.514 | 38.058
6-nines | 38.058 | 35.514 | 38.058
7-nines | 38.058 | 35.514 | 38.058
8-nines | 38.058 | 35.514 | 38.058
max | 38.058 | 35.514 | 38.058
NOTE: The ‘nines’ refer to the number of nines: 3-nines is the 99.9th percentile, and so forth.
When designing tests to characterize the high percentiles it is important to consider the number of
operations which each represent. The data which produced this sample consisted of 6049 I/Os of which
a nearly equal number were reads (2994) and writes (3055). In the case of the 99.9th percentile for
writes, only 3055 * (1 - 0.999) = ~3 write operations are between its 25.804ms and the maximum
35.514ms. As a result, the 99.99th (4-nines) percentile and onward are the same as the maximum since
there is not enough data to differentiate them.
As a result, storage performance and test length need to be considered together in order to acquire
enough data to accurately state the higher-nines percentiles.
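The sample-count reasoning above can be sketched as a one-line calculation (illustrative helper, not part of DiskSpd):

```python
def ops_above_percentile(percentile, total_ops):
    """Number of operations lying between the given percentile
    and the maximum."""
    return total_ops * (1 - percentile / 100.0)

# from the sample run: only ~3 of 3055 writes lie above the 99.9th
# percentile, so 4-nines and beyond collapse to the maximum
assert round(ops_above_percentile(99.9, 3055)) == 3
assert ops_above_percentile(99.99, 3055) < 1
```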
Latency is gathered internally at per-thread per-target granularity, and this full resolution is available
in the XML output form (-Rxml).
Given the per-thread per-target resolution, the memory overhead of these histograms can become
significant with higher thread and target counts. The interplay of -t and -F is important to consider. In
the following examples, both describe 16 threads:
-t1 <target1> … <target16>
-F16 <target1> … <target16>
However, the second results in 16 times as many histograms being tracked. This memory overhead can
in extreme circumstances affect results.
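The histogram counts behind this comparison can be sketched as follows (illustrative model, not DiskSpd source):

```python
def histograms_with_t(threads_per_target, targets):
    # -t: each thread works against exactly one target
    return threads_per_target * targets

def histograms_with_F(total_threads, targets):
    # -F: every thread issues I/O to every target, so each
    # per-thread per-target pair gets its own histogram
    return total_threads * targets

# both describe 16 threads, but -F16 tracks 16x as many histograms
assert histograms_with_t(1, 16) == 16
assert histograms_with_F(16, 16) == 256
assert histograms_with_F(16, 16) == 16 * histograms_with_t(1, 16)
```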
6 XML results processing

The resulting System.Xml.XmlDocument object exposes the element structure of the XML as a set of
nested properties and values. Normal PowerShell capabilities such as implicit nested enumeration
work as expected, making extraction of results across threads and targets concise.
Explore the XML document alongside the parsed XML in PowerShell, and automation should follow
quickly.
The following code is an example of processing a collection of XML results into a tabular form suitable
for direct import into a spreadsheet for analysis. The column schema can be summarized as:
- Computer which produced the result
- WriteRatio (-w)
- Threads (as a result of -t or -F)
- Outstanding (as a result of -o)
- Block (as a result of -b)
- Read IOPs
- Read Bandwidth
- Write IOPs
- Write Bandwidth
- 25th, 50th, 75th, 90th, 99th, 99.9th and maximum read & write IO latency (assuming -L)
In addition, it produces a second file per result with the full latency distribution in tabular form, with
percentile, read latency, and write latency columns.
A more advanced version of XML post-processing, based on this example, is maintained at GitHub:
process-diskspd.ps1. See Section 1.1 (page 3) for links to the DiskSpd GitHub repo.
function get-latency( $x ) {
$x.Results.TimeSpan.Latency.Bucket |% {
$_.Percentile,$_.ReadMilliseconds,$_.WriteMilliseconds -join "`t"
}
}
# NOTE: $l (percentile list), $lf (per-result latency file name) and $h
# (percentile-to-bucket lookup) are assumed definitions; they appear to
# have been elided from the original listing.
$l = @( "25", "50", "75", "90", "99", "99.9", "100" )

dir *.xml |% {
$x = [xml](Get-Content $_)

# per-result latency distribution file (assumed naming convention)
$lf = $_.FullName -replace '\.xml$','.lat.tsv'
if (-not [io.file]::Exists($lf)) {
get-latency $x > $lf
}

# map each percentile to its latency bucket for this result
$h = @{}
$x.Results.TimeSpan.Latency.Bucket |% { $h[$_.Percentile] = $_ }

$system = $x.Results.System.ComputerName
$t = $x.Results.TimeSpan.TestTimeSeconds

# per-percentile read/write latencies; blank when no bucket exists
$ls = $l |% {
$b = $h[$_];
if ($b.ReadMilliseconds) { $b.ReadMilliseconds } else { "" }
if ($b.WriteMilliseconds) { $b.WriteMilliseconds } else { "" }
}
# sum read and write iops across all threads and targets
$ri = ($x.Results.TimeSpan.Thread.Target |
measure -sum -Property ReadCount).Sum
$wi = ($x.Results.TimeSpan.Thread.Target |
measure -sum -Property WriteCount).Sum
$rb = ($x.Results.TimeSpan.Thread.Target |
measure -sum -Property ReadBytes).Sum
$wb = ($x.Results.TimeSpan.Thread.Target |
measure -sum -Property WriteBytes).Sum
# output tab-separated fields. note that with runs specified on the command
# line, only a single write ratio, outstanding request count and blocksize
# can be specified, so sampling the one used for the first thread is
# sufficient.
(($system,
($x.Results.Profile.TimeSpans.TimeSpan.Targets.Target.WriteRatio |
select -first 1),
$x.Results.TimeSpan.ThreadCount,
($x.Results.Profile.TimeSpans.TimeSpan.Targets.Target.RequestCount |
select -first 1),
($x.Results.Profile.TimeSpans.TimeSpan.Targets.Target.BlockSize |
select -first 1),
# calculate iops
($ri / $t),
($rb / $t),
($wi / $t),
($wb / $t)) -join "`t"),
($ls -join "`t") -join "`t"
}
7 Sample command lines

Large area random concurrent writes of 4KB blocks:
  diskspd -c2G -w100 -b4K -F8 -r -o32 -W60 -d60 -Sh testfile.dat

Large area random concurrent reads of 64KB blocks:
  diskspd -c2G -b64K -F8 -r -o32 -W60 -d60 -Sh testfile.dat

Large area random concurrent writes of 64KB blocks:
  diskspd -c2G -w100 -b64K -F8 -r -o32 -W60 -d60 -Sh testfile.dat

Large area random serial reads of 4KB blocks:
  diskspd -c2G -b4K -r -o1 -W60 -d60 -Sh testfile.dat

Large area random serial writes of 4KB blocks:
  diskspd -c2G -w100 -b4K -r -o1 -W60 -d60 -Sh testfile.dat

Large area random serial reads of 64KB blocks:
  diskspd -c2G -b64K -r -o1 -W60 -d60 -Sh testfile.dat

Large area random serial writes of 64KB blocks:
  diskspd -c2G -w100 -b64K -r -o1 -W60 -d60 -Sh testfile.dat

Large area sequential concurrent reads of 4KB blocks:
  diskspd -c2G -b4K -F8 -T1b -s8b -o32 -W60 -d60 -Sh testfile.dat

Large area sequential concurrent writes of 4KB blocks:
  diskspd -c2G -w100 -b4K -F8 -T1b -s8b -o32 -W60 -d60 -Sh testfile.dat

Large area sequential concurrent reads of 64KB blocks:
  diskspd -c2G -b64K -F8 -T1b -s8b -o32 -W60 -d60 -Sh testfile.dat

Large area sequential concurrent writes of 64KB blocks:
  diskspd -c2G -w100 -b64K -F8 -T1b -s8b -o32 -W60 -d60 -Sh testfile.dat

Large area sequential serial reads of 4KB blocks:
  diskspd -c2G -b4K -o1 -W60 -d60 -Sh testfile.dat

Large area sequential serial writes of 4KB blocks:
  diskspd -c2G -w100 -b4K -o1 -W60 -d60 -Sh testfile.dat

Large area sequential serial reads of 64KB blocks:
  diskspd -c2G -b64K -o1 -W60 -d60 -Sh testfile.dat

Large area sequential serial writes of 64KB blocks:
  diskspd -c2G -w100 -b64K -o1 -W60 -d60 -Sh testfile.dat

Small area concurrent reads of 4KB blocks:
  diskspd -c100b -b4K -o32 -F8 -T1b -s8b -W60 -d60 -Sh testfile.dat

Small area concurrent writes of 4KB blocks:
  diskspd -c100b -w100 -b4K -o32 -F8 -T1b -s8b -W60 -d60 -Sh testfile.dat

Small area concurrent reads of 64KB blocks:
  diskspd -c100b -b64K -o32 -F8 -T1b -s8b -W60 -d60 -Sh testfile.dat

Small area concurrent writes of 64KB blocks:
  diskspd -c100b -w100 -b64K -o32 -F8 -T1b -s8b -W60 -d60 -Sh testfile.dat

Note: the write tests specify -w100 (100% writes), consistent with the -w parameter description, under
which a bare -w is a 100% read-only workload.
8 Future Improvements
The following are recommendations for future improvements of the DiskSpd tool.