Hooking The Linux System Call Table
The Linux kernel maintains a table of pointers that reference various functions made
available to user space as a way of invoking privileged kernel functionality from
unprivileged user space applications. These functions are collectively known as system
calls.
Any legitimate software looking to hook kernel space functions should first consider
using existing infrastructure designed for such uses like the Linux kernel tracepoints
framework or the Linux security module framework. Rootkits are about the only
reasonable application of these techniques, for some value of reasonable.
This code was written and tested on Ubuntu 14.04 LTS using the standard Ubuntu
Linux 3.13.x kernel.
Introduction
Hooking the Linux system call table from within a loadable kernel module is not all
that difficult. After all, we are running with kernel privileges. We can do whatever we
want. We can dereference and overwrite any memory address at will.
This, of course, doesn’t mean that our reckless memory overwriting isn’t going to
cause problems. Done improperly, it almost certainly will.
There are some flimsy mechanisms in place to discourage LKMs (loadable kernel
modules) from tampering with the syscall table for (hopefully obvious) security
reasons.
First and foremost, the static portion of the Linux kernel - i.e. the portion that doesn’t
reside in loadable kernel modules - does not export the syscall table symbol.
Why?
Because LKMs have no earthly business messing with the syscall table. The only valid
reason for an LKM to overwrite system call pointers is to corrupt the behavior of the
operating system, most often for concealment of malicious software.
Since the kernel does not export the syscall table symbol, we need to find it ourselves.
We do this by manually reading in and scanning the System.map-$(uname -r) file,
looking for the “sys_call_table” address. Once we have retrieved the address, we
simply need to find the appropriate offset for it based on the system call we’re trying
to hook, dereference it, and write to it.
This tutorial will show you how to hook system calls from a loadable kernel module
(LKM) in the Linux kernel, complete with a code walkthrough. The code presented here
has been tested and is known to work reliably.
Implementation
Although this code base has a few hundred lines to it, it’s actually very simple.
Much of the code simply handles logistics - nothing more. The two largest functions in
this example are responsible for 1) acquiring the version of the currently running
kernel so we can identify the correct System.map-$(uname -r) file to read from and 2)
reading in the System.map-$(uname -r) file line by line, checking each full line read to
see if it is the entry for “sys_call_table”.
That’s it.
Once we’ve got the address of the sys call table, it’s trivial to overwrite. Let’s take a
look.
General Structure
There are a few things going on in this module. Much of the code comprises helper
functions that read files and parse strings. Other than the helpers, we have our
new_write() function, which is the function we hook into the sys call table, and our
standard __init and __exit functions for the loadable kernel module.
Important Defines
PROC_V is the file path to the /proc virtual filesystem location that contains version
information of the currently running kernel.
BOOT_PATH is the file path to the System.map-$(uname -r) file that we are looking for
sans appended version information. We have to retrieve the kernel version before we
can finish constructing this string.
In Linux loadable kernel modules, the function decorated with the __init macro is the
entry point to the module when it’s loaded and the function decorated with the __exit
macro is the destructor function that’s executed when the module is unloaded.
Since it only takes a couple lines of code to place our hooks in this simple example, we
perform our dirty work directly in these functions. We’ll come back to these functions
in a few minutes.
Helper Functions
The acquire_kernel_version() helper reads version info from PROC_V and chops it down
to just the string we want. We need our version info to be in the same format that’s
produced by $(uname -r).
Next, we have to change the legal virtual address space of this process to include the
kernel data segment. If we skip this step, the call to read the file will fail the user space
virtual address check performed by the kernel. In short, this allows us to read file
contents into kernel memory later on:
oldfs = get_fs();
set_fs (KERNEL_DS);
Once we’re set up to read data into kernel space without causing a fault, we open the
PROC_V file for reading and prepare our buffer:
memset(buf, 0, MAX_VERSION_LEN);
We then tokenize the version information to extract just the information we want. The
piece of information we want is located in the third space-separated column that is
output by PROC_V:
filp_close(proc_version, 0);
Set the legally addressable virtual memory segment back to user space:
set_fs(oldfs);
Return the pointer to the final token produced by our calls to strsep():
return kernel_version;
And there we have it. We can now rely on this helper to gather the version information
we need.
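The tokenizing step is easy to try outside the kernel. What follows is a minimal user-space sketch, not the module's own code: nth_token() is a hypothetical helper name, and it uses the standard strchr() instead of strsep() to stay portable, while still cutting the string in place the way strsep() does.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the module): returns a pointer to the
 * n-th (0-based) space-separated token of buf, or NULL if buf has fewer
 * tokens. buf is modified in place: the space that follows the token is
 * overwritten with '\0', much like strsep() would do.
 */
char *nth_token(char *buf, int n)
{
    char *tok = buf;
    char *end;

    while (n-- > 0) {
        tok = strchr(tok, ' ');
        if (tok == NULL)
            return NULL;    /* ran out of tokens */
        tok++;              /* step past the separator */
    }

    end = strchr(tok, ' ');
    if (end != NULL)
        *end = '\0';        /* terminate the token */

    return tok;
}
```

Calling nth_token(buf, 2) on a buffer holding a /proc/version-style line picks out the third column, which is the piece that matches $(uname -r).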
Given the $(uname -r)-style kernel version, the find_sys_call_table() function builds the
System.map-<version> file name by appending kern_ver to BOOT_PATH, opens the file for
reading, and reads the file line by line.
char system_map_entry[MAX_VERSION_LEN];
int i = 0;
/*
* Holds the /boot/System.map-<version> file name as we build it
*/
char *filename;
/*
* Length of the System.map filename, terminating NUL included
*/
size_t filename_length = strlen(kern_ver) + strlen(BOOT_PATH) + 1;
/*
* This will point to our /boot/System.map-<version> file
*/
struct file *f = NULL;
mm_segment_t oldfs;
Here is the old memory address segment trick to switch from allowing only user space
references to also allowing kernel space references:
oldfs = get_fs();
set_fs (KERNEL_DS);
Allocate space for the System.map file name so we can build it, then zero out the
memory in preparation for constructing the file name, just to be safe:
memset(filename, 0, filename_length);
memset(system_map_entry, 0, MAX_VERSION_LEN);
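For reference, here is how the file name construction might look in user space. This is only a sketch under assumptions: the value of BOOT_PATH is inferred from the comments above, build_system_map_path() is a hypothetical name, and malloc() and snprintf() stand in for whatever the module uses in kernel space.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BOOT_PATH "/boot/System.map-"   /* assumed value of the define */

/*
 * Hypothetical helper: returns a freshly allocated
 * "/boot/System.map-<version>" string, or NULL on allocation failure.
 * The caller is responsible for free()ing the result.
 */
char *build_system_map_path(const char *kern_ver)
{
    /* Same length computation as in the text: both parts plus the NUL */
    size_t filename_length = strlen(kern_ver) + strlen(BOOT_PATH) + 1;
    char *filename = malloc(filename_length);

    if (filename == NULL)
        return NULL;

    memset(filename, 0, filename_length);
    snprintf(filename, filename_length, "%s%s", BOOT_PATH, kern_ver);
    return filename;
}
```

Sizing the buffer from the two string lengths up front means the concatenation can never run past the end of the allocation.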
We read the file one character at a time until we have read an entire line. We
determine that we’ve read an entire line by 1) checking for a newline (‘\n’) character or
2) checking to see if we have read in the maximum amount of data that our buffer can
hold, i.e. MAX_VERSION_LEN bytes.
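The same read loop can be sketched with stdio in user space. This is an illustration rather than the module's code: read_line() is a hypothetical name, fgetc() stands in for the kernel-side read, and MAX_LINE_LEN is an assumed stand-in for MAX_VERSION_LEN.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LEN 256   /* stand-in for MAX_VERSION_LEN; value assumed */

/*
 * Hypothetical helper: reads one line from fp into line (capacity cap,
 * which must be at least 1). Stops at a newline or when the buffer is
 * full, whichever comes first. Returns the number of characters stored,
 * or -1 if end of file was reached before anything was read.
 */
int read_line(FILE *fp, char *line, size_t cap)
{
    size_t i = 0;
    int c;

    memset(line, 0, cap);   /* start from a clean buffer every time */

    while (i < cap - 1 && (c = fgetc(fp)) != EOF) {
        if (c == '\n')
            return (int)i;  /* end of line; newline is not stored */
        line[i++] = (char)c;
    }

    return (i == 0) ? -1 : (int)i;
}
```

Note that when the buffer fills up before a newline shows up, the rest of that line is left for the next call, so an overly long line comes back in pieces.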
Once we have read in an entire line, we do a basic string comparison to see if our
system_map_entry buffer matches the string “sys_call_table”. If it does, we
allocate some space to store the address in. Each line of the System.map file is in the
format: address, symbol type, symbol name, separated by single spaces.
Once we’ve got that pointer, we simply copy it into sys_string and then invoke kstrtoul
on sys_string to convert it - sys_string contains a string representation of the hex
address of the “sys_call_table” symbol as pulled from System.map-<version> - to an unsigned
long (4 bytes on 32-bit x86, 8 bytes on x86-64) using base 16 (hex) and write the value to
our global syscall_table pointer:
kfree(filename);
return -1;
}
memset(sys_string, 0, MAX_VERSION_LEN);
kfree(sys_string);
break;
}
memset(system_map_entry, 0, MAX_VERSION_LEN);
continue;
}
i++;
}
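To make the matching and conversion step concrete, here is a small user-space sketch. It is not the module's code: parse_map_entry() is a hypothetical helper, it assumes the "address type name" line layout that System.map uses, and strtoul() stands in for the kernel's kstrtoul().

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: if line is the System.map entry for symbol,
 * stores the symbol's address in *addr and returns 0. Returns -1 if the
 * line names a different symbol or the address cannot be parsed.
 */
int parse_map_entry(const char *line, const char *symbol, unsigned long *addr)
{
    const char *name;
    char *end;
    unsigned long value;

    /* The symbol name is whatever follows the last space on the line */
    name = strrchr(line, ' ');
    if (name == NULL || strcmp(name + 1, symbol) != 0)
        return -1;

    /* Base 16, since System.map stores addresses as bare hex digits */
    errno = 0;
    value = strtoul(line, &end, 16);
    if (end == line || errno == ERANGE)
        return -1;

    *addr = value;
    return 0;
}
```

Comparing the whole name instead of searching for a substring matters here, because a kernel may also carry similarly named symbols such as ia32_sys_call_table.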
Once we’re done doing all that, we clean up after ourselves by closing out our file
handle, changing the addressable virtual memory segment back to user space, and
returning.
filp_close(f, 0);
set_fs(oldfs);
kfree(filename);
return 0;
At this point, the syscall_table pointer - which was declared to be global to the module
- now contains the address of the system call table as taken from
/boot/System.map-<version> and is ready to be dereferenced.
The __init onload function is the entry point to the module and is where our primary
logic resides since it’s so simple. After we allocate the required storage, we invoke the
find_sys_call_table() function with the result of an invocation to
acquire_kernel_version() passed in as an argument. By combining the two helpers
discussed previously, we are able to collect all the prerequisite information we need to
place our hooks:
find_sys_call_table(acquire_kernel_version(kernel_version));
After find_sys_call_table() returns, the global syscall_table pointer that
we declared at the top of our C file is populated and ready for manipulation.
However, there is one little caveat left: the memory address where sys_call_table
resides is not writeable. The processor itself will raise an exception if you try to write
to it all willy-nilly.
So what do we do? We use the Linux paravirtualization system to change bit 16
of the CR0 register. The CR0 register is one of the control registers in the x86
processor that affects basic CPU functionality. Bit 16 of the CR0 register (mask
0x10000) is the “Write Protect” bit, which tells the processor that it cannot write to
read-only memory pages, even when running in kernel mode (ring 0). This is why the CPU
will raise an exception if you try to write to syscall_table right off the bat.
Even though the CPU will refuse to write to read-only memory pages when the WP bit
of the CR0 register is set, we are the kernel. We can just toggle that bit and continue
on our way.
Using the write_cr0 and read_cr0 helpers along with a bitwise mask that clears the
WP bit (bit 16 of the CR0 register), we can trivially disable write protection as shown
below.
Once that’s done, we simply index the table at the appropriate offset for the system
call we want to overwrite by using the kernel-defined __NR_* indices, of which there is
exactly one for each system call. Using these predefined offsets, we write
the address of our new_write() function over the address of the system call write()
function:
if (syscall_table != NULL) {
    /* Clear the WP bit (bit 16) so the read-only table can be written */
    write_cr0 (read_cr0 () & (~ 0x10000));
    /* Save the original handler so it can be restored on unload */
    original_write = (void *)syscall_table[__NR_write];
    syscall_table[__NR_write] = &new_write;
    /* Set the WP bit again */
    write_cr0 (read_cr0 () | 0x10000);
    printk(KERN_EMERG "[+] onload: sys_call_table hooked\n");
} else {
    printk(KERN_EMERG "[-] onload: syscall_table is NULL\n");
}
kfree(kernel_version);
return 0;
Once we overwrite our target system call function pointer, we re-enable write protect
in the CR0 register and exit the __init function successfully.
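The mask arithmetic is worth checking on its own, since an off-by-one in the bit position would flip the wrong flag. The sketch below is plain user-space C and never touches the real register: cr0_clear_wp() and cr0_set_wp() are hypothetical names that only reproduce the two expressions passed to write_cr0().

```c
/* Bit 16 of CR0 is the Write Protect flag; 1UL << 16 is 0x10000 */
#define CR0_WP (1UL << 16)

/* Same expression as read_cr0() & (~0x10000): WP off, other bits kept */
unsigned long cr0_clear_wp(unsigned long cr0)
{
    return cr0 & ~CR0_WP;
}

/* Same expression as read_cr0() | 0x10000: WP on, other bits kept */
unsigned long cr0_set_wp(unsigned long cr0)
{
    return cr0 | CR0_WP;
}
```

Taking 0x80050033 as a sample register value chosen for illustration, clearing WP yields 0x80040033 and setting it again returns 0x80050033, so only bit 16 ever changes.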
In order to keep our system in a clean and stable state, we want to remove our hooks
gracefully when the module is unloaded. The __exit onunload() function behaves very
similarly to the __init onload function since it also has to toggle the write protect bit in
CR0. The onunload function even writes to the exact same offset into the
sys_call_table array as the onload function did.
The only difference is that the onunload function writes the address of the original
write() function over the address of our new_write() function, putting everything back
to the way it was before we came along:
if (syscall_table != NULL) {
    /* Clear the WP bit, restore the saved handler, then set WP again */
    write_cr0 (read_cr0 () & (~ 0x10000));
    syscall_table[__NR_write] = original_write;
    write_cr0 (read_cr0 () | 0x10000);
    printk(KERN_EMERG "[+] onunload: sys_call_table unhooked\n");
} else {
    printk(KERN_EMERG "[-] onunload: syscall_table is NULL\n");
}
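Stripped of the kernel specifics, the whole hook and unhook dance is just swapping one entry in an array of function pointers. Here is a user-space sketch of that pattern. Everything in it is hypothetical: demo_table stands in for sys_call_table, NR_DEMO_WRITE for __NR_write, and this new_write() forwards to the saved original, which is an assumption about what a hook usually does and not something taken from the module.

```c
#include <stddef.h>

#define NR_DEMO_WRITE 1             /* hypothetical index, like __NR_write */

typedef long (*demo_call_t)(long);

/* Pretend "original" handler that lives in the table to begin with */
long real_write(long n)
{
    return n;
}

demo_call_t demo_table[] = { NULL, real_write };

static demo_call_t original_write;  /* saved so it can be restored later */
static long hook_calls;             /* counts how often the hook ran */

/* Replacement handler: do our own work, then defer to the original */
long new_write(long n)
{
    hook_calls++;
    return original_write(n);
}

/* What onload does: remember the old pointer, then overwrite the slot */
void hook(void)
{
    original_write = demo_table[NR_DEMO_WRITE];
    demo_table[NR_DEMO_WRITE] = new_write;
}

/* What onunload does: write the saved pointer back into the same slot */
void unhook(void)
{
    demo_table[NR_DEMO_WRITE] = original_write;
}
```

Callers that go through demo_table[NR_DEMO_WRITE] never notice the swap apart from the side effects of the hook, which is exactly why restoring the saved pointer on unload matters.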