
02 Linux Syscall

The document discusses Linux system calls (syscalls). Syscalls allow userspace programs to request services from the kernel. Common syscalls include opening/reading files and managing processes. Syscalls are implemented by copying arguments to registers and trapping to kernel code. The libc provides wrappers around most syscalls, and the kernel maps a virtual dynamic shared object (vDSO) into each process to reduce syscall overhead.


Linux - Syscalls

Lionel Auroux

2017-09-29

Lionel Auroux Linux - Syscalls 2017-09-29 1 / 27


Generalities


What is a syscall?

User space can issue requests to the kernel in order to access its resources or perform restricted operations.

You can think of a syscall as a regular function call, but where the code being called is in the kernel.

Syscall usages:
    Manipulating files and VFS: open, read, write, ...
    System setup: gettimeofday, swapon, shutdown, ...
    Process management: clone, mmap, ...
    Manipulating devices: ioctl, mount, ...
    Cryptography and security: seccomp, getrandom, ...
    ...


The syscall userland interfaces


In assembly

On x86

    mov eax, 1      ; exit
    int 0x80        ; or sysenter

Syscall number: eax
Arguments: ebx, ecx, edx, esi, edi, ebp, then use the stack

On x86_64

    mov rax, 60     ; exit
    syscall

Syscall number: rax
Arguments: rdi, rsi, rdx, r10, r8 and r9, no arguments in memory
(r10 takes the place of rcx, which the syscall instruction overwrites with the return address.)


syscall(2)

    #include <unistd.h>
    #include <sys/syscall.h>   /* for __NR_xxx */

    long syscall(long number, ...);

Copies the arguments and syscall number to the registers.
Traps to kernel code.
Sets errno if the syscall returns an error.


Don't panic!

You will learn all about that in kernel from scratch!
You almost never use direct calls to syscall(2).
Your libc provides wrappers for most of the syscalls you need.
Linux also abstracts all those details in kernel code.
For a list of the Linux system calls, see syscalls(2).


vdso(7)

Virtual Dynamic Shared Object
Small shared library (8k) that the kernel automatically maps into the address space of all user-space applications.
Contains non-privileged code and data: gettimeofday, time, clock_gettime, ... (arch-dependent)
The ELF must be dynamically linked.

Why?

Making system calls can be slow.
On 32-bit x86, int 0x80 is expensive: it goes through the full interrupt-handling paths in the processor's microcode as well as in the kernel.
Even with a dedicated instruction (syscall), a context switch must still be done.


Context switch

A context is:
    The CPU registers (including the instruction pointer)
    The state of a process (including threads):
        Memory state: stack, page tables, etc.
        CPU state: registers, caches, etc.
        Process scheduler state
        ...


vdso in action

    $ cat time.c
    #include <stdio.h>
    #include <time.h>

    int main(int ac, char **av) {
        printf("%ld\n", (long)time(0));
        return 0;
    }
    $ gcc time.c -o time -static
    $ strace -e time ./time
    time(NULL)                              = 1411171041
    1411171041
    +++ exited with 0 +++
    $ gcc time.c -o time
    $ ldd ./time
        linux-vdso.so.1 (0x00007fffe1735000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007fee5e753000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fee5eb01000)
    $ strace -e time ./time
    1411171118
    +++ exited with 0 +++


Implementation


Defining a syscall

Use the SYSCALL_DEFINEx(syscall, ...) macros anywhere in Linux code.

These macros expand to:
    SYSCALL_METADATA(syscall, ...): generates metadata used in the FTRACE tracing framework.
    __SYSCALL_DEFINEx(syscall, ...): more function definition expansion.
    Ultimately expands to: asmlinkage long SyS_syscall(..)
    asmlinkage means that arguments are on the stack (this holds on 32-bit x86; on x86_64 the arguments stay in registers).


Example

In kernel/signal.c:

    SYSCALL_DEFINE0(pause)
    {
            /* Sleep until a signal is pending for the current task. */
            while (!signal_pending(current)) {
                    current->state = TASK_INTERRUPTIBLE;
                    schedule();
            }
            return -ERESTARTNOHAND;
    }


Side notes

current

    #include <asm/current.h>
    ...
    pr_debug("The process is \"%s\" (pid %i)\n",
             current->comm, current->pid);

signal_pending

    static inline int signal_pending(struct task_struct *p)
    {
            return unlikely(test_tsk_thread_flag(p, TIF_SIGPENDING));
    }

schedule()
Asks the scheduling subsystem to pick the next process to run.


The syscall tables

See arch/x86/entry/syscalls/syscall_{32,64}.tbl.

syscall_32.tbl

    # <number>  <abi>  <name>           <entry point>        <compat entry point>
    0           i386   restart_syscall  sys_restart_syscall
    1           i386   exit             sys_exit
    2           i386   fork             sys_fork             stub32_fork
    3           i386   read             sys_read
    4           i386   write            sys_write
    5           i386   open             sys_open             compat_sys_open
    6           i386   close            sys_close

syscall_64.tbl

    0           common read             sys_read
    1           common write            sys_write
    2           common open             sys_open
    3           common close            sys_close
    4           common stat             sys_newstat
    5           common fstat            sys_newfstat
    ...
    16          64     ioctl            sys_ioctl
    ...
    514         x32    ioctl            compat_sys_ioctl


Generation

Kbuild calls syscalltbl.sh to generate arch/x86/include/generated/asm/syscalls_{64,32}.h from the tables.
syscallhdr.sh works the same way, producing the headers that define the __NR_xxx numbers.


A guided tour of some syscalls


sysinfo

kernel/sys.c

    SYSCALL_DEFINE1(sysinfo, struct sysinfo __user *, info)
    {
            struct sysinfo val;

            /* Fill a kernel-side copy first. */
            do_sysinfo(&val);

            /* Then copy it out; a bad user pointer yields -EFAULT. */
            if (copy_to_user(info, &val, sizeof(struct sysinfo)))
                    return -EFAULT;

            return 0;
    }


User data

__user
Used by tools such as sparse to statically check the use of userspace pointers.

    # define __user __attribute__((noderef, address_space(1)))

copy_to_user
Copies data from kernel land to user land.
Checks that all bytes are writable, using:

    access_ok(VERIFY_WRITE, addr_to, length)


ioctl

    #include <sys/ioctl.h>

    int ioctl(int d, unsigned long request, ...);

Controls devices.
A big mess:
    Request numbers encode data.
    Request data is untyped (void *).
See LDD3, Chapter 6: Advanced Char Driver Operations.


clone

    SYSCALL_DEFINE5(clone, unsigned long, clone_flags,
                    unsigned long, newsp,
                    int __user *, parent_tidptr,
                    int __user *, child_tidptr,
                    int, tls_val)
    {
            return do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr);
    }

fork

    SYSCALL_DEFINE0(fork)
    {
            return do_fork(SIGCHLD, 0, 0, NULL, NULL);
    }

vfork

    SYSCALL_DEFINE0(vfork)
    {
            return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, 0,
                           0, NULL, NULL);
    }

personality

    #include <sys/personality.h>

    int personality(unsigned long persona);

Sets the process execution domain.
Used by setarch.
Tweaks:
    uname-2.6
    exposed architecture (i386, i486, i586, etc.)
    STICKY_TIMEOUTS
    ...


reboot

    #include <unistd.h>
    #include <linux/reboot.h>

    int reboot(int magic, int magic2, int cmd, void *arg);

This system call will fail (with EINVAL) unless magic equals LINUX_REBOOT_MAGIC1 (that is, 0xfee1dead) and magic2 equals LINUX_REBOOT_MAGIC2 (that is, 672274793). However, since 2.1.17 also LINUX_REBOOT_MAGIC2A (that is, 85072278) and since 2.1.97 also LINUX_REBOOT_MAGIC2B (that is, 369367448) and since 2.5.71 also LINUX_REBOOT_MAGIC2C (that is, 537993216) are permitted as value for magic2. (The hexadecimal values of these constants are meaningful.)


rt_XXX syscalls

The addition of real-time signals required the widening of the signal set structure (sigset_t) from 32 to 64 bits. Consequently, various system calls were superseded by new system calls that supported the larger signal sets.

    Linux <= 2.0        Linux >= 2.2
    sigaction(2)        rt_sigaction(2)
    sigpending(2)       rt_sigpending(2)
    sigprocmask(2)      rt_sigprocmask(2)
    sigreturn(2)        rt_sigreturn(2)
    sigsuspend(2)       rt_sigsuspend(2)
    sigtimedwait(2)     rt_sigtimedwait(2)


Going further than syscalls

There are places in the kernel where the complexity of the task goes beyond a call to a function.
ioctl has grown dangerously.
For example, netlink(7) aims to replace ioctl for network configuration.


References

http://lwn.net/Articles/604287/
http://lwn.net/Articles/604515/
https://www.kernel.org/doc/htmldocs/kernel-hacking
Searchable Linux Syscall Table: https://filippo.io/linux-syscall-table/


Contact info

lionel [at] lse.epita.fr with [linux] tag
