




User-space, Kernel-space, and System Calls

A module runs in kernel space, whereas applications run in user space. Processes running in user space have no direct access to kernel space; they can reach only a small part of the kernel, through the interface it exposes: the system calls.

Last update: 2022-06-04


User Space and Kernel Space

A module runs in kernel space, whereas applications run in user space. This concept is at the base of operating system theory.

The role of the operating system, in practice, is to provide programs with a consistent view of the computer’s hardware. In addition, the operating system must account for independent operation of programs and protection against unauthorized access to resources. This nontrivial task is possible only if the CPU enforces protection of system software from the applications.

Every modern processor is able to enforce this behavior. The chosen approach is to implement different operating modalities (or levels) in the CPU itself. The levels have different roles, and some operations are disallowed at the lower levels; program code can switch from one level to another only through a limited number of gates.

Unix systems are designed to take advantage of this hardware feature, using two such levels. All current processors have at least two protection levels, and some, like the x86 family, have more levels; when several levels exist, the highest and lowest levels are used. Under Unix, the kernel executes in the highest level (also called supervisor mode), where everything is allowed, whereas applications execute in the lowest level (the so-called user mode), where the processor regulates direct access to hardware and unauthorized access to memory.

We usually refer to the execution modes as kernel space and user space.

Processes running in user space do not have direct access to kernel space. They can reach only a small part of the kernel, through the interface the kernel exposes: the system calls.

Figure: User space vs kernel space

System Calls

A system call is the programmatic way in which a program requests a service from the kernel.

The system call interface consists of a number of functions that the operating system exports to the applications running on top of it. These functions allow actions like opening files, creating network connections, reading from and writing to files, and so on.

System calls are mainly divided into 5 categories:

  • Process Control
  • File Management
  • Device Management
  • Information Maintenance
  • Communication

Let's go through some primitive system calls:

Process Control:

These system calls perform tasks such as process creation and process termination.

The Linux system calls under this are fork(), exit(), exec().

fork()

A new process is created by the fork() system call.

A new process may be created with fork() without a new program being run: the new sub-process simply continues to execute exactly the same program that the first (parent) process was running.

exit()

The exit() system call is used by a program to terminate its execution.

The operating system reclaims resources that were used by the process after the exit() system call.

exec()

A new program starts executing after a call to exec(). On Linux, exec() is a family of library functions (execl(), execvp(), and others) built on the execve() system call.

Running a new program does not require that a new process be created first: any process may call exec() at any time. The currently running program image is replaced immediately, and the new program starts executing in the context of the existing process.

File Management:

File management system calls handle file manipulation jobs like creating a file, reading, and writing, etc. The Linux System calls under this are open(), read(), write(), close().

open()

open() is the system call that opens a file and returns a file descriptor.

open() only opens the file. To perform operations such as reading or writing, we need to call other system calls on the returned descriptor.

read()

read() reads up to a requested number of bytes from an open file descriptor into a buffer and returns the number of bytes actually read.

read() does not modify the file. Multiple processes can call read() on the same file simultaneously.

write()

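write() writes up to a requested number of bytes from a buffer to an open file descriptor and returns the number of bytes actually written.

The file must have been opened for writing. Multiple processes can call write() on the same file simultaneously, but the kernel does not coordinate them, so processes that must not overwrite one another's data need a mechanism such as file locking or O_APPEND.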

close()

close() closes an open file descriptor, so that it no longer refers to any file and can be reused.

Device Management:

Device management system calls handle device manipulation, such as reading from and writing to device buffers. The Linux system call under this is ioctl().

ioctl()

ioctl stands for input/output control.

ioctl is a system call for device-specific input/output operations and other operations which cannot be expressed by regular system calls.

Information Maintenance:

These system calls handle information and its transfer between the OS and the user program. In addition, the OS keeps information about all its processes, and system calls are used to access this information. The system calls under this are getpid(), alarm(), sleep().

getpid()

getpid stands for Get the Process ID.

The getpid() function shall return the process ID of the calling process.

The getpid() function shall always be successful, and no return value is reserved to indicate an error.

alarm()

alarm() arranges for a SIGALRM signal to be delivered to the calling process after a given number of seconds.

A process can have only one pending alarm at a time: a new call to alarm() replaces any alarm set earlier, and alarm(0) cancels it.

sleep()

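sleep() suspends the execution of the calling process for a given number of seconds, or until a signal that is not ignored arrives.

During this interval, the scheduler can give other processes a chance to run. On Linux, sleep() is a library function implemented via the nanosleep() system call.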

Communication:

These system calls are used for inter-process communication.

Two models are used for inter-process communication:

  • Message passing (processes exchange messages with one another)
  • Shared memory (processes share a memory region to communicate)

The system calls under this are pipe(), shmget(), mmap().

pipe()

The pipe() system call is used to communicate between Linux processes. It creates a one-way data channel and returns two file descriptors: one for the read end and one for the write end.
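
shmget()

shmget stands for shared memory get: it returns the identifier of a System V shared memory segment, creating the segment if requested.

It is mainly used for shared memory communication. A process then attaches the segment to its address space with shmat() and exchanges data with other processes by reading and writing that memory.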

mmap()

mmap() maps files or devices into memory; the companion call munmap() removes a mapping.

The mmap() system call is responsible for mapping the content of a file into the virtual address space of the process.

System call table

The system call table is defined in the Linux kernel source code.

For example, here is an excerpt from syscall_64.tbl, which defines the 64-bit system call numbers and entry points.

# The format is:
# <number> <abi> <name> <entry point>
#
# The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
#
# The abi is "common", "64" or "x32" for this file.
#
0   common  read            sys_read
1   common  write           sys_write
2   common  open            sys_open
3   common  close           sys_close
4   common  stat            sys_newstat
5   common  fstat           sys_newfstat
6   common  lstat           sys_newlstat
7   common  poll            sys_poll
8   common  lseek           sys_lseek
9   common  mmap            sys_mmap
10  common  mprotect        sys_mprotect
11  common  munmap          sys_munmap

Example

syscall.zip

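Use the printf function, which is provided in user space by the C library:

hello_userspace.c
#include <stdio.h>

int main(void) {
    // printf formats the text in user space; libc hands it to the kernel via write()
    printf("USER: Hello World!\n");
    return 0;
}

gcc hello_userspace.c -o hello_userspace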

Figure: Example of calling fwrite, which invokes sys_write

Run with strace:

strace ./hello_userspace
execve("./hello_userspace", ["./hello_userspace"], 0x7ffd769d3740 /* 23 vars */) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
... read dynamic library linking index
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
... read symbols in libc library
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
... obtain the standard output
write(1, "USER: Hello World!\n", 19USER: Hello World!
)    = 19

You will see the list of system calls invoked, including execve, access, openat, read, write, and close. The exact list depends on the system and the C library version.

Figure: Function chain in a system call

Use low-level syscall function

glibc offers a function called syscall() that you can use to explore the system call interface. Define _GNU_SOURCE before including any header to get access to it.

hello_syscall_glibc.c
#define _GNU_SOURCE
#include <unistd.h>       // declares syscall(), which is implemented in libc.so
#include <sys/syscall.h>  // defines the SYS_* system call numbers

int main(void) {
    // write(1, message, 22): descriptor 1 is stdout, the message is 22 bytes long
    syscall(SYS_write, 1, "SYSCALL: Hello World!\n", 22);
    return 0;
}

gcc hello_syscall_glibc.c -o hello_syscall_glibc

Call syscall directly using Assembly

We know that the ID of the write system call is 1, so we can invoke it directly through the syscall instruction.

hello_syscall_asm.s
    .global _start

    .text

_start:
    # write(1, message, 26)
    mov     $1, %rax                # system call ID. 1 is write
    mov     $1, %rdi                # file handle 1 is stdout
    mov     $message, %rsi          # address of string to output
    mov     $26, %rdx               # string length
    syscall                         # system call invocation!

    # exit(0)
    mov     $60, %rax               # system call ID. 60 is exit
    xor     %rdi, %rdi              # we want return code 0
    syscall                         # system call invocation!

message:
    .ascii  "ASM SYSCALL: Hello World!\n"
gcc hello_syscall_asm.s -nostdlib -no-pie -o hello_syscall_asm


Run with strace to see that, after execve, the program itself invokes only the write system call before exiting:

strace ./hello_syscall_asm
execve("./hello_syscall_asm", ["./hello_syscall_asm"], 0x7ffe4fc483a0 /* 23 vars */) = 0
write(1, "ASM SYSCALL: Hello World!\n", 26ASM SYSCALL: Hello World!
) = 26

Exercise

  1. Use ltrace -S instead of strace to investigate the system call order.

  2. strace can attach to a running process. Try it.
