
Unix Kernel – System Calls

By Mark Sitkowski C.Eng, M.I.E.E, Design Simulation Systems Ltd

The most basic operating system services are provided directly by the kernel, in the form of system calls. These include:

 open()
 close()
 dup()
 read()
 write()
 fcntl()
 ioctl()
 fork()
 exec()
 kill()

fork()

Under pre-Unix operating systems, starting a process from within another process was traditionally performed as a single operation. One command magically placed the executable into memory, and handed over control and ownership to the operating system, which made the new process run.
Unix doesn’t do that.
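Each process has a hierarchical relationship with its parent, the process which brought it to life, and with its child or children, which are the processes that it started itself. By default, a child inherits its parent's process group, so all such related processes are normally part of the same process group.
A signal sent with kill() to the parent's own process ID reaches the parent alone. If, instead, the signal is sent to the whole process group, by passing kill() the negated process group ID, it is delivered to the parent and to every child process in that group.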
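A signal only reaches every member of a process group when it is addressed to the group. The sketch below is a minimal illustration, assuming a POSIX system: the hypothetical function signal_whole_group() forks two children into a new process group, sends a single signal with kill(-pgid, sig), and counts how many of the children that one call terminated.

```c
#define _POSIX_C_SOURCE 200809L        /* kill(), setpgid() */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork two children into a new process group, signal the whole group
 * with kill(-pgid, sig), and return how many children that signal
 * terminated (2 if it reached both), or -1 if a fork failed. */
int signal_whole_group(int sig)
{
    pid_t pgid = 0;                    /* 0 until the group exists */
    pid_t kids[2];
    int started = 0;

    for (int i = 0; i < 2; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;
        if (pid == 0) {
            setpgid(0, pgid);          /* join group pgid, or lead a new one if 0 */
            for (;;)
                pause();               /* wait for the signal */
        }
        setpgid(pid, pgid);            /* parent does it too, avoiding a race */
        if (pgid == 0)
            pgid = pid;                /* the first child leads the group */
        kids[started++] = pid;
    }

    /* Negative pid: deliver to every member of process group pgid. */
    if (started > 0)
        kill(-pgid, started == 2 ? sig : SIGKILL);

    int hits = 0;
    for (int i = 0; i < started; i++) {
        int status;
        if (waitpid(kids[i], &status, 0) == kids[i] &&
            WIFSIGNALED(status) && WTERMSIG(status) == sig)
            hits++;
    }
    return started == 2 ? hits : -1;
}
```

Called as signal_whole_group(SIGKILL), it should return 2. Both parent and child call setpgid(), a common idiom that makes the result independent of which of them runs first.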
Unix also has the concept of a ‘session’ which, essentially, can be thought of as comprising all of the process groups associated with a login or TCP/IP connection.
The basic mechanism which initiates the birth of a new process is fork().
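The fork() system call makes a running copy of the process which called it. The child gets its own copy of the parent's memory, at the same addresses, and all open file descriptors remain open in the child. Because parent and child share each underlying open file, they also share its file position: if one of them reads or writes, the position moves for the other as well.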
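Both consequences, separate copies of memory but a shared file position, can be observed directly. The following is a minimal sketch, assuming a POSIX system with a writable /tmp; fork_copy_demo() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L        /* mkstemp(), pread() */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if (a) the child's change to `counter` stays invisible to the
 * parent and (b) parent and child share one file offset; 0 if not; -1 on error. */
int fork_copy_demo(void)
{
    char path[] = "/tmp/forkdemoXXXXXX";
    int fd = mkstemp(path);            /* inherited by the child below */
    if (fd == -1)
        return -1;
    unlink(path);                      /* file is freed when fd is closed */

    int counter = 0;
    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        counter = 100;                 /* changes the child's copy only */
        /* Writes at offset 0 and moves the shared offset to 6. */
        _exit(write(fd, "child\n", 6) == 6 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0 ||
        write(fd, "parent\n", 7) != 7) {   /* lands at offset 6, not 0 */
        close(fd);
        return -1;
    }

    char buf[32] = {0};
    ssize_t n = pread(fd, buf, sizeof buf - 1, 0);   /* read from the start */
    close(fd);
    return counter == 0 && n == 13 && strcmp(buf, "child\nparent\n") == 0;
}
```

If the file offset were not shared, the parent's write would start at offset 0 and overwrite the child's text.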
Consider the following code fragment:

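#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

void some_child_function(void);     /* defined elsewhere */

void start_child(void)
{
    pid_t pid;

    switch ((pid = fork())) {
    case -1:                        /* still only one process */
        perror("fork failed");
        break;
    case 0:                         /* fork() returned 0: this is the child */
        printf("Child process running\n");
        some_child_function();
        break;
    default:                        /* fork() returned the child's PID */
        printf("Parent process executes this code\n");
        break;
    }
}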

At the moment fork() is called, there is only one process in existence, that of the expectant parent. The local variable pid is on the stack, probably uninitialized.
The system call is executed and, now, there are two identical running processes, both executing the same code. fork() returns in both of them, and each process stores the return value in its own copy of pid, on its own stack. The child finds that the value is zero, and knows from this that it is the child. It then executes some_child_function() and continues on a separate execution path.
The parent does not see zero, so it executes the ‘default’ part of the switch() statement; the value it sees is the process ID of the new child process. It then drops through the bottom of the switch(). Note that, unless the child ends itself with exit() or replaces itself with exec() in the case 0: section of the switch, both parent and child will continue to execute the same code, since the child will also drop through the bottom of the switch() once some_child_function() returns.
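One way to make sure the child never falls through into the parent's code is to end the case 0: section with _exit(). The sketch below assumes a POSIX system; run_in_child() and example_child_task() are hypothetical names, the latter standing in for some_child_function(). _exit() is used rather than exit() so that the child does not flush stdio buffers inherited from the parent, which would otherwise be written twice.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical child task, standing in for some_child_function(). */
int example_child_task(void)
{
    return 7;
}

/* Run fn() in a forked child and return the child's exit status, or -1
 * on error. The child ends with _exit(), so it can never drop out of the
 * switch and go on to execute the parent's code. */
int run_in_child(int (*fn)(void))
{
    pid_t pid;
    int status;

    switch ((pid = fork())) {
    case -1:
        return -1;                     /* no child was created */
    case 0:
        _exit(fn() & 0xff);            /* child: exit with fn()'s result */
    default:
        break;                         /* parent: pid is the child's PID */
    }

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```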
Programmers who know little about Unix will have a piece of folklore rattling around in their heads, which says ‘a fork() is expensive. You have to copy an entire process in memory, which is slow, if the process is large’.
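This is true, as far as it goes. Traditionally, there is a memory-to-memory copy of that part of the parent which is resident in memory, so you may have to wait a few milliseconds. Most modern Unix kernels avoid even this by using copy-on-write: parent and child initially share the same physical pages, and a page is copied only when one of them writes to it.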
However, we are not concerned with trivial processes whose total run time is affected by those few milliseconds. We are dealing exclusively with processes whose run times are measured in hours. Hence, we consider a one-time penalty of a few milliseconds to be insignificant.
When a parent forks a child process on a multi-processor machine, the Unix kernel is free to run the child on a different CPU from the parent. If the parent forks twelve children on a twelve-CPU machine, and there is no other significant load, each child will normally run on a CPU of its own.
In an attempt to perform load-balancing, the kernel will shuffle the processes around the CPUs but, basically, they will remain on separate processors.
The fork() system call is one of the most useful tools for making full use of a multi-processor machine's resources. It should be used whenever two or more functions can carry out their tasks in parallel: not only is the total run time reduced to roughly that of the longest-running function, but each function can execute on its own CPU.
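A minimal sketch of this pattern, assuming a POSIX system: the hypothetical function run_parallel() forks one child per task and then waits for all of them, so the total run time is roughly that of the slowest task. It uses wait(), and so assumes that the caller has no other children; demo_task() is a placeholder for real work.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder task (hypothetical): even-numbered tasks succeed (0),
 * odd-numbered ones fail (1), so the result can be checked. */
int demo_task(int index)
{
    return index % 2;
}

/* Fork one child per task so that the kernel can schedule them on
 * separate CPUs, then wait for all of them. Returns how many tasks
 * exited with status 0, or -1 if not every task could be started.
 * Assumes the caller has no other children, since it uses wait(). */
int run_parallel(int ntasks, int (*task)(int))
{
    int started = 0, succeeded = 0;

    for (int i = 0; i < ntasks; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;                     /* stop, but still reap those started */
        if (pid == 0)
            _exit(task(i) & 0xff);     /* child: run one task and leave */
        started++;
    }

    /* Parent: reap every child, in whatever order they finish. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) == -1)
            break;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            succeeded++;
    }
    return started == ntasks ? succeeded : -1;
}
```

With the placeholder task, run_parallel(4, demo_task) returns 2, since only the even-numbered tasks succeed. Which CPU each child runs on is left entirely to the kernel's scheduler.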

vfork()

There is a BSD variant of fork(), vfork(), which was designed to avoid the overhead of copying a possibly huge process in memory.
vfork() is called, and returns, in the same way as fork(), but the operation is different: it does not copy the parent's address space at all. Instead, the child runs in the parent's memory, including its stack, and the parent is suspended until the child calls one of the exec() functions or _exit(). As a result, if the child makes any changes to variables local to the function which called vfork(), the changes will be visible to the parent.
Knowledge of this fact has enabled experienced programmers to make use of the advantages of vfork() while avoiding the pitfalls. However, far more subtle bugs also exist. Therefore, most Unix vendors recommend that vfork() only be used if it is immediately followed by an exec() (or, if the exec() fails, by _exit()).
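The recommended pattern, in which the child does nothing except exec() or _exit(), can be sketched as follows. This assumes a system that still provides vfork() (it was removed from POSIX in 2008, but Linux and the BSDs keep it) and, for the usage example, that /bin/sh exists; run_with_vfork() is a hypothetical name.

```c
#define _DEFAULT_SOURCE 1              /* glibc/musl: declare vfork() */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at `path` with argument vector `argv`, starting it with
 * vfork(), and return its exit status (-1 on error). The child does nothing
 * but execv() or _exit(), the only use that is safe after vfork(). */
int run_with_vfork(const char *path, char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: borrows the parent's memory and stack until execv(). */
        execv(path, argv);
        _exit(127);                    /* exec failed: never return */
    }

    /* Parent: resumes only after the child has exec'd or exited. */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, with an argument vector { "sh", "-c", "exit 5", NULL }, run_with_vfork("/bin/sh", ...) returns 5, and a non-existent path gives 127.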

exec()

The original thinking behind fork() was that its primary use would be to create new processes, not just copies of the parent process. The exec() system call achieves this by overlaying the memory image of the calling process with a new program; the process ID stays the same.
There is a good reason for separating fork() and exec(), rather than having the equivalent of VMS’s spawn() function, which combines the two: it is sometimes necessary or convenient to perform some operations in between fork() and exec(). For example, it may be necessary to run the child process as a different user, such as root, or to change directory, or both.
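A minimal sketch of such an in-between step, assuming a POSIX system and, for the usage example, /bin/sh: the hypothetical function run_in_dir() changes directory in the child, after fork() and before exec(), so the parent's own working directory is unaffected. It uses execv(), one of the exec() variants described below. The exit code 126 for a failed chdir() is an arbitrary choice for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `path` with `argv` in working directory `dir`. The chdir() happens in
 * the child, between fork() and exec(), so the parent's own directory is
 * untouched. Returns the child's exit status: 126 if chdir() failed,
 * 127 if the exec failed, or -1 on error. */
int run_in_dir(const char *dir, const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (chdir(dir) == -1)          /* the "in between" step */
            _exit(126);
        /* A setuid()/setgid() call to change user would also go here. */
        execv(path, argv);
        _exit(127);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```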
There is, in fact, no such call as exec(). There are two main variants, execl() and execv().
The synopses of execl() and execv() are as follows:

 execl(char *path, char *arg0, char *arg1, ..., char *argn, (char *) NULL)
 execv(char *path, char *argv[])

The principal difference between the two variants is that the execl() family takes a path, followed by the arguments as separate parameters in a comma-separated list terminated by a null pointer, whereas the execv() family takes a path and a NULL-terminated vector, similar to the argv[] vector passed to a main() function. By convention, the first argument (arg0, or argv[0]) is the name of the program itself.
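To make the difference concrete, here is a sketch that runs the same command either way, assuming /bin/sh exists; run_sh() and its use_list flag are hypothetical. Note the (char *) NULL that ends the execl() list, and the NULL that ends the execv() vector.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "/bin/sh -c script" in a child, using execl() if use_list is
 * non-zero and execv() otherwise; return the exit status, or -1. */
int run_sh(const char *script, int use_list)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (use_list) {
            /* execl(): each argument separately, ended by a null pointer */
            execl("/bin/sh", "sh", "-c", script, (char *) NULL);
        } else {
            /* execv(): the same arguments collected into a vector */
            char *argv[] = { "sh", "-c", (char *) script, NULL };
            execv("/bin/sh", argv);
        }
        _exit(127);                    /* reached only if the exec failed */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```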
The first variant of execl() and execv(), namely execle() and execve(), adds an environment vector to the end of the argument list:

 execle(char *path, char *arg0, ..., char *argn, (char *) NULL, char *envp[])
 execve(char *path, char *argv[], char *envp[])
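As a sketch, assuming /bin/sh exists, the hypothetical run_with_env() below starts a shell under a replacement environment with execle(). The child sees only the variables in envp, not those of the calling process.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "/bin/sh -c script" with envp as its entire environment, using
 * execle(); return the exit status, or -1 on error. */
int run_with_env(const char *script, char *const envp[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* path, argument list, terminating null pointer, then envp */
        execle("/bin/sh", "sh", "-c", script, (char *) NULL, envp);
        _exit(127);                    /* reached only if the exec failed */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With envp set to { "GREETING=hello", NULL }, the script [ "$GREETING" = hello ] succeeds; with an empty vector { NULL }, the variable is absent, whatever the caller's own environment contains.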

The second variant, execlp() and execvp(), replaces the ‘path’ argument with a ‘file’ argument. If this latter contains a slash, it is used as a path. Otherwise, the directories listed in the PATH environment variable of the calling process are searched for the file. These variants take no environment vector:

 execlp(char *file, char *arg0, ..., char *argn, (char *) NULL)
 execvp(char *file, char *argv[])
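The PATH search can be seen with execvp(). In the sketch below, run_via_path() is a hypothetical name, and it assumes that sh can be found in one of the directories on the caller's PATH.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `file` with argument vector argv, letting execvp() search PATH
 * when `file` contains no slash; return the exit status, or -1. */
int run_via_path(const char *file, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execvp(file, argv);            /* "sh": PATH search; "/bin/sh": used as is */
        _exit(127);                    /* not found, or not executable */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Thus run_via_path("sh", ...) searches PATH, while run_via_path("/bin/sh", ...) uses the name directly because it contains a slash.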

We can now combine fork() and exec() to run lpr from the parent process, in order to print a file:

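#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

void print_file(void)
{
    pid_t pid;

    switch ((pid = fork())) {
    case -1:
        perror("fork failed");
        break;
    case 0:
        printf("Child process running\n");
        fflush(stdout);             /* exec discards unflushed stdio buffers */
        execl("/usr/ucb/lpr", "lpr", "/tmp/file", (char *) NULL);
        perror("execl");            /* reached only if execl() failed */
        _exit(127);                 /* don't fall through into the parent's code */
    default:
        printf("Parent process has executed lpr to print a file\n");
        break;
    }
}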

The above code has only one problem. If the parent carries on running without waiting for the child, then, when lpr has run to completion, it will become a zombie process and waste a slot in the process table until the parent collects its exit status. The same happens if the child exits prematurely due to some fault. (If the parent process itself quits first, the child becomes an orphan and is adopted by the ‘init’ process, which reaps it.)
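The zombie itself can be observed. This sketch assumes a POSIX system providing waitid(); zombie_demo() is a hypothetical name. waitid() with WNOWAIT waits for the child to exit without reaping it, and kill() with signal 0, which only checks whether a process exists, shows (as on Linux) that the exited child still holds its process-table slot until waitpid() collects it.

```c
#define _POSIX_C_SOURCE 200809L        /* waitid(), kill() */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if an exited but unreaped child still occupies its
 * process-table entry until it is reaped, 0 if not, -1 on error. */
int zombie_demo(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(0);                      /* the child finishes at once */

    /* Wait until the child has exited, but leave it unreaped (WNOWAIT):
       from here on it is a zombie. */
    siginfo_t info;
    if (waitid(P_PID, (id_t) pid, &info, WEXITED | WNOWAIT) == -1)
        return -1;
    int still_there = (kill(pid, 0) == 0);   /* signal 0: existence test only */

    /* Reap it: the process-table entry, and the PID, are released. */
    if (waitpid(pid, NULL, 0) != pid)
        return -1;
    int gone = (kill(pid, 0) == -1 && errno == ESRCH);

    return still_there && gone;
}
```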

There are two solutions to this problem.

The first is to execute one of the wait() family of system calls.
A waited-for child does not become a zombie, but the parent must suspend processing until the child terminates, which may or may not be a disadvantage. There is an option, WNOHANG, which allows processing to continue during the wait, but the parent then needs to poll waitpid() repeatedly, which makes our second solution, described below, a much better option.
If we are waiting for a specific process, the most convenient call is to waitpid(). The synopsis of this call is:
pid_t waitpid(pid_t pid, int *status, int options);

waitpid() suspends the caller until the child whose process ID is passed in as the first argument, ‘pid’, has terminated, and then returns that process ID (or -1 on error).
The second argument, ‘status’, points to an int in which the child’s exit status is returned, and ‘options’ is the bitwise-OR of zero or more flags, including:

 WNOHANG : return 0 immediately, instead of making the parent process hang, if the child has not yet terminated.
 WNOWAIT : keep the process whose status is returned in a waitable state, so that it may be waited for again. (Solaris accepts this flag in waitpid(); on Linux, it is accepted only by waitid().)

We set options to zero, since we simply want to wait until the child has finished. The status word, however, provides useful information on how our child terminated, and can be decoded with macros such as WIFEXITED() and WEXITSTATUS(), described in the manual page for ‘wstat’ (on Linux, in the wait(2) manual page).

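#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

void print_file_and_wait(void)
{
    pid_t pid;
    int status;

    switch ((pid = fork())) {
    case -1:
        perror("fork failed");
        break;
    case 0:
        printf("Child process running\n");
        fflush(stdout);             /* exec discards unflushed stdio buffers */
        execl("/usr/ucb/lpr", "lpr", "/tmp/file", (char *) NULL);
        perror("execl");            /* reached only if execl() failed */
        _exit(127);
    default:
        printf("Parent process has executed lpr to print a file\n");
        /* Block until this particular child terminates, and reap it. */
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
            printf("lpr has now finished, exit status %d\n",
                   WEXITSTATUS(status));
        }
        break;
    }
}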
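If the parent does need to keep working while the child runs, the polling approach mentioned above looks roughly like this. It is a sketch assuming a POSIX system; poll_child() and its do_other_work callback are hypothetical, and it also shows the status word being decoded with WIFEXITED(), WEXITSTATUS(), WIFSIGNALED() and WTERMSIG(). Returning 128 plus the signal number for a killed child borrows the shell's convention.

```c
#define _POSIX_C_SOURCE 200809L        /* nanosleep() */
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Poll a child with WNOHANG, calling do_other_work() (or sleeping 10 ms if
 * it is NULL) between polls. Returns the exit status for a normal exit,
 * 128 + signal number if the child was killed, or -1 on error. */
int poll_child(pid_t pid, void (*do_other_work)(void))
{
    int status;

    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == -1)
            return -1;                 /* e.g. pid is not our child */
        if (r == pid)
            break;                     /* child has terminated and is reaped */
        /* r == 0: child still running, so do something useful meanwhile */
        if (do_other_work != NULL) {
            do_other_work();
        } else {
            struct timespec ts = { 0, 10 * 1000 * 1000 };
            nanosleep(&ts, NULL);
        }
    }

    if (WIFEXITED(status))
        return WEXITSTATUS(status);    /* normal exit: the exit() value */
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status); /* terminated by a signal */
    return -1;
}
```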

If we don’t wish to poll waitpid() repeatedly, but need to do other processing while the child process goes about its business, we need to effectively disown the child process.
As soon as the child has successfully forked, we must disassociate it from the process group of the parent.
Process groups and sessions are discussed at the beginning of the fork() section. However, to save you the trouble of looking, a process group is headed by a process group leader, whose process ID becomes the group’s process group ID; children inherit their parent’s process group ID when they are forked.
The disowning of a child process is accomplished by executing the system call setsid() as soon as the child is forked. This creates a new session, makes the child process the session leader, and sets its process group ID to its own process ID. On System V systems such as Solaris, setpgrp() does the same; on Linux and BSD, setpgrp() only creates a new process group, so setsid() is the portable choice. Note that neither call changes the child’s parent: if the parent is still running when the child exits, it must still collect the exit status, or set SIGCHLD to SIG_IGN, or the child will remain a zombie.
The complete code is as below:

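#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

void print_file_detached(void)
{
    pid_t pid;

    switch ((pid = fork())) {
    case -1:
        perror("fork failed");
        break;
    case 0:
        /* Leave the parent's process group and session. setsid() is the
           portable call; on System V systems setpgrp() does the same. */
        if (setsid() == -1) {
            perror("Can't create new session");
        }
        printf("Independent child process running\n");
        fflush(stdout);             /* exec discards unflushed stdio buffers */
        execl("/usr/ucb/lpr", "lpr", "/tmp/file", (char *) NULL);
        perror("execl");            /* reached only if execl() failed */
        _exit(127);
    default:
        printf("Parent process has executed lpr to print a file\n");
        break;
    }
}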
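Because setsid() does not change the child's parent, a common way to get fire-and-forget behaviour without leaving zombies is the double fork: the child calls setsid(), forks a grandchild to do the real work, and exits at once. The parent reaps that short-lived child immediately, and the orphaned grandchild is adopted, and eventually reaped, by init. This is a sketch assuming a POSIX system; spawn_detached() is a hypothetical name, and execv() stands in for the execl() call above.

```c
#define _POSIX_C_SOURCE 200809L        /* setsid() */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start `path` fully detached using the double-fork technique.
 * Returns 0 once the intermediate child has been reaped, -1 on error.
 * The grandchild that runs `path` is never waited for by us. */
int spawn_detached(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* First child: leave the parent's process group and session. */
        if (setsid() == -1)
            _exit(1);
        switch (fork()) {
        case -1:
            _exit(1);
        case 0:
            /* Grandchild: orphaned as soon as its parent exits, so init
               adopts it and reaps it when it terminates. */
            execv(path, argv);
            _exit(127);
        default:
            _exit(0);                  /* first child exits at once */
        }
    }

    /* Original parent: reap the short-lived first child, so that no
       zombie is left behind; this wait returns almost immediately. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```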
 

open() close() dup() read() write()

These system calls are primarily concerned with files, but since Unix treats almost everything as a file, most of them can be used on any byte-orientated device, including sockets and pipes.

int open(const char *file, int how, ... /* mode_t mode */);

open() returns a file descriptor for a file, which it opens for reading, writing or both, or -1 on error. The ‘file’ argument is the file name, with or without a path, while ‘how’ is the bitwise-OR of exactly one of the first three of the following flags, defined in fcntl.h, and optionally any of the others:

O_RDONLY Read only
O_WRONLY Write only
O_RDWR Read/write
O_TRUNC Truncate on opening
O_CREAT Create if non-existent

The ‘mode’ argument is optional, and is only used when O_CREAT is given: it defines the permissions of a newly created file, using the same bits as chmod, less any bits set in the process’s umask.

int close(int fd);

Closes the file descriptor fd, originally returned by open(), so that the descriptor number can be reused.
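As a minimal sketch of open(), write(), read() and close() together, assuming a POSIX system, the hypothetical write_then_read() creates a file, writes a string to it, and reads it back. The mode argument is passed only on the first open(), where O_CREAT may create the file.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) `path`, write `text` to it, then reopen it read-only
 * and read it back into `out` (outlen >= 1) as a NUL-terminated string.
 * Returns the number of bytes read, or -1 on error. */
ssize_t write_then_read(const char *path, const char *text,
                        char *out, size_t outlen)
{
    /* Exactly one access mode (O_WRONLY), plus O_CREAT and O_TRUNC.
       Because O_CREAT is given, the mode argument is needed: rw-r--r--,
       further restricted by the process's umask. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    size_t len = strlen(text);
    ssize_t w = write(fd, text, len);
    close(fd);
    if (w != (ssize_t) len)
        return -1;

    fd = open(path, O_RDONLY);         /* no mode: the file already exists */
    if (fd == -1)
        return -1;
    ssize_t n = read(fd, out, outlen - 1);
    close(fd);
    if (n == -1)
        return -1;
    out[n] = '\0';
    return n;
}
```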

int dup(int fd);

Returns a new file descriptor which refers to the same open file as the one passed in as an argument, sharing its file position, but with a different number: the lowest-numbered descriptor not currently in use. At first glance, this call seems fairly useless, but in fact it permits some powerful operations, like bi-directional pipes, where we need a pipe descriptor to become a standard input or output. Also, client-server systems need to listen for incoming connections on a fixed socket descriptor while handling existing connections on different descriptors.
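The pipe case can be sketched as follows, assuming a POSIX system (and /bin/sh for the usage example); capture_output() is a hypothetical name. The child closes its standard output and calls dup() on the pipe's write end; since dup() returns the lowest free descriptor, the copy becomes descriptor 1. Modern code usually does the same in one step with dup2(fd, STDOUT_FILENO).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `path` with `argv`, its standard output redirected into a pipe by
 * means of dup(), and read what it prints into `out` (NUL-terminated,
 * outlen >= 1). Returns the number of bytes captured, or -1 on error. */
ssize_t capture_output(const char *path, char *const argv[],
                       char *out, size_t outlen)
{
    int fds[2];                        /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);                 /* the child only writes */
        close(STDOUT_FILENO);          /* free descriptor 1 ... */
        if (dup(fds[1]) != STDOUT_FILENO)  /* ... so dup() returns 1 */
            _exit(126);
        close(fds[1]);                 /* descriptor 1 now refers to the pipe */
        execv(path, argv);
        _exit(127);
    }

    close(fds[1]);                     /* the parent only reads */
    size_t total = 0;
    ssize_t n;
    while (total < outlen - 1 &&
           (n = read(fds[0], out + total, outlen - 1 - total)) > 0)
        total += (size_t) n;
    close(fds[0]);                     /* a still-writing child gets SIGPIPE */
    out[total] = '\0';

    waitpid(pid, NULL, 0);             /* reap the child */
    return (ssize_t) total;
}
```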

November 20, 2017

© HAKIN9 MEDIA SP. Z O.O. SP. K. 2013