Previous Up Next

Chapter 2  Processes

2.1  Abstraction and virtualization

Before we talk about processes, I want to define a few words:

The word “virtual” is often used in the context of a virtual machine, which is software that creates the illusion of a dedicated computer running a particular operating system, when in reality the virtual machine might be running, along with many other virtual machines, on a computer running a different operating system.

In the context of virtualization, we sometimes call what is really happening “physical”, and what is virtually happening either “logical” or “abstract.”

2.2  Isolation

One of the most important principles of engineering is isolation: when you are designing a system with multiple components, it is usually a good idea to isolate them from each other so that a change in one component doesn’t have undesired effects on other components.

One of the most important goals of an operating system is to isolate each running program from the others so that programmers don’t have to think about every possible interaction. The software object that provides this isolation is a process.

A process is a software object that represents a running program. I mean “software object” in the sense of object-oriented programming; in general, an object contains data and provides methods that operate on the data. A process is an object that contains the following data:

Usually one process runs one program, but it is also possible for a process to load and run a new program.

It is also possible, and common, to run the same program in more than one process. In that case, the processes share the same program text but generally have different data and hardware states.

Most operating systems provide a fundamental set of capabilities to isolate processes from each other:

As a programmer, you don’t need to know much about how these capabilities are implemented. But if you are curious, you will find a lot of interesting things going on under the metaphorical hood. And if you know what’s going on, it can make you a better programmer.

2.3  UNIX processes

While I write this book, the process I am most aware of is my text editor, emacs. Every once in a while I switch to a terminal window, which is a window running a UNIX shell that provides a command-line interface.

When I move the mouse, the window manager wakes up, sees that the mouse is over the terminal window, and wakes up the terminal. The terminal wakes up the shell. If I type make in the shell, it creates a new process to run Make, which creates another process to run LaTeX and then another process to display the results.

If I need to look something up, I might switch to another desktop, which wakes up the window manager again. If I click on the icon for a web browser, the window manager creates a process to run the web browser. Some browsers, like Chrome, create a new process for each window and each tab.

And those are just the processes I am aware of. At the same time there are many other processes running in the background. Many of them are performing operations related to the operating system.

The UNIX command ps prints information about running processes. If you run it in a terminal, you might see something like this:

  PID TTY          TIME CMD
 2687 pts/1    00:00:00 bash
 2801 pts/1    00:01:24 emacs
24762 pts/1    00:00:00 ps

The first column is the unique numerical process ID. The second column is the terminal that created the process; “TTY” stands for teletypewriter, which was the original mechanical terminal.

The third column is the total processor time used by the process, in hours, minutes, and seconds. The last column is the name of the running program. In this example, bash is the name of the shell that interprets the commands I type in the terminal, emacs is my text editor, and ps is the program generating this output.

By default, ps lists only the processes associated with the current terminal. If you use the -e flag, you get every process (including processes belonging to other users, which is a security flaw, in my opinion).

On my system there are currently 233 processes. Here are some of them:

  PID TTY          TIME CMD
    1 ?        00:00:17 init
    2 ?        00:00:00 kthreadd
    3 ?        00:00:02 ksoftirqd/0
    4 ?        00:00:00 kworker/0:0
    8 ?        00:00:00 migration/0
    9 ?        00:00:00 rcu_bh
   10 ?        00:00:16 rcu_sched
   47 ?        00:00:00 cpuset
   48 ?        00:00:00 khelper
   49 ?        00:00:00 kdevtmpfs
   50 ?        00:00:00 netns
   51 ?        00:00:00 bdi-default
   52 ?        00:00:00 kintegrityd
   53 ?        00:00:00 kblockd
   54 ?        00:00:00 ata_sff
   55 ?        00:00:00 khubd
   56 ?        00:00:00 md
   57 ?        00:00:00 devfreq_wq

init is the first process created when the operating system starts. It creates many of the other processes, and then sits idle until the processes it created are done.

kthreadd is a process the operating system uses to create new threads. We’ll talk more about threads later, but for now you can think of a thread as kind of a process. The k at the beginning stands for kernel, which is the part of the operating system responsible for core capabilities like creating threads. The extra d at the end stands for daemon, which is another name for processes like this that run in the background and provide operating system services. In this context, “daemon” is used in the sense of a helpful spirit, with no connotation of evil.

Based on the name, you can infer that ksoftirqd is also a kernel daemon; specifically, it handles software interrupt requests, or “soft IRQ”.

kworker is a worker process created by the kernel to do some kind of processing for the kernel.

There are often multiple processes running these kernel services. On my system at the moment, there are 8 ksoftirqd processes and 35 kworker processes.

I won’t go into more details about the other processes, but if you are interested you can search for more information about them. You should run ps on your system and compare your results to mine.


Previous Up Next