Anatomy of a process

Process

A program under execution is called a process. It is an instance of a program: it contains the program code and executes a set of instructions on the CPU. In other words, when a program is loaded into memory and executed, it becomes a process.

Components of the Process

There are four key components of a process that make up its memory structure.

Text (code)
Data
Heap
Stack

Text (Code) Section: Stores the compiled program code (machine instructions) loaded from disk. It is typically read-only.
Data Section: A read-write region that contains initialized and uninitialized global/static variables. Uninitialized (zero-initialized) variables are usually kept in a separate sub-section called .bss.
Stack: A LIFO region that stores temporary data such as local variables, function arguments, and return addresses. It plays a crucial role in managing control flow and local data during program execution, and it grows and shrinks dynamically with function calls and returns.
Heap: Holds dynamically allocated memory (for example, from malloc), used for variables and data structures whose size may not be known at compile time or that need to exist beyond the scope of a single function call. It conventionally grows upward in memory, opposite to the direction of the stack.

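Note: Conceptually, the stack and heap grow toward each other. In practice, the stack has a fixed maximum size (shown by ulimit -s), and exceeding it, for example through unbounded recursion, causes a stack overflow, which the OS typically reports as a segmentation fault.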

Process Control Block (PCB):

The PCB is a data structure the operating system maintains for each process. It does not hold the program's code or data itself. Instead, it records the metadata and state information the OS needs to manage the process:

Process ID (PID)
Process State
CPU registers and program counter
Memory Management details
Scheduling and priority information
I/O status information (e.g., open files and devices)
Resource usage and accounting information
Pointer to the next PCB (e.g., in a scheduling queue)

The PCB stores the context of a process, which is necessary for context switching: the operation in which the CPU switches from executing one process to another. The OS saves the current process’s state in its PCB before switching and loads the next process’s state from its PCB.
The PCB gives the operating system the information it needs for scheduling and resource allocation, enabling efficient multitasking and process management.

Note: The PCB is stored in a protected memory area, typically in kernel space, where user processes cannot access it directly. On Linux, for example, the PCB is the kernel’s struct task_struct.

Lifecycle of a process:

New - The process is being created.
Ready - The process has all the resources it needs to run, but the CPU is not currently executing its instructions.
Running - The CPU is executing this process’s instructions.
Waiting - The process is waiting for some resource to become available or for some event to occur. For example, the process may be waiting for keyboard input, a disk access request, inter-process messages, a timer to go off, or a child process to finish.
Terminated - The process has finished execution.

Special States:

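Zombie: The process has finished execution, but its parent has not yet collected its exit status (for example, with wait() or waitpid()), so its entry remains in the process table.
Orphan: The parent has terminated while the child is still running. The orphaned child is adopted by init (PID 1) or, on some systems, by another designated process.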

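Note: The load average reports the average number of runnable processes over the last 1, 5, and 15 minutes. Runnable processes are those running on a CPU or waiting in the ready queue for one, i.e. processes that have all the required resources except the CPU. On Linux it also counts processes in uninterruptible sleep (usually waiting on disk I/O), so a high load average does not always mean the CPU is the bottleneck.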

System calls during the lifecycle of a process

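A process moves from one state to another through system calls, internal kernel operations, and hardware interrupts. For example, fork() creates a new process, a blocking read() moves the caller from Running to Waiting until data arrives, a timer interrupt can preempt a running process back to Ready, and exit() terminates it, after which the parent collects its status with wait().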

Operating System Bootup and the Init Process

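The init process is the first user-space process the kernel starts at boot. It always has PID 1 and is the ancestor of all other user processes. On modern systems this role is played by, for example, systemd on most Linux distributions and launchd on macOS.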

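The following snippet forks a child process. The child walks its ancestry chain (PID -> PPID) up to init (PID 1) and then replaces itself with /bin/ls -l, while the parent waits for it. The get_ppid() helper uses sysctl, so this version builds on macOS only.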

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/sysctl.h>
#include <sys/types.h>
#include <sys/wait.h>

// Function to get the parent PID (PPID) of a given PID using sysctl on macOS
pid_t get_ppid(pid_t pid) {
    struct kinfo_proc proc;
    int mib[4] = { CTL_KERN, KERN_PROC, KERN_PROC_PID, pid };
    size_t size = sizeof(proc);

    if (sysctl(mib, 4, &proc, &size, NULL, 0) == -1) {
        perror("sysctl failed");
        return -1;
    }
    if (size == 0) {
        // sysctl can succeed yet return no data when the PID does not exist
        fprintf(stderr, "No process with PID %d\n", pid);
        return -1;
    }
    return proc.kp_eproc.e_ppid;
}

int main() {
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork failed");
        return 1;
    } else if (pid == 0) {
        // Child process prints its ancestry chain
        pid_t current_pid = getpid();
        printf("Process ancestry chain (PID -> PPID) starting from child process:\n");

        while (current_pid > 1) {
            pid_t ppid = get_ppid(current_pid);
            if (ppid == -1) {
                fprintf(stderr, "Failed to get PPID for PID %d\n", current_pid);
                break;
            }
            printf("PID: %d --> PPID: %d\n", current_pid, ppid);
            if (ppid == 1) {
                printf("Reached init process (PID 1)\n");
                break;
            }
            current_pid = ppid;
        }
        
        fflush(stdout);  // Flush stdout before execl

        // Replace process image with /bin/ls -l command
        execl("/bin/ls", "ls", "-l", (char *) NULL);

        // If execl returns, an error occurred
        perror("execl failed");
        _exit(1);
    } else {
        // Parent process prints info and waits for child
        printf("Parent process: PID = %d, Child PID = %d\n", getpid(), pid);
        waitpid(pid, NULL, 0);
    }

    return 0;
}

Commands:

gcc -o run_proc_tree process.c
./run_proc_tree

Explanation:

execl("/bin/ls", "ls", "-l", (char *)NULL);

This replaces the child's process image with the /bin/ls executable (the PID stays the same) and runs it with the arguments:

argv[0] = “ls” (the program name, by convention)
argv[1] = “-l” (option to list files in long format)
End of arguments indicated by (char *)NULL
