TIL: Using waitpid to handle zombie processes without blocking the parent process

2022-06-05 00:00:00 +0000 UTC

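The fundamental parallelism primitive on Unix systems is the process. When writing C programs, a new process is spawned using the pid_t fork(void) syscall. fork returns twice: it returns 0 in the child, the child's PID in the parent, and -1 in the parent if no child could be created. The following C program shows the basic use of fork: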

#include <unistd.h>
#include <sys/types.h>
#include <stdio.h>

int main(void) {
	pid_t pid = fork();

	if (pid < 0) {
		perror("fork error");
		return 1;
	}

	if (pid > 0) {
		puts("We are in the parent process!");
	}

	if (pid == 0) {
		// on my system pid_t is a signed 32 bit integer; the cast keeps %d correct elsewhere too
		printf("We are in a child process with PID %d\n", (int)getpid());
		return 0;
	}

	return 0;
}

However, in a longer-running program that spawns many child processes, we may inadvertently generate zombie processes. A zombie process is a Unix process that has terminated but whose process table entry has not yet been released. The kernel keeps a small amount of data about a terminated child, such as its PID and exit status, so that the parent process can retrieve it later. Until the parent does so, the entry stays around as a zombie.

To let the OS know these processes can be cleaned up, the parent has to call one of the syscalls in the wait family (wait, waitpid, waitid). By default, these calls block:

#include <sys/wait.h>
#include <sys/types.h>
// ... child processes ongoing ...

if (pid > 0) {
	// we are in the parent process
	int status; // the exit status is written here; no heap allocation needed

	// the following call to wait(...) will block until a child process terminates
	// equivalent to waitpid(-1, &status, 0)
	pid_t child = wait(&status); // PID of the child that terminated, or -1 on error
	
	// ...
}
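
The status value filled in by the wait calls is an encoded integer, not the raw exit code, so it is usually decoded with the WIFEXITED and WEXITSTATUS macros from sys/wait.h. Here is a minimal sketch of a blocking wait that does this. The helper name run_child_and_wait is made up for illustration, and the child does nothing except exit with the requested code:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with exit_code,
 * block until that child terminates, and return its decoded exit status.
 * Returns -1 if fork or waitpid fails, or if the child did not exit normally. */
int run_child_and_wait(int exit_code) {
	pid_t pid = fork();
	if (pid < 0) {
		return -1; /* fork failed, no child was created */
	}
	if (pid == 0) {
		_exit(exit_code); /* child: terminate immediately */
	}

	/* parent: block until this specific child has terminated */
	int status;
	if (waitpid(pid, &status, 0) < 0) {
		return -1;
	}

	/* the child is reaped at this point, so it cannot linger as a zombie */
	if (WIFEXITED(status)) {
		return WEXITSTATUS(status);
	}
	return -1; /* e.g. the child was killed by a signal */
}
```

Calling run_child_and_wait(7) from a main function should hand back 7 once the child has exited. Only the low 8 bits of an exit code survive, so the sketch is meant for values between 0 and 255.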

If we need to continue execution in the parent process regardless of the status of the child processes, we can instead set up a parent process main loop like this:

for(;;) {
	if (pid > 0) {
		int status;
		pid_t child;

		// when waitpid is called with the option WNOHANG, it returns 0
		// if no child process has changed state, and -1 if there are
		// no children left to wait for, so we only loop on a positive PID
		while ((child = waitpid(-1, &status, WNOHANG)) > 0) {
			// do something to handle each terminated child process
		}

		// the rest of our parent process main loop continues here
	} else if (pid == 0) {
		// child process work here
	}
}

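The option flag WNOHANG causes waitpid to return 0 immediately if child processes exist but none of them has changed state yet. When there are no children left to wait for, it returns -1 with errno set to ECHILD, so a loop that reaps children should check for a return value greater than 0 rather than merely non-zero. This allows us to continue work in the parent process while child processes continue to operate.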
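
To tie this together, here is a small sketch of the non-blocking pattern as two functions. The names reap_terminated_children and spawn_and_reap are hypothetical, the children exit straight away instead of doing real work, and the parent's "work" is just a comment placeholder:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: reap every child that has already terminated,
 * without blocking. Returns how many were reaped and sets *no_children
 * to 1 once waitpid reports that there is nothing left to wait for. */
int reap_terminated_children(int *no_children) {
	int reaped = 0;
	int status;
	pid_t child;

	*no_children = 0;
	/* > 0: a child was reaped; 0: children exist but none has finished;
	 * -1: error, typically ECHILD when no children remain */
	while ((child = waitpid(-1, &status, WNOHANG)) > 0) {
		reaped++;
	}
	if (child < 0 && errno == ECHILD) {
		*no_children = 1;
	}
	return reaped;
}

/* Hypothetical driver: spawn up to n children, then keep running the
 * parent loop, reaping as children finish. Returns the total reaped. */
int spawn_and_reap(int n) {
	for (int i = 0; i < n; i++) {
		pid_t pid = fork();
		if (pid < 0) {
			break; /* stop spawning; still reap whatever was started */
		}
		if (pid == 0) {
			_exit(0); /* child: real work would go here */
		}
	}

	int total = 0;
	int done = 0;
	while (!done) {
		total += reap_terminated_children(&done);
		/* the rest of the parent's main loop would run here */
	}
	return total;
}
```

With three children, spawn_and_reap(3) should return 3 once all of them have been reaped. A real main loop would normally do useful work or sleep between polls rather than spin as this sketch does.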

Documentation links from above:
* man 2 fork
* man 2 waitpid

Tags: til c linux