TIL: Using waitpid to handle zombie processes without blocking the parent process
2022-06-05 00:00:00 +0000 UTC
The fundamental parallelism primitive on Unix systems is the process. In a C program, a new process is created with the pid_t fork(void) syscall. The following C program shows the basic use of fork:
#include <unistd.h>
#include <sys/types.h>
#include <stdio.h>

int main(void) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork error");
        return 1;
    }
    if (pid > 0) {
        puts("We are in the parent process!");
    }
    if (pid == 0) {
        printf("We are in a child process with PID %d\n", getpid()); // on my system pid_t is a signed 32 bit integer
        return 0;
    }
    return 0;
}
However, in a longer-running program that may spawn many child processes, we may inadvertently generate zombie processes. A zombie process is a Unix process that has terminated but whose entry the OS cannot yet remove from the process table. The kernel keeps a small record for each terminated child (its PID, termination status, and resource usage) so that the parent process can still collect that status. Until the parent does so, the record stays around as a zombie.
To let the OS know these processes can be cleaned up, the parent has to call a syscall in the wait family (wait, waitpid). By default, these calls block:
#include <sys/types.h>
#include <sys/wait.h>
// ... child processes ongoing ...
if (pid > 0) {
    // we are in the parent process
    int status = 0;
    // the following call to wait(...) will block until a child process terminates
    // equivalent to waitpid(-1, &status, 0)
    pid_t child = wait(&status);
    // child now holds the PID of the terminated child, or -1 on error
    // ...
}
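To make the blocking behaviour concrete, here is a minimal sketch that forks one child, waits for it, and decodes the status with the standard WIFEXITED and WEXITSTATUS macros. The helper name spawn_and_wait is hypothetical and only serves the illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, block until it
 * terminates, and return the exit code seen by the parent (-1 on error). */
int spawn_and_wait(int code) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: terminate immediately with the requested exit code. */
        _exit(code);
    }

    int status = 0;
    /* Parent: blocks here until this specific child changes state. */
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    /* The child did not exit normally (for example, it was killed by a signal). */
    return -1;
}
```

Calling spawn_and_wait(7) from main should return 7 once the child has exited. The parent does nothing else in the meantime, which is exactly the limitation of the blocking form.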
If the parent process needs to keep running regardless of the state of its child processes, we can instead set up a parent process main loop like this:
for (;;) {
    if (pid > 0) {
        int status = 0;
        // when waitpid is called with the option WNOHANG, it returns 0
        // if children exist but none has changed state, and -1 if there
        // are no children left, so only a positive return value means
        // a child was reaped
        while (waitpid(-1, &status, WNOHANG) > 0) {
            // do something to handle each terminated child process
        }
        // the rest of our parent process main loop continues here
    } else if (pid == 0) {
        // child process work here
    }
}
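The option flag WNOHANG causes waitpid to return 0 immediately if child processes exist but none of them has changed state yet. If a child has terminated, waitpid returns its PID, and if there are no child processes left at all it returns -1 with errno set to ECHILD, which is why the loop condition should test for a positive return value. This allows us to continue work in the parent process while child processes continue to operate.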
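As a rough, self-contained sketch of the non-blocking pattern, the function below forks a few children and then polls for them with WNOHANG while the parent keeps counting "work". The name spawn_and_poll and the work counter are hypothetical stand-ins, and the children exit immediately only to keep the example short.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork `n` children that exit right away, then reap
 * them without ever blocking the parent. Returns the number of children
 * reaped, or -1 if a fork fails. */
int spawn_and_poll(int n) {
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return -1;
        }
        if (pid == 0) {
            _exit(0); /* child: real work would go here */
        }
    }

    int reaped = 0;
    unsigned long work_units = 0;
    for (;;) {
        int status = 0;
        pid_t done = waitpid(-1, &status, WNOHANG);
        if (done > 0) {
            reaped++; /* a terminated child was cleaned up */
            continue;
        }
        if (done < 0) {
            /* ECHILD means there are no children left to wait for. */
            if (errno != ECHILD) {
                perror("waitpid");
            }
            break;
        }
        /* done == 0: children are still running, so the parent keeps working. */
        work_units++;
    }
    printf("reaped %d children after %lu units of parent work\n", reaped, work_units);
    return reaped;
}
```

The three-way branch on the return value is the important part: treating any non-zero result as "a child was reaped" would spin forever once waitpid starts returning -1.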
Documentation links from above:
* man 2 fork
* man 2 waitpid