[flow-tools] Zombies on solaris from flow-capture -R ...

Jarkko Torppa jarkko.torppa@elisa.fi
Mon, 13 May 2002 18:15:31 +0300


flow-capture -R creates zombies on solaris. Singals are only delivered 
once on solaris, after that the signal is reset.

I see 3 ways to fix this:

use sigaction to set the signal
set signals again in the handler
use bsd_signal on solaris

Third alternative seems is easiest to implement, but I doubt it is 
portable. Here is patch (agains 0.56) to use bsd_signal where appropriate.

This patch also tries to be bit more sensible about collecting zombies, 
current code has the following problems:
 - does not set WNOHANG on wait
 - if two child processes terminate at the samemoment, it is not 
guaranteed that the
   handler will be called twice

This has not been tested too heavily yet

Index: flow-capture.c
===================================================================
RCS file: /usr/cvsroot/kbus/flow-tools/src/flow-capture.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 flow-capture.c
--- flow-capture.c	2002/02/11 20:43:41	1.1.1.1
+++ flow-capture.c	2002/05/13 15:13:31
@@ -133,6 +133,13 @@
 int calc_rotate (int next, double *trotate, int *cur);
 double doubletime(void);
 
+/* On solaris signal are reset to default after being delivered
+   once. bsd_signal behaves more nicely, something needs to be done
+   for other sysV platforms */
+#ifndef HAVE_BSDSIGNAL
+# define bsd_signal(a,b) signal(a,b)
+#endif
+
 int main(argc, argv)   
 int argc;
 char **argv;
@@ -408,16 +415,16 @@
    * configure signal handlers
    */
 
-  if (signal(SIGPIPE, sig_pipe) < 0)
+  if (signal(SIGPIPE, sig_pipe) == SIG_ERR)
     fterr_err(1, "signal(SIGPIPE)");
 
-  if (signal(SIGHUP, sig_hup) < 0)
+  if (bsd_signal(SIGHUP, sig_hup) == SIG_ERR)
     fterr_err(1, "signal(SIGHUP)");
 
-  if (signal(SIGQUIT, sig_quit) < 0)
+  if (signal(SIGQUIT, sig_quit) == SIG_ERR)
     fterr_err(1, "signal(SIGQUIT)");
 
-  if (signal(SIGCHLD, sig_chld) < 0)
+  if (bsd_signal(SIGCHLD, sig_chld) == SIG_ERR)
     fterr_err(1, "signal(SIGCHLD)");
 
   /* sandbox */
@@ -583,31 +590,39 @@
     tt_now = now = doubletime();
 
     /* stake the zombies */
-    while (sig_chld_counter) {
-
-      /* one less zombie */
-      sig_chld_counter --;
-
-      child_pid = wait3(&child_status, 0, 0);
-
-      if (WIFEXITED(child_status)) {
-
-        if (WEXITSTATUS(child_status))
-          fterr_warnx("Child %d exit_status=%d", (int)child_pid,
-            WEXITSTATUS(child_status));
-
-      } else if (WIFSIGNALED(child_status)) {
-
-        fterr_warnx("Child %d signal=%d", (int)child_pid,
-          WTERMSIG(child_status));
-
-      } else {
-
-        fterr_warnx("PID %d exited status=%d", (int)child_pid,
-          child_status);
-
+    if(sig_chld_counter) {
+      do {
+	while((child_pid = waitpid(-1,&child_status, WNOHANG)) > 0) {
+
+	  /* one less zombie */
+	  sig_chld_counter --;
+	  
+	  if (WIFEXITED(child_status)) {
+	    if (WEXITSTATUS(child_status))
+	      fterr_warnx("Child %d exit_status=%d", (int)child_pid,
+			  WEXITSTATUS(child_status));
+	  } else if (WIFSIGNALED(child_status)) {
+	    fterr_warnx("Child %d signal=%d", (int)child_pid,
+			WTERMSIG(child_status));
+	  } else {
+	    fterr_warnx("PID %d exited status=%d", (int)child_pid,
+			child_status);
+	  }
+	}
+      } while(child_pid > 0 || ( pid == -1 && errno == EINTR));
+      if(child_pid == -1) {
+	if(errno == ECHILD) {
+	  /* No children left */
+	  sig_chld_counter = 0;
+	} else {
+	  fterr_warn("Did not get child exit status sig_chld_counter=%d",
+		     sig_chld_counter);
+	}
+      } else if (child_pid == 0) {
+	fterr_warnx("Nobody was ready to report status sig_chld_counter=%d",
+		    sig_chld_counter);
       }
-
+      if (sig_chld_counter < 0) sig_chld_counter = 0;
     } /* buffy */
 
     if (stat_interval) {


-- 
Jarkko Torppa, Elisa Internet