[flow-tools] Zombies on solaris from flow-capture -R ...

Mark Fullmer maf@eng.oar.net
Mon, 13 May 2002 13:13:41 -0400


You're right, this is broken.  The handler/reaper code has multiple
race conditions.  The sig_chld_counter should be a flag with a loop
around non blocking wait*().

Most configurations would create a child every 5 or 15 minutes so I
doubt anyone has ever run into the bug, still I don't know what I was
thinking when I wrote that :)

support.c has a function mysignal() that should fix the signal() semantic
problem.

mark

On Mon, May 13, 2002 at 06:15:31PM +0300, Jarkko Torppa wrote:
> 
> flow-capture -R creates zombies on solaris. Singals are only delivered 
> once on solaris, after that the signal is reset.
> 
> I see 3 ways to fix this:
> 
> use sigaction to set the signal
> set signals again in the handler
> use bsd_signal on solaris
> 
> Third alternative seems is easiest to implement, but I doubt it is 
> portable. Here is patch (agains 0.56) to use bsd_signal where appropriate.
> 
> This patch also tries to be bit more sensible about collecting zombies, 
> current code has the following problems:
>  - does not set WNOHANG on wait
>  - if two child processes terminate at the samemoment, it is not 
> guaranteed that the
>    handler will be called twice
> 
> This has not been tested too heavily yet
> 
> Index: flow-capture.c
> ===================================================================
> RCS file: /usr/cvsroot/kbus/flow-tools/src/flow-capture.c,v
> retrieving revision 1.1.1.1
> diff -u -r1.1.1.1 flow-capture.c
> --- flow-capture.c	2002/02/11 20:43:41	1.1.1.1
> +++ flow-capture.c	2002/05/13 15:13:31
> @@ -133,6 +133,13 @@
>  int calc_rotate (int next, double *trotate, int *cur);
>  double doubletime(void);
>  
> +/* On solaris signal are reset to default after being delivered
> +   once. bsd_signal behaves more nicely, something needs to be done
> +   for other sysV platforms */
> +#ifndef HAVE_BSDSIGNAL
> +# define bsd_signal(a,b) signal(a,b)
> +#endif
> +
>  int main(argc, argv)   
>  int argc;
>  char **argv;
> @@ -408,16 +415,16 @@
>     * configure signal handlers
>     */
>  
> -  if (signal(SIGPIPE, sig_pipe) < 0)
> +  if (signal(SIGPIPE, sig_pipe) == SIG_ERR)
>      fterr_err(1, "signal(SIGPIPE)");
>  
> -  if (signal(SIGHUP, sig_hup) < 0)
> +  if (bsd_signal(SIGHUP, sig_hup) == SIG_ERR)
>      fterr_err(1, "signal(SIGHUP)");
>  
> -  if (signal(SIGQUIT, sig_quit) < 0)
> +  if (signal(SIGQUIT, sig_quit) == SIG_ERR)
>      fterr_err(1, "signal(SIGQUIT)");
>  
> -  if (signal(SIGCHLD, sig_chld) < 0)
> +  if (bsd_signal(SIGCHLD, sig_chld) == SIG_ERR)
>      fterr_err(1, "signal(SIGCHLD)");
>  
>    /* sandbox */
> @@ -583,31 +590,39 @@
>      tt_now = now = doubletime();
>  
>      /* stake the zombies */
> -    while (sig_chld_counter) {
> -
> -      /* one less zombie */
> -      sig_chld_counter --;
> -
> -      child_pid = wait3(&child_status, 0, 0);
> -
> -      if (WIFEXITED(child_status)) {
> -
> -        if (WEXITSTATUS(child_status))
> -          fterr_warnx("Child %d exit_status=%d", (int)child_pid,
> -            WEXITSTATUS(child_status));
> -
> -      } else if (WIFSIGNALED(child_status)) {
> -
> -        fterr_warnx("Child %d signal=%d", (int)child_pid,
> -          WTERMSIG(child_status));
> -
> -      } else {
> -
> -        fterr_warnx("PID %d exited status=%d", (int)child_pid,
> -          child_status);
> -
> +    if(sig_chld_counter) {
> +      do {
> +	while((child_pid = waitpid(-1,&child_status, WNOHANG)) > 0) {
> +
> +	  /* one less zombie */
> +	  sig_chld_counter --;
> +	  
> +	  if (WIFEXITED(child_status)) {
> +	    if (WEXITSTATUS(child_status))
> +	      fterr_warnx("Child %d exit_status=%d", (int)child_pid,
> +			  WEXITSTATUS(child_status));
> +	  } else if (WIFSIGNALED(child_status)) {
> +	    fterr_warnx("Child %d signal=%d", (int)child_pid,
> +			WTERMSIG(child_status));
> +	  } else {
> +	    fterr_warnx("PID %d exited status=%d", (int)child_pid,
> +			child_status);
> +	  }
> +	}
> +      } while(child_pid > 0 || ( pid == -1 && errno == EINTR));
> +      if(child_pid == -1) {
> +	if(errno == ECHILD) {
> +	  /* No children left */
> +	  sig_chld_counter = 0;
> +	} else {
> +	  fterr_warn("Did not get child exit status sig_chld_counter=%d",
> +		     sig_chld_counter);
> +	}
> +      } else if (child_pid == 0) {
> +	fterr_warnx("Nobody was ready to report status sig_chld_counter=%d",
> +		    sig_chld_counter);
>        }
> -
> +      if (sig_chld_counter < 0) sig_chld_counter = 0;
>      } /* buffy */
>  
>      if (stat_interval) {
> 
> 
> -- 
> Jarkko Torppa, Elisa Internet
> 
> 
> 
> _______________________________________________
> flow-tools@splintered.net
> http://www.splintered.net/sw/flow-tools