drz5553
2015-03-13 16:13
Views: 102

Respawning and signal handling in PHP

Specifics

I have an issue in PHP: respawned processes do not handle signals, although before respawning the handlers work correctly. I narrowed my code down to the bare minimum:

<?php
// Ticks are required so that pcntl signal handlers get dispatched between statements.
declare(ticks=1);

register_shutdown_function(function() {
    if ($noRethrow = ob_get_contents()) {
        ob_end_clean();
        exit;
    }
    system('/usr/bin/nohup /usr/bin/php '.__FILE__. ' 1>/dev/null 2>/dev/null &');
});

function handler($signal)
{
    switch ($signal) {
        case SIGTERM:
            file_put_contents(__FILE__.'.log', sprintf('Terminated [ppid=%s] [pid=%s]'.PHP_EOL, posix_getppid(), posix_getpid()), FILE_APPEND);
            ob_start();
            echo($signal);
            exit;
        case SIGCONT:
            file_put_contents(__FILE__.'.log', sprintf('Restarted [ppid=%s] [pid=%s]'.PHP_EOL, posix_getppid(), posix_getpid()), FILE_APPEND);
            exit;
    }
}

pcntl_signal(SIGTERM, 'handler');
pcntl_signal(SIGCONT, 'handler');

while(1) {
    if (time() % 5 == 0) {
        file_put_contents(__FILE__.'.log', sprintf('Idle [ppid=%s] [pid=%s]'.PHP_EOL, posix_getppid(), posix_getpid()), FILE_APPEND);
    }
    sleep(1);
}

As you can see, it does the following:

  • Registers a shutdown function that respawns the process with nohup (so that SIGHUP is ignored when the parent process dies).
  • Registers a handler via pcntl_signal() for SIGTERM and SIGCONT. The first just logs a message that the process was terminated, while the second leads to a respawn of the process. The ob_* functions are used to pass a flag that tells the shutdown function what to do: either exit or respawn.
  • Logs a message to the log file showing that the script is "alive".

What is happening

So, I start the script with:

/usr/bin/nohup /usr/bin/php script.php 1>/dev/null 2>/dev/null &

Then the log file contains entries like:

Idle [ppid=7171] [pid=8849]
Idle [ppid=7171] [pid=8849]

Let's say I then run kill 8849:

Terminated [ppid=7171] [pid=8849]

So SIGTERM is handled successfully (and the script does exit). If I instead run kill -18 8849 (18 is the numeric value of SIGCONT), I see:

Idle [ppid=7171] [pid=8849]
Restarted [ppid=7171] [pid=8849]
Idle [ppid=1] [pid=8875]
Idle [ppid=1] [pid=8875]

Therefore SIGCONT was also handled correctly and, judging by the following "Idle" messages, the newly spawned instance of the script is working well.

Update #1: I was thinking about ppid=1 (that is, the global init process) and signal handling in orphan processes, but that is not the cause. Here is the part of the log which shows that being an orphan (ppid=1) is not the reason: when the worker is started by the controlling app, the app also invokes it with system(), the same way the worker respawns itself. After the controlling app invokes the worker, it has ppid=1 and responds to signals correctly, while if the worker respawns itself, the new copy does not respond to any signal except SIGKILL. So the issue appears only when the worker respawns itself.

Update #2: I tried to analyze what is happening with strace. There are two cases.

  1. When the worker has not yet been respawned - strace output. Look at lines 4 and 5: this is where I send SIGCONT (kill -18) to the process. It then triggers the whole chain: writing to the file, the system() call and exiting the current process.
  2. When the worker has already been respawned by itself - strace output. Here, look at lines 8 and 9, which appeared after receiving SIGCONT. First, it looks like the process is still receiving the signal somehow; second, it ignores the signal. No action was taken, but the process was notified by the system that SIGCONT was sent. Why the process then ignores it is the question (if installing the user handler for SIGCONT had failed, the process should have ended execution, but it did not). As for SIGKILL, the output for the already respawned worker looks like:

    nanosleep({1, 0},  <unfinished ...>
    +++ killed by SIGKILL +++
    

This indicates that the signal was received and did what it should.

The problem

Once the process has been respawned, it reacts neither to SIGTERM nor to SIGCONT. However, it is still possible to end it with SIGKILL (kill -9 PID does end the process). For example, for the process above, both kill 8875 and kill -18 8875 do nothing (the process ignores the signals and continues to log messages).

However, I would not say that registering the handlers fails completely, because it changes at least the behavior of SIGTERM (which normally leads to termination, while here it is ignored). I also suspect that ppid=1 points to something wrong, but I cannot say for sure yet.

I also tried other signals; the signal number did not matter, the result was always the same.

The question

What could be the reason for this behavior? Is the way I respawn the process correct? If not, what other options would allow the newly spawned process to use user-defined signal handlers correctly?

2 answers

  • dongni3854 2015-03-16 12:33
    Accepted

    Solution: Eventually, strace helped me understand the problem. The relevant output is:

    nanosleep({1, 0}, {0, 294396497})       = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
    restart_syscall(<... resuming interrupted call ...>) = 0
    

    So the signal was received but not handled. To answer the question fully, I still need to figure out why the process had these signals blocked, but forcefully unblocking them with pcntl_sigprocmask() does the trick:

    pcntl_sigprocmask(SIG_UNBLOCK, [SIGTERM, SIGCONT]);
    

    With this call in place, the respawned process receives and handles signals as intended. I also tried unblocking only SIGCONT: it was then handled correctly while SIGTERM stayed blocked, which indicates that the blocked signals are exactly why dispatching failed.
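The difference between a blocked signal and an unhandled one can be reproduced in a single process. The following is a minimal sketch, not taken from the original script: it blocks SIGTERM by hand to imitate the state the respawned worker seems to start in, sends the signal to itself, and only sees the handler run after unblocking. The variable names are made up for illustration, and pcntl_signal_dispatch() is called explicitly instead of relying on declare(ticks=1).

```php
<?php
// Sketch: a blocked signal stays pending until it is unblocked.
// $handled, $whileBlocked and $afterUnblock are illustrative names only.
$handled = [];

pcntl_signal(SIGTERM, function (int $signo) use (&$handled) {
    $handled[] = $signo;
});

// Imitate the respawned worker: SIGTERM is blocked in the process signal mask.
pcntl_sigprocmask(SIG_BLOCK, [SIGTERM]);

// The kernel keeps the signal pending, so nothing is queued for PHP to dispatch.
posix_kill(posix_getpid(), SIGTERM);
pcntl_signal_dispatch();
$whileBlocked = count($handled);

// Unblocking delivers the pending signal, and the handler can finally run.
pcntl_sigprocmask(SIG_UNBLOCK, [SIGTERM]);
pcntl_signal_dispatch();
$afterUnblock = count($handled);

printf("handled while blocked: %d, after unblock: %d\n", $whileBlocked, $afterUnblock);
```

This matches what strace showed for the respawned worker: the signal reaches the process, but the user handler is never invoked while the signal is blocked.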

    Resolution: for some reason, when the process respawns itself with signal handlers installed, the new instance starts with those signals blocked. Unblocking them forcefully solves the issue, but why the signals are blocked in the new instance is still an open question.
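A possible way to apply the workaround and record what the new instance actually inherited is sketched below. blockedSignals() and unblockInherited() are hypothetical helpers, not part of pcntl or of the original script. As background, and only as a likely lead rather than a confirmed cause: under POSIX the signal mask is inherited by a child created with fork() and is preserved across exec, so a mask in effect at the moment system() runs would carry over to the new instance.

```php
<?php
// Sketch only: blockedSignals() and unblockInherited() are hypothetical helpers.

/** Returns the signal numbers currently blocked for this process. */
function blockedSignals(): array
{
    // Blocking one extra signal is just a way to read the previous mask into $old.
    pcntl_sigprocmask(SIG_BLOCK, [SIGUSR2], $old);
    // Put the mask back exactly as it was.
    pcntl_sigprocmask(SIG_SETMASK, $old);
    return $old;
}

/** Unblocks $signals and returns those that were blocked before the call. */
function unblockInherited(array $signals): array
{
    $wereBlocked = array_values(array_intersect($signals, blockedSignals()));
    pcntl_sigprocmask(SIG_UNBLOCK, $signals);
    return $wereBlocked;
}

// At the very top of the worker, before pcntl_signal() is called:
$inherited = unblockInherited([SIGTERM, SIGCONT]);
```

Writing $inherited to the log file on startup would show directly whether a respawned instance begins with SIGTERM and SIGCONT blocked while a manually started one does not.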

  • doumianfeng5065 2015-03-16 10:13

    This happens because you spawn a child process by executing system(foo) and then let the current process die. Hence the child becomes an orphan, and its parent becomes PID 1 (init).

    You can see the change using the pstree command.

    Before:

    init─┬─cron
    (...)
         └─screen─┬─zsh───pstree
                  ├─3*[zsh]
                  ├─zsh───php
                  └─zsh───vim
    

    After:

    init─┬─cron
    (...)
         └─php
    

    Wikipedia states:

    Orphan processes is kind of the opposite situation of zombie processes, since it refers to the case where a parent process terminates before its child processes, in which case these children are said to become "orphaned".

    Unlike the asynchronous child-to-parent notification that happens when a child process terminates (via the SIGCHLD signal), child processes are not notified immediately when their parent finishes. Instead, the system simply redefines the "parent-pid" field in the child process's data to be the process that is the "ancestor" of every other process in the system, whose pid generally has the value 1 (one), and whose name is traditionally "init". It is thus said that "init 'adopts' every orphan process on the system".

    For your situation, I would suggest two options:

    • Use two scripts: one for managing the child, and second one, "worker", to actually perform the job,
    • or, use one script, that will include both: outer part will manage, inner part, forked from outer, will do the job.
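The second option (one script, where the outer part manages and the forked inner part works) could look roughly like the sketch below. It is only an illustration under assumptions: supervise() and the EXIT_RESPAWN status code are invented here, and the worker asks for a restart through its exit status rather than by respawning itself.

```php
<?php
// Sketch only: EXIT_RESPAWN and supervise() are hypothetical names.
const EXIT_RESPAWN = 75; // exit status by which the worker requests a restart

/**
 * Forks a worker, waits for it, and starts a new one for as long as the
 * worker exits with EXIT_RESPAWN. Returns how many workers were started.
 */
function supervise(callable $worker): int
{
    $starts = 0;
    do {
        $starts++;
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('fork failed');
        }
        if ($pid === 0) {
            // Child: do the job; its return value becomes the exit status.
            exit((int) $worker($starts));
        }
        // Parent: block until this worker ends, then inspect how it ended.
        pcntl_waitpid($pid, $status);
        $code = pcntl_wifexited($status) ? pcntl_wexitstatus($status) : -1;
    } while ($code === EXIT_RESPAWN);

    return $starts;
}
```

For example, supervise(function (int $n): int { return $n < 3 ? EXIT_RESPAWN : 0; }) starts three workers and then stops. With this layout the manager stays alive as the parent, so the worker never has to respawn itself from a shutdown function.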