drz5553 2015-03-13 16:13

Respawning and signal handling in PHP

Specifics

I have an issue in PHP: respawned processes do not handle signals, although signal handling works correctly before respawning. I narrowed my code down to the bare minimum:

declare(ticks=1);

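register_shutdown_function(function() {
    // A non-empty output buffer is the "do not respawn" flag set by the SIGTERM handler.
    if ($noRethrow = ob_get_contents()) {
        ob_end_clean();
        exit;
    }
    // Otherwise respawn this script in the background, detached via nohup.
    system('/usr/bin/nohup /usr/bin/php '.__FILE__. ' 1>/dev/null 2>/dev/null &');
});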

function handler($signal)
{
    switch ($signal) {
        case SIGTERM:
            file_put_contents(__FILE__.'.log', sprintf('Terminated [ppid=%s] [pid=%s]'.PHP_EOL, posix_getppid(), posix_getpid()), FILE_APPEND);
            ob_start();
            echo($signal);
            exit;
        case SIGCONT:
            file_put_contents(__FILE__.'.log', sprintf('Restarted [ppid=%s] [pid=%s]'.PHP_EOL, posix_getppid(), posix_getpid()), FILE_APPEND);
            exit;
    }
}

pcntl_signal(SIGTERM, 'handler');
pcntl_signal(SIGCONT, 'handler');

while(1) {
    if (time() % 5 == 0) {
        file_put_contents(__FILE__.'.log', sprintf('Idle [ppid=%s] [pid=%s]'.PHP_EOL, posix_getppid(), posix_getpid()), FILE_APPEND);
    }
    sleep(1);
}

As you can see, it does the following:

  • Registers a shutdown function that respawns the process with nohup (so that SIGHUP is ignored when the parent process dies).
  • Registers a handler via pcntl_signal() for SIGTERM and SIGCONT. The first just logs a message that the process was terminated, while the second leads to a respawn of the process. The ob_* functions are used to pass a flag that tells the shutdown function what to do: either exit or respawn.
  • Logs a message to the log file showing that the script is "alive".
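
The flag passed through the ob_* functions could also be carried by an ordinary variable. A minimal sketch, assuming a hypothetical ShutdownPlan class that is not part of the original script; it only decides what the shutdown function should do and builds the respawn command without running it:

```php
<?php
// Sketch only: ShutdownPlan and its methods are hypothetical names.
class ShutdownPlan
{
    // Respawn by default; the SIGTERM handler turns this off.
    private static $respawn = true;

    // Call from the signal handler before exit.
    public static function fromSignal($signal)
    {
        self::$respawn = ($signal !== SIGTERM);
    }

    // Returns the command the shutdown function should run, or null to just exit.
    public static function command($script)
    {
        if (!self::$respawn) {
            return null;
        }
        return '/usr/bin/nohup ' . escapeshellarg(PHP_BINARY) . ' '
            . escapeshellarg($script) . ' 1>/dev/null 2>/dev/null &';
    }
}
```

In the shutdown function, a non-null result of ShutdownPlan::command(__FILE__) would be passed to system(), and a null result would mean a plain exit.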

What is happening

So, I start the script with:

/usr/bin/nohup /usr/bin/php script.php 1>/dev/null 2>/dev/null &

Then the log file contains entries like:

Idle [ppid=7171] [pid=8849]
Idle [ppid=7171] [pid=8849]

Let's say I then run kill 8849:

Terminated [ppid=7171] [pid=8849]

Thus SIGTERM is handled successfully (and the script indeed exits). Now, if I instead run kill -18 8849 (18 is the numeric value of SIGCONT), I see:

Idle [ppid=7171] [pid=8849]
Restarted [ppid=7171] [pid=8849]
Idle [ppid=1] [pid=8875]
Idle [ppid=1] [pid=8875]

Therefore SIGCONT was also handled correctly and, judging by the following "Idle" messages, the newly spawned instance of the script is working well.

Update #1: I was thinking about ppid=1 (the global init process) and signal handling in orphan processes, but that is not the cause. The log shows that being an orphan (ppid=1) isn't the reason: when the worker is started by the controlling app, it is also invoked with system(), the same way the worker respawns itself. After the controlling app invokes the worker, it has ppid=1 and responds to signals correctly, while if the worker respawns itself, the new copy does not respond to any signal except SIGKILL. So the issue appears only when the worker respawns itself.

Update #2: I tried to analyze what is happening with strace. There are two cases:

  1. When the worker has not yet been respawned - strace output. Take a look at lines 4 and 5: this is when I send SIGCONT (kill -18) to the process. It triggers the whole chain: writing to the file, the system() call and exiting the current process.
  2. When the worker has already been respawned by itself - strace output. Here, take a look at lines 8 and 9: they appeared after receiving SIGCONT. First, it looks like the process still receives the signal somehow and, second, it ignores the signal. No action was taken, but the process was notified by the system that SIGCONT was sent. Why the process then ignores it is the question (if installing the user handler for SIGCONT had failed, the process should have ended, but it did not). As for SIGKILL, the output for the already respawned worker looks like:

    nanosleep({1, 0},  <unfinished ...>
    +++ killed by SIGKILL +++
    

This indicates that the signal was received and did what it should do.
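
Besides strace, the process can report which signals are currently blocked in its own signal mask. A minimal sketch, assuming a hypothetical helper named currentlyBlockedSignals() that is not part of the original script; it briefly blocks SIGUSR1 only to read back the previous mask through the third argument of pcntl_sigprocmask(), then restores that mask:

```php
<?php
// Sketch only: currentlyBlockedSignals() is a hypothetical helper name.
function currentlyBlockedSignals()
{
    $old = [];
    // Temporarily block SIGUSR1 just to obtain the previous mask in $old.
    pcntl_sigprocmask(SIG_BLOCK, [SIGUSR1], $old);
    // Put the original mask back exactly as it was.
    pcntl_sigprocmask(SIG_SETMASK, $old);
    return $old;
}
```

Logging the returned array of signal numbers at the top of the worker, once for the first run and once for a respawned run, would show whether the two instances start with different masks.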

The problem

Once the process is respawned, it reacts neither to SIGTERM nor to SIGCONT. However, it is still possible to end it with SIGKILL (kill -9 PID indeed ends the process). For example, for the process above, both kill 8875 and kill -18 8875 do nothing (the process ignores the signals and continues to log messages).

However, I would not say that registering the handlers fails completely, because it overrides at least SIGTERM (which normally leads to termination, while in this case it is ignored). I also suspect that ppid=1 points to something wrong, but I cannot say for sure now.

I also tried other signals (in fact, the signal number didn't matter; the result was always the same).

The question

What could be the reason for this behavior? Is the way I'm respawning the process correct? If not, what other options would allow the newly spawned process to use user-defined signal handlers correctly?

2 answers

  • dongni3854 2015-03-16 12:33

    Solution: Eventually, strace helped me understand the problem. The relevant output is:

    nanosleep({1, 0}, {0, 294396497})       = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
    restart_syscall(<... resuming interrupted call ...>) = 0
    

    Thus, it shows that the signal was received but ignored. To fully answer the question, I still need to figure out why the process had these signals blocked, but unblocking them explicitly with pcntl_sigprocmask() does the trick:

    pcntl_sigprocmask(SIG_UNBLOCK, [SIGTERM, SIGCONT]);
    

    With this call in place, all goes well and the respawned process receives and handles signals as intended. For example, I tried unblocking only SIGCONT: it was then handled correctly while SIGTERM stayed blocked, which indicates that the blocked signal mask is exactly why signals were not dispatched.

    Resolution: for some reason, when the process respawns itself with signal handlers installed, the new instance has those signals blocked in its signal mask. Unblocking them explicitly solves the issue, but why the signals are masked in the new instance is an open question for now.
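
The workaround can be sketched as a small startup routine. This is a minimal sketch, assuming a hypothetical helper named prepareSignals() that is not part of the original script. On POSIX systems a process's signal mask is inherited by child processes and preserved across exec, so a worker spawned while signals are blocked starts with them blocked. Whether that is what happens here (for instance because system() runs in the shutdown function that is reached from inside the signal handler) is an assumption that the answer does not confirm.

```php
<?php
// Sketch only: prepareSignals() is a hypothetical helper name.
function prepareSignals(array $signals, callable $handler)
{
    // Clear any inherited block on these signals before installing handlers.
    if (!pcntl_sigprocmask(SIG_UNBLOCK, $signals)) {
        return false;
    }
    foreach ($signals as $signal) {
        // Install the same user-defined handler for every listed signal.
        if (!pcntl_signal($signal, $handler)) {
            return false;
        }
    }
    return true;
}
```

In the worker this would replace the two pcntl_signal() calls, for example prepareSignals([SIGTERM, SIGCONT], 'handler'), so that a respawned instance does not depend on the mask it inherited.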

    This answer was selected by the asker as the best answer.