doubi2145
2011-05-24 07:56

Parallel processing in PHP - How do you do it?

I am currently trying to implement a job queue in PHP. The queue will then be processed as a batch job and should be able to process some jobs in parallel.

I already did some research and found several ways to implement it, but I am not really aware of their advantages and disadvantages.

For example, one approach does the parallel processing by calling a script several times through fsockopen, as explained here:
Easy parallel processing in PHP

Another way I found was using the curl_multi functions.
curl_multi_exec PHP docs

But I think those two approaches would add quite a lot of overhead for batch processing of a queue that should mainly run in the background?

I also read about pcntl_fork, which also seems to be a way to handle the problem. But that looks like it can get really messy if you don't really know what you are doing (like me at the moment ;)).

I also had a look at Gearman, but there I would also need to spawn the worker threads dynamically as needed, rather than just running a few and letting the Gearman job server send jobs to the free workers. This matters especially because each thread should exit cleanly after one job has been executed, so as not to run into eventual memory leaks (the code may not be perfect in that respect).
Gearman Getting Started

So my question is: how do you handle parallel processing in PHP? Why did you choose your method, and what advantages and disadvantages do the different methods have?

Thanks for any input.



7 answers

  • doubi1713 2011-05-24 08:19
    Accepted

    I use exec(). It's easy and clean. You basically need to build a thread manager and thread scripts that do what you need.

    I don't like fsockopen() because it opens a server connection; these connections build up and may hit Apache's connection limit.

    I don't like the curl functions for the same reason.

    I don't like pcntl because it needs the pcntl extension to be available, and you have to keep track of parent/child relations.

    I have never played with Gearman...
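A minimal sketch of the "thread manager" idea from this answer, under some assumptions. The helper name runJobsInParallel and its $maxParallel parameter are made up for illustration. The sketch also swaps exec() for proc_open(), because exec() waits for the command to finish unless the command is sent to the background with its output redirected, while proc_open() lets the manager poll its children. Note that the "threads" here are really separate OS processes.

```php
<?php
// Hypothetical helper (not from the answer): runs each shell command as a
// separate process, at most $maxParallel at a time, and returns the exit
// codes keyed like $commands.
function runJobsInParallel(array $commands, int $maxParallel = 4): array
{
    $pending = $commands;   // jobs not yet started (keys are preserved)
    $running = [];          // job key => process resource
    $exitCodes = [];

    while ($pending || $running) {
        // Start new processes while there are free slots.
        while ($pending && count($running) < $maxParallel) {
            $key = array_key_first($pending);
            $command = $pending[$key];
            unset($pending[$key]);

            $pipes = [];
            // Empty descriptor spec: the child inherits the parent's stdio.
            $process = proc_open($command, [], $pipes);
            if ($process === false) {
                $exitCodes[$key] = -1; // the job could not be started
                continue;
            }
            $running[$key] = $process;
        }

        // Collect the processes that have finished since the last pass.
        foreach ($running as $key => $process) {
            $status = proc_get_status($process);
            if (!$status['running']) {
                $exitCodes[$key] = $status['exitcode'];
                proc_close($process);
                unset($running[$key]);
            }
        }

        usleep(10000); // sleep 10 ms to avoid a busy loop
    }

    ksort($exitCodes);
    return $exitCodes;
}
```

It could be called with a list of commands such as `php worker.php 17`, where worker.php is a hypothetical job script. Because every job runs in its own short-lived process, memory used by one job is released when that process exits.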

  • dongyaobo9081 2011-05-24 10:09

    I prefer exec() and Gearman. exec() is easy, needs no connection, and consumes less memory. Gearman needs a socket connection and its workers take up some memory. But Gearman is more flexible and faster than exec(). Most importantly, it can deploy workers on other servers, which helps if the work is time- and resource-consuming. I'm using Gearman in my current project.

  • douzhan5262 2011-05-24 10:20

    If your application is going to run in a Unix/Linux environment, I would suggest you go with the forking option. It's basically child's play to get it working. I have used it for a cron manager, with code that reverts to a Windows-friendly code path if forking is not an option.
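The forking option with a fallback could look roughly like the sketch below. It is an illustration under assumptions, not the answerer's actual cron manager: the helper name runForked is invented, jobs are plain callables that return an integer status, and the fallback simply runs the jobs one after another when the pcntl extension is missing (as on Windows).

```php
<?php
// Hypothetical helper: runs each callable in its own forked child when the
// pcntl extension is available, otherwise runs the jobs sequentially.
// Returns the status of each job, keyed like $jobs.
function runForked(array $jobs): array
{
    $results = [];

    // Fallback path for platforms without pcntl.
    if (!function_exists('pcntl_fork')) {
        foreach ($jobs as $key => $job) {
            $results[$key] = (int) $job();
        }
        return $results;
    }

    $children = []; // child pid => job key
    foreach ($jobs as $key => $job) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('fork failed');
        }
        if ($pid === 0) {
            // Child: run exactly one job, then exit so its memory is freed.
            // Only values 0..255 survive as an exit status.
            exit((int) $job());
        }
        $children[$pid] = $key; // parent: remember which job this child runs
    }

    // Parent: reap every child so that no zombie processes are left behind.
    while ($children) {
        $pid = pcntl_wait($status);
        if ($pid <= 0) {
            break; // no more children, or an error occurred
        }
        if (isset($children[$pid])) {
            $results[$children[$pid]] = pcntl_wifexited($status)
                ? pcntl_wexitstatus($status)
                : -1; // the child was killed by a signal
            unset($children[$pid]);
        }
    }

    return $results;
}
```

The parent keeps a pid-to-job map, which is the parent/child bookkeeping that makes forking feel messy at first. Forked children share no variables with the parent after the fork, so results have to come back through exit codes, files, or some other channel.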

    The options that run the entire script several times do, as you state, add quite a bit of overhead. If your script is small it might not be a problem. But you will probably get used to doing parallel processing in PHP whichever way you choose now, and the next time, when you have a job that uses 200 MB of data, it might very well be a problem. So you'd be better off learning a way that you can stick with.

    I have also tested Gearman and I like it a lot. There are a few things to think about, but as a whole it offers a very good way to distribute work to different servers running different applications written in different languages. Apart from setting it up, actually using it from within PHP, or any other language for that matter, is... once again... child's play.

    It could very well be overkill for what you need to do. But it will open your eyes to new possibilities when it comes to handling data and jobs, so I would recommend you to try Gearman for that fact alone.

  • duanlin6989 2011-05-24 10:30

    I use PHP's pcntl. It is good as long as you know what you are doing. I understand your situation, but I don't think the code is difficult to understand; we just have to be a little more careful than usual when implementing a job queue or parallel processing.

    As long as you code it carefully and make sure the flow is correct, it works; of course, you should keep parallel processing in mind while you implement it.

    Where you could make mistakes:

    1. Loops - these should be manageable with global variables.
    2. Processing sets of transactions - again, as long as you define the sets properly, you should be able to get it done.

    Take a look at this example - https://github.com/rakesh-sankar/Tools/blob/master/PHP/fork-parallel-process.php.

    Hope it helps.

  • dskvfdxgdo2422392 2011-05-24 10:39

    The method described in 'Easy parallel processing in PHP' is downright scary - the principle is OK - but the implementation??? As you've already pointed out the curl_multi_ fns provide a much better way of implementing this approach.

    "But I think those 2 ways will add pretty much overhead"

    Yes, you probably don't need a client and server HTTP stack for handing off the job - but unless you're working for Google, your development time is much more expensive than your hardware costs - and there are plenty of tools for managing HTTP/analysing performance - and there is a defined standard covering stuff such as status notifications and authentication.

    A lot of how you implement the solution depends on the level of transactional integrity you require and whether you require in-order processing.

    Out of the approaches you mention, I'd recommend focussing on the HTTP request method using curl_multi_ . But if you need good transactional control or in-order delivery, then you should definitely run a broker daemon between the source of the messages and the processing agents (there is a well written single threaded server suitable for use as a framework for the broker here). Note that the processing agents should process a single message at a time.
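For the HTTP request method, a hedged sketch of handing several jobs to worker URLs at once with the curl_multi_ functions is shown below. The helper name fetchAll, its $timeoutSeconds parameter, and any worker URLs passed to it are assumptions for illustration; it presumes the curl extension is loaded and that each job is exposed as an HTTP endpoint.

```php
<?php
// Hypothetical helper: requests all URLs concurrently and returns the
// response bodies keyed like $urls.
function fetchAll(array $urls, int $timeoutSeconds = 30): array
{
    if (!$urls) {
        return []; // nothing to do
    }

    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); // return the body
        curl_setopt($handle, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $handle);
        $handles[$key] = $handle;
    }

    // Drive all transfers until none of them is still running.
    do {
        $status = curl_multi_exec($multi, $stillRunning);
        if ($stillRunning) {
            // Block until there is activity; sleep briefly if select fails.
            if (curl_multi_select($multi, 1.0) === -1) {
                usleep(10000);
            }
        }
    } while ($stillRunning && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $handle) {
        $bodies[$key] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($multi, $handle);
    }

    return $bodies;
}
```

A call might look like `fetchAll(['job1' => 'http://localhost/worker.php?job=1'])`, where the URL is a made-up example. Each request occupies a web server connection while its job runs, which is the overhead the question worries about.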

    If you need a highly scalable solution, then take a look at a proper message queuing system such as RabbitMQ.

    HTH

    C.

  • doushajian2018 2016-04-06 02:53

    Well I guess we have 3 options there:

    A. Multi-Thread:

    PHP does not support multithreading natively, but there is an (experimental) PHP extension called pthreads (https://github.com/krakjoe/pthreads) that allows you to do just that.

    B. Multi-Process:

    This can be done in 3 ways:

    • Forking
    • Executing Commands
    • Piping

    C. Distributed Parallel Processing:

    How it works:

    1. The client app sends data (a message, which can be JSON-formatted) to the MQ engine, which can be local or an external web service.
    2. The MQ engine stores the data (mostly in memory, optionally in a database) inside a queue; you can define the queue name.
    3. The client app asks the MQ engine for a message to process, in order (FIFO or by priority); you can also request data from a specific queue.
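The three steps above can be sketched with an in-memory stand-in for the MQ engine. Everything here is an assumption for illustration: the class name InMemoryQueueEngine and its push/pop methods are invented, and a real engine would run as a separate service so that several worker processes can pull from it.

```php
<?php
// Hypothetical in-memory stand-in for an MQ engine with named FIFO queues.
final class InMemoryQueueEngine
{
    /** @var array<string, SplQueue> queue name => FIFO queue */
    private $queues = [];

    // Steps 1 and 2: a client sends a message and the engine stores it,
    // JSON-encoded, in the queue with the given name.
    public function push(string $queueName, array $message): void
    {
        if (!isset($this->queues[$queueName])) {
            $this->queues[$queueName] = new SplQueue();
        }
        $this->queues[$queueName]->enqueue(json_encode($message));
    }

    // Step 3: a worker asks for the next message of a specific queue.
    // Returns null when that queue is empty or does not exist.
    public function pop(string $queueName): ?array
    {
        if (!isset($this->queues[$queueName]) || $this->queues[$queueName]->isEmpty()) {
            return null;
        }
        return json_decode($this->queues[$queueName]->dequeue(), true);
    }
}

$engine = new InMemoryQueueEngine();
$engine->push('emails', ['to' => 'a@example.com']);
$engine->push('emails', ['to' => 'b@example.com']);
$first = $engine->pop('emails'); // the message pushed first comes out first
```

For priority-based ordering instead of FIFO, PHP's SplPriorityQueue could take the place of SplQueue in this sketch.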


    Some MQ Engines:

    • ZeroMQ (good option, hard to use) a message-oriented IPC library rather than a queue server. It is a socket library that acts as a concurrency framework. Faster than TCP for clustered products and supercomputing.
    • RabbitMQ (good option, easy to use) self-hosted, enterprise message queues, written in Erlang. Not really a work queue, but rather a message queue that can be used as a work queue with additional semantics.
    • Beanstalkd (best option, easy to use) (Laravel built-in support, originally built for the Causes application on Facebook, for work queues) - has a "Beanstalkd console" tool which is very nice
    • Gearman (problem: centralized broker system for distributed processing)
    • Apache ActiveMQ the most popular open source message broker in Java (problem: lots of bugs and problems)
    • Amazon SQS (Laravel built-in support; hosted, so no administration is required. Not really a work queue, so it requires extra work to handle semantics such as burying a job)
    • IronMQ (Laravel built-in support, written in Go, available both as a cloud version and on-premise)
    • Redis (Laravel built-in support, not that fast as it's not designed for that)
    • Sparrow (written in Ruby, based on memcache)
    • Starling (written in Ruby, based on memcache, built at Twitter)
    • Kestrel (just another MQ)
    • Kafka (written at LinkedIn in Scala)
    • EagleMQ open source, high-performance and lightweight queue manager (written in C)

    More of them can be found here: http://queues.io

  • douzuan2814 2018-05-08 00:54

    Here's a summary of a few options for parallel processing in PHP.

    AMP

    Check out Amp - Asynchronous concurrency made simple. This looks to be the most mature PHP library I've seen for parallel processing.

    Peec's Process Class

    This class was posted in the comments of PHP's exec() function and provides a real simple starting point for forking new processes and keeping track of them.

    Example:

    // You may use status(), start(), and stop(). Note that start() is called automatically once by the constructor.
    $process = new Process('ls -al');

    // Or, if you already have the PID; in that case only the status() method will work.
    $process = new Process();
    $process->setPid($myPid); // $myPid: PID of an already running process

    // Then you can start, stop, or check the status of the job.
    $process->stop();
    $process->start();
    if ($process->status()) {
        echo "The process is currently running";
    } else {
        echo "The process is not running.";
    }
    

    Other Options Compared

    There's also a great article, Async processing or multitasking in PHP, that explains the pros and cons of various approaches.

    Doorman

    Then, there's also this simple tutorial which was wrapped up into a little library called Doorman.

    Hope these links provide a useful starting point for more research.

