dongshuql24533 2016-08-09 06:06
Viewed 115 times

PHP multi cURL performance is worse than sequential file_get_contents

I am writing an interface in which I must issue 4 HTTP requests to get some information.

I implemented the interface in 2 ways:

  1. using sequential file_get_contents.
  2. using multi curl.

I benchmarked the two versions with JMeter. The results show that multi curl is much better than sequential file_get_contents when there is only 1 thread in JMeter making requests, but much worse with 100 threads.

The question is: what could cause the poor performance of multi curl?

My multi curl code is as below:

$curl_handle_arr = array ();
$master = curl_multi_init();
foreach ( $call_url_arr as $key => $url )
{
    $curl_handle = curl_init( $url );
    $curl_handle_arr [$key] = $curl_handle;
    curl_setopt( $curl_handle , CURLOPT_RETURNTRANSFER , true );
    curl_setopt( $curl_handle , CURLOPT_POST , true );
    curl_setopt( $curl_handle , CURLOPT_POSTFIELDS , http_build_query( $params_arr [$key] ) );
    curl_multi_add_handle( $master , $curl_handle );
}
$running = null;
$mrc = null;
do
{
    $mrc = curl_multi_exec( $master , $running );
}
while ( $mrc == CURLM_CALL_MULTI_PERFORM );
while ( $running && $mrc == CURLM_OK )
{
    if (curl_multi_select( $master ) != - 1)
    {
        do
        {
            $mrc = curl_multi_exec( $master , $running );
        }
        while ( $mrc == CURLM_CALL_MULTI_PERFORM );
    }
}
foreach ( $call_url_arr as $key => $url )
{
    $curl_handle = $curl_handle_arr [$key];
    if (curl_error( $curl_handle ) == '')
    {
        $result_str_arr [$key] = curl_multi_getcontent( $curl_handle );
    }
    curl_multi_remove_handle( $master , $curl_handle );
}
curl_multi_close( $master );
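
For comparison, the sequential file_get_contents version is not shown in the question; it presumably looks something like the following sketch (reusing the same `$call_url_arr` / `$params_arr` structure as the cURL version above; the stream-context details are my assumption, not the asker's actual code):

```php
<?php
// Hypothetical sequential version: one blocking POST per URL.
$result_str_arr = array();
foreach ($call_url_arr as $key => $url) {
    $context = stream_context_create(array(
        'http' => array(
            'method'  => 'POST',
            'header'  => 'Content-Type: application/x-www-form-urlencoded',
            'content' => http_build_query($params_arr[$key]),
        ),
    ));
    // Blocks until this request completes before starting the next one,
    // so total latency is roughly the sum of the four round trips.
    $result = file_get_contents($url, false, $context);
    if ($result !== false) {
        $result_str_arr[$key] = $result;
    }
}
```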

1 Answer

  • dounayan3643 2016-08-10 01:39

    1. Simple optimization

    • You should sleep about 2500 microseconds if curl_multi_select fails.
      In practice, it definitely fails sometimes on each execution.
      Without sleeping, your CPU resources get eaten up by lots of busy while (true) { } loops.
    • If you don't do anything while only some (not all) of the requests have finished,
      you should make the maximum timeout seconds larger.
    • Your code is written for old versions of libcurl. As of libcurl 7.20.0,
      the state CURLM_CALL_MULTI_PERFORM no longer appears.

    So, the following code

    $running = null;
    $mrc = null;
    do
    {
        $mrc = curl_multi_exec( $master , $running );
    }
    while ( $mrc == CURLM_CALL_MULTI_PERFORM );
    while ( $running && $mrc == CURLM_OK )
    {
        if (curl_multi_select( $master ) != - 1)
        {
            do
            {
                $mrc = curl_multi_exec( $master , $running );
            }
            while ( $mrc == CURLM_CALL_MULTI_PERFORM );
        }
    }
    

    should be

    curl_multi_exec($master, $running);
    do
    {
        if (curl_multi_select($master, 99) === -1)
        {
            usleep(2500);
            continue;
        }
        curl_multi_exec($master, $running);
    } while ($running);
    

    Note

    The timeout value of curl_multi_select should be tuned only if you want to do something like...

    curl_multi_exec($master, $running);
    do
    {
        if (curl_multi_select($master, $TIMEOUT) === -1)
        {
            usleep(2500);
            continue;
        }
        curl_multi_exec($master, $running);
        while ($info = curl_multi_info_read($master))
        {
            /* Do something with $info */
        }
    } while ($running);
    

    Otherwise, the value should be extremely large.
    (However, PHP_INT_MAX is too large; libcurl treats it as an invalid value.)

    2. Easy experiment in one PHP process

    I tested using my parallel cURL executor library: mpyw/co

    (The preposition for in the function names should really be by; sorry for my poor English xD)

    <?php 
    
    require 'vendor/autoload.php';
    
    use mpyw\Co\Co;
    
    function four_sequencial_requests_for_one_hundread_people()
    {
        for ($i = 0; $i < 100; ++$i) {
            $tasks[] = function () use ($i) {
                $ch = curl_init();
                curl_setopt_array($ch, [
                    CURLOPT_URL => 'example.com',
                    CURLOPT_FORBID_REUSE => true,
                    CURLOPT_RETURNTRANSFER => true,
                ]);
                for ($j = 0; $j < 4; ++$j) {
                    yield $ch;
                }
            };
        }
        $start = microtime(true);
        yield $tasks;
        $end = microtime(true);
        printf("Time of %s: %.2f sec\n", __FUNCTION__, $end - $start);
    }
    
    function requests_for_four_hundreds_people()
    {
        for ($i = 0; $i < 400; ++$i) {
            $tasks[] = function () use ($i) {
                $ch = curl_init();
                curl_setopt_array($ch, [
                    CURLOPT_URL => 'example.com',
                    CURLOPT_FORBID_REUSE => true,
                    CURLOPT_RETURNTRANSFER => true,
                ]);
                yield $ch;
            };
        }
        $start = microtime(true);
        yield $tasks;
        $end = microtime(true);
        printf("Time of %s: %.2f sec\n", __FUNCTION__, $end - $start);
    }
    
    Co::wait(four_sequencial_requests_for_one_hundread_people(), [
        'concurrency' => 0, // Zero means unlimited
    ]);
    
    Co::wait(requests_for_four_hundreds_people(), [
        'concurrency' => 0, // Zero means unlimited
    ]);
    

    I ran it five times and got the following results:

    [screenshot of the timing results]

    I also tried them in reverse order (the 3rd request was kicked xD):

    [screenshot of the timing results]

    These results show that too many concurrent TCP connections actually decrease throughput.

    3. Advanced optimization

    3-A. For different destinations

    If you want to optimize for both few and many concurrent requests, the following dirty solution may help you.

    1. Share the number of concurrent requesters using apcu_add / apcu_fetch / apcu_delete.
    2. Switch methods (sequential or parallel) based on the current value.
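
    A minimal sketch of that idea follows. The key name `requester_count`, the threshold `50`, and the two wrapper functions are made-up examples (the answer only names the APCu functions; I use apcu_inc / apcu_dec here for atomic counting). Requires the APCu extension.

```php
<?php
// Sketch: count in-flight PHP workers via APCu and pick a strategy.
apcu_add('requester_count', 0);          // no-op if the key already exists
$current = apcu_inc('requester_count');  // atomically increment and read

try {
    if ($current <= 50) {
        // Few concurrent workers: parallel multi-cURL wins.
        $result_str_arr = fetch_with_multi_curl($call_url_arr, $params_arr);
    } else {
        // Many concurrent workers: sequential requests avoid flooding
        // the destination with TCP connections.
        $result_str_arr = fetch_sequentially($call_url_arr, $params_arr);
    }
} finally {
    apcu_dec('requester_count');         // always release our slot
}
```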

    3-B. For the same destinations

    CURLMOPT_PIPELINING will help you. This option bundles all HTTP/1.1 connections for the same destination into one TCP connection.

    curl_multi_setopt($master, CURLMOPT_PIPELINING, 1);
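
    (An addition beyond the original answer: on newer stacks, the HTTP/2 counterpart of pipelining is connection multiplexing. This needs libcurl >= 7.43, PHP 7 for the constant, and an HTTP/2-capable server; note libcurl later deprecated HTTP/1.1 pipelining itself in 7.62.)

```php
// Multiplex concurrent requests to the same host over one HTTP/2 connection.
curl_multi_setopt($master, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);
```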
    
