douzhuo1858 2016-01-24 12:07

Loading an external XML file and getting the HTTP header information in one call

I have a PHP file that grabs an XML file from another site and then chucks that information into my database.

The problem I am having is that their site only allows 360 requests in any one-hour period, so I am trying to code it to check the header information whilst grabbing the file.
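Because the limit is known in advance (360 requests per hour), one option is to keep your own count alongside reading the status code. A minimal sketch, assuming you record the Unix timestamp of every request you make somewhere (a file or database table; that part is left out). The function name `requestsRemaining()` and the `$log` variable in the usage comment are hypothetical. The site's own count stays authoritative, so the 429 check is still needed.

```php
<?php
/**
 * Hypothetical helper: how many requests are still allowed under a
 * sliding-window limit (by default 360 requests in any 3600-second period).
 *
 * @param int[] $timestamps Unix times of earlier requests (persisted elsewhere)
 * @param int   $now        current Unix time, e.g. time()
 */
function requestsRemaining(array $timestamps, int $now, int $limit = 360, int $window = 3600): int
{
    // Count only the requests made within the last $window seconds.
    $recent = array_filter($timestamps, function (int $t) use ($now, $window): bool {
        return $t > $now - $window;
    });
    return max(0, $limit - count($recent));
}

// Usage idea: only start the loop if every competition can be fetched.
// if (requestsRemaining($log, time()) >= count($comps)) { ... }
```

Checking the remaining budget against the number of competitions up front avoids the situation where the first fetch succeeds and the second is refused.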

I check the status of the page using:

$requesttest = 'http://www.footballwebpages.co.uk/teams.xml';
if($requesttest == NULL) return false;  
$ch = curl_init($requesttest);  
curl_setopt($ch, CURLOPT_TIMEOUT, 5);  
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);  
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  
$data = curl_exec($ch);  
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);  
curl_close($ch); 

if($httpcode == 429){
    return 'Try again later, too many requests received.';
} else if($httpcode>=200 && $httpcode<300){
    /* run code to grab xml file */
    $comps = array (    0 => 1, /* Premier_League */
                    1 => 2 /* Championship */ 
                    );
    $comps_total = count($comps);
    $comps_no = 0;

    while ($comps_no < $comps_total) {
        $url = 'http://www.footballwebpages.co.uk/teams.xml?comp=' . $comps[$comps_no];
        $full_list = simplexml_load_file($url);
        /* Code for grabbing and storing info from XML */
        $comps_no++;
    }
} else {
    return 'Football Web Pages Offline';
}

At the moment, it checks the main 'teams' page to see whether the request limit has been reached, and then grabs the XML for each competition in the set. The issue is that if only 1 request is left at the first check, the next stage will fail. How can I check the header info when loading the XML file, without having to call the page once to check the header and then again to grab the XML file?

Basically, I want to load the XML file in 1 call and only use it if the status code is between 200 and 300, so as not to waste 2 requests to grab 1 XML page.
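The rule described here can be kept in one small function that is applied to the status code of the same request that returned the XML, so no separate check request is needed. The function name `classifyStatus()` and its return labels are hypothetical.

```php
<?php
/**
 * Hypothetical helper: decide what to do with a response based only on the
 * status code that curl_getinfo($ch, CURLINFO_HTTP_CODE) reported for the
 * request that fetched the XML.
 */
function classifyStatus(int $httpCode): string
{
    if ($httpCode === 429) {
        return 'rate_limited';  // 429 Too Many Requests: stop and retry later
    }
    if ($httpCode >= 200 && $httpCode < 300) {
        return 'ok';            // the body can be parsed as XML
    }
    return 'offline';           // 0 (no response) or any other status
}
```

Because the body from `curl_exec()` and the status from `curl_getinfo()` belong to the same request, each competition then costs exactly one request.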


1 answer

  • douju2014 2016-01-24 13:02

    You could perhaps employ a method similar to the following. Forget the first call to the base URL, as it is redundant, and instead use the return value from the function to determine whether further processing should be done:

    <?php
        /* utility function to get data and return an object */
        function getxml( $comp=1 ){
            global $ch;
            global $url;
    
            curl_setopt( $ch, CURLOPT_URL, $url . '?comp=' . $comp );
            $data = curl_exec( $ch );
            $status = curl_getinfo( $ch, CURLINFO_HTTP_CODE ); 
    
            return (object)array(
                'xmldata'   =>  $data,
                'status'    =>  $status
            );
        }
        /* All the comps available - more than specified! */
        $comps=array( 
            'Barclays_Premier_League' => 1,
            'Sky_Bet_Championship' => 2,
            'Sky_Bet_League_One' => 3,
            'Sky_Bet_League_Two' => 4,
            'National_League' => 5,
            'National_League_North' => 6,
            'National_League_South' => 7,
            'Evo-Stik_Southern_League_Premier_Division' => 8,
            'Evo-Stik_Southern_League_Division_One_Central' => 9,
            'Evo-Stik_Southern_League_Division_One_South_&_West' => 10,
            'Ryman_League_Premier_Division' => 11,
            'Ryman_League_Division_One_North' => 12,
            'Ryman_League_Division_One_South' => 13,
            'Evo-Stik_League_Premier_Division' => 14,
            'Evo-Stik_League_Division_One_North' => 15,
            'Evo-Stik_League_Division_One_South' => 16,
            'Scottish_Premiership' => 17,
            'Scottish_Championship' => 18,
            'Scottish_League_One' => 19,
            'Scottish_League_Two' => 20
        );
        /* only interested in first two */
        $comps=array_slice( $comps, 0, 2, true );
    
    
        /* I don't use SimpleXML; DOMDocument is used to process the XML data */
        $dom=new DOMDocument;
    
        /* base url */
        $url= 'http://www.footballwebpages.co.uk/teams.xml';
    
        /* 
            initialise curl request object but 
            set the url for each $comp in the function 
        */
        $ch = curl_init();
        curl_setopt( $ch, CURLOPT_TIMEOUT, 5 );  
        curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, 5 );  
        curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );   
    
        /* 
        If there have been too many requests, the 429 case breaks
        out of the entire loop, so only one request is used.
        */
        foreach( $comps as $key => $comp ){
            $xml=getxml( $comp );
            switch( $xml->status ){
                case 429: echo 'Try again later, too many requests received.'; break 2;
                case 200:
                    /* if everything is ok, process $xml */
                    $dom->loadXML( $xml->xmldata );
    
    
                    /* example of processing xml data */
                    echo '
                    <h1>'.$dom->getElementsByTagName('competition')->item(0)->nodeValue.'</h1>
                        <ul>';
    
                    $col=$dom->getElementsByTagName('team');
                    if( $col->length > 0 ){
                        foreach( $col as $team ) echo '<li>'.$team->childNodes->item(1)->nodeValue.', '.$team->childNodes->item(3)->nodeValue.'</li>';
                    }
                    echo '
                        </ul>';
                break;
                default: /* no response or an unknown status: stop processing */
                    echo 'Football Web Pages Offline';
                break 2;
            }
        }
    
        curl_close( $ch ); 
        $dom=$ch=$comps=null;
    ?>
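If you would rather keep SimpleXML, as in the question, here is a sketch of the same one-request idea: the response headers are collected with `CURLOPT_HEADERFUNCTION` during the same `curl_exec()` call, and the body is parsed with `simplexml_load_string()` only on a 2xx status. The function names `parseHeaderLine()`, `xmlFromResponse()` and `fetchXml()` are hypothetical, and it is not known which headers (if any) the site sends about its remaining quota, so the headers are simply returned for inspection.

```php
<?php
/** Parse one raw header line ("Name: value\r\n") into [name, value], or null. */
function parseHeaderLine(string $line): ?array
{
    $parts = explode(':', $line, 2);
    if (count($parts) !== 2) {
        return null;                    // status line or blank separator line
    }
    return [strtolower(trim($parts[0])), trim($parts[1])];
}

/** Parse the body only when the status is 2xx; null otherwise or on bad XML. */
function xmlFromResponse(int $status, string $body): ?SimpleXMLElement
{
    if ($status < 200 || $status >= 300) {
        return null;
    }
    libxml_use_internal_errors(true);   // suppress warnings for malformed XML
    $xml = simplexml_load_string($body);
    return $xml === false ? null : $xml;
}

/** Hypothetical fetch: one request yields status, headers and parsed XML. */
function fetchXml(string $url): array
{
    $headers = [];
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    // cURL calls this once per header line while the response arrives.
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, string $line) use (&$headers): int {
        $pair = parseHeaderLine($line);
        if ($pair !== null) {
            $headers[$pair[0]] = $pair[1];
        }
        return strlen($line);           // must report the number of bytes handled
    });
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return [
        'status'  => $status,
        'headers' => $headers,
        'xml'     => is_string($body) ? xmlFromResponse($status, $body) : null,
    ];
}
```

For example, `fetchXml('http://www.footballwebpages.co.uk/teams.xml?comp=1')` costs one request; checking the returned `'status'` for 429 before moving on to the next competition keeps the loop from spending requests that will be refused.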
    