doucuodan0897 2011-04-12 22:33
Views: 38
Accepted

Code to find a string in the source code of many URLs

I want to enter a very long list of URLs and search for specific strings within their source code, outputting a list of the URLs that contain the string. Sounds simple enough, right? I have come up with the code below; the input comes from an HTML form. You can try it at pelican-cement.com/findfrog.

It seems to work half the time, but it is thrown off by multiple URLs or by URLs in a different order. Searching for 'adsense', it correctly identifies politics1.com out of

cnn.com
politics1.com

However, if the order is reversed, the output is blank. How can I get reliable, consistent results, preferably with something I could feed thousands of URLs into?

<html>
<body>

<?
set_time_limit (0);

$urls=explode("\n", $_POST['url']);

$allurls=count($urls);

for ( $counter = 0; $counter <= $allurls; $counter++) {

 $ch = curl_init();
 curl_setopt($ch, CURLOPT_URL,$urls[$counter]);
 curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
 curl_setopt($ch, CURLOPT_CUSTOMREQUEST,'GET');
 curl_setopt ($ch, CURLOPT_HEADER, 1); 
 curl_exec ($ch); 
 $curl_scraped_page=curl_exec($ch); 

$haystack=strtolower($curl_scraped_page);
$needle=$_POST['proxy'];
if (strlen(strstr($haystack,$needle))>0) {

echo $urls[$counter];
echo "<br/>";
curl_close($ch);
}
}




//$FileNameSQL = "/googleresearch" .  abs(rand(0,1000000000000000))  .  ".csv";
//$query = "SELECT * FROM happyturtle INTO OUTFILE '$FileNameSQL' FIELDS TERMINATED BY ','";
//$result = mysql_query($query) or die(mysql_error());

//exit;

echo '$FileNameSQL';





?>

</body>
</html>

4 answers

  • drbfxb977777 2011-04-12 22:55

    Reorganized your code a bit. The main culprit was whitespace. You need to trim your URL string before using it (i.e. trim($url);).
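
    As a rough illustration of why the order mattered: browsers normally submit textarea line breaks as "\r\n", so splitting on "\n" alone likely left a trailing "\r" on every URL except the last one. The sketch below splits on any line ending and trims each entry. The helper name parseUrlList is hypothetical and not part of the original code.

    ```php
    <?php
    // Hypothetical helper: turn raw textarea input into a clean list of URLs.
    function parseUrlList(string $raw): array
    {
        // \R matches any line break sequence (\r\n, \n, or \r).
        $lines = preg_split('/\R/', $raw);

        // Strip surrounding whitespace (including stray \r) from each line.
        $lines = array_map('trim', $lines);

        // Drop empty lines and reindex the result from 0.
        $nonEmpty = array_filter($lines, function ($line) {
            return $line !== '';
        });

        return array_values($nonEmpty);
    }

    // Example input as a browser would typically submit it.
    var_dump(parseUrlList("cnn.com\r\npolitics1.com\r\n"));
    ```

    Filtering out empty lines also avoids sending a request for the blank entry that a trailing newline would otherwise produce.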

    Other changes:

    • Set your search term outside the for loop, since it never changes.
    • Set up the curl object outside the loop and reuse it by just changing the URL each time.
    • Use curl_setopt_array() to set multiple curl options in one statement.
    • Use a foreach loop, since you're iterating over the entire array anyway and the code is cleaner.
    • Use stripos() instead of strstr(): it is case-insensitive, so no strtolower() call is needed, and it only returns the match position rather than building a substring.
    • Use the !== comparator to prevent implied typecasting (FALSE !== 0, but FALSE == 0).
    • Check the returned $html string as curl_exec() can return FALSE if it fails.
    • Close the curl object at the end (i.e. outside the if statement too).
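
    The points about stripos(), the !== comparator, and the FALSE return value of curl_exec() can be combined into one small check. This is only a sketch; the helper name containsTerm is hypothetical, and treating an empty search term as "no match" is an assumption.

    ```php
    <?php
    // Hypothetical helper: case-insensitive "does $html contain $term?" check.
    // $html is whatever curl_exec() returned: a string, or FALSE on failure.
    function containsTerm($html, string $term): bool
    {
        // A failed request or an empty term (assumption) never counts as a match.
        if ($html === false || $term === '') {
            return false;
        }

        // stripos() returns the match offset, which may be 0, or FALSE when
        // the term is absent, so the result is compared with !== and not !=.
        return stripos($html, $term) !== false;
    }

    var_dump(containsTerm('AdSense snippet', 'adsense')); // term at offset 0
    var_dump(containsTerm('no ads here', 'adsense'));     // term absent
    var_dump(containsTerm(false, 'adsense'));             // failed request
    ```

    With a loose != comparison, the first call would be reported as a miss, because offset 0 and FALSE compare as equal.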

    The code below can be run on my quick mockup.

    <html>
    <body>
    
    <form action="search.php" method="post"> 
      URLs: <br/>
      <textarea rows="20" cols="50" name="url"></textarea><br/>
    
      Search Term: <br/>
      <textarea rows="20" cols="50" name="proxy"></textarea><br/>
    
      <input type="submit" /> 
    </form>
    
    <?php
      if(isset($_POST['url'])) {
        set_time_limit (0);
    
        $urls = explode("\n", $_POST['url']);
        $term = $_POST['proxy'];
        $options = array( CURLOPT_FOLLOWLOCATION => 1,
                          CURLOPT_RETURNTRANSFER => 1,
                          CURLOPT_CUSTOMREQUEST  => 'GET',
                          CURLOPT_HEADER         => 1,
                          );
        $ch = curl_init();
        curl_setopt_array($ch, $options);
    
        foreach ($urls as $url) {
          curl_setopt($ch, CURLOPT_URL, trim($url));
          $html = curl_exec($ch);
    
          if ($html !== FALSE && stripos($html, $term) !== FALSE) { // Found!
            echo htmlspecialchars(trim($url)), "<br/>"; // one URL per line
          }
        }
    
        curl_close($ch);
      }
    ?>
    
    </body>
    </html>
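
    For the "thousands of URLs" part of the question, fetching one page at a time may be too slow. One possible approach, sketched here and not taken from the original answer, is to download the URLs in batches with PHP's curl_multi functions. The helper names fetchBatch and findMatches, the batch size of 20, and the 15-second timeout are all assumptions chosen for illustration.

    ```php
    <?php
    // Hypothetical helper: download one batch of URLs in parallel.
    // Returns [url => body]; a failed request yields an empty string,
    // which simply never matches a search term.
    function fetchBatch(array $urls, int $timeout = 15): array
    {
        $urls = array_unique($urls);
        if (count($urls) === 0) {
            return [];
        }

        $mh = curl_multi_init();
        $handles = [];
        foreach ($urls as $url) {
            $ch = curl_init();
            curl_setopt_array($ch, [
                CURLOPT_URL            => $url,
                CURLOPT_FOLLOWLOCATION => true,
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_TIMEOUT        => $timeout, // assumed per-request limit
            ]);
            curl_multi_add_handle($mh, $ch);
            $handles[$url] = $ch;
        }

        // Drive all transfers until none are still running.
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running && curl_multi_select($mh, 1.0) === -1) {
                usleep(1000); // select failed; pause briefly to avoid a busy loop
            }
        } while ($running && $status === CURLM_OK);

        $pages = [];
        foreach ($handles as $url => $ch) {
            $pages[$url] = (string) curl_multi_getcontent($ch);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);

        return $pages;
    }

    // Hypothetical helper: return the URLs whose source contains $term.
    function findMatches(array $urls, string $term, int $batchSize = 20): array
    {
        $matches = [];
        if ($term === '') {
            return $matches; // assumption: an empty term matches nothing
        }
        foreach (array_chunk($urls, $batchSize) as $batch) {
            foreach (fetchBatch($batch) as $url => $html) {
                if (stripos($html, $term) !== false) {
                    $matches[] = $url;
                }
            }
        }
        return $matches;
    }
    ```

    A call such as findMatches($urls, $term), with $urls already trimmed, would then return the matching URLs. Working in batches keeps the number of simultaneous connections bounded, which matters once the list grows large.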
    