doulang9521 2013-05-07 22:44

Regex does not return any matches regardless of the expression used

OK, here is the issue: I have been trying to build a cURL script to check for dead links in a database. The links all look something like this: http://www.ltblekinge.se/download/18.9c16a31109c04a3e880003750. The problem is that no matter what regex pattern I use, $url_list remains empty. Any help would be appreciated!

Problematic part of the code

<?php
/*Config*/
/*** mysql hostname ***/
$hostname = 'localhost';

/*** mysql username ***/
$username = 'root';

/*** mysql password ***/
$password = 'root';
/*curl setup of variables*/
$excluded_domains = array(  
'localhost', 'rollnstroll.se');
$max_connections = 10;
$url_list = array();  
$working_urls = array();  
$dead_urls = array();  
$not_found_urls = array();  
$active = null;



try {
$dbh = new PDO("mysql:host=$hostname;dbname=blankett", $username, $password);
$dbh->exec('SET CHARACTER SET utf8');

$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);


/*** fetch into an PDOStatement object ***/
$sql = "SELECT * FROM `forms2`";

$stmt = $dbh->prepare("SELECT * FROM forms2");
$stmt->execute();

while ($d = $stmt->fetchAll()) {

    if (preg_match_all('/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[.\!\/\\w]*))?)/', $d['link_forms'], $matches)) {

/***error code***/
if (preg_last_error() == PREG_NO_ERROR) {
print 'There is no error.';
}
else if (preg_last_error() == PREG_INTERNAL_ERROR) {
print 'There is an internal error!';
}
else if (preg_last_error() == PREG_BACKTRACK_LIMIT_ERROR) {
print 'Backtrack limit was exhausted!';
}
else if (preg_last_error() == PREG_RECURSION_LIMIT_ERROR) {
print 'Recursion limit was exhausted!';
}
else if (preg_last_error() == PREG_BAD_UTF8_ERROR) {
print 'Bad UTF8 error!';
}
else if (preg_last_error() == PREG_BAD_UTF8_OFFSET_ERROR) {
print 'Bad UTF8 offset error!';
}

    foreach ($matches[1] as $url) { 



        // exclude some domains  
        $tmp = parse_url($url);  
        if (in_array($tmp['host'], $excluded_domains)) {  
            continue;  
        }
        // store the url  
        $url_list []= $url; 
    }
   }
}

// remove duplicates  
$url_list = array_values(array_unique($url_list));

if (!$url_list) {  
die('No URL to check');  
}  


}
catch(PDOException $e)
{
echo $e->getMessage();
}

DB Structure

#  Name        Type          Collation          Null  Default            Extra
1  id          int(10)                          No    None               AUTO_INCREMENT
2  master_id   int(10)                          No    None
3  name_form   varchar(500)  latin1_swedish_ci  No    None
4  link_form   varchar(500)  latin1_swedish_ci  No    None
5  date_added  timestamp                        No    CURRENT_TIMESTAMP

Question: Why is $url_list empty?

1 answer

  • dongqiu3709 2013-05-07 23:09

    This works for me:

    $url="http://www.ltblekinge.se/download/18.9c16a31109c04a3e880003750 http://one.com www.two.com http://yourad.io";
    
    preg_match_all('/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[.\!\/\\w]*))?)/', $url, $matches);
    
    print_r($matches[1]);
    

    Output:

    Array
    (
        [0] => http://www.ltblekinge.se/download/18.9c16a31109c04a3e880003750
        [1] => http://one.com
        [2] => www.two.com
        [3] => http://yourad.io
    )
    

    The pattern itself matches, so check what your $d['link_forms'] actually contains.
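
One likely reason that value is empty, judging only from the code and table posted in the question: `fetchAll()` returns every row at once as a list of rows, so `$d` is an array indexed 0, 1, ... and has no `link_forms` key. The DB structure also lists the column as `link_form`, not `link_forms`. The sketch below illustrates both points without a database. The `$rows` array is a hypothetical stand-in for the result of `fetchAll(PDO::FETCH_ASSOC)`, the second URL is made up for the example, and the short `$pattern` is a simplified substitute for the long pattern above, not a replacement recommendation.

```php
<?php
// Hypothetical stand-in for $stmt->fetchAll(PDO::FETCH_ASSOC):
// a list of rows, each row keyed by column name.
$rows = [
    ['id' => 1, 'link_form' => 'http://www.ltblekinge.se/download/18.9c16a31109c04a3e880003750'],
    ['id' => 2, 'link_form' => 'http://rollnstroll.se/form.pdf'], // made-up URL
];

// The outer array is indexed by position, so a column lookup on it finds nothing.
var_dump(isset($rows['link_form'])); // bool(false)

// Simplified pattern (an assumption): scheme followed by non-space characters.
$pattern = '~https?://[^\s"\'<>]+~i';
$excluded_domains = ['localhost', 'rollnstroll.se'];
$url_list = [];

// Iterate row by row and read the column under its real name.
foreach ($rows as $row) {
    if (!preg_match_all($pattern, $row['link_form'], $matches)) {
        continue; // no match (0) or regex error (false)
    }
    foreach ($matches[0] as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        // Skip URLs without a usable host and hosts on the exclusion list.
        if (!is_string($host) || in_array($host, $excluded_domains, true)) {
            continue;
        }
        $url_list[] = $url;
    }
}

// Remove duplicates and reindex.
$url_list = array_values(array_unique($url_list));
print_r($url_list);
```

With a real statement, the same shape can be had either by looping with `while ($d = $stmt->fetch(PDO::FETCH_ASSOC))` or by calling `fetchAll()` once and iterating over the result with `foreach`.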

    This answer was selected as the best answer by the asker.
