doushi9780 2015-09-03 16:01

How to read a .vcf file from the browser?

I am trying to retrieve the email addresses of all the exhibitors at IFA Berlin. The exhibitor pages themselves are easy to crawl.

The tricky part is that the site only lets you download a .vcf file or send an email (through their server, I guess). I would like to find the email address without downloading the .vcf file. Failing that, I could download it and read it easily with PHP (my crawler is also written in PHP).

This is also my first question here after lurking for years! Nice meeting you guys.

  • donglan9517 2015-09-03 16:56

    A .vcf file is normally served as a download rather than displayed in the browser. One way around that is to set up a custom browser extension that temporarily stores the file, parses the vCard data, and displays the information.

    PHP scraping approach

    There are vCard parsers available, for example https://github.com/nuovo/vCard-parser, but I think a regular expression is enough for this job: /EMAIL;INTERNET:(.*)/.

    Suppose your first scraping run gives you a list of attendee IDs. A second (vCard) scraping run can then fetch each card by ID and extract the name and email address:
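The pattern above only matches the exact spelling `EMAIL;INTERNET:`. As a rough sketch, assuming the cards might also contain forms such as `EMAIL:` or `EMAIL;TYPE=INTERNET,PREF:` (an assumption, so check a real download from the site first), a slightly looser pattern with `preg_match_all` would collect every address in a card. The helper name `extractEmails` and the sample card are made up for illustration.

```php
<?php

// Hypothetical helper: return every e-mail address found in a vCard string.
function extractEmails(string $vcard): array
{
    // ^EMAIL            property name at the start of a line
    // (?:;[^:\r\n]*)?   optional parameters such as ;INTERNET or ;TYPE=INTERNET,PREF
    // :([^\r\n]+)       the value up to the end of the line (CR/LF excluded)
    preg_match_all('/^EMAIL(?:;[^:\r\n]*)?:([^\r\n]+)/mi', $vcard, $matches);
    return array_map('trim', $matches[1]);
}

// Made-up sample card, not real exhibitor data.
$sample = "BEGIN:VCARD\r\nVERSION:2.1\r\nN:Doe;Jane;;;\r\n"
        . "EMAIL;INTERNET:jane@example.com\r\n"
        . "EMAIL;TYPE=INTERNET,PREF:sales@example.com\r\nEND:VCARD\r\n";

print_r(extractEmails($sample));
```

This sketch ignores folded lines and quoted parameter values, which a full parser such as the one linked above would handle.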

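    <?php
    
    // Download the raw vCard text for one attendee ID (null if the request fails).
    function getVcard($id) {
        $vcard = @file_get_contents('http://www.virtualmarket.ifa-berlin.de/?Action=attendeeVcard&id=' . $id);
        return $vcard === false ? null : $vcard;
    }
    
    // Return the first email address in the vCard, or null if there is none.
    function getEmailFromVcard($vcard)
    {
        // trim() drops the trailing "\r" that CRLF line endings leave in the capture.
        if (preg_match('/EMAIL;INTERNET:(.*)/', (string) $vcard, $matches)) {
            return trim($matches[1]);
        }
        return null;
    }
    
    // Return "Given Family" built from the N: property, or null if it is missing.
    function getNameFromVcard($vcard)
    {
        // Anchor at the start of a line so that "FN:" is not matched by accident.
        if (preg_match('/^N:(.*)$/m', (string) $vcard, $matches)) {
            $parts  = explode(';', trim($matches[1]));   // family;given;additional;prefix;suffix
            $family = trim($parts[0]);
            $given  = isset($parts[1]) ? trim($parts[1]) : '';
            return trim($given . ' ' . $family);
        }
        return null;
    }
    
    $id = 1775586;
    
    $vcard = getVcard($id);
    $email = getEmailFromVcard($vcard);
    $name = getNameFromVcard($vcard);
    
    echo $name . ' ' . $email;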
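
To run the second pass over the whole list of IDs from the first scraping run, a loop along these lines might work. It is only a sketch: `parseVcard`, `collectContacts`, the `$fetch` callback, and the one-second default pause are all names and choices introduced here for illustration, and the demo uses a fake fetcher with made-up data so it runs without touching the real site.

```php
<?php

// Hypothetical helper: pull name and e-mail out of one vCard string.
function parseVcard(string $vcard): array
{
    $email = preg_match('/^EMAIL[^:\r\n]*:([^\r\n]+)/m', $vcard, $m) ? trim($m[1]) : null;
    $name = null;
    if (preg_match('/^N:([^\r\n]*)/m', $vcard, $m)) {
        $parts = array_pad(explode(';', $m[1]), 2, '');   // family;given;...
        $name = trim(trim($parts[1]) . ' ' . trim($parts[0]));
    }
    return ['name' => $name, 'email' => $email];
}

// Hypothetical helper: fetch and parse every ID, skipping failed downloads.
// $fetch is any callable that returns the vCard text, or false/null on failure.
function collectContacts(array $ids, callable $fetch, int $pauseSeconds = 1): array
{
    $contacts = [];
    foreach ($ids as $id) {
        $vcard = $fetch($id);
        if ($vcard !== false && $vcard !== null) {
            $contacts[$id] = parseVcard($vcard);
        }
        if ($pauseSeconds > 0) {
            sleep($pauseSeconds);   // be gentle with the server between requests
        }
    }
    return $contacts;
}

// Demo with a fake fetcher so the sketch runs without network access.
$fakeFetch = function ($id) {
    if ($id === 2) {
        return false;   // simulate a failed download
    }
    return "BEGIN:VCARD\r\nN:Doe;Jane;;;\r\nEMAIL;INTERNET:jane{$id}@example.com\r\nEND:VCARD\r\n";
};

foreach (collectContacts([1, 2, 3], $fakeFetch, 0) as $id => $contact) {
    echo $id . ': ' . $contact['name'] . ' <' . $contact['email'] . ">\n";
}
```

Against the real site, the `getVcard` function from the answer could be passed as the fetcher, for example `collectContacts($ids, 'getVcard')`.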
    