weixin_33739541 2016-03-10 08:33

Get metadata from a URL

I have used the Jsoup library to fetch the metadata from a URL:

// Imports needed: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.select.Elements
Document doc = Jsoup.connect("http://www.google.com").get();
// Elements.attr() returns "" when nothing matches, so a missing tag does not throw.
String keywords = doc.select("meta[name=keywords]").attr("content");
String description = doc.select("meta[name=description]").attr("content");
// Match <img> tags whose src contains .png, .jpg, .jpeg, or .gif (case-insensitive).
Elements images = doc.select("img[src~=(?i)\\.(png|jpe?g|gif)]");
String src = images.attr("src");

System.out.println("Meta keyword : " + keywords);
System.out.println("Meta description : " + description);
System.out.println("Meta image URL : " + src);

But I want to do it on the client side using JavaScript.

1 answer

  • 10.24 2016-03-10 08:57

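    You can't do this purely on the client because of the same-origin policy: the browser blocks a script from reading a page on another origin unless that server opts in via CORS. You need a server-side script to fetch the content of the page.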
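As a rough illustration of the server-side route, here is a minimal sketch for Node.js. It assumes Node 18 or later (where `fetch` is a global), and the names `extractMeta` and `fetchMeta` are hypothetical helpers, not part of any library. It scans for `<meta>` tags with a regular expression, which is fragile compared with a real HTML parser, so treat it as a starting point only.

```javascript
// Sketch only: a regex scan is fragile compared with a real HTML parser.
// extractMeta and fetchMeta are hypothetical helper names.
function extractMeta(html, name) {
  // Collect every <meta ...> tag, then read its attributes one by one.
  const tags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of tags) {
    const attrs = {};
    // Matches attr="value", attr='value', and unquoted attr=value.
    const attrPattern = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g;
    let m;
    while ((m = attrPattern.exec(tag)) !== null) {
      attrs[m[1].toLowerCase()] = m[2] ?? m[3] ?? m[4];
    }
    if ((attrs.name || '').toLowerCase() === name.toLowerCase()) {
      return attrs.content ?? null;
    }
  }
  return null;
}

// Runs on the server, so the browser's same-origin policy does not apply.
// Assumes Node.js 18+, where fetch is available globally.
async function fetchMeta(url) {
  const response = await fetch(url);
  const html = await response.text();
  return {
    keywords: extractMeta(html, 'keywords'),
    description: extractMeta(html, 'description'),
  };
}
```

The page in the browser would then call an endpoint on your own server that wraps `fetchMeta`, rather than requesting the remote site directly.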

    Alternatively, you can use YQL, which then acts as the proxy: https://policies.yahoo.com/us/en/yahoo/terms/product-atos/yql/index.htm

    Or you can use https://cors-anywhere.herokuapp.com, in which case cors-anywhere acts as the proxy. For example:

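    $('button').click(function() {
      $.ajax({
        // Prefix the target URL so the request goes through the CORS proxy.
        url: 'https://cors-anywhere.herokuapp.com/' + $('input').val()
      }).then(function(data) {
        // The returned page is parsed into a jQuery set of its top-level nodes.
        var html = $(data);

        // Use .text() so the fetched content is never interpreted as markup.
        $('#kw').text(getMetaContent(html, 'keywords') || 'no keywords found');
        $('#des').text(getMetaContent(html, 'description') || 'no description found');
        $('#img').text(html.find('img').attr('src') || 'no image found');
      });
    });

    // Returns the content attribute of the first top-level tag whose name matches.
    function getMetaContent(html, name) {
      return html.filter(
      (index, tag) => tag && tag.name && tag.name === name).attr('content');
    }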
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
    
    <input type="text" placeholder="Type URL here" value="http://www.html5rocks.com/en/tutorials/cors/" />
    <button>Get Meta Data</button>
    
    <pre>
      <div>Meta Keyword: <div id="kw"></div></div>
      <div>Description: <div id="des"></div></div>
      <div>image: <div id="img"></div></div>
    </pre>
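One caveat with the image lookup: `attr('src')` returns the attribute exactly as written in the fetched page, which is often a relative path such as `/static/logo.png`. A possible way to turn it into an absolute address is the standard `URL` constructor, resolving against the page URL the user typed in. The sketch below assumes that approach; `resolveImageUrl` is a hypothetical helper name, not part of jQuery.

```javascript
// resolveImageUrl is a hypothetical helper name, not part of jQuery.
function resolveImageUrl(src, pageUrl) {
  if (!src) {
    return null;
  }
  try {
    // The second argument is the base that relative references resolve against.
    return new URL(src, pageUrl).href;
  } catch (err) {
    // new URL throws a TypeError when the result is not a valid URL.
    return null;
  }
}
```

In the example above this could be called as `resolveImageUrl(html.find('img').attr('src'), $('input').val())`, falling back to the 'no image found' text when it returns `null`.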
