dpyu7978 2010-06-26 17:11

Algorithm to sort images by relevance

I'm developing a feature for a forum site that will let users include a link and other types of content in a post (to clarify a question or an answer).

For the link feature, I have several things to work on:

  1. Validate the URI entered (well formed, valid scheme, etc.)
  2. Validate that the remote resource exists
  3. Extract images from within the remote page
  4. Show the user the set of images and let them choose one
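Steps 1 and 3 can be sketched with the Python standard library alone. This is only a minimal sketch: the names `is_valid_http_url`, `ImageCollector` and `extract_image_urls` are made up for illustration, only `http` and `https` are treated as valid schemes (an assumption), and only `<img src>` tags are collected (CSS backgrounds and lazy-loading attributes are ignored).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def is_valid_http_url(url):
    """Return True if url has an http(s) scheme and a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


class ImageCollector(HTMLParser):
    """Collect absolute URLs of the <img src=...> tags found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.images.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the image URLs of an HTML document, in page order."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.images
```

Step 2 (checking that the remote resource exists) needs a network request and is left out here.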

Here comes the challenge. Before step 4, it would be great to sort this set of images by 'relevance'. I know that is quite an ambiguous goal :-) but I can explain what I have run into with the results shown in step 4, and you will see why I am looking for a solution.

Often, the set of images contains things like these:

  • Images used for the layout of the page (tiny and useless)
  • Banners and ads
  • Near-duplicate images (the original and a resized copy)
  • Chaotic ordering of the set (logo in the last position, etc.)

I decided to clean up this mess by removing tiny images and sorting the rest by size, but I know that is far from a good solution.
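For reference, the clean-up described above might look like the sketch below. The dict layout (`width` and `height` keys) and the `min_side` threshold of 50 pixels are assumptions made for illustration, not values from the actual site.

```python
def filter_and_sort_by_area(images, min_side=50):
    """Drop images with a side shorter than min_side, largest area first.

    images: list of dicts with at least "width" and "height" keys (pixels).
    """
    kept = [
        img for img in images
        if img["width"] >= min_side and img["height"] >= min_side
    ]
    return sorted(kept, key=lambda img: img["width"] * img["height"], reverse=True)
```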

Any ideas?

Thank you very much!

2 answers

  • donglu1913 2010-06-26 20:31

    You could sort by saturation, which is a good indicator of how interesting an image might be. Take a look at the question "Image Classification - Detecting Floor Plans" for a sample implementation.
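A saturation score could be computed roughly as follows. This is a sketch and not the implementation from the linked question: it assumes each image has already been decoded into a flat list of `(r, g, b)` tuples in the 0-255 range, and the helper names are invented for illustration.

```python
import colorsys


def mean_saturation(pixels):
    """Average HSV saturation (0.0 to 1.0) of a list of (r, g, b) pixels."""
    if not pixels:
        return 0.0
    total = 0.0
    for r, g, b in pixels:
        # colorsys expects channel values in the 0.0-1.0 range.
        _, saturation, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        total += saturation
    return total / len(pixels)


def rank_by_saturation(images):
    """Sort image names by mean saturation, most saturated first.

    images: dict mapping an image name to its list of (r, g, b) pixels.
    """
    return sorted(images, key=lambda name: mean_saturation(images[name]), reverse=True)
```

Scoring a downscaled copy of each image instead of the full-size original should keep this cheap without changing the ranking much.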

    The hardest part is separating image ads from regular images, since ads are designed to look very interesting. To do this, I suggest one or more of the following:

    • ignore images that have standard ad dimensions
    • query the page twice and ignore the images that change (ads tend to be dynamic)
    • ignore images hosted on external sites (watch out for CDNs!) or on specific ad-serving URLs
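The three heuristics above could be combined along these lines. Treat it as a sketch under assumptions: `COMMON_AD_SIZES` is an illustrative, incomplete set of widely used banner sizes, `trusted_hosts` is a hypothetical allowlist for the site's own CDN hosts, and hosts are compared by exact name, so a subdomain counts as external unless it is allowlisted.

```python
from urllib.parse import urlparse

# Illustrative subset of common banner sizes (width, height); not exhaustive.
COMMON_AD_SIZES = {
    (728, 90), (468, 60), (300, 250), (336, 280),
    (160, 600), (120, 600), (300, 600), (320, 50),
}


def looks_like_ad(image_url, width, height, page_url, trusted_hosts=()):
    """Guess whether an image is an ad from its size and its host."""
    if (width, height) in COMMON_AD_SIZES:
        return True
    image_host = urlparse(image_url).hostname or ""
    page_host = urlparse(page_url).hostname or ""
    # Same host, or an allowlisted CDN, is treated as the site's own content.
    if image_host == page_host or image_host in trusted_hosts:
        return False
    return True


def stable_images(first_fetch, second_fetch):
    """Keep the image URLs that appear in both fetches of the same page."""
    seen_again = set(second_fetch)
    return [url for url in first_fetch if url in seen_again]
```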

    To deal with the same image appearing at several resolutions, you could resize them all to a very low resolution (such as 8x8 or 4x4); if two or more images look alike, ignore the smaller one(s).
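The low-resolution comparison could be sketched in plain Python as below. Several things here are assumptions: each image is taken to be already decoded into a 2D grid of grayscale values (0-255), the downscaling is simple block averaging, and "alike" means a mean absolute difference of at most `tolerance` (10 by default, an arbitrary value that would need tuning).

```python
def downscale(gray, size=8):
    """Shrink a 2D grayscale grid to size x size by averaging blocks."""
    height, width = len(gray), len(gray[0])
    thumb = []
    for ty in range(size):
        y0 = ty * height // size
        y1 = max((ty + 1) * height // size, y0 + 1)
        row = []
        for tx in range(size):
            x0 = tx * width // size
            x1 = max((tx + 1) * width // size, x0 + 1)
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        thumb.append(row)
    return thumb


def thumbs_alike(a, b, tolerance=10.0):
    """True if two equally sized thumbnails differ little on average."""
    diffs = [abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs) <= tolerance


def drop_smaller_duplicates(images, size=8, tolerance=10.0):
    """Return the names to keep; images is a list of (name, gray_grid)."""
    # Visit the largest images first so the smaller copy is the one dropped.
    by_area = sorted(images, key=lambda item: len(item[1]) * len(item[1][0]), reverse=True)
    kept = []
    for name, gray in by_area:
        thumb = downscale(gray, size)
        if not any(thumbs_alike(thumb, other, tolerance) for _, other in kept):
            kept.append((name, thumb))
    return [name for name, _ in kept]
```

Comparing every image against every kept thumbnail is quadratic, which should be fine for the handful of images on a single page.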

    This answer was selected as the best answer by the asker.