dsqbh42082 2012-04-02 04:05

Finding "similar" articles in an RSS feed using PHP

There is something I am trying to accomplish although I'm not really sure where to start.

I currently have a MySQL database with a list of articles. The DB contains the article title, content, and some other info such as dates.

There is an RSS feed that we monitor for new articles; it's a Google Alert feed that just contains the latest news on certain subjects. I want to be able to automatically monitor this feed and record any feed items that are similar to stories currently in our DB.

I know how to set a script to run automatically, and I know how to parse the RSS feed with SimplePie.

What I need to figure out is how to take the description of each RSS feed item, check it against our DB to see whether it is similar to an article we already have, and return a numerical score of some sort, like a "similarity rating".

After that I can have the info I need recorded to the DB if the "similarity rating" is above a set limit, which I know how to do.

So my only issue is how to compare each feed item to our current articles, and return a score based on how similar it is.

1 answer

  • doutan6286 2012-04-02 04:20

    The Levenshtein function (built into PHP as levenshtein(); for MySQL, see the edit below) is a good way to handle this. It calculates the minimum number of single-character edits (insertions, deletions, and substitutions) required to convert one string into another. That distance would be the basis for your "similarity rating": the lower the distance, the more similar the two strings are.
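A minimal sketch of how the raw distance could be turned into a score, assuming a hypothetical helper named `similarityPercent()` (not part of PHP) that scales the distance by the length of the longer string:

```php
<?php
// Raw edit distances: lower means more similar.
$d1 = levenshtein('kitten', 'sitting'); // 3 single-character edits
$d2 = levenshtein('kitten', 'kitten');  // 0, the strings are identical

// Hypothetical helper: convert a distance into a 0-100 score so that
// strings of different lengths can be compared against one threshold.
// levenshtein() works on bytes, so multibyte text may score oddly.
function similarityPercent(string $a, string $b): float
{
    $maxLen = max(strlen($a), strlen($b));
    if ($maxLen === 0) {
        return 100.0; // two empty strings are treated as identical
    }
    // The distance never exceeds the longer length, so the result is 0..100.
    return (1 - levenshtein($a, $b) / $maxLen) * 100;
}

echo similarityPercent('kitten', 'sitting'), "\n"; // roughly 57
```

One caveat worth checking: before PHP 8.0, levenshtein() returns -1 when either argument is longer than 255 characters, so on older versions it may be more practical to compare titles or truncated descriptions than full article bodies.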

    EDIT: The Levenshtein function is not available natively in MySQL, but there are SQL implementations of it that you can use, such as: http://kristiannissen.wordpress.com/2010/07/08/mysql-levenshtein/
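If adding a stored function to MySQL is not an option, the comparison could instead be done in PHP after fetching the article titles. The sketch below is only an illustration: the sample arrays stand in for database rows and parsed feed items, and `normalizeText()`, `bestMatch()`, and the threshold of 70 are assumptions that would need tuning against real data.

```php
<?php
// Hypothetical sample data standing in for rows fetched from MySQL
// (id => title) and for item descriptions parsed from the feed.
$articles = [
    101 => 'City council approves new bike lanes',
    102 => 'Local team wins regional championship',
];
$feedItems = [
    'City council approves new bike lane plan',
    'Weather forecast: rain expected this weekend',
];

// Lowercase, collapse whitespace, and cap the length at 255 bytes,
// because levenshtein() before PHP 8.0 rejects longer strings.
function normalizeText(string $s): string
{
    $s = strtolower(trim($s));
    $s = preg_replace('/\s+/', ' ', $s);
    return substr($s, 0, 255);
}

// Return the id and 0-100 score of the closest article, or id null
// if nothing scores above zero.
function bestMatch(string $item, array $articles): array
{
    $best = ['id' => null, 'score' => 0.0];
    $a = normalizeText($item);
    foreach ($articles as $id => $title) {
        $b = normalizeText($title);
        $maxLen = max(strlen($a), strlen($b));
        if ($maxLen === 0) {
            continue; // nothing to compare
        }
        $score = (1 - levenshtein($a, $b) / $maxLen) * 100;
        if ($score > $best['score']) {
            $best = ['id' => $id, 'score' => $score];
        }
    }
    return $best;
}

$threshold = 70.0; // assumed cut-off, not a recommended value

foreach ($feedItems as $item) {
    $match = bestMatch($item, $articles);
    if ($match['id'] !== null && $match['score'] >= $threshold) {
        printf("Similar to article %d (%.1f%%): %s\n",
            $match['id'], $match['score'], $item);
    }
}
```

This compares every feed item against every article, which is fine for a small table but grows with the number of rows; for a large archive it may be worth narrowing the candidates first, for example by date.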
