2012-04-02 04:05


There is something I am trying to accomplish although I'm not really sure where to start.

I currently have a MySQL database with a list of articles. The DB contains each article's title, content, and some other info such as dates.

There is an RSS feed that we monitor for new articles. It's a Google Alert feed that contains only the latest news on certain subjects. I want to automatically monitor this feed and record any feed items that are similar to stories currently in our DB.

I know how to set a script to run automatically, and I know how to parse the RSS feed with SimplePie.

What I need to figure out is how to take the description of each RSS feed item, check it against our DB to see whether it is similar to something we already have, and return a numerical score of some sort, like a "similarity rating".

After that I can have the info I need recorded to the DB if the "similarity rating" is above a set limit, which I know how to do.

So my only issue is how to compare each feed item to our current articles, and return a score based on how similar it is.


1 answer

  • doutan6286 2012-04-02 04:20

    PHP's built-in levenshtein() function is a good way to handle this. It calculates the minimum number of single-character edits (insertions, deletions, and substitutions) required to convert one string into another. That distance can serve as the basis for your "similarity rating": the lower the distance, the more similar the two strings.

    EDIT: The Levenshtein function is not available natively in MySQL, but there are SQL implementations of it that you can use, such as: http://kristiannissen.wordpress.com/2010/07/08/mysql-levenshtein/
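
To make the idea concrete, here is a minimal sketch of how an edit distance could be turned into a 0-to-1 similarity rating and used to pick the closest stored article. It is written in Python only to stay self-contained; in PHP the built-in levenshtein() would replace the hand-written function. The names `similarity`, `best_match`, and the `(id, text)` article tuples are hypothetical and not from the original question, and dividing by the longer string's length is just one possible way to normalize.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 means identical, 0.0 means nothing in common."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


def best_match(description, articles):
    """Return (article_id, score) for the closest article, or None if there are none.

    `articles` is a hypothetical list of (id, text) pairs, e.g. rows read from the DB.
    """
    if not articles:
        return None
    scored = [(article_id, similarity(description, text)) for article_id, text in articles]
    return max(scored, key=lambda pair: pair[1])
```

A feed item would then be recorded only when the score returned by `best_match` is above your chosen limit. Note that the comparison is character-based and its cost grows with the product of the two string lengths, so it is better suited to titles or short descriptions than to full article bodies.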
