dongpo5264 2014-01-26 17:58
浏览 32

根据字符串相似性获取最长的公共子字符串

I have a table with a column that includes names like:

  1. Home Improvement Guide
  2. Home Improvement Advice
  3. Home Improvement Costs
  4. Home Gardening Tips

I would like the result to be:

  1. Home Improvement
  2. Home Gardening Tips

Based on a search for the word 'Home'.

This can be accomplished in MySQL or PHP or a combination of the two. I have been pulling my hair out trying to figure this out, any help in the right directly would be greatly appreciated. Thanks.

Edit / Problem kinda solved:

I think this problem can be solved much easier by changing the logic a little. For anyone else with this problem, here is my solution.

  1. Get the sql results
  2. Find the first occurrence of the searched word, one string at a time, and get the next word in the string to the right of it.
  3. The results would include the searched word concatenated with the distinct adjoining word.

Not as good of a solution, but it works for my project. Thanks for the help everyone.

  • 写回答

1条回答 默认 最新

  • dongzhanlu0658 2014-01-26 18:29
    关注

    This is too long for a comment.

    I don't think that Levenshtein distance does what you want. Consider:

    Home Improvement
    Home Improvement Advice on Kitchen Remodeling
    Home Gardening
    

    The first and third are closer by the Levenshtein measure than the first and third. And yet, I'm guessing that you want the first and second to be paired.

    I have an idea of the algorithm you want. Something like this:

    • Compare every returned string to every other string
    • Measure the length of the initial overlap
    • Find the maximum over all the strings strings, pair those
    • Repeat the process with the second largest overlap and so on

    Painful, but not impossible to implement in SQL. Maybe very painful.

    What this suggests to me is that you are looking for a hierarchy among the products. My suggestion is to just include a category column and return the category. You may need to manually insert the categories into your data.

    评论

报告相同问题?

悬赏问题

  • ¥15 微信会员卡接入微信支付商户号收款
  • ¥15 如何获取烟草零售终端数据
  • ¥15 数学建模招标中位数问题
  • ¥15 phython路径名过长报错 不知道什么问题
  • ¥15 深度学习中模型转换该怎么实现
  • ¥15 HLs设计手写数字识别程序编译通不过
  • ¥15 Stata外部命令安装问题求帮助!
  • ¥15 从键盘随机输入A-H中的一串字符串,用七段数码管方法进行绘制。提交代码及运行截图。
  • ¥15 TYPCE母转母,插入认方向
  • ¥15 如何用python向钉钉机器人发送可以放大的图片?