duanlian1960 2013-03-08 21:39

How to compare parts of two strings in PHP

Good evening,

I am facing a small problem whilst trying to build a little search algorithm.

I have a database table containing video game names and software names. Now I would like to add new offers by fetching and parsing xml files on other servers. The issue is:

How can I compare the product-name strings so that the comparison still works when the offer name doesn't match the product name stored in my database 100%?

As an example I am currently using this PHP + SQL code to compare the strings:

$query_GID = "select ID,game from gkn_catalog where game like '%$batch_name%' or meta like '%$batch_name%' ";

I am currently using the like operator in conjunction with two wild-cards to compare the offer name (batch_name) with the name in the database (game).


I would like to know how I can improve on this, as this method isn't very robust. What happens is:

If the database says the game title is:

Deus Ex Human Revolution Missing Link

and the batch_name says:

Deus Ex Human Revolution Missing Link DLC

the result will be empty/wrong/false ... well it won't find the game in my database at all.

Same goes for something like this:

Database = Lego Star Wars The Complete Saga
batch_name = Lego Star Wars : The Complete Saga
Result: False

Is there a better way to do the SQL query?
Or how can I get that query to deal with strings that contain special characters (like -minus- & [brackets])
and/or words that aren't included in the names within the database (like DLC, CE...)?


  • duanmuybrtg1231 2013-03-08 21:59

    You're looking for fuzzy search algorithms and fuzzy search results. This is a whole field of study. However, there are also some straightforward tutorials to get you started if you take a quick google around.

    You might be tempted to try something like PHP's wonderful levenshtein() function, which calculates the "closeness" of two strings. However, the comparison happens in PHP, so every offer name would have to be matched against every record. If there will be thousands of records, that's out of the question.
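
For a small candidate set, the levenshtein() idea could look roughly like the sketch below. The helper names normalize_name() and closest_title() and the cut-off of 5 edits are assumptions made up for illustration; they are not from the question.

```php
<?php
// Hypothetical helper: lower-case the name and turn every run of
// non-alphanumeric characters (":", "-", "[", "]", ...) into one space.
function normalize_name(string $name): string
{
    $name = strtolower($name);
    $name = preg_replace('/[^a-z0-9]+/', ' ', $name);
    return trim($name);
}

// Return the candidate with the smallest edit distance to $needle,
// or null if no candidate is within $maxDistance edits.
function closest_title(string $needle, array $candidates, int $maxDistance = 5): ?string
{
    $best = null;
    $bestDistance = $maxDistance + 1;
    $needle = normalize_name($needle);

    foreach ($candidates as $candidate) {
        $distance = levenshtein($needle, normalize_name($candidate));
        if ($distance < $bestDistance) {
            $best = $candidate;
            $bestDistance = $distance;
        }
    }

    return $best;
}
```

With the two titles from the question as candidates, both "Deus Ex Human Revolution Missing Link DLC" and "Lego Star Wars : The Complete Saga" resolve to their stored names. The loop still touches every candidate, so this only makes sense after the database has narrowed the list down.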

    MySQL has some matching tools which may help. I see that as I'm writing this, somebody has already mentioned FULLTEXT and MATCH() in the comments. Those are a great way to go.
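
As a rough sketch of the FULLTEXT route, the function below only builds the query and its parameters. It assumes a FULLTEXT index covering exactly the columns (game, meta) has been added to gkn_catalog; the function name build_fulltext_query() and the LIMIT of 5 are made up for illustration.

```php
<?php
// Assumption: ALTER TABLE gkn_catalog ADD FULLTEXT (game, meta); has been run.
// Returns [sql, params] for a prepared statement with two positional placeholders.
function build_fulltext_query(string $batchName): array
{
    // Replace punctuation with spaces so only plain words are sent as search terms.
    $terms = trim(preg_replace('/[^\p{L}\p{N}]+/u', ' ', $batchName));

    // AGAINST() without a modifier uses natural-language mode, which ranks rows
    // by relevance instead of requiring every word to be present.
    $sql = 'SELECT ID, game, MATCH(game, meta) AGAINST (?) AS score '
         . 'FROM gkn_catalog '
         . 'WHERE MATCH(game, meta) AGAINST (?) '
         . 'ORDER BY score DESC LIMIT 5';

    return [$sql, [$terms, $terms]];
}
```

The result can be passed to a PDO connection (here assumed to be in $pdo) with $pdo->prepare($sql) followed by execute($params), which also keeps the offer name out of the SQL string itself. Note that MySQL ignores words shorter than its configured minimum length, so short tokens such as "Ex" or "CE" may not contribute to the score.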

    There are a few other good solutions to look into as well. Storing an index of keywords (with all the articles and helper words like of/the/an/am/is/are/was/from removed) and then searching on each word in the search term is a simple solution. However, it doesn't produce great results, because the returned values are not weighted well, and it doesn't localize at all.
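
The keyword idea can be sketched in memory as follows. The stop-word list and the names keywords() and keyword_overlap() are assumptions for illustration; in practice the keywords would be stored in an index table rather than recomputed per comparison.

```php
<?php
// Hypothetical stop-word list; extend it for your own catalog.
const STOP_WORDS = ['of', 'the', 'an', 'a', 'am', 'is', 'are', 'was', 'from'];

// Split a name into lower-case words, dropping punctuation, stop words and duplicates.
function keywords(string $name): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($name), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique(array_diff($words, STOP_WORDS)));
}

// Fraction of the stored title's keywords that also occur in the offer name (0.0 to 1.0).
function keyword_overlap(string $offer, string $stored): float
{
    $storedWords = keywords($stored);
    if ($storedWords === []) {
        return 0.0;
    }
    $shared = array_intersect($storedWords, keywords($offer));
    return count($shared) / count($storedWords);
}
```

Because the score is measured against the stored title, extra words in the offer such as "DLC" do not lower it. As noted above, every keyword counts the same, so the ranking stays crude.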

    There are lots of cheap and wonderful third-party search tools (Lucene comes to mind) as well that will do most of this work for you. You just call an API and they manage the caching, keywords, indexing, fuzzy matching, and so on for searches.

    Here are some SO questions that are related to fuzzy searches, which will help you find more terminology and ideas:
