duanqiao1949 2011-03-24 15:14

Creating a natural language search engine in PHP

I'm trying to code up a natural language parser and search engine in PHP. All of the approaches I have thought of so far have been either cumbersome to implement or use, or not very efficient.

One of my ideas was a script that runs regular expressions against a simplified string, i.e. one with various words removed. The resulting string would be checked first for what the user is looking for, e.g. "opening times", and then, if possible, for the venue they're searching for, let's say "Derngate". The rest would work similarly.
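A minimal sketch of the idea described above, assuming the stop words, intent phrases, and venue names live in small lookup arrays. The names `STOP_WORDS`, `INTENTS`, `VENUES`, `simplifyQuery()`, and `parseQuery()` are hypothetical and chosen only for illustration. It uses a single `preg_split()` call plus array lookups rather than one regular expression per phrase:

```php
<?php
// Hypothetical lookup data; none of these lists come from the original post.
const STOP_WORDS = ['what', 'are', 'the', 'for', 'at', 'is', 'of', 'a', 'in'];
const INTENTS = ['opening times' => 'opening_times', 'ticket prices' => 'ticket_prices'];
const VENUES = ['derngate', 'royal'];

// Lowercase the query, split it into words, and drop the stop words.
function simplifyQuery(string $query): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);
    // Flip the list so each stop-word check is a hash lookup, not a scan.
    $stop = array_flip(STOP_WORDS);
    return array_values(array_filter($words, fn($w) => !isset($stop[$w])));
}

// Look for a known intent phrase first, then for a known venue name.
function parseQuery(string $query): array
{
    $words = simplifyQuery($query);
    $simple = implode(' ', $words);
    $result = ['intent' => null, 'venue' => null];

    foreach (INTENTS as $phrase => $intent) {
        // Plain substring match; a real parser would respect word boundaries.
        if (strpos($simple, $phrase) !== false) {
            $result['intent'] = $intent;
            break;
        }
    }
    foreach (VENUES as $venue) {
        if (in_array($venue, $words, true)) {
            $result['venue'] = $venue;
            break;
        }
    }
    return $result;
}

$parsed = parseQuery('What are the opening times for the Derngate?');
```

With the sample lists above, `$parsed` holds `opening_times` as the intent and `derngate` as the venue. Whether this beats a set of regular expressions depends on how large the phrase lists grow.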

Can anyone point me towards a more efficient way of doing this? I don't want to run 25 different regular expressions, or whatever the count turns out to be, on each page load if I can help it.

Many thanks!

Edit: I'm just curious, that's all. I'd rather build my own (to see how it works) than jump into something like Lucene.

3 answers

  • dongwen2794 2011-08-23 19:02

    I think that after a review of the state of the art, I'd look at root/stem word extraction as a start. (Not too heavy a task if your document corpus is relatively static, since this can be done at document-capture time.)

    There's a PHP extension for that, called stem: http://pecl.php.net/package/stem

    There's also the Porter Stemmer implemented in PHP; it is the key operation in the extension above, written as a plain function.
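As a rough illustration of stemming at document-capture time, the sketch below builds a small inverted index keyed by word stems and then answers queries against it. It is written under loud assumptions: `naiveStem()` is a crude suffix stripper used only as a stand-in, not the Porter algorithm and not the API of the stem extension, and `tokenize()`, `buildIndex()`, `search()`, and the sample documents are hypothetical names and data.

```php
<?php
// Crude stand-in stemmer (NOT the Porter algorithm): strips one common suffix.
function naiveStem(string $word): string
{
    $word = strtolower($word);
    foreach (['ing', 'ed', 'es', 's'] as $suffix) {
        $len = strlen($suffix);
        // Only strip when at least three characters of stem remain.
        if (strlen($word) - $len >= 3 && substr($word, -$len) === $suffix) {
            return substr($word, 0, -$len);
        }
    }
    return $word;
}

// Split text into lowercase alphanumeric words.
function tokenize(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Build stem => [document id => true] once, when documents are captured.
function buildIndex(array $documents): array
{
    $index = [];
    foreach ($documents as $id => $text) {
        foreach (tokenize($text) as $word) {
            $index[naiveStem($word)][$id] = true;
        }
    }
    return $index;
}

// Return the ids of documents that contain every stemmed query term.
function search(array $index, string $query): array
{
    $ids = null;
    foreach (tokenize($query) as $word) {
        $matches = $index[naiveStem($word)] ?? [];
        $ids = $ids === null ? $matches : array_intersect_key($ids, $matches);
    }
    return array_keys($ids ?? []);
}

// Hypothetical sample corpus, indexed once up front.
$documents = [
    'derngate' => 'Derngate opening times and ticket prices',
    'museum'   => 'The museum opens at ten',
];
$index = buildIndex($documents);
$hits = search($index, 'opening times');
```

Because "opening", "opens", and "opened" all reduce to the same stem here, a query for any of them can match documents containing the others. At page load the search is a handful of array lookups, since the stemming of the corpus was already done when the index was built. Swapping `naiveStem()` for a real Porter stemmer should only require changing that one function.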

    Accepted by the asker as the best answer.
