duanqiao1949 2011-03-24 15:14
Accepted

Creating a natural language search engine in PHP

I'm trying to code up a natural language parser and search engine in PHP. Every approach I have thought of so far has been cumbersome to implement, cumbersome to use, or not very efficient.

One of my ideas was a script that runs regular expressions against a simplified version of the query string, i.e. one with various words removed. The resulting string would be checked first for what the user is looking for, e.g. "opening times", and then, if possible, for the venue they're searching for, let's say "Derngate". The rest is similar to that.
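
As a rough illustration of the idea described above, here is a minimal sketch. The stop-word list, the intent phrases, the venue list, and the function names `simplify` and `parseQuery` are all hypothetical placeholders, not anything from a real library.

```php
<?php
// Hypothetical data for illustration only; a real site would load these from its own tables.
const STOP_WORDS = ['what', 'are', 'the', 'for', 'at', 'is', 'of', 'in', 'a'];
const INTENTS    = ['opening times' => 'opening_times', 'ticket prices' => 'ticket_prices'];
const VENUES     = ['derngate', 'royal'];

// Lower-case the query, split it into words, and drop the stop words.
function simplify(string $query): string
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);
    $kept  = array_filter($words, fn($w) => !in_array($w, STOP_WORDS, true));
    return implode(' ', $kept);
}

// Check the simplified string first for an intent phrase, then for a venue name.
function parseQuery(string $query): array
{
    $simple = simplify($query);
    $result = ['intent' => null, 'venue' => null];

    foreach (INTENTS as $phrase => $intent) {
        if (preg_match('/\b' . preg_quote($phrase, '/') . '\b/', $simple)) {
            $result['intent'] = $intent;
            break;
        }
    }
    foreach (VENUES as $venue) {
        if (preg_match('/\b' . preg_quote($venue, '/') . '\b/', $simple)) {
            $result['venue'] = $venue;
            break;
        }
    }
    return $result;
}
```

Calling `parseQuery('What are the opening times for Derngate?')` would first reduce the query to its content words and then test each known phrase in turn, which is where the per-request regular expression count comes from.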

Can anyone point me in the direction of a more efficient way of doing things? I don't want to run 25 different regular expressions, or whatever the count is, on each page load if I can help it.

Many thanks!

Edit: I'm just curious, that's all. I'd rather make my own (to see how it works) than jump into something like Lucene.


3 answers

  • dongwen2794 2011-08-23 19:02

    I think that after a review of the state of the art, I'd look at root/stem word extraction as a start. (Not too heavy a task if your document corpus is relatively static, since this can be done at document-capture time.)

    There's a PHP extension for that, stem. http://pecl.php.net/package/stem

    There is also a PHP implementation of the Porter Stemmer, the key operation behind the extension above, written as a plain function.
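
    To make the "stem at document-capture time" idea concrete, here is a minimal sketch of a stemmed inverted index. Note the assumptions: `crudeStem` is a deliberately crude suffix stripper used as a stand-in, not the Porter algorithm and not the PECL extension's API, and `stemmedTerms`, `buildIndex`, and `search` are hypothetical names.

```php
<?php
// Crude suffix stripper: an illustrative stand-in only, NOT the Porter algorithm.
function crudeStem(string $word): string
{
    foreach (['ing', 'ed', 'es', 's'] as $suffix) {
        $len = strlen($suffix);
        // Only strip when at least three characters would remain.
        if (strlen($word) - $len >= 3 && substr($word, -$len) === $suffix) {
            return substr($word, 0, -$len);
        }
    }
    return $word;
}

// Lower-case, split into words, stem each word, and de-duplicate.
function stemmedTerms(string $text): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique(array_map('crudeStem', $words)));
}

// Capture time: build stem => [document id, ...] once for the whole corpus.
function buildIndex(array $docs): array
{
    $index = [];
    foreach ($docs as $id => $text) {
        foreach (stemmedTerms($text) as $stem) {
            $index[$stem][] = $id;
        }
    }
    return $index;
}

// Query time: stem the query terms and intersect their posting lists.
function search(array $index, string $query): array
{
    $hits = null;
    foreach (stemmedTerms($query) as $stem) {
        $postings = $index[$stem] ?? [];
        $hits = $hits === null
            ? $postings
            : array_values(array_intersect($hits, $postings));
    }
    return $hits ?? [];
}
```

    With this layout a query costs a few array lookups instead of a batch of regular expressions per page load. Swapping `crudeStem` for a real Porter implementation should only change the stemming step, since both the index and the query go through the same function.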

    This answer was selected by the asker as the best answer.