编程介的小学生 2016-11-30 03:39 Acceptance rate: 20.5%

Accepted

Searching the Web

Description

The term "search engine" is probably familiar to you. Generally speaking, a search engine searches the web pages available on the Internet, extracts and organizes the information, and responds to users' queries with the most relevant pages. World-famous search engines, like GOOGLE, have become very important tools for visiting the web. Conversations such as the following are now common in daily life:
"What does the word like ****** mean?"
"Um... I am not sure, just google it."

In this problem, you are required to construct a small search engine. Sounds impossible, doesn't it? Don't worry: here is a step-by-step tutorial on how to organize a large collection of texts efficiently and respond to queries quickly. You don't need to worry about fetching web pages; all the web pages are provided to you in text format as the input data. In addition, many queries are provided to validate your system.
Modern search engines use a technique called inversion for dealing with very large sets of documents. The method relies on the construction of a data structure, called an inverted index, which associates terms (words) with their occurrences in the collection of documents. The set of terms of interest is called the vocabulary, denoted as V. In its simplest form, an inverted index is a dictionary where each search key is a term ω∈V. The associated value b(ω) is a pointer to an additional intermediate data structure, called a bucket. The bucket associated with a certain term ω is essentially a list of pointers marking all the occurrences of ω in the text collection. Each entry in a bucket simply consists of the document identifier (DID), that is, the ordinal number of the document within the collection, and the ordinal line number of the term's occurrence within the document.
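A minimal Python sketch of the inverted index described above may help. The function name `build_inverted_index`, the 0-based numbering of DIDs and lines, and the in-memory list-of-lines document format are assumptions made for illustration, not part of the problem statement.

```python
from collections import defaultdict
import re

def build_inverted_index(documents):
    """Map each term to its bucket: a list of (DID, line_number) entries.

    `documents` is a list of documents, each given as a list of text lines.
    DIDs and line numbers are 0-based ordinals here (an assumption).
    """
    index = defaultdict(list)
    for did, lines in enumerate(documents):
        for line_no, line in enumerate(lines):
            # A set ensures a term repeated on one line yields one entry.
            for term in set(re.findall(r"[a-z]+", line.lower())):
                index[term].append((did, line_no))
    return index

docs = [
    ["I am very very very happy!", "What about you?"],
    ["Books might be the only way", "to be happy"],
]
index = build_inverted_index(docs)
print(index["happy"])  # [(0, 0), (1, 1)]
print(index["very"])   # [(0, 0)]
```

Storing the line number alongside the DID is what later allows the engine to print only the matching lines of a document.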
Let's take Figure-1 as an example, which describes the general structure. Assume that we only have three documents to handle, shown in the right part of Figure-1. First we need to tokenize the text into words (blanks, punctuation marks and other non-alphabetic characters are used to separate words) and construct our vocabulary from the terms occurring in the documents. For simplicity, we don't need to consider phrases; a term is always a single word. Furthermore, the terms are case-insensitive (e.g. we consider "book" and "Book" to be the same term), we don't consider morphological variants (e.g. we consider "books" and "book", "protected" and "protect" to be different terms), and hyphenated words are split (e.g. "middle-class" is not a single term, but is separated into the 2 terms "middle" and "class" by the hyphen). The vocabulary is shown in the left part of Figure-1. Each term of the vocabulary has a pointer to its bucket. The collection of buckets is shown in the middle part of Figure-1. Each item in a bucket records the DID of the term's occurrence.
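The tokenization rules above can be sketched in a few lines of Python. The helper name `tokenize` is hypothetical; the sketch only assumes that every non-alphabetic character acts as a separator and that terms are lowercased.

```python
import re

def tokenize(line):
    """Split a line into lowercase terms on any non-alphabetic character."""
    return [word.lower() for word in re.split(r"[^A-Za-z]+", line) if word]

print(tokenize("a middle-class family and could hardly"))
# ['a', 'middle', 'class', 'family', 'and', 'could', 'hardly']
print(tokenize("intend to read his books. She might"))
# ['intend', 'to', 'read', 'his', 'books', 'she', 'might']
```

No stemming is applied, so "books" and "book" remain distinct terms, as the problem requires.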
After constructing the whole inverted index structure, we can apply it to the queries. Each query takes one of the following formats:
term
term AND term
term OR term
NOT term
Terms can be combined with the Boolean operators AND, OR and NOT: "term1 AND term2" queries the documents that include both term1 and term2; "term1 OR term2" queries the documents that include term1 or term2; "NOT term1" queries the documents that do not include term1. Terms are single words as defined above. You are guaranteed that no non-alphabetic characters appear in a term, and that all terms are in lowercase. Furthermore, meaningless stop words (common words such as articles, prepositions, and adverbs, specified to be "the, a, to, and, or, not" in our problem) will not appear in the queries.
For each query, the engine, based on the constructed inverted index, searches for the terms in the vocabulary, compares the terms' bucket information, and then gives the result to the user. Now can you construct the engine?
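One way to read the four query formats is as set operations on document identifiers. The sketch below assumes a hypothetical `postings` dictionary mapping each term to the set of DIDs that contain it, and a set `all_docs` holding every DID; neither name comes from the problem statement.

```python
def evaluate(query, postings, all_docs):
    """Return the set of DIDs matching one query.

    `postings` maps a term to the set of DIDs containing it and
    `all_docs` is the set of every DID (both are hypothetical names).
    """
    words = query.split()
    if len(words) == 1:                      # term
        return postings.get(words[0], set())
    if words[0] == "NOT":                    # NOT term
        return all_docs - postings.get(words[1], set())
    left = postings.get(words[0], set())
    right = postings.get(words[2], set())
    if words[1] == "AND":                    # term AND term
        return left & right
    return left | right                      # term OR term

postings = {"books": {1}, "computer": {1, 2}, "protected": {2}, "security": {0, 2}}
all_docs = {0, 1, 2, 3}
print(sorted(evaluate("books AND computer", postings, all_docs)))  # [1]
print(sorted(evaluate("books OR protected", postings, all_docs)))  # [1, 2]
print(sorted(evaluate("NOT security", postings, all_docs)))        # [1, 3]
print(sorted(evaluate("slick", postings, all_docs)))               # []
```

Because stop words such as "and", "or" and "not" never appear as query terms, the uppercase operator can be identified by its position alone.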

Input

The input starts with an integer N (0 < N < 100), the number of documents provided. The next N sections are the N documents. Each section contains the document content and ends with a single line of ten asterisks.

You may assume that each line contains no more than 80 characters and the total number of lines in the N documents will not exceed 1500.
Next, an integer M (0 < M <= 50000) gives the number of queries, followed by M lines with one query per line. All the queries follow the formats described above.
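Reading this input format could look like the following Python sketch. The function name `read_input` and the use of `io.StringIO` as a stand-in for standard input are assumptions for illustration; a real submission would read from `sys.stdin`.

```python
import io

def read_input(stream):
    """Read N documents (each ended by a line of ten asterisks) and M queries."""
    n = int(stream.readline())
    documents = []
    for _ in range(n):
        lines = []
        while True:
            raw = stream.readline()
            # Stop at the terminator line, or at end of input as a safety net.
            if raw == "" or raw.rstrip("\r\n") == "*" * 10:
                break
            lines.append(raw.rstrip("\r\n"))
        documents.append(lines)
    m = int(stream.readline())
    queries = [stream.readline().strip() for _ in range(m)]
    return documents, queries

sample = io.StringIO(
    "2\n"
    "I am very very very happy!\n"
    "What about you?\n"
    "**********\n"
    "could graduate\n"
    "**********\n"
    "2\n"
    "very\n"
    "NOT happy\n"
)
documents, queries = read_input(sample)
print(len(documents), queries)  # 2 ['very', 'NOT happy']
```

Document lines are kept verbatim (only the line ending is stripped) because the output must reproduce them exactly.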
Output

For each query, you need to find the documents satisfying the query, and output just the lines within those documents that include the search term (for a NOT query, you need to output the whole document). You should print the lines in the same order as they appear in the input. Separate different documents with a single line of 10 dashes.

If no documents matching the query are found, just output a single line: "Sorry, I found nothing."

The output of each query ends with a single line of 10 equal signs.
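The output rules can be sketched as a small Python formatting helper. The name `format_result` and the shape of its argument (one list of lines per matching document) are hypothetical choices for this example.

```python
def format_result(matched_docs):
    """Render one query's output.

    `matched_docs` holds one entry per matching document, each entry
    being the list of lines to print for it (a hypothetical shape).
    """
    out = []
    if not matched_docs:
        out.append("Sorry, I found nothing.")
    for i, lines in enumerate(matched_docs):
        if i > 0:
            out.append("-" * 10)   # separator between documents
        out.extend(lines)
    out.append("=" * 10)           # terminator after every query
    return "\n".join(out)

print(format_result([["want the computer only to write her"],
                     ["computer system) is essential to the"]]))
print(format_result([]))
```

The dashes appear only between documents, never before the first or after the last one, while the line of equal signs closes every query, including those with no match.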

Sample Input

4
A manufacturer, importer, or seller of
digital media devices may not (1) sell,
or offer for sale, in interstate commerce,
or (2) cause to be transported in, or in a
manner affecting, interstate commerce,
a digital media device unless the device
includes and utilizes standard security
technologies that adhere to the security
system standards.
**********
Of course, Lisa did not necessarily
intend to read his books. She might
want the computer only to write her
midterm. But Dan knew she came from
a middle-class family and could hardly
afford the tuition, let alone her reading
fees. Books might be the only way she
could graduate
**********
Research in analysis (i.e., the evaluation
of the strengths and weaknesses of
computer system) is essential to the
development of effective security, both
for works protected by copyright law
and for information in general. Such
research can progress only through the
open publication and exchange of
complete scientific results
**********
I am very very very happy!
What about you?
**********
6
computer
books AND computer
books OR protected
NOT security
very
slick
Sample Output

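want the computer only to write her
----------
computer system) is essential to the
==========
intend to read his books. She might
want the computer only to write her
fees. Books might be the only way she
==========
intend to read his books. She might
fees. Books might be the only way she
----------
for works protected by copyright law
==========
Of course, Lisa did not necessarily
intend to read his books. She might
want the computer only to write her
midterm. But Dan knew she came from
a middle-class family and could hardly
afford the tuition, let alone her reading
fees. Books might be the only way she
could graduate
----------
I am very very very happy!
What about you?
==========
I am very very very happy!
==========
Sorry, I found nothing.
==========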

2 answers

threenewbee 2016-11-30 04:20
http://www.bubuko.com/infodetail-1570468.html
