Searching the Web

Description

The word "search engine" may not be strange to you. Generally speaking, a search engine searches the web pages available in the Internet, extracts and organizes the information and responds to users' queries with the most relevant pages. World famous search engines, like GOOGLE, have become very important tools for us to use when we visit the web. Such conversations are now common in our daily life:
"What does the word like ****** mean?"
"Um... I am not sure, just google it."

In this problem, you are required to construct a small search engine. Sounds impossible, does it? Don't worry, here is a tutorial teaching you how to organize large collection of texts efficiently and respond to queries quickly step by step. You don't need to worry about the fetching process of web pages, all the web pages are provided to you in text format as the input data. Besides, a lot of queries are also provided to validate your system.
Modern search engines use a technique called inversion for dealing with very large sets of documents. The method relies on the construction of a data structure, called an inverted index,which associates terms (words) to their occurrences in the collection of documents. The set of terms of interest is called the vocabulary, denoted as V. In its simplest form, an inverted index is a dictionary where each search key is a term ω∈V. The associated value b(ω) is a pointer to an additional intermediate data structure, called a bucket. The bucket associated with a certain term ω is essentially a list of pointers marking all the occurrences of ω in the text collection. Each entry in each bucket simply consists of the document identifier (DID), the ordinal number of the document within the collection and the ordinal line number of the term's occurrence within the document.
Let's take Figure-1 for an example, which describes the general structure. Assuming that we only have three documents to handle, shown at the right part in Figure-1; first we need to tokenize the text for words (blank, punctuations and other non-alphabetic characters are used to separate words) and construct our vocabulary from terms occurring in the documents. For simplicity, we don't need to consider any phrases, only a single word as a term. Furthermore, the terms are case-insensitive (e.g. we consider "book" and "Book" to be the same term) and we don't consider any morphological variants (e.g. we consider "books" and "book", "protected" and "protect" to be different terms) and hyphenated words (e.g. "middle-class" is not a single term, but separated into 2 terms "middle" and "class" by the hyphen). The vocabulary is shown at the left part in Figure-1.Each term of the vocabulary has a pointer to its bucket. The collection of the buckets is shown at the middle part in Figure-1. Each item in a bucket records the DID of the term's occurrence.
After constructing the whole inverted index structure, we may apply it to the queries. The query is in any of the following formats:
term
term AND term
term OR term
NOT term
A single term can be combined by Boolean operators: AND, OR and NOT ("term1 AND term2" means to query the documents including term1 and term2; "term1 OR term2" means to query the documents including term1 or term2; "NOT term1" means to query the documents not including term1). Terms are single words as defined above. You are guaranteed that no non-alphabetic characters appear in a term, and all the terms are in lowercase. Furthermore, some meaningless stop words (common words such as articles, prepositions, and adverbs, specified to be "the, a, to, and, or, not" in our problem) will not appear in the query, either.
For each query, the engine based on the constructed inverted index searches the term in the vocabulary, compares the terms' bucket information, and then gives the result to user. Now can you construct the engine?

Input

The input starts with integer N (0 < N < 100) representing N documents provided. Then the next N sections are N documents. Each section contains the document content and ends with a single line of ten asterisks.


You may assume that each line contains no more than 80 characters and the total number of lines in the N documents will not exceed 1500.
Next, integer M (0 < M <= 50000) is given representing the number of queries, followed by M lines, each query in one line. All the queries correspond to the format described above.
Output

For each query, you need to find the document satisfying the query, and output just the lines within the documents that include the search term (For a NOT query, you need to output the whole document). You should print the lines in the same order as they appear in the input. Separate different documents with a single line of 10 dashes.

If no documents matching the query are found, just output a single line: "Sorry, I found nothing."

The output of each query ends with a single line of 10 equal signs.

Sample Input

4
A manufacturer, importer, or seller of
digital media devices may not (1) sell,
or offer for sale, in interstate commerce,
or (2) cause to be transported in, or in a
manner affecting, interstate commerce,
a digital media device unless the device
includes and utilizes standard security
technologies that adhere to the security
system standards.


Of course, Lisa did not necessarily
intend to read his books. She might
want the computer only to write her
midterm. But Dan knew she came from
a middle-class family and could hardly
afford the tuition, let alone her reading
fees. Books might be the only way she
could graduate


Research in analysis (i.e., the evaluation
of the strengths and weaknesses of
computer system) is essential to the
development of effective security, both
for works protected by copyright law
and for information in general. Such
research can progress only through the
open publication and exchange of
complete scientific results


I am very very very happy!
What about you?


6
computer
books AND computer
books OR protected
NOT security
very
slick
Sample Output

want the computer only to write her

computer system) is essential to the

intend to read his books. She might
want the computer only to write her

fees. Books might be the only way she

intend to read his books. She might

fees. Books might be the only way she

for works protected by copyright law

Of course, Lisa did not necessarily
intend to read his books. She might
want the computer only to write her
midterm. But Dan knew she came from
a middle-class family and could hardly
afford the tuition, let alone her reading
fees. Books might be the only way she

could graduate

I am very very very happy!

What about you?

I am very very very happy!

Sorry, I found nothing.

2个回答

Csdn user default icon
上传中...
上传图片
插入图片
抄袭、复制答案,以达到刷声望分或其他目的的行为,在CSDN问答是严格禁止的,一经发现立刻封号。是时候展现真正的技术了!
其他相关推荐
Searching the String
-
Searching Treasure in Deep Sea 怎么编写的
-
Searching Quickly
-
Matrix Searching
-
关于Repository 怎么实现
-
Outernet 网络的算法问题
-
Bomberman - Just Search!怎么解答的呢
-
UITextView中粘贴长文本
-
Knights of the Round Table
-
The Lost House
-
字符串快速查找算法的问题,怎么使用 C 语言的程序的编写的办法怎么来查找的?
-
关于celery启动任务时报错Thread 'ResultHandler' crashed: ValueError('invalid file descriptor 13',)
-
Tree Insertions
-
Finding a Word in a Text
-
学会了这些技术,你离BAT大厂不远了
每一个程序员都有一个梦想,梦想着能够进入阿里、腾讯、字节跳动、百度等一线互联网公司,由于身边的环境等原因,不知道 BAT 等一线互联网公司使用哪些技术?或者该如何去学习这些技术?或者我该去哪些获取这些技术资料?没关系,平头哥一站式服务,上面统统不是问题。平头哥整理了 BAT 等一线大厂的必备技能,并且帮你准备了对应的资料。对于整理出来的技术,如果你掌握的不牢固,那就赶快巩固,如果你还没有涉及,现在...
记一道字节跳动的算法面试题
点击蓝色“五分钟学算法”关注我哟加个“星标”,天天中午 12:15,一起学算法作者 | 帅地来源公众号 | 苦逼的码农前几天有个朋友去面试字节跳动,面试官问了他一道链表相...
程序员真是太太太太太有趣了!!!
网络上虽然已经有了很多关于程序员的话题,但大部分人对这个群体还是很陌生。我们在谈论程序员的时候,究竟该聊些什么呢?各位程序员大佬们,请让我听到你们的声音!不管你是前端开发...
史上最详细的IDEA优雅整合Maven+SSM框架(详细思路+附带源码)
网上很多整合SSM博客文章并不能让初探ssm的同学思路完全的清晰,可以试着关掉整合教程,摇两下头骨,哈一大口气,就在万事具备的时候,开整,这个时候你可能思路全无 ~中招了咩~ ,还有一些同学依旧在使用eclipse或者Myeclipse开发,我想对这些朋友说IDEA 的编译速度很快,人生苦短,来不及解释了,直接上手idea吧。这篇文章每一步搭建过程都测试过了,应该不会有什么差错。本文章还有个比较优秀的特点,就是idea的使用,基本上关于idea的操作都算是比较详细的,所以不用太担心不会撸idea!最后,本文
Python爬取淘宝商品信息
各位同学们,好久没写原创技术文章了,最近有些忙,所以进度很慢。 警告:本教程仅用作学习交流,请勿用作商业盈利,违者后果自负!如本文有侵犯任何组织集团公司的隐私或利益,请告知联系猪哥删除!!! 一、淘宝登录复习 前面我们已经介绍过了如何使用requests库登录淘宝,收到了很多同学的反馈和提问,猪哥感到很欣慰,同时对那些没有及时回复的同学说声抱歉! 顺便再提一下这个登录功能,代码是完全没有问题。...
全球最厉害的 14 位程序员!
来源 | ITWorld 整理自网络全球最厉害的 14 位程序员是谁?今天就让我们一起来了解一下吧,排名不分先后。01. Jon Skeet个人名望:程序技术问答网站 S...
从入门到精通,Java学习路线导航
引言 最近也有很多人来向我"请教",他们大都是一些刚入门的新手,还不了解这个行业,也不知道从何学起,开始的时候非常迷茫,实在是每天回复很多人也很麻烦,所以在这里统一作个回复吧。 Java学习路线 当然,这里我只是说Java学习路线,因为自己就是学Java的,对Java理当很熟悉,对于其它方面,我也不是很了解。 基础阶段 首先是基础阶段,在基础阶段,我们必须掌握Java基础,Mysql数据库,Ora...
我花了一夜用数据结构给女朋友写个H5走迷宫游戏
起因 又到深夜了,我按照以往在csdn和公众号写着数据结构!这占用了我大量的时间!我的超越妹妹严重缺乏陪伴而 怨气满满! 而女朋友时常埋怨,认为数据结构这么抽象难懂的东西没啥作用,常会问道:天天写这玩意,有啥作用。而我答道:能干事情多了,比如写个迷宫小游戏啥的! 当我码完字准备睡觉时:写不好别睡觉! 分析 如果用数据结构与算法造出东西来呢? ...
盘点那些被AI换脸、一键“脱”衣所滥用的AI模型
上周作者发布了一篇有关AI换脸的教程,不过令笔者始料未及的是一石激起千层浪,竟然有不少网友留言求所谓一键“脱”衣的教程。 虽然笔者对于技术的滥用深恶痛绝,但技术本身是中性的,并无好坏之分,从我上篇博文中也能看到“AI换脸”的门槛越来越低,目前其应用已经发展到几乎是随便什么人有个教程就能操作的地步了,所以想阻止这些滥用的技术,单靠封杀是不起了什么作用的,所以本文就回归...
五分钟小知识:为什么说 ++i 的效率比 i++ 高?
点击蓝色“五分钟学算法”关注我哟加个“星标”,天天中午 12:15,一起学算法作者 | 守望先生来源 | 编程珠玑前言不知道你是否听说过 ++i 比 i++ 快的说法,真...
接班马云的为何是张勇?
上海人、职业经理人、CFO 背景,集齐马云三大不喜欢的张勇怎么就成了阿里接班人? 作者|王琳 本文经授权转载自燃财经(ID:rancaijing) 9月10日,张勇转正了,他由阿里巴巴董事局候任主席正式成为阿里巴巴董事局主席,这也意味着阿里巴巴将正式开启“逍遥子时代”。 从2015年接任CEO开始,张勇已经将阿里巴巴股价拉升了超过200%。但和马云强大的个人光环比,张勇显得尤其...
什么是大公司病(太形象了)
点击蓝色“五分钟学算法”关注我哟加个“星标”,天天中午 12:15,一起学算法作者 | 南之鱼来源 | 芝麻观点(chinamkt)所谓大企业病,一般都具有机构臃肿、多重...
让程序员崩溃的瞬间(非程序员勿入)
今天给大家带来点快乐,程序员才能看懂。 来源:https://zhuanlan.zhihu.com/p/47066521 1. 公司实习生找 Bug 2.在调试时,将断点设置在错误的位置 3.当我有一个很棒的调试想法时 4.偶然间看到自己多年前写的代码 5.当我第一次启动我的单元测试时 ...
工厂模式,从第三方登录说起
现在的很多平台在登陆的时候,下面都会有一排选项,可以选择微信、QQ、微博账号等登陆,这些账号对平台来说都是第三方账号。第三方账号登陆是最近几年流行起来的,第三方账号登录一般都是基于OAuth2.0协议开发的。如果你不了解OAuth2.0协议,可以自行百度,也许会对你看这篇文章有所帮助。 现在由于公司要给平台引入流量,为了降低注册门槛,让更多的人来使用你们的平台,领导决定在你们的平台上接入第三方账号...
如何在Windows中开启"上帝模式"
原文链接 : https://mp.weixin.qq.com/s?__biz=MzIwMjE1MjMyMw==&amp;mid=2650202982&amp;idx=1&amp;sn=2c6c609ce06db1cee81abf2ba797be1b&amp;chksm=8ee1438ab996ca9c2d0cd0f76426e92faa835beef20ae21b537c0867ec2773be...
什么是“中台”?
“中台”这个概念,越来越多的在各种技术大会上提及,各大技术公司,纷纷推出自己的“中台”方案,究竟什么是“中台”?他和“前台”、“后台”有何区别?《》,这是我的朋友、前同事...
为什么面向对象糟透了?
又是周末,编程语言“三巨头”Java, Lisp 和C语言在Hello World咖啡馆聚会。服务员送来咖啡的同时还带来了一张今天的报纸, 三人寒暄了几句, C语言翻开了...
分享靠写代码赚钱的一些门路
作者 mezod,译者 josephchang10如今,通过自己的代码去赚钱变得越来越简单,不过对很多人来说依然还是很难,因为他们不知道有哪些门路。今天给大家分享一个精彩...
失业42天,我废了
作者:子彧师兄https://www.jianshu.com/p/62590c1339f12019.6.5这天下午,公司以资金困难,亏损较大为理由将我们整个技术部裁掉,我...
技术人员要拿百万年薪,必须要经历这9个段位
很多人都问,技术人员如何成长,每个阶段又是怎样的,如何才能走出当前的迷茫,实现自我的突破。所以我结合我自己10多年的从业经验,总结了技术人员成长的9个段位,希望对大家的职...
顶级产品经理是如何利用王者荣耀,3步毁掉你的自律。
【老王提示】:本文共 2384 字数,预计阅读时间为 8 Minute。 前言 当今时代,王者荣耀可谓无人不知无人不晓,该产品为其行业巨头,而其产品使用者年龄小则十几岁,大则近百岁。 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 这个现象极为恐怖,甚至有些心酸,本是正处于青春阳光成长中的孩子,应该围绕着学习才对,而不是花费大量时间在娱乐上。不仅是小学生,只要处于...
nginx学习,看这一篇就够了:下载、安装。使用:正向代理、反向代理、负载均衡。常用命令和配置文件
文章目录前言一、nginx简介1. 什么是 nginx 和可以做什么事情2.Nginx 作为 web 服务器3. 正向代理4. 反向代理5. 动静分离6.动静分离二、Nginx 的安装三、 Nginx 的常用命令和配置文件四、 Nginx 配置实例 1 反向代理五、 Nginx 配置实例 2 负载均衡六、 Nginx 配置实例 3 动静分离七、 Nginx 的高可用集群 前言 一、nginx简介...
动画:用动画给面试官解释 TCP 三次握手过程
作者 | 小鹿 来源 | 公众号:小鹿动画学编程 写在前边 TCP 三次握手过程对于面试是必考的一个,所以不但要掌握 TCP 整个握手的过程,其中有些小细节也更受到面试官的青睐。 对于这部分掌握以及 TCP 的四次挥手,小鹿将会以动画的形式呈现给每个人,这样将复杂的知识简单化,理解起来也容易了很多,尤其对于一个初学者来说。 学习导图 一、TCP 是什么? TCP(Transmissio...
JAVA实现商品信息管理系统
任务与实现 超市商品管理系统 题目要求 超市中商品分为四类,分别是食品、化妆品、日用品和饮料。每种商品都包含商品名称、价格、库存量和生产厂家、品牌等信息。 主要完成对商品的销售、统计和简单管理。 这个题目相对简单,可以用一张表实现信息的保存和处理,因此不再给出数据库设计参考。 功能要求 (1)销售功能。购买商品时,先输入类别,然后输入商品名称,并在库存中查找该商品的相关信息。如果有库存量,输入购买...
500行代码,教你用python写个微信飞机大战
这几天在重温微信小游戏的飞机大战,玩着玩着就在思考人生了,这飞机大战怎么就可以做的那么好,操作简单,简单上手。 帮助蹲厕族、YP族、饭圈女孩在无聊之余可以有一样东西让他们振作起来!让他们的左手 / 右手有节奏有韵律的朝着同一个方向来回移动起来! 这是史诗级的发明,是浓墨重彩的一笔,是…… 在一阵抽搐后,我结束了游戏,瞬时觉得一切都索然无味,正在我进入贤者模式时,突然想到,如果我可以让更多人已不同的方式体会到这种美轮美奂的感觉岂不美哉? 所以我打开电脑,创建了一个 `plan_game.py`……
2019诺贝尔经济学奖得主:贫穷的本质是什么?
2019年诺贝尔经济学奖,颁给了来自麻省理工学院的 阿巴希·巴纳吉(Abhijit Vinayak Banerjee)、艾丝特·杜芙若(Esther Duflo)夫妇和哈...
相关热词 c# 增加元素 c#控制台简单加法 c# 服务端框架 c# 判断事件是否注册 c#中is和has c# udp 连接超时 c#词典 c#实现排列组合 c# oss 上传 c#判断输入的是否为ip