dougong7850 2015-06-14 05:29

MongoDB document count fluctuating

I am facing an issue with the document count in a collection being slightly erratic.

Here is my workflow:

Crawling is done first with Scrapy. Scraped items are sent through a pipeline and prepared for writing to the collection using the pymongo library.

Next, I check whether the item already exists (using a key). If it does, the new item inherits the existing _id and I use db.collection.save() to achieve an upsert. Before writing, a check ensures that all fields exist.

If the item does not exist, a new document is created in the collection.
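For reference, the check-then-save steps above can also be written as a single upsert call. This is only a sketch of that alternative, not the code I am actually running: it assumes pymongo 3.0 or later (where `update_one` is available), and the key name `post_id` is a hypothetical placeholder for whatever key is used to detect an existing item.

```python
def build_upsert(item, key_field):
    """Split a scraped item into a filter on the key and a $set update.

    `key_field` is a hypothetical name for the field that identifies an item.
    """
    if key_field not in item:
        raise KeyError("item has no key field %r" % key_field)
    query = {key_field: item[key_field]}
    # _id is left out so an existing document keeps its own _id.
    fields = {k: v for k, v in item.items() if k not in (key_field, "_id")}
    if not fields:
        raise ValueError("item has no fields to write besides the key")
    return query, {"$set": fields}


def upsert_item(collection, item, key_field="post_id"):
    """Insert the item if no document matches the key, else update it in place.

    `collection` is expected to be a pymongo Collection (or anything that
    exposes a compatible update_one method).
    """
    query, update = build_upsert(item, key_field)
    return collection.update_one(query, update, upsert=True)
```

With this shape the existence check and the write happen in one server-side operation, so the pipeline does not need to look up the _id first.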

Lastly, a frontend PHP webpage allows users to search for documents in the collection using the PHP MongoDB driver.

Issue

I started noticing on the webpage that some new documents would appear after one crawl, suddenly disappear from view, and then mysteriously appear again after the next crawl. So I went into the mongo shell and found that a specific query would return a fluctuating number of results if sent repeatedly. For example, the count would go up by one, then down by two, and then settle back to a stable number.
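To make the fluctuation easier to reproduce, the repeated query can be scripted. The helper below is a rough sketch rather than what I ran in the shell: `count_fn` is a hypothetical callable that returns the current number of matching documents, and the function names are made up for illustration.

```python
import time


def sample_counts(count_fn, samples=10, delay=0.5):
    """Call count_fn repeatedly and collect the returned counts."""
    counts = []
    for i in range(samples):
        counts.append(count_fn())
        if i < samples - 1:
            time.sleep(delay)  # wait between queries, but not after the last one
    return counts


def summarize(counts):
    """Report the spread of the sampled counts."""
    lowest, highest = min(counts), max(counts)
    return {"min": lowest, "max": highest, "stable": lowest == highest}


# Example wiring (assumes a pymongo Collection named `collection` and a
# `query` dict; count_documents requires pymongo 3.7 or later):
#   counts = sample_counts(lambda: collection.count_documents(query))
#   print(summarize(counts))
```

Running this while a crawl is writing, and again while the crawler is idle, should show whether the count only moves during writes.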

The thing I don't get is that at no point in the code do I remove() any documents from the collection. My impression is that db.collection.save() will only result in an equal or increasing number of documents in the collection.

Is there some form of blocking whereby documents being written cannot be queried? Or does it have something to do with my crawling interval?

Notes:

  • No indexing is done on the collection
  • Each crawl+write process takes only about 5-10s and is repeated at 30s intervals.

Code snippet of the query:

    $cursor = $collection->find(array( '$or' => array(
            array('post_content' => new MongoRegex("/$safe/i")),
            array('post_user' => new MongoRegex("/^$safe$/i"))
    )));
    $cursor->sort(array('post_datetime' => -1));