doujuan9698 2014-12-16 09:58

Iterating over a big table in PostgreSQL with Propel

I recently had a task to iterate over a big table (~40 million records) in PostgreSQL using Propel and ran into performance issues with both the memory limit and execution speed. My script had been running for 22(!) hours.

The task was to retrieve records matching certain criteria (not active for the last 6 months) and archive them, i.e. move them, together with all related entities from other tables, to another table.

The primary table my script works on has several columns: id, device_id, application_id, last_init_date, and a few others that have no significance here. This table contains information about applications installed on devices and their last activity dates. There may be several records with the same device_id and different application_ids. Here is a sample from the table:

     id    | device_id | application_id |   last_init_date
 ----------+-----------+----------------+---------------------
         1 |         1 |              1 | 2013-09-24 17:09:01
         2 |         1 |              2 | 2013-09-19 20:36:23
         3 |         1 |              3 | 2014-02-11 00:00:00
         4 |         2 |              4 | 2013-09-29 20:12:54
         5 |         3 |              5 | 2013-08-31 19:41:05

So a device is considered old enough to be archived if the maximum last_init_date for its device_id in this table is older than 6 months. Here is the query:

SELECT device_id
FROM device_applications
GROUP BY device_id
HAVING MAX(last_init_date) < '2014-06-16 08:00:00'

In Propel it looks like:

\DeviceApplicationsQuery::create()
  ->select('DeviceId')
  ->groupByDeviceId()
  ->having('MAX(device_applications.LAST_INIT_DATE) < ?', $date->format('Y-m-d H:i:s'))
  ->find();

The resulting set, as you can guess, is too big to fit in memory, so I have to split it into chunks somehow.

The question is: what is the best strategy to choose in this situation to decrease memory consumption and speed up the script? In my answer I'll show what I've found so far.


2 answers

  • dongpo2340 2014-12-16 09:58

    I know three strategies for traversing a big table.

    1. Good old limit/offset

    The problem with this approach is that the database actually examines the records that you want to skip with OFFSET. Here is a quote from the PostgreSQL documentation:

    The rows skipped by an OFFSET clause still have to be computed inside the server; therefore a large OFFSET might be inefficient.

    Here is a simple example (not my initial query):

    explain (analyze)
    SELECT *
    FROM device_applications
    ORDER BY device_id
    LIMIT 100
    OFFSET 300;
    

    Execution plan:

    Limit  (cost=37.93..50.57 rows=100 width=264) (actual time=0.630..0.835 rows=100 loops=1)
        ->  Index Scan using device_applications_device_id_application_id_unique on device_applications  (cost=0.00..5315569.97 rows=42043256 width=264) (actual time=0.036..0.806 rows=400 loops=1)
    Total runtime: 0.873 ms
    

    Pay special attention to the actual values in the Index Scan node: PostgreSQL worked through 400 records, which is the offset (300) plus the limit (100). So this approach is quite inefficient, especially considering the complexity of the initial query.
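    For comparison, chunked iteration with limit/offset in Propel might look roughly like this (a sketch; archiveDevice() is a hypothetical helper standing in for the actual archiving work):

    $chunkSize = 1000;
    $offset = 0;

    do {
        // On each iteration the server still has to compute all the skipped rows
        $deviceIds = \DeviceApplicationsQuery::create()
          ->select('DeviceId')
          ->groupByDeviceId()
          ->having('MAX(device_applications.LAST_INIT_DATE) < ?', $date->format('Y-m-d H:i:s'))
          ->orderByDeviceId()
          ->offset($offset)
          ->limit($chunkSize)
          ->find();

        foreach ($deviceIds as $deviceId) {
            archiveDevice($deviceId); // hypothetical archiving helper
        }

        $offset += $chunkSize;
    } while (count($deviceIds) > 0);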

    2. Ranging by some column

    We can avoid the limitation of the limit/offset approach by making the query work on ranges of the table, produced by slicing the table along some column.

    To clarify, let's imagine you have a table with 100 records; you can divide it into five ranges of 20 records each: 0 - 20, 20 - 40, 40 - 60, 60 - 80, 80 - 100, and then work with the smaller subsets. In my case the column we can range by is device_id. The query looks like this:

    SELECT device_id
    FROM device_applications
    WHERE device_id >= 1 AND device_id < 1000
    GROUP BY device_id
    HAVING MAX(last_init_date) < '2014-06-16 08:00:00';
    

    It takes a range of device_ids, groups the records, and applies the condition on last_init_date. Of course, many ranges may (and in most cases will) contain no records matching the condition at all. So the problem with this approach is that you have to scan the whole table, even if the records you want to find amount to just 5% of all records.
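    A sketch of that range loop in Propel (it assumes device_id values are reasonably dense integers; archiveDevice() is again a hypothetical helper):

    $rangeSize = 1000;

    // The maximum device_id bounds the loop
    $maxDeviceId = \DeviceApplicationsQuery::create()
      ->select('DeviceId')
      ->orderByDeviceId(\Criteria::DESC)
      ->findOne();

    for ($start = 1; $start <= $maxDeviceId; $start += $rangeSize) {
        // Only this slice of device_ids is grouped and filtered
        $deviceIds = \DeviceApplicationsQuery::create()
          ->select('DeviceId')
          ->filterByDeviceId(array('min' => $start, 'max' => $start + $rangeSize - 1))
          ->groupByDeviceId()
          ->having('MAX(device_applications.LAST_INIT_DATE) < ?', $date->format('Y-m-d H:i:s'))
          ->find();

        foreach ($deviceIds as $deviceId) {
            archiveDevice($deviceId); // hypothetical archiving helper
        }
    }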

    3. Using cursors

    What we need is a cursor. Cursors allow you to iterate over a result set without fetching all the data at once. In PHP you make use of a cursor when you iterate over a PDOStatement. A simple example:

    $stmt = $dbh->prepare("SELECT * FROM table");
    $stmt->execute();
    
    // Iterate over statement using a cursor
    foreach ($stmt as $row) {
        // Do something
    }
    

    In Propel you can take advantage of this PDO feature via the PropelOnDemandFormatter class. So, the final code:

    $devApps = \DeviceApplicationsQuery::create()
      ->setFormatter('\PropelOnDemandFormatter')
      ->select('DeviceId')
      ->groupByDeviceId()
      ->having('MAX(device_applications.LAST_INIT_DATE) < ?', $date->format('Y-m-d H:i:s'))
      ->find();
    
    /** @var \DeviceApplications $devApp */
    foreach ($devApps as $devApp) {
        // Do something
    }
    

    Here the call to find() does not fetch all the data; instead it creates a collection that hydrates objects on demand, one row at a time, which keeps memory consumption flat.
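    To tie this back to the original task, here is a minimal sketch of the hypothetical archiveDevice() helper used in the snippets above. It assumes an ArchivedDeviceApplications model with columns matching device_applications, which is not part of the original schema:

    function archiveDevice($deviceId)
    {
        $con = \Propel::getConnection();
        $con->beginTransaction();
        try {
            $rows = \DeviceApplicationsQuery::create()
              ->filterByDeviceId($deviceId)
              ->find($con);

            foreach ($rows as $row) {
                // Copy the record into the archive table...
                $archived = new \ArchivedDeviceApplications();
                $archived->fromArray($row->toArray());
                $archived->save($con);
                // ...then remove the original
                $row->delete($con);
            }
            $con->commit();
        } catch (\Exception $e) {
            $con->rollBack();
            throw $e;
        }
    }

    Wrapping each device in its own small transaction keeps the lock footprint modest while the script runs for hours.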

