Elasticsearch PHP bulk index performance vs. single index

I ran a benchmark on Elasticsearch using elasticsearch-php. I compared the time taken to index 10,000 documents one by one against indexing 10,000 documents in bulk requests of 1,000 documents each.

On my VPN server (3 cores, 2 GB of memory), performance is about the same with or without bulk indexing.

My PHP code (inspired by a post):

<?php
set_time_limit(0);  //  no timeout
require 'vendor/autoload.php';
$es = new Elasticsearch\Client([
    'hosts'=>['127.0.0.1:9200']
]);
$max = 10000;

// ELASTICSEARCH BULK INDEX
$temps_debut = microtime(true);
for ($i = 0; $i <=  $max; $i++) {
    $params['body'][] = array(
        'index' => array(
            '_index' => 'articles',
            '_type' => 'article',
            '_id' => 'cle' . $i
        )
    );
    $params['body'][] = array(
        'my_field' => 'my_value' . $i
    );
    if ($i % 1000) {   // Every 1000 documents stop and send the bulk request
        $responses = $es->bulk($params);
        $params = array();  // erase the old bulk request    
        unset($responses); // unset  to save memory
    }
}
$temps_fin = microtime(true);
echo 'Elasticsearch bulk: ' . round($i / round($temps_fin - $temps_debut, 4)) . ' per sec <br>';

// ELASTICSEARCH WITHOUT BULK INDEX
$temps_debut = microtime(true);
for ($i = 1; $i <= $max; $i++) {
    $params = array();
    $params['index'] = 'my_index';
    $params['type']  = 'my_type';
    $params['id']    = "key".$i;
    $params['body']  = array('testField' => 'valeur'.$i);
    $ret = $es->index($params);
}
$temps_fin = microtime(true);
echo 'Elasticsearch One by one : ' . round($i / round($temps_fin - $temps_debut, 4)) . 'per sec <br>';
?>

Elasticsearch bulk: 1209 per sec
Elasticsearch One by one : 1197per sec

Is there something wrong with my bulk indexing that keeps it from performing better?

Thanks

1 answer

dsadsadsa1231 2015-05-31 21:21
Replace:

if ($i % 1000) { // Every 1000 documents stop and send the bulk request

with:

if (($i + 1) % 1000 === 0) { // Every 1000 documents stop and send the bulk request

Otherwise the condition is true for every non-zero remainder, so you send a bulk request on 999 out of every 1,000 iterations. Note that the corrected condition only flushes every document if $max is a multiple of 1000.
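To see the size of the effect, here is a small sketch that only counts how often each condition would trigger a bulk request, using the loop bounds from the question. It sends nothing to Elasticsearch; the counter names $buggy and $fixed are made up for this illustration.

```php
<?php
$max = 10000;
$buggy = 0;  // requests triggered by the original condition
$fixed = 0;  // requests triggered by the corrected condition

for ($i = 0; $i <= $max; $i++) {
    if ($i % 1000) {                 // truthy for every non-multiple of 1000
        $buggy++;
    }
    if (($i + 1) % 1000 === 0) {     // true once per 1000 documents
        $fixed++;
    }
}

echo "original condition: $buggy bulk requests" . PHP_EOL;   // 9990
echo "corrected condition: $fixed bulk requests" . PHP_EOL;  // 10
```

With the original condition almost every iteration sends its own request, which would explain why the bulk timing is so close to the one-by-one timing.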

Also, correct this bug:

for ($i = 0; $i <= $max; $i++) {

iterates over $max + 1 items. Replace it with:

for ($i = 0; $i < $max; $i++) {

There might also be a problem with how you initialize $params. Shouldn't you set it up outside the loop and clear only $params['body'] after each ->bulk() call? When you reset it with $params = array(); you lose everything in it.

Also, remember that ES may be distributed over a cluster, in which case bulk operations can be spread across nodes to even out the workload. Some of that performance scaling will not be visible on a single physical node.
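Putting the points above together, the batching loop could look roughly like the sketch below. The helper indexInBatches() and its $send callback are hypothetical names introduced here; $send stands in for a call such as $es->bulk($params) from the question, so the batching logic can be run without a cluster. The index, type and field names are copied from the question (the '_type' entry reflects the Elasticsearch version in use there). The final flush for a partial batch is an addition that covers the case where $max is not a multiple of the batch size.

```php
<?php
/**
 * Hypothetical helper: builds bulk bodies and hands each batch to $send.
 * $send stands in for something like $es->bulk($params).
 * Returns the number of requests sent.
 */
function indexInBatches(int $max, int $batchSize, callable $send): int
{
    $params = ['body' => []];   // set up once, outside the loop
    $requests = 0;

    for ($i = 0; $i < $max; $i++) {            // exactly $max documents
        $params['body'][] = [
            'index' => [
                '_index' => 'articles',
                '_type'  => 'article',          // as in the question
                '_id'    => 'cle' . $i,
            ],
        ];
        $params['body'][] = ['my_field' => 'my_value' . $i];

        if (($i + 1) % $batchSize === 0) {     // flush once per full batch
            $send($params);
            $requests++;
            $params['body'] = [];              // clear only the body
        }
    }

    if (!empty($params['body'])) {             // flush any partial last batch
        $send($params);
        $requests++;
    }

    return $requests;
}

// Stub sender: records how many documents each request carried.
$sizes = [];
$requests = indexInBatches(2500, 1000, function (array $params) use (&$sizes) {
    $sizes[] = intdiv(count($params['body']), 2);  // two entries per document
});

echo $requests . ' requests: ' . implode(', ', $sizes) . PHP_EOL;
// prints "3 requests: 1000, 1000, 500"
```

For a real run, the stub would be replaced by a closure that forwards to the client, for example function (array $params) use ($es) { return $es->bulk($params); }.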