doupeng6890 2014-02-18 16:06

Predicting the size needed to store data in shared memory

I'm working with PHP's shm functions (part of the semaphore extension, not to be confused with the shmop functions!) in a project. The shared memory basically serves as a kind of heap: it holds a single array in which I store keys (with meaningless values) as a hashed index, so I can just check "Ah, it's there already". My problem is that this array can get quite big at times, but usually doesn't. I don't want to reserve a huge amount of memory that I don't normally need, but rather resize dynamically.

I have registered an error handler that converts errors into ErrorExceptions, so I can catch the error thrown by shm_put_var when the memory is too small to store the array. Unfortunately, PHP clears the segment when the data doesn't fit, so all other data is lost, too. That rules this approach out.

Because of this, I need a way to predict the size I'll need to store the data. One of the comments on shm_attach at php.net states that PHP adds a header of (PHP_INT_SIZE * 4) + 8 bytes, and that one variable needs strlen(serialize($foo)) + (4 * PHP_INT_SIZE) + 4 bytes (I have simplified the expression given in the comment; it is equal to mine but was blown up unnecessarily).
While the header size seems to be correct (any segment smaller than 24 bytes results in an error at creation, so 24 bytes seems to be the size of the header PHP puts in there), the size given for each variable entry doesn't seem to hold true anymore in recent versions of PHP:
- I could store "1" in a shared memory segment with a size of 24 + strlen(serialize("1")) + (3 * PHP_INT_SIZE) + 4 bytes (note the 3 in there instead of 4),
- I could NOT store "999" in one sized 24 + strlen(serialize("999")) + (4 * PHP_INT_SIZE) + 4 bytes.

Does anyone know a way to predict how much memory is needed to store arbitrary data in shared memory using the shm functions, or have a reference on how shm stores variables? (I read the whole contents using the shmop functions and printed them, but since it's binary data, it can't be reverse-engineered in reasonable time.)

(I will provide code samples as needed; I'm just not sure which parts are relevant. Ping me if you want to see working samples. I have tried a lot, so I have samples ready for most cases.)


[Update] My C is pretty bad, so I don't get far looking at the source (sysvshm.c and php_sysvshm.h), but I already found one issue with the solution suggested at php.net: while I could simplify the complex formula there to what I have included here (it was basically taken from the C source code), that simplification is NOT valid for the original, because the C code uses typecasts and integer math rather than floating point. The formula divides by sizeof(long) and then multiplies by it again, which is useless in PHP but rounds to a multiple of sizeof(long) in C. So I need to correct that in PHP first. Still, this is not everything, as tests showed that I could store some values in even less memory than the formula returns (see above).
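
To make the rounding issue concrete, here is a small C sketch of what the divide-then-multiply idiom does with integer operands. It is not taken from the PHP source, and both function names are made up for illustration. In PHP the `/` operator does not truncate, so the same expression changes nothing there unless the quotient is passed through floor() or ceil().

```c
#include <assert.h>

/* Round n down to a multiple of unit, relying on C integer division.
   Hypothetical helper for illustration; assumes n >= 0 and unit > 0. */
long round_down_to_multiple(long n, long unit)
{
    assert(n >= 0 && unit > 0);
    return (n / unit) * unit;   /* integer division drops the remainder */
}

/* Round n up to a multiple of unit using only integer math. */
long round_up_to_multiple(long n, long unit)
{
    assert(n >= 0 && unit > 0);
    return ((n + unit - 1) / unit) * unit;
}
```

For instance, round_down_to_multiple(23, 4) gives 20, whereas 23 / 4 * 4 evaluated in PHP gives 23 again.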

  • douyan1321 2014-02-20 12:08

    Ok, answering this myself, as I figured it out by now. I still have no sources but my own research, so feel free to comment with any helpful links or answer on your own.

    Most important thing first: a working formula to calculate the size necessary to store data in shared memory using shm_* functions is:

    $header = 24; // actually 4*4 + 8
    $dataLength = (ceil(strlen(serialize($data)) / 4) * 4) + 16; // actually that 16 is 4*4
    

    The header of size $header is stored only once, at the beginning of the memory segment. It is written when the segment is allocated (the first time shm_attach is called with that System V resource key), even if no data is written. Therefore, you can never create a memory segment smaller than 24 bytes.

    If you only want to use this and don't care about the details, just one warning: this is correct as long as PHP is compiled on a system that uses 32 bits for longs in C. If PHP is compiled with 64-bit longs, the header size is most likely 4 * 8 + 8 = 40, and each variable needs (ceil(strlen(serialize($data)) / 8) * 8) + 32. Details are in the explanation below.
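
    The two cases can be folded into one pair of functions that take sizeof(long) as a parameter. This is only a sketch in C of the arithmetic above: the function names are made up, long_size is something you have to know about your PHP build, and the 64-bit figures remain as unverified as stated in the warning.

    ```c
    #include <assert.h>

    /* Size of the segment header: 8 bytes of magic plus four longs.
       long_size is sizeof(long) of the PHP build (4 or 8); the caller
       has to supply it, it is not detected here. */
    long shm_header_size(long long_size)
    {
        return 8 + 4 * long_size;
    }

    /* Predicted size of one stored variable: the serialized length rounded
       up to a multiple of long_size, plus four longs of overhead.
       serialized_len corresponds to strlen(serialize($data)) in PHP. */
    long shm_variable_size(long serialized_len, long long_size)
    {
        assert(serialized_len >= 0 && long_size > 0);
        long aligned = ((serialized_len + long_size - 1) / long_size) * long_size;
        return aligned + 4 * long_size;
    }
    ```

    A segment meant to hold several variables would then need at least the header size plus the sum of the per-variable sizes.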


    So, how did I get there?

    I looked into the PHP source code. I don't know much C, so what I'm describing here is only how I got there; it may be nothing more than a lot of hot air...

    The relevant files are already mentioned in the question (sysvshm.c and php_sysvshm.h). The important parts are:

    From php_sysvshm.h:

    typedef struct {
        long key;
        long length;
        long next;
        char mem;
    } sysvshm_chunk;
    
    typedef struct {
        char magic[8];
        long start;
        long end;
        long free;
        long total;
    } sysvshm_chunk_head;
    

    And from sysvshm.c:

    /* these are lines 166 - 173 in the source code of PHP 5.2.17 (the one I found first),
       line numbers may differ in recent versions */
    
    /* check if shm is already initialized */
    chunk_ptr = (sysvshm_chunk_head *) shm_ptr;
    if (strcmp((char*) &(chunk_ptr->magic), "PHP_SM") != 0) {
        strcpy((char*) &(chunk_ptr->magic), "PHP_SM");
        chunk_ptr->start = sizeof(sysvshm_chunk_head);
        chunk_ptr->end = chunk_ptr->start;
        chunk_ptr->total = shm_size;
        chunk_ptr->free = shm_size-chunk_ptr->end;
    }
    
     /* these are lines 371 - 397, comments as above */
    
     /* {{{ php_put_shm_data
     * inserts an ascii-string into shared memory */
    static int php_put_shm_data(sysvshm_chunk_head *ptr, long key, char *data, long len)
    {
        sysvshm_chunk *shm_var;
        long total_size;
        long shm_varpos;
    
        total_size = ((long) (len + sizeof(sysvshm_chunk) - 1) / sizeof(long)) * sizeof(long) + sizeof(long); /* long alligment */
    
        if ((shm_varpos = php_check_shm_data(ptr, key)) > 0) {
            php_remove_shm_data(ptr, shm_varpos);
        }
    
        if (ptr->free < total_size) {
            return -1; /* not enough memeory */
        }
    
        shm_var = (sysvshm_chunk *) ((char *) ptr + ptr->end);
        shm_var->key = key;
        shm_var->length = len;
        shm_var->next = total_size;
        memcpy(&(shm_var->mem), data, len);
        ptr->end += total_size;
        ptr->free -= total_size;
        return 0;
    }
    /* }}} */
    

    So, lots of code. I'll try to break it down.

    The parts from php_sysvshm.h tell us what size those structures have; we'll need that. I'm assuming each char has 8 bits (which is most likely valid on any system) and each long has 32 bits (which may differ on systems that actually use 64 bits; you have to change the numbers then).

    • The members of sysvshm_chunk take 3*sizeof(long) + sizeof(char), which makes 3*4 + 1 = 13 bytes (not counting any padding the compiler may add).
    • sysvshm_chunk_head has 8*sizeof(char) + 4*sizeof(long), which makes 8*1 + 4*4 = 24 bytes.
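
    These numbers can be checked with a few lines of C. The sketch below mirrors the two structures with int32_t standing in for a 32-bit long, so the result does not depend on the machine it is compiled on; the names chunk32 and chunk_head32 and the helper functions are made up for this example. On common ABIs, where int32_t is 4-byte aligned, the compiler pads the chunk structure, so sizeof reports 16 rather than the 13 bytes its members add up to.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical 32-bit mirror of sysvshm_chunk. */
    typedef struct {
        int32_t key;
        int32_t length;
        int32_t next;
        char mem;
    } chunk32;

    /* Hypothetical 32-bit mirror of sysvshm_chunk_head. */
    typedef struct {
        char magic[8];
        int32_t start;
        int32_t end;
        int32_t free;
        int32_t total;
    } chunk_head32;

    /* Sum of the member sizes, ignoring padding. */
    size_t chunk32_member_sum(void)  { return 3 * sizeof(int32_t) + sizeof(char); }

    /* What the compiler actually reserves, including trailing padding. */
    size_t chunk32_sizeof(void)      { return sizeof(chunk32); }

    /* Offset at which the serialized data starts inside a chunk. */
    size_t chunk32_mem_offset(void)  { return offsetof(chunk32, mem); }

    size_t chunk_head32_sizeof(void) { return sizeof(chunk_head32); }
    ```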

    Now the first part from sysvshm.c is part of the code that gets executed when we call shm_attach in PHP. It initializes the memory segment by writing a header structure (the one defined as sysvshm_chunk_head, which we already talked about) if it's not there already. This needs the 24 bytes we calculated, the same 24 bytes I gave in the formula right at the beginning.
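
    The initialization can be replayed on an ordinary buffer to see how much room is left for variables afterwards. This is a simulation, not the PHP code itself: head32, init_segment and free_after_init are names invented here, int32_t stands in for a 32-bit long, and the explicit size check is an assumption modelled on the observation that segments smaller than 24 bytes fail.

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical 32-bit mirror of sysvshm_chunk_head. */
    typedef struct {
        char magic[8];
        int32_t start;
        int32_t end;
        int32_t free;
        int32_t total;
    } head32;

    /* Write the header unless the magic string is already present.
       Returns 0 on success, -1 if the segment cannot hold the header. */
    int init_segment(head32 *h, int32_t size)
    {
        if (size < (int32_t) sizeof(head32)) {
            return -1;
        }
        if (memcmp(h->magic, "PHP_SM", 7) != 0) {
            memcpy(h->magic, "PHP_SM", 7);
            h->start = (int32_t) sizeof(head32);
            h->end = h->start;
            h->total = size;
            h->free = size - h->end;
        }
        return 0;
    }

    /* Initialize a zeroed scratch segment of `size` bytes (at most 1024)
       and report how many bytes remain for variables, or -1 on failure. */
    int32_t free_after_init(int32_t size)
    {
        union { head32 head; unsigned char bytes[1024]; } scratch;
        memset(&scratch, 0, sizeof(scratch));
        if (size > (int32_t) sizeof(scratch.bytes) ||
            init_segment(&scratch.head, size) != 0) {
            return -1;
        }
        return scratch.head.free;
    }
    ```

    In this model a 100-byte segment leaves 76 bytes for variables, and a 24-byte segment leaves none.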

    The second part is the function that actually inserts a variable into the shared memory. It gets called by another function, but I skipped that one, as it's not that useful. Basically, it receives the shared memory header structure, which includes the positions of the start and end of the data inside the memory segment. It also receives a long with the variable key you used to store the variable, a char* (roughly a string, C style) with the already serialized data, and the length of that data (for whatever reason; it could calculate that on its own, but anyway).
    For each variable, a header (the structure defined as sysvshm_chunk we looked at) plus the actual data is now written into the memory. It is aligned to long, however, for easier memory management (that means its size is always rounded to the next multiple of sizeof(long), which again is 4 bytes on most systems). Now here it becomes a little strange. According to the C code we're looking at, (ceil((strlen(serialize($data)) + 13 - 1) / 4) * 4) should work (the 13 in there is sizeof(sysvshm_chunk)). But it doesn't: it always yields 4 bytes less than we actually need. I assume that the length of the serialized data (len) is already aligned, but I didn't look into the source for that, and I couldn't find those 4 bytes anywhere else. The char comes last in the C structure definition, and char is aligned on full bytes and nothing more, so that shouldn't cause those 4 additional bytes either; but if I'm wrong about how C aligns those, that could be the reason, too. Anyway, I aligned the data and the header individually in my formula, and it worked (the aligned header always has 16 bytes, that's the 16 in my formula; the data length gets aligned by that divide-round-multiply thingy). But, technically, the formula could also be

     $dataLength = (ceil((strlen(serialize($data)) + 13 - 1) / 4) * 4) + 4;
    

    It yields the same results, however, in case I just missed those 4 bytes somewhere else. I have no system running a PHP version that was compiled with 64-bit longs, so I cannot verify which one is correct.
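
    One possible explanation for the 4 bytes, offered as an assumption rather than something verified against a PHP build: C compilers usually pad a structure to a multiple of its strictest member alignment, which would make sizeof(sysvshm_chunk) 16 instead of 13 with 32-bit longs. Under that assumption, the expression from php_put_shm_data, evaluated with integer division and including its trailing + sizeof(long), gives the same result as the formula from the beginning of this answer. The sketch below checks this numerically; the function names and the two constants are made up for the example.

    ```c
    #include <assert.h>

    /* Assumed values for a build with 32-bit longs: sizeof(long) == 4 and
       sizeof(sysvshm_chunk) == 16 (13 bytes of members plus padding). */
    #define LONG_SIZE  4L
    #define CHUNK_SIZE 16L

    /* The expression from php_put_shm_data, using integer division. */
    long total_size_c(long len)
    {
        assert(len >= 0);
        return ((len + CHUNK_SIZE - 1) / LONG_SIZE) * LONG_SIZE + LONG_SIZE;
    }

    /* The formula from this answer: data rounded up to 4 bytes, plus 16. */
    long total_size_answer(long len)
    {
        assert(len >= 0);
        return ((len + LONG_SIZE - 1) / LONG_SIZE) * LONG_SIZE + 16;
    }

    /* Returns 1 if both agree for every length in [0, max_len], else 0. */
    int formulas_agree(long max_len)
    {
        for (long len = 0; len <= max_len; len++) {
            if (total_size_c(len) != total_size_answer(len)) {
                return 0;
            }
        }
        return 1;
    }
    ```

    If the padding assumption holds, neither formula is missing anything; they are two ways of writing the same number.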

    tl;dr: problem solved, comments welcome, if you got any additional questions, now is the time.
