douwang6635 2015-05-14 13:25
浏览 115
已采纳

如何修改传统的笛卡尔积以减少内存开销?

For my problem you're selecting up to 24 items from a pool of maybe 5-10,000 items. In other words we're generating configurations.

The number 24 comes from the item categories, each item is associated with a particular installation location, an item from location 1 cannot be installed in location 10, so I have arranged my associative array to organize the data in groups. Each item looks like:

$items[9][] = array("id" => "0", "2" => 2, "13" => 20);

Where the first parameter ( $item[9] ) tells you the location it is allowed in. If you want it's ok to think of the idea that you cannot install a tire in the spot for an exhaust pipe.

The items are stored in a mySQL database. The user can specify restrictions on the solution, for example, attribute 2 must have a final value of 25 or more. They can have multiple competing restrictions. The queries retrieve items that have any value for the attributes under consideration (unspecified attributes are stored but we don't do any calculations with them). The PHP script then prunes out any redundant choices (for example: if item 1 has an attribute value of 3 and item 2 has an attribute value of 5, in the absence of another restriction you would never choose item 1).

After all the processing has occurred get an associative array that looks like:

$items[10][] = array("id" => "3", "2" => 2, "13" => 100);
$items[10][] = array("id" => "4", "2" => 3, "13" => 50);
$items[9][] = array("id" => "0", "2" => 2, "13" => 20);
$items[9][] = array("id" => "1", "2" => -1, "13" => 50);

I have posted a full example data set at this pastebin link. There is reason to believe I can be more restrictive on what I accept into the data set but even at a restriction of 2 elements per option there's a problem.

In the array() value, the id is the reference to the index of the item in the array, and the other values are attribute id and value pairs. So $items[10][] = array("id" => "3", "2" => 2, "13" => 100); means that in location 10 there is an item with id 3 which as a value of 2 in attribute 2 and a value of 100 in attribute 13. If it helps think of an item being identified by a pair eg (10,0) is item 0 in location 10.

I know I'm not being specific, there are 61 attributes and I don't think it changes the structure of the problem with what they represent. If we want, we can think of attribute 2 as weight and attribute 13 as cost. The problem the user wants solved might be to generate a configuration where the weight is 25 exactly and the cost is minimized.

Back of the envelope math says a rough estimate, if there were only 2 choices per location, is 2^24 choices x size of the record. Assuming a 32 bit integer could be encoded to represent a single record somehow, we're looking at 16,777,216 * 4 = 67,108,864 bytes of memory (utterly ignoring data structure overhead) and there is no reason to believe that either of these assumptions is going to be valid, though an algorithm with an upper memory bound in the realm of 67 megs would be an acceptable memory size.

There's no particular reason to stick to this representation, I used associative arrays since I have a variable number of attributes to use and figured that would allow me to avoid a large, sparse array. Above "2"=>2 actually means that filtered attribute with id #2 has a value of 2 and similarly attribute 13's value is 100. I'm happy to change my data structure to something more compact.

One thought I had was that I do have an evaluation criteria I can use to discard most of the intermediate configurations. As an example, I can compute 75 * "value of "2"" + 10 * "value of "13" to provide a relative weighting of the solutions. In other words, if there were no other restrictions on a problem, each value improvement by 1 of attribute 2 costs 75 and each value improvement of attribute 13 costs 10. Continuing the idea of a car part, think of it like buying a stock part and having a machinist modify it to our specifications.

One problem I see with discarding configurations too early is that the weighting function does not take into account restrictions such as "the final result must have a value of "2" that is at exactly 25". So it's fine if I have a full 24 element configuration, I can run through a loop of the restrictions, discard the solutions that don't match and then finally rank the remaining solutions by the function, but I'm not sure there's a valid line of thought that allows me to throw away solutions earlier.

Does anyone have any suggestions on how to move forward? Although a language agnostic solution is fine, I am implementing in PHP if there's some relevant language feature that might be useful.

  • 写回答

1条回答 默认 最新

  • duandou8457 2015-05-15 12:46
    关注

    I solved my issue with memory by performing a depth first cartesian product. I can weigh the solutions one at a time and retain some if I choose or simply output them as I am doing here in this code snippet.

    The main inspiration for this solution came from the very concise answer on this question. Here is my code as it seems like finding a php depth first cartesian product algorithm is less than trivial.

    function dfcartesian ( $input, $current, $index ) {
        // sample use: $emptyArray = array();
        //             dfcartesian( $items, $emptyArray, 0 )
        if ( $index == count( $input ) ) {
            // If we have iterated over the entire space and are at the bottom
            // do whatever is relevant to your problem and return.
            //
            // If I were to improve the solution I suppose I'd pass in an
            // optional function name that we could pass data to if desired.
            var_dump( $current );
            echo '<br><br>';
            return;
        }
    
        // I'm using non-sequential numerical indicies in an associative array
        // so I want to skip any empty numerical index without aborting.
        //
        // If you're using something different I think the only change that
        // needs attention is to change $index + 1 to a different type of
        // key incrementer.  That sort of issue is tackled at
        // https://stackoverflow.com/q/2414141/759749
        if ( isset ( $input[$index] ) ) {
            foreach ( $input[$index] as $element ) {
                $current[] = $element;
                // despite my concern about recursive function overhead,
                // this handled 24 levels quite smoothly.
                dfcartesian( $input, $current, ( $index + 1 ) );
                array_pop( $current );
            }
        } else {
            // move to the next index if there is a gap
            dfcartesian( $input, $current, ( $index + 1 ) );
        }
    }   
    

    I hope this is of use to someone else tackling the same problem.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥100 求数学坐标画圆以及直线的算法
  • ¥100 c语言,请帮蒟蒻写一个题的范例作参考
  • ¥15 名为“Product”的列已属于此 DataTable
  • ¥15 安卓adb backup备份应用数据失败
  • ¥15 eclipse运行项目时遇到的问题
  • ¥15 关于#c##的问题:最近需要用CAT工具Trados进行一些开发
  • ¥15 南大pa1 小游戏没有界面,并且报了如下错误,尝试过换显卡驱动,但是好像不行
  • ¥15 自己瞎改改,结果现在又运行不了了
  • ¥15 链式存储应该如何解决
  • ¥15 没有证书,nginx怎么反向代理到只能接受https的公网网站