douwang6635 2015-05-14 13:25
浏览 115
已采纳

如何修改传统的笛卡尔积以减少内存开销?

For my problem you're selecting up to 24 items from a pool of maybe 5-10,000 items. In other words we're generating configurations.

The number 24 comes from the item categories, each item is associated with a particular installation location, an item from location 1 cannot be installed in location 10, so I have arranged my associative array to organize the data in groups. Each item looks like:

$items[9][] = array("id" => "0", "2" => 2, "13" => 20);

Where the first parameter ( $item[9] ) tells you the location it is allowed in. If you want it's ok to think of the idea that you cannot install a tire in the spot for an exhaust pipe.

The items are stored in a mySQL database. The user can specify restrictions on the solution, for example, attribute 2 must have a final value of 25 or more. They can have multiple competing restrictions. The queries retrieve items that have any value for the attributes under consideration (unspecified attributes are stored but we don't do any calculations with them). The PHP script then prunes out any redundant choices (for example: if item 1 has an attribute value of 3 and item 2 has an attribute value of 5, in the absence of another restriction you would never choose item 1).

After all the processing has occurred get an associative array that looks like:

$items[10][] = array("id" => "3", "2" => 2, "13" => 100);
$items[10][] = array("id" => "4", "2" => 3, "13" => 50);
$items[9][] = array("id" => "0", "2" => 2, "13" => 20);
$items[9][] = array("id" => "1", "2" => -1, "13" => 50);

I have posted a full example data set at this pastebin link. There is reason to believe I can be more restrictive on what I accept into the data set but even at a restriction of 2 elements per option there's a problem.

In the array() value, the id is the reference to the index of the item in the array, and the other values are attribute id and value pairs. So $items[10][] = array("id" => "3", "2" => 2, "13" => 100); means that in location 10 there is an item with id 3 which as a value of 2 in attribute 2 and a value of 100 in attribute 13. If it helps think of an item being identified by a pair eg (10,0) is item 0 in location 10.

I know I'm not being specific, there are 61 attributes and I don't think it changes the structure of the problem with what they represent. If we want, we can think of attribute 2 as weight and attribute 13 as cost. The problem the user wants solved might be to generate a configuration where the weight is 25 exactly and the cost is minimized.

Back of the envelope math says a rough estimate, if there were only 2 choices per location, is 2^24 choices x size of the record. Assuming a 32 bit integer could be encoded to represent a single record somehow, we're looking at 16,777,216 * 4 = 67,108,864 bytes of memory (utterly ignoring data structure overhead) and there is no reason to believe that either of these assumptions is going to be valid, though an algorithm with an upper memory bound in the realm of 67 megs would be an acceptable memory size.

There's no particular reason to stick to this representation, I used associative arrays since I have a variable number of attributes to use and figured that would allow me to avoid a large, sparse array. Above "2"=>2 actually means that filtered attribute with id #2 has a value of 2 and similarly attribute 13's value is 100. I'm happy to change my data structure to something more compact.

One thought I had was that I do have an evaluation criteria I can use to discard most of the intermediate configurations. As an example, I can compute 75 * "value of "2"" + 10 * "value of "13" to provide a relative weighting of the solutions. In other words, if there were no other restrictions on a problem, each value improvement by 1 of attribute 2 costs 75 and each value improvement of attribute 13 costs 10. Continuing the idea of a car part, think of it like buying a stock part and having a machinist modify it to our specifications.

One problem I see with discarding configurations too early is that the weighting function does not take into account restrictions such as "the final result must have a value of "2" that is at exactly 25". So it's fine if I have a full 24 element configuration, I can run through a loop of the restrictions, discard the solutions that don't match and then finally rank the remaining solutions by the function, but I'm not sure there's a valid line of thought that allows me to throw away solutions earlier.

Does anyone have any suggestions on how to move forward? Although a language agnostic solution is fine, I am implementing in PHP if there's some relevant language feature that might be useful.

  • 写回答

1条回答 默认 最新

  • duandou8457 2015-05-15 12:46
    关注

    I solved my issue with memory by performing a depth first cartesian product. I can weigh the solutions one at a time and retain some if I choose or simply output them as I am doing here in this code snippet.

    The main inspiration for this solution came from the very concise answer on this question. Here is my code as it seems like finding a php depth first cartesian product algorithm is less than trivial.

    function dfcartesian ( $input, $current, $index ) {
        // sample use: $emptyArray = array();
        //             dfcartesian( $items, $emptyArray, 0 )
        if ( $index == count( $input ) ) {
            // If we have iterated over the entire space and are at the bottom
            // do whatever is relevant to your problem and return.
            //
            // If I were to improve the solution I suppose I'd pass in an
            // optional function name that we could pass data to if desired.
            var_dump( $current );
            echo '<br><br>';
            return;
        }
    
        // I'm using non-sequential numerical indicies in an associative array
        // so I want to skip any empty numerical index without aborting.
        //
        // If you're using something different I think the only change that
        // needs attention is to change $index + 1 to a different type of
        // key incrementer.  That sort of issue is tackled at
        // https://stackoverflow.com/q/2414141/759749
        if ( isset ( $input[$index] ) ) {
            foreach ( $input[$index] as $element ) {
                $current[] = $element;
                // despite my concern about recursive function overhead,
                // this handled 24 levels quite smoothly.
                dfcartesian( $input, $current, ( $index + 1 ) );
                array_pop( $current );
            }
        } else {
            // move to the next index if there is a gap
            dfcartesian( $input, $current, ( $index + 1 ) );
        }
    }   
    

    I hope this is of use to someone else tackling the same problem.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥100 连续两帧图像高速减法
  • ¥15 组策略中的计算机配置策略无法下发
  • ¥15 如何绘制动力学系统的相图
  • ¥15 对接wps接口实现获取元数据
  • ¥20 给自己本科IT专业毕业的妹m找个实习工作
  • ¥15 用友U8:向一个无法连接的网络尝试了一个套接字操作,如何解决?
  • ¥30 我的代码按理说完成了模型的搭建、训练、验证测试等工作(标签-网络|关键词-变化检测)
  • ¥50 mac mini外接显示器 画质字体模糊
  • ¥15 TLS1.2协议通信解密
  • ¥40 图书信息管理系统程序编写