duanjiao7440 2015-07-11 15:53
PHP array intersection - finding common subsets

I have data stored in a MongoDB collection, pages. Each page has the following attributes:

    title - string 
    id - number
    contents - an object with 3 attributes:
       contents.topic  - string
       contents.parentTopic - number
       contents.text - string

I have roughly 500 pages stored in DB and performance isn't a very big consideration for me.

I need to find common contents across all pages. If I do the following:

    $pages = $db->selectCollection("pages");       
    $cursor = $pages->find(array());
    $data = array();

    foreach ( $cursor as $page ){
        array_push($data,$page);
    }
    $intersect = call_user_func_array('array_intersect_assoc',$data);
    echo "<pre>";
    print_r($intersect);

This gives me the contents common to all pages and works fine, as long as there is at least one 'content' common to every page.

But I need to find common subsets across pages. For example, some content may be common to pages 1-50, another subset may be common to pages 45, 59, 79, 123, ..., and another may be common to pages 450-459.

Is there a better way to find such common subsets? Will it require building trees?

Thanks.

1 answer

  • drn5375 2015-07-11 16:35

    It all really depends on "which" attributes you deem as "duplicated" or at least "common to a set".

    You can do

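    // Group pages by their whole "contents" sub-document.
    $pages->aggregate(
      array(
        array(
          '$group' => array(
            '_id' => '$contents',                  // field is "contents" in the schema above
            'pages' => array( '$push' => '$id' ),  // ids of the pages sharing this value
            'count' => array( '$sum' => 1 )        // how many pages share it
          )
        )
      )
    );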
    

    This is a lot more efficient than the client code you are using.
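
    To make the shape of that result concrete, here is a minimal in-memory sketch of the same grouping in plain PHP. It is not the driver's API: the function name `groupPagesByContents` and the sample pages are hypothetical, and it assumes each page is already loaded as an array with `id` and `contents` keys.

```php
<?php
// Hypothetical sample data mirroring the page structure from the question.
$pages = [
    ['id' => 1, 'title' => 'A', 'contents' => ['topic' => 'intro', 'parentTopic' => 0, 'text' => 'hello']],
    ['id' => 2, 'title' => 'B', 'contents' => ['topic' => 'intro', 'parentTopic' => 0, 'text' => 'hello']],
    ['id' => 3, 'title' => 'C', 'contents' => ['topic' => 'setup', 'parentTopic' => 1, 'text' => 'install']],
];

// Map each distinct "contents" value to the ids of the pages that carry it.
function groupPagesByContents(array $pages): array
{
    $groups = [];
    foreach ($pages as $page) {
        // Encode the sub-document as a string so it can be used as an array key.
        // Note: field order inside "contents" affects the key.
        $key = json_encode($page['contents']);
        if (!isset($groups[$key])) {
            $groups[$key] = ['contents' => $page['contents'], 'pages' => [], 'count' => 0];
        }
        $groups[$key]['pages'][] = $page['id'];
        $groups[$key]['count']++;
    }
    return array_values($groups);
}

$groups = groupPagesByContents($pages);
```

    Each entry holds one distinct `contents` value, the page ids that share it, and a count, which is the same information the `$group` stage collects on the server.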

    Or you can even do

    $pages->aggregate(
      array(
        // Attach the list of sub-key names to every page.
        array(
          '$project' => array(
            'title' => 1,
            'id' => 1,
            'contents' => 1,
            'types' => array( '$literal' => array( 'topic', 'parentTopic', 'text' ) )
          )
        ),
        // One document per (page, sub-key) pair.
        array( '$unwind' => '$types' ),
        // Group by sub-key name plus the value stored under that sub-key.
        array(
          '$group' => array(
            '_id' => array(
              'type' => '$types',
              'content' => array(
                '$cond' => array(
                  array( '$eq' => array( '$types', 'topic' ) ),
                  '$contents.topic',
                  array(
                    '$cond' => array(
                      array( '$eq' => array( '$types', 'parentTopic' ) ),
                      '$contents.parentTopic',
                      '$contents.text'
                    )
                  )
                )
              )
            ),
            'pages' => array( '$push' => '$id' ),
            'count' => array( '$sum' => 1 )
          )
        )
      )
    );
    

    Which groups by each sub-key.
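
    If the goal is "values shared by some subset of pages", the per-sub-key grouping can also be sketched in plain PHP as an inverted index from value to page ids, keeping only values that occur on more than one page. This is an illustration under assumptions, not part of the pipeline above: `commonSubsets` and the sample data are hypothetical names, and pages are assumed to be loaded as arrays.

```php
<?php
// Build an index "subKey:value" => [page ids] and keep only shared values.
function commonSubsets(array $pages, array $keys = ['topic', 'parentTopic', 'text']): array
{
    $index = [];
    foreach ($pages as $page) {
        foreach ($keys as $key) {
            if (!array_key_exists($key, $page['contents'])) {
                continue; // skip pages that lack this sub-key
            }
            // Combine the sub-key name and its JSON-encoded value into one bucket key.
            $bucket = $key . ':' . json_encode($page['contents'][$key]);
            $index[$bucket][] = $page['id'];
        }
    }
    // Keep only values that appear on at least two pages.
    return array_filter($index, function (array $ids): bool {
        return count($ids) > 1;
    });
}

// Hypothetical sample data.
$pages = [
    ['id' => 1, 'contents' => ['topic' => 'intro', 'parentTopic' => 0, 'text' => 'hello']],
    ['id' => 2, 'contents' => ['topic' => 'intro', 'parentTopic' => 0, 'text' => 'world']],
    ['id' => 3, 'contents' => ['topic' => 'setup', 'parentTopic' => 0, 'text' => 'world']],
];

$shared = commonSubsets($pages);
```

    With roughly 500 pages this runs comfortably in memory. On the server side, a similar filter could probably be expressed by appending a `$match` on `count` after the `$group` stage.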

    All "grouping" is a form of 'set building'. But it's really not that clear what you are asking for here. I'm just trying to show something more efficient than what you seem to be doing.
