doutangguan2460 2014-02-19 19:03
浏览 21
已采纳

用mysql和php创建一个url树

I've been banging my head all day at this and still can't come up with some working answers.

I want to track all my visitors by cookie on my website and save the current URL (or path) that the user visits with the unique user key included in a mysql row.

After some time and some unique visitors I want to create a tree of paths that the users followed through my site. (something like Google Analytics does with there visitor flow reports).

But I can't somehow figure out how to make the query that walks through these rows and creates a "tree" of urls (with count or percentage)

If someone could help me out, that will be much appreciated.

-- Edit I already have a mysql database and tables in place

CREATE TABLE `journies` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `site_id` int(11) NOT NULL,
  `profile_id` int(11) NOT NULL,
  `url` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `profiles` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `site_id` int(11) NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

-- update 2 Oke, this may explain better what I want to do.

I want to create a tree with urls like this:

                                                  /- domain.com/news (100%)
                     /- domain.com/about (33%) --/
domain.com (100%) --/
                    \- domain.com/contact (33%)
                    \- domain.com/news (33%) --\
                                                \- domain.com/news/id/1 (100%)

Ofcourse this may be done with multiple queries (although it will be great if it could be done with one ;))

  • 写回答

2条回答 默认 最新

  • douying6206 2014-02-20 22:46
    关注

    Okay, this was a doozy, but I think I've finally got something for you to work with.

    The premise is to pull out the URLs and counts from the database, grouping and sorting them by url. Store those results into a master array. Then, find the base domain of the urls, which I called the trunk. Loop through the master array and find any items that have the same base domain as the trunk and find their percentages.

    One other thing to note is that these are level-specific. So cnn.com/employees (2 levels) would be on the same level as cnn.com/news.

    Here is the SQL data that I used for this:

    INSERT INTO `journies` (`id`, `site_id`, `profile_id`, `url`, `created_at`, `updated_at`) VALUES
        (1, 1, 1, 'domain.com', '2014-02-19 15:34:54', '0000-00-00 00:00:00'),                      
        (2, 1, 1, 'domain.com/about', '2014-02-19 15:35:57', '0000-00-00 00:00:00'),                
        (3, 1, 1, 'domain.com/contact', '2014-02-19 15:36:12', '0000-00-00 00:00:00'),              
        (4, 1, 1, 'domain.com/news', '2014-02-19 15:36:29', '0000-00-00 00:00:00'),                 
        (5, 1, 1, 'domain.com/news/id/1', '2014-02-19 15:39:26', '0000-00-00 00:00:00'),            
        (6, 1, 1, 'domain.com/contact', '2014-02-19 15:50:26', '0000-00-00 00:00:00'),              
        (7, 1, 1, 'cnn.com/news/id/1', '2014-02-19 16:00:02', '0000-00-00 00:00:00'),               
        (8, 1, 1, 'cnn.com/news', '2014-02-19 16:00:15', '0000-00-00 00:00:00'),                    
        (9, 1, 1, 'cnn.com', '2014-02-19 16:00:25', '0000-00-00 00:00:00'),                         
        (10, 1, 1, 'cnn.com', '2014-02-19 16:46:16', '0000-00-00 00:00:00'),                        
        (11, 1, 1, 'cnn.com', '2014-02-19 16:46:16', '0000-00-00 00:00:00'),                        
        (12, 1, 1, 'domain.com/news/id/1', '2014-02-20 08:47:23', '0000-00-00 00:00:00'),           
        (13, 1, 1, 'domain.com/news/id/1', '2014-02-20 08:47:23', '0000-00-00 00:00:00'),           
        (14, 1, 1, 'domain.com/news/id/2', '2014-02-20 08:53:29', '0000-00-00 00:00:00'),           
        (15, 1, 1, 'domain.com/prices', '2014-02-20 12:40:44', '0000-00-00 00:00:00'),              
        (16, 1, 1, 'domain.com/prices', '2014-02-20 12:40:44', '0000-00-00 00:00:00'),              
        (17, 1, 1, 'cnn.com/employees/friekot', '2014-02-20 15:23:34', '0000-00-00 00:00:00'),      
        (18, 1, 1, 'cnn.com/employees', '2014-02-20 15:23:34', '0000-00-00 00:00:00');              
    

    And here is the code that I came up with:

    <?php
    
    
    
    $link = mysqli_connect("localhost", "user", "pass", "database");
    
    
    
    
    
    
    // SET THE DEFAULTS
    $trunk_array = array();
    $master_array = array();
    
    
    
    
    
    
    // PULL OUT THE DATA FROM THE DATABASE
    $q_get_tracking_info = "SELECT *, COUNT(url) AS url_count FROM journies WHERE site_id = 1 AND profile_id = 1 GROUP BY url ORDER BY url;";
    $r_get_tracking_info = mysqli_query($link, $q_get_tracking_info) or trigger_error("Cannot Get Tracking Info: (".mysqli_error().")", E_USER_ERROR);
    
    
    
    while ($row_get_tracking_info = mysqli_fetch_array($r_get_tracking_info)) {
    
    
    
        $url = $row_get_tracking_info['url'];
        $url_count = $row_get_tracking_info['url_count'];
    
    
    
    
    
    
        // EXPLODE THE DOMAIN PARTS
        $domain_parts = explode('/', $url);
    
    
    
    
    
    
        // FIND THE TOTAL COUNTS FOR EACH LEVEL OF ARRAY
        // - SO THAT WE CAN DIVIDE BY IT LATER TO GET THE PERCENTAGE
        $count = count($domain_parts);
        if (!isset($level_totals[$domain_parts[0]])) {
            $level_totals[$domain_parts[0]] = array();
        }
    
        if (isset($level_totals[$domain_parts[0]][$count])) {
            $level_totals[$domain_parts[0]][$count] += $url_count;
        }
        else {
            $level_totals[$domain_parts[0]][$count] = $url_count;
        }
    
    
    
    
    
    
        // BUILD A TRUNK ARRAY SO WE CAN DEFINE SECTIONS
        if ($url == $domain_parts[0]) {
            $trunk_array[] = array($url, $url_count);
        }
    
    
    
    
    
    
        // BUILD A MASTER ARRAY OF THE ITEMS AS WE WILL LAY THEM OUT
        $master_array[$url] = $url_count;
    
    
    }
    
    
    
    
    
    
    // FIND THE TOTAL TRUNK COUNT SO WE CAN DIVIDE BY IT LATER
    $total_trunk_count = 0;
    foreach ($trunk_array AS $trunk_array_key => $trunk_array_val) {
        foreach($trunk_array_val AS $trunk_count_val) {
            $total_trunk_count += $trunk_count_val[0];
        }
    }
    
    
    
    
    
    
    // LOOP THROUGH THE TRUNK ITEMS AND PULL OUT ANY MATCHES FOR THAT TRUNK
    foreach ($trunk_array AS $trunk_item_key => $trunk_item_val) {
    
    
    
        $trunk_item = $trunk_item_val[0];
        $trunk_count = $trunk_item_val[1];
    
    
    
    
    
    
        // FIND THE PERCENTAGE THIS TRUNK WAS ACCESSED
        $trunk_percent = round(($master_array[$trunk_item] / $total_trunk_count) * 100);
    
    
    
    
    
    
        // PRINT THE TRUNK OUT    
        print "<BR><BR>".$trunk_item.' - ('.$trunk_percent.'%)';
    
    
    
    
    
    
        // LOOP THROUGH THE MASTER ARRAY AND GET THE RESULTS FOR ANY PATHS UNDER THE TRUNK
        foreach ($master_array AS $master_array_key => $master_array_val) {
    
    
    
            // PERFORM A MATCH FOR DOMAINS BELONGING TO THIS PARTICULAR TRUNK
            if (preg_match('/^'.$trunk_item.'/', $master_array_key)) {
    
    
    
                // SET A DEFAULT DELIMITER PAD
                $delimiter_pad = '';
    
    
    
    
    
    
                // EXPLODE EACH PATH INTO PARTS AND COUNT HOW MANY PARTS WE HAVE
                $domain_parts_2 = explode('/', $master_array_key);
                $count = count($domain_parts_2);
    
    
    
    
    
    
                // SET THE DELIMITER FOR HOW FAR DOWN ON THE TREE WE ARE
                // EACH INDENT WILL HAVE 8 SPACES
                for ($i = 2; $i <= $count; $i++) {
                    $delimiter_pad .= '&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;';
                }
    
    
    
    
    
    
                // SINCE WE ALREADY PRINTED OUT THE TRUNK, WE WILL ONLY SHOW ITEMS THAT ARE NOT THE TRUNK
                if ($master_array_key != $trunk_item) {
    
    
    
                    // FIND THE PERCENTAGE OF THE ITEM, GIVEN THEIR LEVEL IN THE TREE
                    $path_percentage = round(($master_array[$master_array_key] / $level_totals[$trunk_item][$count]) * 100);
    
    
    
    
    
    
                    // PRINT OUT THE PATH AND PERCENTAGE
                    print "<BR>".$delimiter_pad."|- ".$master_array_key.' - ('.$path_percentage.'%)';
    
    
    
                }
    
    
    
            }
    
    
    
        }
    
    
    
    }
    

    In the end, all of that outputs this:

    cnn.com - (75%)
            |- cnn.com/employees - (50%)
                    |- cnn.com/employees/friekot - (100%)
            |- cnn.com/news - (50%)
                            |- cnn.com/news/id/1 - (100%)
    
    domain.com - (25%)
            |- domain.com/about - (17%)
            |- domain.com/contact - (33%)
            |- domain.com/news - (17%)
                            |- domain.com/news/id/1 - (75%)
                            |- domain.com/news/id/2 - (25%)
            |- domain.com/prices - (33%)
    

    There may be an easier way to do this, but this is the method that came to mind for me. I hope this works for you!

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥60 求一个简单的网页(标签-安全|关键词-上传)
  • ¥35 lstm时间序列共享单车预测,loss值优化,参数优化算法
  • ¥15 基于卷积神经网络的声纹识别
  • ¥15 Python中的request,如何使用ssr节点,通过代理requests网页。本人在泰国,需要用大陆ip才能玩网页游戏,合法合规。
  • ¥100 为什么这个恒流源电路不能恒流?
  • ¥15 有偿求跨组件数据流路径图
  • ¥15 写一个方法checkPerson,入参实体类Person,出参布尔值
  • ¥15 我想咨询一下路面纹理三维点云数据处理的一些问题,上传的坐标文件里是怎么对无序点进行编号的,以及xy坐标在处理的时候是进行整体模型分片处理的吗
  • ¥15 CSAPPattacklab
  • ¥15 一直显示正在等待HID—ISP