用mysql和php创建一个url树

I've been banging my head all day at this and still can't come up with some working answers.

I want to track all my visitors by cookie on my website and save the current URL (or path) that the user visits with the unique user key included in a mysql row.

After some time and some unique visitors I want to create a tree of paths that the users followed through my site. (something like Google Analytics does with there visitor flow reports).

But I can't somehow figure out how to make the query that walks through these rows and creates a "tree" of urls (with count or percentage)

If someone could help me out, that will be much appreciated.

-- Edit I already have a mysql database and tables in place

CREATE TABLE `journies` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `site_id` int(11) NOT NULL,
  `profile_id` int(11) NOT NULL,
  `url` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `profiles` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `site_id` int(11) NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

-- update 2 Oke, this may explain better what I want to do.

I want to create a tree with urls like this:

                                                  /- domain.com/news (100%)
                     /- domain.com/about (33%) --/
domain.com (100%) --/
                    \- domain.com/contact (33%)
                    \- domain.com/news (33%) --\
                                                \- domain.com/news/id/1 (100%)

Ofcourse this may be done with multiple queries (although it will be great if it could be done with one ;))

写回答
好问题 0 提建议
追加酬金
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

2条回答默认最新

douying6206 2014-02-20 22:46

关注

Okay, this was a doozy, but I think I've finally got something for you to work with.

The premise is to pull out the URLs and counts from the database, grouping and sorting them by url. Store those results into a master array. Then, find the base domain of the urls, which I called the trunk. Loop through the master array and find any items that have the same base domain as the trunk and find their percentages.

One other thing to note is that these are level-specific. So cnn.com/employees (2 levels) would be on the same level as cnn.com/news.

Here is the SQL data that I used for this:

INSERT INTO `journies` (`id`, `site_id`, `profile_id`, `url`, `created_at`, `updated_at`) VALUES
    (1, 1, 1, 'domain.com', '2014-02-19 15:34:54', '0000-00-00 00:00:00'),                      
    (2, 1, 1, 'domain.com/about', '2014-02-19 15:35:57', '0000-00-00 00:00:00'),                
    (3, 1, 1, 'domain.com/contact', '2014-02-19 15:36:12', '0000-00-00 00:00:00'),              
    (4, 1, 1, 'domain.com/news', '2014-02-19 15:36:29', '0000-00-00 00:00:00'),                 
    (5, 1, 1, 'domain.com/news/id/1', '2014-02-19 15:39:26', '0000-00-00 00:00:00'),            
    (6, 1, 1, 'domain.com/contact', '2014-02-19 15:50:26', '0000-00-00 00:00:00'),              
    (7, 1, 1, 'cnn.com/news/id/1', '2014-02-19 16:00:02', '0000-00-00 00:00:00'),               
    (8, 1, 1, 'cnn.com/news', '2014-02-19 16:00:15', '0000-00-00 00:00:00'),                    
    (9, 1, 1, 'cnn.com', '2014-02-19 16:00:25', '0000-00-00 00:00:00'),                         
    (10, 1, 1, 'cnn.com', '2014-02-19 16:46:16', '0000-00-00 00:00:00'),                        
    (11, 1, 1, 'cnn.com', '2014-02-19 16:46:16', '0000-00-00 00:00:00'),                        
    (12, 1, 1, 'domain.com/news/id/1', '2014-02-20 08:47:23', '0000-00-00 00:00:00'),           
    (13, 1, 1, 'domain.com/news/id/1', '2014-02-20 08:47:23', '0000-00-00 00:00:00'),           
    (14, 1, 1, 'domain.com/news/id/2', '2014-02-20 08:53:29', '0000-00-00 00:00:00'),           
    (15, 1, 1, 'domain.com/prices', '2014-02-20 12:40:44', '0000-00-00 00:00:00'),              
    (16, 1, 1, 'domain.com/prices', '2014-02-20 12:40:44', '0000-00-00 00:00:00'),              
    (17, 1, 1, 'cnn.com/employees/friekot', '2014-02-20 15:23:34', '0000-00-00 00:00:00'),      
    (18, 1, 1, 'cnn.com/employees', '2014-02-20 15:23:34', '0000-00-00 00:00:00');

And here is the code that I came up with:

<?php



$link = mysqli_connect("localhost", "user", "pass", "database");






// SET THE DEFAULTS
$trunk_array = array();
$master_array = array();






// PULL OUT THE DATA FROM THE DATABASE
$q_get_tracking_info = "SELECT *, COUNT(url) AS url_count FROM journies WHERE site_id = 1 AND profile_id = 1 GROUP BY url ORDER BY url;";
$r_get_tracking_info = mysqli_query($link, $q_get_tracking_info) or trigger_error("Cannot Get Tracking Info: (".mysqli_error().")", E_USER_ERROR);



while ($row_get_tracking_info = mysqli_fetch_array($r_get_tracking_info)) {



    $url = $row_get_tracking_info['url'];
    $url_count = $row_get_tracking_info['url_count'];






    // EXPLODE THE DOMAIN PARTS
    $domain_parts = explode('/', $url);






    // FIND THE TOTAL COUNTS FOR EACH LEVEL OF ARRAY
    // - SO THAT WE CAN DIVIDE BY IT LATER TO GET THE PERCENTAGE
    $count = count($domain_parts);
    if (!isset($level_totals[$domain_parts[0]])) {
        $level_totals[$domain_parts[0]] = array();
    }

    if (isset($level_totals[$domain_parts[0]][$count])) {
        $level_totals[$domain_parts[0]][$count] += $url_count;
    }
    else {
        $level_totals[$domain_parts[0]][$count] = $url_count;
    }






    // BUILD A TRUNK ARRAY SO WE CAN DEFINE SECTIONS
    if ($url == $domain_parts[0]) {
        $trunk_array[] = array($url, $url_count);
    }






    // BUILD A MASTER ARRAY OF THE ITEMS AS WE WILL LAY THEM OUT
    $master_array[$url] = $url_count;


}






// FIND THE TOTAL TRUNK COUNT SO WE CAN DIVIDE BY IT LATER
$total_trunk_count = 0;
foreach ($trunk_array AS $trunk_array_key => $trunk_array_val) {
    foreach($trunk_array_val AS $trunk_count_val) {
        $total_trunk_count += $trunk_count_val[0];
    }
}






// LOOP THROUGH THE TRUNK ITEMS AND PULL OUT ANY MATCHES FOR THAT TRUNK
foreach ($trunk_array AS $trunk_item_key => $trunk_item_val) {



    $trunk_item = $trunk_item_val[0];
    $trunk_count = $trunk_item_val[1];






    // FIND THE PERCENTAGE THIS TRUNK WAS ACCESSED
    $trunk_percent = round(($master_array[$trunk_item] / $total_trunk_count) * 100);






    // PRINT THE TRUNK OUT    
    print "<BR><BR>".$trunk_item.' - ('.$trunk_percent.'%)';






    // LOOP THROUGH THE MASTER ARRAY AND GET THE RESULTS FOR ANY PATHS UNDER THE TRUNK
    foreach ($master_array AS $master_array_key => $master_array_val) {



        // PERFORM A MATCH FOR DOMAINS BELONGING TO THIS PARTICULAR TRUNK
        if (preg_match('/^'.$trunk_item.'/', $master_array_key)) {



            // SET A DEFAULT DELIMITER PAD
            $delimiter_pad = '';






            // EXPLODE EACH PATH INTO PARTS AND COUNT HOW MANY PARTS WE HAVE
            $domain_parts_2 = explode('/', $master_array_key);
            $count = count($domain_parts_2);






            // SET THE DELIMITER FOR HOW FAR DOWN ON THE TREE WE ARE
            // EACH INDENT WILL HAVE 8 SPACES
            for ($i = 2; $i <= $count; $i++) {
                $delimiter_pad .= '&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;';
            }






            // SINCE WE ALREADY PRINTED OUT THE TRUNK, WE WILL ONLY SHOW ITEMS THAT ARE NOT THE TRUNK
            if ($master_array_key != $trunk_item) {



                // FIND THE PERCENTAGE OF THE ITEM, GIVEN THEIR LEVEL IN THE TREE
                $path_percentage = round(($master_array[$master_array_key] / $level_totals[$trunk_item][$count]) * 100);






                // PRINT OUT THE PATH AND PERCENTAGE
                print "<BR>".$delimiter_pad."|- ".$master_array_key.' - ('.$path_percentage.'%)';



            }



        }



    }



}

In the end, all of that outputs this:

cnn.com - (75%)
        |- cnn.com/employees - (50%)
                |- cnn.com/employees/friekot - (100%)
        |- cnn.com/news - (50%)
                        |- cnn.com/news/id/1 - (100%)

domain.com - (25%)
        |- domain.com/about - (17%)
        |- domain.com/contact - (33%)
        |- domain.com/news - (17%)
                        |- domain.com/news/id/1 - (75%)
                        |- domain.com/news/id/2 - (25%)
        |- domain.com/prices - (33%)

There may be an easier way to do this, but this is the method that came to mind for me. I hope this works for you!

本回答被题主选为最佳回答 , 对您是否有帮助呢?

查看更多回答(1条)

报告相同问题？

关注问题

用mysql和php创建一个url树 mysql php
2014-02-19 19:03

回答 2 已采纳 Okay, this was a doozy, but I think I've finally got something for you to work with. The premis
如何使用php脚本创建mysql用户和数据库？ mysql php
2015-03-07 12:34

回答 1 已采纳 PHP script to create MySQL database, add user and grant privileges. ## make sure you connect firs
宝塔服务器PHP+mysql网页URL跳转问题 mysql php 服务器
2022-08-03 10:34

回答 3 已采纳检查下php页面导航栏的URL是否带localhost，如果带有则把http://localhost改成/，保存刷新即可。如有帮助，望采纳
HTML+PHP+MYSQL实现一个简单的留言板
2022-12-20 18:51

氕氘氚的博客大家好，下面将为大家展示基于HTML PHP MYSQL的留言板的设计与实现过程，适合初学者点击观看，以下我将会以笔者自称！
mysql如何创建一个用户对应不同地址的主机 mysql
2021-09-29 20:33

回答 1 已采纳需求描述不清，你最好直接写要解决的最原始问题，而不是你自己想到的一个解决方案。单纯看mysqlabc@% 是包含 abc@localhost的，因为%包含全部
PHP使用mysql函数后网页一片空白如何解决？ apache mysql php sql
2020-04-14 08:27

回答 2 已采纳页面全部空白一般是报错了 1. 可以到服务器上的日志里看php的错误日志 2. php页面最开头加上 ``` ini_set("display_errors", "On")
让PHP和MySQL读取url mysql php
2014-10-13 09:39

回答 5 已采纳 if($_REQUEST=['MobileNumber']){ its a wrong syntax Try this add this line in your url ?MobileNum
【大数据面试】MySQL面试题与答案
2023-12-20 17:36

话数Science的博客数据库中的事务是什么，MySQL中是怎么实现的 MySQL事务的特性? 数据库事务的隔离级别?解决了什么问题?默认事务隔离级别? 脏读，幻读，不可重复读的定义 MySQL怎么实现可重复读? 数据库第三范式和第四范式区别? ...
请教mysql一个字段同时满足多个条件 java mysql
2022-03-22 16:49

回答 3 已采纳嗯，看着题我理解了一会觉得你的意思也许和我想的一样：你的意思是不是类比去查询一个id 这个id要满足不等于4，大于3小于5类似这样的条件.如果是的话 SELECT * FROM `user` wher
MySQL插入和更新查询在插入大数据时出错 mysql php
2014-06-26 10:28

回答 1 已采纳 Your content probably contains quote characters, which you need to escape. $bcontent = mysql_real
如何使用php获取mysql的最后一个ID mysql php
2013-08-12 09:57

回答 4 已采纳 Bloody hell, why don't you just store the damn id, into a hidden post field and pass it to the sec
数据安全与网络安全——基于php+MySql实现简易留言板（附全资料超级详细！）
2023-09-30 21:40

星川皆无恙的博客这篇文章是基于PHP和MySQL实现的一个简易留言板。该留言板具有用户注册、登录、发表留言以及查看留言等功能。首先，用户可以通过注册功能创建自己的账号，然后使用该账号进行登录。登录成功后，用户可以发表留言，...
前几天遇到一个MySQL面试题 mysql sql 数据库
2022-03-15 22:02

回答 1 已采纳 mysql 8.0以上支持开窗函数,可以使用row_number进行分组排序 select t.student_num,t.student_name,t.course_name,t.score fro
大数据应用技术实验报告六 Hive和MySQL
2019-05-25 16:24

老无所依的博客 <name>javax.jdo.option.ConnectionURL</name> jdbc:mysql://bigdata131:3306/hivedb?createDatabaseIfNotExist=true JDBC connect string for a JDBC metastore <name>javax.jdo.option....
基于echarts+PHP+MySQL的数据大屏
2020-08-18 11:03

_JaDcLen_的博客最近学习了如何使用echarts+PHP+MySQL制作出数据大屏，简单地阐述下我学习，制作过程，希望能给到想要学习类似数据可视化界面的朋友们一些建议。因为我使用的语言是PHP(当然Python感觉也行),因此没有任何基础的同学...
没有解决我的问题, 去提问

悬赏问题

¥60 求一个简单的网页(标签-安全|关键词-上传)
¥35 lstm时间序列共享单车预测，loss值优化，参数优化算法
¥15 基于卷积神经网络的声纹识别
¥15 Python中的request，如何使用ssr节点，通过代理requests网页。本人在泰国，需要用大陆ip才能玩网页游戏，合法合规。
¥100 为什么这个恒流源电路不能恒流？
¥15 有偿求跨组件数据流路径图
¥15 写一个方法checkPerson，入参实体类Person，出参布尔值
¥15 我想咨询一下路面纹理三维点云数据处理的一些问题，上传的坐标文件里是怎么对无序点进行编号的，以及xy坐标在处理的时候是进行整体模型分片处理的吗
¥15 CSAPPattacklab
¥15 一直显示正在等待HID—ISP

码龄粉丝数原力等级 --

用mysql和php创建一个url树

2条回答默认最新

码龄粉丝数原力等级 --

悬赏问题

用mysql和php创建一个url树

2条回答 默认 最新

悬赏问题

2条回答默认最新