https://zb6okqgl47.feishu.cn/docx/doxcnRuV1CNpKZ5jmcNFIwa34fc
这是一道数据专员的题目,个人觉得难度有点大,向各位赐教,谢谢
备注:虽然说语言不限,但是请各位有python 或者mysql(我只会这两个)
1、涉及SQL(使用 MySQL8 数据库):
WITH t0 AS ( -- 提取有效时间范围内的有效数据及日期转月份,并去重(一个月一个用户只计算一次)
SELECT DISTINCT user_id, date_add( order_date, INTERVAL -DAY( order_date ) + 1 day ) mon FROM t_order WHERE order_status = '交易成功' AND order_date >= date'2021-01-01' AND order_date < date'2021-10-01' )
, t1 AS ( -- 人员按月份排序及提取第一次购买月份
SELECT user_id, mon, ROW_NUMBER( ) OVER( PARTITION BY user_id ORDER BY mon ) rn, FIRST_VALUE( mon ) OVER( PARTITION BY user_id ORDER BY mon ) mm FROM t0 )
, t2 AS ( -- 计算每个人员的各月份购买分布
SELECT mm, user_id, rn
, ( CASE WHEN rn > 1 AND mon = date_add( mm, INTERVAL 1 MONTH ) THEN 1 WHEN date_add( mm, INTERVAL 1 MONTH ) >= date'2021-10-01' THEN NULL ELSE 0 END ) M1
, ( CASE WHEN rn > 1 AND mon = date_add( mm, INTERVAL 2 MONTH ) THEN 1 WHEN date_add( mm, INTERVAL 2 MONTH ) >= date'2021-10-01' THEN NULL ELSE 0 END ) M2
, ( CASE WHEN rn > 1 AND mon = date_add( mm, INTERVAL 3 MONTH ) THEN 1 WHEN date_add( mm, INTERVAL 3 MONTH ) >= date'2021-10-01' THEN NULL ELSE 0 END ) M3
, ( CASE WHEN rn > 1 AND mon = date_add( mm, INTERVAL 4 MONTH ) THEN 1 WHEN date_add( mm, INTERVAL 4 MONTH ) >= date'2021-10-01' THEN NULL ELSE 0 END ) M4
, ( CASE WHEN rn > 1 AND mon = date_add( mm, INTERVAL 5 MONTH ) THEN 1 WHEN date_add( mm, INTERVAL 5 MONTH ) >= date'2021-10-01' THEN NULL ELSE 0 END ) M5
, ( CASE WHEN rn > 1 AND mon = date_add( mm, INTERVAL 6 MONTH ) THEN 1 WHEN date_add( mm, INTERVAL 6 MONTH ) >= date'2021-10-01' THEN NULL ELSE 0 END ) M6
, ( CASE WHEN rn > 1 AND mon = date_add( mm, INTERVAL 7 MONTH ) THEN 1 WHEN date_add( mm, INTERVAL 7 MONTH ) >= date'2021-10-01' THEN NULL ELSE 0 END ) M7
, ( CASE WHEN rn > 1 AND mon = date_add( mm, INTERVAL 8 MONTH ) THEN 1 WHEN date_add( mm, INTERVAL 8 MONTH ) >= date'2021-10-01' THEN NULL ELSE 0 END ) M8
, ( CASE WHEN rn > 1 AND mon = date_add( mm, INTERVAL 9 MONTH ) THEN 1 WHEN date_add( mm, INTERVAL 9 MONTH ) >= date'2021-10-01' THEN NULL ELSE 0 END ) M9
-- , ( CASE WHEN rn > 1 AND mon = date_add( mm, INTERVAL 10 MONTH ) THEN 1 WHEN date_add( mm, INTERVAL 10 MONTH ) >= date'2021-10-01' THEN NULL ELSE 0 END ) M10
-- , ( CASE WHEN rn > 1 AND mon = date_add( mm, INTERVAL 11 MONTH ) THEN 1 WHEN date_add( mm, INTERVAL 11 MONTH ) >= date'2021-10-01' THEN NULL ELSE 0 END ) M11
FROM t1 )
, t3 AS ( -- 统计各月份购买及回购数量
SELECT mm, sum( CASE WHEN rn = 1 THEN 1 ELSE 0 END ) dy
, sum( m1 ) M1, sum( m2 ) M2, sum( m3 ) M3, sum( m4 ) M4, sum( m5 ) M5, sum( m6 ) M6
, sum( m7 ) M7, sum( m8 ) M8, sum( m9 ) M9-- , sum( m10 ) M10, sum( m11 ) M11
FROM t2
GROUP BY mm )
SELECT mm 首次购买月份, dy 当月新增客户数量
, concat( truncate( M1 * 100 / ( CASE WHEN dy = 0 THEN 1 ELSE dy END ), 2 ), '%' ) AS M1
, concat( truncate( M2 * 100 / ( CASE WHEN dy = 0 THEN 1 ELSE dy END ), 2 ), '%' ) AS M2
, concat( truncate( M3 * 100 / ( CASE WHEN dy = 0 THEN 1 ELSE dy END ), 2 ), '%' ) AS M3
, concat( truncate( M4 * 100 / ( CASE WHEN dy = 0 THEN 1 ELSE dy END ), 2 ), '%' ) AS M4
, concat( truncate( M5 * 100 / ( CASE WHEN dy = 0 THEN 1 ELSE dy END ), 2 ), '%' ) AS M5
, concat( truncate( M6 * 100 / ( CASE WHEN dy = 0 THEN 1 ELSE dy END ), 2 ), '%' ) AS M6
, concat( truncate( M7 * 100 / ( CASE WHEN dy = 0 THEN 1 ELSE dy END ), 2 ), '%' ) AS M7
, concat( truncate( M8 * 100 / ( CASE WHEN dy = 0 THEN 1 ELSE dy END ), 2 ), '%' ) AS M8
, concat( truncate( M9 * 100 / ( CASE WHEN dy = 0 THEN 1 ELSE dy END ), 2 ), '%' ) AS M9
-- , concat( truncate( M10 * 100 / ( CASE WHEN dy = 0 THEN 1 ELSE dy END ), 2 ), '%' ) AS M10
-- , concat( truncate( M11 * 100 / ( CASE WHEN dy = 0 THEN 1 ELSE dy END ), 2 ), '%' ) AS M11
FROM t3
ORDER BY mm;
SQL中 t3 子查询的输出:
SQL的最终输出:
SQL说明:
①、SQL中的时间范围为:2021-01-01(包含)到 2021-10-01(不包含),程序中为月份范围参数
②、因为需求的时间范围不固定,因此具体SQL也不固定,我故意多输出了一个复购统计月份M9,实际SQL中需要将我屏蔽的M10、M11部分的类似SQL(其实就是所有的 M1 到 M11 部分),根据月份范围计算出需要输出的列的数量(即:[ (结束月份) - (开始月份) - (1) ] 个M列),动态生成具体SQL
2、程序:具体程序就是获取时间范围参数,连接数据库,动态生成上述SQL,执行SQL获取结果并输出,通过word或Excel另存为PDF即可