题解 | #统计活跃间隔对用户分级结果#
统计活跃间隔对用户分级结果
https://www.nowcoder.com/practice/6765b4a4f260455bae513a60b6eed0af
题目:
统计活跃间隔对用户分级后,各活跃等级用户占比,结果保留两位小数,且按占比降序排序。
难点:
近7天表示:包含当天T的近7天,即闭区间[T-6, T]。
用户分级:
1)活跃用户:最近活跃在7天内(小于等于T-6),且距离时间最长的一次活跃在7天外(大于T-6)
2)新晋用户:最近活跃和距离时间最长的一次活跃都在7天内(小于等于T-6)
3)沉睡用户:最近活跃在7天外(大于T-6)且在30天内(小于等于T-29)
4)流失用户:最近活跃在30天外(大于T-29)今天:指当天表格tb_user_log最近的日期
解题思路
步骤 1:先为用户的每一次活跃,标注上与“今天”的时间差
- “今天”的日期,可以通过开窗函数得到
SELECT uid, artical_id, DATEDIFF(MAX(DATE(in_time)) OVER (), DATE(in_time)) AS date_diff FROM tb_user_log
步骤 2:找出每个用户的最近一次活跃的时间差、最远一次活动的时间差,并根据前两者对用户进行等级标注
- 多说一嘴,CASE WHEN是在WHERE和GROUP BY之后执行的(开窗函数也是)
SELECT uid, MIN(date_diff) AS latest_act, MAX(date_diff) AS earliest_act, (CASE WHEN MIN(date_diff) <= 6 AND MAX(date_diff) > 6 THEN "忠实用户" WHEN MIN(date_diff) <= 6 AND MAX(date_diff) <= 6 THEN "新晋用户" WHEN MIN(date_diff) > 6 AND MAX(date_diff) < 29 THEN "沉睡用户" WHEN MIN(date_diff) > 29 THEN "流失用户" ELSE NULL END) AS user_grade FROM (SELECT uid, artical_id, DATEDIFF(MAX(DATE(in_time)) OVER (), DATE(in_time)) AS date_diff FROM tb_user_log) AS t1
步骤 3:根据步骤2结果,可以直接计算每个等级的占比
- 这里也使用了开窗函数,但需要注意的是:开窗函数是在WHERE和GROUP BY之后才执行的,也就是说,开窗函数中包含的字段应该是基于“WHERE和GROUP BY后所SELECT的字段”得出的
SELECT user_grade, /* 在根据用户等级分组计算得出每个等级的用户数之后,再使用SUM()开窗函数来获取总的用户数 */ ROUND(COUNT(uid) / SUM(COUNT(uid)) OVER (), 2) AS ratio FROM (SELECT uid, MIN(date_diff) AS latest_act, MAX(date_diff) AS earliest_act, (CASE WHEN MIN(date_diff) <= 6 AND MAX(date_diff) > 6 THEN "忠实用户" WHEN MIN(date_diff) <= 6 AND MAX(date_diff) <= 6 THEN "新晋用户" WHEN MIN(date_diff) > 6 AND MAX(date_diff) < 29 THEN "沉睡用户" WHEN MIN(date_diff) > 29 THEN "流失用户" ELSE NULL END) AS user_grade FROM (SELECT uid, artical_id, DATEDIFF(MAX(DATE(in_time)) OVER (), DATE(in_time)) AS date_diff FROM tb_user_log) AS t1 GROUP BY uid) AS t2 GROUP BY user_grade ORDER BY ratio DESC;