题解|有间隔日期的连接|#新用户次日留存率#
2021年11月每天新用户的次日留存率
https://www.nowcoder.com/practice/1fc0e75f07434ef5ba4f1fb2aa83a450
## 思路:分别创建‘每天新用户表’和‘用户活跃表’ -> 使用‘第一天新用户且第二天仍活跃’进行连接,保留相关记录 -> 分组计算留存率 ## 额外的思考:一般不会存在第一天in_time,第三天才out_time的情况,所以把in_time和out_time使用UNION进行去重取并集作为活跃日期是合理的 ## 第一步:每天新用户表 和 用户活跃表 # (SELECT uid,MIN(DATE(in_time)) AS dt # FROM tb_user_log # GROUP BY uid) t1 #每天新用户表 # (SELECT uid,DATE(in_time) AS dt # FROM tb_user_log # UINON # SELECT uid,DATE(out_time) AS dt # FROM tb_user_log) t2 #用户活跃表 # ## 第二步:使用‘第一天新用户第二天活跃的时间’ 作为连接条件 # t1 LEFT JOIN t2 # ON t1.uid = t2.uid # AND t1.dt = DATE_SUB(t2.dt,INTERVAL 1 DAY) ## 或者写为:t2.dt = DATE_ADD(t1.dt,INTERVAL 1 DAY) # ## 对这里连接的理解,如果不存在相差1天的日期,那么这行就是只保留了t1作为基准表进行左连接的记录,右侧没有连接记录的地方就是空值。 # ## 所以分别COUNT t2.uid就是第二天仍存在, COUNT t1.uid就是第一天新用户 ## 第三步:计算留存率 SELECT t1.dt,ROUND(COUNT(t2.uid)/COUNT(t1.uid),2) AS uv_rate FROM( SELECT uid,MIN(DATE(in_time)) AS dt FROM tb_user_log GROUP BY uid) t1 LEFT JOIN( SELECT uid,DATE(in_time) AS dt FROM tb_user_log UNION SELECT uid,DATE(out_time) AS dt FROM tb_user_log)t2 ON t1.uid = t2.uid AND t1.dt = DATE_SUB(t2.dt,INTERVAL 1 DAY) WHERE DATE_FORMAT(t1.dt,'%Y-%m') = '2021-11' GROUP BY t1.dt ORDER BY t1.dt ####如果需要同时输出次日&7日留存率按照以下进行: SELECT t1.dt,ROUND(COUNT(t2.uid)/COUNT(t1.uid),2) AS uv_rate,ROUND(COUNT(t3.uid)/COUNT(t1.uid),2) AS uv_rate_7d FROM( SELECT uid,MIN(DATE(in_time)) AS dt FROM tb_user_log GROUP BY uid) t1 LEFT JOIN( SELECT uid,DATE(in_time) AS dt FROM tb_user_log UNION SELECT uid,DATE(out_time) AS dt FROM tb_user_log)t2 ON t1.uid = t2.uid AND t1.dt = DATE_SUB(t2.dt,INTERVAL 1 DAY) LEFT JOIN( SELECT uid,DATE(in_time) AS dt FROM tb_user_log UNION SELECT uid,DATE(out_time) AS dt FROM tb_user_log)t3 ON t1.uid = t3.uid AND t1.dt = DATE_SUB(t3.dt,INTERVAL 6 DAY) WHERE DATE_FORMAT(t1.dt,'%Y-%m') = '2021-11' GROUP BY t1.dt ORDER BY t1.dt