留存率问题&连续天数问题
1、平均次日留存率
# 用户在某天刷题后第二天再来刷题的平均概率
# 解法一:找到第一次刷题的日期,根据uid相等左连接第二天刷题的日期 select count(t2.uid)/count(*) as avg_cnt from ( select distinct uid, min(date) as date from t group by uid ) as t1 left join ( select distinct uid, date from t ) as t2 on t1.uid = t2.uid and datediff(t2.date,t1.date) = 1 # 解法二:用lead()窗口函数,将同一用户连续两天的日期拼接 select avg(case when datediff(last_date,date) = 1 then 1 else 0 end) as avg_cnt from ( select uid,date lead(date) over (partition by uid order by date) as last_date from( select distinct uid,date from t ) t1 ) t2
2、新登录用户的次日成功留存率
分析:第一天登录的新用户 且 第二天也登陆的用户数/总用户数
user_id,date
# 解法一:分点判断 # 找到每天登陆的新用户 with t1 as ( select user_id, date, min(date) over (partition by user_id) as first_date, # 判断是否是新用户 lead(date) over (partition by user_id order by date) as new_date # 判断是否第二天登录 from t ) select sum((case when first_date = date and datediff(new_date,date) = 1 then 1 else 0 end))*1.0/count(distinct user_id) from t1 # 解法二:筛选出次日成功留存数 除以 总新用户数 # 新登录用户次日成功的留存率 (次日成功留存的用户数/总的新登录用户数) with t as ( # ① 使用lead()窗口函数增一列后移日期,计算差值DateDiff,得出距离第一次做题的时间间隔 select user_id, date, datediff(lead(date) over (partition by user_id order by date),date) as diff from login ) # ②次日成功留存率 = (次日成功留存的用户数*1.0/总的新登录用户数)!!一定要乘以1.0!! select round(count(distinct t.user_id)*1.0/(select count(distinct user_id) from login),3) as p from t where diff = 1 # 解法三:先构建新用户及日期,左连接日活表(相同用户且日期间隔1) # 新用户的次日成功的留存率 = 第一天且第二天也登录的用户数/总新用户 select round(count(l.user_id) / count(a.user_id),3) from( select distinct user_id,min(date(date)) as new_date from login group by user_id ) as a left join login l on a.user_id = l.user_id and datediff(l.date,a.new_date) = 1
3、每个日期新用户的次日留存率
分析:构建新试图(确定新用户、次日留存)→计算新用户❗次日留存(先用where筛选新用户)→每个日期(左连接、null变0)
with t as ( select user_id, date, min(date) over(partition by user_id ) as first_date, lead(date) over (partition by user_id order by date) as new_date from login), t1 as ( select date, round(sum(case when datediff(new_date,date) = 1 then 1 else 0 end)*1.0 / count(distinct user_id),3) as p from t where date = first_date group by date order by date ) select m.date,coalesce(t1.p,0.000) from (select distinct date from login) m left join t1 using(date)
4、查询连续登陆不少于3天的新注册用户
分析:
- 新注册用户:用where in 进行筛选
- 连续登录:row_number窗口函数进行编号,日期-编号 若相等,则连续
# 查询连续登陆不少于3天的新注册用户 with t as( select user_id, datediff(date(log_time),row_number() over (partition by user_id order by date(log_time))) as r from login_tb) select user_id from t where user_id in(select distinct user_id from register_tb) group by user_id having count(*) >= 3