留存率问题&连续天数问题

1、平均次日留存率

# 用户在某天刷题后第二天再来刷题的平均概率

# 解法一:找到第一次刷题的日期,根据uid相等左连接第二天刷题的日期
select count(t2.uid)/count(*) as avg_cnt
from (
    select distinct uid,
        min(date) as date 
    from t
    group by uid 
) as t1
left join (
    select distinct uid,
           date
    from t
) as t2 on t1.uid = t2.uid and datediff(t2.date,t1.date) = 1

# 解法二:用lead()窗口函数,将同一用户连续两天的日期拼接
select avg(case when datediff(last_date,date) = 1 then 1 else 0 end) as avg_cnt
from (
select uid,date
	   lead(date) over (partition by uid order by date) as last_date
from(
	select distinct uid,date
	from t
  ) t1
) t2
  

2、新登录用户的次日成功留存率

分析:第一天登录的新用户 且 第二天也登陆的用户数/总用户数

user_id,date

# 解法一:分点判断
# 找到每天登陆的新用户
with t1 as (
select user_id,
	   date,
	   min(date) over (partition by user_id) as first_date, # 判断是否是新用户
	   lead(date) over (partition by user_id order by date) as new_date # 判断是否第二天登录
from t 
)
select sum((case when first_date = date and datediff(new_date,date) = 1 then 1 else 0 end))*1.0/count(distinct user_id)
from t1


# 解法二:筛选出次日成功留存数 除以 总新用户数
# 新登录用户次日成功的留存率 (次日成功留存的用户数/总的新登录用户数)
with t as (
# ① 使用lead()窗口函数增一列后移日期,计算差值DateDiff,得出距离第一次做题的时间间隔
select user_id,
       date,
       datediff(lead(date) over (partition by user_id order by date),date) as diff 
from login )

# ②次日成功留存率 = (次日成功留存的用户数*1.0/总的新登录用户数)!!一定要乘以1.0!!
select round(count(distinct t.user_id)*1.0/(select count(distinct user_id) from login),3) as p
from t
where diff = 1


# 解法三:先构建新用户及日期,左连接日活表(相同用户且日期间隔1)
# 新用户的次日成功的留存率 = 第一天且第二天也登录的用户数/总新用户
select round(count(l.user_id) / count(a.user_id),3)
from(
select distinct user_id,min(date(date)) as new_date 
from login
group by user_id 
) as  a 
left join login l on a.user_id = l.user_id and datediff(l.date,a.new_date) = 1

3、每个日期新用户的次日留存率

分析:构建新试图(确定新用户、次日留存)→计算新用户❗次日留存(先用where筛选新用户)→每个日期(左连接、null变0)

with t as (
select user_id,
       date,
       min(date) over(partition by user_id ) as first_date,
       lead(date) over (partition by user_id order by date) as new_date
from login),
t1 as (
select  date,
        round(sum(case when datediff(new_date,date) = 1 then 1 else 0 end)*1.0 / count(distinct user_id),3) as p
from t
where date = first_date
group by date 
order by date )

select m.date,coalesce(t1.p,0.000)
from (select distinct date from login) m left join t1 using(date)

4、查询连续登陆不少于3天的新注册用户

分析:

  • 新注册用户:用where in 进行筛选
  • 连续登录:row_number窗口函数进行编号,日期-编号 若相等,则连续
# 查询连续登陆不少于3天的新注册用户
with t as(
select user_id,
       datediff(date(log_time),row_number() over (partition by user_id order by date(log_time))) as r
from login_tb)

select user_id
from t
where user_id in(select distinct user_id from register_tb)
group by user_id
having count(*) >= 3

全部评论

相关推荐

评论
1
收藏
分享

创作者周榜

更多
牛客网
牛客企业服务