筛选变量
- 2025-10-15下单2个以及2个以上状态为购买成功的C++或Java或Python课程的user_id
- 满足前面条件的第一次购买成功的C++或Java或Python课程的日期first_buy_date
- 满足前面条件的第二次购买成功的C++或Java或Python课程的日期second_buy_date
- 购买成功的C++或Java或Python课程的次数cnt
对比各变量的限定词,可发现重复率极高,则可以
- where部分进行日期、状态及课程的筛选
- first、second需对日期进行排序并连接
- 视情况用count/count()over()得到cnt
- 视情况用having count()>1/where cnt>1完成对user_id的筛选
难点——first_date,second_date如何排序与连接
方法一:row_number() over排序+left join连接相同两表(次日留存率常用方法1)
with q as
(select user_id,date,
min(date) over (partition by user_id) as first_buy_date,
row_number() over (partition by user_id order by date) as rn,
count(user_id) over (partition by user_id) as cnt
from order_info
where date>'2025-10-15' and status='completed' and
product_name in ('C++','Java','Python'))
select distinct q1.user_id,q1.first_buy_date,q2.date as second_buy_date,
q1.cnt
from q as q1
join q as q2
on q1.user_id=q2.user_id and q2.rn=2
where q1.cnt>1
order by user_id
方法二:lead() over排序(次日留存率常用方法2)+(date=first_date)连接
select user_id,first_buy_date,second_buy_date,cnt
from (select *,
min(date) over (partition by user_id) as first_buy_date,
lead(date,1) over (partition by user_id order by date)
as second_buy_date,
count(user_id) over (partition by user_id) as cnt
from order_info
where date>'2025-10-15'
and status='completed'
and product_name in ('C++','Python','Java')) as q
where cnt>1 and date=first_buy_date
order by user_id
方法三:row_number() over()排序+flag连接
with q as
(select user_id,date,
row_number() over (partition by user_id order by date) as rn,
count(user_id) over (partition by user_id) as cnt
from order_info
where date>'2025-10-15' and status='completed' and
product_name in ('C++','Java','Python'))
select user_id,
max(if(rn=1,date,null)) as first_buy_date, -- 此处的null换成0也可
max(if(rn=2,date,null)) as second_buy_date, -- 此处max不可换为sum,不然结果为整数型
cnt
from q
where cnt>1
group by user_id
order by user_id