题解 138 | #连续两次作答试卷的最大时间窗#
【场景】:时间窗
【分类】:专用窗口函数、嵌套子查询
分析思路
难点:
1.如何求窗口期:使用lag()或者lead(),再分组求最大值
2.作答试卷不去重
(1)统计2021年至少有两天作答过试卷的人
-
[条件]:where year(start_time) = 2021 group by uid having count(date(start_time)) >= 2
-
[使用]:count()、max()、min()
(2)得到平均每天作答试卷数、当前行日期和当前行的前一行日期
- [使用]:timestampdiff(day,min_date,max_date);
当前行的前一行日期
- [使用]:lag(date,1) over(partition by uid order by date) 对应 timestampdiff(day,pre_date,date)
当前行的后一行日期
- [使用]:lead(date,1) over(partition by uid order by date) 对应timestampdiff(day,date,latter_date)
(3)计算avg_exam_cnt,按最大时间窗和平均做答试卷套数倒序排序
得到平均每天作答试卷数、当前行日期和当前行的前一行日期的差求最大值,得到days_window和平均每天作答试卷数
-
[使用]: max(timestampdiff(day,pre_date,date))+1
-
[使用]:order by days_window desc,avg_exam_cnt desc
最终结果
select 查询结果 [用户ID;最大时间窗;平均做答试卷套数]
from 从哪张表中查询数据[嵌套from子查询]
where 查询条件 [空窗期不为空]
order by 按照指定条件排序 [按最大时间窗和平均做答试卷套数倒序排序]
扩展
前往查看: MySQL 窗口函数
求解代码
方法一
with子句+一步步拆解
with
temp as(
#2021年至少有两天作答过试卷的人
select
uid,
count(date(start_time)) as cnt_date,
max(date(start_time)) as max_date,
min(date(start_time)) as min_date
from exam_record
where year(start_time) = 2021
group by uid having count(date(start_time)) >= 2
)
,temp1 as(
#得到平均每天作答试卷数、当前行日期和当前行的前一行日期
select
uid,
cnt_date/(timestampdiff(day,min_date,max_date)+1) as avg_exam,
date,
lag(date,1) over(partition by uid order by date) as pre_date
from temp
join(
select distinct
uid,
date(start_time) as date
from exam_record
where year(start_time) = 2021
) attr using(uid)
)
,temp2 as(
#统计平均每天试卷数、最大时间窗
#当前行日期和当前行的前一行日期求差的最大值为days_window
select
uid,
max(avg_exam) as avg_exam,
max(timestampdiff(day,pre_date,date))+1 as days_window
from temp1
group by uid
)
#计算avg_exam_cnt,按最大时间窗和平均做答试卷套数倒序排序。
select
uid,
days_window,
round(avg_exam*days_window,2) as avg_exam_cnt
from temp2
where days_window is not null
order by days_window desc,avg_exam_cnt desc
优化代码
with
main as(
#2021年至少有两天作答过试卷的人
select
uid,
count(date(start_time)) as cnt_date,
max(date(start_time)) as max_date,
min(date(start_time)) as min_date
from exam_record
where year(start_time) = 2021
group by uid having count(date(start_time)) >= 2
)
,main1 as(
#得到平均每天作答试卷数、当前行日期和当前行的前一行日期
select
uid,
cnt_date/(timestampdiff(day,min_date,max_date)+1) as avg_exam,
date,
lag(date,1) over(partition by uid order by date) as pre_date
from main
join(
select distinct
uid,
date(start_time) as date
from exam_record
where year(start_time) = 2021
) attr using(uid)
)
#计算avg_exam_cnt,按最大时间窗和平均做答试卷套数倒序排序。
select
uid,
max(timestampdiff(day,pre_date,date))+1 as days_window,
round(max(avg_exam)*(max(timestampdiff(day,pre_date,date))+1),2) as avg_exam_cnt
from main1
where pre_date is not null
group by uid
order by days_window desc,avg_exam_cnt desc
方法二
from嵌套子查询
#计算avg_exam_cnt,按最大时间窗和平均做答试卷套数倒序排序。
select
uid,
max(timestampdiff(day,pre_date,date))+1 as days_window,
round(max(avg_exam)*(max(timestampdiff(day,pre_date,date))+1),2) as avg_exam_cnt
from(
#得到平均每天作答试卷数、当前行日期和当前行的前一行日期
select
uid,
cnt_date/(timestampdiff(day,min_date,max_date)+1) as avg_exam,
date,
lag(date,1) over(partition by uid order by date) as pre_date
from(
#2021年至少有两天作答过试卷的人
select
uid,
count(date(start_time)) as cnt_date,
max(date(start_time)) as max_date,
min(date(start_time)) as min_date
from exam_record
where year(start_time) = 2021
group by uid having count(date(start_time)) >= 2
) main
join(
select distinct
uid,
date(start_time) as date
from exam_record
where year(start_time) = 2021
) attr using(uid)
) main1
where pre_date is not null
group by uid
order by days_window desc,avg_exam_cnt desc