题解 | #未完成率较高的50%用户近三个月答卷情况#
未完成率较高的50%用户近三个月答卷情况
https://www.nowcoder.com/practice/3e598a2dcd854db8b1a3c48e5904fe1c
花了 1.5小时的题目还是值得我纪念一下。。其中前一个小时我都在“去重”的坑里。。
看到解释表中的第二行——用户 1002未完成数、作答总数分别是 4、8,而不是 5、9。我找了半天原因,于是在 exam_record 表中发现了这样两行(牛客这自带插入表格真难用😥):
id | uid | exam_id | start_time | submit_time | score |
10 | 1002 | 9002 | 2020-02-02 12:01:01 | None | None |
11 | 1002 | 9002 | 2020-02-02 12:01:01 | 2020-02-02 12:43:01 | 81 |
因此我以为需要先去重(即 uid 用户ID, exam_id 试卷ID, start_time 开始作答时间,这三项完全一致的为重复行,现实生活中这样认为也比较合理,例如重定向导致的重复行)。
去重:
- 首先用 dense_rank 根据 uid, exam_id, start_time 排序编号,并按 submit_time倒叙排序,记为 t1。这样属于重复行的可以编号相同,并且submit_time 不为空的会优先排在该组的第一行;
- 然后用 row_number 以dr分组,根据 uid, exam_id, start_time 排序编号,记为 t2;
- 最后筛选出 t2 中的 rk = 1 即为去重结果。
with t as
(
select *
from
(
select *, row_number() over(partition by dr order by uid, exam_id, start_time) rk
from
(
select *, dense_rank() over(order by uid, exam_id, start_time) dr
from exam_record
order by dr, submit_time desc
)t1
)t2
where rk=1
)
结果一个小时过去了,案例不通过,重读题目,嗯????SQL试卷?!?tnnd
with a1 as
(
select uid, sum(if(submit_time is null, 1, 0)) incomplete_cnt,
count(*) total_cnt,
sum(if(submit_time is null, 1, 0))/count(*) incomplete_rate,
PERCENT_RANK() over(order by sum(if(submit_time is null, 1, 0))/count(*) desc) rk
from exam_record
left join examination_info using(exam_id)
where tag='SQL'
group by uid
)
select uid, start_month, total_cnt, complete_cnt
from
(
select uid, date_format(start_time, '%Y%m') start_month, count(start_time) total_cnt, sum(if(submit_time is not null,1,0)) complete_cnt,
row_number() over(partition by uid order by date_format(start_time, '%Y%m') desc) time_rk
from exam_record
where uid in
(
select uid
from a1
left join user_info using(uid)
where rk<=0.5
and (level=6 or level=7)
)
group by uid, start_month
)a2
where time_rk<=3 # 查找作答时间最近的3个月
order by uid, start_month
;
划重点:
- PERCENT_RANK() 用得比较少,新知识点
- 查找作答时间最近的3个月,可以用排序编号的方法选出前三,根选出每个部门工资前三的员工的题目思路类似。


