题解 | #未完成率较高的50%用户近三个月答卷情况#
未完成率较高的50%用户近三个月答卷情况
https://www.nowcoder.com/practice/3e598a2dcd854db8b1a3c48e5904fe1c
题目:
请统计SQL试卷上未完成率较高的50%用户中,6级和7级用户在有试卷作答记录的近三个月中,每个月的答卷数目和完成数目。
按用户ID、月份升序排序。
难点:
- 本道题的难点在于百分位数的计算,看了不少回答,虽然高赞、也通过了所有示例,但我觉得percent_rank函数是最好的解决办法 —— 因为很多答案,我个人认为,其实都无法解决“当有多个uid拥有相同的未完成率"这一特殊情况,多数没有直接使用percent_rank()的答案其实都是舍弃了特定的uid
- 而percent_rank对于同等数值的行,则会赋予同一个百分比排名,这一点也就能解决上述特殊情况;并且,在本题目中貌似并未出现该种特殊情况,所以也就没继续细究了,欢迎各位讨论!
- 本题还能使用ntile(n) OVER ()来解决,但是该函数我还没用过
关于开窗函数的积累,有这么几个常用的开窗排序函数:
- RANK() OVER()
- DENSE_RANK() OVER()
- ROW_NUMBER() OVER()
- LEAD(field,n,default) OVER()
- LAG(field,n,default) OVER()
- NTILE(n) OVER
- PERCENT_RANK() OVER:关于PERCENT_RANK函数的讲解,详情可见此链接:https://www.begtut.com/mysql/mysql-percent_rank-function.html
解题思路
步骤 1:联结三个表,找出SQL试卷中每个用户的未完成数、作答总数、未完成率
SELECT er.uid, ui.level, SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) AS incomplete_cnt, /* 未完成数 */ COUNT(er.uid) AS total_cnt, /* 作答总数 */ SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) / COUNT(er.uid) AS incomplete_rate /* 未完成率 */ FROM exam_record AS er LEFT OUTER JOIN examination_info AS ei ON er.exam_id = ei.exam_id LEFT OUTER JOIN user_info AS ui ON er.uid = ui.uid WHERE ei.tag = "SQL" GROUP BY er.uid, ui.level
步骤 2:使用PERCENT_RANK函数,根据每个用户的未完成率,计算出对应的百分比排名
- 这里还需要继续说,开窗函数是可以在WHERE和GROUP BY之后才执行的,因此步骤 2的PERCENT_RANK函数是可以直接在步骤1的SELECT语句中添加:
SELECT er.uid, ui.level, SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) AS incomplete_cnt, COUNT(er.uid) AS total_cnt, SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) / COUNT(er.uid) AS incomplete_rate, /* 需要注意的是,由于未完成率和作答总数都是计算字段, 所以不能直接在PERCENT_RANK中使用它们的别名incomplete_cnt和total_cnt */ PERCENT_RANK() OVER (ORDER BY (SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) / COUNT(er.uid)) ASC) AS ranking FROM exam_record AS er LEFT OUTER JOIN examination_info AS ei ON er.exam_id = ei.exam_id LEFT OUTER JOIN user_info AS ui ON er.uid = ui.uid WHERE ei.tag = "SQL" GROUP BY er.uid, ui.level
步骤 3:选取出前50%的6/7级用户
- 只需要再嵌套一层查询即可:
SELECT t1.uid FROM (SELECT er.uid, ui.level, SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) AS incomplete_cnt, COUNT(er.uid) AS total_cnt, SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) / COUNT(er.uid) AS incomplete_rate, PERCENT_RANK() OVER (ORDER BY (SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) / COUNT(er.uid)) ASC) AS ranking FROM exam_record AS er LEFT OUTER JOIN examination_info AS ei ON er.exam_id = ei.exam_id LEFT OUTER JOIN user_info AS ui ON er.uid = ui.uid WHERE ei.tag = "SQL" GROUP BY er.uid, ui.level) AS t1 WHERE level IN (6,7) AND ranking >= 0.5
步骤 4:找出每个用户,每个月的答卷数目和完成数目,并指出月份所对应是近N个月
- 这一步其实也比较简单了,对于“近N个月”的界定,我是使用了RANK()函数来做:
SELECT uid, DATE_FORMAT(start_time, "%Y%m") AS start_month, COUNT(uid) AS total_cnt, SUM(CASE WHEN score IS NOT NULL THEN 1 ELSE 0 END) AS complete_cnt, RANK() OVER (PARTITION BY uid ORDER BY DATE_FORMAT(start_time, "%Y%m") DESC) AS m_ranking FROM exam_record GROUP BY uid, DATE_FORMAT(start_time, "%Y%m")
步骤 5:找出每个用户的近三个月数据即可
SELECT uid, start_month, total_cnt, complete_cnt FROM (SELECT uid, DATE_FORMAT(start_time, "%Y%m") AS start_month, COUNT(uid) AS total_cnt, SUM(CASE WHEN score IS NOT NULL THEN 1 ELSE 0 END) AS complete_cnt, RANK() OVER (PARTITION BY uid ORDER BY DATE_FORMAT(start_time, "%Y%m") DESC) AS m_ranking FROM exam_record GROUP BY uid, DATE_FORMAT(start_time, "%Y%m")) AS t2 WHERE m_ranking <= 3
步骤 6:最终步,只需要把步骤3和步骤5的结果联结起来,找到前50%的6/7级用户的近三个月数据即可
SELECT t3.uid, t4.start_month, t4.total_cnt, t4.complete_cnt FROM (SELECT t1.uid FROM (SELECT er.uid, ui.level, SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) AS incomplete_cnt, COUNT(er.uid) AS total_cnt, SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) / COUNT(er.uid) AS incomplete_rate, PERCENT_RANK() OVER (ORDER BY (SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) / COUNT(er.uid)) ASC) AS ranking FROM exam_record AS er LEFT OUTER JOIN examination_info AS ei ON er.exam_id = ei.exam_id LEFT OUTER JOIN user_info AS ui ON er.uid = ui.uid WHERE ei.tag = "SQL" GROUP BY er.uid, ui.level) AS t1 WHERE level IN (6,7) AND ranking >= 0.5) AS t3 INNER JOIN (SELECT uid, start_month, total_cnt, complete_cnt FROM (SELECT uid, DATE_FORMAT(start_time, "%Y%m") AS start_month, COUNT(uid) AS total_cnt, SUM(CASE WHEN score IS NOT NULL THEN 1 ELSE 0 END) AS complete_cnt, RANK() OVER (PARTITION BY uid ORDER BY DATE_FORMAT(start_time, "%Y%m") DESC) AS m_ranking FROM exam_record GROUP BY uid, DATE_FORMAT(start_time, "%Y%m")) AS t2 WHERE m_ranking <= 3) AS t4 ON t3.uid = t4.uid ORDER BY t3.uid ASC, t4.start_month ASC;