题解 | #未完成率较高的50%用户近三个月答卷情况#

未完成率较高的50%用户近三个月答卷情况

https://www.nowcoder.com/practice/3e598a2dcd854db8b1a3c48e5904fe1c

题目:

请统计SQL试卷上未完成率较高的50%用户中,6级和7级用户在有试卷作答记录的近三个月中,每个月的答卷数目和完成数目。
按用户ID、月份升序排序。


难点:

  • 本道题的难点在于百分位数的计算,看了不少回答,虽然高赞、也通过了所有示例,但我觉得percent_rank函数是最好的解决办法 —— 因为很多答案,我个人认为,其实都无法解决“当有多个uid拥有相同的未完成率"这一特殊情况,多数没有直接使用percent_rank()的答案其实都是舍弃了特定的uid
  • 而percent_rank对于同等数值的行,则会赋予同一个百分比排名,这一点也就能解决上述特殊情况;并且,在本题目中貌似并未出现该种特殊情况,所以也就没继续细究了,欢迎各位讨论!
  • 本题还能使用ntile(n) OVER ()来解决,但是该函数我还没用过

关于开窗函数的积累,有这么几个常用的开窗排序函数:


解题思路

步骤 1:联结三个表,找出SQL试卷中每个用户的未完成数、作答总数、未完成率

SELECT er.uid, ui.level,
       SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) AS incomplete_cnt, /* 未完成数 */
       COUNT(er.uid) AS total_cnt, /* 作答总数 */
       SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) / COUNT(er.uid) AS incomplete_rate /* 未完成率 */
FROM exam_record AS er LEFT OUTER JOIN examination_info AS ei ON er.exam_id = ei.exam_id
                       LEFT OUTER JOIN user_info AS ui ON er.uid = ui.uid
WHERE ei.tag = "SQL"
GROUP BY er.uid, ui.level

步骤 2:使用PERCENT_RANK函数,根据每个用户的未完成率,计算出对应的百分比排名

  • 这里还需要继续说,开窗函数是可以在WHERE和GROUP BY之后才执行的,因此步骤 2的PERCENT_RANK函数是可以直接在步骤1的SELECT语句中添加:
    SELECT er.uid, ui.level,
         SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) AS incomplete_cnt, 
         COUNT(er.uid) AS total_cnt, 
         SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) / COUNT(er.uid) AS incomplete_rate,
         /* 需要注意的是,由于未完成率和作答总数都是计算字段,
            所以不能直接在PERCENT_RANK中使用它们的别名incomplete_cnt和total_cnt */
         PERCENT_RANK() OVER (ORDER BY 
                                      (SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) / COUNT(er.uid)) ASC) AS ranking
    FROM exam_record AS er LEFT OUTER JOIN examination_info AS ei ON er.exam_id = ei.exam_id
                         LEFT OUTER JOIN user_info AS ui ON er.uid = ui.uid
    WHERE ei.tag = "SQL"
    GROUP BY er.uid, ui.level

步骤 3:选取出前50%的6/7级用户

  • 只需要再嵌套一层查询即可:
    SELECT t1.uid
    FROM
          (SELECT er.uid, ui.level,
                  SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) AS incomplete_cnt,
                  COUNT(er.uid) AS total_cnt,
                  SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) / COUNT(er.uid) AS incomplete_rate,
                  PERCENT_RANK() OVER (ORDER BY 
                                      (SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) / COUNT(er.uid)) ASC) AS ranking
           FROM exam_record AS er LEFT OUTER JOIN examination_info AS ei
           ON er.exam_id = ei.exam_id
           LEFT OUTER JOIN user_info AS ui ON er.uid = ui.uid
           WHERE ei.tag = "SQL"
           GROUP BY er.uid, ui.level) AS t1
    WHERE level IN (6,7) AND ranking >= 0.5

步骤 4:找出每个用户,每个月的答卷数目和完成数目,并指出月份所对应是近N个月

  • 这一步其实也比较简单了,对于“近N个月”的界定,我是使用了RANK()函数来做:
    SELECT uid, 
         DATE_FORMAT(start_time, "%Y%m") AS start_month, 
         COUNT(uid) AS total_cnt,
         SUM(CASE WHEN score IS NOT NULL THEN 1 ELSE 0 END) AS complete_cnt,
         RANK() OVER (PARTITION BY uid ORDER BY DATE_FORMAT(start_time, "%Y%m") DESC) AS m_ranking
    FROM exam_record
    GROUP BY uid, DATE_FORMAT(start_time, "%Y%m")

步骤 5:找出每个用户的近三个月数据即可

SELECT uid, start_month, total_cnt, complete_cnt
FROM
     (SELECT uid, 
             DATE_FORMAT(start_time, "%Y%m") AS start_month, 
             COUNT(uid) AS total_cnt,
             SUM(CASE WHEN score IS NOT NULL THEN 1 ELSE 0 END) AS complete_cnt,
             RANK() OVER (PARTITION BY uid ORDER BY DATE_FORMAT(start_time, "%Y%m") DESC) AS m_ranking
      FROM exam_record
      GROUP BY uid, DATE_FORMAT(start_time, "%Y%m")) AS t2
WHERE m_ranking <= 3

步骤 6:最终步,只需要把步骤3和步骤5的结果联结起来,找到前50%的6/7级用户的近三个月数据即可

SELECT t3.uid, t4.start_month, t4.total_cnt, t4.complete_cnt
FROM
    (SELECT t1.uid
     FROM
         (SELECT er.uid, ui.level,
                 SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) AS incomplete_cnt,
                 COUNT(er.uid) AS total_cnt,
                 SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) / COUNT(er.uid) AS incomplete_rate,
                 PERCENT_RANK() OVER (ORDER BY 
                                    (SUM(CASE WHEN er.score IS NULL THEN 1 ELSE 0 END) / COUNT(er.uid)) ASC) AS ranking
          FROM exam_record AS er LEFT OUTER JOIN examination_info AS ei
          ON er.exam_id = ei.exam_id
          LEFT OUTER JOIN user_info AS ui ON er.uid = ui.uid
          WHERE ei.tag = "SQL"
          GROUP BY er.uid, ui.level) AS t1
     WHERE level IN (6,7) AND ranking >= 0.5) AS t3

INNER JOIN

    (SELECT uid, start_month, total_cnt, complete_cnt
     FROM
         (SELECT uid, 
                 DATE_FORMAT(start_time, "%Y%m") AS start_month, 
                 COUNT(uid) AS total_cnt,
                 SUM(CASE WHEN score IS NOT NULL THEN 1 ELSE 0 END) AS complete_cnt,
                 RANK() OVER (PARTITION BY uid ORDER BY DATE_FORMAT(start_time, "%Y%m") DESC) AS m_ranking
         FROM exam_record
         GROUP BY uid, DATE_FORMAT(start_time, "%Y%m")) AS t2
     WHERE m_ranking <= 3) AS t4

ON t3.uid = t4.uid
ORDER BY t3.uid ASC, t4.start_month ASC;
全部评论

相关推荐

Kunnnnnnn:看这公司23年就成立了啊 还没倒闭呢
点赞 评论 收藏
分享
评论
点赞
收藏
分享

创作者周榜

更多
牛客网
牛客企业服务