题解 | #对试卷得分做min-max归一化#
对试卷得分做min-max归一化
https://www.nowcoder.com/practice/2b7acdc7d1b9435bac377c1dcb3085d6
with t1 as ( select b.uid, a.exam_id, b.score from examination_info as a, exam_record as b where a.exam_id = b.exam_id and a.difficulty = 'hard' and b.score is not null ), t2 as ( SELECT a.exam_id, MAX(b.score) - MIN(b.score) as diff_score, MIN(b.score) as min_score, Max(b.score) as max_score FROM examination_info as a JOIN exam_record as b ON a.exam_id = b.exam_id WHERE a.difficulty = 'hard' AND b.score IS NOT NULL GROUP BY a.exam_id ), tc as ( select t1.uid, t1.exam_id, t1.score, if ( t2.min_score = t2.max_score, t2.min_score, (t1.score - t2.min_score) * 100 / t2.diff_score ) as diff_last_score from t1 left join t2 on t1.exam_id = t2.exam_id ) select uid, exam_id, round(avg(diff_last_score), 0) as avg_new_score from tc group by uid, exam_id order by exam_id asc, avg_new_score desc
#先读题:请你将用户作答高难度试卷的得分在每份试卷作答记录内执行min-max归一化后缩放到[0,100]区间,并输出用户ID、试卷ID、归一化后分数平均值;最后按照试卷ID升序、归一化分数降序输出
1、条件:作答高难度试卷(join加hard),最后按照试卷ID升序、归一化分数降序输出
2、内容:在每份试卷作答记录内执行min-max归一化后缩放到[0,100]区间,并输出用户ID、试卷ID、归一化后分数平均值
解决:
作答高难度试卷(join加hard)--------------join两个表备用,用where筛选
select b.uid, a.exam_id, b.score from examination_info as a, exam_record as b where a.exam_id = b.exam_id and a.difficulty = 'hard' and b.score is not null
min-max归一化后缩放到[0,100]区间---------求这些最大值,最小值,最大值-最小值
SELECT a.exam_id, MAX(b.score) - MIN(b.score) as diff_score, MIN(b.score) as min_score, Max(b.score) as max_score FROM examination_info as a JOIN exam_record as b ON a.exam_id = b.exam_id WHERE a.difficulty = 'hard' AND b.score IS NOT NULL GROUP BY a.exam_id
最后就是通过join 对数据进行标准化
select t1.uid, t1.exam_id, t1.score, if ( t2.min_score = t2.max_score, t2.min_score, (t1.score - t2.min_score) * 100 / t2.diff_score ) as diff_last_score from t1 left join t2 on t1.exam_id = t2.exam_id
最后在求相应的平均值,round()函数,avg()聚合函数
select uid, exam_id, round(avg(diff_last_score), 0) as avg_new_score from tc group by uid, exam_id order by exam_id asc, avg_new_score desc
不过现在max() over 窗口函数貌似可以一步到位
MIN(score) OVER(PARTITION BY exam_id) min_x, #求每类试卷的得分最小值 MAX(score)OVER(PARTITION BY exam_id) max_x #求每类试卷的得分最大值
结束了,这个的判断很重要avg求均值,如果是1个数据就会叛定为null,所以必须使用if函数进行限制
结束