sql查询第几高不用order by的方法
sql查询第几高不用order by的方法
今天做遇到了这样一个有意思的sql题(来自牛客网)
查找当前薪水(to_date='9999-01-01')排名第二多的员工编号emp_no、薪水salary、last_name以及first_name,你可以不使用order by完成吗
CREATE TABLE employees` ( emp_no int(11) NOT NULL, birth_date date NOT NULL, first_name varchar(14) NOT NULL, last_name varchar(16) NOT NULL, gender char(1) NOT NULL, hire_date date NOT NULL, PRIMARY KEY (emp_no)); CREATE TABLE salaries ( emp_no int(11) NOT NULL, salary int(11) NOT NULL, from_date date NOT NULL, to_date date NOT NULL, PRIMARY KEY (emp_no,from_date)); -- 这个数据是其他题的,拿来验证 INSERT INTO salaries VALUES(10001,88958,'1986-06-26','9999-01-01'); INSERT INTO salaries VALUES(10003,43311,'2001-12-01','9999-01-01'); INSERT INTO salaries VALUES(10004,70698,'1986-12-01','1995-12-01'); INSERT INTO salaries VALUES(10004,74057,'1995-12-01','9999-01-01'); INSERT INTO salaries VALUES(10006,43311,'2001-08-02','9999-01-01'); INSERT INTO salaries VALUES(10007,88070,'2002-02-07','9999-01-01'); INSERT INTO titles VALUES(10001,'Senior Engineer','1986-06-26','9999-01-01'); INSERT INTO titles VALUES(10003,'Senior Engineer','2001-12-01','9999-01-01'); INSERT INTO titles VALUES(10004,'Engineer','1986-12-01','1995-12-01'); INSERT INTO titles VALUES(10004,'Senior Engineer','1995-12-01','9999-01-01'); INSERT INTO titles VALUES(10006,'Senior Engineer','2001-08-02','9999-01-01'); INSERT INTO titles VALUES(10007,'Senior Staff','1996-02-11','9999-01-01');
要求是不要用order by,这个题并不简单,评论区很多方法都是错的,但因为通过了就以为对了,没有仔细多输几个案例,我自己想到的是用两次max,不过评论区教了一个比较实用的方法,可以查到任意第几大的方法:
select e.emp_no,s.salary,e.last_name,e.first_name from employees e join salaries s on e.emp_no=s.emp_no and s.to_date='9999-01-01' and s.salary = ( select s1.salary from salaries s1 join salaries s2 on s1.salary<=s2.salary -- 最重要的部分 and s1.to_date='9999-01-01' and s2.to_date='9999-01-01' group by s1.salary having count(distinct s2.salary)=2 )
接下来将解析这个方法是怎么来的
他最重要的片段就是类似于
select * from A,B where A.xx<B.xx -- 或者 select * from A join B on A.xx<B.xx
在这里解释一下这个表达是怎么运作的以及怎么实现找到第几大的功能的
首先,A,B或者A join B是做的笛卡尔积(A的所有行跟B的每一行进行组合)
以本题为例,展示编号和薪水情况
select s1.emp_no s1, s1.salary, s2.emp_no s2, s2.salary from salaries s1 join salaries s2
结果是(未展示完)
可以看到是s1的所有行跟s2的一行组合,完了又继续跟下一行组合,直到结束
其实就是把自己和剩下的所有行都能做个组合(比如s2的10001和其他所有行都组合了)
所以,两表的salary可以做一个比较了
加上条件s1.salary<=s2.salary
select s1.emp_no s1, s1.salary, s2.emp_no s2, s2.salary from salaries s1 join salaries s2 on s1.salary<=s2.salary
结果
若每一个salary都不相同,则
s2同一个salary留下几个就是salary第几小(比如10001的slary比其余所有的都大,就全留下来了,第6小)
s1同一个salary留下几个就是salary第几高(注意每一个左边salary也跟剩下所有行都比较过了,小于几个就是第几嘛,一个都不小就第一)
这里要的是第二高,所以对s1的salary做一个分组,再统计一下就ok了
select s1.emp_no s1, s1.salary from salaries s1 join salaries s2 on s1.salary<=s2.salary group by s1.salary having count(1)=2
结果
但这里还有问题.比如要倒数第一高呢?(有两个)
select s2.emp_no s2, s2.salary -- 显示s2的 from salaries s1 join salaries s2 on s1.salary<=s2.salary group by s2.salary -- 注意切换成s2的salary分组 having count(1)=1
发现查不到,问题就是出在有不同emp_no有相同salary
观察s2.salary分组的时候,分组前:
看看其中两个组的情况
他们红线部分在之前s1.salary<=s2.salary都比较了,因此两行都留下来了,但其实只需留一行,因为实际上s2.salary只大这个值一次(排大小的话),所以可以在count前加一个distinct
select s2.emp_no s2, s2.salary from salaries s1 join salaries s2 on s1.salary<=s2.salary group by s2.salary having count(distinct s1.salary)=1 -- 注意分组是s2,这里是s1,因为去除的是之前重复比较的值
结果
还是没有显示出两个,不过已经得到预期效果了,那就是salary是对的,这样的话只需要再select一次就行了
select * from salaries where salary = ( select s2.salary from salaries s1 join salaries s2 on s1.salary<=s2.salary group by s2.salary having count(distinct s1.salary)=1)
结果
再按题目要求加上特殊信息就完成了
select s.emp_no,s.salary,e.last_name,e.first_name from salaries s join employees e on s.emp_no=e.emp_no where salary = ( select s2.salary from salaries s1 join salaries s2 on s1.salary<=s2.salary and s1.to_date='9999-01-01' and s2.to_date='9999-01-01' -- 注意要的是当前薪水中的排行信息(9999-01-01代表当前) group by s2.salary having count(distinct s1.salary)=1) -- 注意我还是求的倒数第一小
结果
试试把条件改成题目的第二高
select s.emp_no,s.salary,e.last_name,e.first_name from salaries s join employees e on s.emp_no=e.emp_no where salary = ( select s1.salary from salaries s1 join salaries s2 on s1.salary<=s2.salary and s1.to_date='9999-01-01' and s2.to_date='9999-01-01' group by s1.salary having count(distinct s2.salary)=2)
结果
大功告成
不过有个疑问
这里的排名方式够不够好呢?比方说相同排名之后没有进行间隔处理(两个第一名,则下一个名次其实应该是第三,但这样就查不到第二名了)