题解 | #每篇文章同一时刻最大在看人数#

每篇文章同一时刻最大在看人数

http://www.nowcoder.com/practice/fe24c93008b84e9592b35faa15755e48

方法一:

思路:

  1. 原始数据 alt

易错点:要筛去 artical_id=0的数据 2. 时刻的选取 alt

  • 题目要求“同一时刻”,而时间是连续变量,该选取什么时刻作为关注对象呢?根据例题画出了上图,可以直观发现,每一个有人进出的时刻都会人数变化,不妨取进出时刻为“关键时刻”point_time
  • in_timeout_time都是point_time,同时选取表格中的不同变量,可以分别选取再用union连接
  • 易错点!!!一定要用union,而不是union all!!! (笔者因为这个问题卡了一下午)

前者去重,后者不去重。 若不去重,后续统计同一时刻在看的总人数时,会重复计算同时有进出的时刻的人数,导致结果错误

  1. 是否在看的判断标准
  • 题目提到,若同一时刻有人进入也有离开,先记录进入,再记录离开,也就是同一时刻会对应两个总人数值,两值相差1。但既然题目最终只要求最大人数,对于同时有进出的时刻,我们不妨只取进入未离开的状态,也就是更大的总人数。
  • 由此,将每个用户观看的时间段与point_time对照,判断某时刻某用户是否在看: if(point_time between in_time and out_time,1,0)

代码

  1. 取关键时刻point_time表q1
select artical_id,in_time as point_time
    from tb_user_log 
    where artical_id!=0
    union all 
    select artical_id,out_time as point_time
    from tb_user_log
    where artical_id!=0
order by artical_id,point_time
  • 若不去重(union all) alt
  • 若去重(union) alt
  1. 对每个point_time,统计所有用户在看情况,得到表q
select q1.artical_id,point_time,
   sum(if(point_time between in_time and out_time,1,0)) as uv
   from 
    (
select artical_id,in_time as point_time
    from tb_user_log 
    where artical_id!=0
    union 
    select artical_id,out_time as point_time
    from tb_user_log
    where artical_id!=0
order by artical_id,point_time
) as q1
   join tb_user_log as q2
   on q1.artical_id=q2.artical_id
   group by artical_id,point_time
  • 拼接结果(部分)

alt alt

  • 拼接结果整合(按id和日期sum后) alt
  1. 取每篇文章同一时刻最大在看人数
select artical_id,max(uv) as max_uv
from 
  (select q1.artical_id,point_time,
   sum(if(point_time between in_time and out_time,1,0)) as uv
   from 
    (
select artical_id,in_time as point_time
    from tb_user_log 
    where artical_id!=0
    union 
    select artical_id,out_time as point_time
    from tb_user_log
    where artical_id!=0
order by artical_id,point_time
) as q1
   join tb_user_log as q2
   on q1.artical_id=q2.artical_id
   group by artical_id,point_time) as q
group by artical_id
order by max_uv desc
  • 若不去重,所得结果: alt 所以一定要去重啊朋友们!!!

方法二:

题解区大部分友友的思路:

  1. 同样取关键时刻,但对进出予以区分,进diff=+1退diff=-1
  2. 按时间排序后巧用**sum()over(partition by order by)**可得截止每个关键时刻的总人数
  3. 易错点:先进后出,故sum(diff)时不但要time asc,还要diff desc

代码:

select artical_id,max(uv) as max_uv
from 
  (select q1.artical_id,point_time,
   sum(diff) over (partition by artical_id order by point_time,
                  diff desc) as uv
   from 
    (select artical_id,in_time as point_time, 1 as diff
    from tb_user_log 
     where artical_id!=0
    union all 
    select artical_id,out_time as point_time,-1 as diff
    from tb_user_log
    where artical_id!=0
    order by artical_id,point_time) as q1
   ) as q2
group by artical_id
order by max_uv desc

两方法对比:

  1. 易错点:
  • 方法一由于最终取最大值,不考虑同时有进出时刻的“出”情况,故不区分inout,但易错点也在于此,既然要sum,就要记得给point_time去重。
  • 方法二区分了inout,但易错点在于区分过后就要规定好先后,不然最大值就会受到影响。
  1. 思路本质:
  • 方法一将时间段与时间点对照,但从结果出发取了巧,但偷懒也有偷懒要考虑的后果。
  • 方法二只考虑时间点间的跳跃,只需要更严格的顺序要求便可由简洁的思路得到严谨的结果。
  1. 内存(虽然不太懂内存这些,但体感)
  • 方法一会占用更多内存,若除id=0的情况有n条数据,那么point_time2n个,并要逐一与n个时间段进行对照,则join后的表格会有2n*n条数据,大可不必如此!
  • 方法二只需对2n个关键时刻进行窗口运算即可。
全部评论
第一个方法因为你用的是between,所以包含了结束时间段进来的情况,用union刚好帮你去重掉了这部分
点赞 回复 分享
发布于 2023-05-10 20:40 浙江

相关推荐

评论
2
3
分享

创作者周榜

更多
牛客网
牛客企业服务