What is MapReduce?

A plain-language explanation

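Suppose you want to know how many cards are in a deck. The simplest way is to count them yourself, one by one.
The MapReduce idea is to deal the cards out to a group of people, let everyone count their own pile at the same time, and then add the counts together.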
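
As a minimal sketch of the card analogy, the Python snippet below deals a deck into piles, counts each pile separately, and adds up the counts. The 52-card deck and the four counters are illustrative assumptions, not numbers from the text above.

```python
# Illustrative assumptions: a 52-card deck and 4 people counting.
deck = [f"card-{i}" for i in range(52)]
num_people = 4

# Deal the deck round-robin so that each person gets one pile.
piles = [deck[i::num_people] for i in range(num_people)]

# Each person counts only their own pile, independently of the others.
pile_counts = [len(pile) for pile in piles]

# Adding the per-pile counts gives the size of the whole deck.
total = sum(pile_counts)
print(pile_counts, total)
```

No counter needs to see anyone else's pile, which is what lets the counting happen at the same time.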

Divide and conquer

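MapReduce fits computations that can be split horizontally into independent pieces. It does not fit tasks that depend on one another,
for example when event b can run only after event a has finished (think of it as an upstream/downstream relationship).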
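
To make the distinction concrete, here is a small sketch with made-up data. The chunk sums do not depend on each other, so they could be computed in any order or on different machines. The hypothetical functions event_a and event_b form an upstream/downstream pair, because event_b takes event_a's finished output as its input.

```python
numbers = [3, 1, 4, 1, 5, 9, 2, 6]  # made-up data for illustration

# Independent work: each chunk is summed without looking at the others.
chunks = [numbers[0:3], numbers[3:6], numbers[6:8]]
partial_sums = [sum(chunk) for chunk in chunks]
total = sum(partial_sums)


def event_a(raw):
    # Upstream step (hypothetical): normalise the raw values.
    return [item.strip().lower() for item in raw]


def event_b(cleaned):
    # Downstream step (hypothetical): needs event_a's finished output.
    return sorted(set(cleaned))


# event_b cannot start until event_a has returned.
b_result = event_b(event_a([" B", "a ", "b"]))
print(partial_sums, total, b_result)
```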

What are Map and Reduce?

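Map (mapping): the work is distributed across the machines in the cluster, and the same operation is applied to every input element
Reduce (aggregation): the partial results are combined into the final result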
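
As a rough single-machine illustration of the two ideas, Python's built-in map applies one function to every element and functools.reduce folds the results into a single value. No cluster is involved here, and the word list is made up.

```python
from functools import reduce

words = ["good", "weather", "is", "good"]  # made-up input

# Map: apply the same operation (here, len) to every element.
lengths = list(map(len, words))

# Reduce: combine the partial results into a single value.
total_length = reduce(lambda acc, n: acc + n, lengths, 0)
print(lengths, total_length)
```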

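  • The input file is divided into multiple splits, and each split is handled by a Mapper task
  • After mapping, the key-value pairs are shuffled so that all values with the same key end up together
  • The grouped data is handed to the Reducer tasks
  • The results are written to HDFS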
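
The four steps above can be sketched on a single machine as follows. This is only a toy model under stated assumptions: threads stand in for Mapper and Reducer tasks, a dictionary stands in for the HDFS output, and the names run_job, count_chars and add_up are hypothetical rather than part of any Hadoop API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_job(splits, map_fn, reduce_fn):
    """Toy MapReduce driver; every name here is hypothetical."""
    with ThreadPoolExecutor() as pool:
        # 1. Each split goes to its own "Mapper task" (a thread here).
        mapped = list(pool.map(map_fn, splits))

        # 2. Shuffle: bring all values for the same key together.
        groups = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)

        # 3. Each key group goes to a "Reducer task".
        keys = sorted(groups)
        reduced = pool.map(lambda k: reduce_fn(k, groups[k]), keys)

        # 4. A dict stands in for the output that would go to HDFS.
        return dict(zip(keys, reduced))


def count_chars(split):
    # Emit (character, 1) for every non-space character in the split.
    return [(ch, 1) for ch in split if ch != " "]


def add_up(key, values):
    # Combine all values collected for one key.
    return sum(values)


output = run_job(["a b a", "b c"], count_chars, add_up)
print(output)
```

The driver knows nothing about characters or counts. Only map_fn and reduce_fn carry the task-specific logic, which is the division of labour the list above describes.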


Example

Input text

the weather is good
today is good
good weather is good
today has good weather

Dividing the text into splits

Split-0: [0, "the weather is good"]
Split-1: [1, "today is good"]
Split-2: [2, "good weather is good"]
Split-3: [3, "today has good weather"]
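
A minimal sketch of this step, assuming (as the example does) that every line becomes its own split keyed by its line number. It illustrates the example only and is not how a real framework chooses split boundaries.

```python
text = """the weather is good
today is good
good weather is good
today has good weather"""

# One split per line, keyed by the line index as in the example.
splits = list(enumerate(text.splitlines()))

for index, line in splits:
    print(f'Split-{index}: [{index}, "{line}"]')
```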

Map phase (Mapper)

Mapper-0: ["the", 1], ["weather", 1], ["is", 1], ["good", 1]
Mapper-1: ["today", 1], ["is", 1], ["good", 1]
Mapper-2: ["good", 2], ["weather", 1], ["is", 1]
Mapper-3: ["today", 1], ["has", 1], ["good", 1], ["weather", 1]
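
A hypothetical mapper for this step could look like the sketch below. It assumes each mapper adds up repeated words within its own split before emitting, which is why Mapper-2 produces ["good", 2] rather than two separate ["good", 1] pairs. The function name mapper is an illustration, not a framework API.

```python
from collections import Counter

splits = [
    (0, "the weather is good"),
    (1, "today is good"),
    (2, "good weather is good"),
    (3, "today has good weather"),
]


def mapper(line):
    # Count words within this split only; Counter keeps the order
    # in which each word first appears.
    return list(Counter(line.split()).items())


mapper_outputs = [mapper(line) for _, line in splits]

for index, pairs in enumerate(mapper_outputs):
    print(f"Mapper-{index}: {pairs}")
```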

Shuffle

["good", {1, 1, 2, 1}]
["has", {1}]
["is", {1, 1, 1}]
["the", {1}]
["today", {1, 1}]
["weather", {1, 1, 1}]
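
The grouping can be sketched in a few lines, assuming the four mapper outputs of this example are already collected in one place. In a real cluster this data moves between machines, which the sketch does not model.

```python
from collections import defaultdict

mapper_outputs = [
    [("the", 1), ("weather", 1), ("is", 1), ("good", 1)],
    [("today", 1), ("is", 1), ("good", 1)],
    [("good", 2), ("weather", 1), ("is", 1)],
    [("today", 1), ("has", 1), ("good", 1), ("weather", 1)],
]

# Collect every count under its word so that equal keys end up together.
grouped = defaultdict(list)
for pairs in mapper_outputs:
    for word, count in pairs:
        grouped[word].append(count)

# Print the groups in key order.
for word in sorted(grouped):
    print(word, grouped[word])
```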

Reducer

Reducer-0: ["good", 5]
Reducer-1: ["has", 1]
Reducer-2: ["is", 3]
Reducer-3: ["the", 1]
Reducer-4: ["today", 2]
Reducer-5: ["weather", 3]
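
The reduce step for this example amounts to summing the grouped counts for each word. The sketch below assumes the groups are available as a plain dictionary; one Reducer per key, as in the listing above, is a choice of the example.

```python
grouped = {
    "good": [1, 1, 2, 1],
    "has": [1],
    "is": [1, 1, 1],
    "the": [1],
    "today": [1, 1],
    "weather": [1, 1, 1],
}

# Each "Reducer" sums the counts collected for one word.
totals = {word: sum(counts) for word, counts in grouped.items()}

for index, (word, total) in enumerate(totals.items()):
    print(f'Reducer-{index}: ["{word}", {total}]')
```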

END
