Using Network-Aware Task Assignment to Boost MapReduce
Running MapReduce in a shared cluster has become a common way to handle large-scale data analytics applications while improving cluster utilization. However, sharing the network among different applications can leave MapReduce workloads with limited and heterogeneously distributed network capacity. As a result, network hotspots in racks become more severe, which makes existing task assignment policies that prioritize data locality ineffective. To address this problem, this article presents a model of the relationship between job completion time and the assignment of both map and reduce tasks across racks. We also devise a network-aware task assignment strategy to shorten the completion time of MapReduce jobs in shared clusters. It combines two simple but effective greedy heuristics that reduce the completion time of the map phase and the reduce phase, respectively. Using large-scale simulations driven by Facebook job traces, we show that the network-aware strategy can reduce the average completion time of MapReduce jobs compared with state-of-the-art task assignment strategies, while keeping the computational cost acceptable.
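To make the idea of network-aware assignment concrete, the following is a minimal illustrative sketch of a greedy rule for map tasks. It is not the heuristic developed in this article: the cost model (transfer time as input size divided by the rack's available bandwidth, plus a fixed compute rate), the `Rack` class, the `assign_map_tasks` function, and all parameter values are assumptions introduced only for illustration. Each task is placed on the rack with the earliest estimated finish time, so a rack holding the data can still lose the task when it is overloaded, and a rack with little available bandwidth is avoided for remote reads.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    name: str
    bandwidth: float   # assumed available inbound bandwidth (MB/s)
    load: float = 0.0  # estimated busy time accumulated so far (s)


def assign_map_tasks(tasks, racks, compute_rate=100.0):
    """Greedily assign map tasks to racks.

    tasks: list of (task_id, size_mb, data_rack_name) tuples.
    compute_rate: assumed processing speed in MB/s (hypothetical).
    Returns a dict mapping task_id to the chosen rack name.
    """
    assignment = {}
    # Place the largest tasks first (a common list-scheduling choice).
    for task_id, size_mb, data_rack in sorted(tasks, key=lambda t: -t[1]):
        best, best_finish = None, float("inf")
        for rack in racks:
            # Reading from the local rack is treated as free; a remote
            # read is limited by the target rack's available bandwidth.
            transfer = 0.0 if rack.name == data_rack else size_mb / rack.bandwidth
            finish = rack.load + transfer + size_mb / compute_rate
            if finish < best_finish:
                best, best_finish = rack, finish
        best.load = best_finish
        assignment[task_id] = best.name
    return assignment


if __name__ == "__main__":
    racks = [Rack("A", bandwidth=100.0), Rack("B", bandwidth=10.0)]
    tasks = [("t1", 100.0, "A"), ("t2", 100.0, "A"), ("t3", 100.0, "A")]
    print(assign_map_tasks(tasks, racks))
```

In this toy setting, a purely locality-driven rule would always pick rack A, whereas the sketch moves work to rack B only when B's available bandwidth makes the remote read worthwhile.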