Importance of MapReduce for Big Data Applications: A Survey


  • M. Durairaj Assistant Professor, School of Computer Science, Engineering and Applications, Bharathidasan University, Trichy, Tamil Nadu, India
  • T. S. Poornappriya Research Scholar, School of Computer Science, Engineering and Applications, Bharathidasan University, Trichy, Tamil Nadu, India


Big Data, Hadoop, Distributed File System, MapReduce Programming, Cloud Computing


The MapReduce framework has attracted significant attention across a wide range of fields. It is now a practical model for data-intensive applications because of its simple programming interface, high elasticity, and fault tolerance. It is also capable of processing large volumes of data in Distributed Computing Environments (DCE), and it has repeatedly proved applicable to a wide range of domains. MapReduce is a parallel programming model and an associated implementation introduced by Google. In the programming model, a user specifies the computation through two functions, Map and Reduce. The underlying MapReduce library automatically parallelizes the computation and handles complicated issues such as data distribution, load balancing, and fault tolerance. When massive data is spread across many machines, processing must be parallelized; the framework moves the data and provides scheduling and fault tolerance. This paper presents a literature survey of MapReduce programming in different domains. A research direction is identified on the basis of the literature review.
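The two-function model described in the abstract can be illustrated with a minimal single-process sketch. It is not taken from the paper and is not how Hadoop or Google's library is implemented: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and word counting is assumed as the example task. A real framework would run the map and reduce calls on many machines and add the data distribution, scheduling, and fault tolerance that this sketch leaves out.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

# Hypothetical names throughout; this only imitates the Map/Reduce contract
# in one process and does no distribution, scheduling, or failure handling.

def map_fn(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in a document."""
    for word in document.lower().split():
        yield word, 1

def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: combine all values that share one key into a single result."""
    return word, sum(counts)

def run_mapreduce(
    inputs: Iterable[str],
    mapper: Callable[[str], Iterable[tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], tuple[str, int]],
) -> dict[str, int]:
    # Map phase plus shuffle: group every intermediate value by its key.
    groups: dict[str, list[int]] = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: call the reducer once per distinct key.
    return dict(reducer(key, values) for key, values in groups.items())

documents = [
    "big data needs big clusters",
    "map and reduce process big data",
]
word_counts = run_mapreduce(documents, map_fn, reduce_fn)
print(word_counts)
```

The user supplies only `map_fn` and `reduce_fn`; the grouping step in `run_mapreduce` stands in for the work the library is said to handle automatically.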

