Paper Title : MapReduce Performance Scaling Using Data Prefetching
ISSN : 2394-2231
Year of Publication : 2022
DOI : 10.5281/zenodo.6677796
MLA Style: Jung Lee, Kyung Tae Kim, Tso Youn-Chen, "MapReduce Performance Scaling Using Data Prefetching", Volume 9 - Issue 3, International Journal of Computer Techniques (IJCT), ISSN: 2394-2231, www.ijctjournal.org
APA Style: Jung Lee, Kyung Tae Kim, Tso Youn-Chen, "MapReduce Performance Scaling Using Data Prefetching", Volume 9 - Issue 3, International Journal of Computer Techniques (IJCT), ISSN: 2394-2231, www.ijctjournal.org
Abstract
Social networks, bio-computing, and the Internet of Things now generate far more data than earlier IT environments did, which has driven research on efficient large-scale data processing techniques. MapReduce is an effective programming model for data-intensive applications, and a representative implementation is Hadoop, which is developed and supported by the Apache Software Foundation. This paper proposes a data prefetching technique and a streaming technique to improve the performance of Hadoop MapReduce. One performance issue in Hadoop MapReduce is the delay incurred while input data is transferred during a job. To minimize this transfer time, the proposed method, unlike standard MapReduce, creates a separate prefetching thread dedicated to data transfer. Data can then be transferred while earlier data is still being processed, which reduces the overall processing time. Even with prefetching, however, a job must still wait for the first data transfer because of how Hadoop MapReduce operates. The streaming technique is applied to reduce this remaining wait. Mathematical modeling was used to evaluate the proposed method, and the results confirmed that MapReduce with both prefetching and streaming outperformed both the existing Hadoop MapReduce and MapReduce with prefetching only.
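The prefetching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use any Hadoop API: the names `SPLITS`, `fetch_split`, `prefetcher`, `map_split`, and `run_job` are hypothetical, the input splits are in-memory strings standing in for transferred data blocks, and the map function is a toy word count. The sketch only shows a separate thread handing data to the map work through a bounded buffer, so that transfer and processing can overlap.

```python
import queue
import threading
from collections import Counter

# Hypothetical stand-in for remote input splits; a real job would read HDFS blocks.
SPLITS = ["a b a", "b c", "a c c"]
_DONE = object()  # sentinel marking the end of the input


def fetch_split(index):
    """Simulate transferring one input split (assumption: no real network I/O)."""
    return SPLITS[index]


def prefetcher(out_queue, n_splits):
    """Prefetching thread: transfers splits ahead of the map work."""
    for i in range(n_splits):
        out_queue.put(fetch_split(i))  # blocks while the buffer is full
    out_queue.put(_DONE)


def map_split(split):
    """Toy map function: count the words in one split."""
    return Counter(split.split())


def run_job(buffer_size=2):
    """Run the toy job with a prefetching thread feeding a bounded buffer."""
    buf = queue.Queue(maxsize=buffer_size)
    worker = threading.Thread(target=prefetcher, args=(buf, len(SPLITS)))
    worker.start()
    totals = Counter()
    while True:
        split = buf.get()  # waits only if the next split has not arrived yet
        if split is _DONE:
            break
        totals += map_split(split)  # later splits can be transferred meanwhile
    worker.join()
    return totals


if __name__ == "__main__":
    print(dict(run_job()))
```

In this sketch the map loop still has to wait for the first split to arrive in full, which corresponds to the initial wait that the abstract says the streaming technique is meant to reduce. Streaming itself is not modeled here.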
Keywords
MapReduce, Prefetching, Streaming.