Optimization of Network Interaction on a High-Performance Cluster Using Graph Scheduling
Data access time becomes a major problem for data processing in distributed environments as system scale and data size grow, which makes it a target for software optimization. In this paper, we study the problem of optimizing I/O and processing operations for a distributed Hadoop cluster that processes data within an HPC-paradigm framework. To increase the efficiency of the framework, which processes queries over data stored in HDFS, a graph algorithm was developed that aims at optimal use of resources by exploiting HDFS data locality on a processing graph. The graph uses data file blocks and replicas, hosts, and workers as nodes, and the links between them as edges. The optimization of the reading stage was implemented, and the performance improvement was measured against a baseline and against Spark as an industry-standard framework.
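To make the graph structure concrete, the following is a minimal sketch of a processing graph with blocks, replicas, hosts, and workers as nodes. It is an illustration only and not the algorithm developed in this paper: the function names `build_graph` and `assign_blocks`, the tuple-based node encoding, and the greedy least-loaded assignment rule are all assumptions chosen for brevity.

```python
from collections import defaultdict


def build_graph(block_replicas, worker_hosts):
    """Build adjacency sets for a block -> replica -> host -> worker graph.

    block_replicas: dict mapping a block id to the hosts that store a replica.
    worker_hosts:   dict mapping a worker id to the host it runs on.
    Both inputs and the node encoding are hypothetical, for illustration.
    """
    graph = defaultdict(set)
    for block, hosts in block_replicas.items():
        for host in hosts:
            replica = ("replica", block, host)
            graph[("block", block)].add(replica)   # block -> replica edge
            graph[replica].add(("host", host))     # replica -> host edge
    for worker, host in worker_hosts.items():
        graph[("host", host)].add(("worker", worker))  # host -> worker edge
    return graph


def assign_blocks(graph, block_ids):
    """Greedily assign each block to the least-loaded worker with a local replica.

    A block with no worker on any replica host is mapped to None,
    meaning it would need a remote read. This greedy rule is an
    assumption, not the scheduling method of the paper.
    """
    load = defaultdict(int)
    assignment = {}
    for block in block_ids:
        # Walk block -> replica -> host -> worker to collect local workers.
        candidates = sorted({
            worker
            for replica in graph[("block", block)]
            for host in graph[replica]
            for _, worker in graph[host]
        })
        if not candidates:
            assignment[block] = None
            continue
        chosen = min(candidates, key=lambda w: (load[w], w))
        load[chosen] += 1
        assignment[block] = chosen
    return assignment


if __name__ == "__main__":
    replicas = {"b1": ["h1", "h2"], "b2": ["h1"], "b3": ["h2", "h3"], "b4": ["h3"]}
    workers = {"w1": "h1", "w2": "h2"}
    g = build_graph(replicas, workers)
    print(assign_blocks(g, ["b1", "b2", "b3", "b4"]))
```

In this sketch a read is local whenever a path block, replica, host, worker exists in the graph. A scheduler that targets optimal resource use would replace the greedy rule with a global criterion over the same graph, for example a matching or flow formulation.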