Formalization and taxonomy of compute-aggregate problems for cloud computing applications
Efficient representation of data aggregations is a fundamental problem in modern big data applications, where network topologies and the deployed routing and transport mechanisms play a key role in optimizing objectives such as cost and latency. In traditional networking, applications use TCP and UDP as their primary transport interfaces, and these interfaces hide the underlying network topology from end systems. At the other extreme, applications recover characteristics of the underlying network themselves in order to exploit the network infrastructure more effectively. In this work, we demonstrate that both extremes can be inefficient for optimizing the given objectives. We study the design principles of routing and transport infrastructure and identify additional information that can be used to improve implementations of compute-aggregate tasks. We build a taxonomy of compute-aggregate services that unifies aggregation design principles, propose algorithms for each class, analyze them theoretically, and support our results with an extensive experimental study.
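The following minimal Python sketch makes the topology-hiding extreme concrete. It is an illustration only and does not reproduce the algorithms proposed in this work. It compares link usage for a sum aggregation on a hypothetical two-level tree under two strategies. In `oblivious_cost`, endpoints see only a transport interface and forward raw values to the aggregator. In `in_network_cost`, partial results are combined at intermediate nodes once the topology is known. The sketch assumes unit cost per message per link and an associative aggregation function. The topology and all function names are hypothetical.

```python
def hops_to_root(node, parent):
    """Number of links between `node` and the root of the tree."""
    hops = 0
    while node in parent:
        node = parent[node]
        hops += 1
    return hops


def oblivious_cost(workers, parent):
    """Link messages when every worker sends its raw value to the root.

    Models a topology-oblivious transport: each value crosses every link
    on its path unchanged, so shared links carry several messages.
    """
    return sum(hops_to_root(w, parent) for w in workers)


def in_network_cost(workers, parent):
    """Link messages when partial aggregates are combined at each node.

    Assumes an associative aggregation (e.g. sum), so every link that has
    at least one worker below it carries exactly one combined message.
    """
    used_links = set()
    for w in workers:
        node = w
        while node in parent:
            used_links.add(node)  # link identified by its child endpoint
            node = parent[node]
    return len(used_links)


# Hypothetical two-level tree: root <- {s1, s2}, each switch has 3 workers.
parent = {"s1": "root", "s2": "root"}
workers = []
for i in range(1, 7):
    w = f"w{i}"
    parent[w] = "s1" if i <= 3 else "s2"
    workers.append(w)

print(oblivious_cost(workers, parent))   # 12
print(in_network_cost(workers, parent))  # 8
```

In this toy setting, each of the six workers is two hops from the root, so oblivious forwarding costs 12 link messages. Combining at the switches needs only one message per used link, which gives 8. The gap grows with fan-in and depth. The example deliberately ignores the other extreme, in which applications must recover topology information themselves.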