Generic Distributed In Situ Aggregation for Earth Remote Sensing Imagery
Earth remote sensing imagery comes from satellites,
unmanned aerial vehicles, airplanes, and other sources. National
agencies, commercial companies, and individuals across the globe collect
enormous amounts of such imagery daily. Array DBMSs are among the
prominent tools to manage and process large volumes of geospatial imagery.
The core data model of an array DBMS is an N-dimensional array.
Recently we presented a geospatial array DBMS – ChronosDB – which
outperforms SciDB by up to 75× on average. We are about to launch a
cloud service running our DBMS. SciDB is the only freely available
distributed array DBMS to date. Remote sensing imagery is traditionally
stored in files of sophisticated formats, not in databases. Unlike SciDB,
ChronosDB does not require importing files into an internal DBMS
format and works with imagery “in situ”: directly in its native file
formats. This is one of the many virtues of ChronosDB. It already has certain
aggregation capabilities, but this paper focuses on more advanced
aggregation queries, which still constitute a large portion of a typical
workload applied to remote sensing imagery. We integrate the aggregation
types into the data model, present the respective algorithms to perform
aggregations in a distributed fashion, and thoroughly compare the
performance of our technique with that of SciDB. We carried out experiments on
real-world data on 8- and 16-node clusters in the Microsoft Azure cloud.