Retrospective Satellite Data in the Cloud: An Array DBMS Approach
Earth remote sensing has always been a source of “big” data. Satellite data have inspired the development of “array” DBMSs. An array DBMS processes N-dimensional (N-d) arrays using a declarative query style to simplify raster data management and processing. However, raster data are traditionally stored in files, not in databases. Dedicated command-line tools have long been developed to process these files. Most of these tools are feature-rich and free but optimized for a single machine. An approach that partially delegates in situ raster data processing to such tools has recently been proposed. The approach includes a new formal N-d array data model that abstracts from the files and the tools, as well as new distributed algorithms based on the model. This paper extends the approach with a new algorithm for the reshaping (tiling) of N-d arrays. The algorithm physically reorganizes the storage layout of N-d arrays to obtain an order-of-magnitude speedup. The extended approach outperforms SciDB by up to 28× on retrospective Landsat data, one of the most typical and popular kinds of satellite imagery. SciDB is the only freely available distributed array DBMS to date. Experiments were carried out on an 8-node cluster in Microsoft Azure Cloud.
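To make the notion of reshaping (tiling) concrete, the sketch below regroups the cells of a small 3-d array into fixed-shape tiles. It is an illustration only and not the distributed algorithm proposed in this paper: the function names (`tile_grid`, `tile_of`, `retile`), the in-memory dictionary representation, and the (time, lat, lon) axis order are all assumptions made for the example.

```python
from itertools import product


def tile_grid(shape, tile_shape):
    """Number of tiles along each dimension (ceiling division)."""
    return tuple(-(-n // t) for n, t in zip(shape, tile_shape))


def tile_of(index, tile_shape):
    """Key of the tile that contains the cell at `index`."""
    return tuple(i // t for i, t in zip(index, tile_shape))


def retile(cells, shape, tile_shape):
    """Group cells (a dict: N-d index -> value) into tiles.

    Returns a dict: tile key -> dict of (index local to the tile -> value).
    """
    grid = tile_grid(shape, tile_shape)
    tiles = {key: {} for key in product(*(range(g) for g in grid))}
    for index, value in cells.items():
        local = tuple(i % t for i, t in zip(index, tile_shape))
        tiles[tile_of(index, tile_shape)][local] = value
    return tiles


# Hypothetical 3-d array with axes (time, lat, lon); values are placeholders.
shape = (2, 4, 6)
tile_shape = (1, 2, 3)
cells = {idx: sum(idx) for idx in product(*(range(n) for n in shape))}
tiles = retile(cells, shape, tile_shape)
```

Here a 2 × 4 × 6 array split into 1 × 2 × 3 tiles yields a 2 × 2 × 2 grid of tiles. In a file-based setting each tile would correspond to a separately stored block, so a query touching a small region reads only the tiles that overlap it.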