Evaluating Array DBMS Compression Techniques for Big Environmental Datasets
Earth remote sensing imagery comes from satellites, unmanned aerial vehicles, airplanes, and other sources. National agencies, commercial companies, and individuals across the globe collect enormous amounts of such imagery daily. Array DBMSs are among the prominent tools for managing and processing large volumes of geospatial imagery. We recently presented ChronosDB, an innovative geospatial array DBMS that outperforms SciDB by up to 75× on average. SciDB is the only freely available distributed array DBMS to date. Unlike SciDB, ChronosDB does not require importing files into an internal DBMS format and works with imagery “in situ”: directly in its native file formats. This is one of the many virtues of ChronosDB. In this paper, we investigate the impact of data compression on the performance of array processing operations. We compress the data with diverse methods and measure how each method affects processing speed. We thoroughly compare the performance on source and compressed data in ChronosDB and SciDB, using real-world data on computer clusters in the Microsoft Azure Cloud.