Algebra of Information in Big Data Processing
In big data problems the data usually are collected on many sites, have a huge volume, and new pieces of data are constantly generated. It is often impossible to collect all the data needed for a research project on one computer, and even impractical, since one computer would not be able to process it in a reasonable time. An appropriate data analysis algorithm should, working in parallel on many computers, extract from each set of raw data some intermediate compact “information”, gradually combine and update it, and finally, use the accumulated information to produce the result. When new data appears, it must extract information from them, add it to the accumulated one, and eventually update the result. We consider several examples of a suitable transformation of processing algorithms, discuss specific features of the emerging information spaces and, in particular, their algebraic properties. We also show that the information space often can be equipped with an order relation that reflects the "quality" of the information.