Metrized small world properties data structure
We introduce the information retrieval oriented data structure to build very large, scalable, loosely structured and unstructured distributed data storage. The main idea is to represent data as a set of structured storage units on which a semi-metric can be defined which characterizes the relative relevance of each unit. Then a complex graph can be constructed whose vertices are the storage units and the edges are selected in such a way that the graph has the small world properties and is in accordance with the introduced metric (Metrized Small World Feature). Addition and removal of the data items causes the graph to evolve, while the retrieval of information is based on generating a new vertex, connecting it to the graph and setting up a search process of the data vertices metrically close to the request vertex. Due to the special properties of the constructed graph, the search is accomplished on average in the number of steps logarithmic of the storage size. We built a prototype of such a storage where the data items are represented by XML documents and the graph is expressed by means of XLink. The analysis of the graph properties we performed confirmed the possibility of building efficient XML data storages which contain hundreds of petabytes of data.