Active database architecture for XML documents
We introduce an active XML database architecture for building very large, scalable, loosely structured distributed data storage. Traditionally, data is regarded as passive records operated on by DBMS software. Our idea is that every data unit is active, capable of communicating with other data units and with database clients. Combined with a special overlay structure formed incrementally by the data units (a Metrized Small World Graph), this provides for effective distribution of data units among database servers and unbounded scalability of the resulting storage, while ensuring logarithmic search and append complexity. Each active data unit is represented as an XML document addressable by a unique URL, with a locally stored, extendable set of XLink links to other data units and a software module that drives communication with other data units and clients. Search in this structure is performed by sequential and/or parallel crawling, following the links in the list obtained at each step. The active data units communicate by sending XML messages over a transport protocol such as HTTP. The communication includes retrieval of XML content and link lists, addition of new links, calculation of query relevance, and work delegation (so that every unit can actively propagate a process initiated or mediated by another unit). Since there are no central controlling nodes in the structure, multiple processes of adding new data units and searching for existing data can run independently and simultaneously, and can begin at any existing data unit. Moreover, because the data units are active, these processes may propagate on their own without being fully dictated by the originator. This allows data processing to be distributed along with the data itself. We have built a prototype implementation of the architecture. Analysis of the properties of the small world overlay structure confirmed the possibility of building efficient XML data stores that contain hundreds of petabytes of data.
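
To make the link-following search concrete, the following is a minimal sketch of a sequential greedy crawl over data units. It is an illustration under stated assumptions, not the prototype's implementation: the names DataUnit, greedy_search and build_demo, the example.org URLs, and the one-dimensional numeric key with absolute difference as the metric are all hypothetical. Real units would exchange XML messages over a transport such as HTTP rather than share an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class DataUnit:
    """Hypothetical in-memory stand-in for one active data unit."""
    url: str                                        # unique address of the unit
    key: float                                      # assumed position in a 1-D metric space
    links: list[str] = field(default_factory=list)  # stands in for the unit's XLink list


def distance(a: float, b: float) -> float:
    """Assumed metric: absolute difference of numeric keys."""
    return abs(a - b)


def greedy_search(units: dict[str, DataUnit], start_url: str, query: float) -> str:
    """Follow links from start_url, always moving to the linked unit that is
    strictly closer to the query, and stop when no linked unit improves."""
    current = units[start_url]
    while True:
        best = current
        for url in current.links:  # the link list obtained at this step
            candidate = units[url]
            if distance(candidate.key, query) < distance(best.key, query):
                best = candidate
        if best is current:        # local minimum: no neighbor is closer
            return current.url
        current = best


def build_demo() -> dict[str, DataUnit]:
    """Five units in a chain plus one long-range link (u0 <-> u3)."""
    base = "http://example.org/u"  # hypothetical URLs
    units = {f"{base}{i}": DataUnit(f"{base}{i}", 10.0 * i) for i in range(5)}
    pairs = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 3)]
    for a, b in pairs:
        units[f"{base}{a}"].links.append(f"{base}{b}")
        units[f"{base}{b}"].links.append(f"{base}{a}")
    return units


if __name__ == "__main__":
    demo = build_demo()
    # The search may start at any unit; there is no central entry point.
    print(greedy_search(demo, "http://example.org/u0", 38.0))
    print(greedy_search(demo, "http://example.org/u4", 1.0))
```

In this sketch the long-range link lets the crawl from u0 reach the neighborhood of the query in one hop instead of walking the whole chain, which is the kind of shortcut a small world overlay is meant to provide. A parallel variant would query several linked units at once instead of visiting them one by one.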