Association Algorithm for Two Dynamically Enlarging Tables Implemented in Apache Spark
In this paper we consider the association problem with constraints for two dynamically enlarging tables. An ordered set of rule groups determines the associations between entries of the first table and entries of the second table. Each entry is associated with other entries from both tables, either directly or indirectly through other associations. The problem is to list, for each entry, all entries associated with it. Because the tables are dynamically enlarging, the goal is to improve the performance of the association process by reusing previously built associations. We consider a base full association algorithm and propose a partial association algorithm that improves on the efficiency of the base algorithm. We implement and evaluate both algorithms in Apache Spark for a particular case on 12 cluster nodes.
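To make the distinction between full and partial association concrete, the following is a minimal single-machine sketch in plain Python, not the Spark implementation evaluated in the paper. It assumes a single hypothetical matching rule (two entries match when their `key` fields are equal) and ignores both the constraints and the ordering of rule groups. Direct and indirect associations are modelled as connected components in a union-find structure. All names (`Associations`, `matches`, `full_association`, `partial_association`) are illustrative assumptions.

```python
from collections import defaultdict


class Associations:
    """Union-find over entry ids such as ("A", 1) or ("B", 7)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen entries as their own group.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[ry] = rx

    def groups(self):
        out = defaultdict(set)
        for x in list(self.parent):
            out[self.find(x)].add(x)
        return {frozenset(g) for g in out.values()}


def matches(a, b):
    # Hypothetical rule (assumption): entries match on equal "key".
    return a["key"] == b["key"]


def _link(assoc, rows_a, rows_b):
    for a in rows_a:
        for b in rows_b:
            if matches(a, b):
                assoc.union(("A", a["id"]), ("B", b["id"]))


def full_association(table_a, table_b):
    """Rebuild all associations from scratch."""
    assoc = Associations()
    for a in table_a:
        assoc.find(("A", a["id"]))
    for b in table_b:
        assoc.find(("B", b["id"]))
    _link(assoc, table_a, table_b)
    return assoc


def partial_association(assoc, old_a, old_b, new_a, new_b):
    """Extend existing associations, comparing only pairs that involve a new entry."""
    for a in new_a:
        assoc.find(("A", a["id"]))
    for b in new_b:
        assoc.find(("B", b["id"]))
    _link(assoc, new_a, old_b + new_b)  # new A entries against all B entries
    _link(assoc, old_a, new_b)          # old A entries against new B entries only
    return assoc


old_a = [{"id": 1, "key": "x"}, {"id": 2, "key": "y"}]
old_b = [{"id": 1, "key": "x"}]
new_a = [{"id": 3, "key": "x"}]
new_b = [{"id": 2, "key": "y"}]

incremental = partial_association(
    full_association(old_a, old_b), old_a, old_b, new_a, new_b
)
rebuilt = full_association(old_a + new_a, old_b + new_b)
```

In this sketch the partial variant skips every pair of old entries, since those pairs were already examined when the previous associations were built. Under the stated simplifications both variants yield the same groups; whether that equivalence carries over once constraints and ordered rule groups are involved depends on details beyond this sketch.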