Algorithm for replica redistribution in an implementation of the population annealing method on a hybrid supercomputer architecture
The population annealing method is a promising approach for large-scale simulations because it is potentially scalable on any parallel architecture. We present an implementation of the algorithm on a hybrid program architecture combining CUDA and MPI. The central problem is to keep all general-purpose graphics processing unit devices as busy as possible by redistributing replicas efficiently among them. We provide details of testing on hardware based on Intel Skylake CPUs and Nvidia V100 GPUs, running more than two million replicas of an Ising model sample in parallel. The results are encouraging: the speedup approaches ideal linear scaling as the complexity of the simulated system increases.
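To make the load-balancing problem concrete, the following is a minimal sketch of one way replica counts could be evened out across devices after a resampling step. It is an illustrative greedy scheme under stated assumptions, not the redistribution algorithm of this work: the function name `plan_redistribution`, the per-device count list, and the `(source, destination, count)` transfer tuples are all hypothetical, and the actual MPI and CUDA data movement is omitted.

```python
def plan_redistribution(counts):
    """Plan transfers that even out replica counts across devices.

    counts[i] is the (hypothetical) number of replicas held by device i
    after resampling. Returns (targets, transfers), where transfers is a
    list of (src_device, dst_device, n_replicas) tuples.
    """
    total = sum(counts)
    n_dev = len(counts)
    base, extra = divmod(total, n_dev)
    # If the total does not divide evenly, the first `extra` devices
    # keep one replica more than the others.
    targets = [base + (1 if i < extra else 0) for i in range(n_dev)]

    # Devices above their target give replicas; devices below receive them.
    surplus = [[i, c - t] for i, (c, t) in enumerate(zip(counts, targets)) if c > t]
    deficit = [[i, t - c] for i, (c, t) in enumerate(zip(counts, targets)) if c < t]

    transfers = []
    s = d = 0
    while s < len(surplus) and d < len(deficit):
        # Move as many replicas as the current pair allows.
        n = min(surplus[s][1], deficit[d][1])
        transfers.append((surplus[s][0], deficit[d][0], n))
        surplus[s][1] -= n
        deficit[d][1] -= n
        if surplus[s][1] == 0:
            s += 1
        if deficit[d][1] == 0:
            d += 1
    return targets, transfers


if __name__ == "__main__":
    targets, transfers = plan_redistribution([7, 0, 2])
    print(targets)
    print(transfers)
```

In a hybrid setting, each planned transfer would presumably correspond to a message between MPI ranks carrying replica states copied out of device memory. A scheme like this equalizes counts but does not try to minimize communication distance, which a production implementation may also need to consider.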