The population annealing method is a promising approach for large-scale simulations because it is potentially scalable on any parallel architecture. We present an implementation of the algorithm on a hybrid program architecture combining CUDA and MPI. The problem is to keep all general-purpose graphics processing unit devices as busy as possible by efficiently redistributing replicas. We provide details of testing on hardware based on Intel Skylake/Nvidia V100, running more than two million replicas of the Ising model sample in parallel. The results are quite encouraging because the acceleration grows toward the perfect line as the complexity of the simulated system increases.
GridMD is a C++ class library intended for constructing simulation applications and running them in distributed environments. The library abstracts away from details of distributed environments, so that almost no knowledge of distributed computing is required from a physicist working with the library. She or he just uses GridMD function calls inside the application C++ code to perform parameter sweeps or other tasks that can be distributed at run-time. In this paper we briefly review the GridMD architecture. We also describe the job manager component which submits jobs to a remote system. The C++ source code of our PBS job manager may be used as a standalone tool and it is freely available as well as the full library source code. As illustrative examples we use simple expression evaluation codes and the real application of Coulomb cluster explosion simulation by Molecular Dynamics.
Population annealing is a promising recent approach for Monte Carlo simulations in statistical physics, in particular for the simulation of systems with complex free-energy landscapes. It is a hybrid method, combining importance sampling through Markov chains with elements of sequential Monte Carlo in the form of population control. While it appears to provide algorithmic capabilities for the simulation of such systems that are roughly comparable to those of more established approaches such as parallel tempering, it is intrinsically much more suitable for massively parallel computing. Here, we tap into this structural advantage and present a highly optimized implementation of the population annealing algorithm on GPUs that promises speed-ups of several orders of magnitude as compared to a serial implementation on CPUs. While the sample code is for simulations of the 2D ferromagnetic Ising model, it should be easily adapted for simulations of other spin models, including disordered systems. Our code includes implementations of some advanced algorithmic features that have only recently been suggested, namely the automatic adaptation of temperature steps and a multi-histogram analysis of the data at different temperatures.
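The population-control step mentioned in the abstract above can be illustrated with a short sketch. It is not the GPU code described there: it is a minimal serial Python illustration, assuming a 2D ferromagnetic Ising model with coupling J = 1 and periodic boundaries, and the names `energy` and `resample` are hypothetical. When the inverse temperature is raised by `delta_beta`, replica i receives a random number of copies whose mean is proportional to exp(-delta_beta * E_i), so the population size fluctuates around `target_size`.

```python
import math
import random

def energy(spins, L):
    """Energy of an L x L Ising configuration (J = 1, periodic boundaries).

    `spins` is a flat list of +1/-1 values in row-major order.
    """
    e = 0
    for i in range(L):
        for j in range(L):
            s = spins[i * L + j]
            # Count each bond once: neighbour below and neighbour to the right.
            e -= s * (spins[((i + 1) % L) * L + j] + spins[i * L + (j + 1) % L])
    return e

def resample(population, energies, delta_beta, target_size, rng):
    """One population-control step (illustrative sketch).

    Replica i gets floor(tau_i) copies plus one more with probability
    frac(tau_i), where tau_i = target_size * w_i / sum(w) and
    w_i = exp(-delta_beta * E_i).
    """
    e_min = min(energies)  # shift energies to avoid overflow in exp
    weights = [math.exp(-delta_beta * (e - e_min)) for e in energies]
    total = sum(weights)
    new_population = []
    for replica, w in zip(population, weights):
        tau = target_size * w / total
        copies = int(tau) + (1 if rng.random() < tau - int(tau) else 0)
        new_population.extend(list(replica) for _ in range(copies))
    return new_population
```

In a full simulation each resampling step would be followed by Markov-chain updates of every replica at the new temperature. Those updates are independent across replicas, which is the part that maps naturally onto many parallel threads.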
We report on a simulation technique and benchmarks for molecular dynamics simulations of relaxation processes in solids and liquids using graphics processing units (GPUs). The implementation of a many-body potential such as the embedded atom method (EAM) on GPUs is discussed. The benchmarks obtained with the LAMMPS and HOOMD packages for simple Lennard-Jones liquids and for metals using EAM potentials are presented for both Intel CPUs and Nvidia GPUs. As an example, the crystallization rate of the supercooled Al melt is computed.
Two-dimensional structures grown with the Witten and Sander algorithm are investigated. We analyze clusters grown off-lattice and clusters grown with the antenna method with Nfp=3,4,5,6,7 and 8 allowed growth directions. With the help of the variable probe particle technique we measure the fractal dimension of such clusters, D(N), as a function of their size N. We propose that in the thermodynamic limit of infinite cluster size the aggregates grown with a high degree of anisotropy (Nfp=3,4,5) tend to have fractal dimension D equal to 3/2, while off-lattice aggregates and aggregates with lower anisotropy (Nfp>6) have D≈1.710. The noise-reduction procedure results in a change of universality class for DLA. For a high enough noise-reduction value, clusters with Nfp>6 have fractal dimension going to 3/2 when N→∞.
mcsanc is a Monte-Carlo tool based on the SANC (Support for Analytic and Numeric Calculations for experiments at colliders) modules for higher order calculations in hadron collider physics. It allows the evaluation of NLO QCD and EW cross sections for Drell-Yan processes (inclusive), associated Higgs and gauge boson production, and single-top quark production in the s- and t-channels. The paper contains a theoretical description of the SANC approach, numerical validations and a manual.
The library PRAND for pseudorandom number generation for modern CPUs and GPUs is presented. It contains both single-threaded and multi-threaded realizations of a number of modern and most reliable generators recently proposed and studied in Barash (2011), Matsumoto and Nishimura (1998), L’Ecuyer (1999, 1999), Barash and Shchur (2006) and the efficient SIMD realizations proposed in Barash and Shchur (2011). One of the useful features for using PRAND in parallel simulations is the ability to initialize up to 10^19 independent streams. Using the massive parallelism of modern GPUs and the SIMD parallelism of modern CPUs substantially improves the performance of the generators.
We present the random number generator (RNG) library RNGAVXLIB, which contains fast AVX realizations of a number of modern random number generators, as well as the ability to jump ahead inside an RNG sequence and to initialize up to 10^19 independent random number streams with the block splitting method. Fast AVX implementations produce exactly the same output sequences as the original algorithms. AVX vectorization substantially improves the performance of the generators. The new realizations are up to 2 times faster than the SSE realizations implemented in the previous version of the library (Barash and Shchur, 2013), and up to 40 times faster than the original algorithms written in ANSI C.
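Jump-ahead and block splitting, as named in the abstract above, can be illustrated on a much simpler generator than those in the library. The sketch below is an assumption-laden stand-in: it uses a plain 64-bit linear congruential generator (with Knuth's MMIX constants, chosen only for illustration), and the function names `lcg_step`, `lcg_jump` and `stream_start` are hypothetical and are not part of the RNGAVXLIB interface. Because one LCG step is an affine map x -> a*x + c (mod m), n steps can be composed in O(log n) operations by repeated squaring of that map.

```python
# Illustrative LCG parameters (Knuth's MMIX); not a generator from the library.
A = 6364136223846793005
C = 1442695040888963407
M = 2 ** 64

def lcg_step(x, a=A, c=C, m=M):
    """Advance the state by one step."""
    return (a * x + c) % m

def lcg_jump(x, n, a=A, c=C, m=M):
    """Advance the state by n steps in O(log n) time.

    (acc_a, acc_c) accumulates the affine map for the bits of n processed
    so far; (cur_a, cur_c) is the map for 2**k steps at iteration k.
    """
    acc_a, acc_c = 1, 0  # identity map
    cur_a, cur_c = a % m, c % m
    while n > 0:
        if n & 1:
            # Compose: apply `acc` first, then `cur`.
            acc_a, acc_c = (cur_a * acc_a) % m, (cur_a * acc_c + cur_c) % m
        # Square the current map: 2**k steps -> 2**(k+1) steps.
        cur_a, cur_c = (cur_a * cur_a) % m, (cur_a * cur_c + cur_c) % m
        n >>= 1
    return (acc_a * x + acc_c) % m

def stream_start(seed, k, block_len):
    """Block splitting: stream k starts k * block_len steps after the seed."""
    return lcg_jump(seed, k * block_len)
```

With block splitting, each parallel worker takes its own non-overlapping block of one long sequence, so the streams do not overlap as long as no worker consumes more than `block_len` numbers.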
In this update, we present the new version of the random number generator (RNG) library RNGSSELIB, which, in particular, contains fast SSE realizations of a number of modern and most reliable generators. The new features are: (i) Fortran compatibility and examples of using the library in Fortran; (ii) new modern and reliable generators; (iii) the ability to jump ahead inside an RNG sequence and to initialize up to 10^19 independent random number streams with the block splitting method.
The library RNGSSELIB for random number generators (RNGs) based upon the SSE2 command set is presented. The library contains realizations of a number of modern and most reliable generators. Use of the SSE2 command set substantially improves the performance of the generators. Three new RNG realizations are also constructed. We present a detailed analysis of the speed depending on the compiler used and the associated optimization level, as well as results of extensive statistical testing for all generators using available test packages. Fast SSE implementations produce exactly the same output sequence as the original algorithms.
Monte Carlo (MC) simulations and series expansions (SE) data for the energy, specific heat, magnetization, and susceptibility of the three-state and four-state Potts model and the Baxter–Wu model on the square lattice are analyzed in the vicinity of the critical point in order to estimate universal combinations of critical amplitudes. We also form effective ratios of the observables close to the critical point and analyze how they approach the universal critical-amplitude ratios. In particular, using the duality relation, we show analytically that for the Potts model with a number of states q⩽4, the effective ratio of the energy critical amplitudes always approaches unity linearly with respect to the reduced temperature. This fact leads to the prediction of relations among the amplitudes of correction-to-scaling terms of the specific heat in the low- and high-temperature phases. It is a common belief that the four-state Potts and the Baxter–Wu model belong to the same universality class. At the same time, the critical behavior of the four-state Potts model is modified by logarithmic corrections while that of the Baxter–Wu model is not. Numerical analysis shows that critical amplitude ratios are very close for both models and, therefore, gives support to the hypothesis that the critical behavior of both systems is described by the same renormalization group fixed point.
We present a comparative study of several algorithms for an in-plane random walk with a variable step. The goal is to check the efficiency of each algorithm in the case where the random walk terminates at some boundary. We recently found that a finite step of the random walk produces a bias in the hitting probability and that this bias vanishes in the limit of an infinitesimal step. Therefore, it is important to know how a change in the step size of the random walk influences the performance of simulations. We propose an algorithm with the most effective procedure for the step-length-change protocol.
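To make the idea of a variable-step walk concrete, here is a minimal sketch of one possible step-length protocol. It is not the algorithm proposed in the paper: the geometry (a walker released at the centre of a disk with an absorbing circular boundary), the rule `step = max(min_step, fraction * distance_to_boundary)`, and the name `walk_until_exit` are all assumptions made for illustration. Setting `fraction = 0` recovers a fixed-step walk, which allows the number of steps to be compared between protocols.

```python
import math
import random

def walk_until_exit(radius, min_step, fraction, rng, max_steps=1_000_000):
    """In-plane random walk from the origin until it leaves a disk.

    The step length shrinks as the walker approaches the boundary but is
    never smaller than `min_step`. Returns (number_of_steps, exit_angle),
    where exit_angle is the polar angle of the first position at or
    beyond the boundary.
    """
    x, y = 0.0, 0.0
    for n in range(1, max_steps + 1):
        dist = radius - math.hypot(x, y)       # distance to the boundary
        step = max(min_step, fraction * dist)  # variable step length
        phi = 2.0 * math.pi * rng.random()     # isotropic direction
        x += step * math.cos(phi)
        y += step * math.sin(phi)
        if math.hypot(x, y) >= radius:
            return n, math.atan2(y, x)
    raise RuntimeError("walker did not reach the boundary")
```

A usage example under the same assumptions: `walk_until_exit(1.0, 0.01, 0.5, random.Random(1))` takes large steps far from the boundary and steps of length 0.01 close to it, where the finite-step bias in the hitting probability arises.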