Interplay between test sets and statistical procedures in ranking DFT methods: The case of electron density studies
Choosing a reliable density functional approximation (DFA) remains one of the most puzzling tasks in quantum chemical modeling and materials simulations. Since DFAs are, in general, not systematically improvable, the usual way to identify the method best suited for a particular purpose is to benchmark candidates on specifically designed test sets. Drawing a conclusion from the resulting large collection of error values requires statistical analysis. In this article, the possibilities and pitfalls of statistical error analysis are discussed. The example used is the ranking of approximate functionals by the accuracy of their self-consistent electron densities, which were recently shown to have worsened over the last decade.
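As a minimal illustration of why the choice of statistic and test set matters, the sketch below ranks three hypothetical functionals ("A", "B", "C") by two common error statistics: mean absolute error (MAE) and maximum absolute error. The error values are invented toy numbers, not benchmark data, and the procedure is a generic one, not the specific analysis used in this article. Under these assumptions, MAE and maximum error produce opposite rankings. Dropping a single system from the test set changes the MAE ranking as well.

```python
from statistics import mean

# Hypothetical absolute errors of three hypothetical functionals on the
# same five-system test set. The numbers are invented for illustration only.
errors = {
    "A": [0.10, 0.12, 0.11, 0.09, 0.50],  # accurate except for one outlier
    "B": [0.20, 0.18, 0.22, 0.19, 0.21],  # uniformly mediocre
    "C": [0.05, 0.30, 0.04, 0.35, 0.06],  # erratic
}


def mae(xs):
    """Mean absolute error."""
    return mean(abs(x) for x in xs)


def max_error(xs):
    """Maximum absolute error."""
    return max(abs(x) for x in xs)


def rank_by(stat, errs):
    """Return functional names sorted from best (lowest statistic) to worst."""
    return sorted(errs, key=lambda name: stat(errs[name]))


# Same test set, different statistics.
print("MAE ranking:      ", rank_by(mae, errors))
print("Max-error ranking:", rank_by(max_error, errors))

# Same statistic, smaller test set (last system removed).
subset = {name: errs[:4] for name, errs in errors.items()}
print("MAE on subset:    ", rank_by(mae, subset))
```

Here functional "A" moves from worst by maximum error to best by MAE on the reduced set. A ranking is therefore only meaningful together with the test set and statistic that produced it.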