• A
  • A
  • A
  • АБB
  • АБB
  • АБB
  • А
  • А
  • А
  • А
  • А
Обычная версия сайта

Глава

Efficient Exact Algorithm for Count Distinct Problem

P. 1-11.
Golov N., Filatov A., Bruskin S.

This paper describes and analyses optimization approaches, which make possible the exact calculation of millions of hierarchical count distinct measures over hundreds of billions data rows. Described approach evolved for several years, in parallel with the growth of tasks from a fast growing internet company, and was finally implemented as a PEAPM (Pipelined Exact Accumulation for Paralleled Measures) algorithm. Current version of an algorithm outputs exact values (not estimates), works in a single thread, in minutes using a general commodity hardware, and requires volume of RAM equal to the doubled size of required measures/