Comparative Analysis of Data Structures for Approximate Nearest Neighbor Search

Ponomarenko A., Avrelin N., Naidan B., Boytsov L.

P. 125-130.

Similarity searching has a vast range of applications in various fields of computer science. Many methods have been proposed for exact search, but they all suffer from the curse of dimensionality and are, thus, not applicable to high dimensional spaces. Approximate search methods are considerably more efficient in high dimensional spaces. Unfortunately, there are few theoretical results regarding the complexity of these methods and there are no comprehensive empirical evaluations, especially for non-metric spaces. To fill this gap, we present an empirical analysis of data structures for approximate nearest neighbor search in high dimensional spaces. We provide a comparison with recently published algorithms on several data sets. Our results show that small world approaches provide some of the best tradeoffs between efficiency and effectiveness in both metric and non-metric spaces.
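
The abstract speaks of tradeoffs between efficiency and effectiveness. One common way to quantify effectiveness is recall: the fraction of the true k nearest neighbors, found by an exhaustive scan, that an approximate method returns. This measure is assumed here and is not taken from the paper. The Python sketch below shows such a baseline. `exact_knn`, `recall_at_k` and `kl_divergence` are hypothetical helpers. The distance is passed in as a function, so a non-metric dissimilarity such as the Kullback-Leibler divergence can be used as well.

```python
import heapq
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]

def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kl_divergence(p: Vector, q: Vector) -> float:
    """Kullback-Leibler divergence: asymmetric, no triangle inequality (a non-metric example).
    Assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def exact_knn(data: Sequence[Vector], query: Vector, k: int,
              dist: Callable[[Vector, Vector], float] = euclidean) -> List[int]:
    """Exhaustive k-NN: one distance computation per stored element, so O(n) per query."""
    return heapq.nsmallest(k, range(len(data)), key=lambda i: dist(data[i], query))

def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the true k nearest neighbors that an approximate method returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
truth = exact_knn(points, (0.1, 0.1), k=2)
print(truth)                        # [0, 1]
print(recall_at_k([0, 2], truth))   # 0.5
```

The KL divergence is asymmetric, so argument order matters. In this sketch the stored element is always passed first.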

Language: English

Keywords: metric space; nearest neighbor search; non-metric search; approximate search; small world graphs

In book

DATA ANALYTICS 2014, The Third International Conference on Data Analytics

[s.n.], 2014

Exact Search with Small World Data Structure for One Dimensional Metric Space

Ponomarenko A., Yury M., Andrey L. et al., Business Informatics 2014

The ability to scale is desirable in computer systems as well as in business settings. Distributed systems clearly demonstrate this ability and the power to process very large amounts of data. Many systems with a distributed architecture, such as the Hadoop file system or distributed torrent trackers, are based on a distributed hash table (DHT), which manages ...

Added: October 20, 2014
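
The entry on exact search in a one-dimensional metric space suggests why exactness is attainable there. The Python sketch below is a hypothetical illustration, not the paper's construction. It assumes distinct coordinates and links every point to its predecessor and successor in sorted order. Along the sorted order, the distance |x - q| strictly decreases and then strictly increases. Therefore any vertex with no strictly closer neighbor is a global nearest neighbor, and a greedy walk from any start is exact. `build_line_graph` and `greedy_search` are illustrative names.

```python
from typing import List

def build_line_graph(points: List[float]) -> List[List[int]]:
    """Link every point to its predecessor and successor in sorted order."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    neighbors: List[List[int]] = [[] for _ in points]
    for a, b in zip(order, order[1:]):
        neighbors[a].append(b)
        neighbors[b].append(a)
    return neighbors

def greedy_search(points: List[float], neighbors: List[List[int]],
                  query: float, start: int) -> int:
    """Move to the closest neighbor while it is strictly closer to the query; return where it stops."""
    current = start
    while True:
        best = min(neighbors[current], key=lambda j: abs(points[j] - query), default=current)
        if abs(points[best] - query) >= abs(points[current] - query):
            return current
        current = best

pts = [7.0, -2.0, 3.5, 10.0, 0.0]
graph = build_line_graph(pts)
print(greedy_search(pts, graph, 8.4, start=1))  # 0 (the point 7.0)
```

Extra long-range links, which small-world structures add to shorten paths, do not break this argument as long as every vertex keeps its two sorted neighbors. With duplicate coordinates, however, the walk can stall on a plateau. That is why the sketch assumes distinct values.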

Query-Based Improvement Procedure and Self-Adaptive Graph Construction Algorithm for Approximate Nearest Neighbor Search

Ponomarenko A., Lecture Notes in Computer Science 2015 P. 314-319

The nearest neighbor search problem has been well known since the 1960s. Many approaches have been proposed. One is to build a graph over the set of objects from a given database and use a greedy walk as the basis of a search algorithm. If the greedy walk is able to find the nearest neighbor in ...

Added: October 9, 2015
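
The entry above builds search on a greedy walk over a graph. Its abstract breaks off before stating when the walk succeeds, so the Python sketch below does not reproduce the paper's procedure. It shows a plain greedy walk, with the hypothetical name `greedy_walk`, on a toy graph over points on a line. It also shows how the walk can stop at a local minimum instead of the true nearest neighbor.

```python
from typing import Callable, Dict, List, Tuple

def greedy_walk(graph: Dict[int, List[int]], dist: Callable[[int], float],
                start: int) -> Tuple[int, int]:
    """From `start`, repeatedly jump to the neighbor closest to the query.
    Stops when no neighbor is strictly closer; returns (vertex, distance evaluations)."""
    current, d_current = start, dist(start)
    evaluations = 1
    while True:
        best, d_best = current, d_current
        for v in graph[current]:
            d = dist(v)
            evaluations += 1
            if d < d_best:
                best, d_best = v, d
        if best == current:
            return current, evaluations
        current, d_current = best, d_best

# Toy graph: vertex positions on a line and undirected adjacency lists.
position = {0: 0.0, 1: 3.0, 2: -1.0, 3: 10.0}
graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
query = 8.0
dist_to_query = lambda v: abs(position[v] - query)

print(greedy_walk(graph, dist_to_query, start=0))  # (1, 4): stuck at a local minimum
print(greedy_walk(graph, dist_to_query, start=3))  # (3, 2): the true nearest neighbor
```

From vertex 0, the walk moves to vertex 1 and stops there, although vertex 3 is closer to the query. In graphs like this one, the answer depends on the graph and on the entry point.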

Lines in hypergraphs

Beaudou L., Bondy A., Chen X. et al., Combinatorica 2013 Vol. 33 No. 6 P. 633-654

One of the De Bruijn-Erdős theorems deals with finite hypergraphs where every two vertices belong to precisely one hyperedge. It asserts that, except in the perverse case where a single hyperedge equals the whole vertex set, the number of hyperedges is at least the number of vertices and the two numbers are equal if and ...

Added: April 11, 2019
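
The De Bruijn-Erdős statement in the entry above can be checked on small cases. The Python sketch below, with the hypothetical helper `every_pair_in_exactly_one`, tests the hypothesis that every two vertices lie in exactly one hyperedge. It then compares the numbers of vertices and hyperedges for three classical examples. The Fano plane and a near-pencil have equally many of each. The complete graph K4 has strictly more hyperedges than vertices.

```python
from itertools import combinations
from typing import Iterable, List, Set

def every_pair_in_exactly_one(points: Iterable[int], hyperedges: List[Set[int]]) -> bool:
    """True if every two distinct vertices lie in exactly one hyperedge."""
    return all(sum(1 for e in hyperedges if a in e and b in e) == 1
               for a, b in combinations(points, 2))

# Fano plane: 7 points, 7 lines of 3 points each.
fano = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
# Near-pencil on 5 points: one line {1, 2, 3, 4} plus the lines {0, i}.
near_pencil = [{1, 2, 3, 4}] + [{0, i} for i in range(1, 5)]
# Complete graph K4 as a hypergraph: 4 points, 6 two-point lines.
k4 = [set(p) for p in combinations(range(4), 2)]

for name, n, edges in [("Fano", 7, fano), ("near-pencil", 5, near_pencil), ("K4", 4, k4)]:
    assert every_pair_in_exactly_one(range(n), edges)
    print(name, "vertices:", n, "hyperedges:", len(edges))
```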

The joint modulus of variation of metric space valued functions and pointwise selection principles

Vyacheslav V. Chistyakov, Svetlana A. Chistyakova, / Cornell University. Series math "arxiv.org". 2016. No. 1601.07298.

Given a subset T of real numbers and a metric space M, we introduce a nondecreasing sequence {v_n} of pseudometrics on the set M^T of all functions from T into M, called the joint modulus of variation. We prove that if two sequences of functions {f_j} and {g_j} from M^T are such that {f_j} is ...

Added: February 12, 2016

Minimum-weight perfect matching for nonintrinsic distances on the line

Delon J., Salomon J., Sobolevski A., Journal of Mathematical Sciences 2012 Vol. 181 No. 6 P. 782-791

We consider a minimum-weight perfect matching problem on the line and establish a "bottom-up" recursion relation for partial minimum-weight matchings. ...

Added: May 11, 2012
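
The matching entry above does not state its recursion, so the Python sketch below is not the authors' bottom-up relation. It assumes that pairing x and y costs f(|x - y|) for a concave f such as the square root. This is one reading of "nonintrinsic distance" that the abstract does not confirm. Under that assumption, a crossing pair of pairs a < b < c < d can be replaced by (a, d), (b, c) without increasing the cost, so some optimal matching has no crossings. A standard interval dynamic program, O(n^3), then finds it. `min_weight_matching` is a hypothetical name.

```python
import math
from functools import lru_cache
from typing import Callable, List, Tuple

def min_weight_matching(xs: List[float], f: Callable[[float], float]
                        ) -> Tuple[float, List[Tuple[float, float]]]:
    """Minimum-cost non-crossing perfect matching of an even number of points on a line.
    Optimal overall when the cost f(|x - y|) is concave in the distance (assumption)."""
    pts = sorted(xs)
    if len(pts) % 2:
        raise ValueError("a perfect matching needs an even number of points")

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> Tuple[float, Tuple[Tuple[int, int], ...]]:
        # Optimal matching of pts[i..j] (inclusive); empty when i > j.
        if i > j:
            return 0.0, ()
        result: Tuple[float, Tuple[Tuple[int, int], ...]] = (math.inf, ())
        # pts[i] is paired with pts[k]; points strictly between them are matched among themselves.
        for k in range(i + 1, j + 1, 2):
            inner_cost, inner_pairs = best(i + 1, k - 1)
            outer_cost, outer_pairs = best(k + 1, j)
            cost = f(pts[k] - pts[i]) + inner_cost + outer_cost
            if cost < result[0]:
                result = (cost, ((i, k),) + inner_pairs + outer_pairs)
        return result

    cost, pairs = best(0, len(pts) - 1)
    return cost, [(pts[a], pts[b]) for a, b in pairs]

cost, pairs = min_weight_matching([2.1, 0.0, 1.1, 1.0], math.sqrt)
print(pairs)  # [(0.0, 2.1), (1.0, 1.1)]
```

With the square root, the nested pairing of the example is cheaper than pairing neighbors (about 1.77 versus 2). With a convex cost such as the square of the distance, the neighbor pairing wins instead.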

Modular metric spaces. II. Application to superposition operators

Chistyakov V., Nonlinear Analysis 2010 Vol. 72 No. 1 P. 15-30

The notion of a modular is introduced as follows. A (metric) modular on a set $X$ is a function $w\colon(0,\infty)\times X\times X\to[0,\infty]$ satisfying, for all $x,y,z\in X$, the following three properties: $x=y$ if and only if $w(\lambda,x,y)=0$ for all $\lambda>0$; $w(\lambda,x,y)=w(\lambda,y,x)$ for all $\lambda>0$; $w(\lambda+\mu,x,y)\le w(\lambda,x,z)+w(\mu,y,z)$ for all $\lambda,\mu>0$. We show that, given $x_0\in X$, the set $X_w=\{x\in X:\lim_{\lambda\to\infty}w(\lambda,x,x_0)=0\}$ is a ...

Added: January 25, 2013
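
As a worked illustration of the three axioms in the entry above, consider a metric space $(X,d)$ and the function below. This is a standard example, added here and not taken from the abstract. The lines that follow check each axiom in turn. The first uses $d(x,y)=0 \iff x=y$, the second uses the symmetry of $d$, and the third uses the triangle inequality together with $\lambda+\mu>\max(\lambda,\mu)$.

$$
\begin{aligned}
w(\lambda,x,y) &= \frac{d(x,y)}{\lambda}, \qquad \lambda>0,\ x,y\in X,\\
w(\lambda,x,y)=0 \text{ for all } \lambda>0 &\iff d(x,y)=0 \iff x=y,\\
w(\lambda,x,y) &= \frac{d(y,x)}{\lambda} = w(\lambda,y,x),\\
w(\lambda+\mu,x,y) &= \frac{d(x,y)}{\lambda+\mu} \le \frac{d(x,z)+d(y,z)}{\lambda+\mu} \le \frac{d(x,z)}{\lambda}+\frac{d(y,z)}{\mu} = w(\lambda,x,z)+w(\mu,y,z).
\end{aligned}
$$

For this $w$, $\lim_{\lambda\to\infty} d(x,x_0)/\lambda = 0$ for every $x\in X$ because $d(x,x_0)$ is finite, so $X_w = X$.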

Growing Homophilic Networks Are Natural Navigable Small Worlds

Malkov Yu. A., Ponomarenko A., Plos One 2016 Vol. 11 No. 6 P. 1-14

Navigability, an ability to find a logarithmically short path between elements using only local information, is one of the most fascinating properties of real-life networks. However, the exact mechanism responsible for the formation of navigation properties remained unknown. We show that navigability can be achieved by using only two ingredients present in the majority of ...

Added: September 9, 2016

The joint modulus of variation of metric space valued functions and pointwise selection principles

Vyacheslav V. Chistyakov, Svetlana A. Chistyakova, Studia Mathematica 2017 Vol. 238 No. 1 P. 37-57

Given a subset $T$ of the reals $R$ and a metric space $M$, we introduce a nondecreasing sequence $\{\nu_n\}$ of pseudometrics on $M^T$ (the set of all functions from $T$ into $M$), called the joint modulus of variation. We prove that if two sequences $\{f_j\}$ and $\{g_j\}$ of functions from $M^T$ are such that $\{f_j\}$ ...

Added: May 11, 2017

Comparative Analysis of Data Structures for Approximate Nearest Neighbor Search

Ponomarenko A., Avrelin N., Naidan B. S. et al., Algorithms, Methods and Systems of Data Processing (Алгоритмы, методы и системы обработки данных) 2015 Vol. 4 No. 33 P. 91-106

Similarity search is widely used in various fields of computer science. Many methods have been proposed for the exact version of the problem, but all of them are subject to the "curse" of dimensionality and are not efficient for high-dimensional data. Approximate algorithms make it possible to partly cope with this "curse". However, because of their complex stochastic nature, theoretical estimates are lacking for most approximate algorithms. Moreover, on ...

Added: September 27, 2016

The Inverted Multi-Index

Babenko A., IEEE Transactions on Pattern Analysis and Machine Intelligence 2014 Vol. PP No. 99 P. 1

A new data structure for efficient similarity search in very large datasets of high-dimensional vectors is introduced. This structure called the inverted multi-index generalizes the inverted index idea by replacing the standard quantization within inverted indices with product quantization. For very similar retrieval complexity and pre-processing time, inverted multi-indices achieve a much denser subdivision of ...

Added: December 19, 2014
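
The inverted multi-index entry above describes replacing standard quantization with product quantization. The Python sketch below illustrates this idea under several assumptions; it is not the authors' implementation. Each vector is split into two halves, and its cell is the pair of nearest codeword indices. The two codebooks are assumed given; in practice they would be learned, for example by k-means on each half. At query time, cells are visited in increasing order of d1[i] + d2[j] using a heap and a visited set, which is a simplified multi-sequence traversal. All function names are hypothetical.

```python
import heapq
from typing import Dict, Iterator, List, Sequence, Tuple

Vector = Sequence[float]
Cell = Tuple[int, int]

def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(v: Vector) -> Tuple[Vector, Vector]:
    half = len(v) // 2
    return v[:half], v[half:]

def cell_of(v: Vector, cb1: List[Vector], cb2: List[Vector]) -> Cell:
    """Product quantization with two sub-codebooks: a cell is a pair of codeword indices."""
    a, b = split(v)
    i = min(range(len(cb1)), key=lambda c: sq_dist(a, cb1[c]))
    j = min(range(len(cb2)), key=lambda c: sq_dist(b, cb2[c]))
    return i, j

def build_index(data: Sequence[Vector], cb1: List[Vector], cb2: List[Vector]) -> Dict[Cell, List[int]]:
    index: Dict[Cell, List[int]] = {}
    for idx, v in enumerate(data):
        index.setdefault(cell_of(v, cb1, cb2), []).append(idx)
    return index

def cells_in_order(query: Vector, cb1: List[Vector], cb2: List[Vector]) -> Iterator[Tuple[float, Cell]]:
    """Yield (d1 + d2, cell) in nondecreasing order; d1 + d2 is the squared distance
    from the query to the concatenated codeword of the cell."""
    a, b = split(query)
    d1 = sorted((sq_dist(a, c), i) for i, c in enumerate(cb1))
    d2 = sorted((sq_dist(b, c), j) for j, c in enumerate(cb2))
    heap = [(d1[0][0] + d2[0][0], 0, 0)]
    seen = {(0, 0)}
    while heap:
        total, r, s = heapq.heappop(heap)
        yield total, (d1[r][1], d2[s][1])
        for nr, ns in ((r + 1, s), (r, s + 1)):
            if nr < len(d1) and ns < len(d2) and (nr, ns) not in seen:
                seen.add((nr, ns))
                heapq.heappush(heap, (d1[nr][0] + d2[ns][0], nr, ns))

def candidates(query: Vector, index: Dict[Cell, List[int]], cb1: List[Vector],
               cb2: List[Vector], max_candidates: int) -> List[int]:
    """Collect stored ids from the closest cells until enough candidates are found."""
    out: List[int] = []
    for _, cell in cells_in_order(query, cb1, cb2):
        out.extend(index.get(cell, []))
        if len(out) >= max_candidates:
            break
    return out

cb = [(0.0,), (10.0,)]
data = [(1.0, 1.0), (9.0, 1.0), (1.0, 9.0), (9.0, 9.0)]
index = build_index(data, cb, cb)
print(candidates((2.0, 8.0), index, cb, cb, max_candidates=1))  # [2]
```

The number of cells is the product of the two codebook sizes, which is how a multi-index obtains a much finer subdivision than a single codebook of the same size.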

Approximate nearest neighbor algorithm based on navigable small world graphs

Malkov Y., Ponomarenko A., Krylov V. et al., Information Systems 2014 Vol. 45 P. 61-68. DOI 10.1016/j.is.2013.10.006

We propose a novel approach to solving the approximate k-nearest neighbor search problem in metric spaces. The search structure is based on a navigable small world graph with vertices corresponding to the stored elements, edges to links between them, and a variation of greedy algorithm for searching. The navigable small world is created simply by keeping ...

Added: January 24, 2014
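
The navigable small world abstract above breaks off before describing the construction in full. The Python sketch below shows an incremental construction in that spirit. It assumes that each new element is linked to the approximate nearest neighbors found by searching the elements already inserted, and that links are never deleted. The class `SmallWorldGraph` and its parameters `m` (links per new element) and `attempts` (greedy walks per query) are hypothetical. The search here, plain greedy walks from random entry points, is a simplification and not the published procedure.

```python
import math
import random
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]

class SmallWorldGraph:
    """Hypothetical sketch: vertices are stored elements, edges are undirected links."""

    def __init__(self, m: int = 3, attempts: int = 2, seed: int = 0):
        self.m = m                # links created for each new element
        self.attempts = attempts  # greedy walks from random entry points per query
        self.points: List[Vector] = []
        self.links: Dict[int, Set[int]] = {}
        self.rng = random.Random(seed)

    def _greedy(self, query: Vector, start: int, visited: Set[int]) -> None:
        """Greedy walk from `start`; every vertex whose distance is computed goes into `visited`."""
        current = start
        visited.add(current)
        while True:
            best = current
            for v in self.links[current]:
                visited.add(v)
                if math.dist(self.points[v], query) < math.dist(self.points[best], query):
                    best = v
            if best == current:
                return
            current = best

    def search(self, query: Vector, k: int) -> List[int]:
        """Approximate k-NN: the k closest of all vertices seen during the greedy walks."""
        if not self.points:
            return []
        visited: Set[int] = set()
        for _ in range(self.attempts):
            self._greedy(query, self.rng.randrange(len(self.points)), visited)
        return sorted(visited, key=lambda v: math.dist(self.points[v], query))[:k]

    def insert(self, point: Vector) -> int:
        neighbors = self.search(point, self.m)
        new_id = len(self.points)
        self.points.append(point)
        self.links[new_id] = set()
        for v in neighbors:  # undirected links; this sketch never removes a link
            self.links[new_id].add(v)
            self.links[v].add(new_id)
        return new_id

rng = random.Random(42)
g = SmallWorldGraph(m=3, attempts=2, seed=7)
for _ in range(200):
    g.insert((rng.random(), rng.random()))
print(g.search((0.5, 0.5), k=3))
```

Links created early, when few elements exist, tend to span long distances, and keeping them is one common explanation of where the small-world shortcuts come from. Comparing the results with an exhaustive scan shows the recall that a given choice of `m` and `attempts` achieves.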

Theatre Plays as ‘Small Worlds’? Network Data on the History and Typology of German Drama, 1730–1930

Fischer F., Göbel M., Kampkaspar D. et al., in: Digital Humanities 2016. Conference Abstracts (Jagiellonian University & Pedagogical University, Kraków, 11–16 July 2016). Kraków: [s.n.], 2016. P. 255-258.

Decades ago, alongside more traditional structuralist paradigms that were largely based on linguistic theorems (Lotman 1972, Titzmann 1977), literary studies began to undertake structural analyses based on empirical sociology, in particular social network analysis. Structure was no longer defined solely by semantic relations (such as opposition or equivalence), but also by social interactions (Marcus ...

Added: November 9, 2016

Modular metric spaces, I: Basic concepts

Chistyakov V., Nonlinear Analysis 2010 Vol. 72 No. 1 P. 1-14

Added: September 26, 2012

Scalable Distributed Algorithm for Approximate Nearest Neighbor Search Problem in High Dimensional General Metric Spaces

Yury M., Ponomarenko A., Vladimir K. et al., Lecture Notes in Computer Science 2012 No. 7404 P. 132-147

We propose a novel approach for solving the approximate nearest neighbor search problem in arbitrary metric spaces. The distinctive feature of our approach is that we can incrementally build a non-hierarchical distributed structure for given metric space data, with complexity scaling logarithmically with the size of the structure and adjustable accuracy probabilistic nearest neighbor ...

Added: October 1, 2013

Pointwise selection theorems for metric space valued bivariate functions

Vyacheslav V. Chistyakov, Svetlana A. Chistyakova, Journal of Mathematical Analysis and Applications 2017 Vol. 452 No. 2 P. 970-989

We introduce a pseudometric TV on the set M^X of all functions mapping a rectangle X on the plane R^2 into a metric space M, called the total joint variation. We prove that if two sequences {f_j} and {g_j} of functions from M^X are such that {f_j} is pointwise precompact on X, {g_j} is pointwise ...

Added: April 13, 2017

Joint Modulus of Variation of Functions and a Conditionally Regular Selection Principle

S. A. Chistyakova, V. V. Chistyakov, in: Proceedings of the N. I. Lobachevsky Mathematical Center. Vol. 54: Theory of Functions, Its Applications and Related Questions. Kazan: Publishing House of the Kazan Mathematical Society and the Academy of Sciences of the Republic of Tatarstan, 2017. P. 399-402.

Given a closed interval $I=[a,b]$ and a metric space $(M,d)$, we introduce a nondecreasing sequence $\{\nu_n\}$ of pseudometrics on $M^I$ (the set of all functions from $I$ into $M$), called the {\it joint modulus of variation}. We show that if two sequences of functions $\{f_j\}$ and $\{g_j\}$ from $M^I$ are such that $\{f_j\}$ is pointwise relatively compact on $I$, ...

Added: August 29, 2017

The approximate variation to pointwise selection principles

Vyacheslav V. Chistyakov, / Cornell University Library, NY, USA. Series arXiv [math.FA] "Functional Analysis". 2019. No. arXiv: 1910.08490.

Let $T\subset\mathbb{R}$, $M$ be a metric space with metric $d$, and $M^T$ be the set of all functions mapping $T$ into $M$. Given $f\in M^T$, we study the properties of the approximate variation $\{V_\varepsilon(f)\}_{\varepsilon>0}$, where $V_\varepsilon(f)$ is the greatest lower bound of Jordan variations $V(g)$ of functions $g\in M^T$ such that $d(f(t),g(t))\le\varepsilon$ for all $t\in T$. The notion of $\varepsilon$-variation ...

Added: October 21, 2019

The inverted multi-index

Babenko A., Lempitsky V., IEEE Transactions on Pattern Analysis and Machine Intelligence 2015 Vol. 37 No. 6 P. 1247-1260

Added: September 3, 2015