Distributed Kd-Trees for Large Scale Image Search


Distributed Kd-Trees is a method for building image retrieval systems that can handle hundreds of millions of images. It divides the Kd-Tree into a “root subtree” that resides on a root machine and several “leaf subtrees”, each residing on a leaf machine. The root machine handles incoming queries and farms out feature matching to a small, appropriate subset of the leaf machines. Our implementation uses the MapReduce architecture to efficiently build and distribute the Kd-Tree for millions of images. It can run on thousands of machines, and it provides orders of magnitude more throughput than the state of the art, with better recognition performance. We show experiments with up to 100 million images running on 2048 machines, with a run time of a fraction of a second per query image.
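
The root/leaf split described above can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the names `build_root`, `route`, `leaf_search`, and `search`, the median split on cycling dimensions, the `margin` parameter, and the brute-force scan inside each leaf are all assumptions made for illustration (the paper's leaf machines hold Kd-subtrees, and the build runs on MapReduce).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Point = Tuple[float, ...]
Item = Tuple[Point, int]  # (feature vector, image id)


@dataclass
class Shard:
    """Leaf of the root subtree: points at the data held by one leaf machine."""
    shard_id: int


@dataclass
class Split:
    """Internal node of the root subtree."""
    dim: int
    threshold: float
    left: "Node"
    right: "Node"


Node = Union[Split, Shard]


def build_root(items: List[Item], levels: int,
               shards: List[List[Item]], depth: int = 0) -> Node:
    """Build the top `levels` of the tree; remaining items become one shard.

    Assumption: median split, cycling through dimensions by depth.
    """
    if levels == 0 or len(items) < 2:
        shards.append(list(items))
        return Shard(len(shards) - 1)
    dim = depth % len(items[0][0])
    ordered = sorted(items, key=lambda it: it[0][dim])
    mid = len(ordered) // 2
    threshold = ordered[mid][0][dim]
    left = build_root(ordered[:mid], levels - 1, shards, depth + 1)
    right = build_root(ordered[mid:], levels - 1, shards, depth + 1)
    return Split(dim, threshold, left, right)


def route(node: Node, query: Point, margin: float = 0.0) -> List[int]:
    """Root machine: pick the shard ids that should receive the query.

    Both children are visited when the query lies within `margin` of the
    split threshold (a hypothetical way to select a small subset of leaves).
    """
    if isinstance(node, Shard):
        return [node.shard_id]
    diff = query[node.dim] - node.threshold
    hits: List[int] = []
    if diff < 0 or abs(diff) <= margin:
        hits += route(node.left, query, margin)
    if diff >= 0 or abs(diff) <= margin:
        hits += route(node.right, query, margin)
    return hits


def leaf_search(shard_items: List[Item], query: Point) -> Tuple[float, int]:
    """Leaf machine: brute-force nearest neighbour (stand-in for a leaf subtree)."""
    return min((sum((a - b) ** 2 for a, b in zip(p, query)), image_id)
               for p, image_id in shard_items)


def search(root: Node, shards: List[List[Item]], query: Point,
           margin: float = 0.0) -> Optional[Tuple[float, int]]:
    """Route the query, then merge the per-shard results on the root."""
    candidates = [leaf_search(shards[s], query)
                  for s in route(root, query, margin) if shards[s]]
    return min(candidates) if candidates else None
```

As a usage example, calling `build_root(items, levels=2, shards=shards)` on a list of `(point, image_id)` pairs with an initially empty `shards` list yields up to four shards, and `search(root, shards, query)` returns a `(squared_distance, image_id)` pair. In a real deployment each shard would live on a separate machine and `leaf_search` would be a remote call; here everything runs in one process to keep the sketch self-contained.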


  • Mohamed Aly, Mario Munich, and Pietro Perona. Distributed Kd-Trees for Retrieval from Very Large Image Collections.
    British Machine Vision Conference (BMVC), Dundee, UK, August 2011. [pdf]