TOROS N2: An approximate nearest neighbor library that stays fast even on large datasets

Dec. 6, 2017, 7:34 p.m. By: Kirti Bakshi

TOROS N2

There are many great approximate nearest neighbor libraries available today, such as annoy and nmslib, but none of them fully met the requirements for handling Kakao's dataset. To close that gap, the team decided to implement a library, based on nmslib, that improves usability and performs better. The result was released to the world as N2.

N2 is a lightweight approximate nearest neighbor library written in C++, with Python and Go bindings. Compared with other implementations, N2 provides much faster search when the index is built from a large dataset. It also supports multi-core CPUs for index building.
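To make the problem concrete, here is a minimal sketch of exact nearest neighbor search, the brute-force baseline that approximate libraries trade a little accuracy to avoid. This is not N2's algorithm; the function names, the vector dimension, and the random data are assumptions chosen for illustration.

```python
import math
import random


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_nearest(query, vectors, k):
    """Return the ids of the k vectors closest to `query` by scanning all of them."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]


# Illustrative data: 1000 random 8-dimensional vectors.
random.seed(0)
data = [[random.gauss(0, 1) for _ in range(8)] for _ in range(1000)]

neighbors = exact_nearest(data[0], data, k=5)
print(neighbors)  # the first id is 0, since a vector is its own nearest neighbor
```

Every query here touches every vector, so the cost grows linearly with the dataset. Approximate libraries build an index up front so that a query only has to inspect a small part of the data.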

The features of the lightweight library, codenamed TOROS N2, are as follows:

  • Lightweight and efficient: N2 runs fast even on large datasets.

  • Supports multi-core CPUs for index building.

  • Supports mmap by default, so large index files are handled efficiently.

  • Supports Python and Go bindings.
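The mmap feature in the list above can be illustrated with a short sketch using Python's standard `mmap` module. The file layout here (fixed-size float32 records) is a hypothetical one invented for the example and is not N2's index format. The point is only that a memory-mapped file lets a program read one record without loading the whole file.

```python
import mmap
import os
import struct
import tempfile

DIM = 4                                  # hypothetical vector dimension
RECORD = struct.Struct("<%df" % DIM)     # one vector = DIM little-endian float32 values


def write_vectors(path, vectors):
    """Write vectors back to back as fixed-size binary records."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(RECORD.pack(*v))


def read_vector(path, index):
    """Read a single vector through a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Only the pages backing this record need to be brought into memory.
            return list(RECORD.unpack_from(mm, index * RECORD.size))


path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
write_vectors(path, [[float(i), i + 0.5, i + 0.25, -float(i)] for i in range(1000)])

print(read_vector(path, 2))  # [2.0, 2.5, 2.25, -2.0]
```

Because the operating system pages data in on demand, the same approach works for files much larger than available RAM, which is why memory mapping suits large index files.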

Index Build Times:

[Figure: N2 index build times]

Search Speed:

[Figure: N2 search speed]

Bindings:

The following guides explain how to use N2, with basic examples and API documentation.

  • Python

  • C++

  • Go
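As a rough idea of what the Python guide covers, here is a minimal sketch of building, saving, loading, and querying an index. The calls (`HnswIndex`, `add_data`, `build`, `save`, `load`, `search_by_id`) follow the project's Python guide, but the parameter values are illustrative and the signatures should be checked against the installed version. The helper names `make_vectors` and `build_and_query` and the file name `example.n2` are assumptions made for this example.

```python
import random


def make_vectors(count, dim, seed=0):
    """Generate `count` random vectors of length `dim` (illustrative data only)."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(count)]


def build_and_query(vectors, index_path="example.n2", k=10):
    """Build an N2 index from `vectors`, reload it, and query the neighbors of item 0."""
    # Imported here so make_vectors stays usable without N2 installed.
    from n2 import HnswIndex

    dim = len(vectors[0])
    index = HnswIndex(dim)
    for v in vectors:
        index.add_data(v)

    # n_threads is what lets index building use several CPU cores.
    # The values of m and max_m0 are illustrative, not tuned.
    index.build(m=5, max_m0=10, n_threads=4)
    index.save(index_path)

    loaded = HnswIndex(dim)
    loaded.load(index_path)
    return loaded.search_by_id(0, k)
```

With N2 installed, `build_and_query(make_vectors(1000, 3))` would return the ids of the approximate nearest neighbors of the first vector.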

Comparison and Conclusion:

In short, N2 performs best on multi-core CPUs. annoy may be a good choice for small datasets that a single thread can handle, but for large datasets where indexing performance is critical, N2 is the library to reach for.

N2 is also reported to run almost 2x faster than annoy, and it is considered just as good when high precision and accuracy are required.
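Precision claims for approximate search are commonly checked by measuring recall: the share of true nearest neighbors that the approximate result actually contains. The sketch below shows that calculation. The function name and the sample id lists are made up for illustration and are not benchmark results for N2 or annoy.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors present in the approximate result."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical results for one query with k = 5.
exact = [3, 17, 42, 8, 99]    # ids from an exhaustive scan
approx = [3, 42, 17, 5, 99]   # ids from an approximate index

print(recall_at_k(approx, exact))  # 0.8
```

Averaging this value over many queries, alongside queries per second, gives the usual speed versus accuracy picture when comparing libraries.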

For More Information: GitHub