Stitcher for Podcasts

Get the App Open App
Bummer! You're not a
Stitcher Premium subscriber yet.
Learn More
Start Free Trial
$4.99/Month after free trial
HELP

Episode Info

Episode Info:

Episode Summary: Leo Boytsov, a PhD researcher from the Language Technologies Institute of Carnegie Mellon University, talks about fast approximate search in modern information retrieval. We discuss the curse of dimensionality, hard-to-beat baselines and NMSLIB, Leo's super fast library for nearest-neighbour search. How does NMSLIB compare to Facebook's FAISS and Spotify's Annoy? Warning: very technical.

Links & resources:

  1. NMSLIB: Leonid's Non-Metric Space Library (NMSLIB) for efficient similarity search.

  2. Facebook's FAISS and Spotify's Annoy, alternative libraries for the same thing.

  3. Erik Bernhardsson's excellent repo for benchmarking kNN search libraries.

  4. My original 2014 benchmark series of Python libraries for nearest neighbour search in high dimensional spaces.

  5. Off the Beaten Path: Let’s Replace Term-Based Retrieval with k-NN Search (YouTube): Leo's talk at the Allen Institute for Artificial Intelligence (AI2). ‏

  6. PyFastPFor: Fast integer compression—for anyone who needs fast integer compression :-)

  7. A few extra links Leo wanted you to see:
    Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors. D Baranchuk, A Babenko, Y Malkov. arxiv.org, 2018.
    Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. P Clark, O Etzioni, T Khot, A Sabharwal, O Tafjord… aaai.org, 2017.
    Revisiting unreasonable effectiveness of data in deep learning era. C Sun, A Shrivastava, S Singh. arxiv.org, 2017.

Subscribe with RSS, iTunes, YouTube, Stitcher, SoundCloud.
Archive of previous episodes.

Read more »

Discover more stories like this.

Like Stitcher On Facebook

EMBED

Episode Options

Listen Whenever

Similar Episodes

Related Episodes