ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT (Khattab & Zaharia)

March 8, 2025

-contextualized late interaction with BERT = ColBERT

-representation-based similarity

-query => 1 embedding

-document => 1 embedding

-similarity score calculated
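A minimal sketch of the single-vector setup above, assuming cosine similarity as the scoring function (the notes do not say which similarity is used) and made-up 3-dimensional vectors in place of real encoder output:

```python
import math

def cosine_similarity(q_vec, d_vec):
    """Cosine similarity between one query vector and one document vector."""
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    q_norm = math.sqrt(sum(a * a for a in q_vec))
    d_norm = math.sqrt(sum(b * b for b in d_vec))
    return dot / (q_norm * d_norm)

# Toy embeddings, invented for illustration only.
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "doc_a": [2.0, 0.0, 2.0],  # same direction as the query
    "doc_b": [0.0, 1.0, 0.0],  # orthogonal to the query
}

# One score per document, computed from a single vector on each side.
scores = {name: cosine_similarity(query_vec, vec) for name, vec in doc_vecs.items()}
```

Here the whole query and the whole document are each squeezed into one vector, so token-level matches are not visible to the scorer.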

-query-document interaction

-word/phrase-level relationships between q and d instead of individual embeddings

-matched using a deep neural network

-All-to-all interaction (e.g. BERT)

-models interactions among words/phrases both within and across q and d at the same time

-Late interaction

-query is broken up into tokens + embedded

-document is broken up into tokens + embedded

-for each query token embedding, the document token embedding that yields the highest similarity score is found

-these similarity scores are added up to determine the total similarity score

-ranked across documents to get results
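The late interaction steps above can be sketched as follows. This is an illustrative toy, not the paper's implementation: the 2-dimensional token embeddings are invented, dot product is assumed as the similarity, and `maxsim_score` and `rank` are hypothetical helper names.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_embs, doc_embs):
    """For each query token, take the best-matching document token, then sum."""
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)

def rank(query_embs, docs):
    """Score every document against the query and sort best-first."""
    scored = [(name, maxsim_score(query_embs, embs)) for name, embs in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy token embeddings (one vector per token), invented for illustration.
query_embs = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]],
    "doc_b": [[0.6, 0.8], [0.8, 0.6]],
}

ranking = rank(query_embs, docs)
```

With these toy vectors, doc_a contains an exact match for each query token, so it ranks above doc_b.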

Masking [#] - encourage inference for missing context/meaning

Padding [PAD] - no meaningful information exists beyond the query length

Query Expansion - append synonyms, top-k retrieved terms, etc.

Learnable Tokens [BLANK] - can be fine-tuned to teach BERT to fill in missing query tokens/context in an optimal way
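A rough sketch of the masking idea: pad a tokenized query to a fixed length with mask tokens instead of plain padding, so the encoder still produces a contextual embedding at each extra position. The function name `augment_query`, the default length of 32, and the literal `"[MASK]"` string (written [#] in the notes above) are assumptions for illustration.

```python
def augment_query(tokens, max_len=32, mask_token="[MASK]"):
    """Truncate or pad a token list to exactly max_len using mask tokens.

    max_len and mask_token are illustrative defaults, not values taken
    from these notes.
    """
    tokens = list(tokens)[:max_len]
    return tokens + [mask_token] * (max_len - len(tokens))

# A short query gets filled out with mask tokens up to the fixed length.
padded = augment_query(["what", "is", "colbert"], max_len=6)
```

Every query then has the same number of embeddings, which also keeps the per-query scoring cost fixed.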

Similarity between q and d = sum, over all query tokens, of the maximum similarity between that query token and any document token

-can compute/store document embeddings offline because they are computed independently, irrespective of the query
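A small sketch of the offline/online split. Everything here is a stand-in: `toy_encode` is a hypothetical lookup table used in place of a BERT encoder, and `build_index` and `search` are illustrative names. The point is only that document embeddings are built once with no query in sight, then reused for every query.

```python
# Hypothetical token vectors standing in for real encoder output.
TOY_VECTORS = {
    "late": [1.0, 0.0],
    "interaction": [0.0, 1.0],
    "bert": [0.6, 0.8],
    "search": [0.8, 0.6],
}

def toy_encode(text):
    """Stand-in encoder: one vector per known token (not a real BERT call)."""
    return [TOY_VECTORS[t] for t in text.lower().split() if t in TOY_VECTORS]

def build_index(corpus):
    """Offline step: encode every document once, independent of any query."""
    return {doc_id: toy_encode(text) for doc_id, text in corpus.items()}

def maxsim(query_embs, doc_embs):
    """Sum over query tokens of the best dot product with any document token."""
    return sum(
        max(sum(a * b for a, b in zip(q, d)) for d in doc_embs)
        for q in query_embs
    )

def search(query, index):
    """Online step: encode only the query, then score against stored embeddings."""
    query_embs = toy_encode(query)
    scored = [(doc_id, maxsim(query_embs, embs)) for doc_id, embs in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

corpus = {"d1": "late interaction", "d2": "bert search"}
index = build_index(corpus)            # done ahead of time
results = search("late interaction", index)  # done per query
```

At query time only the query passes through the encoder; the stored document embeddings are read back and scored.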