ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT (Khattab & Zaharia)

March 8, 2025

-contextualized late interaction with BERT = ColBERT
-representation-based similarity
  -query => 1 embedding
  -document => 1 embedding
  -similarity score calculated between the two embeddings
-query-document interaction
  -models word/phrase-level relationships between q and d instead of comparing single embeddings
  -relationships are matched using a deep neural network
-all-to-all interaction (e.g. BERT)
  -models interactions within and across the words/phrases of q and d at the same time
-late interaction
  -query is broken up into tokens, and each token is embedded
  -document is broken up into tokens, and each token is embedded
  -for each query token embedding, the document token embedding that yields the highest similarity score is found
  -these maximum similarity scores are summed to get the total similarity score
  -documents are ranked by total score to get results
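
The late interaction steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`l2_normalize`, `late_interaction_score`, `rank_documents`) and the toy 2-dimensional embeddings are made up for the example, and cosine similarity is assumed as the similarity measure.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so a dot product equals cosine similarity.
    # Assumes no all-zero rows (those would divide by zero).
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def late_interaction_score(query_emb, doc_emb):
    # query_emb: (num_query_tokens, dim); doc_emb: (num_doc_tokens, dim)
    sim = l2_normalize(query_emb) @ l2_normalize(doc_emb).T
    # For each query token keep its best-matching document token, then sum.
    return float(sim.max(axis=1).sum())

def rank_documents(query_emb, doc_embs):
    # Score every document against the query and sort best-first.
    scores = [late_interaction_score(query_emb, d) for d in doc_embs]
    order = sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Toy token embeddings (hypothetical values, one row per token).
query = np.array([[1.0, 0.0], [0.0, 1.0]])
docs = [
    np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),  # exact match for both query tokens
    np.array([[1.0, 1.0]]),                          # partial match for both
    np.array([[-1.0, 0.0]]),                         # points away from the first query token
]
order, scores = rank_documents(query, docs)
print(order, scores)
```

Each query token contributes only its single best match, so a long document is not rewarded just for having many loosely related tokens.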

Query Augmentation

Masking [#] - encourage inference for missing context/meaning

Padding [PAD] - no meaningful information exists beyond the query length

Query Expansion - append synonyms, top-k retrieved terms, etc.

Learnable Tokens [BLANK] - can be fine-tuned to teach BERT to fill in missing query tokens/context in an optimal way
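
As a rough sketch of the masking option, the snippet below pads a tokenized query to a fixed length with a fill token. The function name `pad_query`, the length `n_q=32`, and the literal `"[MASK]"` string are assumptions for illustration; a real pipeline would use the tokenizer's own mask token and IDs instead of strings.

```python
def pad_query(tokens, n_q=32, fill_token="[MASK]"):
    # Truncate queries longer than n_q so every query has the same length.
    body = list(tokens)[:n_q]
    # Fill the remaining positions with the fill token; the encoder can then
    # produce contextual embeddings at those positions too.
    return body + [fill_token] * (n_q - len(body))

padded = pad_query(["what", "is", "colbert"], n_q=5)
print(padded)
```

Swapping `fill_token` for `"[PAD]"` or a learnable placeholder gives the other variants listed above with the same shape of code.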

Similarity between q and d = sum, over all query tokens, of the maximum similarity between that query token and any document token
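
Written as a formula, this could look like the following. The symbols are notation chosen here rather than taken from these notes: E_{q_i} is the embedding of the i-th query token, E_{d_j} is the embedding of the j-th document token, and the dot product stands in for whichever similarity measure is used.

```latex
S_{q,d} = \sum_{i=1}^{|q|} \max_{1 \le j \le |d|} E_{q_i} \cdot E_{d_j}
```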

-document embeddings can be computed and stored offline because they are computed independently of the query
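
A small sketch of what this offline/online split might look like. Everything here is hypothetical: `OfflineIndex`, `toy_encode`, and the tiny `VOCAB` are stand-ins, with a one-hot lookup replacing a real BERT encoder so the example stays self-contained.

```python
import numpy as np

VOCAB = ["late", "interaction", "bert", "ranking", "pizza", "recipe"]

def toy_encode(text):
    # Hypothetical stand-in for a BERT encoder: one one-hot row per known token.
    # Assumes the text contains at least one word from VOCAB.
    ids = [VOCAB.index(t) for t in text.lower().split() if t in VOCAB]
    return np.eye(len(VOCAB))[ids]

class OfflineIndex:
    def __init__(self, encode_doc):
        self.encode_doc = encode_doc
        self.doc_embs = {}  # doc_id -> (num_doc_tokens, dim) matrix

    def add(self, doc_id, text):
        # Offline step: runs once per document, before any query is seen.
        self.doc_embs[doc_id] = self.encode_doc(text)

    def search(self, query_emb, k=10):
        # Online step: only the query is encoded at search time; the stored
        # document matrices are reused as-is.
        scores = {
            doc_id: float((query_emb @ emb.T).max(axis=1).sum())
            for doc_id, emb in self.doc_embs.items()
        }
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

index = OfflineIndex(toy_encode)
index.add("d1", "late interaction bert")
index.add("d2", "pizza recipe")

results = index.search(toy_encode("bert ranking"))
print(results)
```

Because `add` never sees the query, the stored matrices can be built ahead of time and shared across all future searches.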