Information retrieval and LLMs
Industrial Problems Seminar
Charlie Godfrey
AKASA
Abstract
Information retrieval (IR), the task of returning documents relevant to a query from a potentially large corpus, is a classical AI problem with a deep history. Today there is a rich interplay between large language models and IR. On the one hand, giving language models access to information retrieval tools grounds their outputs, reducing hallucinations and improving accuracy, and unlocks novel applications such as deep research. On the other hand, LLMs are being used to improve information retrieval systems at practically every stage of the search pipeline, including query rewriting, LLM-based document ranking, synthetic data generation for training encoders, and relevance labeling for A/B testing. Following a brief survey of these research areas, we will discuss the process of building a retrieval benchmark for a corpus of tax and accounting documents, as well as experiments comparing leading algorithms for LLM document ranking.
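
To give a rough sense of where LLM-based ranking sits in a search pipeline, the sketch below shows a minimal retrieve-then-rerank loop: a cheap first-stage retriever narrows the corpus to a candidate set, and an LLM relevance scorer reorders the candidates. The corpus, the toy lexical scorer, and the `llm_relevance_score` callable are hypothetical placeholders for illustration only, not the system or algorithms examined in the talk.

```python
# Minimal retrieve-then-rerank sketch (hypothetical; not the talk's actual system).
# Stage 1: cheap lexical scoring over the whole corpus.
# Stage 2: an LLM-based pointwise relevance score over the top candidates.

from typing import Callable


def lexical_score(query: str, doc: str) -> float:
    """Toy first-stage score: count of query terms appearing in the document."""
    terms = query.lower().split()
    text = doc.lower()
    return float(sum(text.count(t) for t in terms))


def rerank(
    query: str,
    corpus: list[str],
    llm_relevance_score: Callable[[str, str], float],  # hypothetical LLM scorer
    first_stage_k: int = 100,
    final_k: int = 10,
) -> list[str]:
    # Stage 1: retrieve a candidate set with the cheap lexical scorer.
    candidates = sorted(corpus, key=lambda d: lexical_score(query, d), reverse=True)
    candidates = candidates[:first_stage_k]
    # Stage 2: rerank candidates with the (more expensive) LLM relevance score.
    reranked = sorted(candidates, key=lambda d: llm_relevance_score(query, d), reverse=True)
    return reranked[:final_k]


if __name__ == "__main__":
    docs = [
        "Guidance on corporate income tax deductions.",
        "Accounting standards for revenue recognition.",
        "A history of the printing press.",
    ]
    # Stand-in scorer; a real system would call an LLM here.
    fake_llm = lambda q, d: float(len(set(q.lower().split()) & set(d.lower().split())))
    print(rerank("tax deductions", docs, fake_llm, first_stage_k=3, final_k=2))
```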