Text Indexing

Main Idea

Searching a huge text or a large collection of documents for a pattern by scanning it naively is slow. The cost is especially high when there are multiple queries for different patterns, since the text or the documents must be scanned again for each query. An index provides additional information about the text or the documents that allows patterns to be found more efficiently. Because the index needs to be created only once, it can be used for all queries as long as the text or the documents do not change. Even so, constructing the index should be reasonably fast and memory-efficient, since indices are built for huge texts and document collections.

In this group, we mainly focus on full-text indices, which can answer a query for any pattern. Popular representatives of this type of index are suffix arrays and suffix trees.
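
To make the idea concrete, here is a minimal sketch of a suffix array in Python. It is only an illustration, not the construction used in the group's projects: the function names build_suffix_array and find_occurrences are hypothetical, and the construction simply sorts all suffixes, which is far too slow and memory-hungry for the huge texts discussed above. It does show the basic trade: build the index once, then answer each pattern query by binary search instead of rescanning the text.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Naive construction for illustration only: it sorts the suffixes directly,
    so it copies and compares whole suffixes and does not scale to large texts.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return all start positions of `pattern` in `text`, in increasing order.

    All suffixes that start with `pattern` form one contiguous block in the
    suffix array, so two binary searches locate the block's boundaries.
    """
    m = len(pattern)

    # First suffix whose length-m prefix is not smaller than the pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose length-m prefix is greater than the pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)          # built once ...
print(sa)                              # [5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "ana"))  # ... reused per query: [1, 3]
print(find_occurrences(text, sa, "nan"))  # [2]
print(find_occurrences(text, sa, "xyz"))  # []
```

Each query performs O(log n) string comparisons of at most the pattern length, so its cost depends only logarithmically on the text length n once the index exists.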

Current Projects

Members and Alumni

Last modified: 2020-10-02 09:08 by Johannes Fischer