Indexes for fast code search

Our plugin provides two types of search, each backed by its own index: similar search and exact search.

Both indexes are built on classic information retrieval methods, so they are very fast and can be built even on low-powered machines. They do not rely on any ML resources: in particular, they do not use embeddings, vector databases, or anything similar.

Exact search is a text search for arbitrary substrings. It is backed by an inverted n-gram index: both the source texts and the queries are split into n-grams, and the index maps each n-gram to the texts that contain it. The index is built from the project source code only. The agent uses it to search across the whole project, because on large codebases it is faster than the built-in IDEA Platform search and keeps the response time close to constant, O(1), regardless of project size.
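To make the idea of an inverted n-gram index concrete, here is a minimal sketch in Python. It is an illustration only and not the plugin's implementation: the class name NgramIndex, the choice of trigrams (n = 3), and the file names are all assumptions made for the example.

```python
from collections import defaultdict


def ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of all character n-grams of `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramIndex:
    """Toy inverted index (hypothetical): n-gram -> ids of documents containing it."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.docs: dict[str, str] = {}
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        """Store the text and register the document under each of its n-grams."""
        self.docs[doc_id] = text
        for gram in ngrams(text, self.n):
            self.postings[gram].add(doc_id)

    def search(self, query: str) -> list[str]:
        """Return the sorted ids of documents that contain `query` as a substring."""
        grams = ngrams(query, self.n)
        if not grams:
            # Query shorter than n: there is nothing to look up, so check every document.
            candidates = set(self.docs)
        else:
            # A document can contain the query only if it contains every query n-gram.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        # Sharing all n-grams does not guarantee a match, so verify each candidate.
        return sorted(d for d in candidates if query in self.docs[d])


index = NgramIndex()
index.add("a.kt", "fun parseConfig(path: String)")
index.add("b.kt", "fun loadConfig(): Config")
print(index.search("Config("))
print(index.search("parse"))
```

In this sketch the query only touches the posting lists of its own n-grams, so the lookup cost depends on the query length and the number of candidate documents rather than on the total amount of indexed text. The final substring check is needed because a document may contain every n-gram of the query without containing the query itself.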

Similar search is a lightweight alternative to embedding-based semantic search. Its index is built both from the project source code and from dependency code (libraries). Texts and queries pass through an advanced analyzer that supports:

keyword search;
morphological analysis of natural languages;
splitting of camel case identifiers;
synonyms;
language-specific optimizations (different text extractors for different programming languages).
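
The analyzer features listed above can be sketched as a small token pipeline in Python. This is a rough illustration under stated assumptions, not the plugin's analyzer: the SYNONYMS table, the stem helper, and the analyze function are hypothetical, and the suffix stripping is only a crude stand-in for real morphological analysis.

```python
import re

# Hypothetical synonym table; a real one would be much larger.
SYNONYMS = {"remove": "delete", "erase": "delete", "fetch": "get"}

# Splits identifiers such as "parseHTTPResponse" into "parse", "HTTP", "Response".
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def stem(word: str) -> str:
    """Crude suffix stripping, a stand-in for real morphological analysis."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list[str]:
    """Turn raw text or an identifier into normalized search tokens."""
    tokens = []
    for part in _WORD.findall(text):
        word = stem(part.lower())
        # Map each word to its canonical synonym, if the table has one.
        tokens.append(SYNONYMS.get(word, word))
    return tokens


print(analyze("parseHTTPResponse"))
print(analyze("removeUsers") == analyze("delete users"))
```

Because the same analyze function is applied to indexed text and to queries, the natural-language query "delete users" and the identifier removeUsers end up as the same token list, which is what lets a keyword index behave somewhat like a semantic one without embeddings.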