Automatic chat compression

When the conversation with the agent grows too long, it can reach the model's context window limit. Long histories also degrade the quality of the model's answers and increase the cost and response time of each request.

Automatic compression shortens the chat history in two stages when it grows too large. First, the plugin applies built-in rules, which does not require an extra LLM request. If that is not enough and the history still exceeds the context window, the plugin sends the history to an LLM for summarization. A short summary then replaces the older part of the conversation, while the latest user message is kept unchanged.
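The two-stage flow described above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the plugin's actual implementation: the `Message` type, the word-based `count_tokens`, the tool-output truncation rule in `apply_rules`, the `"system"` role for the summary, and the `summarize` callback (standing in for the LLM request) are all hypothetical. The sketch also assumes that anything after the latest user message is kept verbatim together with it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    role: str      # hypothetical roles: "user", "assistant", "tool", "system"
    content: str


def count_tokens(messages: List[Message]) -> int:
    # Hypothetical token counter: one token per whitespace-separated word.
    return sum(len(m.content.split()) for m in messages)


def apply_rules(messages: List[Message], max_tool_words: int = 20) -> List[Message]:
    # Stage 1, no LLM request. Hypothetical rule: shorten long tool outputs.
    result = []
    for m in messages:
        words = m.content.split()
        if m.role == "tool" and len(words) > max_tool_words:
            m = Message(m.role, " ".join(words[:max_tool_words]) + " [truncated]")
        result.append(m)
    return result


def compress(messages: List[Message], limit: int,
             summarize: Callable[[List[Message]], str]) -> List[Message]:
    if count_tokens(messages) <= limit:
        return messages  # history fits: nothing to do

    messages = apply_rules(messages)
    if count_tokens(messages) <= limit:
        return messages  # rules were enough: no LLM request was made

    # Stage 2: find the latest user message (searching from the end).
    last_user: Optional[int] = next(
        (i for i in range(len(messages) - 1, -1, -1) if messages[i].role == "user"),
        None,
    )
    if last_user is None:
        last_user = len(messages)  # no user message: summarize everything
    older, recent = messages[:last_user], messages[last_user:]
    if not older:
        return messages  # nothing older than the latest user message to summarize

    # The summary replaces the older conversation; the latest user message
    # (and anything after it) stays unchanged.
    summary = Message("system", "Summary: " + summarize(older))
    return [summary] + recent
```

Passing `summarize` as a callback keeps the sketch testable without a real model and makes it explicit that the LLM is only called when the rule-based stage is not sufficient.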

You can configure when automatic compression runs, and you can also trigger compression manually from the chat interface, as shown in the video.