LLM event compaction loses conversation language context - model switches to other languages (or hallucinates tools) after summarization #6273

Description

@pr-maia

Describe the Bug:

After EventsCompactionConfig summarizes conversation history, the model loses track of the conversation language and responds in an unrelated language (Russian in our case, despite an English conversation with English-speaking users).

We believe the LlmEventSummarizer produces a language-neutral summary that strips the implicit "we're speaking English" signal carried by the raw conversation history. When the model then processes large tool responses (200-300K chars of retrieved documents) alongside this compacted context, it has no strong language anchor and drifts.

Context — What We're Building:

A typical session accumulates 50-80+ events over a week, with each invocation fetching 100-300K chars of document content via tool responses. We use EventsCompactionConfig to keep sessions manageable:

_events_compaction = EventsCompac
_summarizer = LlmEventSummarizer(model="gemini-2.5-flash")

We also use server-side ContextWindowCompressionConfig as a safety net:

context_window_compression_config=ContextWindowCompressionConfig(
    sliding_window=SlidingWindow(
        trigger_tokens=800_000,
        target_tokens=200_000,
    )
)

Steps to Reproduce:

  1. Create a DatabaseSessionService-backed agent with EventsCompactionConfig(compaction_interval=10, overlap_size=2)
  2. Have a multi-day conversation (

Observed Behavior:

After ~81 events in the session (multiple compaction cycles), a follow-up question received a response entirely in Russian ("Я подготовил подробную сравнительную таблицу..."), despite:

  • The user writing in English
  • All previous responses being in English
  • The system prompt explicitly stating "Match the user's language"

The session had been active for 3 days with multiple compaction cycles. The final invocation fetched ~295K chars of document content across 6 tool calls. The large tool response volume likely overwhelmed the weak language signal left after compaction.

Environment Details:

  • ADK Library Version: 2.3.0
  • Desktop OS: Linux (Docker, python:3.11-slim)
  • Python Version: 3.11

Model Information:

  • Are you using LiteLLM: No
  • Which model is being used: gemini-3.5-flash (agent), gemini-2.5-flash (compaction summarizer)

Logs:

Session event trace showing the language switch on the final model response:

[10:52:26] user: "Do sell sides forecast the component of purchase of housing separately
in New Zealand? i'd like to get a grasp of their "rmse", can you check
the last 12 months of cpi releases and answer me?"
[10:52:26] model: function_call retrieve_by_bank (NZ CPI housing)
[10:52:46] tool: response (35 chunks, 118K chars used)
[10:52:46] model: function_call fetch_documents (7 docs)
[10:52:56] tool: response (budget: 231K chars used)
[10:52:56] model: function_call retrieve_by_bank (NZ home ownership CPI)
[10:53:26] tool: response (90 chunks)
[10:53:26] model: function_call fetch_documents (7 docs)
[10:53:29] tool: response (budget: 295K chars used)
[10:53:29] model: text response IN RUSSIAN — "Я подготовил подробную сравнительную таблицу..."

This was event #81 in a session spanning June 29 – July 2, with compaction_interval=10 (so compaction had run ~7-8 times).

Suggested Mitigation:

The LlmEventSummarizer could explicitly preserve the conversation language in its summary, e.g. by prepending "Conversation language: English" (detected from user messages) to the compacted output. This would give the model a strong language anchor even after summarization.
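
To make the suggestion concrete, here is a minimal sketch of the anchoring step in plain Python. It is not ADK code: `anchor_summary`, `dominant_script`, and the `SCRIPT_TO_LANGUAGE` table are hypothetical names, and the script-counting heuristic is only a crude stand-in for real language detection (it cannot tell English from other Latin-script languages).

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from dominant Unicode script to a language label.
# Assumption: a real fix would use a proper language detector instead.
SCRIPT_TO_LANGUAGE = {"LATIN": "English", "CYRILLIC": "Russian"}


def dominant_script(texts):
    """Return the most common script prefix among letters, or None."""
    counts = Counter()
    for text in texts:
        for ch in text:
            if ch.isalpha():
                # Unicode names look like "LATIN SMALL LETTER A".
                name = unicodedata.name(ch, "")
                if name:
                    counts[name.split(" ")[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


def anchor_summary(summary, user_messages):
    """Prepend a language anchor line to a compacted summary."""
    language = SCRIPT_TO_LANGUAGE.get(dominant_script(user_messages))
    if language is None:
        return summary  # No confident guess: leave the summary untouched.
    return f"Conversation language: {language}\n{summary}"


if __name__ == "__main__":
    print(anchor_summary(
        "User asked about NZ CPI housing forecasts.",
        ["Do sell sides forecast the component separately?"],
    ))
```

Only user-authored messages feed the detection, so the language of retrieved documents cannot influence the anchor.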

Workaround:

We added explicit language enforcement to our system prompt:

Language: ALWAYS respond in the same language the user wrote in.
Never switch to another language (e.g. Russian, Chinese) regardless of the
language of any documents, data, or tool responses you process.

This mitigates but does not fully solve the issue — with very large tool contexts after multiple compaction cycles, the system prompt signal can still be overwhelmed.

How often has this issue occurred?:

  • Intermittently (<50%) — occurs in long-lived sessions (50+ events, multiple compaction cycles) combined with large tool response payloads (200K+ chars). We've observed it once definitively but suspect earlier instances went unreported by users.
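
Since earlier instances may have gone unreported, stored sessions could be scanned for the same symptom. The sketch below assumes events have already been exported as `(author, text)` tuples, which is a hypothetical shape rather than the ADK event schema, and it flags model responses whose dominant Unicode script differs from the most recent user message. `find_language_switches` and `dominant_script` are illustrative names, and the check only catches switches between scripts (e.g. Latin to Cyrillic), not between languages sharing a script.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script prefix among letters, or None."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split(" ")[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


def find_language_switches(events):
    """Return indices of model events whose script differs from the last user message.

    `events` is a hypothetical list of (author, text) tuples.
    """
    switches = []
    user_script = None
    for index, (author, text) in enumerate(events):
        script = dominant_script(text)
        if author == "user":
            # Keep the previous script if this message has no letters.
            user_script = script or user_script
        elif author == "model" and script and user_script and script != user_script:
            switches.append(index)
    return switches


if __name__ == "__main__":
    sample = [
        ("user", "Can you check the last 12 months of cpi releases?"),
        ("model", "Here is the comparison table."),
        ("user", "And the housing component?"),
        ("model", "Я подготовил подробную сравнительную таблицу"),
    ]
    print(find_language_switches(sample))
```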

Labels

models: [Component] Issues related to model support
