AI Agents long term memory - How and why

Long-term memory for AI agents storing context for more personalized responses

Long-term memory (LTM) is the stage of memory where information is stored for extended periods, potentially lasting for days, months, or even a lifetime. In AI agents, LTM makes it possible to store and recall information across multiple interactions and sessions over a long period of time.

1. Foundations of the Discussion

“Learning” and “memory” are relatively ambiguous terms, so before diving into the main discussion, let's clarify the definitions used in this article.

This article focuses on medium to long-term capabilities of Agents—essentially, the ideal human-like Agents we aspire to build. From today’s technical standpoint, LLMs and human brains share a great deal of commonality, so many of the points discussed apply to both.

Under this framework, the boundary between learning and memory becomes quite blurry, which is why they are discussed together.

Long-term memory (LTM) is key to AI self-evolution, according to “Long Term Memory: The Foundation of AI Self-Evolution” (21 Oct 2024).

2. How Agent Learning Capabilities Are Built

Previously, we explored how to construct learning abilities from a technical standpoint through RAG (Retrieval Augmented Generation). With the lightning-fast pace of AI development, we have some exciting updates.

Unlike RAG, we believe that both information and experiential memory can be injected in the same way, for example via prompts.
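To make the idea concrete, here is a minimal sketch of injecting both kinds of memory through the same prompt. The names `MemoryItem` and `build_prompt`, and the section labels in the prompt, are assumptions made for illustration; they are not part of any particular framework.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    kind: str  # "fact" (information) or "experience" (what worked or failed)
    text: str


def build_prompt(task: str, memories: list[MemoryItem]) -> str:
    """Render both kinds of memory into one prompt using the same mechanism."""
    facts = [m.text for m in memories if m.kind == "fact"]
    experiences = [m.text for m in memories if m.kind == "experience"]

    sections = []
    if facts:
        sections.append("Known information:\n" + "\n".join(f"- {t}" for t in facts))
    if experiences:
        sections.append("Past experience:\n" + "\n".join(f"- {t}" for t in experiences))
    # The task always comes last so the model reads the memory first.
    sections.append(f"Task:\n{task}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(build_prompt(
        "Book a flight to Paris",
        [
            MemoryItem("fact", "The user prefers aisle seats."),
            MemoryItem("experience", "The flight search tool failed without an explicit date."),
        ],
    ))
```

Facts and experiences differ only in their label here, which is the point: the injection path is the same for both.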

In most cases, successful experiences should be encoded as executable workflows. This is the foundation of the Agentic Workflow. With the Agent Workforce architecture, however, textual prompt workflows combined with tool calls are sufficient in early-stage scenarios to represent both successes and failures. This method also generalizes better, and it differs from the earlier idea of “executable workflows.”

The workforce architecture allows AI agents to reflect on their output, whether it is a success or a failure, and adapt. For instance, when the agent receives an error message while using a certain tool, it can reflect on the message and retry the tool. If the next attempt succeeds, it stores this experience as a memory with a slightly higher reward weight.
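The reflect-and-retry loop described above can be sketched roughly as follows. Everything here is hypothetical: `MemoryStore`, `run_with_reflection`, the `reflect` callback, and the `base_weight` and `bonus` values are illustrative assumptions, not the actual workforce implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class MemoryRecord:
    note: str
    weight: float


@dataclass
class MemoryStore:
    records: list[MemoryRecord] = field(default_factory=list)

    def add(self, note: str, weight: float) -> None:
        self.records.append(MemoryRecord(note, weight))


def run_with_reflection(
    tool: Callable[[dict], Any],
    args: dict,
    reflect: Callable[[dict, str], dict],
    memory: MemoryStore,
    max_attempts: int = 3,
    base_weight: float = 1.0,  # assumed default weight for a stored memory
    bonus: float = 0.5,        # assumed extra weight for a recovered failure
) -> Optional[Any]:
    """Call a tool, reflect on any error, retry, and store what was learned."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = tool(args)
        except Exception as exc:
            # Reflection: let the agent revise its arguments from the error text.
            last_error = str(exc)
            args = reflect(args, last_error)
            continue
        if attempt > 1:
            # Success after a failure is stored with a slightly higher weight.
            memory.add(
                f"After error '{last_error}', these arguments worked: {args}",
                base_weight + bonus,
            )
        return result
    # Every attempt failed: remember the failure at the base weight.
    memory.add(f"Tool kept failing; last error: {last_error}", base_weight)
    return None
```

In a real agent, `reflect` would be an LLM call that reads the error message and proposes new arguments; here it is just a function parameter so the loop can be tested in isolation.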


2.1 Chain-of-Thought and reinforcement learning

All of this aligns with the concept of iterative prompting—essentially a few-shot or even one-shot learning process. This is related to "system prompt learning," and Claude’s chatbot system prompt is a real-world example. While we still lack a robust automated method for system prompt iteration, it’s now much more feasible than before.

We have long emphasized the RL (reinforcement learning) perspective. The early approach was to decompose tasks explicitly and optimize intermediate steps automatically. But recent reasoning models have advanced much faster through a different route.

Questions like how to split tasks, evaluate each step, and allocate total rewards across steps are quite difficult. End-to-end RL training on Chain-of-Thought (CoT) traces provides a way to sidestep these challenges.
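As a rough illustration of why outcome-only training sidesteps step-level evaluation, the sketch below computes the return credited to each step of a trace when only the final answer is rewarded. The function name `outcome_returns` and the discount factor `gamma` are assumptions for illustration; this shows the general idea of a sparse terminal reward, not how any specific reasoning model was trained.

```python
def outcome_returns(num_steps: int, final_reward: float, gamma: float = 1.0) -> list[float]:
    """Return-to-go for each step when only the last step receives a reward.

    No per-step evaluator is needed: every step is credited with the
    (optionally discounted) outcome of the whole trace.
    """
    return [final_reward * gamma ** (num_steps - 1 - t) for t in range(num_steps)]


if __name__ == "__main__":
    # With gamma = 1.0, every step in a successful trace gets the same credit.
    print(outcome_returns(3, 1.0))
    # With gamma < 1.0, steps closer to the final answer get more credit.
    print(outcome_returns(3, 1.0, gamma=0.5))
```

The trade-off is that credit is coarse: a flawed intermediate step inside a successful trace is rewarded along with everything else.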

Compared to a year and a half ago, although there’s still no mature solution, the implementation path is much clearer now. Our understanding has improved significantly.

3. Memory Capabilities

Theoretically, a reasoning model can be trained on multiple objectives to fuse different abilities with minimal trade-offs. However, in practice, frontier models still exhibit a "seesaw" effect—improving one ability slightly degrades others, and users can perceive this degradation. Gemini 2.5 Pro recently demonstrated this issue.

For purely prompt-based systems, memory capacity becomes even more of a bottleneck. The amount of context a model can effectively handle is limited, and as you accumulate more experiences and memory, you erode future learning capacity.
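A small sketch makes the bottleneck visible: with a fixed context limit, every token spent on memory is a token unavailable for anything else. The word-count token estimate, the `pack_memories` name, and the greedy highest-weight-first policy are all simplifying assumptions.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly one token per whitespace-separated word.
    return len(text.split())


def pack_memories(
    memories: list[tuple[float, str]],  # (weight, text) pairs
    context_limit: int,
    reserved_for_task: int,
) -> list[str]:
    """Greedily keep the highest-weight memories that fit in the remaining budget."""
    budget = context_limit - reserved_for_task
    chosen = []
    for _, text in sorted(memories, key=lambda m: m[0], reverse=True):
        cost = estimate_tokens(text)
        if cost <= budget:
            chosen.append(text)
            budget -= cost
    return chosen
```

As the memory list grows, more and more entries fall outside the budget, so something has to be dropped, summarized, or moved out of the prompt.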

Neither of these approaches currently meets the criteria for sustainable learning. Of the two, fusion during model training is closer to the goal—but it comes at a very high cost.

3.1 What Claude Code Reveals About Agent Memory

I’ve previously pointed out that Claude Code is like a dev board for building general-purpose Agents. In Claude Code, memory is stored as files in the current directory, and retrieval is done with traditional tools like grep, rg, and awk.
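The pattern of files as memory and text search as retrieval can be sketched in a few lines. This is not Claude Code's implementation; `remember` and `recall` are hypothetical helpers, and the regular-expression scan is a plain Python stand-in for tools like grep or rg.

```python
import re
from pathlib import Path


def remember(memory_dir: Path, name: str, text: str) -> Path:
    """Append a note to a Markdown file inside the memory directory."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    path = memory_dir / f"{name}.md"
    with path.open("a", encoding="utf-8") as f:
        f.write(text.rstrip("\n") + "\n")
    return path


def recall(memory_dir: Path, pattern: str) -> list[tuple[str, int, str]]:
    """Grep-like search: return (file name, line number, line) for each match."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(memory_dir.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append((path.name, lineno, line))
    return hits
```

Retrieval here is purely lexical, which is why the way notes are named and worded matters so much for whether they can be found again.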

However, the user still needs to organize directory contents and build intermediate representations to facilitate fuzzy matching.

For example, in a code repository, you can reverse-engineer high-level and detailed design documents from the code, save them in the project directory, and guide Claude to reference them via a CLAUDE.md file. This essentially creates a semantic retrieval index, and the setup mirrors how human memory works.

This is a user-designed memory system that relies on manual updates or manually prompted generation by Claude Code.

Importantly, Claude Code doesn’t use conventional RAG (retrieval-augmented generation) systems like embeddings, Elasticsearch, or text2SQL + DB. (You could call LLM + grep a form of RAG, but I avoid this terminology. It no longer facilitates clear communication—most people don’t interpret “RAG” that way.)

In the long run, Agents may not need traditional RAG. Claude Code shows that training LLMs to use storage and retrieval tools as memory may be more promising. If the memory resembles a database, it might be better for the LLM to read/write the DB directly.
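If memory does resemble a database, the read/write tools exposed to the LLM could be as small as the sketch below. It uses Python's standard `sqlite3` module; the table layout and the function names `open_memory`, `write_memory`, and `read_memory` are assumptions for illustration, and the tool-calling layer that would let a model invoke them is left out.

```python
import sqlite3
from typing import Optional


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a key-value memory table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


def write_memory(conn: sqlite3.Connection, key: str, value: str) -> None:
    """Tool: store a value, overwriting any previous value for the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO memory (key, value) VALUES (?, ?)", (key, value)
    )
    conn.commit()


def read_memory(conn: sqlite3.Connection, key: str) -> Optional[str]:
    """Tool: fetch a value by key, or None if nothing is stored."""
    row = conn.execute("SELECT value FROM memory WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None
```

The interesting part is not the storage code but the training question: the model has to learn when to call `write_memory` and what is worth writing.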

3.2 Summary

Memory read/write is part of business logic, and it’s hard to make it both generalizable and cost-effective (without scenario-specific optimizations).

Training LLMs via RL to use various read/write tools for memory is currently very expensive. As of now, there’s still no memory solution that balances generality, effectiveness, and cost.

4. Inferring User Intent and Preference

Beyond learning and memory, there’s also the challenge of understanding user intent or preference—in other words, “knowing the user.” The standard here is being able to predict how the user will judge a future event or piece of information.

Many people treat this as a personalized memory issue or a user profiling problem. But these methods can’t reliably predict outcomes with high accuracy. It’s easier to predict human commonalities than individual preferences or domain-specific thinking.

Even humans struggle with this. Try predicting future content from an author who’s been writing or podcasting for over a year—it’s difficult. Only people deeply familiar with someone can make semi-accurate guesses.

Reconstructing a person’s cognition from behavior alone—especially short-term behavior—is extremely hard. The saying “you know someone’s face but not their mind” exists for a reason.

Using user behavior as a reward signal in RL might allow some level of modeling, but the cost of this approach is high. Controlled Chain-of-Thought (CoT) is a quicker win.

AgentX provides long-term memory for AI agents by combining techniques including RAG and CoT. You can create a knowledge base for the agent to remember, and during an interaction the agent can self-reflect and store memories using a dynamic reward and weighting system.
