mem-agent: AI Model with Persistent Memory

mem-agent: A 4B parameter AI model with persistent memory that rivals models 50x larger. Trained with reinforcement learning on Obsidian-like memory systems.

by HowAIWorks Team
Tags: AI, memory, reinforcement learning, Qwen, Obsidian, persistent memory, AI agents, machine learning, GSPO, AI research

Introduction

The artificial intelligence landscape has witnessed a groundbreaking development: the creation of mem-agent, the first AI model specifically trained to maintain persistent, human-readable memory. Developed by Dria, this 4B parameter model challenges the fundamental limitation of current large language models - their stateless nature.

Unlike traditional LLMs that cannot acquire new knowledge without additional training, mem-agent can learn, store, and retrieve information across conversations, making it a significant step toward truly intelligent AI systems.

The Problem with Current AI Models

Stateless Limitations

Most current large language models suffer from a critical limitation: they are stateless. This means:

  • No memory persistence: Information from previous conversations is lost
  • No knowledge acquisition: Models cannot learn new facts or procedures during operation
  • Limited context: Only information from the current conversation is available
  • Training dependency: New knowledge requires expensive retraining

The Need for Persistent Memory

The ideal AI system should be able to:

  • Remember past interactions with users
  • Learn new information during conversations
  • Build knowledge graphs of relationships and facts
  • Maintain context across multiple sessions
  • Ask for clarification when information is unclear

mem-agent: A Revolutionary Solution

Core Architecture

mem-agent is built around a sophisticated scaffold that combines:

  • Qwen3-4B-Thinking-2507 as the base model
  • GSPO (Group Sequence Policy Optimization) as the training algorithm
  • Obsidian-like memory system for persistent storage
  • Structured response format with <think>, <python>, and <reply> tags

The overall control flow, as a Mermaid diagram:

graph TD
    A[User Query] --> B[mem-agent Model]
    B --> C{Memory Operation}
    C -->|Retrieve| D[Search Memory]
    C -->|Update| E[Modify Memory]
    C -->|Clarify| F[Ask Questions]
    D --> G[Memory System]
    E --> G
    F --> H[User Response]
    G --> I[Structured Response]
    H --> I
    I --> J[Final Answer]
    
    G --> K[user.md]
    G --> L[entities/]
    L --> M[entity1.md]
    L --> N[entity2.md]
    L --> O[entityN.md]

Memory System Design

The memory system uses a markdown-based structure inspired by Obsidian:

memory/
    ├── user.md
    └── entities/
        ├── dria.md
        ├── project_alpha.md
        ├── family_member.md
        └── ...

Example user.md structure:

# User Information
- user_name: John Doe
- birth_date: 1990-05-15
- living_location: San Francisco, CA
- job_title: Software Engineer

## User Relationships
- employer: [[entities/tech_corp.md]]
- spouse: [[entities/jane_doe.md]]
- project: [[entities/ai_project.md]]

Key Features:

  • user.md: Contains personal information and relationships
  • entities/: Directory with linked entity files
  • Wikilink format: [[entities/entity_name.md]] for relationships
  • Human-readable: Markdown format allows manual editing
  • Persistent: Memory survives across sessions
  • Bidirectional links: Entities can reference each other
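
Because the memory is nothing more than markdown files connected by wikilinks, it can be inspected or traversed with ordinary tooling. The following is a minimal illustrative sketch (not part of the mem-agent codebase) that builds a simple link graph from a memory directory laid out like the one above; the regular expression and function name are assumptions.

import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]]+)\]\]")  # matches [[entities/foo.md]]

def build_link_graph(memory_root: str) -> dict[str, list[str]]:
    """Map each markdown file in the memory to the files it links to."""
    root = Path(memory_root)
    graph: dict[str, list[str]] = {}
    for md_file in root.rglob("*.md"):
        text = md_file.read_text(encoding="utf-8")
        # Record every [[...]] target found in this file, keyed by its path
        graph[str(md_file.relative_to(root))] = WIKILINK.findall(text)
    return graph

if __name__ == "__main__":
    # Example: show what user.md and each entity file point to
    for source, targets in build_link_graph("memory").items():
        print(source, "->", targets)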

Training Methodology

Three Core Subtasks

mem-agent was trained on three critical subtasks using GSPO:

1. Retrieval (59.6% of training data)

  • Regular retrieval: Finding relevant information from memory
  • Filtered retrieval: Applying privacy filters to sensitive information
  • Context-aware search: Understanding user intent and context

2. Update (19.3% of training data)

  • Adding new information: Incorporating facts into memory
  • Updating existing data: Modifying stored information
  • Maintaining relationships: Preserving links between entities

3. Clarification (21.1% of training data)

  • Asking questions: Requesting clarification when information is unclear
  • Conflict resolution: Handling contradictory information
  • Confidence assessment: Knowing when to ask for help

Training Process

The training involved:

  • Model selection: Testing various Qwen models and sizes
  • Algorithm comparison: Evaluating GRPO, RLOO, Dr.GRPO, and GSPO (a sketch of GSPO's sequence-level objective follows this list)
  • Hyperparameter optimization: Finding optimal training configurations
  • Synthetic data generation: Creating diverse training scenarios
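
For readers unfamiliar with GSPO (Group Sequence Policy Optimization): unlike GRPO, which weights every token by its own importance ratio, GSPO scores each sampled response with a single length-normalized sequence-level ratio and clips at the sequence level, using group-normalized rewards as the advantage. The sketch below illustrates that objective on toy tensors; it is a simplified reading of the published GSPO formulation, not Dria's training code, and the equal-length sequences, tensor shapes, and epsilon value are assumptions.

import torch

def gspo_objective(logp_new, logp_old, rewards, eps=0.2):
    """Toy GSPO loss for a group of G sampled responses to one prompt.

    logp_new, logp_old: (G, T) per-token log-probs under the current and
                        sampling policies (equal lengths assumed for simplicity).
    rewards:            (G,) scalar reward per response.
    """
    seq_len = logp_new.shape[1]
    # Sequence-level, length-normalized importance ratio:
    # s_i = (pi_new(y_i | x) / pi_old(y_i | x)) ** (1 / |y_i|)
    ratio = torch.exp((logp_new.sum(dim=1) - logp_old.sum(dim=1)) / seq_len)
    # Group-normalized advantage, as in GRPO/GSPO
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Clipped surrogate applied to the whole sequence, not per token
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * adv
    return -torch.min(unclipped, clipped).mean()

# Toy usage: a group of 4 sampled responses, each 16 tokens long
G, T = 4, 16
loss = gspo_objective(torch.randn(G, T) * 0.01, torch.randn(G, T) * 0.01,
                      torch.tensor([1.0, 0.0, 0.5, 1.0]))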

Performance Benchmarks

md-memory-bench Results

mem-agent was evaluated on a curated benchmark of 57 hand-crafted tasks across multiple domains:

Overall Performance Comparison

Model | Parameters | Overall Score | Performance vs Base
mem-agent | 4B | 75.0% | +35.7%
Qwen3-235B-A22B-Thinking-2507 | 235B | 75.0% | +35.7%
Claude Opus 4.1 | ~200B | 58.9% | +19.6%
Gemini 2.5 Pro | ~200B | 66.1% | +26.8%
Base Qwen3-4B | 4B | 39.3% | Baseline

Key Insight: mem-agent achieves the same performance as a model 50x larger, demonstrating the power of specialized training.

Category Breakdown

Retrieval Tasks:

  • mem-agent excels at finding relevant information
  • Strong performance on both regular and filtered retrieval
  • Context-aware information extraction

Update Tasks:

  • mem-agent: 72.7% success rate
  • Base Qwen3-4B: 0% (the specialized training accounts for the entire gain)
  • Claude Opus 4.1: 0% (a surprising underperformance for a frontier model)

Clarification Tasks:

  • mem-agent demonstrates strong self-awareness
  • Competes with much larger models
  • Shows ability to assess confidence levels

Filtering Tasks:

  • mem-agent: 91.7% success rate
  • Excellent privacy protection capabilities
  • Sophisticated information obfuscation

Technical Implementation

Available Tools

mem-agent has access to a comprehensive set of tools:

File Operations:

  • create_file(): Create new files with content
  • update_file(): Modify existing files
  • read_file(): Read file contents
  • delete_file(): Remove files
  • check_if_file_exists(): Verify file existence

Directory Operations:

  • create_dir(): Create directories
  • list_files(): Show directory structure
  • check_if_dir_exists(): Verify directory existence

Utilities:

  • get_size(): Check file/directory sizes
  • go_to_link(): Navigate to external links
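
The article lists the tool names but not their implementations. As a rough illustration of what a sandboxed file-tool layer could look like, here is a sketch of two of them; the signatures, return values, and the sandbox-root restriction are assumptions, not Dria's actual code.

from pathlib import Path

MEMORY_ROOT = Path("memory").resolve()  # all tools are confined to this directory

def _safe(path: str) -> Path:
    """Resolve a path and refuse anything that escapes the memory sandbox."""
    p = Path(path).resolve()
    if not p.is_relative_to(MEMORY_ROOT):
        raise ValueError(f"Path escapes memory sandbox: {path}")
    return p

def create_file(path: str, content: str = "") -> bool:
    """Create a new file (and any missing parent directories) inside the sandbox."""
    p = _safe(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(content, encoding="utf-8")
    return True

def read_file(path: str) -> str:
    """Return the contents of a file inside the sandbox."""
    return _safe(path).read_text(encoding="utf-8")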

Response Structure

The model uses a structured response format that separates reasoning (<think>), tool usage (<python>), and the final answer (<reply>); in the example below, the <result> block shows the output returned by the code-execution environment:

<think>
I need to update the user's job title in their memory. Let me first read the current user.md file to see the existing content, then update it with the new information.
</think>

<python>
# Read current user information
current_content = read_file("memory/user.md")
print("Current content:", current_content)

# Update with new job title
new_content = current_content.replace("job_title: Software Engineer", "job_title: Senior AI Researcher")
result = update_file("memory/user.md", current_content, new_content)
print("Update result:", result)
</python>

<result>
Current content: # User Information
- user_name: John Doe
- job_title: Software Engineer
- living_location: San Francisco, CA

Update result: True
</result>

<reply>
I've successfully updated your job title to "Senior AI Researcher" in your memory. This information will be available for future conversations.
</reply>

Key Benefits of This Structure:

  • Transparency: Users can see the model's reasoning process
  • Debugging: Easy to identify where issues occur in the workflow
  • Tool Integration: Clear separation between reasoning and tool usage
  • Auditability: Complete trace of memory operations
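
As an illustration of how a host application might consume this format, the sketch below pulls the individual segments out of a raw model response. The tag names come from the example above; the parsing approach itself is an assumption.

import re

TAGS = ("think", "python", "result", "reply")

def parse_response(raw: str) -> dict[str, str]:
    """Extract the text inside each <tag>...</tag> section of a model response."""
    sections = {}
    for tag in TAGS:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", raw, flags=re.DOTALL)
        if match:
            sections[tag] = match.group(1).strip()
    return sections

# Example: only the <reply> section is shown to the end user,
# while the <python> section would be sent to the sandbox for execution.
parsed = parse_response("<think>check memory</think><reply>Done.</reply>")
print(parsed["reply"])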

Real-World Applications

Personal Memory Assistant

mem-agent can serve as a sophisticated personal assistant that:

  • Remembers preferences: User likes, dislikes, and habits
  • Tracks relationships: Family, friends, and professional connections
  • Maintains context: Previous conversations and decisions
  • Learns patterns: User behavior and preferences over time

Example Scenario:

User: "What's my wife's favorite restaurant?"
mem-agent: "Based on your memory, Sarah prefers Italian cuisine. 
Her favorite restaurant is [[entities/bella_vita.md]] - 
Bella Vita on Main Street. You went there for your 
anniversary last month."

Customer Support

The model excels in customer service scenarios:

  • Retrieval: Finding relevant customer information quickly
  • Update: Adding new customer data and interactions
  • Clarification: Asking for missing information when needed
  • Privacy: Filtering sensitive information appropriately

Example Use Case:

  • Customer calls about order status
  • Agent retrieves customer history, preferences, and previous issues
  • Updates memory with new interaction details
  • Maintains context for future support calls

Project Management

mem-agent can assist with project tracking:

  • Task management: Remembering project milestones and deadlines
  • Team coordination: Tracking team member roles and responsibilities
  • Documentation: Maintaining project history and decisions
  • Context preservation: Keeping project context across meetings

Healthcare Applications

  • Patient records: Maintaining detailed patient history and preferences
  • Treatment tracking: Remembering medication schedules and side effects
  • Care coordination: Linking patients with specialists and treatments
  • Privacy compliance: Filtering sensitive medical information appropriately

Practical Examples

Example 1: Personal Assistant Scenario

Initial Setup:

User: "Hi, I'm Sarah. I work as a marketing manager at TechCorp, 
I'm allergic to nuts, and I prefer morning meetings."

mem-agent: "Nice to meet you, Sarah! I've saved your information:
- Name: Sarah
- Job: Marketing Manager at [[entities/techcorp.md]]
- Allergies: Nuts
- Meeting preference: Morning

Is there anything else you'd like me to remember?"

Later Conversation:

User: "What time should we schedule our next meeting?"

mem-agent: "Based on your preferences, I recommend scheduling 
for the morning. Would 9:00 AM work for you? Also, I'll make 
sure to avoid any nut-containing snacks for the meeting."

Example 2: Customer Support Scenario

Customer Call:

Customer: "I'm having issues with my order #12345"

mem-agent: "I can see from your account that you're John Smith, 
and this is your third order with us. Your previous orders 
were delivered successfully. Let me check the status of 
order #12345..."

[Retrieves order details and customer history]

"I see the issue - your order was delayed due to weather. 
I've updated your record to note this delay, and I'm 
arranging for expedited shipping on your next order 
as compensation."

Example 3: Project Management Scenario

Team Meeting:

mem-agent: "Based on our project memory, here's the current status:
- Frontend development: 80% complete (Sarah, due Friday)
- Backend API: 60% complete (Mike, due next Tuesday)
- Database design: 100% complete (completed by Alex last week)

I notice we're behind on the backend. Should we adjust 
the timeline or allocate additional resources?"

Deployment Options

Low-Resource Environments

mem-agent is designed for efficient deployment:

Model Sizes:

  • Full model: 4B parameters
  • 8-bit quantized: Reduced size with minimal performance loss
  • 4-bit quantized: Only 2GB, 76.8% performance retention

Hardware Requirements:

  • M1 MacBook (8GB RAM): 4-bit quantized version
  • M4 Pro (64GB RAM): Full bf16 precision
  • Single H100 GPU: Full model inference with vLLM
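
As a hedged sketch of what local deployment could look like with the Hugging Face transformers library and 4-bit quantization via bitsandbytes (CUDA GPU assumed): the model identifier below is a placeholder, and the actual published weights, quantization format (e.g. GGUF vs. bitsandbytes), and memory figures should be checked against Dria's release.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "driaforall/mem-agent"  # placeholder id; substitute the actual release
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant,  # 4-bit weights to fit low-memory hardware
    device_map="auto",          # place layers automatically on the available GPU(s)
)

prompt = "Update my memory: my job title is now Senior AI Researcher."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))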

MCP Server Integration

mem-agent includes a Model Context Protocol (MCP) server that:

  • Universal compatibility: Works with any MCP-capable model
  • Easy integration: Simple setup and configuration
  • Command-line interface: Direct interaction via chat_cli.py
  • Persistent memory: Memory survives across sessions
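
The article does not show the server's code, but as a rough sketch of the idea, here is how a memory-backed tool could be exposed with the MCP Python SDK's FastMCP helper; the tool name, docstring, and file layout are assumptions, and Dria's actual MCP server should be consulted for the real interface.

# Illustrative only: a tiny MCP server exposing one memory-retrieval tool.
# Requires the `mcp` Python SDK; run with:  python memory_server.py
from pathlib import Path
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("memory-agent")   # the server name is an arbitrary choice
MEMORY_ROOT = Path("memory")

@mcp.tool()
def search_memory(query: str) -> str:
    """Return the markdown memory files whose content mentions the query string."""
    hits = []
    for md_file in MEMORY_ROOT.rglob("*.md"):
        text = md_file.read_text(encoding="utf-8")
        if query.lower() in text.lower():
            hits.append(f"## {md_file}\n{text}")
    return "\n\n".join(hits) or "No matching memory files."

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport that MCP clients expect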

Current Limitations and Challenges

Technical Limitations

While mem-agent represents a significant breakthrough, it has some limitations:

  • Memory size constraints: Limited by available storage and processing power
  • Complex reasoning: Struggles with multi-hop queries requiring deep reasoning
  • Hallucination risk: May generate incorrect information when memory is incomplete
  • Privacy concerns: Balancing memory persistence with data protection
  • Scalability: Performance may degrade with very large memory databases

Ethical Considerations

  • Data privacy: Persistent memory raises questions about data retention
  • Bias amplification: Memory systems may perpetuate existing biases
  • Consent management: Users need control over what information is stored
  • Transparency: Clear understanding of what information is being remembered

Future Developments

Upcoming Releases

The Dria team plans to release:

  • Technical report: Detailed training methodology and results
  • Data generation pipeline: Code for creating training data
  • Training code: Complete training implementation
  • Benchmark code: Evaluation framework and metrics
  • MCP server: Model Context Protocol server for easy integration

Model Expansion

Future versions may include:

  • Larger models: 14B and 30B MoE Qwen variants
  • Enhanced reasoning: Better handling of complex multi-hop queries
  • Reduced hallucination: Improved accuracy in memory operations
  • Knowledge graph integration: Enhanced relationship modeling
  • Multi-modal memory: Support for images, audio, and other data types
  • Federated learning: Distributed memory systems across multiple agents

Implications for AI Development

Memory-Native AI

mem-agent represents a paradigm shift toward memory-native AI systems:

  • Persistent learning: AI that can learn and remember
  • Human-readable storage: Transparent and editable memory
  • Context preservation: Maintaining conversation history
  • Knowledge accumulation: Building expertise over time

Competitive Advantages

The model's performance demonstrates that:

  • Size isn't everything: 4B models can compete with 200B+ models
  • Specialized training: Task-specific training yields significant improvements
  • Efficient deployment: Small models can be highly capable
  • Resource optimization: Better performance per parameter

Conclusion

mem-agent represents a significant breakthrough in AI development, demonstrating that persistent memory is not only possible but can be implemented efficiently in relatively small models. The 4B parameter model's ability to rival 200B+ parameter models on memory tasks shows the power of specialized training and well-designed architectures.

Key Achievements:

  • First persistent memory AI: Successfully trained model with human-readable memory
  • Efficient performance: 4B model competing with 200B+ models
  • Practical deployment: Low-resource requirements with high capability
  • Comprehensive evaluation: Rigorous benchmarking across multiple tasks
  • Open development: Plans to release training code and methodology

Future Impact:

mem-agent opens new possibilities for AI applications where persistent memory is crucial, from personal assistants to customer support systems. The model's success suggests that memory-native AI systems will become increasingly important as we move toward more intelligent and context-aware AI applications.

The development of mem-agent marks an important milestone in AI research, showing that the stateless limitation of current LLMs can be overcome through innovative training approaches and thoughtful system design.

Key Takeaways for AI Practitioners

For Developers

  • Memory-first design: Consider persistent memory as a core feature, not an afterthought
  • Specialized training: Task-specific training can dramatically improve performance
  • Efficient deployment: Small models can be highly capable with proper training
  • Tool integration: Structured response formats improve reliability and debugging

For Businesses

  • Customer experience: Persistent memory enables truly personalized interactions
  • Operational efficiency: AI agents that remember context reduce repetitive work
  • Cost optimization: Smaller models with specialized training can be more cost-effective
  • Privacy considerations: Human-readable memory systems improve transparency

For Researchers

  • Training methodology: GSPO shows promise for specialized AI training
  • Evaluation frameworks: md-memory-bench provides a solid foundation for memory research
  • Architecture insights: Obsidian-like memory systems are effective for AI applications
  • Open research: The planned release of training code will accelerate research

Interested in learning more about AI memory systems and agent architectures? Explore our AI fundamentals courses, check out our glossary of AI terms, or discover other AI tools in our comprehensive catalog.

Frequently Asked Questions

Q: What is mem-agent?
A: mem-agent is the first AI model specifically trained to maintain persistent, human-readable memory using an Obsidian-like markdown system. Unlike stateless LLMs, it can acquire and retain new knowledge across conversations without additional training.

Q: How does its memory system work?
A: The model uses a markdown-based memory system with wikilinks, similar to Obsidian. It maintains a user.md file with personal information and an entities/ directory of linked entity files, allowing structured knowledge storage and retrieval.

Q: What tasks was mem-agent trained on?
A: mem-agent excels at three key tasks: retrieval (finding relevant information in memory), updating (adding or modifying information in memory), and clarification (asking follow-up questions when information is unclear or contradictory).

Q: How does a 4B model compare with much larger ones?
A: Despite having only 4B parameters, mem-agent rivals Qwen3-235B-A22B-Thinking-2507 (a model roughly 50x larger) on memory-related tasks, achieving 75% overall performance on the md-memory-bench benchmark.

Q: Can mem-agent run on low-resource hardware?
A: Yes. The 4-bit quantized version is only 2GB and retains 76.8% of full performance, making it suitable for edge devices and resource-constrained environments.
