Introduction
In recent years, LLMs have fundamentally changed how we interact with code. However, a new shift is occurring: the transition from manipulating code to manipulating knowledge. Andrej Karpathy recently shared his personal workflow for building massive personal knowledge bases—what he calls an "LLM-managed wiki"—leveraging the latest models to organize research, papers, and data into a structured markdown ecosystem.
This approach moves away from the traditional, manual note-taking process toward a dynamic, agent-led system. Instead of spending hours organizing folders and writing summaries, Karpathy uses LLMs to "compile" raw data into a navigable, interlinked wiki in Obsidian.
Data Ingestion and Compilation
The foundation of this system is a two-tier directory structure: a raw/ directory and a "compiled" wiki.
- Raw Data Ingest: Source documents (PDFs, research papers, repositories, and images) are indexed into a local raw/ folder. Tools like the Obsidian Web Clipper extension convert web articles into clean markdown files, while local hotkeys handle downloading related images.
- LLM Compilation: An LLM agent then incrementally processes these raw files to build the wiki. It writes summaries, creates backlinks, categorizes data into concepts, and even writes full articles for those concepts.
- Minimal Manual Editing: One of the most striking aspects of this workflow is its hands-off nature. The LLM writes and maintains nearly all of the wiki's data, with the human user acting primarily as a consumer and high-level director.
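The incremental compile step could be sketched as a small helper that finds raw files not yet reflected in the compiled wiki. This is a hypothetical illustration, not Karpathy's actual tooling; the `raw/` and `wiki/` directory names follow the post, while the function itself is an assumption:

```python
from pathlib import Path

def pending_files(raw_dir: Path, wiki_dir: Path) -> list[Path]:
    """Return raw markdown files with no up-to-date compiled counterpart.

    An LLM agent could then be invoked on just these files, keeping
    compilation incremental instead of reprocessing the whole corpus.
    """
    pending = []
    for src in sorted(raw_dir.rglob("*.md")):
        compiled = wiki_dir / src.relative_to(raw_dir)
        # Recompile if the wiki copy is missing or older than the source.
        if not compiled.exists() or compiled.stat().st_mtime < src.stat().st_mtime:
            pending.append(src)
    return pending
```

The mirrored-path convention (`wiki/x.md` compiled from `raw/x.md`) is one simple design choice; the compiled wiki could just as well reorganize content by concept rather than by source file.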
The Obsidian Ecosystem as an IDE
Karpathy uses Obsidian as the "IDE frontend" for this knowledge base. This allows for several advanced visualization and interaction layers:
- Frontend Navigation: Obsidian provides a clean interface for viewing both the raw data and the compiled wiki.
- Dynamic Renderers: Plugins like Marp allow the LLM to generate slide shows directly within the wiki, while matplotlib can be used to render data visualizations that are displayed inside Obsidian notes.
- Structure and Links: The LLM manages the interlinking of articles, ensuring that new information is correctly "filed" and connected to existing concepts.
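One way to inspect the link structure the agent maintains is to parse Obsidian's `[[wikilink]]` syntax directly. A minimal sketch (the regex handles aliases like `[[Note|label]]` and heading anchors like `[[Note#Section]]`, but not embeds or escaped brackets):

```python
import re

# Matches the target of an Obsidian [[wikilink]], stopping before an
# optional |alias or #heading suffix.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def outgoing_links(note_text: str) -> set[str]:
    """Return the set of note names a markdown note links to."""
    return {m.group(1).strip() for m in WIKILINK.finditer(note_text)}
```

From these per-note sets, an agent (or a lint pass) can build a backlink table for the whole wiki and check that new articles are actually connected to existing concepts.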
Agent-Led Research and Q&A
Once the wiki reaches a significant scale—Karpathy mentions his own research wiki contains ~100 articles and over 400,000 words—it becomes a powerful engine for Q&A.
Instead of traditional Retrieval-Augmented Generation (RAG), Karpathy finds that LLM agents can effectively self-manage index files and brief summaries. When asked a complex question, the agent can research its own wiki, follow links, and synthesize comprehensive answers. These answers are often rendered as new markdown files or slides and filed back into the wiki, allowing the knowledge base to "add up" and grow more sophisticated over time.
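The index-file approach described above can be approximated by keeping a compact, always-in-context table of contents that the agent reads before deciding which full notes to open. A hypothetical sketch; the one-line-per-note convention is an assumption, not something specified in the post:

```python
from pathlib import Path

def build_index(wiki_dir: Path, max_chars: int = 120) -> str:
    """Build a compact index: one line per note, title plus lead line.

    Small enough to sit in the agent's context window; the agent then
    follows links and opens full notes only when a question needs them.
    """
    lines = []
    for note in sorted(wiki_dir.rglob("*.md")):
        text = note.read_text(encoding="utf-8").strip()
        lead = text.splitlines()[0] if text else ""
        lines.append(f"- {note.stem}: {lead[:max_chars]}")
    return "\n".join(lines)
```

Even at ~100 articles, an index like this stays a few kilobytes, which is why brief summaries plus agentic file access can substitute for an embedding-based RAG pipeline.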
Data Integrity and Custom Tools
To keep the system running smoothly, Karpathy employs LLM "health checks" or linting. These automated checks look for inconsistent data, fill in missing information using web search tools, and identify potential connections for new articles.
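A concrete health check in this spirit is a link-lint pass that flags wiki links pointing at notes that do not exist yet. This is a sketch of the idea; Karpathy's actual checks are LLM-driven and broader (filling missing data via web search, suggesting connections) than this purely mechanical pass:

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def broken_links(wiki_dir: Path) -> dict[str, set[str]]:
    """Map each note to the [[links]] it makes that resolve to no note."""
    notes = {p.stem for p in wiki_dir.rglob("*.md")}
    report = {}
    for note in wiki_dir.rglob("*.md"):
        text = note.read_text(encoding="utf-8")
        targets = {m.group(1).strip() for m in WIKILINK.finditer(text)}
        missing = targets - notes
        if missing:
            report[note.stem] = missing
    return report
```

A broken link is useful signal either way: it is either an error to fix or a stub article the agent should write next.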
Furthermore, "vibe coding"—the process of quickly generating functional tools through AI—has led to the development of custom CLI utilities. For example, Karpathy built a naive search engine over the wiki that can be handed off to an LLM for complex queries, bridging the gap between a web UI and a command-line agent.
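A "vibe-coded" search utility of the kind described might be as simple as term-frequency ranking over the markdown files, exposed as a CLI an agent can call. This is a sketch under that assumption; the actual tool is not public:

```python
import re
import sys
from pathlib import Path

def search(wiki_dir: Path, query: str, top_k: int = 5) -> list[tuple[str, int]]:
    """Rank notes by how often query terms occur, naive grep-style."""
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    scores = []
    for note in wiki_dir.rglob("*.md"):
        text = note.read_text(encoding="utf-8").lower()
        score = sum(text.count(t) for t in terms)
        if score:
            scores.append((note.stem, score))
    return sorted(scores, key=lambda x: -x[1])[:top_k]

if __name__ == "__main__":
    # e.g. python wikisearch.py ~/wiki "mixture of experts"
    for name, score in search(Path(sys.argv[1]), sys.argv[2]):
        print(f"{score:4d}  {name}")
```

Because it is a plain command with plain-text output, the same tool works from a terminal and from an agent's shell-tool loop, which is exactly the web-UI-to-CLI bridge the workflow calls for.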
The Vision: From Context to Weights
As these personal repositories grow, the ultimate goal shifts from simple retrieval toward integrated knowledge. Karpathy highlights the potential for synthetic data generation and fine-tuning. By training or fine-tuning models on these curated personal wikis, the "knowledge" could eventually reside within the model's weights rather than just its context window, leading to a truly personalized AI assistant.
Conclusion
The "Karpathy Method" represents a significant evolution in personal productivity. By treating knowledge as a compiled asset and LLMs as the primary editors, we can build deeper, more interlinked understandings of complex topics without the overhead of manual organization. As LLMs become more capable of "manipulating knowledge," the barrier between raw information and actionable insights continues to dissolve.