Build a Second Brain for Your AI Agent in 5 Minutes
Tiago Forte's 'second brain' pattern, applied to AI coding agents. The 5 minutes is real, the architecture is borrowed, the integration is one curl.
Long-form writing on retrieval accuracy, build-time intelligence, and architectures for AI systems where hallucination is not an option.
The 90% number comes from Mem0's published research. The mechanism is the same one any persistent-memory layer uses. Here's the math, the architecture, and the working integration.
One MCP config block, three popular AI coding agents, persistent memory across every session. The actual integration is shorter than this paragraph.
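For a sense of scale, a config block of that kind can look something like the following. This is an illustrative sketch, not a specific product's setup: the server name, package, and environment variable are placeholders, and the exact config file location varies by agent (Claude Code reads a project-level `.mcp.json`; other agents keep theirs elsewhere).

```json
{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "your-memory-mcp-server"],
      "env": {
        "MEMORY_API_KEY": "sk-..."
      }
    }
  }
}
```

Once the agent launches the server over stdio, every session gets the same memory tools with no further wiring.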
Once memory survives sessions, every future session inherits whatever was injected. The 2026 demos against ChatGPT, Claude Code, and others show how this works in practice. What a defensible memory API has to look like.
Karpathy’s llm-wiki gist proved a real pattern - vault, slash commands, Claude Code as the agent. It is the right answer for solo devs. Here is where it stops working, and the smallest possible upgrade path.
Every framework that calls itself 'agent memory' ships a retrieval system. The benchmarks they brag about are retrieval benchmarks. The latency they advertise is retrieval latency. The honest reframe and what it changes.
Anthropic shipped Claude Memory to free users in 2026. It is a real win for the chat product. It is not a replacement for an API-level memory layer, and the gap is bigger than the marketing suggests.
Most agent teams are told that memory means standing up Pinecone or Weaviate. For most agents, that is the wrong default. An honest three-way comparison and a recommendation.
The stateless reset is the most-complained-about agent UX failure of 2026. Bigger context windows do not fix it - they make it worse. Here is the architectural cause and the right primitive.
Seven patterns that turn a working RAG integration into a reliable one: query language, question decomposition, results calibration, handling visual content descriptions, proper LLM integration, error handling, and logging.
Copy-paste code examples in Python, Node.js, and cURL. Get your AI app answering questions from documentation in minutes.
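As a taste of one of the seven patterns above, question decomposition can be as simple as splitting a compound query before retrieval so each part gets its own search. This is an illustrative sketch, not code from the article; the splitting heuristic is deliberately naive and hypothetical.

```python
import re

def decompose_question(question: str) -> list[str]:
    """Split a compound question into independent sub-questions.

    A deliberately simple heuristic: break on ' and ' / ' also '
    between clauses, then restore a trailing '?' on each fragment.
    A real system would use sturdier clause detection.
    """
    parts = re.split(r"\s+(?:and|also)\s+", question.rstrip("?"))
    return [p.strip() + "?" for p in parts if p.strip()]

# Each sub-question is retrieved separately, and the result sets
# are merged before being handed to the LLM for answering.
subs = decompose_question("How do I install the SDK and how do I authenticate?")
```

The payoff is recall: a single embedding of a two-part question lands between the two topics, while two focused queries each hit their target.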
ACL 2025 Eval4NLP research measured a 70% accuracy gap between the best and worst runs of the same LLM-based RAG query, even with deterministic settings enabled. The architectural fix that makes the same question return the same answer every time.
Most RAG systems are blind to visuals. Gartner says 80-90% of enterprise data is unstructured, much of it visual. The architectural fix that turns screenshots and videos into first-class searchable content.
Every runtime LLM call costs you latency, money, consistency, and accuracy. The case for front-loading intelligence at build time.
A detailed comparison of traditional RAG and Context Engine architectures. When to use which, cost analysis, and a decision framework for your use case.
Hallucination and runaway API costs trace to the same architectural decision: calling an LLM at query time. Move the AI work to build time and both go to zero simultaneously.
Stanford research shows RAG legal tools hallucinate 17-33% of the time. We break down the data, explain why it happens mechanistically, and show an alternative architecture.
Stanford research found production RAG tools hallucinate on 17-33% of queries. A context engine moves AI work to build time, leaving query time as pure retrieval - and the floor drops to zero.
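The build-time/query-time split those last entries describe can be sketched in a few lines. This is an illustrative toy, not the architecture from the articles: the first-sentence "summary" stands in for whatever expensive LLM work (extraction, enrichment, embedding) runs at build time.

```python
def build_index(docs: dict[str, str]) -> dict[str, str]:
    """Build time: run all expensive AI work once, offline.

    A trivial first-sentence 'summary' stands in here for the
    LLM calls a real context engine would make during the build.
    """
    return {doc_id: text.split(".")[0] + "." for doc_id, text in docs.items()}

def query(index: dict[str, str], doc_id: str) -> str:
    """Query time: pure retrieval -- a lookup, no model call.

    Deterministic and cheap; nothing at this stage can
    hallucinate because nothing at this stage generates text.
    """
    return index[doc_id]

index = build_index({"auth": "Use API keys for auth. Rotate them monthly."})
```

Everything expensive and nondeterministic happens once, before any user asks a question; the query path is a lookup, which is why the hallucination floor drops to zero.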