The Lens and the Light
Everyone’s building better telescopes. Nobody’s pointing them at anything interesting.
In 1609, Galileo pointed a telescope at Jupiter and discovered four moons. The telescope was garbage by modern standards. A few inches of curved glass, chromatic aberration so bad everything had purple fringes, magnification that wouldn’t impress a tourist at the Grand Canyon gift shop, and those people get excited about everything.
It didn’t matter. He pointed it at the right thing.
Four centuries later, the James Webb Space Telescope costs $10 billion, orbits a million miles from Earth, and can see galaxies that formed 13 billion years ago. It is the greatest lens ever made.
Point it at the ground and you get a really expensive picture of some dirt[1].
The power of a lens depends entirely on what passes through it. This is the state of AI models in 2026. We’re in a telescope arms race. Bigger parameter counts, longer context windows, better benchmarks, faster inference. Every lab is grinding a better lens because that’s the thing they can measure, market, and put on a pricing page. And largely ignoring what the lens gets pointed at.
Everyone’s building better telescopes, yet nobody’s pointing them at anything interesting. A bit dramatic, but the point stands.
There’s a persistent misconception about how LLMs work: the model was trained, the training ended, and now it’s “frozen.” This framing suggests limitation. The model is stuck and can no longer learn. It only knows what it knew at the cutoff date. A snapshot slowly going stale like the end of a loaf of bread nobody wants.
This framing is wrong in the ways that matter for progress and usefulness.
Sure, the weights are frozen and the parameters don’t update when you talk to the model. It will not learn your name through repetition, and it certainly will not get better at your codebase just because you keep showing up[2]. In the narrow technical sense: frozen.
But think about what the weights actually are. They’re a compression of patterns, reasoning strategies, and language relationships extracted from an enormous amount of human knowledge. They’re not a set of facts. They’re a way of processing information. A lens, one might say[3].
The lens doesn’t change shape. What you can see through it changes completely depending on what you put in front of it. Hand the same model a codebase and it’s a senior developer. Hand it a legal contract and it’s a paralegal who bills less. Hand it a medical case study and it’s a diagnostician. Hand it your messy daily notes and it extracts structure you didn’t know was there. The “frozen” model is doing completely different cognitive work in each case because the input changed.
The model’s capability isn’t frozen. Its weights are. Those are different things, and the difference is where all the leverage lives.
And here’s something I learned building agent pipelines that nobody warned me about: the model is the easy part.
Early on, my approach was the default one. Take the best model available, stuff the context window with everything that seemed relevant, and let it figure things out. The results were fine. Sometimes great. Sometimes the model would latch onto something from turn 47 and go on a tangent about a file I’d forgotten was in the window. Inconsistent in a way that made me question whether I’d hallucinated the good runs.
Then I started treating context assembly as its own engineering problem. Not “what should I include?” but “what should I exclude?” What’s the minimum viable context for this specific task? What format does the model actually perform best with for this type of decision? What prior information is helpful versus what’s just sitting there taking up tokens and creating surface area for confusion?
The difference was immediate and embarrassing. A smaller, cheaper model with a clean 8K context was outperforming a frontier model choking on 120K tokens of “maybe relevant” material. Not on benchmarks. On the actual work. Fewer hallucinations, more consistent reasoning, and the outputs stopped requiring the kind of careful babysitting that defeats the purpose of having an agent in the first place.
The model didn’t get smarter. It just stopped fighting through noise. Every token in the window was pulling its weight because every token had earned its spot. This is the part nobody puts on a slide: the system that assembles the context is doing more cognitive work than the model that processes it.
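If it helps to see that spelled out, here’s a minimal sketch of the assembly step, the “earn your spot” filter. Everything in it is a placeholder I’m assuming for illustration: the relevance scores come from whatever retrieval or heuristics you trust upstream, the token estimate is deliberately crude, and the 8K budget is just the number from my own runs.

```python
# Minimal context-assembly sketch: score candidate snippets for one task,
# then pack the best ones into a fixed token budget instead of dumping
# everything into the window. Scoring and token counting are placeholders.

from dataclasses import dataclass


@dataclass
class Snippet:
    source: str   # where it came from (file, note, doc)
    text: str
    score: float  # task-specific relevance, assigned upstream


def estimate_tokens(text: str) -> int:
    # Crude heuristic; swap in a real tokenizer if you care about precision.
    return max(1, len(text) // 4)


def assemble_context(snippets: list[Snippet], budget_tokens: int = 8_000) -> str:
    """Keep the highest-scoring snippets that fit the budget; drop the rest."""
    chosen: list[Snippet] = []
    used = 0
    for snip in sorted(snippets, key=lambda s: s.score, reverse=True):
        cost = estimate_tokens(snip.text)
        if used + cost > budget_tokens:
            continue  # exclusion is the point: "maybe relevant" doesn't make the cut
        chosen.append(snip)
        used += cost
    # Give the model a stable reading order rather than score order.
    chosen.sort(key=lambda s: s.source)
    return "\n\n".join(f"## {s.source}\n{s.text}" for s in chosen)
```

The interesting work, of course, is in how that `score` gets assigned in the first place, which is exactly the part that doesn’t fit on a slide.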
Saying “context matters more than model capability” feels wrong in 2026 because the entire industry narrative is organized around model capability. Product launches are about the model. Benchmarks compare models. Pricing tiers differentiate models. The model is the product, and the model is what gets the keynote.
Context assembly? That’s “just engineering,” aka the boring part, aka what the application layer handles while the grown-ups work on the real thing.
This is like saying the recipe doesn’t matter because you bought expensive ingredients. It does matter, and one could argue it matters more.
The empirical evidence is stacking up. The gap between GPT-4 and GPT-3.5 was enormous. The gap between the latest models and last year’s models? Shrinking. Measurably, on the benchmarks that matter for real work. We’re hitting diminishing returns on the model curve, and each generation is incrementally better at the same tasks while costing more to train and run.
Meanwhile the gains from better context engineering are increasing. The gap between “dump everything in the window” and “carefully assemble the right context” gets wider as context windows get bigger. More rope to hang yourself with. A 200K token window holding 190K tokens of noise is worse than a 32K window holding 30K tokens of signal.
Neuroscience has an inconvenient finding for the bigger-model crowd: human cognition didn’t get dramatically better over the last 50,000 years[4]. Our brains are roughly the same hardware our ancestors used to track animals across savannas. Same synapses, same neurotransmitters, same roughly 86 billion neurons doing roughly the same thing.
What changed was the information environment. Language, writing, libraries, the printing press, the internet. None of these upgraded the brain. They upgraded what the brain had access to. Better inputs into the same processor.
Your great-great-grandmother had the same cognitive hardware as a modern AI researcher. She didn’t lack intelligence. She lacked context. She lacked access to the accumulated knowledge that makes modern breakthroughs possible, and also to Wikipedia at 2am when you really need to know how a nuclear reactor works.
The model is the brain. Context is the information environment. And the history of human civilization is a 50,000-year case study in the proposition that upgrading the environment matters more than upgrading the hardware.
If context does more work than most people think, the architectural implications are worth spelling out.
The pipeline is the moat, not the model. The system that decides what goes into the context window, how it gets there, and in what form is the actual differentiator. Two teams using the same model will get wildly different results based on their context assembly. Models are commodities. You can swap them. You can’t swap the pipeline that knows which 6,000 tokens actually matter for a given task.
Model selection becomes task-specific. A quick factual lookup doesn’t need a frontier model. A complex multi-step reasoning task does. Both benefit equally from clean context. The expensive part of a good agent system should be context engineering, not model compute. Match the model to the job and spend the rest of your budget on input quality.
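As a sketch of what “match the model to the job” can look like in code (the model identifiers and task labels here are invented for illustration, not anyone’s actual lineup):

```python
# Route each task type to the cheapest model that can handle it.
# Model names and the task taxonomy are illustrative placeholders.
TASK_TO_MODEL = {
    "lookup": "small-fast-model",
    "extraction": "small-fast-model",
    "multi_step_reasoning": "frontier-model",
    "agent_planning": "frontier-model",
}


def pick_model(task_type: str) -> str:
    # Default to the cheap model; escalate only when the task demands it.
    return TASK_TO_MODEL.get(task_type, "small-fast-model")
```

Both routes get the same carefully assembled context; only the compute spend changes.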
Model improvement has a ceiling. Context improvement doesn’t. There are theoretical limits to how good a language model can get. There are no theoretical limits to how well you can select, compress, and structure information for a specific purpose. Context engineering is an open-ended problem with compounding returns, which is also why nobody’s marketing it: it’s hard to benchmark, hard to demo on stage, and hard to sell as a product with a logo.
The end game is a model that sees clearly, not a model that sees everything. Perfect context with a good model beats noisy context with a perfect model. This isn’t a controversial claim if you think about it for ten seconds, but the industry isn’t building for it, because “we carefully selected the right 8,000 tokens” does not fit on a keynote slide.
Galileo’s discovery wasn’t a telescope achievement. It was a pointing achievement. He looked where nobody had thought to look, with a tool just barely good enough to see what was there.
The lesson for AI agents is the same one. The lens is good enough. The lens has been good enough for a while. A lens that doesn’t change shape is a lens you can design for. You know exactly what it can see, exactly how it processes what it sees, and exactly where it falls apart. That predictability is a feature, not a bug, and anyone who’s tried to build reliable systems on top of something that changes under them will back me up on this.
The variable is the input. The context.
Point the telescope at something worth seeing.
[1] Admittedly a very sharp picture of some dirt. You’d be able to count the individual grains. The dirt would never have felt so seen.
[2] Especially when it’s the 10 billionth take on a todo list.
[3] Boom, metaphor complete.
[4] The “cognitive revolution” debate in paleoanthropology is messy and I’m oversimplifying. Behavioral modernity, symbolic thought, the Upper Paleolithic transition; reasonable people disagree about whether there was a sudden cognitive leap around 50,000 years ago or a gradual accumulation. The point stands either way. Whatever hardware changes occurred, they were small relative to the environmental changes that followed. The brain is roughly the same. The world it operates in is unrecognizable.



