
The Models Don't Matter Anymore

Frontier releases keep coming, but for businesses, focusing on flexible infrastructure pays bigger dividends than chasing the latest and greatest.

Chop Chop!

  • Model-Agnostic Infrastructure: Build systems that can swap between LLMs seamlessly, ensuring quick upgrades and minimal disruption when new models arrive.
  • Focus on Data Quality: Clean, well-labeled, and well-governed data can make even mid-range models shine, while poor data can sink any AI effort.
  • Fine-Tuning Is Overhyped: Constantly emerging frontier models often outpace costly fine-tuning.
  • Prompt Engineering Matters: Well-crafted prompts can embed domain knowledge and outperform heavily customized models—often with minimal overhead.
  • Prepare for Rapid Change: Software and strategies that take 18 months to deploy risk obsolescence. Design for quick iteration and disposable components.

In Full...

Another day, another frontier-level release—welcome to the world, Claude 3.7 Sonnet. And yet, the churn of model announcements increasingly feels like background noise. For the past two years, conversations about AI have fixated on which model is best: who has the most parameters, who trained on the biggest cluster, whose benchmarks are highest… But do these improvements really matter anymore?

Ordinarily, when an article poses a rhetorical question, the safe bet is: “No.” But that’s not actually the case here—new models do matter. Yet for most businesses and consumers, the impact today is modest at best.

The constant stream of model releases is the result of the “great race”. OpenAI, Google, Anthropic, xAI, Mistral, DeepSeek, et al. are all chasing what they hope will be the model—the one capable of AGI (Artificial General Intelligence). Definitions vary. OpenAI’s charter describes AGI as systems that “outperform humans at most economically valuable work.” Sam Altman recently tweaked this definition to something like “a system that can tackle complex problems at human level, across many fields.”

My own definition is more specific:

The first true AGI is the first model that can, with limited human input, build a better version of itself.

In fundamental research, then, the drive for better models is far from meaningless. Performance gains, more elegant architectures, and greater efficiency keep nudging us closer to Altman’s definition of AGI—if not OpenAI’s or my own. Specialized models also matter, and matter a lot: Google’s “co-scientist” model has shown real promise (paper), replicating in days hypotheses that scientists themselves honed over a decade of research (blog).

But for most businesses—and for most of us—this doesn’t (and shouldn’t) carry that much weight.


Don’t Let Perfect Be the Enemy of Good

When LLMs first arrived, the experience was mind-bending. The GPT-3.5 generation could produce coherent prose so convincing that a Google engineer believed the company’s system had achieved sentience. It was natural to see those emergent abilities as a quantum leap in machine intelligence, because, well, they were.

The jump from GPT-3 to GPT-4, Claude 3.5 Sonnet, Google’s Gemini, and the like felt like another quantum leap. With the right coaxing, they became really good. Now, in the newest wave—OpenAI’s “o-series,” DeepSeek’s R1, and Grok 3—we again see models that are… really good. The leaps are getting harder to spot, not because they aren’t happening but because, for most tasks and most people, existing LLMs already do the job.

For standard day-to-day tasks that require college-level writing or problem-solving, today’s models meet the bar. The difference between, say, Claude 3.7 Sonnet and what might be released next month is real, but often subtle. The game-changing improvements have already happened.

My argument is that today’s models are the ones businesses should build around—regardless of what’s coming next, and even if they don’t always beat human performance right now. By the time your project goes live, the model you spec’d on Day 1 will have been replaced by a faster, cheaper, and more capable successor anyway.


Preparing the Groundwork

The key is to be model-agnostic. Treat the model like a new employee who will be “upskilled” every few months. Don’t design systems that rely on the quirks of a single LLM; assume the best option will shift frequently.

I’ve spent the past couple of months thinking about general rules for working with AI:

1. Build to Be Model-Agnostic

It’s safe to assume that models will keep improving and that different models excel at different sub-tasks. Your system should let you swap between them easily, upgrade seamlessly, and test performance quickly. Create well-structured pipelines so you can deploy cheaper models for high-volume tasks, then tap a more powerful model only when complexity demands it. It’s the same principle we use at CanaryIQ: we allocate tasks to whichever model handles them best. This flexibility is crucial given how fast the AI landscape evolves and how new capabilities appear “overnight.”
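
To make this concrete, here’s a minimal Python sketch of the routing idea. The class names, task assignments, and stubbed responses are my own illustrative assumptions, not production code:

    from typing import Protocol

    class LLMClient(Protocol):
        """Any provider adapter just needs a complete() method."""
        def complete(self, prompt: str) -> str: ...

    class CheapModel:
        # Illustrative stub; in practice this wraps a provider SDK call.
        def complete(self, prompt: str) -> str:
            return f"[cheap-model answer to: {prompt[:40]}]"

    class FrontierModel:
        # Illustrative stub for a pricier, more capable model.
        def complete(self, prompt: str) -> str:
            return f"[frontier-model answer to: {prompt[:40]}]"

    class ModelRouter:
        """Routes each task type to whichever registered model handles it best."""
        def __init__(self) -> None:
            self.registry: dict[str, LLMClient] = {}

        def register(self, task: str, client: LLMClient) -> None:
            self.registry[task] = client

        def run(self, task: str, prompt: str) -> str:
            return self.registry[task].complete(prompt)

    router = ModelRouter()
    router.register("summarize", CheapModel())           # high-volume, low-stakes
    router.register("contract_review", FrontierModel())  # complexity demands more
    print(router.run("summarize", "Summarize this support ticket: ..."))

In a real system each adapter would wrap a vendor SDK, but nothing downstream needs to know which one; swapping a model is a one-line registry change.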

2. Focus on Your Data Integrations

Data remains the ultimate determinant of AI success. Even the most advanced model will falter if your data is inaccurate, incomplete, or poorly labeled. Conversely, an average LLM can shine with well-structured and reliable data. Many organizations discover that data engineering is the real bottleneck. Cleaning, labeling, and governing your data often yields bigger returns than anything else, and this can be done using current-generation LLMs.
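
As a sketch of what LLM-assisted labeling can look like (the complete() stub stands in for whichever provider you prefer, and the prompt, categories, and canned reply are invented for illustration):

    import json

    def complete(prompt: str) -> str:
        # Stand-in for any provider call; swap in the model of your choice.
        return '{"industry": "retail"}'

    LABEL_PROMPT = (
        "Classify this customer record's industry as one of: "
        "retail, finance, healthcare, other. "
        'Reply with JSON only, e.g. {{"industry": "retail"}}.\n\n'
        "Record: {record}"
    )

    def label_records(records: list[dict]) -> list[dict]:
        # Have the model fill in a missing label; flag unparseable replies.
        for record in records:
            reply = complete(LABEL_PROMPT.format(record=json.dumps(record)))
            try:
                record["industry"] = json.loads(reply)["industry"]
            except (json.JSONDecodeError, KeyError):
                record["industry"] = "NEEDS_REVIEW"  # route to a human
        return records

    print(label_records([{"name": "Acme Stores", "notes": "brick-and-mortar chain"}]))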

3. Fine-Tuning Models Is (Probably) Not Worth It

Once upon a time, fine-tuning was the standard for customizing a model to a specific domain. But in an era where a new, frontier-level release appears every few months, that strategy can quickly become obsolete. Why spend loads of engineering effort and dollars on a fine-tuned model that might be overtaken by a more advanced general model tomorrow?

As context windows expand, you can simply feed more relevant data into the prompt itself, sidestepping the need for complex retraining. This approach makes it easier to pivot as soon as a better model arrives.
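
Here’s a rough sketch of that pattern. The character budget stands in for the model’s context window, and the four-characters-per-token rule of thumb is an assumption, not a spec:

    def build_prompt(question: str, documents: list[str],
                     budget_chars: int = 400_000) -> str:
        # budget_chars approximates the context window; English text runs
        # around four characters per token, so 400k chars is roughly a
        # 100k-token budget.
        context, used = [], 0
        for doc in documents:  # assumes documents are pre-sorted by relevance
            if used + len(doc) > budget_chars:
                break
            context.append(doc)
            used += len(doc)
        return (
            "Answer using only the reference material below.\n\n"
            + "\n---\n".join(context)
            + f"\n\nQuestion: {question}"
        )

    docs = ["Refund policy: full refunds within 30 days...", "Pricing sheet: ..."]
    print(build_prompt("What is our refund window?", docs))

When a better model arrives, nothing here changes except the callable you hand the prompt to, and possibly the budget.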

4. Get Good at Prompt Engineering

Effective prompt engineering can embed domain expertise without entrenching your workflows in a single model. In many cases, LLMs themselves can help generate optimized prompts for specific tasks. You can automatically iterate through variations, test performance, and integrate the best prompts based on internal benchmarks. By treating prompt engineering as an ongoing process, you keep your system nimble and resilient.
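
One way that loop can look in practice; the prompt variants, test set, and stubbed complete() call below are all invented for illustration:

    VARIANTS = [
        "Extract the invoice total from: {text}",
        "You are a meticulous accountant. Return only the invoice total in: {text}",
    ]

    TEST_SET = [("Invoice #12 ... Total: $410.00", "$410.00")]

    def complete(prompt: str) -> str:
        # Stand-in for any provider call.
        return "$410.00"

    def score(variant: str) -> float:
        # Fraction of test cases where the output contains the gold answer.
        hits = sum(
            expected in complete(variant.format(text=text))
            for text, expected in TEST_SET
        )
        return hits / len(TEST_SET)

    best = max(VARIANTS, key=score)
    print(f"Best prompt so far: {best!r}")

Re-running this harness against each new model release tells you whether your winning prompt still wins, which is exactly the nimbleness the approach is after.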

5. Orchestration, Memory, and Retrieval-Augmented Generation

Real-world tasks often go beyond a single question-and-answer step. You may need to retrieve documents, compile them into a prompt, verify outputs, or query secondary data sources. This multi-step chain requires orchestration frameworks that let you slot in different LLMs as needed. If your logic isn’t bound to the quirks of one model, it’s easy to upgrade whenever a better contender shows up.
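
A toy pipeline makes the point: each stage takes and returns a state dict, and the model behind each stage is just a swappable callable. The stage names and stubs are assumptions, not any particular framework’s API:

    from typing import Callable

    Step = Callable[[dict], dict]

    def retrieve(state: dict) -> dict:
        # In practice: vector search or keyword lookup over your document store.
        state["docs"] = ["(doc snippet 1)", "(doc snippet 2)"]
        return state

    def generate(state: dict) -> dict:
        prompt = "\n".join(state["docs"]) + "\n\nQ: " + state["question"]
        state["draft"] = state["writer_model"](prompt)  # any LLM callable
        return state

    def verify(state: dict) -> dict:
        # A second (possibly different) model can check the first one's work.
        state["check"] = state["checker_model"]("Verify: " + state["draft"])
        return state

    def run_pipeline(state: dict, steps: list[Step]) -> dict:
        # Each step is model-agnostic: swap the callables, not the logic.
        for step in steps:
            state = step(state)
        return state

    state = {
        "question": "What does our policy say about refunds?",
        "writer_model": lambda p: "[drafted answer]",   # stub LLM callables
        "checker_model": lambda p: "looks consistent",
    }
    print(run_pipeline(state, [retrieve, generate, verify]))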

6. Agentic Workflows and Tool Use

Automated “AI agents” that chain tasks, call APIs, and pursue complex goals are the biggest news at the moment. But they’re only as useful as the systems behind them—reliable data access, robust orchestration, and well-defined processes. The choice of LLM matters far less if the rest of your infrastructure is brittle.
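
Stripped to its core, one turn of an agent loop is: ask the model which tool to use, run it, and feed the result back. The tool, the JSON convention, and the canned model reply below are illustrative assumptions:

    import json

    TOOLS = {
        "get_order_status": lambda order_id: {"order_id": order_id, "status": "shipped"},
    }

    def complete(prompt: str) -> str:
        # Stand-in for any provider call; real agents rely on the model
        # emitting a structured tool request like the canned one below.
        return '{"tool": "get_order_status", "args": {"order_id": "A-17"}}'

    def agent_step(user_goal: str) -> dict:
        # One turn of a minimal agent loop: the model proposes a tool, we run it.
        request = json.loads(
            complete(f"Goal: {user_goal}\nAvailable tools: {list(TOOLS)}")
        )
        tool = TOOLS[request["tool"]]  # only dispatch to known, whitelisted tools
        return tool(**request["args"])

    print(agent_step("Where is order A-17?"))

Notice how much of the sketch is plumbing, not model choice: the tool registry, the dispatch rule, the data behind the tools. That’s where brittle infrastructure shows.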

7. Models as Commodities

LLMs are commoditizing. The leading-edge model might shine for a particular challenge, but only until the next iteration arrives. You can’t simply chase “the best” model indefinitely. Instead, build flexible platforms that let you test new releases, confirm improvements, and switch as soon as it makes sense.
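
One way to operationalize that is to treat every new release as a challenger that must beat the incumbent on your internal benchmarks before becoming the default. The model names, scores, and margin below are made up:

    CANDIDATES = ["model-current", "model-new-release"]

    def run_regression_suite(model_name: str) -> float:
        # Stand-in: replay your internal benchmark set against the model
        # and return an aggregate score. The numbers here are made up.
        return {"model-current": 0.86, "model-new-release": 0.91}[model_name]

    def pick_default(candidates: list[str], margin: float = 0.02) -> str:
        # Only switch the default when a new release clearly beats the incumbent.
        current, *rest = candidates
        baseline = run_regression_suite(current)
        for challenger in rest:
            if run_regression_suite(challenger) >= baseline + margin:
                return challenger
        return current

    print(f"Default model: {pick_default(CANDIDATES)}")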

8. Disposable Software

The AI market moves at ludicrous speed. There’s little point in an 18-month build cycle if everything is outdated the day you ship. This doesn’t mean writing sloppy code, but it does mean building in a way that expects short life cycles. Your architecture should let you drop in a new LLM, run tests, confirm upgrades, and go live with minimal reengineering.


Looking Ahead

Models will continue to matter—and they’ll keep hitting the front page of Hacker News. At some point, we may even cross the threshold of AGI by my own definition (a system that can iteratively improve itself), and all hell may break loose. But for most organizations, whether that milestone arrives next year or next decade is noise.

Focus on what you can control: a flexible infrastructure that pivots whenever a better model appears. That’s the essence of future-proofing your AI strategy. Don’t obsess over which LLM is hot this month—just be ready for whatever’s next. The models keep changing, but the right structures will outlast every frontier release.

*This article grew out of a talk I’ve given on this subject; the latest version is here

Why It Matters...

Because the AI market moves faster than any single model’s development cycle. Building around a specific LLM risks missed opportunities as better options emerge. By embracing a model-agnostic strategy, you future-proof your AI efforts, reduce implementation headaches, and keep your focus on what truly drives results: data quality, flexible infrastructure, and the ability to pivot as soon as new capabilities arise. In short, the real competitive advantage doesn’t come from chasing the best model—it comes from designing systems that can absorb whatever comes next.
