Apple Intelligence: Context Matters

Today, Apple announced the arrival of Apple Intelligence – the long-awaited integration of artificial intelligence (AI) into its main product lineup of iOS devices. Accompanied by the faint whisper of 100 more AI app startups taking their last breath, the announcement was a declaration of intent: the next version of iOS will be deeply infused with AI. In my previous exploration, I posited that the utility of an AI assistant depends on its ability to have personal context and deep knowledge about my life and preferences. Without that understanding, AI assistants are of relatively little use.

The most promising element of Apple Intelligence, for me, is therefore the depth of integration and access it has been promised. Apple’s approach is unique – by allowing what are, in effect, two AIs to work side by side, with a private, local-only AI having access to the data on your phone and a secondary (OpenAI-powered) AI acting as a higher-powered interaction layer, Apple has created a proposition that could actually achieve what I had hoped for from Apple and AI.

As a result of the promised deep integration, Apple Intelligence will be able to access a level of personal context that apps just can’t match. And that ability to access personal context without requiring direct user input really matters. To understand why it matters so much, we need to consider how AIs obtain, retain, and process information.

Traditional Computing vs AI

In computing, information has traditionally been stored in two main places: on a hard drive and in RAM (Random Access Memory). In the mental model we have of our own memory, the hard drive holds long-term memories – knowledge that you know, a visual scene of your 12th birthday party, your date of birth – and RAM holds your working or short-term memory – what you need to get from the shops that evening, or what you went to the kitchen to do.

AI models work quite differently from a traditional computer. Yes, the models are stored on a hard drive, and most of the work is done “in memory,” but a model doesn’t draw on specific files to access information. Instead, an AI system effectively has two separate elements: the statically trained model itself and the ‘context window’.

Long-Term Memory: Training Data

When we ‘train’ a model, we provide it with what is described as a ‘pile’ of information. This pile, or large dataset, is what the model learns patterns and information from. Effective and extensive training gives the model behind ChatGPT (other models are available) the ability to accurately answer when World War II occurred, for example, based not on a file that tells it the correct dates but on the patterns it learned from the training data.

Whilst it is being trained, the model learns the structure of language from the corpus, which is what gives it the ability to respond correctly in different languages. There are other processes that enhance the model’s performance, but this provides a basic understanding of how information is embedded into a model.

The way a model retrieves information and the way it constructs sentences for its answers are intertwined processes. It generates the answer that is statistically most likely for a given query, based on the patterns and knowledge it has learned from the training data.
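To make that concrete, here is a toy sketch of what “statistically most likely” means in practice. The numbers are invented, not taken from any real model: at each step, the model assigns a probability to every candidate next token and samples from that distribution.

```python
import random

# Toy illustration (invented numbers, not from any real model) of how a
# language model produces text: at each step it assigns a probability to
# every candidate next token and samples from that distribution.
next_token_probs = {
    "1939": 0.86,        # the continuation the training data makes most likely
    "1914": 0.09,        # plausible, but less likely
    "1945": 0.04,
    "yesterday": 0.01,
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Pick one token, weighted by its probability."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

prompt = "World War II began in"
print(prompt, sample_next_token(next_token_probs))
```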

This approach works quite well. We now have models, trained on extremely large datasets, that appear to know a great deal and have a strong command of many languages – these are the models at the forefront of generalised AI. The training process for a model is relatively slow, though, and (at the moment) that training can’t be kept continuously updated, so even the most recently released models have a relatively out-of-date knowledge cut-off date. The information stored in the model is also impersonal in nature – it doesn’t specifically know about you and your life, and when you do directly provide it with information, you are not adding to the model itself. Instead, you are working within what we consider to be the AI’s short-term memory.

Short-Term Memory: Context Windows

This short-term memory is the ‘context window’. The term refers to the amount of new information that an LLM can take in at once. Whilst it appears that we are in a back-and-forth conversation with a model when we use these tools, in fact we are starting afresh with every message we send to ChatGPT. Behind the scenes, the entire conversation up to that point is sent to the model with every single request, to provide it with the history of the conversation. The model itself remains unchanged by your inputs; it is simply given the greater context afforded by the previous chat.
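As a rough sketch of that statelessness, here is what a client typically does on every turn, shown with the OpenAI Python library (the model name is illustrative): the full list of prior messages is sent again with each new request.

```python
from openai import OpenAI

# The API has no memory of earlier calls, so the client resends the *entire*
# conversation with every request. The model name here is illustrative.
client = OpenAI()

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "When did World War II start?"},
]
reply = client.chat.completions.create(model="gpt-4o", messages=messages)
messages.append({"role": "assistant", "content": reply.choices[0].message.content})

# The follow-up only makes sense because the earlier turns are included again
# in the next request; the model itself has stored nothing in between.
messages.append({"role": "user", "content": "And when did it end?"})
reply = client.chat.completions.create(model="gpt-4o", messages=messages)
print(reply.choices[0].message.content)
```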

Context windows are called windows for a reason, though: they have bounds, and beyond a certain point the older parts of the conversation are cut off to fit within the window. For GPT-4, that point is 128,000 tokens, or about 96,000 words. For Gemini 1.5 Pro, it’s as much as 2 million tokens, or around 1.5 million words. This sounds like a lot, but set against the data on your phone, it barely scratches the surface.
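Here is a rough sketch of what that cut-off looks like in practice, assuming a 128,000-token window and the common ~0.75-words-per-token rule of thumb (a real system would count tokens with a proper tokeniser): the oldest messages are simply dropped until the rest fits.

```python
# When the accumulated conversation exceeds the window, the oldest turns are
# dropped. A real system counts tokens with a proper tokeniser; here we
# approximate with the ~0.75-words-per-token rule of thumb.
CONTEXT_WINDOW_TOKENS = 128_000   # e.g. a 128k-token model

def estimate_tokens(text: str) -> int:
    return int(len(text.split()) / 0.75)   # roughly 4 tokens per 3 words

def trim_to_window(messages: list[dict], budget: int = CONTEXT_WINDOW_TOKENS) -> list[dict]:
    """Keep only the most recent messages that fit inside the token budget."""
    kept, used = [], 0
    for message in reversed(messages):      # walk newest-first
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break                           # everything older falls outside the window
        kept.append(message)
        used += cost
    return list(reversed(kept))             # restore chronological order
```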

There is another approach, though: giving the AI the ability to pull in data from other authoritative sources that are relevant in the moment – we call this Retrieval-Augmented Generation (RAG). By pulling in data from other sources based on its relevance to a given query, the limitations of both a lack of trained knowledge and a small context window can be overcome. And it is this approach that I had hoped Apple would take when I first discussed this earlier this year.
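A minimal sketch of the idea follows. The personal “sources” and the crude word-overlap scoring are stand-ins for what would really be an embedding-based search over your messages, calendar and mail, but the shape is the same: retrieve what is relevant, and put only that in the prompt.

```python
# The personal "sources" and the crude word-overlap score are stand-ins for an
# embedding-based search over messages, calendar and mail, but the shape of RAG
# is the same: retrieve what is relevant, and put only that in the prompt.
personal_sources = [
    "Calendar: Dentist appointment, Thursday 14:30.",
    "Message from Sam: flight lands at 18:05, can you pick me up?",
    "Email: your car's MOT is due on 28 June.",
]

def relevance(query: str, document: str) -> int:
    """Crude relevance score: how many query words appear in the document."""
    return len(set(query.lower().split()) & set(document.lower().split()))

def retrieve(query: str, sources: list[str], k: int = 2) -> list[str]:
    return sorted(sources, key=lambda doc: relevance(query, doc), reverse=True)[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query, personal_sources))
    return f"Use this personal context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What time does Sam's flight land?"))
```

The point of the design is that only the retrieved snippets, not your whole archive, end up inside the context window.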

The Apple Difference

I’ve previously expressed scepticism regarding AI assistants, partly because current LLMs lack the personal contextual awareness I would want from an assistant. Even where an assistant is able to persist ‘important’ information, there are so many pieces of personal context that an AI without access to your entire life simply can’t have. That context is stored in messages, calendars and emails, or even just in the places we’ve been or currently are. Achieving the level of context required to be truly useful, without needing to be constantly updated by the user, requires something that few devices can actually have: a persistent and all-seeing presence in your life. It turns out that the best way to have that persistent presence is to be your most persistently present personal device: your phone.

What makes AI exciting is its ability to integrate personal context into the model without needing to directly provide the information. By drawing from its access to your phone’s personal data, the AI will be able to understand more about your life than any other assistant would be able to. This is as a direct result of the positioning of the AI within the stack – it sits at the OS level and can therefore do things that no standalone app could ever do.

Apple has also spent no small amount of time and effort not only branding itself as a privacy-conscious platform, but also ensuring that the choices it makes within its platforms are genuinely driven by privacy. It has built a level of user trust that is unmatched. That Apple is able to run its AI on-device matters – personal information isn’t sent to the cloud and stays safely and securely on your device. It really is only Apple that can provide this, due to its unique approach to hardware and software.

If iOS 18 delivers on today’s promise, I’m excited. This is the first time I’ve felt that any company has truly understood how to approach AI in a way that is genuinely helpful, and what an AI assistant needs in order to be more than just a toy. If you thought I was bullish on Apple prior to this event…
