The most intriguing aspect of the glasses for me is the prospect of mixed-mode AI without taking my phone out of my pocket. Meta won’t release this until probably next year, but I do have some observations on how we could get there slightly sooner.
OpenAI released their multi-modal version of ChatGPT about a month ago, which means that you can now speak to ChatGPT (an oddly stilted style of conversation which is still quite compelling, I wrote about it here) and send it images which it can interpret and tell you about.
One of the cool features that OpenAI included in the voice chat version is that on iOS the conversation is treated as a “Live Activity” – that means that you can continue the conversation whilst the phone is locked or you are using other apps.
What this also means is that the Ray-Ban Metas do have an AI that you can talk to, inasmuch as any Bluetooth headphones connected to an iPhone can be used to talk to the ChatGPT bot whilst your phone is in your pocket. I’ve looked at options to have this trigger via an automation and shortcut when the glasses connect to my phone, but ultimately don’t think that is very useful - I don’t want an AI listening all the time, I want to be able to trigger it when I want it. It did lead me to add an “Ask AI” shortcut to my home screen which immediately starts a voice conversation with ChatGPT, which I suppose will help me to understand over time how useful a voice assistant actually is. I also had high hopes that “Hey Siri” would be able to trigger the shortcut, which it can, but not when the phone is locked. So close and yet so far.
As I said above though, this feature is also something that all headphones can be used for. The grail, and ultimate reason for getting the Ray-Bans, is in letting the AI see what you can see. Given that this feature won’t be officially released until probably next year, what options do we have?
The solution may come in the form of another Meta product, WhatsApp. I built a simple WhatsApp bot earlier this year which allows me to conduct an ongoing conversation with the API version of GPT-4; it’s quite rudimentary but does the job. The cool thing about the decision to deeply integrate the Meta glasses with other Meta products is that you can send messages and photos on WhatsApp directly from the glasses without opening your phone. The glasses will also read out incoming messages to you. This works pretty well with the bot that I’ve built; I can send messages using the glasses and they will read back the responses. I can say to the glasses “Hey Meta, send a photo to MyBot on WhatsApp” and it will take a snap and send it straight away. The GPT-4V(ision) API hasn’t been released yet, but once it has been, then I’ll be able to send the pictures to the bot via WhatsApp and the glasses will be able to read back the response.
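The core of a bot like this is simply keeping a running message history per sender and feeding it to the model on each turn. The sketch below shows that conversation-state handling; the class name, the `complete` callable (a stand-in for whatever wraps the GPT-4 API) and the system prompt are all my assumptions, not details of the bot described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Message = Dict[str, str]  # {"role": ..., "content": ...}, GPT-4 chat format

@dataclass
class ChatBot:
    """Keeps a running GPT-4-style message history per WhatsApp sender."""
    complete: Callable[[List[Message]], str]  # e.g. a thin wrapper around the GPT-4 API
    histories: Dict[str, List[Message]] = field(default_factory=dict)

    def handle(self, sender: str, text: str) -> str:
        # Start a fresh history for new senders, seeded with a system prompt
        history = self.histories.setdefault(
            sender, [{"role": "system", "content": "You are a helpful assistant."}]
        )
        history.append({"role": "user", "content": text})
        reply = self.complete(history)
        history.append({"role": "assistant", "content": reply})
        return reply  # sent back over WhatsApp, read aloud by the glasses
```

In practice `handle` would be called from a webhook receiving incoming WhatsApp messages, with the returned reply sent back through the same messaging API.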
This all feels pretty convoluted though and is ultimately an attempt to hack my way around the lack of available functionality. The Meta Glasses are quite cool but they aren’t wearable AI. Yet.
As with many things within the space at the moment, the technology feels tantalisingly close but not quite there. The last time anything felt quite like this to me though was the dawn of the smartphone era. Playing with the glasses is oddly reminiscent of playing with the accelerometer in the Nokia N95… if we’re at that point with AI then the iPhone moment is just around the corner.
OpenAI released multi-modal AI a couple of weeks ago and it has been slowly making its way into the ChatGPT app. It is quite disconcertingly brilliant.
Conversation is a funny thing. Reading transcripts can be quite nightmarish - we don’t realise how much the spoken word, especially during conversations, meanders and is peppered with hesitation, deviation and repetition until we see it written down. When we’re speaking though, these unnecessary additions make conversation human and enjoyable. When we’re listening, we often don’t realise how much we are actually playing an active role in doing so–in person, it’s the facial expressions and nods which encourage the speaker to continue; on the phone, the short acknowledgements that let a partner know you’re still there and listening. I often speak to a friend on the phone who mutes when they’re not speaking and the experience is fine, but the silence is slightly off-putting.
And so to the experience of chatting with an AI. It’s brilliant, in as much as it actually feels as though you are having something of a conversation. The responses aren’t the same as the ones you would receive by directly typing the same words into ChatGPT - they’ve clearly thought about the fact that spoken conversation is different. There is little lag in the response. You don’t say your piece and then wait for 10 seconds for it to process; the AI starts almost straight away once it’s heard a long enough pause. The quality of the AI is fantastic - it’s using GPT-4, which is about as state-of-the-art as it gets.
The entire experience is disconcerting because of how brilliant it is. There is no room for you to take long pauses while you think mid-sentence. There is absolute silence when you are talking which causes you to look down at the screen to make sure it’s still working. The responses are often long and apparently deeply thought through, but they often end with a question, rather than just being an open-ended response to work from. I’m looking forward to having an AI conversational partner, but I want it to help me tease out ideas, not necessarily give me fully formed AI thoughts on a subject. I want it to say “yes” whilst I’m speaking for no apparent reason other than to encourage me to keep talking through the idea. I want it to meander and bring in new unrelated but tangential ideas. Ultimately, I guess I want it to be a little more human.
For much of the latter half of the 20th century, new music discovery went something like this. An artist would make a song and they’d send demo tapes out to record companies and radio stations. They’d play to dimly lit bars and clubs, hoping that an A&R impresario lurked in the crowd. If they were lucky, a DJ might listen to their demo and would play it live. Perhaps someone would record it and start bootlegging tapes. These contraband tapes would be passed around and listened to by teenagers gathered in bedrooms. If all went well, the artist’s popularity would grow. They’d be signed, played more on the radio and do bigger shows. Fans and soon-to-be fans would go to record stores to listen to the new releases and buy the music on vinyl, tape or CD. The record shops would make money, the musicians would make money, the record companies would make more money.
This began to shift in the early noughties, driven, as so much was, by the emergence of the internet. The newfound ability to rip CDs and transform tracks into easily shareable mp3s on the likes of Napster rendered the entire world’s music repertoire available gratis to eager ears. For those that preferred their music to come without the lawbreaking, the iTunes store and others made purchasing it just about as easy. MP3 players and the iPod made it effortless to carry 1,000 songs in your pocket. The days of physically owning your music were all but over in the space of only a few short years.
Despite the music industry’s hope that killing Napster would stem the rising tide, the death of the platform only resulted in more alternatives appearing. It turned out that people liked having instant access to all music for pretty much free. Music discovery underwent a transformation. To acquire a song, one simply had to search for it, and within minutes, it was yours—provided Kazaa or uTorrent were operational and your parents didn’t pick up the phone and break the connection. Online forums teemed with enthusiasts discussing new musical revelations and leaks, offering nearly everything and anything you desired, all for free.
Music was no longer scarce; there were effectively infinite copies of every single song in the world to which anyone could have immediate access. Gone were the days of friends passing around tapes or lingering in record stores. The social aspect of music discovery shifted from smoky bars, intimate bedrooms, and record emporiums to the virtual amphitheaters of online forums, Facebook threads, and text message exchanges.
The big problem with all of this of course was that it was all quite illegal.
In 2006, Daniel Ek was working at uTorrent. He could see that users wouldn’t stop pirating music and that the attempts by the music industry to thwart sharing were doomed to failure. He “realized that you can never legislate away from piracy. Laws can definitely help, but it doesn’t take away the problem. The only way to solve the problem was to create a service that was better than piracy and at the same time compensates the music industry – that gave us Spotify.”
Spotify launched with the simple headline: A world of music. Instant, simple and free.
By 2023 it had over 500 million users.
For many music fans, playlists took center stage, with enthusiasts investing hours trawling the internet for them. Spotify introduced a feature to see what your friends were listening to via Facebook and then to directly share playlists with others. Then they made playlists searchable. That killed off sites like sharemyplaylist, but meant that when I needed three hours of Persian Wedding songs, all I had to do was hit the search bar and appear to be intimately familiar with the works of Hayedeh and Dariush for my then soon-to-be in-laws.
In 2015 Spotify launched Discover Weekly, a dynamic playlist which introduced users to tracks that were similar to what the listener had played recently. It was remarkably good. The social aspect of music discovery was being lost, but it was replaced with an automaton that did the job exceptionally well, even if the results were sometimes corrupted by the false signal of repeated plays of Baby Shark.
What was more subtle throughout this period was that the way people consumed music was changing. We had progressed from music discovery as a purposeful act to an everyday occurrence. Background music had always existed, but it came via the radio, or compilations. This was personal. The value of the music itself transformed. The ability to have a consistent soundtrack playing, at home, at work, in the car or as you made your way through everyday life, meant that listeners weren’t necessarily concerned about the specific artists that were playing; they had become more interested in the general ambience of that ever-present background music. Listeners still certainly relished the release of the new Taylor Swift album, but they also listened to music that they didn’t know more easily, and without ever inquiring as to who the artist was, simply because it fit within the soundtrack of their lives.
Discover Weekly was one of Spotify’s first public forays into personalised music curation using machine learning. The success of the project led to more experiments. It turned out that people loved the feature.
Spotify in 2023 is remarkable. When I want to run to rock music, the tempo and enthusiasm of the suggested playlist is exactly right. When I want to focus on work, the beats are mellow and unobtrusive. The playlists “picked for me” change daily, powered by AI. I still create my own playlists, but the experience is now akin to using ChatGPT. I add a few songs to set a general mood and Spotify offers up suggestions that match the general vibe. Prior to a recent trip, I created a playlist called “Italy Background Music”, which Spotify duly filled with tracks I wouldn’t have had the first idea about where to find. They were exactly what I was looking for.
Curation and general discovery, it seems, have been broadly solved by Spotify.
I’ve become accustomed to hearing tracks that I’ve never heard before, by artists I wouldn’t have the first idea about. Occasionally, I’ve tapped through on an unknown song and discovered that it has only ever had a couple of hundred thousand plays. Spotify is clearly drawing on the entire breadth of artists within its library to match my musical preferences. Or is it?
Multiple cast-iron sources have informed us that, in recent months, Daniel Ek’s company has been paying producers to create tracks within specific musical guidelines.
By introducing its own music into the (literal) mix, music for which it has paid a producer a flat fee, which it adds to the platform under a false name and surfaces to listeners via its AI-curated playlists, the platform is solving two issues that it considers important. From a user’s perspective, more music that fits their desired soundscape is a good thing. From Spotify’s perspective, the ability to add, say, a 3-minute track of in-house music (on which it pays no royalties) to every hour of listening means that its cost for that hour is reduced by 5%. The losers in this case are the artists, who would otherwise have earned from those three minutes of play.
In the 2016 article above, it is clear that the firm was paying actual producers to make the music. In 2023, the landscape is likely quite different. AI has advanced to the point where the beats, melodies and riffs of jazz, trip hop and other non-vocal music can be quite easily produced by a well-trained model. There are dozens of sites that algorithmically generate lo-fi background music. If Spotify isn’t already adding AI-generated tracks that perfectly match a given vibe, especially within those non-vocal genres, then it is at least experimenting with it. The prize is too large not to. In 2021, the company paid more than $7bn to rights holders. At 5%, that’s a nice $350m to find down the back of the AI sofa.
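The back-of-the-envelope maths here is simple enough to check (all figures are the ones quoted above):

```python
# Three minutes of royalty-free in-house music in every hour of listening
in_house_minutes = 3
minutes_per_hour = 60
royalty_free_share = in_house_minutes / minutes_per_hour  # 0.05, i.e. 5% of the hour

# Spotify paid rights holders more than $7bn in 2021
payout_2021 = 7_000_000_000
saving = payout_2021 * royalty_free_share  # $350m down the back of the AI sofa
```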
Where this leads, in my mind, is somewhere entirely new.
Whilst vocal-less music is the easiest use of the technology, earlier this year we saw a brief explosion of AI-generated tracks from creators using AI voice models that imitated the likes of Drake and Kanye. Whilst these tracks weren’t perfect, they offered an early preview of technology that will change the face of music. The Hugging Face community is full of models which can replicate the sound of a given singer or rapper, and improvements are moving at a rapid clip, with some now indistinguishable from the original artist.
Licensing of brands exists broadly in most other industries. In fashion it saved Ralph Lauren (although nearly killed Burberry). It famously turned Nike from pure sportswear into a casual fashion mega-brand. Could we see the emergence of the artist as a brand? Artists could either directly license their musical likeness to a given platform, or allow producers to use an authorised AI model of their voice to create tracks which they, or their team, would have final sign-off on; either route could extend a vocalist’s reach drastically.
We’ve also seen a rash of artists selling their music rights–will the future see those artists who reach the end of their careers sell their “official” AI model to allow them and their families to earn in perpetuity? It’s been proven repeatedly that those artists who adapt to the changing world are the ones that succeed, but this is something entirely new.
What seems certain however is that the music that we listen to in the coming years will be picked for us by machines and at least partially created using AI.
Large Language Models (LLMs), content creation and filtering signal from noise
The internet is about to get a lot noisier, thanks to the rise of Large Language Models (LLMs) like GPT.
The Creators of The Internet
40 million people sounds like a lot of people. It’s roughly the population of Canada. And in 1995, it was the population of the entire internet. Today, the population of the internet is about 5.2 billion. That is quite a lot more people. Most of those people are consumers. They use the internet without adding anything to it; this is fine, good in fact. They watch Netflix, chat on WhatsApp and scroll on TikTok. They might tweet on Twitter but, since the great Elon-ification of that place, it appears that fewer of them are even doing that. These people used to be called ‘lurkers’ on forums. What they aren’t doing is creating websites or trying to get you to buy something or writing blogs like this one. For our purposes we can say that these people are being very quiet indeed.
But then there is a much smaller group of people who, like me, see the internet as a place where they can make a bit of noise and generally conduct some form of creativity, be it running a business, creating reels for consumers to scroll, or cobbling their thoughts into a 1,000-word article. This is also a good thing - consumers can’t consume if no one is creating stuff for them to consume.
Signal vs Noise
Most of the stuff being created is pretty average; it’s just background noise, a light hum. A few people might consume it, but it receives 27 likes on Instagram and isn’t really getting in anyone’s way. It’s the background music that plays in shops and hotel lobbies to fill the uncomfortable silence. Always there, quite pleasant, and you’d probably miss it if it wasn’t.
Then we have the good stuff - the pieces that go viral: the think piece that nails a topic so absolutely, the TikTok that gets a bajillion views, the hot take that turns out to be absolute fire, the Stack Overflow answer that comes like an angel in the night to salve your programming pain. This is the sweet melody of signal. We like signal, and we spend most of our consuming time wading through the mire of noise looking for it. Sure, the background noise is a bit irritating, but it’s relatively easy to find the good stuff once you know where and how to look for it. Google is pretty good at finding what we want. We’ve built aggregators like Reddit or Hacker News, where people can come and say “Hey everyone! Look, I found some gold!” and then other people can upvote it if it’s actually gold, or downvote it to oblivion if it’s not.
All of this seems sort of fine. Every year more noise is created, but so too is signal. The saving grace was that an individual creator could only create so much noise. Automated content was pretty obviously automated content, and even if Google didn’t manage to filter it out for you first, once you started reading it, it was pretty clear that no human had set eyes on it during the creation process. We quickly became attuned to that particular lack of harmony and clicked the back button, which also signalled to Google that the content wasn’t actually helpful, meaning the next person searching for that particular query would be less likely to end up seeing that page.
The problem we’re facing now, though, is that creators have been given a whole new kind of instrument to play. It’s not that this instrument is louder or more dominant than the others; it’s that it can create an almost infinite number of songs with barely any human input at all. And they all sound pretty great. LLMs are really good at creating noise. It’s not just that they can create an ungodly amount of content (basically for free); differentiating that content from human-generated content is, by design, almost impossible. Where a content creator could once have put out a few decent-quality articles a day, they can now put out thousands. The ratio of noise to signal is about to shift dramatically.
One of the most exciting elements of working within the tech industry is that there is always a lot of ✨new✨.
New ideas, new products, new words, new technologies. New things are good. New things are shiny. But which new things are going to change the world? Which technologies are going to fundamentally change how we interact with the world? How do we differentiate between hype and practical utility? And what does this all mean for the vast majority of the world?
As a non-technical person, and even as a technical person, it is increasingly difficult to differentiate between the simply new and shiny, and the new, shiny and potentially world-changing. I’m looking forward to exploring how new technologies could change the world, and what it means for our next rotation.
Over the past few years I’ve been paying particular attention to:
Open Source Software and its wider adoption and integration
The evolution of software development and the speed with which the hard becomes easy
The deep integration of technology into every facet of our lives
I’m committing to one article a week exploring these topics and more. Follow along here or at @technicalchops on Twitter.