The wave of enthusiasm around generative networks feels like another ImageNet moment - a step change in what 'AI' can do that could generalise far beyond the cool demos.
What can it create, and where are the humans in the loop?
A decade or so ago, systems based on something called 'machine learning' started producing really good results in ImageNet, a contest for computer vision researchers.
Those researchers were excited, and then everyone else in AI was excited, and then, very quickly, so was everyone else in tech, as it became clear that this was a step change in what we could do with software that would generalise massively beyond cool demos that recognised cat pictures.
We might be going through a similar moment now around generative networks. 2m people have signed up to use ChatGPT, and a lot of people in tech are excited - somehow even more excited than they were about using the same tech to make images a few weeks ago.
How does this generalise? What kinds of things might turn into a generative ML problem? What does it mean for search (and why didn’t Google ship this)?
Can it write code? Copy? Journalism? Analysis? And yet, it's very easy to break it - to get it to say stuff that's clearly wrong.
The wave of enthusiasm around chat bots largely fizzled out as people realised their limitations, with Amazon slashing the Alexa team last month. How should we think about this, and what is it actually doing?
The conceptual shift of machine learning, it seems to me, was to take a group of problems that are ‘easy for people to do, but hard for people to describe’ and turn them from logic problems into statistics problems.
Instead of trying to write a series of logical tests to tell a photo of a cat from a photo of a dog, which sounded easy but never really worked, we give the computer a million samples of each and let it do the work to infer patterns in each set.
Instead of people trying to write rules for the machine to apply to data, we give the data and the answers to the machine and it calculates the rules.
This works tremendously well, and generalises far beyond images, but comes with the inherent limitation that such systems have no structural understanding of the question - they don’t necessarily have any concept of eyes or legs, let alone ‘cats’.
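As a loose illustration of that inversion, here's a toy sketch (the 'features' and numbers are invented purely for the example - real vision systems learn from raw pixels, not hand-picked attributes):

```python
# Toy sketch of 'give the machine the data and the answers, and it
# calculates the rules'. The features (ear pointiness, snout length)
# and all the numbers are made up for illustration.
from sklearn.tree import DecisionTreeClassifier

# Each sample: [ear_pointiness, snout_length] - hypothetical measurements.
samples = [
    [0.9, 0.2], [0.8, 0.3], [0.95, 0.25],   # labelled 'cat'
    [0.3, 0.8], [0.2, 0.9], [0.4, 0.7],     # labelled 'dog'
]
answers = ["cat", "cat", "cat", "dog", "dog", "dog"]

# Nobody writes the if/else tests; the model infers them from the data.
model = DecisionTreeClassifier().fit(samples, answers)
print(model.predict([[0.85, 0.3]]))  # ['cat'] - a rule we never wrote
```

The point is the inversion: the logic lives in the fitted model, not in code anyone wrote down.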
To simplify hugely, generative networks run this in reverse - once you’ve identified a pattern, you can make something new that seems to fit that pattern.
So you can make more pictures of 'cats' or 'dogs', and you can also combine them - 'cats in space suits' or 'a country song about venture capitalists rejecting founders'.
To begin with, the results tended to be pretty garbled, but as the models have got better the outputs can be very convincing.
However, they’re still not really working from a canonical concept of ‘dog’ or ‘contract law’ as we do (or at least, as we think we do) - they’re matching or recreating or remixing a pattern that looks like that concept.
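A crude way to see 'matching a pattern without any concept' in code (a toy bigram sampler - nothing like a real transformer, and the corpus is invented for the example):

```python
# Toy 'pattern generation': learn which word tends to follow which,
# then sample new text that fits that pattern. There is no concept of
# 'cat' or 'mat' anywhere - only statistics over adjacent words.
import random
from collections import defaultdict

corpus = "the cat sat on the mat the dog sat on the rug".split()

# Learn the pattern: for each word, which words follow it, and how often?
following = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev].append(nxt)

# Run it 'in reverse': generate new text that matches the learned pattern.
word, output = "the", ["the"]
for _ in range(8):
    options = following.get(word)
    if not options:
        break
    word = random.choice(options)
    output.append(word)

print(" ".join(output))  # e.g. 'the cat sat on the rug' - plausible, not 'true'
```

The sampler produces text that fits the shape of the corpus while 'knowing' nothing at all - the same gap, at a vastly smaller scale.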
I think this is why, when I ask ChatGPT to ‘write a bio of Benedict Evans’, it says I work at Andreessen Horowitz (I left), worked at Bain (no), founded a company (no), and have written some books (no).
Lots of people have posted similar examples of 'false facts' asserted by ChatGPT. It often looks like an undergraduate confidently answering a question without having attended any of the lectures.
It looks like a confident bullshitter that can write very convincing nonsense.
OpenAI calls this ‘hallucinating’.
But what exactly does this mean?
Looking at that bio again, it’s an extremely accurate depiction of the kind of thing that bios of people like me tend to say. It’s matching a pattern very well.
Is that false? It depends on the question.
These are probabilistic models, but we perceive the accuracy of probabilistic answers differently depending on the domain.
To put this another way, some kinds of request don’t really have wrong answers, some can be roughly right, and some can only be precisely right or wrong, and cannot be ‘98% correct’.
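To make that concrete (a contrived sketch: the 'model' here is just a probability table I've made up, standing in for a real one):

```python
import random

# Made-up distributions over completions, standing in for a real model.
# Both 'requests' use exactly the same mechanism: sample an answer in
# proportion to how likely it looks.
poem_lines = {"The moon hangs low": 0.4, "Rain taps the glass": 0.35, "A cat walks past": 0.25}
two_plus_two = {"4": 0.9, "5": 0.07, "3": 0.03}

def sample(dist):
    return random.choices(list(dist), weights=list(dist.values()))[0]

print(sample(poem_lines))    # any draw is an acceptable poem line
print(sample(two_plus_two))  # one draw in ten is wrong, and '90% likely'
                             # is not the same thing as '90% correct'
```

The mechanism is identical in both cases; what changes is how much wrongness the request can absorb.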
So, the basic use-case question for machine learning as it generalised was “what can we turn into image recognition?” or “what can we turn into pattern recognition?”
The equivalent question for generative ML might be “what can we turn into pattern generation?” and “what use cases have what kinds of tolerance for the error range or artefacts that come with this?”
This might be a useful way to think about what this means for Google - what kind of questions are you asking?
How many Google queries are searches for something specific, and how many are actually requests for an answer that could be generated dynamically, and with what kinds of precision?
If you ask a librarian a question, do you ask them where the atlas is or ask them to tell you the longest river in South America?
But more generally, the breakthroughs in ML a decade ago came with cool demos of image recognition, but image recognition per se wasn't the point - every big company has now deployed ML for all sorts of things that look nothing like those demos.
The same applies today - what are the use cases where pattern generation at a given degree of accuracy is useful, or that could be turned into pattern generation, that look nothing like the demos?
What’s the right level of abstraction to consider this?
Qatalog, a no-code collaboration tool, is now using generative ML to make new apps - instead of making a hundred templates and asking the user to pick, the user types in what they want and the system generates it (my friends at Mosaic Ventures are investors).
This doesn't look like the viral generative ML demos, and indeed it doesn't look like ML at all, but then most ML products today don't 'look' like ML - that's just how they work. So, what are the use cases that aren't about making pictures or text at all?
There’s a second set of questions, though: how much can this create, as opposed to, well, remix?