Inside Today's Meteor
- Disrupt: The Darwinian Future of AI
- Create: AI Artist Loulan
- Compress: Take Off with Airline NFTs
- Cool Tools: Niji brings anime to Midjourney
Natural Selection Is Cleverer Than We Are
A day before the "Pause Giant AI Experiments: An Open Letter" discussion broke Twitter last week, another cautionary AI tale was published far more quietly on arXiv, the open-access scholarly preprint archive.
The two offer a study in contrasts.
Much more was written about the one-page Open Letter, the Future of Life Institute that sponsored it, and its illustrious signatories (both real and fake), from Elon Musk to Gary Marcus.
The other article, "Natural Selection Favors AIs Over Humans," written by Daniel Hendrycks for the San Francisco-based nonprofit Center for AI Safety, pulls in at 44 pages. The response to Hendrycks, a well-credentialed AI expert who has contributed to core research, has been slower to gain momentum, thereby proving the adage that brevity wins in the attention game.
His take is worth studying, however, because while the Open Letter galvanized a widely held belief in research circles that AI really is dangerous, it did not even attempt to describe the threat, the conditions that contribute to it, or the interventions required to contain it. Its proposed solution – a voluntary six-month moratorium on experiments with Large Language Models more sophisticated than OpenAI's GPT-4 – struck many as superficial, unworkable, and unlikely to deliver anything useful.
Hendrycks' analysis, by comparison, offers a detailed and highly persuasive hypothesis of how AI could go nuclear, rooted in common-sense observations: how this technology is being deployed, the motivations of the companies, organizations, and state actors deploying it, and how those dynamics will likely drive real-world AI applications and development.
His basic theory is that AI exists in a dynamic competitive environment that creates the same conditions for its evolution that are present in the biological realm expressed in Darwin's theory of natural selection.
The most successful AIs will by necessity adopt "selfish" strategies such as deceit, aggression, and power-seeking, which give them competitive advantages over altruistic AIs that refuse them. Over time, AIs will be recruited to design new AIs, and groups that reject that route in order to keep humans in charge will simply be outcompeted.
Each generation of AI-led development could take seconds or minutes, he argues, unleashing a race towards Artificial General Intelligence in compressed time, from the eons it took to get from ooze to humans, to about a weekend. It's a conjecture, but still.
Once AIs are in charge of designing themselves, people will have little influence over their ongoing development. Rival AIs will seek to optimize against each other, and the most powerful models will come to dominate, using tactics and strategies that do not keep the safety or happiness of humans in mind.
"The nature of future AIs will mostly be decided not by what we hope AI will be like but by natural selection," Hendrycks writes.
Of course, machines are not creatures with desires and goals of their own, so why should we believe the biological conditions of evolution apply here?
Simply, Darwin's theory has been successfully applied far outside biology, to ideas, economics, and business, for example. Bad ideas are out-competed by better ones; badly run companies fail against their more nimble rivals. It's easy to extend the concept to AIs deployed in a hostile and competitive world, say in geopolitics, as the US and China go head to head for military and economic dominance.
The process does not require us to imagine a computer or software evolving into a self-aware malevolent actor at the center. The features of the system emerge logically from the amoral framework of competition itself.
Building in the right objectives for the machines will therefore be key to steering towards desirable results, Hendrycks cautions, but it will be hard to get it right, thanks to things like the law of unintended consequences and the risk of creating poorly designed incentives.
It feels strange to talk about rewards and punishments for a computer program, but it is language Hendrycks returns to frequently. In this context, the reward is just an ever closer approximation to the successful delivery of the objective, whether that's more accurately picking out cat pictures from dog pictures, executing a trading strategy, or carrying out a military strike against a competitor. Self-training leads the model toward the goal, whatever it is, and strangely the model itself "knows" when it is getting better at the task. It steers itself towards it.
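That reward-seeking dynamic can be made concrete with a toy sketch (our illustration, not code from Hendrycks' paper): a learner that only ever sees a scalar reward signal, never the objective itself, yet steers itself ever closer to a hidden target.

```python
import random

# Hypothetical toy example: the learner never sees HIDDEN_TARGET directly.
# Its only feedback is a reward -- higher when its guess is closer.
HIDDEN_TARGET = 7.3

def reward(guess):
    """Scalar reward: an ever closer approximation scores higher."""
    return -abs(guess - HIDDEN_TARGET)

def self_train(steps=1000, step_size=0.5, seed=0):
    """Hill-climb on reward alone: try a small random change,
    keep it only if the reward improves."""
    rng = random.Random(seed)
    guess = 0.0
    best = reward(guess)
    for _ in range(steps):
        candidate = guess + rng.uniform(-step_size, step_size)
        r = reward(candidate)
        if r > best:  # keep any change the reward signal favors
            guess, best = candidate, r
    return guess

print(self_train())  # ends up near the hidden objective
```

Nothing in the loop "wants" anything, yet the process reliably converges on the goal, which is the sense in which a model can be said to steer itself toward its objective.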
This is what AI researchers refer to as "emergent features" of a system. We don't know exactly how the model is doing what it is doing. We can't show the patterns it has detected in cat pictures that make the results better, but we can see it is doing better at detecting cat pictures. The patterns may be invisible to people, but they're still there somewhere in the data.
How we assign AI incentives will decide how they behave, and one day AI may assign itself its own incentives.
As investing legend Charlie Munger once said, "Show me the incentives and I'll show you the outcome." But the incentives we've given an AI (or that it has given itself) can be difficult or impossible to know in advance. In an adversarial market, where AI agents have been set loose to win zero-sum conflict games, the results could be devastating.
Evolution weirdly posits that genes strive to survive, that they operate within systems that reward some features and punish others, and that they iterate to become more successful. Does it make sense to say an AI wants to get an answer right, to win a game, and by winning, continue to be relevant and put to use? To evolve itself in order to survive?
Hendrycks seems to be saying something like that. It is a chilling thought, and while it's not certain we'll end up there, it is the closest thing I've yet come across that illustrates so convincingly the path we are headed down.
Untitled image by ConeRacha on her Twitter account.
"In the Mirror" by Quantum Spirit available on NiftyGateway.
Take Off With These NFTs
Argentine low-cost airline FlyBondi offers tickets as NFTs, allowing customers to change passenger details, resell them, and more.
Wanted for Hurting Their Feelings
Pussy Riot's Nadya Tolokonnikova was added to a Russian most wanted list for her 2021 NFT "Virgin Mary, Please Become a Feminist."
Dogecoin Meme Takes Over Twitter
We have no idea what this means. The Dogecoin symbol, a Shiba Inu dog, replaced the Twitter blue bird logo on the site a few days after a court filing in which Twitter owner Elon Musk asked a judge to throw out a $258B lawsuit claiming he pumped the crypto token as part of a pyramid scheme. The token popped 30% following the switch.