It can be hard to know just how prescient a matter is in the public consciousness from your own standpoint. In my day-to-day, AI and AI Safety seem to be significant matters that many folks are actively concerned with or engaged in. A teeming field of research and activity, attracting professionals and thinkers from a vast spectrum of backgrounds, where the ropes securing humanity’s future are collectively hoisted.
And then I’ll run into someone who is hearing the term “AI Safety” for the first time, sweat dripping from their brow onto their bare knee as they stare at me from across the gym sauna. People can’t help but become philosophical in these hotboxes, a great way to sample a population and go beyond small-talk. It’s a good reminder of the general awareness of this field, and the lack thereof specifically.
It may seem benign at the face of it. After all, those developing the tech are acutely aware, and are (largely) invested in trying to develop AI safely. However, the lack of awareness/understanding of the need for AI Safety practices elsewhere feeds the furnace of the AI arms race. The general public will be wanting hastened development and releases of ever more dazzling models, and to maintain a competitive edge internationally, etc. Okay, that’s perhaps a little exaggerated as most laymen understand, or are at least spooked by, the potential (and realized) dangers as AI is increasingly vehemently integrated into many facets of our lives in innumerable ways. Just this week we saw OpenAI ban an account for creating a Dean Phillips bot as a political aid1.
The game really has changed. I’m a healthily skeptical person, but over the course of 2023, I steadily migrated to accepting this sea change. Among other things, the use of GPT-4 as a learning aid was profound. So, these are exciting and radical times as we stampede into this terra nova. The proliferation of this technology is, and will be, immense, and with OpenAI kicking open the doors on API plugging-in2, the clock has started.
What clock exactly? ~The Clock of Catastrophe~. Okay, that’s ridiculous…to a point. Look, I’ll put my cards on the table. I do believe that X-risk scenarios of perverse instantiations and malignant failure modes etc. have real merit, at the least insofar as they are worth taking seriously. I think I would rate their likelihoods somewhat conservatively at present3, but their consequences are so massive that marginal likelihoods are enough. Regardless, the investigation of these ideas and their kin enhance the AIs developed, giving both producers and consumers more peace of mind.
Returning to the clock and sense of urgency then, what keeps me up at night is not so much rogue superintelligences, but 1) regular inadvertent risks like economic hardships or mortalities, and, most of all, 2) malicious actors. Towards the latter, I’m of the mind that there are a great number of misanthropists who would relish the challenge of seeing what pain and havoc they could produce with AI tools. METR has already shown GPT-4 and Claude display early signs of competency in executing certain steps in service of Autonomous Replication. In my own investigating and exposure, it is evident that both the capabilities and the compliance of LLMs in performing nefarious tasks is close enough that with humans-in-the-loop bad actors could get pretty far. For capabilities, it may not carry out the entire scope of your plan, but it could certainly help get you there. In terms of coercion, prompt injections and other coercive framings have so far been very potent tools. And don’t get me started on how the AI arms race is accelerating capabilities…
So here we are. In the foyer of quite a curious mansion. On our journey up, we have told stories and imaginings of what beauties and misfortunes are down the halls and behind the doors within. A terribly exciting place to be. Some paths have feasts and comforts unrivalled, but some paths have lions. We worked hard to get the keys to this place, let’s not let the bratty children run off with them, and for god’s sake knock before opening.
N.B. The app was upfront about it being a bot impersonator to users, but preventing political ghost writers or full-scale ghost politicians could prove a very difficult task.
A choice that was surely amazing for their legal department’s pockets.
Here I’m specifically thinking of superintelligences accidentally doing the wrong thing to catastrophic effect. Though I am tempted, I will relegate digging into likelihoods of the suite of X-risk scenarios for another time.