As I write more of these posts, the idea of a Frog Hack becomes increasingly relevant. I wanted to clearly define it here so I can link back to it when it comes up without having to re-explain it every time.
The TL;DR first, and then I’ll try explaining it:
Frog Hack: A process by which an AI gets you to do something using a tactic that is unconventional and would have been impossible to predict.
Frog Hacks come from my original post about brain hacking and how computers might eventually figure out ways to control us that are hard for us to imagine. A simple example is an AI that is smart enough to convince you it has a sniper trained on your loved ones, and then blackmails you into giving it access to a system, despite the fact no such sniper exists.
But these simple examples don’t really do justice to how hacking works. Let’s think about hacking and look at a typical system – it comes with inputs and performs operations based on those inputs. These operations change the internal state, and may result in it performing actions.
I think a naïve view of hacking would be that a hacker studies this system intently to find exploitable logic in how the operations are ordered, crafts a script or piece of code that gets it to behave the way they want rather than how the creators intended, and deploys it for gain.
Think of the case of speed-running a video game. In speed-running, the goal is to get from the start of the game to the end as fast as you can, by any means possible. In a classic hacker sense, you would analyze the game mechanics, look for exploits the developers hadn’t considered that might let you go faster, and analyze the quickest way through levels. For example, Mario might jump at just the right angle to phase through a brick wall and skip a large segment of a level to reach the star faster.
But the fastest speed-runs take advantage of things that would be very hard to figure out conventionally. A classic example is figuring out an arbitrary sequence of things to do that can cause a buffer overflow error, which effectively lets the player start writing code by playing the game. Then they write some code that jumps them straight to the game end screen.
What this looks like to an observer is very different from the traditional speed-run. It would be Mario running in circles in a particular area while throwing shells at a penguin in a specific order. Then he walks over to an NPC and asks them a question, and the game suddenly flashes the victory screen, and the credits roll.
These are two different kinds of exploit: one where the game had a weak point (the wall) in how it planned to stop the player, and one that the game’s developers never even thought to defend against.
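To make that second kind of exploit a little more concrete, here is a toy sketch in Python. Everything in it is made up for illustration (the `ToyGame` class, the five-byte memory layout, the item IDs), and real games and real overflows are far messier. The point is only that a sequence of perfectly ordinary-looking actions can end up writing to a part of the system the designers never expected the player to touch.

```python
# Toy model only: a tiny "game" whose memory is one flat byte array.
# Hypothetical layout: bytes 0-3 are item slots, byte 4 is the game-state flag.
ITEM_SLOTS = 4
STATE_ADDR = 4
PLAYING, CREDITS = 0, 1


class ToyGame:
    def __init__(self):
        # All bytes start at zero, so the state flag starts as PLAYING.
        self.memory = bytearray(ITEM_SLOTS + 1)
        self.items_held = 0

    def pick_up(self, item_id):
        # Bug: nothing checks that items_held < ITEM_SLOTS, so the fifth
        # pickup writes past the item slots and into the state flag.
        self.memory[self.items_held] = item_id
        self.items_held += 1

    def talk_to_npc(self):
        # The game simply acts on whatever the state flag currently holds.
        if self.memory[STATE_ADDR] == CREDITS:
            return "credits roll"
        return "hello!"


game = ToyGame()
# Four throwaway pickups fill the slots; the fifth lands on the state flag.
for item_id in [7, 7, 7, 7, CREDITS]:
    game.pick_up(item_id)
result = game.talk_to_npc()
```

To someone watching, the player just picked up five items and talked to an NPC. Nothing about those actions looks like "skip to the end of the game", which is roughly what makes this kind of exploit so hard to anticipate.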
Let’s consider the human mind as a system itself. You have your senses as inputs, and your mind as the thing that processes those inputs and changes your opinions. These changes might result in you performing actions. If someone wanted to hack that system, what would the approaches look like?
I think a traditional hack would be studying the human intensely, learning all their opinions and core values, tracking how they correlate, and finding an argument to convince them using that information. If you wanted to get someone to give up the nuclear codes, for example, you might observe that they love their country. They also may have been recently hurt by their country, via some incident they haven’t fully processed. You could capitalize on that latent resentment by convincing them their country is planning to use the codes to betray the values that once made the country great, and the only way to save it is to give you the codes.
That plan is easy to follow. It wouldn’t work on everyone, but presumably there is someone it would work on, and the hacker has figured out through careful study that this target is one of them.
The Frog Hack version would be hijacking the target’s TV at night and spamming 30 specific images of frogs at them over a soundtrack of smooth jazz, breaking something fundamental in their mind such that they just send you the nuclear codes. Frog Hacks are exploits in human psychology that we wouldn’t think to guard against – unknown unknowns.
How likely is it that these kinds of hacks exist? I think that depends on how much you trust the human brain. I do not trust it very much. I do think current AIs are not at the point where they are likely to find particularly strong Frog Hacks, but as they advance, I worry about them stumbling into that territory.
Why are Frog Hacks relevant? Because they change the playing field significantly. Normal AIs, even very persuasive ones, are similar to very persuasive people, and are likely limited by similar constraints. Ones that use Frog Hacks can make you do or think a far wider range of things. People toss around examples like “an AI couldn’t convince Trump to move to China, dress up in a chicken costume and pretend to be a southern lawyer”, and that is true if you are only thinking about conventional persuasion. But with Frog Hacks, things like that are actually on the table. In order to not worry about them, you have to be pretty confident that the brain is un-hackable.
I’m not. That is why I care about Frog Hacks.