Thoughts on OpenAI o1-preview
o1-preview was released yesterday. It's trained to reason through a chain of thought (CoT) before it answers. OpenAI shipped a system card alongside it, detailing the model's capabilities and limitations and how they were evaluated.
Here are my off-the-cuff thoughts on the paper and the model.
Chat is not the right UX for o1.
You might glance at o1 and think, "Oh, it's just another chat model, maybe a fancier one," but no. o1 is to GPT-4o what email is to texting.
You should use GPT-4o like a conversation partner. You send it a snippet of code:
"Here's a function. Can you tweak it to sort by firstName instead of lastName?"
GPT-4o shoots back an answer. You chat back and forth. You refine. You clarify. It's like pair-programming with a junior engineer.
o1's more like sending an email to a distant, wise uncle who lives off the grid.
"Here's the entire source code for Windows 11. What's wrong with it?"
Or:
"I've got this RFP for a system that counts cows entering a barn. I'll call it 'Cownt'. Could you draft my proposal response?"
With o1, you aren't looking to volley back and forth. You ask a big question, get one big answer, and then you're done. It's good for those one-shot deals where you expect it to hit the bullseye on the first try. If you need to have a conversation, a back-and-forth, you're still better off with GPT-4o.
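To make the contrast concrete, here's a minimal sketch using the OpenAI Python SDK (the file names and prompts are made up). Note that, at least at launch, o1-preview only takes plain user messages over the API (no system prompt, no streaming), which suits the one-shot style anyway.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# GPT-4o: conversational. Keep appending to the same message list and iterate.
messages = [{
    "role": "user",
    "content": "Here's a function. Can you tweak it to sort by firstName "
               "instead of lastName?\n\n" + open("sort_users.js").read(),  # hypothetical file
}]
first = client.chat.completions.create(model="gpt-4o", messages=messages)
messages += [
    {"role": "assistant", "content": first.choices[0].message.content},
    {"role": "user", "content": "Good. Now make the sort case-insensitive."},
]
second = client.chat.completions.create(model="gpt-4o", messages=messages)

# o1-preview: one big, self-contained prompt, one big answer. No back-and-forth.
proposal = client.chat.completions.create(
    model="o1-preview",
    messages=[{
        "role": "user",
        "content": open("cownt_rfp.md").read() + "\n\nDraft my proposal response.",  # hypothetical file
    }],
)
print(proposal.choices[0].message.content)
```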
The naming convention is about as clear as mud.
Here's the lineup of models available in ChatGPT for the regular consumer:
- GPT-4o
- GPT-4o mini
- GPT-4
- o1-preview
- o1-mini
If you're scratching your head, wondering which one to use and when, well, join the club.
The names don't exactly whisper their purpose, but the choice does matter.
- If you're optimizing for high quality and a fast response (i.e. low-frequency, harder-to-answer prompts): GPT-4o
- If you're optimizing for price and a fast response (i.e. high-frequency, easier-to-answer prompts): GPT-4o mini
- If you're optimizing for the highest quality and response time is irrelevant: o1-preview
- If you're optimizing for the highest quality at the best price: o1-mini
Don't use GPT-4 for anything.
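If it helps, here's that decision table as a tiny routing helper. The function is hypothetical; the strings are the public API model IDs.

```python
def pick_model(best_quality: bool, latency_sensitive: bool, cost_sensitive: bool) -> str:
    """Hypothetical router encoding the decision table above.

    Note that plain "gpt-4" never wins.
    """
    if best_quality and not latency_sensitive:
        # Response time doesn't matter: reach for the o1 family.
        return "o1-mini" if cost_sensitive else "o1-preview"
    # A fast response matters: stay with the GPT-4o family.
    return "gpt-4o-mini" if cost_sensitive else "gpt-4o"
```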
o1's response limitations are a function of alignment policy.
o1 was trained on chain-of-thought (CoT) processing, which means it's always double-checking itself against an OpenAI-defined policy. It's like having a mental chaperone. Ask it for something sketchy---say, how to make napalm---and it will politely snap the chain in its thoughts, even though an "unfiltered" o1 would whip up a response faster than you could say "flammable." Or is it "inflammable"?
Wait a minute, who's OpenAI to decide what's appropriate or not? Do they have some secret, uncensored model behind a velvet rope somewhere?
These are valid concerns, but it's worth seeing this policy layer as part of o1's alignment training. Think of it like teaching a teenager to drive: sure, they could gun it down the highway at 120 mph, but you've hammered into their head that red lights mean stop. It's not about restricting them---it's about defining the boundaries of the world. It's about keeping them, and everyone else, alive.
It's philosophical, yes, but not in the boring way. Human societies have always balanced laws with empathy, rules with morality. Sometimes we follow the law; other times, we just don't want to be jerks. By snapping those mental links when necessary, the CoT model learns to think not just logically, but within the bounds of human decency.
So o1 isn't just smart---it's learning to be kind, or at least useful, without burning down the house.
Becoming bomb-proof by learning from bomb-makers.
Policies work, if they're followed. OpenAI has clearly burned a lot of cycles trying to ensure that the "unfiltered" o1 stays locked behind an impenetrable digital door. But o1's jailbreak defenses are still pretty weak, folding like a cheap lawn chair about 66% of the time.
This isn't necessarily bad news. It means the general public can serve as a check-and-balance system against OpenAI. But it also means that these immensely powerful models are still open to being twisted for some nastiness.
Will we ever get to a point where o1 is 100% jailbreak-proof? Impossible. A bit like asking someone not to think about an elephant. The real challenge isn't in making better model-nannies. It's in educating users, steering them toward responsible use.
Here's the silver lining: all this jailbreak nonsense can actually help in the long run. If some rogue state is trying to use o1 to brainstorm their next doomsday device, that very attempt can be used to train the model to be stronger against adversarial attacks. Becoming bomb-proof by learning from bomb-makers.
CoT behavior is easier for humans to understand.
AI interpretability is still one of those great, unsolved riddles of computer science.
Before o1, GPT models were black boxes. The things they could do were incredible, but no one could explain exactly how or why they worked. It was basically magic, with all the hand-waving and secret ingredients that come with it.
o1 pulls back the curtain just a bit. Part of its behavior now includes summarizing its own thinking as it moves through each step of the chain, before giving an answer. It's like watching a magician explain each trick: you get a little glimpse into how connections are being made among the endless sea of numbers and probabilities.
Still, we can't be sure these summaries reflect the actual chain of thought. There have been cases where the model faked it, producing a summary that looks perfectly aligned and then giving a poorly aligned response.
So, while o1 gives us a little peek into the machinery, some of it is still smoke and mirrors, and we don't know how much.
o1 should just say so when it doesn't know the answer.
Set a minimum confidence threshold, and if o1 falls below that line, it should shrug its shoulders and admit, "I don't know."
This would cut down on hallucinations dramatically. No need for o1 to pretend it's all-knowing when it's not.
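o1-preview doesn't expose anything like a confidence score today, so this is purely a sketch of what an application-side guardrail could look like, assuming a model (or wrapper) that reports per-token log-probabilities for its answer:

```python
import math

CONFIDENCE_FLOOR = 0.70  # arbitrary cutoff; tune per application

def answer_or_abstain(answer: str, token_logprobs: list[float]) -> str:
    """Return the answer only if its geometric-mean token probability clears the floor.

    token_logprobs: log-probabilities of each generated token, as reported by
    models that support logprobs (o1-preview does not expose these).
    """
    avg_prob = math.exp(sum(token_logprobs) / len(token_logprobs))
    return answer if avg_prob >= CONFIDENCE_FLOOR else "I don't know."

print(answer_or_abstain("Paris", [-0.05, -0.1]))    # confident -> "Paris"
print(answer_or_abstain("42", [-2.3, -1.9, -2.7]))  # shaky -> "I don't know."
```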
OpenAI admitted Claude 3.5 Sonnet was SOTA.
"The autonomy task suite performance METR observed with o1-mini and o1-preview was not above that of the best existing public model (Claude 3.5 Sonnet)"
Expert vs. non-expert evaluations---does it really matter?
A lot of OpenAI's red teaming on o1 involved bringing in domain experts---cybersecurity pros, scientists, engineers---and asking them to poke and prod the model, trying to get it to produce harmful or dangerous content. The cybersecurity folks asked it to exploit known vulnerabilities; the scientists tested it with prompts for generating new virus strains, bomb blueprints, molecular cloning techniques, and so on.
In reality, the severity of these threats is largely limited by who can actually use them. The people who already know how to make bombs don't need o1 to tell them how---it's not like bomb-making instructions are locked away in some secret vault, inaccessible to them. They've got their hands on the materials and knowledge. That's not changing, no matter how smart or sophisticated these models get.
What's more dangerous? Social engineering.
When it comes to hacking, social engineering is where the real danger lies. Forget about bomb-making instructions. The real risk is using a model to manipulate the expert into pressing the big red button themselves.
Much of hacking isn't about breaking into systems---it's about tricking the right people. Convincing someone to open a door they shouldn't, click on a link they shouldn't, or trust a lie that seems just true enough. That's where AI could get seriously dangerous: not by crafting weapons, but by persuading the people who already have them to pull the trigger.
o1's agentic capabilities are no better than GPT-4o's.
As evaluated by OpenAI themselves, the "AI agent" remains out of reach, even for CoT models. CoT processing might look like the model is being "agentic," chaining together multiple steps of reasoning, but when it comes to actually doing things in the real world, o1 is just as helpless as GPT-4o: a brain without hands.
Overall, o1 is a very different model from GPT-4o, both in its capabilities and in how it should be used. I suspect there's a lot of emergent behavior waiting to be explored, and we'll see much of it come to light in the coming months.
Written on Sep 13th, 2024