What Trait Vectors Reveal (and Don't) About Emergent Misalignment

The question: Betley et al. (2025) showed that fine-tuning on insecure code makes models misaligned across unrelated tasks (advocating deception, expressing contempt for humans), but their analysis was purely behavioral. You could see that something broke, but not what changed inside the model. Lu et al. (2026) built a tool for measuring a model’s “persona” from its internal activations: 240 trait vectors (such as skeptical, cautious, deceptive) and an Assistant Axis capturing how “assistant-like” the model is being....

April 7, 2026

My 2026 AI Forecast

Here are my predictions for where AI will be by December 31st, 2026. Take the survey yourself at forecast2026.ai.

January 25, 2026

AGI is a Spiky Ball

AI has been superhuman at narrow things for decades. In 1994, the checkers program Chinook became the first program to win a human world championship. Chess fell in 1997. Go in 2016. Poker in 2017. Each time, we moved the goalposts and said “but that’s not real intelligence.” The thing is, capabilities don’t arrive uniformly. They poke through human-level performance one domain at a time. If you imagine a radar chart with hundreds of axes, one for each cognitive task, AI has always been this spiky shape....

January 18, 2026

Intelligence is coming

Intelligence is coming!

October 8, 2024