Four Background Claims: a good high-level overview, though maybe too concise to be useful as a first exposure?
Robert Miles: really accessible explanations of various concepts in AI safety (if you want, you can speed up these videos, no problemo)
The summary section of the Big Fat Open Phil Report on the “AI scheming” threat model.
Anthropic’s RSP: this is the type specimen for the currently dominant class of AI safety approaches: build evals to detect dangerous capabilities, scale safety measures to match, and use “messily” aligned weak AGI to solve “clean” alignment.
The current alignment plan and how we might improve it: good for orienting to the mindset and open problems in the RSP class of strategies, plus a version of the case for the AI control agenda. This other talk is similar and also good.
Situational Awareness: good for getting a high-level strategic picture of how global AI development might unfold. I don't particularly agree with a lot of Leopold’s takes, but I think this kind of broad scenario planning is an important thing to be tracking.
I want to include some sort of resource about other aspects of the policy outlook: compute governance, the AISIs, etc., but I don't really know of good write-ups. The broad problem is important: who are the actors that need to implement AI safety plans, and how can we get them to do that?
Eliezer’s List O’ Doom: mixed feelings about this resource. It covers a lot of important ideas very concisely. But Eliezer's perspective is pretty different from the influence-weighted average perspective on AI safety, and this document focuses on his cruxes for why alignment is hard, not the cruxes between his perspective and the “mainstream”. Maybe see also:
My Summary of Krakovna/Kumar’s Summary of Yudkowsky’s 38 Reasons Why We’re Fucked