Key Takeaways
- One-shot thesis (A1)
- No unilateral avoidance (A2, A3, A16)
- No safe pivotal acts (A4, A13, A16, A20)
- Inner alignment hard (A5, A7, A8, A9, A17, A25)
- Interpretability hard (A5, A9, A15, A16, A17, A18, A28)
- New alignment failure modes emerge with increasing intelligence (A6, A26, A32, A33)
- Outer alignment hard (A10, A14, A19, A27)
- Generalizing capabilities is easier than generalizing alignment (A11, A12, A21)
- Superhuman AGI is scary (A22, A23, A24, A25)
- Multipolar schemes are dangerous (A30, A31)
- Civilizational inadequacy (A35, A36, A37, A38)
- Misc: inner misalignment of language models specifically (A29)
Point-By-Point Summaries
- A1: we only get one chance to align an agent with the ability to kill us; if we fail, we will be killed and will not be able to try again (the One-Shot Thesis).
- A2: someone will build AGI eventually; we (people concerned about alignment) can't unilaterally avoid the problem by not building it ourselves.
- A3: someone will build AGI strong enough to be dangerous; we can't unilaterally stick to safe systems.
- A4: no pivotal act (one that prevents dangerous agents from coming online until we've solved alignment) can be performed by an agent that is not itself powerful enough to kill us.