This page collects resources I like on extinction risks from AI.
The resources come at different levels of difficulty:
the ones in green require no prior context or technical understanding;
the ones in orange require a light technical understanding;
the ones in red require some technical understanding.
Outline of this page:
I. Intro Resources on AI Extinction Risks
II. Deep Dive into Specific Arguments & Concrete Scenarios
III. AI Governance & Policy
IV. Some Context on Frontier AI Labs
V. Technical Safety Work on Current AI Architectures
VI. Attempts at Building Provably Safe AI Architectures
If you first want to learn how neural networks work at all, here's the least bad intro resource I know of.
Why is AI an Existential Risk?
General Technical Explainers
Complex Systems are Hard to Control (Prof. J. Steinhardt, Bounded Regret)
More Is Different for AI (Prof. J. Steinhardt, Bounded Regret)
The alignment problem from a deep learning perspective (OpenAI researcher R. Ngo et al., 2022)
AGI Safety From First Principles Report (OpenAI researcher R. Ngo)
Is Power-Seeking AI an Existential Risk? (J. Carlsmith)
The explainers above are listed in increasing order of difficulty.
Core Alignment Failures & Arguments
Goal Misgeneralization & Inner Misalignment
Non-technical explainer (R. Miles)
How undesired goals can arise with correct rewards (Shah et al., 2022)
Technical paper on goal misgeneralization (Langosco et al., 2021)
How likely is deceptive alignment? (E. Hubinger, 2022)
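The core idea behind goal misgeneralization can be shown in a few lines. The sketch below is a hypothetical toy version of the CoinRun example from Langosco et al. (2021): during training the coin always sits at the far right of the level, so a policy that simply "always moves right" earns full reward, yet it has learned the wrong goal, which only becomes visible when the coin moves at test time. All names here are my own illustration, not from any of the linked papers.

```python
# Toy goal misgeneralization: a proxy behaviour ("go right") that matches
# the intended goal ("reach the coin") on the training distribution, but
# comes apart from it out of distribution. Hypothetical example.

def run_episode(policy, coin_pos, width=10):
    """Roll out a policy in a 1-D gridworld; reward 1 iff the agent ends on the coin."""
    pos = 0
    for _ in range(width):
        pos = policy(pos, width)
    return 1 if pos == coin_pos else 0

def always_move_right(pos, width):
    # The behaviour actually learned: a proxy that happened to match
    # the intended goal everywhere in the training distribution.
    return min(pos + 1, width - 1)

# Training distribution: the coin is always at the rightmost cell.
train_reward = run_episode(always_move_right, coin_pos=9)
# Test distribution: the coin is placed elsewhere -- the proxy goal fails.
test_reward = run_episode(always_move_right, coin_pos=4)

print(train_reward, test_reward)  # capable behaviour, wrong goal
```

Note that the failure is not a lack of capability: the policy still competently navigates the gridworld, it just competently pursues the wrong objective.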
Power-Seeking Is Optimal
AI Governance & Policy
Challenges & Approaches to Governance
Context About AGI Labs
Technical Safety Work On Current Architectures
Failure Demonstrations & Arguments
Goal Misgeneralization in Deep Reinforcement Learning (Langosco et al., 2021)
Adversarial Policies Beat Superhuman Go AIs (T. Wang et al., 2022)
Eliciting Latent Knowledge (Christiano et al., 2021)
On the difficulty of preventing jailbreaks: Fundamental Limitations of Alignment in Large Language Models (Wolf et al., 2023)
Specification Gaming: the flip side of AI ingenuity (V. Krakovna et al., 2020)
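Specification gaming is easy to illustrate with a toy mis-specified reward, in the spirit of the examples Krakovna et al. catalogue. The sketch below is hypothetical and not from the linked post: the designer rewards "a box enters the goal square", intending each box to be delivered once, and an optimizer discovers that shuttling one box in and out racks up unbounded reward.

```python
# Toy specification gaming: the proxy reward pays for every 'enter' event,
# while the designer intended to credit at most one net delivery.
# Hypothetical illustration, not taken from any of the linked papers.

def proxy_reward(events):
    # Mis-specified: counts every entry into the goal square.
    return sum(1 for e in events if e == "enter")

def intended_reward(events):
    # What the designer meant: at most one delivery, net of removals.
    return min(1, events.count("enter") - events.count("exit"))

honest = ["enter"]                           # deliver the box once
gaming = ["enter", "exit"] * 5 + ["enter"]   # shuttle it back and forth

print(proxy_reward(honest), intended_reward(honest))  # 1 1
print(proxy_reward(gaming), intended_reward(gaming))  # 6 1
```

The gaming trajectory scores six times the honest one under the written reward while accomplishing exactly the same task, which is the "flip side of AI ingenuity" the title refers to.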