Siméon - Resources on AI Risks

This page links to resources I like on extinction risks from AI.

There are resources of different levels of difficulty:

the ones in green don't require any context or technical understanding.
the ones in orange require a light technical understanding.
the ones in red require some technical understanding.

Outline of this page:

I. Intro Resources on AI Extinction Risks

II. Deep Dive into Specific Arguments & Concrete Scenarios

III. AI Governance & Policy

IV. Some Context on Frontier AI Labs

V. Technical Safety Work on Current AI Architectures

VI. Attempts at Building Provably Safe AI Architectures

If you want to learn how neural networks work in the first place, here's the least worst intro resource I know of.

Introductory Resources on AI Extinction Risks

Don’t Look Up - The Documentary: The Case for AI as an Existential Threat (17 min film)
‘Godfather of AI’ says AI could kill all humans and there might be no way to stop it (Turing Award winner G. Hinton, CNN)
We must slow down the race to God-like AI (AI investor I. Hogarth, Financial Times)
There’s more regulation on selling sandwiches than on ‘God-like’ tech (AI Lab CEO C. Leahy, CNN)
The importance of AI alignment, explained in 5 points (D. Eth)
The ‘Don’t Look Up’ Thinking That Could Doom Us With AI (MIT professor M. Tegmark, TIME)
Pausing AI Development Isn’t Enough. We Need to Shut it All Down (Early AGI researcher E. Yudkowsky, TIME)

Why is AI an Existential Risk?

General Technical Explainers

Complex Systems are Hard to Control (Prof. J. Steinhardt, Bounded Regret)
How Rogue AIs may Arise (Turing Award winner Y. Bengio, personal blog)
More Is Different for AI (Prof. J. Steinhardt, Bounded Regret)
The importance of AI alignment, explained in 5 points (D. Eth)
The alignment problem from a deep learning perspective (OpenAI researcher R. Ngo et al., 2022)
AGI Safety From First Principles Report (OpenAI researcher R. Ngo)
Is Power-Seeking AI an Existential Risk? (J. Carlsmith)

Concrete Scenarios

These are listed in increasing order of difficulty:

Core Alignment Failures & Arguments

Corrigibility
- Non-technical explainer (R. Miles)
- Relatively technical article (E. Yudkowsky)
- Technical foundational paper (N. Soares)
Specification Gaming
- Non-technical cool examples (R. Miles)
- Simple technical blogpost (V. Krakovna)
Goal Misgeneralization & Inner Misalignment
- Non-technical explainer (R. Miles)
- How undesired goals can arise with correct rewards (Shah et al., 2022)
- Technical paper on goal misgeneralization (Langosco et al., 2021)
- How likely is deceptive alignment? (E. Hubinger, 2022)
Power-Seeking is Optimal
- Non-technical explainer of instrumental convergence (R. Miles)
- Technical papers
  - Proving it under strong assumptions (A. Turner et al., 2019)
  - Proving it under lighter assumptions (A. Turner et al., 2022)

AI Governance & Policy

Compute Governance

A Case for Compute Governance: Arms Control for Artificial Intelligence (CNAS)
A Technical Roadmap To Leverage Compute Governance as a Verification Mechanism (Y. Shavit, 2023)
The Semiconductor Supply Chain (CSET)
- Chip Supply Chain Map

If you want to read more: A Reading List on Compute Governance (L. Heim, 2022)

Challenges & Approaches to Governance

Key Considerations for AI International Cooperation (M. Baker, 2023)
The AI Deployment Problem (H. Karnofsky, 2022)
Auditing: Model evaluations for extreme risks (T. Shevlane et al., 2023)

Context About AGI Labs

OpenAI's Story

DeepMind's Story

Anthropic's Story

Technical Safety Work On Current Architectures

Interpretability

Evaluating Models

Failure Demonstrations & Arguments

Attempts at Building Provably Safe Architectures (Technical)

David Dalrymple's Open Agency Architecture

The Learning-Theoretic Agenda

John Wentworth's Plan

Cognitive Emulation (CoEm)