Glossary

The jargon, demystified. This field loves acronyms and insider words; none of it is as hard as it sounds. When you hit a term you don’t know in any note, check here. Tell me what’s missing and I’ll add it.

Don't pretend to know a word

Nobody good in this field expects you to know all the jargon. Asking “what does X mean?” is a sign of strength, not weakness. Half of these I had to look up too.

The fields & tracks

Alignment — the problem of getting AI systems to reliably do what we intend, especially as they get more capable. (Alignment and AI Safety)
AI safety — the broader effort to make advanced AI go well; includes alignment, evals, control, security, and more.
Interpretability (“interp”) — understanding what’s happening inside a model. (Interpretability)
Mechanistic interpretability (“mech interp”) — the bottom-up flavor: reverse-engineering specific circuits/computations.
Governance — the rules, institutions, and incentives around AI. (Policy and Governance)
Research scientist vs. research engineer — the scientist designs the questions; the engineer builds the experiments. (Research Scientist / Research Engineer)

Technical terms

Transformer — the neural-network architecture behind modern LLMs. The thing you should be able to build from scratch. (Deep learning and transformers)
LLM — large language model.
PyTorch — the dominant Python framework for building/training models. (Python and PyTorch)
RL (reinforcement learning) — training via reward signals rather than labeled examples.
Fine-tuning — adapting an existing trained model to a specific task/domain. (Applied and Product ML)
Evals (evaluations) — systematic measurement of what a model can/can’t or will/won’t do.
Compute — the raw processing power (GPUs/TPUs) needed to train and run models; a real bottleneck and a policy lever.
Backprop (backpropagation) — how networks learn; it’s the chain rule doing its job. (Math you actually need)
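
To make the backprop entry a little more concrete, here is a minimal sketch of "the chain rule doing its job" on the smallest model I could think of: one weight, one bias, one data point. Everything in it (the function names `forward_and_grads` and `train`, the squared-error loss, the learning rate) is my own illustrative choice, not something from a particular library. Real frameworks like PyTorch compute these gradients for you automatically.

```python
# Toy model: y_hat = w * x + b, with loss = (y_hat - y) ** 2.
# All names here are hypothetical, chosen only for illustration.

def forward_and_grads(w, b, x, y):
    """Run the forward pass, then apply the chain rule by hand."""
    # Forward pass
    y_hat = w * x + b
    err = y_hat - y
    loss = err ** 2

    # Backward pass: d(loss)/d(y_hat) first, then push it back to w and b
    dloss_dyhat = 2 * err
    dloss_dw = dloss_dyhat * x    # because d(y_hat)/dw = x
    dloss_db = dloss_dyhat * 1.0  # because d(y_hat)/db = 1
    return loss, dloss_dw, dloss_db


def train(x, y, steps=200, lr=0.05):
    """Plain gradient descent on a single (x, y) example."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        _, grad_w, grad_b = forward_and_grads(w, b, x, y)
        w -= lr * grad_w  # step against the gradient
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    w, b = train(x=2.0, y=5.0)
    print("prediction:", w * 2.0 + b)
```

A real network does the same thing with many more parameters and layers: each layer passes its gradient back to the one before it, which is where the "back" in backprop comes from.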

Safety sub-areas (you’ll hear these a lot)

Scalable oversight — supervising systems that may be better than us at the task.
AI control — safely getting useful work from models we don’t fully trust.
Adversarial robustness — resistance to jailbreaks/manipulation.
Model organisms (of misalignment) — deliberately built examples of failure modes, for study.
Model welfare — the (newer) question of moral consideration for AI systems.
Red-teaming — deliberately trying to make a system fail/misbehave to find problems.
Threat model — the specific story of what could go wrong that a piece of work is trying to address. Being able to state yours is a mark of a serious applicant. (Alignment and AI Safety)

Career / ecosystem words

Fellowship / residency — structured, usually-funded, time-boxed programs to do real work + get mentored. Residencies tend to be longer and more job-like.
Cohort — the group that goes through a program together; programs run repeated cohorts (so a closed deadline ≠ closed door).
Work test / take-home — a paid or unpaid task in an application that simulates the real work. (Applications (the actual mechanics))
LISA — London Initiative for Safe AI; a co-working hub where several London programs (ARENA, LASR, Pivotal) are based.
Career capital — the skills, credentials, and connections that make your next step easier. (Career strategy (small but matters))
EA Forum / LessWrong — community forums where much of the safety field discusses and announces things.

Back to Home · Reading and Courses

A Field Guide to AI Fellowships
