Glossary
The jargon, demystified. This field loves acronyms and insider words; none of it is as hard as it sounds. When you hit a term you don’t know in any note, check here. Tell me what’s missing and I’ll add it.
Don't pretend to know a word
Nobody good in this field expects you to know all the jargon. Asking “what does X mean?” is a sign of strength, not weakness. I had to look up half of these too.
The fields & tracks
- Alignment — the problem of getting AI systems to reliably do what we intend, especially as they get more capable. (Alignment and AI Safety)
- AI safety — the broader effort to make advanced AI go well; includes alignment, evals, control, security, and more.
- Interpretability (“interp”) — understanding what’s happening inside a model. (Interpretability)
- Mechanistic interpretability (“mech interp”) — the bottom-up flavor: reverse-engineering specific circuits/computations.
- Governance — the rules, institutions, and incentives around AI. (Policy and Governance)
- Research scientist vs. research engineer — the scientist designs the questions; the engineer builds the experiments. (Research Scientist / Research Engineer)
Technical terms
- Transformer — the neural-network architecture behind modern LLMs. The thing you should be able to build from scratch. (Deep learning and transformers)
- LLM — large language model.
- PyTorch — the dominant Python framework for building/training models. (Python and PyTorch)
- RL (reinforcement learning) — training via reward signals rather than labeled examples.
- Fine-tuning — adapting an existing trained model to a specific task/domain. (Applied and Product ML)
- Evals (evaluations) — systematic measurement of what a model can/can’t or will/won’t do.
- Compute — the raw processing power (GPUs/TPUs) needed to train and run models; a real bottleneck and a policy lever.
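- Backprop (backpropagation) — how networks learn: it applies the chain rule to work out how much each weight contributed to the error, so the weights can be nudged in the right direction. (Math you actually need)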
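To make the backprop entry concrete, here is a minimal sketch in plain Python (no PyTorch needed). The one-weight "network", the function names, and the numbers are all made up for illustration. It computes gradients by hand with the chain rule, then checks them against a brute-force "nudge the parameter and see what happens" estimate.

```python
def forward(w, b, x, target):
    """Tiny one-neuron 'network': pred = w*x + b, loss = (pred - target)^2."""
    pred = w * x + b
    loss = (pred - target) ** 2
    return pred, loss


def backward(w, b, x, target):
    """Backprop by hand: chain rule from the loss back to each parameter."""
    pred, _ = forward(w, b, x, target)
    dloss_dpred = 2 * (pred - target)  # how the loss changes with the prediction
    dpred_dw = x                       # how the prediction changes with w
    dpred_db = 1.0                     # how the prediction changes with b
    # Chain rule: multiply the local derivatives along the path.
    return dloss_dpred * dpred_dw, dloss_dpred * dpred_db


def numerical_grad(w, b, x, target, eps=1e-6):
    """Sanity check: nudge each parameter slightly and measure the loss change."""
    _, loss_w_up = forward(w + eps, b, x, target)
    _, loss_w_down = forward(w - eps, b, x, target)
    _, loss_b_up = forward(w, b + eps, x, target)
    _, loss_b_down = forward(w, b - eps, x, target)
    return ((loss_w_up - loss_w_down) / (2 * eps),
            (loss_b_up - loss_b_down) / (2 * eps))


# Illustrative values only.
w, b, x, target = 0.5, 0.1, 2.0, 3.0
grad_w, grad_b = backward(w, b, x, target)
num_w, num_b = numerical_grad(w, b, x, target)

# One gradient-descent step: move each parameter against its gradient.
learning_rate = 0.05
new_w = w - learning_rate * grad_w
new_b = b - learning_rate * grad_b
_, loss_before = forward(w, b, x, target)
_, loss_after = forward(new_w, new_b, x, target)

print(grad_w, grad_b)           # chain-rule gradients
print(num_w, num_b)             # numerical estimates, should be very close
print(loss_before, loss_after)  # loss should drop after the step
```

Frameworks like PyTorch do the `backward` part automatically for networks with millions of weights, but the idea is the same as in this toy version.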
Safety sub-areas (you’ll hear these a lot)
- Scalable oversight — supervising systems that may be better than us at the task.
- AI control — safely getting useful work from models we don’t fully trust.
- Adversarial robustness — resistance to jailbreaks/manipulation.
- Model organisms (of misalignment) — deliberately built examples of failure modes, for study.
- Model welfare — the (newer) question of moral consideration for AI systems.
- Red-teaming — deliberately trying to make a system fail/misbehave to find problems.
- Threat model — the specific story of what could go wrong that a piece of work is trying to address. Being able to state yours is a mark of a serious applicant. (Alignment and AI Safety)
Career / ecosystem words
- Fellowship / residency — structured, usually-funded, time-boxed programs to do real work + get mentored. Residencies tend to be longer and more job-like.
- Cohort — the group that goes through a program together; programs run repeated cohorts (so a closed deadline ≠ closed door).
- Work test / take-home — a paid or unpaid task in an application that simulates the real work. (Applications (the actual mechanics))
- LISA — London Initiative for Safe AI; a co-working hub where several London programs (ARENA, LASR, Pivotal) are based.
- Career capital — the skills, credentials, and connections that make your next step easier. (Career strategy (small but matters))
- EA Forum / LessWrong — community forums where much of the safety field discusses and announces things.