Glossary

The jargon, demystified. This field loves acronyms and insider words; none of it is as hard as it sounds. When you hit a term you don’t know in any note, check here. Tell me what’s missing and I’ll add it.

Don't pretend to know a word

Nobody good in this field expects you to know all the jargon. Asking “what does X mean?” signals strength, not weakness. Half of these I had to look up too.

The fields & tracks

  • Alignment — the problem of getting AI systems to reliably do what we intend, especially as they get more capable. (Alignment and AI Safety)
  • AI safety — the broader effort to make advanced AI go well; includes alignment, evals, control, security, and more.
  • Interpretability (“interp”) — understanding what’s happening inside a model. (Interpretability)
  • Mechanistic interpretability (“mech interp”) — the bottom-up flavor: reverse-engineering specific circuits/computations.
  • Governance — the rules, institutions, and incentives around AI. (Policy and Governance)
  • Research scientist vs. research engineer — designs the questions vs. builds the experiments. (Research Scientist / Research Engineer)

Technical terms

  • Transformer — the neural-network architecture behind modern LLMs. The thing you should be able to build from scratch. (Deep learning and transformers)
  • LLM — large language model.
  • PyTorch — the dominant Python framework for building/training models. (Python and PyTorch)
  • RL (reinforcement learning) — training via reward signals rather than labeled examples.
  • Fine-tuning — adapting an existing trained model to a specific task/domain. (Applied and Product ML)
  • Evals (evaluations) — systematic measurement of what a model can/can’t or will/won’t do.
  • Compute — the processing power (GPUs/TPUs) needed to train/run models; a real bottleneck and a policy lever.
  • Backprop (backpropagation) — how networks compute the gradients they learn from; it’s just the chain rule applied layer by layer. (Math you actually need)
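Here is a minimal sketch in plain Python (no PyTorch) that makes “it’s just the chain rule” concrete. It assumes a made-up toy model, y = w2 · tanh(w1 · x), with squared-error loss. All function names and numbers are hypothetical and only for illustration. The backward pass multiplies local derivatives from the loss back to each weight, and a finite-difference check confirms the result. Gradient descent then uses those gradients to do the actual learning. PyTorch’s autograd automates this same bookkeeping for real networks.

```python
import math

def forward(w1, w2, x, t):
    """Toy two-weight 'network' (hypothetical): y = w2 * tanh(w1 * x), squared-error loss."""
    h_pre = w1 * x            # hidden pre-activation
    h = math.tanh(h_pre)      # hidden activation
    y = w2 * h                # output
    loss = (y - t) ** 2       # squared error against target t
    return loss, (h_pre, h, y)

def backward(w1, w2, x, t):
    """Backprop by hand: chain rule from the loss back to each weight."""
    loss, (h_pre, h, y) = forward(w1, w2, x, t)
    dL_dy = 2 * (y - t)               # d(loss)/dy
    dL_dw2 = dL_dy * h                # dy/dw2 = h
    dL_dh = dL_dy * w2                # dy/dh = w2
    dL_dhpre = dL_dh * (1 - h ** 2)   # d tanh(u)/du = 1 - tanh(u)^2
    dL_dw1 = dL_dhpre * x             # d(h_pre)/dw1 = x
    return loss, dL_dw1, dL_dw2

def numerical_grad(f, w, eps=1e-6):
    """Central finite difference, used only to sanity-check the chain-rule gradients."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

def train(w1, w2, x, t, lr=0.1, steps=50):
    """Plain gradient descent: backprop supplies gradients, the update does the learning."""
    for _ in range(steps):
        _, g1, g2 = backward(w1, w2, x, t)
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2

if __name__ == "__main__":
    w1, w2, x, t = 0.5, -1.2, 2.0, 0.3
    loss, g1, g2 = backward(w1, w2, x, t)
    n1 = numerical_grad(lambda w: forward(w, w2, x, t)[0], w1)
    n2 = numerical_grad(lambda w: forward(w1, w, x, t)[0], w2)
    print(f"dL/dw1: chain rule {g1:.6f} vs finite diff {n1:.6f}")
    print(f"dL/dw2: chain rule {g2:.6f} vs finite diff {n2:.6f}")
    w1_new, w2_new = train(w1, w2, x, t)
    print(f"loss before {loss:.4f}, after training {forward(w1_new, w2_new, x, t)[0]:.6f}")
```

Each line in `backward` is one local derivative. Swapping in a different activation only changes the `1 - h ** 2` line, which is why frameworks can automate the whole process.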

Safety sub-areas (you’ll hear these a lot)

  • Scalable oversight — supervising systems that may be better than us at the task.
  • AI control — safely getting useful work from models we don’t fully trust.
  • Adversarial robustness — resistance to jailbreaks/manipulation.
  • Model organisms (of misalignment) — deliberately built examples of failure modes, for study.
  • Model welfare — the (newer) question of moral consideration for AI systems.
  • Red-teaming — deliberately trying to make a system fail/misbehave to find problems.
  • Threat model — the specific story of what could go wrong that a piece of work is trying to address. Being able to state yours is a mark of a serious applicant. (Alignment and AI Safety)

Career / ecosystem words

  • Fellowship / residency — structured, usually funded, time-boxed programs to do real work and get mentored. Residencies tend to be longer and more job-like.
  • Cohort — the group that goes through a program together; programs run repeated cohorts (so a closed deadline ≠ closed door).
  • Work test / take-home — a paid or unpaid task in an application that simulates the real work. (Applications (the actual mechanics))
  • LISA — London Initiative for Safe AI; a co-working hub where several London programs (ARENA, LASR, Pivotal) are based.
  • Career capital — the skills, credentials, and connections that make your next step easier. (Career strategy (small but matters))
  • EA Forum / LessWrong — community forums where much of the safety field discusses and announces things.
