Prerequisites

This is a learning repo, so the bar is deliberately low — but not zero. Here's the honest floor, what's merely helpful, and what we plan to demonstrate.

The single most liberating fact: inference is the forward pass only. We’re running a model someone else already trained, so there is no training, no backpropagation, no gradients, and no calculus. Inference is just this: look up some vectors, multiply matrices, normalize, pick the most likely next token, and repeat. That’s the whole game.
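The loop "look up, multiply, normalize, pick, repeat" is small enough to write out in full. The sketch below is a minimal greedy decoding loop in plain Rust. It assumes a hypothetical toy "model" made of only an embedding table and an output projection, both row-major `Vec<f32>` with made-up weights. A real Llama-style model runs a stack of transformer blocks (attention, feed-forward) between those two steps. All function names here are illustrative, not this repo's API. Notice that nothing computes a gradient.

```rust
// Toy greedy decoding: look up a vector, normalize, multiply, pick, repeat.
// Hypothetical "model": an embedding table [vocab × dim] and an output
// projection [vocab × dim], both row-major. No transformer blocks, no training.

/// Copy row `id` out of a row-major [vocab × dim] embedding table.
fn embed(table: &[f32], dim: usize, id: usize) -> Vec<f32> {
    table[id * dim..(id + 1) * dim].to_vec()
}

/// RMS-normalize in place: divide by sqrt(mean(x²) + eps). No learned scale here.
fn rms_norm(x: &mut [f32]) {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv = 1.0 / (mean_sq + 1e-5).sqrt();
    for v in x.iter_mut() {
        *v *= inv;
    }
}

/// Multiply a row-major [rows × cols] matrix by a length-`cols` vector.
fn matvec(w: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
    (0..rows)
        .map(|r| {
            w[r * cols..(r + 1) * cols]
                .iter()
                .zip(x)
                .map(|(a, b)| a * b)
                .sum::<f32>()
        })
        .collect()
}

/// Index of the largest logit ("pick the most likely token"); ties keep the first.
fn argmax(v: &[f32]) -> usize {
    let mut best = 0;
    for (i, &x) in v.iter().enumerate() {
        if x > v[best] {
            best = i;
        }
    }
    best
}

/// Generate `steps` tokens after `start` using the forward pass only.
fn generate(
    embedding: &[f32],
    out_proj: &[f32],
    vocab: usize,
    dim: usize,
    start: usize,
    steps: usize,
) -> Vec<usize> {
    let mut tokens = vec![start];
    for _ in 0..steps {
        let last = *tokens.last().unwrap();
        let mut h = embed(embedding, dim, last); // look up a vector
        rms_norm(&mut h); // normalize
        let logits = matvec(out_proj, vocab, dim, &h); // multiply matrices
        tokens.push(argmax(&logits)); // pick the most likely next token
    }
    tokens
}

fn main() {
    // vocab = 3, dim = 2; weights are made up so that 0 -> 1 -> 2 -> 0 -> ...
    let embedding: Vec<f32> = vec![1.0, 0.0, 0.0, 1.0, -1.0, -1.0];
    let out_proj: Vec<f32> = vec![-1.0, -1.0, 1.0, 0.0, 0.0, 1.0];
    let tokens = generate(&embedding, &out_proj, 3, 2, 0, 4);
    println!("{:?}", tokens); // [0, 1, 2, 0, 1]
}
```

Swapping the toy weights for real ones and inserting the transformer blocks between `rms_norm` and `matvec` is, in spirit, what the milestones build up to.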

Three kinds of knowledge

🟢 THE FLOOR

Have a feel for these

Have some feel before starting. If one is shaky, spend an hour with a resource below — you don’t need mastery.

🟡 HELPFUL

Brush up as you go

Nice to have seen once. You can pick these up while following along, milestone by milestone.

🔵 WHAT WE’LL DEMONSTRATE

Come curious, be ready to dig in

Attention, RoPE, RMSNorm, GQA, SwiGLU, the KV cache, quantization, Metal kernels, BPE internals. Each is a milestone with its own doc.

What “enough” looks like

Topic	“Enough” looks like	Brush up with
Vectors & matmul	A matmul is rows-dot-columns, and shapes must line up (`[m×k]·[k×n]=[m×n]`). Matmuls are ~80% of what an LLM does.	3Blue1Brown, Essence of Linear Algebra
A forward pass	Inputs → weighted sums → a nonlinearity → outputs, stacked in layers. You don’t need to know how it’s trained.	3Blue1Brown, Neural Networks ch. 1–2
Basic Rust	`struct`/`enum`, `Vec`, slices, `Option`/`Result`, ownership, `match`. Not: async, macros, lifetime gymnastics.	The Rust Book (ch. 1–10), `rustlings`
Command line + git	clone, branch, commit; run a binary; navigate folders.	(you’re already here)
Bytes & number types	Roughly what `f32`/`f16`/`bf16`/`int8` are; that an array is just numbers laid out in memory.	learning 01 (safetensors vs GGUF)
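The first two rows of the table fit in a few dozen lines. Below is a minimal sketch that assumes a hypothetical row-major `Matrix` type. `matmul` computes rows dotted with columns and returns `None` when the `[m×k]·[k×n]` shapes don't line up. `dense_relu` is one forward-pass layer: a weighted sum (the matmul) followed by a nonlinearity. ReLU is used only because it is the simplest nonlinearity. The Llama-style models this repo targets use SwiGLU instead, which gets its own milestone.

```rust
/// Row-major matrix: `data[r * cols + c]` holds element (r, c).
#[derive(Debug, Clone, PartialEq)]
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

impl Matrix {
    fn new(rows: usize, cols: usize, data: Vec<f32>) -> Matrix {
        assert_eq!(data.len(), rows * cols, "data length must equal rows * cols");
        Matrix { rows, cols, data }
    }

    /// [m×k]·[k×n] = [m×n]; returns None when the inner dimensions disagree.
    fn matmul(&self, other: &Matrix) -> Option<Matrix> {
        if self.cols != other.rows {
            return None;
        }
        let (m, k, n) = (self.rows, self.cols, other.cols);
        let mut out = vec![0.0f32; m * n];
        for i in 0..m {
            for j in 0..n {
                // Output (i, j) = row i of self dotted with column j of other.
                let mut acc = 0.0f32;
                for p in 0..k {
                    acc += self.data[i * k + p] * other.data[p * n + j];
                }
                out[i * n + j] = acc;
            }
        }
        Some(Matrix::new(m, n, out))
    }
}

/// One layer: weighted sums (a matmul), then a ReLU nonlinearity.
fn dense_relu(x: &Matrix, w: &Matrix) -> Option<Matrix> {
    let mut y = x.matmul(w)?;
    for v in y.data.iter_mut() {
        *v = (*v).max(0.0);
    }
    Some(y)
}

fn main() {
    let a = Matrix::new(2, 3, vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]); // [2×3]
    let b = Matrix::new(3, 2, vec![1.0, 0.0, 0.0, 1.0, 1.0, -1.0]); // [3×2]
    let c = a.matmul(&b).expect("shapes line up"); // [2×2]
    println!("{:?}", c.data); // [4.0, -1.0, 10.0, -1.0]
    // [3×2]·[3×2]: inner dimensions 2 and 3 differ, so there is no product.
    assert!(b.matmul(&b).is_none());
    // Stacking layers means feeding one layer's output into the next.
    println!("{:?}", dense_relu(&a, &b).unwrap().data); // [4.0, 0.0, 10.0, 0.0]
}
```

The triple loop favors readability over speed. Making that same arithmetic fast is what later milestones are about.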

The Rust we use: we assume only basic Rust, but a from-scratch engine reaches into corners a typical app never touches (`unsafe` + raw pointers + `extern "C"` to call Metal, `#[repr(C)]` layout, bit-level bf16 decoding, `mmap` over foreign memory). We explain each inline the first time it appears. If you hit Rust that looks nothing like the Rust Book, that’s expected: the odd Rust is part of what this repo teaches.
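Here is a small taste of "bit-level bf16 decoding", as a minimal sketch. bf16 is the upper 16 bits of an IEEE 754 `f32` (sign, 8-bit exponent, 7-bit mantissa), so decoding is a 16-bit left shift followed by `f32::from_bits`. The byte-buffer helper assumes little-endian storage, as safetensors uses. The function names are hypothetical, not this repo's API. `f32_to_bf16_truncate` simply drops the low mantissa bits. Production converters typically round to nearest-even instead, so treat it as illustration only.

```rust
/// bf16 is the top half of an f32, so decoding shifts it back into place.
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Encode by truncation: keep the high 16 bits, drop the low 16 mantissa bits.
fn f32_to_bf16_truncate(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// Decode a raw little-endian buffer of bf16 values (e.g. a tensor's bytes).
fn decode_bf16_le(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|pair| bf16_to_f32(u16::from_le_bytes([pair[0], pair[1]])))
        .collect()
}

fn main() {
    // 1.0f32 is 0x3F80_0000, so its bf16 form is 0x3F80.
    println!("{}", bf16_to_f32(0x3F80)); // 1
    // Little-endian bytes for bf16 1.0 (0x3F80) followed by -2.0 (0xC000).
    let raw: [u8; 4] = [0x80, 0x3F, 0x00, 0xC0];
    println!("{:?}", decode_bf16_le(&raw)); // [1.0, -2.0]
    // Truncation loses precision: pi survives only to about 3 significant digits.
    let pi_bf16 = f32_to_bf16_truncate(std::f32::consts::PI);
    println!("{}", bf16_to_f32(pi_bf16)); // 3.140625
}
```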

Ranked by usefulness for this repo

See it (intuition, a few hours)

3Blue1Brown — Neural Networks series — the best visual intuition for matmul, neural nets, and attention.

Jay Alammar — The Illustrated Transformer — the classic picture of the architecture we’re implementing.

Code it (closest to our method — “working code doesn’t lie”)

Karpathy — Neural Networks: Zero to Hero — Build the GPT Tokenizer primes M0; Build GPT from scratch primes M2/M3.

Karpathy — llama2.c — a single ~970-line C file that runs Llama end-to-end; a gentler full engine than ds4.

(optional, paid) Raschka — Build an LLM (From Scratch) — long-form version of what Karpathy’s free videos cover.

Place it in context (architecture)

Go deeper on inference (free, by noted practitioners)

Self-check

You don’t need to answer these — just feel they’re not total fog:

“An LLM mostly multiplies matrices” doesn’t sound mysterious.

You could write a Rust function that takes a `&[f32]`, sums it, and returns the result.

You know a model file is “numbers + a table describing them.”

You accept that we’ll explain attention / RoPE / KV-cache / Metal when we get there.
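For calibration on the second item, something like the following is all that's meant. This is a sketch, and `sum` is just an illustrative name. Taking `&[f32]` borrows the data rather than taking ownership, so the caller can pass a `Vec`, an array, or a sub-slice.

```rust
/// Add up every element of a borrowed slice of f32 values.
fn sum(xs: &[f32]) -> f32 {
    let mut total = 0.0;
    for &x in xs {
        total += x;
    }
    total
}

fn main() {
    let v: Vec<f32> = vec![1.0, 2.0, 3.5];
    println!("{}", sum(&v)); // 6.5 (a &Vec<f32> coerces to &[f32])
    println!("{}", sum(&v[1..])); // 5.5 (a sub-slice works too)
}
```

Idiomatic Rust would often write `xs.iter().sum::<f32>()` instead. The explicit loop is equally fine.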

If those feel roughly OK, you’re ready. Next: the abstraction ladder →
