Before you dive in

Prerequisites

This is a learning repo, so the bar is deliberately low — but not zero. Here's the honest floor, what's merely helpful, and what we plan to demonstrate.

The single most liberating fact. Inference is the forward pass only. We’re running a model someone else already trained — so no training, no backpropagation, no gradients, no calculus. Inference is: look up some vectors, multiply matrices, normalize, pick the most likely next token, repeat. That’s the whole game.
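To make that loop concrete, here is a minimal sketch in Rust. It is not this repo's engine: the sizes, the weights, and every function name (`embed`, `rms_normalize`, `logits`, `argmax`, `generate`) are made up for illustration. A real model puts many transformer layers between the lookup and the final matmul. The weights are rigged so that token `i` always predicts token `i + 1`, which makes the output easy to check by hand.

```rust
// Toy sizes and hand-picked weights. These are illustrative assumptions,
// not values from any real model.
const VOCAB: usize = 4;
const DIM: usize = 4;

/// Step 1: look up the vector for a token id (one row of the embedding table).
fn embed(table: &[[f32; DIM]; VOCAB], token: usize) -> [f32; DIM] {
    table[token]
}

/// Step 2: normalize by root-mean-square (RMSNorm without a learned scale).
fn rms_normalize(x: [f32; DIM]) -> [f32; DIM] {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / DIM as f32;
    let inv = 1.0 / (mean_sq + 1e-6).sqrt();
    x.map(|v| v * inv)
}

/// Step 3: multiply by the output matrix to get one score (logit) per token.
fn logits(w: &[[f32; DIM]; VOCAB], x: &[f32; DIM]) -> [f32; VOCAB] {
    let mut out = [0.0; VOCAB];
    for (o, row) in out.iter_mut().zip(w.iter()) {
        *o = row.iter().zip(x.iter()).map(|(a, b)| a * b).sum();
    }
    out
}

/// Step 4: pick the most likely next token (greedy decoding).
fn argmax(xs: &[f32]) -> usize {
    let mut best = 0;
    for (i, &v) in xs.iter().enumerate() {
        if v > xs[best] {
            best = i;
        }
    }
    best
}

/// Step 5: repeat, feeding each chosen token back in.
fn generate(start: usize, steps: usize) -> Vec<usize> {
    // Identity embeddings: token i becomes the i-th unit vector.
    let table = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ];
    // Output weights rigged so that row j scores highest for input token j - 1.
    let w_out = [
        [0.0, 0.0, 0.0, 1.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ];
    let mut tokens = vec![start];
    for _ in 0..steps {
        let last = *tokens.last().unwrap();
        let x = rms_normalize(embed(&table, last));
        tokens.push(argmax(&logits(&w_out, &x)));
    }
    tokens
}
```

Calling `generate(0, 5)` walks the cycle and returns `[0, 1, 2, 3, 0, 1]`. Nothing in the loop computes a gradient or updates a weight.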

How to read the tiers

Three kinds of knowledge

🟢 THE FLOOR

Have a feel for these

Have some feel before starting. If one is shaky, spend an hour with a resource below — you don’t need mastery.

🟡 HELPFUL

Brush up as you go

Nice to have seen once. You can pick these up while following along, milestone by milestone.

🔵 WHAT WE’LL DEMONSTRATE

Come curious, be ready to dig in

Attention, RoPE, RMSNorm, GQA, SwiGLU, the KV cache, quantization, Metal kernels, BPE internals. Each is a milestone with its own doc.

🟢 The floor

What “enough” looks like

| Thing | “Enough” looks like | Brush up with |
| --- | --- | --- |
| Vectors & matmul | A matmul is rows-dot-columns, and shapes must line up ([m×k]·[k×n]=[m×n]). ~80% of what an LLM does. | 3Blue1Brown, Essence of Linear Algebra |
| A forward pass | Inputs → weighted sums → a nonlinearity → outputs, stacked in layers. You don’t need to know how it’s trained. | 3Blue1Brown, Neural Networks ch. 1–2 |
| Basic Rust | struct/enum, Vec, slices, Option/Result, ownership, match. Not: async, macros, lifetime gymnastics. | The Rust Book (ch. 1–10), rustlings |
| Command line + git | clone, branch, commit; run a binary; navigate folders. | (you’re already here) |
| Bytes & number types | Roughly what f32/f16/bf16/int8 are; that an array is just numbers laid out in memory. | learning 01 (safetensors vs GGUF) |
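If the first two rows feel abstract, the sketch below shows both at the level of Rust the floor assumes. The `Matrix` type and the `matmul` and `relu` functions are hypothetical names written for this page, not the repo's API. ReLU stands in for "a nonlinearity"; real models often use a different one.

```rust
/// A row-major matrix: `rows * cols` numbers laid out contiguously in memory.
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

impl Matrix {
    fn new(rows: usize, cols: usize, data: Vec<f32>) -> Self {
        assert_eq!(data.len(), rows * cols, "data length must equal rows * cols");
        Matrix { rows, cols, data }
    }

    /// Element at row `r`, column `c`.
    fn at(&self, r: usize, c: usize) -> f32 {
        self.data[r * self.cols + c]
    }
}

/// [m×k]·[k×n] = [m×n]. Returns `None` when the inner dimensions disagree.
fn matmul(a: &Matrix, b: &Matrix) -> Option<Matrix> {
    if a.cols != b.rows {
        return None;
    }
    let mut out = vec![0.0f32; a.rows * b.cols];
    for i in 0..a.rows {
        for j in 0..b.cols {
            // Row i of `a` dotted with column j of `b`.
            let mut acc = 0.0;
            for k in 0..a.cols {
                acc += a.at(i, k) * b.at(k, j);
            }
            out[i * b.cols + j] = acc;
        }
    }
    Some(Matrix::new(a.rows, b.cols, out))
}

/// A simple nonlinearity: negative values become zero.
fn relu(m: &Matrix) -> Matrix {
    let data = m.data.iter().map(|v| v.max(0.0)).collect();
    Matrix::new(m.rows, m.cols, data)
}
```

One layer of a forward pass is then roughly `relu(&matmul(&x, &w)?)`: weighted sums, followed by a nonlinearity. Multiplying a [2×3] matrix by a [3×2] matrix yields a [2×2] result, while multiplying a [2×3] by another [2×3] returns `None` because the shapes do not line up.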
The Rust we use. We assume only basic Rust — but a from-scratch engine reaches into corners a typical app never touches (unsafe + raw pointers + extern "C" to call Metal, #[repr(C)] layout, bit-level bf16 decoding, mmap over foreign memory). We explain each inline the first time it appears. If you hit Rust that looks nothing like the Rust Book, that’s expected — the odd Rust is part of what this repo teaches.
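As a taste of that odd Rust, here is a minimal sketch of bit-level bf16 decoding. A bf16 value is the top 16 bits of an f32 (same sign and exponent, shorter mantissa), so decoding is a shift. The function names are hypothetical, and the little-endian byte order is an assumption about how the weights are stored; the repo's own decoder may differ.

```rust
/// Widen one bf16 value (given as its raw 16 bits) to f32.
/// bf16 is the upper half of an f32, so shifting left by 16 restores the
/// layout and fills the missing low mantissa bits with zeros.
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Decode a raw byte buffer of bf16 values.
/// Assumption: values are stored little-endian, two bytes each.
/// A trailing odd byte, if any, is ignored by `chunks_exact`.
fn decode_bf16_le(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|pair| bf16_to_f32(u16::from_le_bytes([pair[0], pair[1]])))
        .collect()
}
```

For example, the bit pattern `0x3F80` decodes to `1.0`, because `1.0_f32` is `0x3F80_0000`. No `unsafe` is needed for this part; that only enters once the bytes come from an mmap or are handed to Metal.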

Brush-up resources

Ranked by usefulness for this repo

See it (intuition, a few hours)

Code it (closest to our method — “working code doesn’t lie”)

Place it in context (architecture)

Go deeper on inference (free, by noted practitioners)

You’re ready when…

Self-check

You don’t need to answer these — just feel they’re not total fog:

If those feel roughly OK, you’re ready. Next: the abstraction ladder →