Hey folks,
I’m starting a new weekday series on June 23 at 9:00 AM PDT where I’ll spend 50 days coding a tiny LLM (15–30M parameters) from the ground up: no massive GPU cluster, just a regular laptop or modest GPU.
Each post will cover one topic:
- Data collection and subword tokenization
- Embeddings and positional encodings
- Attention heads and feed-forward layers (a minimal sketch below)
- Training loops, loss functions, optimizers
- Evaluation metrics and sample generation
- Bonus deep dives: MoE, multi-token prediction, etc.
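To give a feel for the scale, here's the kind of single attention head we'll build on the attention day. This is just a minimal sketch assuming PyTorch; the class name and dimensions are illustrative, not the actual series code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySelfAttention(nn.Module):
    """One causal attention head, small enough to train on a CPU."""
    def __init__(self, d_model=128, d_head=32):
        super().__init__()
        self.q = nn.Linear(d_model, d_head, bias=False)
        self.k = nn.Linear(d_model, d_head, bias=False)
        self.v = nn.Linear(d_model, d_head, bias=False)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
        # causal mask: each token only attends to earlier positions
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        return F.softmax(scores, dim=-1) @ v

x = torch.randn(2, 16, 128)          # (batch, seq_len, d_model)
print(TinySelfAttention()(x).shape)  # torch.Size([2, 16, 32])
```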
Why bother with tiny models?
They run on the CPU. You get daily feedback loops. Building every component yourself cements your understanding.
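To put the "runs on a CPU" point in numbers, here's a back-of-envelope estimate (the fp32 and Adam assumptions are mine, not a measurement):

```python
# Why a 15-30M-parameter model fits on a laptop: fp32 weights are
# 4 bytes each; training with Adam also keeps gradients plus two
# moment buffers, so roughly 4x the weight memory.
params = 30_000_000
weights_mb = params * 4 / 1e6    # ~120 MB of weights
training_mb = weights_mb * 4     # weights + grads + Adam moments
print(f"weights: {weights_mb:.0f} MB, training footprint: ~{training_mb:.0f} MB")
```

Even the full training footprint sits comfortably in ordinary laptop RAM.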
I’ve already tried:
- A 30M-parameter GPT variant for children’s stories
- A 15M-parameter DeepSeek model with Mixture-of-Experts
I’ll drop links to the code in the first comment.
Looking forward to the discussion and to learning together. See you on Day 1.