Hey folks,

I’m starting a new weekday series on June 23 at 9:00 AM PDT where I’ll spend 50 days coding a tiny LLM (15–30M parameters) from the ground up: no massive GPU cluster, just a regular laptop or modest GPU.

Each post will cover one topic:

- Data collection and subword tokenization
- Embeddings and positional encodings
- Attention heads and feed-forward layers
- Training loops, loss functions, optimizers
- Evaluation metrics and sample generation
- Bonus deep dives: MoE, multi-token prediction, etc.
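To give a flavor of the "from the ground up" spirit, here's a minimal sketch of one of the components above: a single causal self-attention head in plain NumPy. The function and variable names are just illustrative, not from the series itself:

```python
# Minimal single causal self-attention head, built from scratch.
# Names (self_attention, Wq, etc.) are illustrative placeholders.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d_head = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_head)            # (seq_len, seq_len)
    # causal mask: each token attends only to itself and earlier tokens
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -np.inf
    return softmax(scores) @ v                    # (seq_len, d_head)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                       # 4 tokens, d_model=8
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 4)
```

At tiny scale this whole thing is a few matrix multiplies you can step through by hand, which is exactly why small models are great for learning.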

Why bother with tiny models?

They run on the CPU. You get daily feedback loops. Building every component yourself cements your understanding.

I’ve already tried:

- A 30M-parameter GPT variant for children’s stories
- A 15M-parameter DeepSeek model with Mixture-of-Experts

I’ll drop links to the code in the first comment.

Looking forward to the discussion and to learning together. See you on Day 1.


💬 Discussion r/LocalLLaMA (184 points, 17 comments)