Hey LocalLLaMA! I tested Unsloth for Llama-3 70b and 8b, and we found our open source package allows QLoRA finetuning of Llama-3 8b to be 2x faster than HF + Flash Attention 2 and uses 63% less VRAM. Llama-3 70b is 1.83x faster and ues 68% less VRAM. Inference is natively 2x faster than HF! Free OSS package: https://github.com/unslothai/unsloth

https://preview.redd.it/hx802xv5ahwc1.png?width=863&format=png&auto=webp&s=f64adeb7e7c33e9e15f32ecb43cd86a1d79488cd

Unsloth also supports 3-4x longer context lengths for Llama-3 8b with +1.9% overhead. On a 24GB card (RTX 3090, 4090), you can do 20,600 context lengths whilst FA2 does 5,900 (3.5x longer). Just use use_gradient_checkpointing = “unsloth” which turns on our long context support! Unsloth finetuning also fits on a 8GB card!! (while HF goes out of memory!) Table below for maximum sequence lengths:

https://preview.redd.it/qe6ag5eabhwc1.png?width=770&format=png&auto=webp&s=818ce5dad21f181e6f48c2d8821320ec26421d33

Llama-3 70b can fit 6x longer context lengths!! Llama-3 70b also fits nicely on a 48GB card, while HF+FA2 OOMs or can do short sequence lengths. Unsloth can do 7,600!! 80GB cards can fit 48K context lengths.

https://preview.redd.it/dunl971xbhwc1.png?width=751&format=png&auto=webp&s=88847332eaab375f4855cacd8a261ce20374ceff

Also made 3 notebooks (free GPUs for finetuning) due to requests:

Llama-3 Instruct with Llama-3’s new chat template. No endless generations, fixed untrained tokens, and more! Colab provides free GPUs for 2-3 hours. https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing Native 2x faster inference notebook - I stripped all the finetuning code out, and left only inference - also no endless generations! https://colab.research.google.com/drive/1aqlNQi7MMJbynFDyOQteD2t0yVfjb9Zh?usp=sharing Kaggle provides 30 hours for free per week!! Made a Llama-3 8b notebook as well: https://www.kaggle.com/code/danielhanchen/kaggle-llama-3-8b-unsloth-notebook

More details on our new blog release: https://unsloth.ai/blog/llama3


💬 Discussion r/LocalLLaMA (174 points, 63 commentaires)