Heya! I’m VB, GPU-poor and working on open source at Hugging Face. Recently, together with Georgi (author of llama.cpp), we made the GGUF-my-repo Space to create GGUF quants quickly and reliably (https://huggingface.co/spaces/ggml-org/gguf-my-repo).

We created this Space to improve the reliability of GGUF quantization and, most importantly, to democratise the process of creating quants: anyone can create quants, even with limited storage or a slow network. All of this at blazing speed, since the converter runs close to the storage.
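For context, here is a rough sketch of the steps the Space automates, if you were to run them locally with llama.cpp. Script and binary names have changed across llama.cpp versions, and the paths and quant type below are illustrative examples, not the Space’s exact settings:

```shell
# Sketch of the manual GGUF quantization workflow that the Space automates.
# Assumes a local HF checkpoint at /path/to/hf-model; Q4_K_M is just an
# example quant type.

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# 1. Convert the Hugging Face checkpoint to a GGUF file (FP16).
#    (In recent llama.cpp releases this script is convert_hf_to_gguf.py;
#    older releases named it differently.)
python convert_hf_to_gguf.py /path/to/hf-model --outfile model-f16.gguf

# 2. Build the quantize tool and produce the quantized GGUF.
cmake -B build && cmake --build build --target llama-quantize
./build/bin/llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```

The Space runs these steps on infrastructure near the Hub’s storage, which is why you don’t need local disk space or a fast connection.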

All the code is public and fully auditable. We don’t store any tokens or files.

With the latest end-of-generation token fixes and improvements merged into llama.cpp (ref: https://github.com/ggerganov/llama.cpp/pull/6745), all Llama 3 checkpoints should work automatically!


Note: this is still very much in development. We want to make it useful, so please do share your feedback below!

More updates coming soon, VB
