If you can’t run Kimi K2 locally, there are now more providers offering API access. DeepInfra is currently the cheapest provider, while Groq is by far the fastest at around 250 tokens per second:

- https://deepinfra.com/moonshotai/Kimi-K2-Instruct ($2.20 in/out per Mtoken)
- https://console.groq.com/docs/model/moonshotai/kimi-k2-instruct ($3 in/out per Mtoken, but very fast)
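
Since Groq exposes an OpenAI-compatible endpoint, querying Kimi K2 there only takes a few lines. A minimal sketch, assuming the `openai` Python package and a `GROQ_API_KEY` environment variable (the model slug is taken from the Groq docs link above):

```python
import os
from openai import OpenAI

# Groq's API is OpenAI-compatible; assumes GROQ_API_KEY is set in the environment.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

response = client.chat.completions.create(
    model="moonshotai/kimi-k2-instruct",  # slug from the Groq docs page above
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```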

That makes it cheaper than Claude Haiku 3.5, GPT-4.1 and Gemini 2.5 Pro. Not bad for the best non-thinking model currently publicly available!

It also shows the power of an open-weights model with a permissive license: even if you can’t run it yourself, there are many more options for API access.

See all providers on OpenRouter: https://openrouter.ai/moonshotai/kimi-k2
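
OpenRouter uses the same OpenAI-compatible pattern and routes across the providers listed there, so switching providers is mostly a matter of changing the base URL and model slug. A sketch, assuming an `OPENROUTER_API_KEY` and the slug from the page above:

```python
import os
from openai import OpenAI

# OpenRouter is also OpenAI-compatible and aggregates the providers listed above.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="moonshotai/kimi-k2",  # slug from the OpenRouter page above
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

Note that only the base URL and model slug differ from the Groq example earlier, which is the practical upside of having many API providers for one open-weights model.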

Edit: There’s also a free variant, but I don’t know the details: https://openrouter.ai/moonshotai/kimi-k2:free


💬 Discussion r/LocalLLaMA (44 points, 13 comments) 🔗 Source