NVIDIA released an open-source speech-to-speech model, PersonaPlex-7B. It listens and talks simultaneously with ~200 ms latency and handles interruptions, backchanneling, and natural turn-taking.
They only shipped a PyTorch + CUDA implementation targeting A100/H100, so I ported it to MLX so that it runs on Apple Silicon: github.com/mu-hashmi/personaplex-mlx.
Hope you guys enjoy!
💬 Discussion r/LocalLLM (25 points, 1 comment)