Multi-modal Phi-3-mini is here! Trained by the XTuner team on ShareGPT4V and InternVL-SFT data, it outperforms LLaVA-v1.5-7B and matches LLaVA-Llama-3-8B on multiple benchmarks. For ease of use, weights are provided in LLaVA, HuggingFace, and GGUF formats.

Model:

https://huggingface.co/xtuner/llava-phi-3-mini-hf

https://huggingface.co/xtuner/llava-phi-3-mini-gguf
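A minimal sketch of querying the HuggingFace-format weights with the `transformers` LLaVA classes. This assumes the `-hf` checkpoint follows the standard LLaVA layout (`LlavaForConditionalGeneration`) and a Phi-3-style chat template with an `<image>` placeholder; both are assumptions here, so check the model card before relying on them.

```python
def build_prompt(question: str) -> str:
    """Build a Phi-3-style chat prompt with a LLaVA image placeholder.

    The exact template is an assumption; verify it against the model card.
    """
    return f"<|user|>\n<image>\n{question}<|end|>\n<|assistant|>\n"


def main() -> None:
    # Heavy imports kept local so the prompt helper stays dependency-free.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    model_id = "xtuner/llava-phi-3-mini-hf"
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )

    # Pair the text prompt with an image and generate a reply.
    image = Image.open("example.jpg")  # placeholder path
    inputs = processor(
        text=build_prompt("What is shown in this image?"),
        images=image,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    print(processor.decode(output[0], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

The GGUF weights instead target llama.cpp-style runtimes, so they are loaded there rather than through `transformers`.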

Code:

https://github.com/InternLM/xtuner

https://preview.redd.it/kwze7ewg2nwc1.jpg?width=1370&format=pjpg&auto=webp&s=a246313916d2a7f11d74810bbe41435a04192b9f

https://preview.redd.it/ur5srs3i2nwc1.jpg?width=1806&format=pjpg&auto=webp&s=7a175e3e391714c39b4a5ffe967282f1a8c4d865

https://preview.redd.it/jk76ea5j2nwc1.jpg?width=2050&format=pjpg&auto=webp&s=6065914aecfc4842e5578cc7650bafa5f358288b


💬 Discussion on r/LocalLLaMA (98 points, 24 comments)