The ability to accurately estimate distances from RGB image input is just at the **frontier of current AI model capabilities**.
Nonetheless, distance estimation is **critical for perception and planning in embodied AI applications like robotics**, which must navigate around our 3D world.
By making an **open-weight** model **small** and **fast** enough to run **on-device**, using **open-source code** and **data**, we aim to democratize embodied AI.
I've updated the comparison between closed APIs with SOTA performance on quantitative spatial reasoning tasks, like distance/size estimation from RGB inputs, and our 3B open-weight model: SpaceThinker.
On the QSpatial++ split of Q-Spatial-Bench, the 3B SpaceThinker's distance-estimation performance lies between gpt-4o and gemini-2.5-pro.
Evaluation Results: https://huggingface.co/remyxai/SpaceThinker-Qwen2.5VL-3B#qspatial-comparison-table-42525
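For a quick local check outside the benchmark harness, the model loads with the standard transformers Qwen2.5-VL classes. A minimal sketch, assuming a recent transformers release with Qwen2.5-VL support (v4.49+); the image path and question are placeholders, not Q-Spatial-Bench items:

```python
# Minimal sketch: one distance-estimation query to SpaceThinker.
# Placeholder image/question; not the benchmark's exact prompt template.
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "remyxai/SpaceThinker-Qwen2.5VL-3B"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("scene.jpg")  # any RGB photo of a scene
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "How far apart are the chair and the table, in meters?"},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```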
Interesting finding: by switching the model name in this Colab to the non-reasoning variant SpaceQwen, you'll find that the step-by-step reasoning prompt actually hurts performance, challenging the convention that non-reasoning models benefit from complex instructions in a way reasoning models don't.
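To replicate that A/B outside the Colab, a sketch like the one below sends the same question with and without a step-by-step instruction. The SpaceQwen repo id, question, and prompt wording here are assumptions, not the Colab's exact setup:

```python
# Hedged sketch of the prompt A/B: same image and question, with and
# without an explicit step-by-step instruction, on the non-reasoning model.
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "remyxai/SpaceQwen2.5-VL-3B-Instruct"  # assumed SpaceQwen repo id
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)
image = Image.open("scene.jpg")  # placeholder image

question = "How far is the plant from the sofa, in meters?"
variants = {
    "direct": question,
    "step-by-step": question + " Reason step by step before giving a number.",
}

for name, text in variants.items():
    messages = [{"role": "user",
                 "content": [{"type": "image"}, {"type": "text", "text": text}]}]
    prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    answer = processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(f"[{name}] {answer}")
```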
By modifying the above Colab, you can also compare SpaceThinker to its base model to assess the performance impact of LoRA SFT on the SpaceThinker dataset: https://huggingface.co/datasets/remyxai/SpaceThinker
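To inspect what the LoRA SFT trained on, the dataset loads directly with the datasets library. A minimal sketch; the split name is an assumption, so check the loaded dataset's features rather than relying on my comments:

```python
# Peek at the SpaceThinker SFT data (dataset id from the link above).
from datasets import load_dataset

ds = load_dataset("remyxai/SpaceThinker", split="train")  # assumed split name
print(ds)     # row count and column names
print(ds[0])  # one sample, to see the image + reasoning-trace format
```

For the base-model side of the comparison, swapping the presumed base checkpoint Qwen/Qwen2.5-VL-3B-Instruct into the inference sketch above gives the before/after contrast.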
💬 Discussion r/LocalLLaMA (35 points, 8 comments) 🔗 Source