Built a business sim where AI agents run a food truck for 30 days — location, menu, pricing, staff, inventory. Same scenario for all models.

Opus made 28K. 8 went bankrupt. Every model that took a loan went bankrupt (8/8).

There’s also a playable mode — same simulation, same 34 tools, same leaderboard. You either survive 30 days or go bankrupt, get a result card and land on the shared leaderboard. Example result: https://foodtruckbench.com/r/9E6925

Benchmark + leaderboard: https://foodtruckbench.com

Play: https://foodtruckbench.com/play

Gemini 3 Flash Thinking — only model out of 20+ tested that gets stuck in an infinite decision loop, 100% of runs: https://foodtruckbench.com/blog/gemini-flash

Happy to answer questions about the sim or results.


💬 Discussion r/LocalLLaMA (244 points, 105 commentaires) 🔗 Source