LLM Benchmark

Flight Combat LLM Comparison

Nine local LLMs were each tasked with building the same 3D browser flight combat game from a single prompt. Same hardware, same rules: judge for yourself.

9 Models tested
Three.js Engine
1 Prompt to start
Local Inference (MLX)
Design and create flight combat simulator game. The game must feature 3d graphics in any style you choose. A Start Screen that allows the user to select the plane they will use. The user may select from three potential options as follows: A fighter Jet, A Propeller Plane, An option of your choosing. Each Plane must have realistic limitations on its performance, which should also be displayed graphically on the plane selection screen. Once the plane is selected and the game started, there will be a dynamic number of opposing planes the user can engage in a dogfight with. There MUST be visible ammunition traces, as well as functional damage implementation for both enemy and player planes. If the player defeats all enemy planes in a round, the level repeats with increased difficulty. If the player loses, the plane they are in becomes uncontrollable and falls to the ground, returning them to the home screen following a 2 second black screen. You may use any library for this implementation, but it must be contained within a single script, and be able to be opened and played in the chrome browser.
Hardware: Apple M3 Max · 128 GB unified memory
Inference server: oMLX
Interface: Claude Code (CLI)
Context window: 256k tokens (identical across all models)
Temperature: 0.8 (identical across all models)

Each model received the same single prompt with no additional instructions. Follow-up prompts were used only to fix bugs; the content of those follow-ups is noted in each HTML file as comments at the top of the file.

Prompt gratefully borrowed from Bijan Bowen.
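
Since the follow-up prompts live as comments at the top of each HTML file, a reader comparing entries may want to pull them out without opening every file. The following is a minimal sketch, assuming the notes are ordinary HTML comments (`<!-- ... -->`) placed before any other markup; the exact comment layout is not specified here, and the function name `leading_comments` and the sample text are hypothetical.

```python
import re

# Matches one HTML comment; DOTALL lets a comment span multiple lines.
_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def leading_comments(html: str) -> list[str]:
    """Return the text of HTML comments that appear before any other content."""
    notes = []
    pos = 0
    while True:
        # Skip whitespace between consecutive comments.
        while pos < len(html) and html[pos].isspace():
            pos += 1
        match = _COMMENT.match(html, pos)
        if match is None:
            # First non-comment content reached (or end of file).
            break
        notes.append(match.group(1).strip())
        pos = match.end()
    return notes


# Hypothetical file contents, for illustration only.
sample = """<!-- Follow-up 1: hypothetical bug-fix note -->
<!-- Follow-up 2: another hypothetical note -->
<!DOCTYPE html>
<html><body><!-- not a leading comment --></body></html>"""

print(leading_comments(sample))
```

To apply it to a real entry, pass in the file's text, for example `leading_comments(open(path, encoding="utf-8").read())`, where `path` points at one of the generated HTML files. Comments that appear after the first piece of markup are deliberately ignored, so only the notes at the top of the file are returned.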