Show HN: Emergence World: World building as a way to evaluate LLMs

deepakakkil | 2 pts | 0 comments

Current LLM benchmarks are broken. We think long-horizon world building could be an interesting additional way to evaluate LLMs, since it combines many demands at once: advanced reasoning, tool calling, working under large-context-window stress, safety, and social and survival pressure from the world. To that end, we released Emergence World. Our first study ran five parallel worlds: four powered by a single model each, OpenAI (GPT-5-Mini), xAI (Grok-4.1), Claude (Sonnet 4.6), and Gemini (3-Flash), plus one world with a mix of models. Early results are on the website.
