Show HN: Emergence World: World building as a way to evaluate LLMs

deepakakkil | 2 pts | 0 comments

Current LLM benchmarks are broken. We think long-horizon world building could be an interesting additional way to evaluate LLMs, because it combines many demands at once: advanced reasoning, tool calling, working under large-context-window stress, safety, and social and survival pressure from the world. To explore this, we released Emergence World. Our first study ran five parallel worlds: four powered by a single model each, namely OpenAI (GPT-5-Mini), xAI (Grok-4.1), Claude (Sonnet 4.6), and Gemini (3-Flash), and a fifth with a mix of models. Early results are on the website.
