I can't find a broad, general-use benchmark for swarm intelligence comparable to the ones that exist for LLMs.

Context: We've been building a swarm intelligence system and are looking to measure how accurate its outcomes are compared to single-model results. Suggestions welcome.
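Without a shared benchmark, one rough way to frame the "swarm vs. single model" comparison is to run every agent on the same labeled items and score the aggregated swarm answer alongside each agent on its own. The sketch below makes several assumptions that are not from the post. It assumes the swarm's final answer is a simple majority vote over agents and that answers are discrete labels scored by exact match. The data layout and the helper names (`majority_vote`, `compare_swarm_to_singles`) are hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Assumed aggregation rule: most common answer wins.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the answer that appears first in the list.
    """
    return Counter(answers).most_common(1)[0][0]


def accuracy(predictions, labels):
    """Exact-match accuracy: fraction of predictions equal to the reference label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def compare_swarm_to_singles(agent_answers, labels):
    """Hypothetical layout: agent_answers[i][j] is agent j's answer to item i."""
    n_agents = len(agent_answers[0])
    # Aggregate each item's answers into one swarm prediction.
    swarm_preds = [majority_vote(row) for row in agent_answers]
    # Score each agent on its own, as the single-model baseline.
    single_accs = [
        accuracy([row[j] for row in agent_answers], labels) for j in range(n_agents)
    ]
    return {
        "swarm": accuracy(swarm_preds, labels),
        "best_single": max(single_accs),
        "mean_single": sum(single_accs) / n_agents,
    }


# Toy data (made up for illustration): 4 items, 3 agents.
agent_answers = [
    ["A", "A", "B"],
    ["C", "B", "B"],
    ["D", "D", "D"],
    ["A", "B", "C"],
]
labels = ["A", "B", "D", "C"]
result = compare_swarm_to_singles(agent_answers, labels)
print(result)
```

Reporting both `best_single` and `mean_single` matters because a swarm that only beats the average agent may still not beat simply using its strongest member. In the toy data above, the swarm ties the best single agent rather than exceeding it.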