Not coding assistants or agents, and not the really hard work where you would obviously just use the latest frontier model.

I'm talking about the daily AI features users interact with directly: generating content, rewriting things, recommendations, workflow helpers, contextual suggestions, etc.

For my app I mostly use Gemini Flash via OpenRouter because the workloads are fairly structured. It works well enough, but there are now dozens of models available and I'm not sure how most teams are evaluating them.

Are people building proper eval suites? Comparing cost/latency? Testing a handful of models and picking the cheapest one that's good enough? I think most people fall into that last bucket, and I want to at least hear what others landed on. Besides Flash, I found that a Qwen or MiniMax model worked fairly well.

Curious what you're running in production and how you chose it.