I keep seeing a pattern: a team fixes a failure in an agent, changes the prompt or model a week later, and the same failure quietly comes back. Nobody catches it until a user does. Curious how people are handling this today. Manual test cases? Evals? Logs? Nothing? Not trying to pitch anything. Just trying to understand how widespread this is and what current approaches look like.