I thought I was going crazy. I was trying to use Gemini 3.5 Flash to rate some answers, but it kept giving 7 instead of 10 for correct answers.

Apparently, once you add a "Grading criteria" text to the prompt, the model's scores collapse toward the center of the scale, whether from hallucination or from training-set overfitting.

Someone on X asked me to try to reproduce it, and I actually got it on the first try in their Gemini chat: https://x.com/XCSme/status/2057613611959279988

I am not sure what to make of this model (or most SOTA models). They got a lot smarter at coding and tool usage, but a lot dumber in other ways...