Show HN: CriteriaBot – A Universal Customizable Classifier

RoyalTnetennba | 1 point | 0 comments

I needed a classifier for nuanced, subjective buckets that fell outside of typical ML use cases (e.g., "Is this a spoiler?", "Is this factually correct?", "Is this user being mean?"). I ended up really happy with the architecture I built to solve it, so I rolled it out as a standalone API and service called CriteriaBot.

WHAT IT DOES:
You give it content and plain-English criteria. It gives you a true/false verdict on whether the content meets those criteria.

HOW IT WORKS:
In addition to a traditional classifier, each classification request is routed through a pool of small, open-weight LLMs to reach a consensus verdict.

I built a pre-vote factorization machine that selects a sub-pool of LLMs optimized for signal strength, based on the embedding of the subject/category. A second factorization machine then reads the votes and the embedding to arrive at a single verdict. That verdict is dynamically adjusted based on the user's history of agreement/disagreement with the models in semantically similar evaluations.

The models are also hooked up to Wikipedia and Wolfram to support edge cases that require current information or mathematical grounding.

FINDINGS:
* With the same harness and sample set, Gemma 4 26B's accuracy is only ~1 percentage point below Opus 4.8.
* Pure oracle is theoretically very good: currently ~98% accuracy for the datasets. I'm using the second factorization machine as the combiner because it can theoretically push past oracle results, but it's an interesting fallback.
* The single most useful LLM surprised me: LFM2 24B contributes the most to the consensus, despite being the worst individually (of the current pool of LLMs). It correlates the least with the other models (perhaps due to its unique architecture?), which makes it a useful signal for some of the problems.
* The legal obligations of handling user-submitted images are... involved. I've disabled image support for everyone except me while I sort that out (in case you were hoping to try out "Hotdog, Not Hotdog").
* Rails singularizes "criteria" as "criterium" (the correct singular is "criterion"), and I didn't realize that was wrong until it was kind of a lot of work to fix.

WHY I'M POSTING:
I'd been dealing with burnout for a while, and getting this running has been incredibly rewarding. Most people in my personal life are non-technical, so it's been hard to get reactions beyond "What is it?".

I'd be thrilled with whatever honest feedback you have.
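The two factorization-machine stages described under HOW IT WORKS can be sketched roughly like this. It is a minimal illustration under stated assumptions, not CriteriaBot's actual code: it assumes a standard second-order factorization machine whose input is the subject/category embedding concatenated with either a one-hot model slot (stage 1, scoring each model's expected signal) or the models' votes (stage 2, the combiner). The model names, `select_subpool`, `combine`, and every weight are hypothetical toy values. The traditional classifier and the user-history adjustment are left out.

```python
from math import exp

# Hypothetical pool and toy parameters, for illustration only.
MODELS = ["model_a", "model_b", "model_c"]


def fm_score(x, w0, w, V):
    """Second-order factorization machine.

    y = w0 + sum_i w_i*x_i + sum_{i<j} <V_i, V_j>*x_i*x_j, with the pairwise
    term computed via 0.5 * sum_f [(sum_i V_if*x_i)^2 - sum_i (V_if*x_i)^2].
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        pairwise += sum(terms) ** 2 - sum(t * t for t in terms)
    return linear + 0.5 * pairwise


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def select_subpool(embedding, params, k):
    """Stage 1 (hypothetical): score each model for this embedding, keep the top k."""
    scores = {
        name: fm_score(embedding + one_hot(i, len(MODELS)), *params)
        for i, name in enumerate(MODELS)
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def combine(embedding, votes, params):
    """Stage 2 (hypothetical): embedding plus votes (+1 true, -1 false, 0 not polled)."""
    vote_features = [float(votes.get(name, 0)) for name in MODELS]
    logit = fm_score(embedding + vote_features, *params)
    prob = 1.0 / (1.0 + exp(-logit))
    return prob >= 0.5, prob


# 2-dim embedding + 3 model slots = 5 features; latent dimension 2.
SELECT_PARAMS = (0.0, [0.1, -0.2, 0.3, 0.0, 0.5],
                 [[0.2, 0.1], [0.0, 0.3], [0.4, -0.1], [0.1, 0.2], [-0.3, 0.2]])
COMBINE_PARAMS = (0.0, [0.0, 0.0, 0.8, 0.6, 0.4],
                  [[0.1, 0.0], [0.0, 0.1], [0.2, 0.1], [0.1, 0.2], [0.1, 0.1]])

embedding = [0.7, -0.2]  # stand-in for a real subject/category embedding
pool = select_subpool(embedding, SELECT_PARAMS, k=2)
votes = {name: 1 for name in pool}  # pretend every selected model voted "true"
verdict, prob = combine(embedding, votes, COMBINE_PARAMS)
print(pool, verdict, round(prob, 3))
```

Scoring every model with one shared FM lets the latent factors pick up interactions between the topic embedding and each model slot, which is one plausible way to get the topic-dependent "signal strength" the post describes; the real system may encode its features quite differently.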

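The FINDINGS about oracle accuracy and about LFM2 24B helping most despite being weakest can be made concrete with a small calculation. This sketch assumes "oracle accuracy" means the share of items that at least one pooled model gets right (the usual definition; the post doesn't spell it out), and it measures correlation between models' per-item correctness, which is an assumption since the post doesn't say which correlation was computed. Model names and votes are made-up toy data, not CriteriaBot results.

```python
from itertools import combinations
from math import sqrt

# Toy data (hypothetical): ground-truth labels and each model's true/false votes.
labels = [True, True, False, False, True, False]
model_votes = {
    "model_a": [True, True, False, False, False, True],
    "model_b": [True, True, False, True, True, True],
    "model_c": [True, False, True, False, False, False],  # weakest on its own
}


def correctness(preds):
    """1.0 where a model's vote matches the label, else 0.0."""
    return [1.0 if p == y else 0.0 for p, y in zip(preds, labels)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def oracle_accuracy(names):
    """Share of items that at least one of the named models gets right."""
    columns = zip(*(correctness(model_votes[n]) for n in names))
    hits = [max(col) for col in columns]
    return sum(hits) / len(hits)


correct = {name: correctness(v) for name, v in model_votes.items()}
accuracy = {name: sum(c) / len(c) for name, c in correct.items()}
pair_corr = {
    (m1, m2): pearson(correct[m1], correct[m2])
    for m1, m2 in combinations(model_votes, 2)
}
# Average correlation of each model's correctness with every other model's.
mean_corr = {
    name: sum(r for pair, r in pair_corr.items() if name in pair) / (len(model_votes) - 1)
    for name in model_votes
}

print("accuracy:", accuracy)
print("mean correlation:", mean_corr)
print("oracle, all models:", oracle_accuracy(model_votes))
print("oracle, without model_c:", oracle_accuracy(["model_a", "model_b"]))
```

With these made-up numbers, model_c has the lowest individual accuracy (3/6) and the lowest average correlation with the others, and dropping it lowers oracle accuracy from 6/6 to 5/6. That mirrors the pattern described for LFM2 24B, though a toy example says nothing about why it happens in the real pool.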
