I open-sourced my UFC prediction model, code, and database after 5 years of work

I Open-Sourced My UFC Prediction Model, Weights, and Database - Dan McInerney

Five years, 15,000+ hours, an 8% ROI since 2024, and a lot of machine-learning mistakes preserved in amber.

Published June 5, 2026.

Repo: DanMcInerney/mma-ai.

Database + model: huggingface.co/datasets/DanMcInerney/mma-ai.

11/12 positive ROI events in 2026 baaaaaby!!

Intro

Back in 2011 I read *The Singularity Is Near* by Ray Kurzweil. I found the evidence extremely compelling and knew my future lay in AI, but I was just a broke college kid finishing a useless psychology degree. I had been training BJJ and Muay Thai for a few years, took and won my first fight, then immediately locked myself in my room and designed a five-year plan to learn hacking, Python, Linux administration, and networking at the same time. Learn something, automate it with Python, release it open source, and do it again.

Fast forward to 2020. I was a senior security researcher and bored with the job. I had automated most of it away, so I locked myself in my room again for a few years and built the mma-ai.net model. At that point, there was only one other UFC modeler: wolftickets.ai. I cold emailed him for advice and ended up becoming his coworker at an AI security startup that was later acquired by Palo Alto Networks.

And here we are. Five years and 15,000+ hours into MMA-AI's database and model. It is officially time to open source it.

I'm fairly certain this is the first time a Vegas-beating machine-learning model has been completely open sourced. Since 2024 it has posted about 8% ROI, which puts it into the same category as the world's best UFC sharps. Except I never had to do tape study or any of that boring stuff. I just hit enter on my keyboard like a nerd.
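For anyone unfamiliar with how a number like that is usually calculated: betting ROI is normally total profit divided by total amount staked. The sketch below assumes flat one-unit stakes and American odds. The bets in it are made up for illustration and are not from the MMA-AI record, and this post does not spell out exactly how the 8% figure is computed.

```python
def profit_per_unit(american_odds: int) -> float:
    """Profit on a winning 1-unit stake at the given American odds."""
    if american_odds > 0:
        return american_odds / 100
    return 100 / -american_odds


def roi(bets: list[tuple[int, bool]], stake: float = 1.0) -> float:
    """Return on investment: total profit divided by total staked."""
    if not bets:
        raise ValueError("need at least one bet")
    profit = 0.0
    for odds, won in bets:
        # A win pays out at the odds; a loss costs the full stake.
        profit += stake * profit_per_unit(odds) if won else -stake
    return profit / (stake * len(bets))


# Hypothetical bets: (American odds, did the bet win?)
example_bets = [(-150, True), (+200, True), (+120, False), (-110, False)]
print(f"ROI: {roi(example_bets):.1%}")
```

Real bankroll accounting gets more complicated once stake sizes vary from bet to bet, but the ratio is the same: profit over money risked.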

I should note this is a complete release: code, model weights, and database. I'm fairly certain it is the largest database of UFC stats in the world, and it includes incredibly granular historical odds. Not just closing odds, but odds scraped by the hour starting at open. Truly a treasure trove of information. Coverage is probably about 60-70% of all fights in the last 10 years.
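To give a sense of what hourly odds make possible, here is a minimal sketch that converts American odds into implied probabilities, strips out the bookmaker margin, and measures how far a line moved between open and close. The snapshot values and tuple layout are hypothetical and do not reflect the actual database schema, and proportional rescaling is only one of several ways to remove the vig.

```python
def implied_prob(american_odds: int) -> float:
    """Implied win probability of American odds (bookmaker margin included)."""
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return -american_odds / (-american_odds + 100)


def no_vig(odds_a: int, odds_b: int) -> tuple[float, float]:
    """Rescale both sides' implied probabilities so they sum to 1."""
    pa, pb = implied_prob(odds_a), implied_prob(odds_b)
    total = pa + pb
    return pa / total, pb / total


# Hypothetical hourly snapshots for one fight:
# (hours since open, fighter A odds, fighter B odds)
snapshots = [(0, -120, +100), (1, -130, +110), (2, -150, +125)]

open_a, _ = no_vig(snapshots[0][1], snapshots[0][2])
close_a, _ = no_vig(snapshots[-1][1], snapshots[-1][2])
print(f"Fighter A moved from {open_a:.3f} to {close_a:.3f} ({close_a - open_a:+.3f})")
```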

The codebase and database are an utter terror, by the way. It is five years of stream-of-consciousness programming, and it is so big and bloated that I'm scared to refactor it in case I introduce one of those tiny bugs that blow the model up, as I have done in the past. Forgive the code. It works, and that is what's important.

I think what I'm most proud of is the thousands of hours of collaboration, teaching, and learning I've done over these five years with other people interested in this incredibly niche hobby of machine learning for sports prediction. Huge shoutout to the OG in the space, wolftickets.ai. He basically taught me everything I know, and I've done my best to pass that knowledge on to dozens and dozens of people who reached out to me for help over the years.

It's funny. We all make the same mistakes early on. Wolftickets would tell me something about machine learning and I'd be like, no, that can't be right, I can do it better. Then two years later I'd be doing exactly what he told me to do.

MMA-AI

I packaged the model into a Docker container with a local interface. The database and model weights are on Hugging Face, and the Docker container pulls them down. Go to the Predict tab and hit predict to see the next event, or whatever event you choose.

Up until now I have been doing this manually by running Python scripts, so I had Codex whip this container and web app up so nontechnical folks can use it. If something is broken or not working, ask Claude Code or Codex to fix it and submit a PR. I'm not interested in turning this into some grand feature-complete package. It is just a simple way to use the model without any technical knowledge.

I added a small feature that lets you include an LLM API key so you can quickly query the database for data analytics too, but I'd suggest using Claude Code or Codex for more detailed analysis. I also included strong AGENTS.md and README.md files so they can understand the disorganized mess that is the code and database.

Funny Mistakes

1. Model Ensembles

I started with a single XGBoost model. Wolftickets told me he uses a library called AutoGluon to create an array of models and that this is generally best practice. So I ignored that, thinking I could tune this thing myself better than some group of ivory tower academics trying to put training wheels on real engineering.

Two years later, I switched to AutoGluon. Turns out I'm not a better machine-learning engineer than a funded team of experts.
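The core idea behind that advice can be shown without any libraries. The sketch below averages the predicted probabilities of three hypothetical models and compares log loss. The numbers are invented, and AutoGluon itself does considerably more than this (it trains many model types and combines them with techniques like bagging, stacking, and weighted ensembling), so treat it as an illustration of why ensembles help and not as how MMA-AI is built.

```python
import math


def log_loss(y_true: list[int], probs: list[float]) -> float:
    """Mean binary log loss; lower is better."""
    eps = 1e-15
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -math.log(p) if y == 1 else -math.log(1 - p)
    return total / len(y_true)


def average_ensemble(model_probs: list[list[float]]) -> list[float]:
    """Average each fight's predicted probability across models."""
    return [sum(ps) / len(ps) for ps in zip(*model_probs)]


# Hypothetical outcomes (1 = fighter A won) and three models' P(fighter A wins)
y = [1, 0, 1, 1, 0]
models = [
    [0.70, 0.40, 0.55, 0.90, 0.20],
    [0.60, 0.30, 0.80, 0.50, 0.45],
    [0.85, 0.55, 0.60, 0.65, 0.10],
]

individual = [log_loss(y, m) for m in models]
ensemble = log_loss(y, average_ensemble(models))
print("individual:", [round(loss, 3) for loss in individual])
print("ensemble:  ", round(ensemble, 3))
```

Because log loss is convex, the averaged prediction can never score worse than the average of the individual models' scores, although it can still lose to the single best model.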

2. Hyperparameter Optimization

Wolftickets told me hyperparameter optimization was mostly a waste of time. I thought, no! That can't be true. All these tutorials on Medium talk about HPO as a basic skill. All these tutorials are high on Google search for "machine learning tutorials." They can't be dumb and terrible!

Incorrect. They were all dumb and...
