The 48-Hour Cancer Binder


A hackathon field note on designing an FGFR2-selective protein binder, from target biology and hotspot selection to BoltzGen generation and off-target scoring.

Author Ludovico Comito

Published May 2026

I am writing this blog post while on my way home from my first protein-design themed hackathon. Three days ago, when I first arrived in Zurich, I had no idea how it would turn out: as an ML person, I had general knowledge of protein design models but had never put it into practice on a real problem. Our team was a mix of CS and biology people, but none of us worked specifically in protein design. We still got to design the best binding protein on a real-world cancer problem. Besides sharing the technical details of the solution, I wanted to write this as a logbook on how to approach protein hackathons, and to make ML people less scared of this kind of domain.

Problem and reasoning

In these hackathons, the structure is pretty clear: you are given a target protein and some constraints, and the goal is to design a novel protein that binds best to that target while respecting the desired specifications. In our case the target protein was FGFR2, and we had to design an inhibitor binder for it, with the additional specification that our binder should not bind to FGFR1, another very similar protein.

Before describing what those proteins are and what we wanted to do, the first lesson for the ML person is this: be prepared to deal with acronyms. The protein world is full of them.

FGFR2 is a receptor found on the surface of cells. Its normal role is to receive signals from molecules called fibroblast growth factors, such as FGF1. When FGF1 binds to FGFR2, the receptor becomes active and sends signals inside the cell. These signals can tell the cell to grow, divide, or survive. That is useful in normal biology, but it becomes a problem when FGFR2 is mutated or overactive. In some cancers, FGFR2 signaling is too strong or constantly active, which can help tumor cells keep growing. For this reason, FGFR2 is an interesting therapeutic target: if we can block its activation, we may reduce a cancer-promoting signal.
1DJS, the portion of FGFR2 that binds to FGF.

The goal of our designed binder is to interfere with the normal interaction between FGFR2 and FGF1. In other words, the binder should occupy or block the region where the natural ligand would bind, making FGFR2 less likely to become activated.

This brings an important challenge: selectivity. FGFR2 is part of a family of very similar receptors. FGFR1, in particular, has a structure very close to FGFR2. If we design a binder that only "likes" FGFR2 in a generic way, it may also bind FGFR1. That would be a problem, because FGFR1 has its own normal roles in the body, and blocking it could cause unwanted side effects. In the structural overlay below, FGFR2 is green and FGFR1 is gray. They share substantial structural similarity, which means the binder has to be specific rather than merely sticky.

FGFR2 and FGFR1 overlap substantially, making selectivity a central design constraint.

Let us now look at the quantitative side, for the ML people. When designing a binder, we can use a number of affinity measures to estimate how well it binds to the target protein. The chosen metric for this hackathon was iPSAE, a confidence score for protein-protein binding. Our goal was to maximize iPSAE between our designed binder and FGFR2, while minimizing the same score with respect to FGFR1. Keeping the FGFR1 score low is commonly called minimizing off-target binding.

We will talk later about the models and tools we used to generate protein binders and measure iPSAE, but at this stage of the hackathon, after figuring out the main problem, it was time for our biology teammates to shine. A crucial step was figuring out which regions of the target our binder should attach to, and making principled choices grounded in protein biology.

Design choices

The first priority was to clarify the key design choices: which specific regions of the target the binder should engage, and which ML tools we could use to generate it.
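For ML readers, the two-sided scoring objective described earlier (high iPSAE against FGFR2, low iPSAE against FGFR1) can be pictured as a simple ranking problem. The sketch below is only an illustration under stated assumptions: the `Candidate` class, the `selectivity` function, the difference-based score, the `off_target_weight` parameter, and all the numbers are hypothetical, and this is not the hackathon's official scoring rule or necessarily what our team used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A designed binder with two precomputed scores (values are made up)."""
    name: str
    ipsae_fgfr2: float  # on-target score: higher is better
    ipsae_fgfr1: float  # off-target score: lower is better


def selectivity(c: Candidate, off_target_weight: float = 1.0) -> float:
    """Hypothetical objective: reward on-target score, penalize off-target score."""
    return c.ipsae_fgfr2 - off_target_weight * c.ipsae_fgfr1


def rank(candidates, off_target_weight: float = 1.0):
    """Sort candidates from most to least selective under the assumed objective."""
    return sorted(
        candidates,
        key=lambda c: selectivity(c, off_target_weight),
        reverse=True,
    )


# Illustrative numbers only, not real hackathon results.
candidates = [
    Candidate("design_a", ipsae_fgfr2=0.80, ipsae_fgfr1=0.70),  # sticky, not selective
    Candidate("design_b", ipsae_fgfr2=0.65, ipsae_fgfr1=0.20),  # weaker but selective
    Candidate("design_c", ipsae_fgfr2=0.40, ipsae_fgfr1=0.05),  # selective but weak
]

ranked = rank(candidates)
for c in ranked:
    print(f"{c.name}: selectivity = {selectivity(c):.2f}")
```

With these made-up numbers, the design with the highest on-target score ends up last, because it scores almost as well against FGFR1. Setting `off_target_weight` to 0 recovers a plain on-target ranking, which is exactly the "merely sticky" behavior we wanted to avoid.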
With less than 48 hours available, this step is critical. There is not much time to iterate through many trials. Generating batches of candidate binders takes time, and we wanted to generate large batches to maximize the probability of getting at least one good binder.

At this point we decided to split based on expertise: the biology people identified the target regions, while the ML people set up the model pipeline and made sure it worked. During this phase, constant dialogue matters. While choosing the biological constraints, you must make sure you have models that can satisfy them. This is also how you learn a lot about the biology side of the problem; most of what I am writing here I learned during the hackathon.

Identifying the target

When designing our binder, we first had to decide which part of FGFR2 to target. This was important because FGFR2 is...
