>>>> gd2md-html alert: inline image link in generated source and store images to your server.<br>* Footnote support in HTML is alpha: please check your footnotes.<br>----->
Research as a Stochastic Decision Process
In this post I will talk about an approach to research (and other projects that involve high uncertainty) that has substantially improved my productivity. Before implementing this approach, I made little research progress for over a year; afterwards, I completed one project every four months on average. Other changes also contributed, but I expect the ideas here to at least double your productivity if you aren't already employing a similar process.
Below I analyze how to approach a project that has many somewhat independent sources of uncertainty (we can often think of these as multiple "steps" or "parts" that each have some probability of success). Is it best to do these steps from easiest to hardest? From hardest to easiest? From quickest to slowest? We will eventually see that a good principle is to "reduce uncertainty at the fastest possible rate". After revealing issues with more simplistic approaches, I will articulate this principle in detail and show how to apply it. Throughout, I draw my examples primarily from problems in machine learning and mathematics, but I believe that the principles generalize to other situations as well.
Warm-Up
Suppose you are embarking on a project with several parts, all of which must succeed for the project to succeed. For instance, a proof strategy might rely on proving several intermediate results, or an applied project might require achieving high enough speed and accuracy on several components. What is a good strategy for approaching such a project? For me, the most intuitively appealing strategy is something like the following:
(Naive Strategy)
Complete the components in increasing order of difficulty, from easiest to hardest.
This is psychologically tempting: you do what you know how to do first, which can provide a good warm-up to the harder parts of the project. This used to be my default strategy, but often the following happened: I would do all the easy parts, then get to the hard part and encounter a fundamental obstacle that required scrapping the entire plan and coming up with a new one. For instance, I might spend a while wrestling with a certain algorithm to make sure it had the statistical consistency properties I wanted, but then realize that the algorithm was not flexible enough to handle realistic use cases.
The work on the easy parts was mostly wasted--it wasn't that I could replace the hard part with a different hard part; rather, I needed to re-think the entire structure, which included throwing away the "progress" from solving the easy parts.
What might be a better strategy than the naive strategy above? Since the naive strategy has the problem that we waste effort on the easy components if the hard components are intractable, maybe it would be better to complete the components in decreasing order of difficulty, starting from the hardest and moving to the easiest.
This might be better, but our intuitive sense of hardness likely combines many factors--the likelihood that the task fails, the time it takes to complete, and perhaps others as well. Here is an example:
Task A is a detailed and tricky calculation, but you have done many similar calculations before and are confident that given a few days you will succeed. Task B will likely take much less time, but it is something you haven't done before (so it is more likely there will be an unforeseen difficulty or problem).
In this case, task B would be better to do first--if you do task A first and then B turns out doomed, you have wasted several days. Even if A also has some chance of failing (so that it is both more likely to fail and takes longer than B), we would still usually rather do B before A.
This reveals that harder tasks should not necessarily be prioritized. Rather, we should prioritize tasks that are more likely to fail (so that we remove the risk of them failing) but also tasks that take less time (so that we've wasted less time if one of the tasks does fail, and also so that we get information about tasks more quickly).
A Better Strategy: Sorting by Information Rate
We can incorporate both of the above desiderata by sorting the tasks based on which are most informative per unit time.
(Better Strategy)
Do the components in order from most informative per unit time to least informative per unit time.
To implement this, we need a method for quantifying informativeness. I will present two methods below--one based on expected time saved, and one based on failure rate. Rather than define these rigorously upfront, I will work through several examples, which should make the general case evident.
Method 1: Expected Time Saved
If an earlier step fails, we save time by not having to attempt the later steps. We should therefore complete the steps in the order that...