The Hardware Lottery
Sara Hooker -- August, 2020
Introduction
Hardware, systems and algorithms research communities have historically had different incentive structures and fluctuating motivation to engage with each other explicitly. This historical treatment is odd given that hardware and software have frequently determined which research ideas succeed (and fail).
This essay introduces the term hardware lottery to describe when a research idea wins because it is suited to the available software and hardware and not because the idea is universally superior to alternative research directions. History tells us that hardware lotteries can obfuscate research progress by casting successful ideas as failures and can delay signaling that some research directions are far more promising than others.
These lessons are particularly salient as we move into a new era of closer collaboration between hardware, software and machine learning research communities. After decades of treating hardware, software and algorithms as separate choices, the catalysts for closer collaboration include changing hardware economics, a “bigger is better” race in the size of deep learning architectures and the dizzying requirements of deploying machine learning to edge devices.
Closer collaboration has centered on a wave of new generation hardware that is "domain specific" to optimize for commercial use cases of deep neural networks. While domain specialization creates important efficiency gains, it arguably makes it even more costly to stray off the beaten path of research ideas. While deep neural networks have clear commercial use cases, there are early warning signs that the path to true artificial intelligence may require an entirely different combination of algorithm, hardware and software.
This essay begins by acknowledging a crucial paradox: machine learning researchers mostly ignore hardware despite the role it plays in determining what ideas succeed. What has incentivized the development of software, hardware and algorithms in isolation? What follows is part position paper, part historical review that attempts to answer the question, "How does tooling choose which research ideas succeed and fail, and what does the future hold?"
Separate Tribes
It is not a bad description of man to describe him as a tool<br>making animal.
— Charles Babbage, 1851
For the creators of the first computers, the program was the machine. Early machines were single use and were not expected to be repurposed for a new task, because of both the cost of the electronics and a lack of cross-purpose software. Charles Babbage’s difference engine was intended solely to compute polynomial functions (1817). Mark I was a programmable calculator (1944). Rosenblatt’s perceptron machine computed a step-wise single layer network (1958). Even the Jacquard loom, which is often thought of as one of the first programmable machines, was in practice so expensive to re-thread that it was typically threaded once to support a pre-fixed set of input fields (1804).
Early computers such as the Mark I were single use and were not expected to be repurposed. While the Mark I could be programmed to compute different calculations, it was essentially a very powerful reprogrammable calculator and could not run the variety of programs that we expect of our modern day machines.
The specialization of these early computers was out of necessity and not because computer architects thought one-off customized hardware was intrinsically better. However, it is worth pointing out that our own intelligence is both algorithm and machine. We do not inhabit multiple brains over the course of our lifetime. Instead, the notion of human intelligence is intrinsically associated with the physical 1,400 g of brain tissue and the patterns of connectivity between an estimated 85 billion neurons in your head.
When we talk about human intelligence, the prototypical image that probably surfaces as you read this is of a pink ridged cartoon blob. It is impossible to think of our cognitive intelligence without summoning up an image of the hardware it runs on.
Today, in contrast to the necessary specialization in the very early days of computing, machine learning researchers tend to think of hardware, software and algorithm as three separate choices. This is largely due to a period in computer science history that radically changed the type of hardware that was made and incentivized hardware, software and machine learning research communities to evolve in isolation.
The general purpose computer era crystallized in 1965, when an opinion piece by a young engineer called Gordon Moore appeared in Electronics magazine with the apt title “Cramming more components onto integrated circuits”. Moore predicted you could cram double the number of transistors onto an integrated circuit every year, a pace he revised to every two years in 1975. Originally, the article and subsequent follow-up were motivated by a simple...