Data-Oriented Design

tosh1 pts0 comments

Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP) – Games from Within

Picture this: Toward the end of the development cycle, your game crawls, but you don’t see any obvious hotspots in the profiler. The culprit? Random memory access patterns and constant cache misses. In an attempt to improve performance, you try to parallelize parts of the code, but it takes heroic efforts, and, in the end, you barely get much of a speed-up due to all the synchronization you had to add. To top it off, the code is so complex that fixing bugs creates more problems, and the thought of adding new features is discarded right away. Sound familiar?

That scenario pretty accurately describes almost every game I’ve been involved with for the last 10 years. The reasons aren’t the programming languages we’re using, nor the development tools, nor even a lack of discipline. In my experience, it’s object- oriented programming (OOP) and the culture that surrounds it that is in large part to blame for those problems. OOP could be hindering your project rather than helping it!

It’s All About Data

OOP is so ingrained in the current game development culture that it’s hard to think beyond objects when thinking about a game. After all, we’ve been creating classes representing vehicles, players, and state machines for many years. What are the alternatives? Procedural programming? Functional languages? Exotic programming languages?

Data-oriented design is a different way to approach program design that addresses all these problems. Procedural programming focuses on procedure calls as its main element, and OOP deals primarily with objects. Notice that the main focus of both approaches is code: plain procedures (or functions) in one case, and grouped code associated with some internal state in the other. Data-oriented design shifts the perspective of programming from objects to the data itself: The type of the data, how it is laid out in memory, and how it will be read and processed in the game.

Programming, by definition, is about transforming data: It’s the act of creating a sequence of machine instructions describing how to process the input data and create some specific output data. A game is nothing more than a program that works at interactive rates, so wouldn’t it make sense for us to concentrate primarily on that data instead of on the code that manipulates it?

I’d like to clear up potential confusion and stress that data-oriented design does not imply that something is data- driven. A data-driven game is usually a game that exposes a large amount of functionality outside of code and lets the data determine the behavior of the game. That is an orthogonal concept to data-oriented design, and can be used with any type of programming approach.

Ideal Data

If we look at a program from the data point of view, what does the ideal data look like? It depends on the data and how it’s used. In general, the ideal data is in a format that we can use with the least amount of effort. In the best case, the format will be the same we expect as an output, so the processing is limited to just copying that data. Very often, our ideal data layout will be large blocks of contiguous, homogeneous data that we can process sequentially. In any case, the goal is to minimize the amount of transformations, and whenever possible, you should bake your data into this ideal format offline, during your asset-building process.

Because data-oriented design puts data first and foremost, we can architect our whole program around the ideal data format. We won’t always be able to make it exactly ideal (the same way that code is hardly ever by-the-book OOP), but it’s the primary goal to keep in mind. Once we achieve that, most of the problems I mentioned at the beginning of the column tend to melt away (more about that in the next section).

When we think about objects, we immediately think of trees— inheritance trees, containment trees, or message-passing trees, and our data is naturally arranged that way. As a result, when we perform an operation on an object, it will usually result in that object in turn accessing other objects further down in the tree. Iterating over a set of objects performing the same operation generates cascading, totally different operations at each object (see Figure 1a).

To achieve the best possible data layout, it’s helpful to break down each object into the different components, and group components of the same type together in memory, regardless of what object they came from. This organization results in large blocks of homogeneous data, which allow us to process the data sequentially (see Figure 1b). A key reason why data-oriented design is so powerful is because it works very well on large groups of objects. OOP, by definition, works on a single object. Step back for a minute and think of the last game you worked on: How many places in the code did you have only one of something? One enemy? One vehicle?...

data game oriented design code programming

Related Articles