Are Determinacy-Race Bugs Lurking in YOUR Multicore Application? - Intel® Software Network
Last Modified On: October 27, 2009 6:58 PM PDT
by Charles Leiserson
Race conditions are the bane of concurrency. Famous race bugs include those in the Therac-25 radiation therapy machine, which killed three people and injured several others, and the one behind the North American Blackout of 2003, which left over 50 million people without power. These pernicious bugs are notoriously hard to find. You can run regression tests in the lab for days without a failure, only to discover that your software crashes regularly in the field. If you're going to multicore-enable your application, you need a reliable way to find and eliminate race conditions.
Different types of race conditions exist depending on the synchronization methodology (e.g., locking or condition variables) used to coordinate parallelism in the application. Perhaps the most basic race condition, and the easiest to understand, is the “determinacy race,” because this kind of race doesn’t involve a synchronization methodology at all. A program is deterministic if it always does the same thing on the same input, no matter how the instructions are scheduled on the multicore computer. It is nondeterministic if its behavior might vary from run to run. Often, a parallel program that is intended to be deterministic isn’t, because it contains a determinacy race.
In the following examples, we’ll assume that the underlying hardware supports the sequential consistency memory model. Under this model, a parallel program’s execution can be viewed as an interleaving of the steps of its processes, threads, strands, or whatever abstraction the parallel-programming model uses for an independent locus of control.
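This small worked count is an illustration and is not taken from the original text. It shows why testing every schedule by hand quickly becomes impractical. Suppose two strands run in parallel and consist of m and n steps. A sequentially consistent execution is then any merge of the two step sequences that preserves each strand's own order. The number of possible interleavings is therefore the binomial coefficient below, shown here with m = n = 3:

```latex
\binom{m+n}{m} = \frac{(m+n)!}{m!\,n!},
\qquad m = n = 3:\quad \binom{6}{3} = \frac{6!}{3!\,3!} = 20
```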
A simple example
Let’s look at an example of a determinacy-race bug, and then we’ll define determinacy races more precisely. The following Cilk++ code illustrates a determinacy race on a shared variable x:
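#include <cassert>

void incr (int *counter) {
    ++(*counter);            // read, add 1, write back: not a single atomic step
}

int main() {
    int x(0);
    cilk_spawn incr (&x);    // spawned call may run in parallel with the next line
    incr (&x);
    cilk_sync;               // wait here until the spawned incr() has returned
    assert (x == 2);         // may fail in a parallel run
    return 0;
}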
The cilk_spawn keyword calls incr() but allows control to continue to the following statement. The cilk_sync keyword says control shouldn’t go past this point until the spawned subroutine has completed. In an ordinary serial execution (equivalent to executing the code with the cilk_spawn and cilk_sync keywords nulled out), the result is that the value of x is increased to 2. This parallel code has a bug, however, and in a parallel execution, it might sometimes produce 1 for the value of x. To understand why, it’s helpful to have a clearer model of parallel-program execution.
A model for parallel-program execution
We can view the program execution as being broken into four “strands.” A strand is a sequence of instructions that doesn’t contain any parallel control, such as cilk_spawn or cilk_sync. Strand A begins with the start of the program and ends at the cilk_spawn statement. Two subsequent strands, B and C, are created at the cilk_spawn statement: B executes the spawned subroutine incr(), and C executes the called subroutine incr() on the next line. These two strands join at the cilk_sync statement, where Strand D begins. Strand D consists of the instructions from the cilk_sync to...