Producing Wrong Data Without Doing Anything Obviously Wrong


I keep sharing a few interesting articles I really enjoyed. The first one is old but a must-read for anyone measuring performance for comparison purposes, as it reports really unexpected measurement bias.

Producing Wrong Data Without Doing Anything Obviously Wrong! by Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, Peter F. Sweeney, ASPLOS 2009. DOI:

The second one is a more recent article, brought to my attention by Lucas Nussbaum and Tom Cornebize, which reports other kinds of biases (clusters that are far less uniform than originally claimed or expected, SSDs whose lazy management induces a memory effect that can persist for many weeks across multiple reboots, …).
Other interesting biases are reported in Taming Performance Variability by Aleksander Maricq, Dmitry Duplyakin, Ivo Jimenez, Carlos Maltzahn, Ryan Stutsman and Robert Ricci

Although I do not agree with all their statistical prescriptions, I fully agree with the importance of randomization, of thorough replication (stay away from pseudo-replication), of designed experiments, and of empirical knowledge…
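To make the randomization point concrete, here is a minimal sketch (mine, not taken from either paper) of interleaving the replicates of several configurations in a shuffled order, so that slow drifts in machine state do not line up with one configuration. The names `randomized_schedule`, `measure` and the `workloads` dictionary are hypothetical, and plain Python callables stand in for real benchmarks.

```python
import random
import time


def randomized_schedule(names, replicates, seed):
    """Return every (name, replicate) pair once, in a shuffled order."""
    rng = random.Random(seed)  # seeded so the run order itself can be reproduced
    runs = [(name, rep) for name in names for rep in range(replicates)]
    rng.shuffle(runs)
    return runs


def measure(workloads, replicates, seed=0):
    """Time each workload `replicates` times, following the shuffled schedule."""
    results = {name: [] for name in workloads}
    for name, _rep in randomized_schedule(list(workloads), replicates, seed):
        start = time.perf_counter()
        workloads[name]()
        results[name].append(time.perf_counter() - start)
    return results


if __name__ == "__main__":
    # Toy workloads, purely for illustration.
    timings = measure(
        {"A": lambda: sum(range(10_000)), "B": lambda: sum(range(20_000))},
        replicates=5,
        seed=42,
    )
    for name, samples in timings.items():
        print(name, samples)
```

Note that shuffling run order only guards against time-related drift. It does nothing about pseudo-replication: repeating a measurement inside the same process, on the same node, still gives correlated samples.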

If others have nice articles (possibly in other domains!) to share, please do so… We’ll make a “best of” at some point! :slight_smile:



Frédéric Wagner, a colleague from Grenoble, recently pointed me to this video:

The talk is really nice and pedagogical. The first part discusses the measurement issues from the paper by Mytkowicz et al. and presents a way to circumvent them by regularly shuffling/randomizing the address space at runtime, which then allows the use of simple statistics. I like this, as I see it as a form of (controlled) measurement apparatus.
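As an illustration of what "simple statistics" can mean once the runs can be treated as independent samples, here is a rough sketch of a mean with a normal-approximation confidence interval. This is not the tool from the talk: `mean_ci` is a hypothetical helper, and the approximation assumes a reasonable number of independent runs.

```python
import math
import statistics


def mean_ci(samples, confidence=0.95):
    """Return (mean, low, high) using a normal approximation.

    Assumes the samples are independent, which is what randomizing
    the layout between runs is meant to make plausible.
    """
    n = len(samples)
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * sem, mean + z * sem


if __name__ == "__main__":
    # Made-up run times in seconds, for illustration only.
    print(mean_ci([1.02, 0.98, 1.05, 1.01, 0.97, 1.03, 1.00, 0.99]))
```

With only a handful of runs, a Student t quantile would be a safer choice than the normal one used here.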

The second part is not really related to reproducibility issues but those who like multi-threaded program measurement issues should enjoy it.





On a similar topic, there is a summary of performance analysis methods and anti-methods by Brendan Gregg, which I found interesting:

I think some of these methods can be very useful for people starting to do low-level system/application performance analysis. This blog page also contains links to several in-depth methodology explanations.