Hello,
I'm continuing to share a few interesting readings I really enjoyed. The first is an older one. It is still a must-read for anyone who measures performance for comparison purposes, because it reports a truly unexpected source of measurement bias.
Producing Wrong Data Without Doing Anything Obviously Wrong! by Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, Peter F. Sweeney, ASPLOS 2009. DOI: http://doi.acm.org/10.1145/1508244.1508275
The second is a more recent article that Lucas Nussbaum and Tom Cornebize brought to my attention. It reports other kinds of biases, for example clusters that are far less uniform than originally claimed or expected, and SSDs whose lazy management induces a memory effect that can persist for many weeks, even across multiple reboots, …
Other interesting biases are reported in Taming Performance Variability by Aleksander Maricq, Dmitry Duplyakin, Ivo Jimenez, Carlos Maltzahn, Ryan Stutsman and Robert Ricci.
Although I do not agree with all of their statistical prescriptions, I fully agree with their emphasis on randomization, thorough replication (stay away from pseudo-replication!), designed experiments, and empirical knowledge…
If others have nice articles (possibly in other domains!) to share, please do so… We’ll make a “best of” at some point!