




Inadequate Record-Keeping in Machine-Learning Research


It appears that many researchers in machine learning, including some who profess to be scientists, are not keeping proper records of their experiments. Even with the assistance of version-control systems, they often fail to write down which versions of code libraries they are using, where their data sets come from and what they contain, how they massaged and cleaned their data sets, and what tweaks they made to their algorithms and to the configuration and initialization of their networks.
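The record-keeping the paragraph above calls for can be made concrete. The following is a minimal sketch of an experiment manifest, using only the Python standard library; the function name and configuration fields are hypothetical illustrations, and real projects might instead reach for a dedicated tool such as MLflow or DVC.

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone

def experiment_manifest(config, data_path, seed):
    """Record what the paragraph above says often goes unrecorded:
    library and interpreter versions, data provenance, algorithm
    tweaks, and the seed used to initialize the network."""
    try:
        # Tie the run to an exact code revision, if inside a git repository.
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        commit = "unknown"
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version,
        "platform": platform.platform(),
        "git_commit": commit,
        "data_path": data_path,  # where the data set came from
        "seed": seed,            # network initialization
        "config": config,        # algorithm and hyperparameter tweaks
    }

# Hypothetical usage: the config and path are placeholders.
manifest = experiment_manifest({"learning_rate": 0.01}, "data/train.csv", seed=42)
print(json.dumps(manifest, indent=2))
```

Writing such a manifest alongside every run costs a few lines of code and makes the cherry-picking described below at least visible after the fact.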

They redesign their experiments on the fly, interrupt and restart them, cherry-pick results from various runs, and reuse partially trained neural networks as starting points for subsequent experiments without properly documenting the process.

As a result, machine learning as a discipline is now facing a devastating crisis: researchers cannot reproduce one another's experiments, or even their own, and so cannot confirm their results.

“The Machine Learning Reproducibility Crisis”
Pete Warden, Pete Warden's Blog, March 19, 2018

In many real-world cases, the researcher won't have made notes or remember exactly what she did, so even she won't be able to reproduce the model. Even if she can, the frameworks the model code depend[s] on can change over time, sometimes radically, so she'd need to also snapshot the whole system she was using to ensure that things work. I've found ML researchers to be incredibly generous with their time when I've contacted them for help reproducing model results, but it's often [a] months-long task even with assistance from the original author.
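Part of the "snapshot the whole system" problem Warden describes can be addressed by recording the versions of every installed package at the time of a run. A sketch using only the standard library (the function name is an illustration, not an established API):

```python
from importlib import metadata

def freeze_environment():
    """Return a {package_name: version} mapping for every installed
    distribution, so the software environment of a run can be compared
    or reconstructed later."""
    # Note: a broken installation can yield a distribution whose
    # metadata lacks a Name; such entries appear under the key None.
    return {d.metadata["Name"]: d.version for d in metadata.distributions()}

env = freeze_environment()
```

This captures package versions only; a full snapshot of the kind Warden means would also need the OS, drivers, and hardware details, which is why container images are often used instead.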

#machine-learning #reproducibility #scientific-method


This work is licensed under a Creative Commons Attribution-ShareAlike License.


John David Stone

created June 1, 2014 · last revised December 10, 2018