
"Science is facing a "reproducibility crisis" where more than two-thirds of researchers have tried and failed to reproduce another scientist's experiments, research suggests. "

Ironically, the article presents this as a bad thing, but in an ideal world this figure would be 100%.

It would be like saying "two-thirds of coders have reviewed a colleague's code and found bugs". Since bugs are basically unavoidable, the fact that a third haven't found any points more towards them not looking hard enough.

edit: pretty much everyone seems to have taken this the opposite way from how I intended it, but re-reading it I can't figure out why. I'll try to rephrase:

Science cannot be perfect every time. It's just too complex. This is why you need thorough peer review, including reproduction. But if that peer review/reproduction is thorough, then it's going to find problems. When the system is working well, basically everyone will at some point have found a problem in something they were reproducing. This is good, because that problem can then be fixed and the work becomes reproducible, or it gets withdrawn. The current situation is that people don't even look for the problems, and no one can trust the results.

edited again to change "peer-reviewed" -> "peer-reviewed including reproduction"




It is a terrible thing, and it is absolutely nothing like finding bugs.

Reproducibility is a core requirement of good science, and if we need to compare it to software engineering, the reproducibility crisis is like the adage "many eyes make all bugs shallow" when the assumption that many eyes are even looking is often untrue. Most studies are never reproduced, but are held as true under the belief that if someone tried, they could.

EDIT: You claimed that in an ideal world, 100% of experiments/studies would not be reproducible. This denotes a profound misunderstanding of the scientific process, or of the whole basis of reproducibility. In an ideal world, 100% of studies would be vetted through reproduction, and 100% of them would be reproducible. This is essentially the fundamental assumption of the scientific process.


No, I claimed that all scientists would have had the experience of failing to reproduce something. Because if they do it a lot, as part of a regular process, then they will eventually find something that doesn't work, because the original scientist didn't document a step correctly, or misread the results, or just got lucky due to random chance.

Just like all developers will eventually find a bug in code they review. This is different from all the code they review having bugs.


While the wording may be vague, they aren't talking about the experiences of a subset of researchers -- they are saying that of the experiments they tried to replicate, two-thirds weren't reproducible. That is terrible, and has absolutely nothing to do with finding bugs.


Science publishing is based on a peer review system. All evident bugs (and many not-so-evident ones) should be caught before they appear in a journal. It's totally different from standard journalism.


Replication happens after publication. Why is everyone misunderstanding ZeroGravitas' point?


I agree that there are no test suites providing experiments to run, and no "make test" to repeat an experiment with little effort. This accounts for the "too complex" part.

However, many experiments should be reproducible. Not making results testable works against the goal of sharing knowledge. But I understand that's an extra effort compared to the current state of the art, and it must be rewarded and acknowledged. In another comment I proposed including reproducibility in the h-index.
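
To make that concrete, here's a rough Python sketch of one way a replication-aware h-index could work. The "replicated" flag and the all-or-nothing weighting are just assumptions for illustration, not a worked-out metric:

    def h_index(citations):
        # Classic h-index: the largest h such that h papers have >= h citations.
        counts = sorted(citations, reverse=True)
        return sum(1 for rank, c in enumerate(counts, start=1) if c >= rank)

    def replication_aware_h_index(papers):
        # Same calculation, but a paper only counts once it has been
        # independently replicated. `papers` is a list of
        # (citation_count, replicated) pairs.
        return h_index([c for c, replicated in papers if replicated])

    papers = [(50, True), (30, False), (12, True), (8, True), (3, False)]
    print(h_index([c for c, _ in papers]))    # 4: ordinary h-index
    print(replication_aware_h_index(papers))  # 3: only replicated papers count

A softer variant could weight citations by replication status instead of dropping unreplicated papers entirely; the point is just that the metric is easy to compute once replication attempts are recorded somewhere.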


The whole point of an experiment is to isolate a single variable so you can test a falsifiable statement about it.

It's the exact opposite of building a system, which is what coders do.


I don't think that analogy is at all accurate and I think the conclusion that you reach from it is completely incorrect.


I think I must have expressed myself poorly, as my conclusion is the same as the article suggests, i.e. that the science/code shouldn't be considered "done" until it's been peer-reviewed, since it's easy to fool yourself and others if you're not actually reviewing and testing your code.

What did you take away from my comment?


Peer review does not imply reproducibility, and it's the latter that is the problem.

I can confirm, as a reviewer, that your methodology and analysis look sensible, but the flaws may lie deeper. The fact that you didn't publish the 19 other studies that failed and that this is the "lucky one", or that you simply cherry-picked the data, is not something I can see as a reviewer.

This is especially true if the experiment is nontrivial to re-do.


" Peer review does not imply reproducibility, and it's the latter that is the problem."

I think this is the key to it: I'm suggesting that reproducibility should be part of considering something peer-reviewed, but of course, as currently practised, that isn't true.

Of course, in a software metaphor that would probably cover both code review and QA, which is sometimes done by a different job role; that further muddies the water.


This would be like expecting a car brand to open its code to its competitors before releasing a new product. It would also typically lead to the peers rushing to publish the same discovery in disguise before the original.


After reading your explanations elsewhere, I take back my statements. Your statements are literally correct, although easy to take incorrectly, and I support them: in particular, you seem to be arguing that 100% of researchers should attempt to replicate studies, and do so enough that all of them will eventually fail to replicate at least some studies. I think most of us took this to mean that you thought every study should fail to replicate (by analogy with every piece of software having at least some bugs), but I now see your intent, and that your original wording backed it up.


If there are errors in a study's methods that make it unreplicable, then it shouldn't have passed peer review or been published.


Then you risk losing all of Einstein's work unless you happen to have two Einsteins at the same time. Geniuses are scarce. And you could not publish anything about comets, for example, because it would be unreplicable for the next 20 years. It's not as simple as that.


How do you know it's unreplicable until someone tries to replicate it?


I work with principal investigators (PhDs/MDs) at UPenn, automating some of their data analysis pipelines.

They all have secret checklists for BS detection in the papers they read. Certain labs set off red flags for them, as do certain techniques that are too fuzzy or too easy to mess up.

Everyone seems to have their own heuristics, and no one seems to take any article at face value anymore.

I hear PIs say stuff is unreplicable all the time.



