Hypothesis-free big science: is it good for you?
Sugar-free food is often marketed as being healthier. What about hypothesis-free research? “Consorting with big science” is the title of a 2014 editorial in Nature Neuroscience that praises the advantages of large-scale research collaborations in the form of consortiums.
It continues the ongoing debate about how best to use limited research funding in the face of huge challenges in neuroscience. The usual arguments are made, such as the advantage of pooling funds and the inefficiency of smaller, competing laboratories that duplicate the same work instead of complementing each other. The editorial highlights a concept central to how it views large-scale collaborations: being hypothesis-free.
Technically, the editorial uses the term “hypothesis-free data”, which it further characterizes as “unbiased”. Data biasing is an important topic in itself, but the author is really referring to data collected without the specific objective of benefiting one particular laboratory. The piece emphasizes the value of churning out large data sets, but it doesn’t address the risks of generating data for the sake of data. In that case, progress may be measured by quantity rather than quality. Krešimir Josić had an interesting post titled “Can science become too big to fail?”. In a later comment regarding Obama’s BRAIN Initiative, he stated, “And if the goals are not clearly defined, then it really is impossible to fail.”
And what about that data biasing I mentioned? Someone must decide which methods will produce the most useful results. A risk not addressed in the editorial is the enormous waste of a massive project built on flawed methods. Armies of scientific soldiers are still soldiers, not a committee, and they will march without much thought. Another risk of big data, or big science, is that it may pursue sudden, epic increases in scale rather than gradual ones that allow processes to evolve. I have not seen clear evidence of plans to manage that growth.
In my last post, I pointed out the controversies surrounding Henry Markram and his large-scale projects. Part of the controversy, like much of the global debate, concerns the methods and the types of data that will be produced. One could say that Markram has a methodological hypothesis, and his detractors don’t agree with it. I feel that the real misconception in the philosophy of big data is that the scientific world just needs the data, and tons of it. I disagree. What we need are methods that enable as many researchers as possible to collect their own data. Certainly, a major thrust of big data is developing new technologies to acquire the data, but not with the intent of letting everyone use those technologies.
As a computational neuroscientist, I am happy to say that government-funded, publicly accessible supercomputing facilities are recognized as a general tool worth major investment. I am not aware of any comparable government-funded infrastructure for individual experimental scientists who require expensive techniques. Having worked in an electrophysiology lab, I realize it would be challenging to run some sort of “public laboratory”. However, some neuroscience labs already function as data factories, employing dedicated technicians for every aspect of the work. Smaller labs may be able to outsource work to other labs or to commercial companies, and I admit that I don’t know whether a public laboratory would improve on this.
To summarize my points, I feel that focusing on creating big data is misguided. There is no truly “hypothesis-free data”. So let’s acknowledge that the methods really do matter, and that it would be better to focus on creating publicly accessible tools. Then the big data will come anyway, and it is likely to be far more explosive, quantitatively and conceptually, than anything we can conceive of.