Post written by Toni Hermoso, bioinformatician at the CRG.
It’s been almost a decade since the term “Open Science” first appeared in Wikipedia. The page was created by Aaron Swartz and initially redirected to the “Open Access” entry. Some years later this young activist took his own life under the pressure of the judicial charges brought against him after he had downloaded a huge number of paywalled academic articles with the intent of sharing them on the Internet.
Parallel to these events, Creative Commons licenses, a set of licensing recommendations intended to foster sharing in the digital world, became increasingly popular, and many novel publishing initiatives took advantage of them to promote open access to scientific literature.
At the same time, more and more government agencies began to demand that the beneficiaries of their funding make their publication results openly available within a certain period of time. So, if research was not originally published in an open-access journal (the gold road), it should eventually be uploaded to an institutional repository (the green road). Furthermore, preprints, already common practice in the physical sciences, started to become widespread in the biosciences after the creation of portals such as bioRxiv.
However, despite the bloom of open-access (OA) journals and the introduction of more favourable legislation, there are still strong concerns about the future of open access in science. This is mostly because the publishing sector is effectively controlled by very few parties, which often keep content behind paywalls. One reaction to this situation is evidenced by initiatives such as Sci-Hub, which defiantly provides free access to those restricted articles.
In any case, there is more to Open Science than Open Access. We could highlight at least two other major facets: Open Data and Open Methodology. These are two indispensable pillars for making reproducibility in modern science actually possible. Open data may range from initial raw data (straight from machines or sensors) to final outcomes such as chart images or spreadsheets. The recent data flood has necessitated the creation of established public open repositories (e.g. the Sequence Read Archive or the European Variation Archive) so that researchers can freely reuse and review existing material.
These repositories also commonly require that data be made available in an open format, so that other researchers may process them with tools or versions different from those originally used. This latter aspect is intimately associated with Open Source, which is also essential for ensuring a reproducible methodology. As a consequence, an increasing number of journals require submitters to provide both data and program code, so that reviewers can verify for themselves that the results are as claimed.
The present challenge is how to transfer those good practices (which originated in the software-engineering world and later permeated into the computational sciences) to the wider scientific community, where the systems under study may be far less controllable (e.g., organisms or population samples). To help with this, there is an increasing effort to train scientists in technologies such as version control systems (e.g. Git, and hosting platforms like GitHub), wikis and digital lab notebooks. All of these systems enable several different parties to collaborate in an open and traceable way.
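As a minimal sketch of what such training covers, the shell session below tracks a small analysis script with Git so that every change is recorded openly and traceably; the file and author names here are hypothetical, not taken from any particular project.

```shell
# Sketch: putting an analysis under version control with Git.
mkdir my-analysis && cd my-analysis
git init -q                              # start an empty repository
git config user.name  "Ada Researcher"   # identity recorded in each commit
git config user.email "ada@example.org"

# A first (hypothetical) analysis script.
echo 'counts <- read.table("data/counts.tsv")' > analysis.R
git add analysis.R                       # stage the new file
git commit -q -m "Add first version of the analysis script"

git log --oneline                        # history: who changed what, and when
```

Each commit captures an immutable snapshot with an author and a timestamp, which is what makes the methodology traceable: a reviewer can reconstruct exactly which version of the code produced a given result.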
Even though some practices in everyday scientific activity, such as peer review, are still under experimentation within the open umbrella, we may hope that in the future more and more of the key points discussed above will simply be taken for granted. At that stage we might not even need to distinguish Open Science from simply SCIENCE anymore.