Every day, 2.5 quintillion bytes of data are created. Extracting value from data has become one of the most important needs of most organizations. Today, using and sharing data can help us solve real-world problems and improve our lives.
Data never sleeps
Data can be a powerful thing. Most organizations already know that extracting value from data is one of their most important needs. This shift will drive companies to capture and preserve more of the data they generate so that they can transform it and find value in it. Data are used to support smarter business decisions. This power is even more important in science and healthcare.
Knowledge about the use of data in medicine is growing. This has a strong impact on drug development, on clinical trials and, ultimately, on patients. The same phenomenon is observed in science: without data and appropriate analysis, research in this area is of little use. Furthermore, the amount of genomic data is set to increase massively. Genomic research promises major advances in our understanding of health and disease. Why? DNA is powerful data. But genetic data alone may not be enough to produce results. To improve the accuracy of research, it is important to have access to different genetic datasets and to population information. The data themselves are not the problem: big data is a standard, part of our everyday life. Access to the data, however, is challenging.
Today, data sharing is a crucial part of clinical research throughout the world. Sharing data accelerates research, drug development, clinical trials, personalized medicine and the development of innovative products. However, these benefits will be gained only if researchers and clinicians can access, compare and seek patterns across the genomes of a large number of individuals. Indeed, most studies require the aggregation of genomic data from large cohorts of patients before any associations can be discovered with reasonable confidence.
Data, be more OPEN!
The key to solving this problem is simple: let's make data more open! What is open data? Open data and content can be freely used, modified, and shared by anyone for any purpose.
Open data are revolutionizing our reality, and the benefits are substantial. Open data can easily be reused and repurposed for complementary development. Data sharing encourages transparency and accountability to participants, beneficiaries, peers and data subjects. It pushes scientists to verify the quality of their data, which is then checked by a larger group of interested parties. Data sharing encourages more connection and collaboration between researchers, which can result in important new findings within the field. Sharing data increases data circulation and use within the scientific community, enables reproducibility of results, and informs the larger scientific community. Data publication encourages researchers to manage their data better and to ensure that their data are of high quality. Moreover, with greater availability of data, researchers can perform meta-analyses on current research topics. Consequently, data sharing can maximize the impact of data and of the conclusions drawn from them, extend their usefulness to a wider audience than your own project, and play a role in decision making within other projects.
Despite the many benefits of open data, there are important considerations that researchers must be aware of when sharing data. There are concerns that others will use the data inappropriately or outside the context of the original purpose of the research. Moreover, genomic and clinical data are sensitive information, so apprehensions about maintaining confidentiality are reasonable. Current legal regulations allow participants who see that their data are used in a way they do not agree with, or in a way that puts them in danger, to refuse to participate in subsequent research. It is very important to think carefully about what kind of data gets shared. Special relationships, licenses and agreements should be established to govern limited sharing. Whether or not you choose to make your data open, there are a number of different licenses that you can use for your case.
Unfortunately, because medical databases are diverse and fragmented, data formats, processing and analysis methods, and data transfer are inconsistent. This often leads to lost opportunities for scientific advancement. The difficulty of sharing genetic data for research purposes is exacerbated by the fact that genomic and clinical data are still generally collected by individual institutions and studied disease by disease. Although international guidelines have facilitated sharing, many countries have put in place strict provisions governing international sharing, and a few even prohibit it entirely.
The power of the data
Most organizations and institutions have big data. Many of them understand the need to harness those data and extract value from them. Sharing, in the sense of publishing open data, is a growing trend. The open data and open government movements (e.g. the website dane.gov.pl, established by the Polish government), which are part of a trend toward better transparency and accountability, as well as recent interest in shared measurement for project evaluation, are just some examples of how the international norm for sharing data has gained powerful traction in recent years. This trend has also motivated the creation of many public databases, e.g. The Cancer Genome Atlas (TCGA), the database of Genotypes and Phenotypes (dbGaP), the European Genome-phenome Archive (EGA), the ICGC Data Portal, the Catalogue Of Somatic Mutations In Cancer (COSMIC) and the Gene Expression Omnibus (GEO), which are priceless and crucial for research. Furthermore, with today's technology, it is possible to analyze huge amounts of data and get answers almost immediately. Moreover, the development of methods to analyze nucleic acids has transformed biological inquiry and has the potential to alter the practice of medicine. Such analysis uncovers hidden patterns, correlations and other insights. Data without analysis are useless. Indeed, the connection between basic inquiry and potential clinical translation has never been clearer.
To share or not to share, that is the question…
Although the data generated by large-scale cancer genome characterization efforts have been and will continue to be made publicly available, accessing and using these cancer genome data remains a major challenge. We believe in a future where diseases like cancer are curable. We are sure that the solution to this challenge is within our reach. The way to this world leads through multiomics data and Artificial Intelligence that translates them into big discoveries. At Ardigen we pursue this vision. And we are convinced that data sharing makes a difference. For all of us.
Bibliography:
[1] D.L. Longo, J.M. Drazen, Data sharing, N Engl J Med. 2016; 374: 276–277.
[2] T. Haeusermann, B. Greshake, A. Blasimme, D. Irdam, M. Richards, E. Vayena, Open sharing of genomic data: Who does it and why?, PLoS One. 2017 May 9; 12(5).
[3] L. Chin, W.C. Hahn, G. Getz, M. Meyerson, Making sense of cancer genomic data, Genes Dev. 2011 Mar 15; 25(6): 534–555.
[4] https://www.domo.com/learn/data-never-sleeps-6
[5] https://responsibledata.io/resources/handbook/chapters/chapter-02c-sharing-data.html
[6] https://datamakespossible.westerndigital.com/value-of-data-2018/