26 February 2019
Authors: Bożena Milanović, Emilia Strycharz-Angrecka

Open Data That Change Our Reality: the Complementary Development of Research and Drug Discovery Processes

Some 2.5 quintillion bytes of data are created each day. Extracting value from data has become one of the most important needs of most organizations. Nowadays, we can improve the world and our lives by using data and sharing it. The importance of data sharing is the topic of today’s article. Let’s learn how big data has already changed the world – and which course our reality will take.

What you’ll learn from this article:

  1. How important is the process of data sharing in the medical and biotech fields,
  2. How the importance of genomic open data changed the landscape of modern lab research,
  3. What are the benefits of sharing data between medical facilities,
  4. What are the concerns related to big data usage,
  5. What are the public open datasets worth knowing,
  6. Conclusion.

The Vast Importance of Data in the Biotechnology and Medical Industries

There is no doubt in the drug discovery landscape – data is powerful. Most organizations already know that extracting value from data is one of their most vital needs. This awareness will drive companies to capture and preserve more of the data they generate, so that they can transform it and make it as valuable as possible. Smaller businesses also use many pieces of data to make informed decisions. This power is even more important in the areas of science and health.

Knowledge about the usage of data in the medical industry is steadily increasing. This awareness impacts the drug development process, clinical trials, and, finally, the patients themselves. The same applies to science, a field that relies on data and meticulous analysis.

The Rising Relevance of Open Genomic Data

The amount of genomic data is increasing massively. Research based on it promises vast advances in our understanding of health and disease. Why? DNA-related data is especially powerful. However, relying solely on one genetic dataset may not be sufficient to obtain results.

Consequently, having access to different genetic datasets and population information is necessary, as it improves the accuracy of research. The data themselves are not the problem. Relying on big data has become a standard, and we use it in one way or another in our everyday lives. However, we often do not realize that accessing such data can pose a major challenge.

The Importance of Data Sharing Between Medical Facilities

Nowadays, data sharing is a crucial part of clinical research worldwide. It accelerates research, drug development, clinical trials, personalized medicine, and the development of innovative products. However, we can enjoy such benefits only if researchers and clinicians can access the genomes of many individuals, compare them, and seek patterns across them. Indeed, most studies require the aggregation of genomic data from large cohorts of patients before any dependencies can be discovered with reasonable confidence.
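Why aggregation matters for confidence can be illustrated with a small calculation. The sketch below is only an illustration: the cohort counts are invented, the function name is hypothetical, and it uses the simple normal-approximation (Wald) interval for a proportion rather than any method named in the studies discussed here. It compares how precisely the frequency of a variant can be estimated from one site versus from several pooled cohorts.

```python
import math


def pooled_frequency_interval(cohorts, z=1.96):
    """Estimate a variant's carrier frequency across pooled cohorts.

    cohorts: list of (carriers, participants) tuples (hypothetical numbers).
    Returns (frequency, lower_bound, upper_bound) using the
    normal-approximation interval; z=1.96 corresponds to roughly 95%.
    """
    carriers = sum(c for c, _ in cohorts)
    total = sum(n for _, n in cohorts)
    p = carriers / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Invented example data: one hospital alone vs. three hospitals sharing data.
single_site = [(12, 200)]
shared = [(12, 200), (55, 1000), (130, 2300)]

p_single, low_single, high_single = pooled_frequency_interval(single_site)
p_shared, low_shared, high_shared = pooled_frequency_interval(shared)

single_width = high_single - low_single
shared_width = high_shared - low_shared

print(f"single site: {p_single:.3f} (interval width {single_width:.3f})")
print(f"pooled data: {p_shared:.3f} (interval width {shared_width:.3f})")
```

With these made-up numbers, the pooled estimate has a much narrower interval than the single-site one, which is the statistical reason large shared cohorts are needed before a pattern can be trusted.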

Data Sharing: the Groundbreaking Necessity of the Modern Lab Research

The solution to this problem seems to be an easy one: Let’s make data more open! However, to grasp all the nuances of this issue, we need to establish several basic premises. First, let us define the most relevant term – the “open data” themselves. 

The National Center for Mobility Management defines open data as content that “can be freely used, modified, and shared by anyone for any purpose [7].” As such, its accessibility is vital for transforming our reality. It brings numerous benefits that our field nowadays relies on. In short, data sharing: 

  • Allows data to be easily reused and repurposed for complementary development.
  • Encourages transparency and accountability to participants, beneficiaries, peers, and data subjects.
  • Forces scientists to verify the quality of the data, which is then checked by a larger group of interested parties.
  • Encourages more connection and collaboration between researchers, which can result in important new findings within the field.
  • Increases data circulation and use within the scientific community.
  • Enables reproducibility of results and informs the larger scientific community.
  • Encourages researchers to manage their data better and ensure its quality.
  • Enables researchers to perform meta-analyses on the current research topic.

Consequently, data sharing can maximize the impact of data and the conclusions drawn from them, increase efficiency for a wider audience than a single project, and play a role in decision-making within other projects.

Concerns Relating to Open Data in the Medical Field

Despite the many benefits gained from open data, there are considerations that researchers must be aware of when sharing them. The following issues remain valid today and need constant attention as the biopharmaceutical and medical fields develop.

  • Some concerns relate to the possibility of other parties using datasets inappropriately or out of context from the original research’s purposes.
  • Moreover, genomic and clinical data are sensitive information. Therefore, the growing concerns about maintaining confidentiality remain reasonable. Current legal regulations allow participants to see how their data are used. That way, they can make sure that the data are not used in a way they do not agree with or that puts them in danger, and they can refuse to participate in subsequent research.
  • It is very important to think carefully about what kind of data gets shared. Special relationships, licenses, and agreements should be established to govern limited sharing. Whether you choose to make your data open or to restrict its availability to the public, there are several different licenses that you can use for your case.
  • Unfortunately, the diversity and fragmentation of medical databases cause inconsistency in data formats, processing and analysis methods, and data transfer. This often leads to lost opportunities for scientific advancement. The difficulty in sharing genetic data for research purposes is exacerbated by the fact that genomic and clinical data are still generally collected by individual institutions and studied within single diseases. Even though international guidelines have facilitated sharing, many countries have put in place strict provisions governing international sharing, and a few even prohibit it entirely.
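The format fragmentation described in the last point can be made concrete with a minimal sketch. Everything in it is assumed for illustration: the two "sites", their field names, their date formats, and the `harmonize` helper are hypothetical and do not describe any real database or standard. The idea is simply that records must be mapped to one common schema before they can be analyzed together.

```python
from datetime import datetime

# Hypothetical exports from two institutions that describe the same kind of
# record with different field names, date formats, and gene-symbol styles.
site_a = [{"patient_id": "A-001", "dob": "1980-03-14", "gene": "tp53"}]
site_b = [{"PatientID": "B-17", "BirthDate": "14/03/1975", "Gene": "BRCA1 "}]

# Assumed mapping from each site's local field names to a common schema.
FIELD_MAPS = {
    "site_a": {"patient_id": "patient_id", "dob": "birth_date", "gene": "gene"},
    "site_b": {"PatientID": "patient_id", "BirthDate": "birth_date", "Gene": "gene"},
}

# Assumed date format used by each site.
DATE_FORMATS = {"site_a": "%Y-%m-%d", "site_b": "%d/%m/%Y"}


def harmonize(record, site):
    """Convert one site-specific record into the common schema."""
    mapping = FIELD_MAPS[site]
    out = {common: record[local] for local, common in mapping.items()}
    # Normalize dates to ISO 8601 (YYYY-MM-DD).
    parsed = datetime.strptime(out["birth_date"], DATE_FORMATS[site])
    out["birth_date"] = parsed.date().isoformat()
    # Normalize gene symbols: trim whitespace and use upper case.
    out["gene"] = out["gene"].strip().upper()
    # Keep track of where the record came from.
    out["source"] = site
    return out


merged = [harmonize(r, "site_a") for r in site_a]
merged += [harmonize(r, "site_b") for r in site_b]

for row in merged:
    print(row)
```

In practice, each additional institution adds another mapping to maintain, which is one reason shared formats and agreed standards matter so much for data sharing.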

Where to Gather Big Data From? The Public Databases Worth Knowing

Most organizations and institutions have big data. Many understand the need to harness them to extract all the valuable information. Sharing them by publishing open data is a growing trend.

The open data and open government movements (for example, the website in Poland established by the Polish government) show how the international norm for data sharing has gained powerful traction in recent years. They provide better transparency and accountability for the parties involved in their usage. Also, the possibility of using them in shared measurement for project evaluation encourages an atmosphere of cooperation and mutual responsibility.

This trend has also motivated institutions to establish many public databases. They are a priceless source of valuable information that esteemed scientific facilities use in their research. To name a few of them:

  • The Cancer Genome Atlas (TCGA), 
  • The database of Genotypes and Phenotypes (dbGaP), 
  • The European Genome-phenome Archive (EGA), 
  • The ICGC Data Portal, 
  • The Catalogue Of Somatic Mutations In Cancer (COSMIC), 
  • The Gene Expression Omnibus (GEO)

Furthermore, with today’s technology, the analysis of huge amounts of data is fairly easy. You can get answers from it almost immediately, as the development of methods to analyze nucleic acids has transformed biological inquiry. It has the potential to alter the practice of medicine, as the technological solutions now uncover hidden patterns, correlations, and other insights. Data without those analyses are useless. Indeed, the connection between raw inquiry and potential clinical translation has never been clearer.

Data Sharing in the Medical Field: Conclusion

Although the data generated from the large-scale cancer genome characterization efforts have been and will continue to be available publicly, accessing and using these cancer genome data remains a significant challenge. We believe in a future where diseases like cancer are curable. We are sure that the solution to this challenge is in our hands. The way to this reality leads through multi-omics data and Artificial Intelligence – solutions that translate information into great discoveries. At Ardigen, we pursue this vision and believe that data sharing makes a difference – for all of us.

Works Cited:

[1] D. L. Longo, J. M. Drazen, Data sharing, N Engl J Med. 2016; 374: 276–277.

[2] T. Haeusermann, B. Greshake, A. Blasimme, D. Irdam, M. Richards, E. Vayena, Open sharing of genomic data: Who does it and why?, PLoS One. 2017 May 9; 12(5).

[3] L. Chin, W. C. Hahn, G. Getz, M. Meyerson, Making sense of cancer genomic data, Genes Dev. 2011 Mar 15; 25(6): 534–555.





