This blog post walks through the rapid expansion of possibilities that a steady stream of technological advances has brought to transcriptomics. It also highlights the challenges and opportunities that Ardigen has worked through with its clients to deliver solutions that unlock advanced analysis in this domain.
Biology has historically been a visual discipline. Many of its great discoveries were catalyzed by the microscopes that Antoni van Leeuwenhoek built and used in the 17th century, and visual observation remained the status quo until well into the 20th century. The advent of next-generation sequencing (NGS) propelled biology into the Big Data era, with the amount of information challenging our ability to process it. Just as the Large Hadron Collider once challenged physicists with event streams in the range of hundreds of GB per second, constantly updated NGS methods and novel protocols now challenge biologists with data on the order of TB per run.
The amount of biological data is growing not only in terms of terabytes. Its dimensionality is growing as well, both in the number of data types and in spatial dimensions. The field of transcriptomics, in particular, has moved from a snapshot of average gene expression in a tissue or organ to single-cell resolution. Initially, bulk transcriptomics provided insights into average gene expression across tissues or entire organs, capturing broad patterns but masking heterogeneity among individual cells. Advances in microarray technology and bulk RNA-sequencing (RNA-seq) enabled a deeper understanding of gene expression profiles, albeit as aggregates of cellular populations.
The shift to single-cell transcriptomics marked a transformative leap, driven by the development of techniques like droplet-based single-cell RNA-seq (scRNA-seq). These methods enabled researchers to resolve gene expression at the level of individual cells, unveiling cellular diversity, rare cell populations, and dynamic transcriptional states within tissues.
In addition, considerable progress has been made in developing transcriptomic methods for studying isoforms and noncoding elements. Traditional RNA-seq methods using short-read sequencing offer insights into the abundance of isoforms by aligning fragments to a reference genome. Newer methods, like full-length transcript sequencing and single-cell RNA-seq, go a step further by resolving both coding and noncoding RNA species, including lncRNAs, miRNAs, and circRNAs. Protocols such as SMART-seq and Ribo-Zero rRNA depletion enhance the ability to detect these elements, providing critical information on their roles in gene regulation and cellular behavior.
Novel gene expression analysis methods enable researchers to identify and quantify specific isoforms through spliced read alignment and exon-exon junction mapping. For instance, long-read RNA sequencing (from PacBio or Oxford Nanopore) directly resolves full-length transcripts and so captures alternative splicing events. This allows researchers to distinguish between gene isoforms, their tissue-specific expression, and the roles they play in cellular functions or diseases.
The next advance came with single-cell methods, which distinguish the individual cells of complex tissues and magnify expression differences within a single cell type, such as stochastic and cyclic oscillations in expression. Much can be gained from single cells. For blood cells, single-cell data give a complete picture of expression, which makes them a perfect tool for research on circulating cells of the immune system. For every other tissue, however, and even for studies of a lymph node or in situ inflammation, the missing information on where the measured cells were located may hide a significant part of the biology, especially in highly organized organs and in processes that depend on layers and interface contact.
The introduction of spatial transcriptomics, and the rapid progress of the technologies behind it, has changed this status quo for gene expression studies. Unlike traditional transcriptomics, which averages gene expression across samples, spatial transcriptomics provides a detailed map of gene activity within tissues and organs: a vivid portrait of biology. This breakthrough enables researchers to study gene expression in its precise spatial context, uncovering mechanisms previously hidden in aggregate data. Technologies like Visium by 10x Genomics and Slide-seq allow high-resolution mapping of RNA molecules in situ, offering insights into the molecular microenvironment.
This capability transforms our understanding of complex biological systems, such as tumor microenvironments, brain organization, and immune responses. By linking gene expression to precise spatial locations, spatial transcriptomics helps identify how cells interact in their native environments, uncovering mechanisms of disease progression and facilitating the development of targeted therapies. For example, it can reveal spatial patterns in cancer, highlighting tumor-immune interfaces or regions of hypoxia, which are critical for designing novel treatment strategies. Similarly, in neuroscience, spatial transcriptomics can map gene expression across brain regions, aiding in the study of neural circuits and neurodegenerative diseases.
Despite its transformative potential, spatial transcriptomics introduces significant technical and computational challenges. The main technical challenge is the tradeoff between spatial resolution (image sharpness) and the number of genes surveyed (depth). With technical progress in the field, this tradeoff will likely be resolved soon. Data analysis, however, remains the biggest hurdle to fully utilizing the potential of spatial transcriptomics studies.
The shift from bulk to spatial transcriptomics has led to a steep increase in data complexity. What was once a single column of gene expression data is now a multi-dimensional dataset, encompassing thousands of genes across two or three spatial dimensions. This complexity is further compounded by the need to analyze gene expression correlations across neighboring cells, track cell-to-cell interactions, and integrate spatial data with other modalities, such as clinical metadata or chemical structures.
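To make "gene expression correlations across neighboring cells" concrete, here is a minimal sketch of one standard statistic for it, Moran's I, computed over a k-nearest-neighbor graph of measurement spots. The function names, the toy 10 x 10 grid, and the choice of k = 4 are illustrative assumptions, not a description of any particular pipeline.

```python
import numpy as np

def knn_weights(coords, k=4):
    """Binary spatial weights: w[i, j] = 1 if spot j is among the k nearest to spot i."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a spot is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    weights = np.zeros_like(dist)
    rows = np.repeat(np.arange(len(coords)), k)
    weights[rows, nearest.ravel()] = 1.0
    return weights

def morans_i(values, weights):
    """Moran's I: close to +1 when neighboring spots carry similar values."""
    z = values - values.mean()
    n = len(values)
    return (n / weights.sum()) * (z @ weights @ z) / (z @ z)

# Toy data (hypothetical): a 10 x 10 grid of spots and one "gene".
rng = np.random.default_rng(0)
xs, ys = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
weights = knn_weights(coords, k=4)

gradient_gene = coords[:, 0]                      # expression rises smoothly along x
shuffled_gene = rng.permutation(gradient_gene)    # same values, spatial order destroyed

i_gradient = morans_i(gradient_gene, weights)
i_shuffled = morans_i(shuffled_gene, weights)
print(f"gradient: {i_gradient:.2f}, shuffled: {i_shuffled:.2f}")
```

The two genes have identical value distributions, so a bulk or non-spatial analysis cannot tell them apart; only the spatial statistic separates them. The dense distance matrix used here is fine for a toy grid but would need a sparse neighbor search at realistic spot counts.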
Even fundamental aspects like computational power pose significant hurdles. Small increases in linear resolution or depth can sharply increase data volumes: doubling the linear resolution quadruples the number of measured positions in two dimensions and multiplies it eightfold in three. The problem grows with the rising interest in larger datasets that capture experimental results as comprehensively as possible. Given the high costs associated with sample collection, investing in thorough data readouts is both logical and increasingly common.
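A back-of-the-envelope calculation shows how quickly volumes grow with resolution. The sketch below estimates the size of a dense spots-by-genes matrix; the capture-area side, spot pitches, gene count, and 4-byte values are hypothetical round numbers chosen for illustration, not the specification of any instrument.

```python
def dense_matrix_bytes(side_um, spot_pitch_um, n_genes, n_sections=1, bytes_per_value=4):
    """Upper-bound size of a dense (spots x genes) expression matrix.

    side_um        side length of a square capture area, in micrometers
    spot_pitch_um  center-to-center distance between measured positions
    n_sections     number of stacked tissue sections (a crude third dimension)
    """
    spots_per_side = int(side_um // spot_pitch_um)
    n_spots = spots_per_side ** 2 * n_sections
    return n_spots * n_genes * bytes_per_value

# Hypothetical experiment: 6400 um square area, 20,000 genes.
coarse = dense_matrix_bytes(6400, 100, 20_000)
fine = dense_matrix_bytes(6400, 50, 20_000)                   # pitch halved
fine_3d = dense_matrix_bytes(6400, 50, 20_000, n_sections=10)

print(f"coarse:  {coarse / 1e9:.2f} GB")
print(f"fine:    {fine / 1e9:.2f} GB ({fine // coarse}x)")
print(f"fine 3D: {fine_3d / 1e9:.2f} GB ({fine_3d // coarse}x)")
```

Halving the pitch gives four times the data for a single section, and stacking sections multiplies it again. Real count matrices are sparse, so stored sizes are smaller than this dense bound, but the scaling with resolution is the same.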
Most existing software tools for transcriptomics analysis were inherited from less complex transcriptomics domains and are not designed to handle the scale and complexity of spatial transcriptomics. New software packages, such as NVIDIA’s RAPIDS, which leverages GPU acceleration, have been introduced to address these issues. Despite their promise, these tools often require additional effort before they fully meet scientists’ needs: testing, validation, and workarounds to ensure that they match the functionality and accuracy of the non-accelerated reference toolkits.
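One common form of such validation is to run the accelerated implementation and the reference implementation on the same input and compare the results within a numerical tolerance. The sketch below illustrates the pattern with two stand-ins written in NumPy: a slow loop-based "reference" and a vectorized float32 "fast" version of log-normalization. Neither function is the API of RAPIDS or any other library, and the scaling target of 1e4 and the tolerances are assumptions chosen for the example.

```python
import math
import numpy as np

def log_normalize_reference(counts, target_sum=1e4):
    """Stand-in for a trusted reference: explicit loops, float64."""
    out = np.zeros(counts.shape, dtype=np.float64)
    for i, row in enumerate(counts):
        total = row.sum()
        if total == 0:
            continue  # leave empty cells at zero
        for j, value in enumerate(row):
            out[i, j] = math.log1p(value / total * target_sum)
    return out

def log_normalize_fast(counts, target_sum=1e4):
    """Stand-in for an accelerated version: vectorized, reduced precision."""
    totals = counts.sum(axis=1, keepdims=True).astype(np.float32)
    safe_totals = np.where(totals > 0, totals, 1.0)  # avoid division by zero
    scaled = counts.astype(np.float32) / safe_totals * np.float32(target_sum)
    return np.log1p(scaled)

def validate(reference, candidate, rtol=1e-4, atol=1e-5):
    """Return (passed, max_abs_err) for a candidate against the reference."""
    max_abs_err = float(np.max(np.abs(reference - candidate)))
    passed = bool(np.allclose(reference, candidate, rtol=rtol, atol=atol))
    return passed, max_abs_err

# Hypothetical test input: 50 cells x 20 genes, plus one empty cell as an edge case.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(50, 20))
counts[0, :] = 0

passed, max_abs_err = validate(log_normalize_reference(counts), log_normalize_fast(counts))
print(f"passed={passed}, max abs error={max_abs_err:.2e}")
```

Edge cases such as the empty cell are where accelerated ports tend to diverge from their references, so it is worth including them deliberately rather than relying on random data alone.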
At Ardigen, we have firsthand experience addressing these challenges. Our projects have tackled the bugs and implementation issues characteristic of newly developed software. Once optimized and deployed, such a pipeline is a game-changer for a company that is looking to tap into the rapidly expanding transcriptomics data universe, unlocking powerful analysis capabilities, making larger published datasets accessible, and enabling comparative meta-analysis. This is vital for managing the “data deluge” that is expected in the near future.
While robust statistics and classical computer vision approaches remain foundational for quality control and normalization, deep learning methods are indispensable for uncovering non-obvious patterns and detecting nuanced differences that escape traditional methods.
Our experience in the analysis of microscopic images shows that a carefully designed deep model, trained on a small but representative dataset, performs substantially better than conventional models. This holds even against well-researched approaches with handpicked features backed by years of research and evaluation. These models produce superior embeddings, which improve downstream tasks and reveal nuanced differences that traditional methods often miss.
Applying deep learning to spatial transcriptomics data enables the identification of significant genes that exhibit changes not only across cell types but also in spatial or organizational dimensions. This deeper understanding can lead to novel insights into biological mechanisms and potential therapeutic targets.
One of the most exciting advancements in transcriptomics analysis is multi-modal fusion—the integration of diverse data types such as gene expression patterns, images, chemical structures, variant information, clinical metadata, or text. Modern approaches allow these disparate data modalities to be transformed into comparable formats that can be integrated into a comprehensive view of all experimental evidence.
Such a bird’s-eye view enables sophisticated analyses, including assessing the toxicity of drugs, comparing primary cancer cell lines to established model cell lines to find the best match, or detecting subtle changes across multiple data modalities for a more complete insight into indications or the biological mechanisms at play. These are real-life examples from Ardigen’s projects. They gave clients insights that were previously unattainable, and they relied both on these advanced deep learning methods and on our experience in combining them into customized solutions tailored to specific research questions.
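As a rough illustration of how different modalities can be brought into a comparable format, the sketch below uses a simple late-fusion scheme: each modality's feature matrix is standardized, each sample's vector is scaled to unit length so that no modality dominates, and the blocks are concatenated into one embedding. Matching is then a cosine-similarity lookup. This is one basic technique among many, not a description of Ardigen's models; the modality names, dimensions, and random data are all hypothetical.

```python
import numpy as np

def standardize(features):
    """Z-score each feature column across samples."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / np.where(std > 0, std, 1.0)

def l2_normalize(x):
    """Scale each sample (row) to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms > 0, norms, 1.0)

def fuse(modalities):
    """Late fusion: equal-weight concatenation of per-modality unit vectors."""
    blocks = [l2_normalize(standardize(m)) for m in modalities.values()]
    return l2_normalize(np.concatenate(blocks, axis=1))

def best_match(fused, query_index):
    """Index of the sample most similar (cosine) to the query, excluding itself."""
    sims = fused @ fused[query_index]
    sims[query_index] = -np.inf
    return int(np.argmax(sims))

# Hypothetical data: 6 samples, expression profiles and image embeddings.
rng = np.random.default_rng(0)
n_samples = 6
expression = rng.normal(size=(n_samples, 50))
image_embedding = rng.normal(size=(n_samples, 16))

# Make sample 3 a near-copy of sample 0 in both modalities.
expression[3] = expression[0] + 0.05 * rng.normal(size=50)
image_embedding[3] = image_embedding[0] + 0.05 * rng.normal(size=16)

fused = fuse({"expression": expression, "image": image_embedding})
match_for_sample_0 = best_match(fused, 0)
print("best match for sample 0:", match_for_sample_0)
```

In practice the per-modality vectors would come from learned encoders rather than raw features, and the modality weights would be tuned for the question at hand. The point of the sketch is only that, once every modality lives in a normalized vector space, tasks such as finding the best-matching cell line reduce to nearest-neighbor search.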
Spatial transcriptomics has brought about a staggering evolution in our understanding of gene expression, progressing from single-point measurements to detailed 3D images of thousands of genes across the tissue at single-cell resolution. However, this explosion in capability has left many researchers struggling to keep up with the associated data analysis challenges.
At Ardigen, we have successfully addressed these challenges through our expertise in bioinformatics and software engineering. Our experience in designing, testing, and deploying validated pipelines has been instrumental in helping clients adopt spatial transcriptomics data and incorporate its insights into their research. You can learn more about the solutions we have helped them implement in our upcoming webinar on February 6. By leveraging advanced deep learning methods and multi-modal integration, we ensure that our clients can fully harness the potential of these groundbreaking technologies.
Unlike the exaggerated promises often associated with AI, Ardigen’s approach focuses on delivering practical, reliable solutions that reveal new details about biology. Our carefully controlled and deliberate application of AI enables researchers to achieve a deeper understanding of complex biological systems. With Ardigen as your partner, you can navigate the challenges of transcriptomics and unlock its transformative potential for scientific discovery.