Summary: In this blog post, we discuss how ARM-based processors, GPUs, and FPGAs are accelerating omics workflows and reducing data processing costs by solving the inefficiencies in traditional tools and architectures.
Modern drug discovery depends on scalable data processing to extract insights from vast omics datasets. However, traditional bioinformatics workflows struggle at scale because genomic data processing is computationally intensive, and simply adding more cloud resources doesn’t always solve the problem: inefficiencies in standard architectures still lead to computational bottlenecks and rising costs.
Transitioning to ARM-based architecture and adopting hardware accelerators such as the FPGA-based DRAGEN platform or GPU-based Parabricks drastically reduces the processing time of computationally intensive tasks such as whole-genome variant calling. By overcoming the limitations of traditional CPU-based methods, these solutions make large-scale population genomics studies and single-cell analysis cost-effective, and bring real-time clinical diagnostics within reach.
Here we discuss hardware acceleration solutions for bioinformatics.
Transitioning from legacy bioinformatics tools to ARM-enabled architecture
Many legacy bioinformatics tools were originally designed and optimized for x86 CPUs, and consequently many older applications still run on x86 hardware. However, researchers are increasingly switching to ARM-based processors such as AWS Graviton, especially in cloud environments. As showcased at the Nextflow Summit 2024, transitioning to ARM-based solutions provides a cost-effective and environmentally friendly way to process bioinformatics data with enhanced performance and scalability.
ARM-based chips excel in workflows requiring high parallelization, such as genome assembly and AI-driven drug discovery. These processors are built for power efficiency, making them ideal for cloud computing and high-performance clusters. Bioinformatics researchers adopting ARM-based cloud instances (such as AWS Graviton) can benefit from lower costs and improved scalability, especially for large bioinformatics jobs.
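Because many legacy tools ship only x86 builds, a practical first step in such a migration is to check which architecture a job is actually running on and whether a matching build exists. The sketch below shows one way to do that with Python's standard `platform` module. The helper names (`normalize_arch`, `pick_build`) and the container tags are hypothetical and only serve as an illustration.

```python
from __future__ import annotations

import platform

# Map the raw strings reported by different operating systems onto two labels.
_ARCH_ALIASES = {
    "aarch64": "arm64",   # Linux on ARM, e.g. Graviton instances
    "arm64": "arm64",     # macOS on Apple silicon
    "x86_64": "x86_64",   # Linux and macOS on Intel/AMD
    "amd64": "x86_64",    # Windows on Intel/AMD
}


def normalize_arch(machine: str) -> str:
    """Return 'arm64' or 'x86_64', or raise for anything unrecognized."""
    key = machine.strip().lower()
    if key not in _ARCH_ALIASES:
        raise ValueError(f"unrecognized architecture: {machine!r}")
    return _ARCH_ALIASES[key]


def pick_build(available_builds: dict[str, str], machine: str | None = None) -> str:
    """Pick the build (e.g. a container tag) that matches the CPU architecture.

    If `machine` is not given, the architecture of the current host is used.
    """
    arch = normalize_arch(machine or platform.machine())
    if arch not in available_builds:
        raise LookupError(f"no {arch} build available; the tool may be x86-only")
    return available_builds[arch]


if __name__ == "__main__":
    # Hypothetical container tags for an imaginary tool.
    builds = {"x86_64": "mytool:1.0-amd64", "arm64": "mytool:1.0-arm64"}
    print(pick_build(builds, machine="aarch64"))
```

A check like this makes x86-only tools fail loudly before a job starts, rather than partway through a long run on the wrong instance type.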
Next step in hardware acceleration: From CPUs to FPGAs and GPUs
As bioinformatics datasets continue to scale, specialized hardware such as FPGAs (DRAGEN) and GPUs (Parabricks, RAPIDS) is transforming genomics analysis. These hardware accelerators deliver significantly faster processing, reducing costs while handling larger datasets efficiently.
The DRAGEN Bio-IT Platform, developed by Illumina, is an FPGA-based (Field-Programmable Gate Array) system designed to accelerate genomic data analysis. Unlike traditional CPU-based bioinformatics tools, which rely on software running on general-purpose processors, DRAGEN offloads key genomics algorithms to hardware, significantly improving speed, accuracy, and cost-efficiency.
NVIDIA’s Parabricks and RAPIDS are GPU-accelerated software frameworks that significantly speed up bioinformatics workflows, revolutionizing both single-cell genomics and population-level studies. They leverage GPUs to process massive datasets much faster than traditional CPU-based methods. For instance, a typical whole-genome variant calling pipeline (BWA-GATK) that takes ~30 hours on CPUs can be completed in ~30 minutes on GPUs using Parabricks.
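As a quick sanity check on the figures above, the sketch below works out the implied speedup and the price ratio at which a GPU run stops being cheaper per genome. The 30-hour and 30-minute runtimes come from the example in the text. The hourly prices are made-up placeholders, not real cloud prices, and the function names are purely illustrative.

```python
def speedup(cpu_hours: float, gpu_hours: float) -> float:
    """How many times faster the GPU run is than the CPU run."""
    if cpu_hours <= 0 or gpu_hours <= 0:
        raise ValueError("runtimes must be positive")
    return cpu_hours / gpu_hours


def cost_per_genome(runtime_hours: float, price_per_hour: float) -> float:
    """Compute cost of one run, ignoring storage and data transfer."""
    return runtime_hours * price_per_hour


# Runtimes quoted in the text: ~30 hours on CPUs, ~30 minutes on GPUs.
CPU_HOURS = 30.0
GPU_HOURS = 0.5

# Placeholder prices (assumption): the GPU instance costs 10x more per hour.
CPU_PRICE = 1.0
GPU_PRICE = 10.0

factor = speedup(CPU_HOURS, GPU_HOURS)
cpu_cost = cost_per_genome(CPU_HOURS, CPU_PRICE)
gpu_cost = cost_per_genome(GPU_HOURS, GPU_PRICE)

print(f"speedup: {factor:.0f}x")
print(f"CPU cost per genome: {cpu_cost:.2f}, GPU cost per genome: {gpu_cost:.2f}")
# The GPU run is cheaper per genome as long as the GPU-to-CPU price ratio
# stays below the speedup factor.
print(f"break-even price ratio: {factor:.0f}")
```

Under these assumptions the GPU run is both faster and cheaper per genome, and it stays cheaper until the GPU instance costs more than 60 times as much per hour as the CPU instance.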
RAPIDS is an open-source framework that speeds up large-scale genomic data analysis and AI-driven insights. A study using RAPIDS for genome-wide association studies (GWAS) showed that it can process millions of variants in minutes, compared to hours or days with traditional CPU-based pipelines.
For single-cell data analysis, RAPIDS-singlecell delivers remarkable efficiency gains, achieving 676x faster UMAP and 70x faster PCA on a 1-million cell dataset. These improvements cut dimensionality reduction from hours to minutes, making large-scale single-cell analysis more practical and accessible.
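When several steps are accelerated by different factors, the overall gain depends on how much time each step took to begin with. The sketch below combines the two speedups quoted above into a single figure. The CPU baseline runtimes (1 hour for PCA, 2 hours for UMAP) are hypothetical, since the text reports only the speedup factors, and `combined_speedup` is an illustrative helper rather than part of any library.

```python
def combined_speedup(cpu_hours: dict, speedups: dict) -> float:
    """Overall speedup of a multi-step stage.

    cpu_hours: baseline CPU runtime of each step, in hours.
    speedups:  per-step speedup factor on the accelerator.
    """
    if cpu_hours.keys() != speedups.keys():
        raise ValueError("both mappings must describe the same steps")
    total_cpu = sum(cpu_hours.values())
    total_accelerated = sum(cpu_hours[step] / speedups[step] for step in cpu_hours)
    return total_cpu / total_accelerated


# Hypothetical CPU baselines; only the speedup factors come from the text.
baseline = {"pca": 1.0, "umap": 2.0}
factors = {"pca": 70.0, "umap": 676.0}

overall = combined_speedup(baseline, factors)
accelerated_minutes = sum(baseline[s] / factors[s] for s in baseline) * 60

print(f"overall speedup: {overall:.0f}x")
print(f"accelerated runtime: {accelerated_minutes:.1f} minutes")
```

The combined figure always falls between the two individual factors, and the step with the smaller speedup (here PCA) ends up accounting for most of the remaining runtime.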
Overcoming bottlenecks: Next-gen solutions for scalable omics workflows
Innovative solutions for optimizing bioinformatics workflows
ARM-based architecture (e.g., AWS Graviton): cheaper, greener, and great for parallel tasks.
FPGA-based DRAGEN: speeds up genomics with custom hardware logic.
GPU-based tools like Parabricks and RAPIDS: offer massive speed boosts (e.g., 676x faster UMAP).
As the volume of sequencing data grows, optimizing bioinformatics workflows is critical. Moving from x86 to ARM-based cloud instances enhances efficiency, while switching to GPU-accelerated frameworks like Parabricks and RAPIDS can accelerate analysis by orders of magnitude.
Ardigen is at the forefront of these innovations, contributing to open-source projects in the Nextflow ecosystem. As active members of the nf-core community, our experts shape best practices in bioinformatics pipeline development. For the last few years, Ardigen has participated in the Nextflow Summit and nf-core hackathons to share knowledge and help drive advancements in scalable bioinformatics solutions.
In just a few weeks, Kamil Malisz, Lead Nextflow Developer at Ardigen, will present a webinar titled “From Rapid Prototyping to High-Performance: GPU-powered workflow as an automation heart of your AI lab’s loop”. This session will showcase strategies for transitioning from prototype bioinformatics workflows to scalable, automated solutions using GPU acceleration (Parabricks, RAPIDS), FPGA-based variant calling (DRAGEN), and cloud-native optimization. Stay tuned for more information!