Optimizing Nextflow pipelines for cloud scalability
Quotient Therapeutics, a company working with deep whole-genome sequencing (WGS) data, faced growing computational demands. As the number of processed samples increased, they needed to optimize their Nextflow-based pipelines for better efficiency, reliability, and cost-effectiveness.
Challenge
The client’s pipeline was struggling to keep up with increasing workloads, resulting in longer processing times and rising compute costs. Ensuring consistent reliability was also a key priority to maintain high-quality data outputs.
Approach
A multi-tiered optimization strategy was implemented:
- Comprehensive review of pipeline code, tools, and environments to identify optimization opportunities.
- Integration of nf-test for automated validation and CI/CD workflows.
- Optimization across three levels: bioinformatics, software engineering, and cloud DevOps.
Results
- Reliability improved through automated integration tests before every deployment.
- Processing cost reduced by 64%.
- Execution time reduced by 63%, enabling sample processing within the target 24-hour window.
Through strategic optimization of bioinformatics pipelines in an AWS Batch environment, Quotient Therapeutics achieved significant cost savings and efficiency gains. The improvements ensured the scalability and reliability of their workflow, preparing them for continued growth.