Poster: From Text to Genes: Can LLMs Enhance Expression Annotations?

Automating GEO Metadata Annotation with LLMs

About the poster

NCBI’s Gene Expression Omnibus (GEO) provides valuable gene expression and functional genomics data, but its records lack consistent, standardized annotation. Proper annotations, including experimental conditions and sample types, are essential for making datasets searchable, comparable, and usable across studies. They are also needed to integrate data from multiple sources, analyze it accurately, and reproduce results. However, manual annotation is time-consuming and error-prone, which slows down research.
To address these challenges, we developed an automated tool based on large language models (LLMs) that streamlines the annotation process. The tool detects and extracts relevant metadata, improving consistency and reducing human error. A minimum viable product (MVP) automatically annotates four key fields in GEO studies: Condition, Tissue, Drug, and Intervention. It demonstrates the potential of AI-driven techniques to enhance accuracy and accelerate biological research.
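
As an illustration only, the four-field extraction described above could be framed as a structured-output task. The sketch below is not the tool presented in the poster: the `GeoAnnotation` class, the `build_prompt` and `parse_annotation` helpers, the prompt wording, and the sample reply are all hypothetical, and the call to an actual LLM is deliberately left out.

```python
import json
from dataclasses import dataclass
from typing import Optional

# The four fields named in the poster abstract (lowercased as JSON keys).
FIELDS = ("condition", "tissue", "drug", "intervention")


@dataclass
class GeoAnnotation:
    """Hypothetical container for one annotated GEO study."""
    condition: Optional[str] = None
    tissue: Optional[str] = None
    drug: Optional[str] = None
    intervention: Optional[str] = None


def build_prompt(description: str) -> str:
    """Build an assumed prompt asking a model for the four fields as JSON."""
    keys = ", ".join(FIELDS)
    return (
        "Extract the following fields from the GEO study description "
        f"and reply with a JSON object with keys: {keys}. "
        "Use null when a field is not stated.\n\n"
        f"Description: {description}"
    )


def parse_annotation(raw: str) -> GeoAnnotation:
    """Parse a model reply, keeping only the four expected keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    values = {}
    for key in FIELDS:
        value = data.get(key)
        # Treat missing, empty, or non-string values as "not stated".
        if isinstance(value, str) and value.strip():
            values[key] = value.strip()
        else:
            values[key] = None
    return GeoAnnotation(**values)
```

Given a made-up reply such as `{"condition": "breast cancer", "tissue": "mammary gland", "drug": "tamoxifen", "intervention": null}`, `parse_annotation` returns a `GeoAnnotation` with three populated fields and `intervention` set to `None`. Restricting the output to a fixed set of keys is one simple way to keep annotations consistent across studies.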

This poster was originally presented during the BioIT 2025 Conference.
