Accessibility Revolution: Genomics Research Made Affordable with Advanced Sequencing Technology

Written by SAGC | May 31, 2024 12:01:32 AM

Unlocking the Power of Multi-Omics Biology through Data Science

Recently, Cathal King (Bioinformatician & Single Cell and Spatial Transcriptomics Specialist) at SAGC, sat down with scientific writer, Laura Tabellini Pierre, to discuss his experience with Parse Biosciences and combinatorial barcoding data and analysis pipelines.

Multi-omics biology is deeply rooted in data science.

To support the explosion of technologies generating large multi-dimensional datasets, computational biologists are developing efficient data pipelines that combine tools to preprocess, analyse, and visualise data.

The South Australian Genomic Centre (SAGC), a state-wide genomics facility headquartered in the South Australian Health and Medical Research Institute (SAHMRI) building in Adelaide, is at the forefront of these efforts.

The SAGC provides a plethora of multi-omic services, including single cell RNA sequencing (scRNA-Seq) with Parse Evercode assays. A team of computational biologists and statisticians supports the facility’s users by developing customised approaches for data analysis, integration, and visualisation.

To understand their vision and its challenges, we interviewed Cathal King, a bioinformatician at SAGC specializing in scRNA-Seq and spatial biology.

Cathal shared his experience with Parse data, highlighting beneficial aspects and areas for future development. He then shared his aspirations for developments aimed at enhancing the user experience in data analysis. Daniel Diaz, Senior Bioinformatics Application Scientist at Parse, joined to discuss efforts to streamline complex data analysis workflows and integrate diverse -omics data types.


Cathal, can you tell us your background and how your team became involved with Parse Biosciences and combinatorial barcoding?

Cathal King (CK): I am a bioinformatician based at SAHMRI, in Adelaide. I am primarily engaged in the analysis phase of single cell RNA-Seq and spatial transcriptomic studies. My role involves exploring the datasets, interpreting them from a biological perspective, and communicating outcomes to fellow researchers. In addition, I handle primary analyses such as alignment and preprocessing of data.

I also collaborate with three other research teams at SAHMRI to develop and disseminate analytical pipelines and methods across these teams and the broader SAHMRI community.

Our introduction to Parse and combinatorial barcoding technologies came through the distributor Decode Science. Joel Bathe from SAGC, our partnerships manager, connects us with companies and researchers interested in trialing new technologies like Parse for single cell data analysis.

As a proficient user, what has been your experience with implementing the Parse pipeline? What were your challenges and do you have recommendations?

CK: The pipeline was well-documented and user-friendly. We set it up and ran it on the SAHMRI High-Performance Computing (HPC) cluster. Using an Excel sample loading sheet for sample data was unusual, but it was necessary to demultiplex Parse data.

I think that to optimize a data analysis workflow to accommodate large-scale datasets, transitioning from an Excel sheet to an automation-friendly format like a CSV file would significantly reduce errors and improve efficiency.

Daniel Diaz (DD): I agree, and we are streamlining and simplifying the process now. We are replacing the existing platform with a cloud-based GUI, so users can work through a web interface and better handle large-scale scRNA-Seq datasets.
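To illustrate why a plain CSV sample sheet is easier to automate, here is a minimal Python sketch that reads one and catches common mistakes before a run starts. The column names `sample` and `wells`, the well-range strings, and the function `load_sample_sheet` are hypothetical and chosen only for illustration; they are not the format used by the Parse pipeline.

```python
import csv
import io

# Hypothetical column names, used only for this illustration.
REQUIRED_COLUMNS = ("sample", "wells")


def load_sample_sheet(handle):
    """Read a CSV sample sheet and return {sample: wells}, failing early on errors."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")

    samples = {}
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        name = (row["sample"] or "").strip()
        wells = (row["wells"] or "").strip()
        if not name or not wells:
            raise ValueError(f"line {line_no}: empty sample or wells field")
        if name in samples:
            raise ValueError(f"line {line_no}: duplicate sample '{name}'")
        samples[name] = wells
    return samples


# Toy sheet held in memory; a real sheet would be opened from disk.
example = io.StringIO("sample,wells\ncontrol,A1-A6\ntreated,A7-A12\n")
sheet = load_sample_sheet(example)
print(sheet)
```

Because the sheet is plain text, a check like this can run as the first step of a scripted workflow and can be tracked in version control alongside the analysis code.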

In terms of streamlining, do you see any need for improvements in accommodating multiple samples or customers in a single run?

CK: Integrating sample management directly with the data input process could enhance efficiency. Additionally, incorporating the pipeline into a workflow manager could further optimize the process, something we are exploring as we transition toward Nextflow.

What key features do you look for in a dataset, particularly regarding quality control metrics?

CK: We examine the HTML report for genes per cell and reads per cell, both of which the Parse pipeline presents clearly. Clustering analysis, such as k-means or cell clusters visualized on a UMAP, is also critical for understanding cell groupings.
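For readers unfamiliar with these two metrics, the following Python sketch shows how they can be derived from read-level assignments. It is a toy illustration rather than how the Parse report computes them: the input format (one `(cell, gene)` pair per gene-assigned read), the barcodes, and the function `per_cell_qc` are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import median


def per_cell_qc(read_assignments):
    """Return {cell: {"reads": n_reads, "genes": n_distinct_genes}}.

    read_assignments is an iterable of (cell_barcode, gene) pairs,
    one pair per read that was assigned to a gene (assumed input format).
    """
    reads = defaultdict(int)
    genes = defaultdict(set)
    for cell, gene in read_assignments:
        reads[cell] += 1        # every assigned read counts toward the cell
        genes[cell].add(gene)   # a gene is counted once per cell
    return {cell: {"reads": reads[cell], "genes": len(genes[cell])} for cell in reads}


# Toy data: cell_1 has 3 reads over 2 genes, cell_2 has 4 reads over 3 genes.
toy = [
    ("cell_1", "GAPDH"), ("cell_1", "GAPDH"), ("cell_1", "ACTB"),
    ("cell_2", "CD3E"), ("cell_2", "ACTB"), ("cell_2", "GAPDH"), ("cell_2", "GAPDH"),
]
qc = per_cell_qc(toy)
median_genes = median(m["genes"] for m in qc.values())
median_reads = median(m["reads"] for m in qc.values())
print(qc, median_genes, median_reads)
```

Summary values such as the median genes per cell and median reads per cell are what one would typically compare across samples or runs.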

Didn’t you sequence an Evercode kit with an MGI sequencer recently? MGI sequencers are less common in our work, but their cost-effectiveness is making them increasingly popular.

CK: Our experience has been positive, with no issues integrating MGI-generated data into existing workflows.

We fully sequenced a Parse dataset on an MGI sequencer. We have our own demultiplexing pipeline, which takes raw data off the sequencer and converts it to FASTQ files. There were no issues with the Parse data.

Our lab operates both the MGI T7 and G400 models, and we have successfully processed datasets from these machines without compatibility issues. The affordability of MGI sequencing is influencing the market and client preferences.

How do you handle mixed-species samples, such as those involving viruses or bacteria and human cells?

DD: For mixed-species samples prepared with Parse technology, we append the non-human genome sequence to the human genome to create a combined mapping reference. This allows for accurate alignment and analysis of transcripts from both species within the same sample. Customizing genomes in this way is a flexible approach to accommodate diverse experimental designs.
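The idea of appending one genome to another can be sketched in a few lines of Python. This is a simplified illustration, not the Parse workflow: the function `append_genome`, the `virus_` prefix, and the toy sequences are assumptions, and a real reference would typically also need matching gene annotations before it could be indexed for alignment.

```python
import io


def append_genome(human_fasta, other_fasta, out, other_prefix):
    """Write the human records unchanged, then the other genome's records.

    Headers of the appended genome get other_prefix so that sequence names
    cannot clash with human ones (an assumption made for this sketch).
    """
    for line in human_fasta:
        out.write(line if line.endswith("\n") else line + "\n")
    for line in other_fasta:
        if line.startswith(">"):
            line = ">" + other_prefix + line[1:]
        out.write(line if line.endswith("\n") else line + "\n")


# Toy in-memory FASTA files; real genomes would be opened from disk.
human = io.StringIO(">chr1\nACGT\n")
virus = io.StringIO(">genome description\nGGCC")
out = io.StringIO()
append_genome(human, virus, out, "virus_")
combined = out.getvalue()
print(combined)
```

Prefixing the appended sequence names also makes it easy to tell later which reads aligned to which species.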

Shifting focus to the service provider’s logistics, have there been any challenges in delivering data back to customers?

CK: We have not encountered significant issues with data delivery. As the volume of samples increases, ensuring we have sufficient bioinformatics support is crucial. So far, we have managed well, and the use of Parse is still expanding without major data handling problems.

SAGC provides support for a broad range of -omics methods and data analyses, and you handle multi-omics datasets regularly. In your work with these complex datasets, what challenges and insights can you share about integrating different data types from various technologies?

CK: Currently, I am working on integrating mass spectrometry data with spatial datasets. This process involves aligning samples from lipidomics and proteomics studies conducted via mass spectrometry with corresponding spatial transcriptomics data. The challenge lies in the alignment discrepancies between the datasets, even though the tissue samples may appear similar.

The goal is to unravel complex biological questions, with a particular focus on how spatial gene expression data correlate with protein and lipid profiles in the same tissue regions. Spatial transcriptomics typically provides gene expression data in the context of tissue architecture, offering a map of where gene expression occurs. In some respects, it is therefore analogous to single cell data if we set aside its image component and accept a trade-off in resolution.

To address this, I have worked on a method where we can overlay and match data points from the two distinct types of analyses. By identifying common points between the spatial and mass spectrometry datasets, I am developing techniques to map and compare the molecular signatures within specific tissue areas.

Another project that interests me is the integration of immune profiling with genomic mutations. I am focusing on immune profiling through VDJ sequencing, as it is crucial for understanding the immune repertoire.

I work with a multiple myeloma research group to analyze sequencing data, identify plasma cell subclones within individual patients, and correlate these findings with genomic alterations. This aspect of my research aims to provide deeper insights into the molecular underpinnings of immune responses and disease mechanisms. Repertoire sequencing adds another layer of data we integrate.

DD: Indeed. Our developments in TCR and BCR sequencing kits aim to facilitate this kind of multi-layered analysis, especially for characterizing clonal populations in diseases.

CK: Beyond data integration, a significant focus is on analyzing and interpreting the combined datasets. We explore various analytical approaches to ensure they make biological sense across different technologies. For instance, comparing gene expression signals across platforms and validating spatial data with single cell resolution are critical steps.

Continue reading the full story here on the Parse Biosciences website...

Interested in trialing new technologies like Parse for single cell data analysis? Contact us at the SAGC to learn more about Parse data, pricing and services.
