The Power of Open-Source Pipelines for Scientific Research with Harshil Patel

From Data in Biotech


Length: 41 minutes
Released: May 8, 2024
Format: Podcast episode

Description

This week, Harshil Patel, Director of Scientific Development at Seqera, joins the Data in Biotech podcast to discuss the importance of collaborative, open-source projects in scientific research and how they support the need for reproducibility.

Harshil lifts the lid on how Nextflow has become a leading open-source workflow management tool for scientists and the benefits of using an open-source model. He talks in detail about the development of Nextflow and the wider Seqera ecosystem, the vision behind it, and the advantages and challenges of this approach to tooling.

He discusses how the nf-core community collaboratively develops and maintains over 100 pipelines using Nextflow and how the decision to constrain pipelines to one per analysis type promotes collaboration and consistency and avoids turning pipelines into the “wild west.”

We also look more practically at Nextflow adoption as Harshil delves into some of the challenges and how to overcome them.

He explores the wider Seqera ecosystem and how it helps users manage pipelines, analysis, and cloud infrastructure more efficiently, and he looks ahead to the future evolution of scientific research. 

Data in Biotech is a fortnightly podcast exploring how companies leverage data to drive innovation in the life sciences.

---

Chapter Markers

[1:23] Harshil shares a quick overview of his background in bioinformatics and his route to joining Seqera.

[3:37] Harshil gives an introduction to Nextflow, including its origins, development, and the benefits of using the platform for scientists.

[9:50] Harshil expands on some of the off-the-shelf pipelines available through nf-core and how the collection continues to expand beyond genomics.

[12:08] Harshil explains nf-core’s open-source model, the advantages of constraining pipelines to one per analysis type, and how the Nextflow community works.

[17:43] Harshil talks about Nextflow's custom DSL and the advantages it offers users.

[20:23] Harshil explains how Nextflow fits into the broader Seqera ecosystem. 

[26:08] Ross asks Harshil about overcoming some of the challenges that arise with parallelization and optimizing pipelines.

[28:01] Harshil talks about the features of Wave, Seqera’s containerization solution. 

[32:16] Ross asks Harshil to share some of the most complex and impressive things he has seen done within the Seqera ecosystem.

[35:42] Harshil gives his take on how he sees biotech and genomics research evolving in the future.

---

Download our latest white paper on “Using Machine Learning to Implement Mid-Manufacture Quality Control in the Biotech Sector.”

Visit this link: https://connect.corrdyn.com/biotech-ml

Titles in the series (21)

Data in Biotech is a fortnightly podcast exploring how companies leverage data to drive innovation in life sciences. Every two weeks, Ross Katz, Principal and Data Science Lead at CorrDyn, sits down with an expert from the world of biotechnology to understand how they use data science to solve technical challenges, streamline operations, and further innovation in their business. You can learn more about CorrDyn, an enterprise data specialist that enables excellent companies to make smarter strategic decisions, at www.corrdyn.com