Collaborative Genomics Projects: A Comprehensive Guide
By Margi Sheth, Julia Zhang and Jean C Zenklusen
About this ebook
Collaborative Genomics Projects: A Comprehensive Guide contains operational procedures, policy considerations, and the many lessons learned by The Cancer Genome Atlas Project. This book guides the reader through methods in patient sample acquisition, the establishment of data generation and analysis pipelines, data storage and dissemination, quality control, auditing, and reporting.
This book is essential for those looking to set up or collaborate within a large-scale genomics research project. All authors are contributors to The Cancer Genome Atlas (TCGA) Program, an NIH-funded effort to generate a comprehensive catalog of genomic alterations in more than 35 cancer types.
As the cost of genomic sequencing is decreasing, more and more researchers are leveraging genomic data to inform the biology of disease. The amount of genomic data generated is growing exponentially, and protocols need to be established for the long-term storage, dissemination, and regulation of this data for research. The book's authors create a complete handbook on the management of research projects involving genomic data as learned through the evolution of the TCGA program, a project that was primarily carried out in the US, but whose impact and lessons learned can be applied to international audiences.
- Establishes a framework for managing large-scale genomic research projects involving multiple collaborators
- Describes lessons learned through TCGA to prepare for potential roadblocks
- Evaluates policy considerations that are needed to avoid pitfalls
- Recommends strategies to make project management more efficient
Margi Sheth
B.S., Senior Project Manager, Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Collaborative Genomics Projects: A Comprehensive Guide
Margi Sheth
Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, United States
The Cancer Genome Atlas, Center for Cancer Genomics, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
Jiashan Zhang
The Cancer Genome Atlas, Center for Cancer Genomics, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
Jean C. Zenklusen
The Cancer Genome Atlas, Center for Cancer Genomics, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
Table of Contents
Cover image
Title page
Copyright
Chapter 1. Introduction
Abstract
Overview of the Cancer Genome Atlas
Scope, Implementation, and Applicability of This Guide
Policy Considerations
References
Chapter 2. Gathering Project Requirements
Abstract
Introduction
Establish the Purpose of the Project
Identify Key Stakeholders
Set Project Milestones
Design a Pipeline for the Project
Conclusion
References
Chapter 3. Communications Strategies
Abstract
Introduction
Why Develop a Communications Strategy?
How to Develop a Communications Strategy
Project and Policy Changes: A Part of Strategy Evaluation
Examples from Communications Strategies
Communications Devices: Symposia and Visual Identity
Conclusion
References
Chapter 4. Pipeline: Sample Acquisition
Abstract
Introduction
Define the Sample Set for the Project
Establish a Central Biospecimen Processing Facility
Establish Sample Qualification Metrics
Sample Processing and Distribution to Data Generation Centers
Establish Consent Protocols
Handling Institutional Review Boards in Multi-Center Studies
Identify Potential Tissue Source Sites
Establish Contractual Obligation and Payment Plans for Tissue Source Sites
Management of Clinical Data Collection
References
Chapter 5. Pipeline: Data Generation
Abstract
Introduction
Building a Data Generation Model
Establishing a Data Generation Pipeline and Quality Control Measures
Proper Tracking of Data Generation
Conclusion
References
Chapter 6. Pipeline: Data Storage and Dissemination
Abstract
Introduction
Creation of Centralized Data Management Center
Define Standard Data and Metadata Formats
Collect, Store, and Version Data and Metadata
Implement Quality Control Measures for Submitted Data
Put in Place Appropriate Security and Access Controls
Redistribute Data and Metadata Tailored to Diverse Project Stakeholders and End Users
Conclusion
References
Chapter 7. Pipeline: Data Analysis
Abstract
Introduction
Preconceived Questions to Answer
Establishment of Data Analysis Teams
Analysis Structure and Methodology
Practical Considerations
Conclusion
References
Chapter 8. Quality Control, Auditing, and Reporting
Abstract
Introduction
Establish Quality Metrics for Each Component of the Pipeline
Ensure Ethical Management of Samples, Information, and Derived Data Sets
Provide Quality Reports to Stakeholders to Help Improve Processes
Example of a Quality Management Issue
Conclusion
Chapter 9. Project Closure
Abstract
Introduction
Levels of Closure
Budgetary Considerations
Conclusion
References
Conclusion
Flexibility
Transparency
Collaboration
Communication
Reference
Appendix A. Glossary of Terms
Appendix B. TCGA Workflow Diagrams
Appendix C. Publication Guidelines as of July 14, 2015
Use of TCGA Data in Publications and Presentations Prior to Initial TCGA Global Analysis Publication
Use of TCGA Data in Publications and Presentations after Initial TCGA Global Analysis Publication
Use of TCGA Data for Research Purposes Other Than Publication and Presentation
TCGA Program Attribution in Publications and Presentations
Appendix D. Mutation Annotation Format (MAF) Specification
Appendix E. MAGE-TAB
Appendix F. Data Use Certification Agreement
Introduction and Statement of Policy
Terms of Access
Appendix 1
Appendix G. TCGA Analysis Working Group Charter
Purpose
Organization and Function
Membership
Benefits of DWG/AWG Participation
Index
Copyright
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, UK
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, USA
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK
First Edition 2016
Copyright © 2016 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
ISBN: 978-0-12-802143-9
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.
For Information on all Academic Press publications visit our website at http://store.elsevier.com/
Typeset by MPS Limited, Chennai, India www.adi-mps.com
Printed and bound in the USA
Chapter 1
Introduction
Abstract
As the cost of genomic sequencing is decreasing, more and more researchers are leveraging genomic data to inform the biology of disease. The amount of genomic data generated is growing exponentially, and protocols need to be established for the long-term storage, dissemination, and regulation of these data for research. We aim to create a comprehensive guide to managing research projects involving genomic data, as learned through the evolution of The Cancer Genome Atlas program over the last decade. This project was primarily carried out in the United States, but the impact and lessons learned can be applied to an international audience.
Keywords
Genomic sequencing; data generation; project closure; communications; quality control
As the cost of genomic sequencing is decreasing, more and more researchers are leveraging genomic data to inform the biology of disease. The amount of genomic data generated is growing exponentially, and protocols need to be established for the long-term storage, dissemination, and regulation of these data for research. We aim to create a comprehensive guide to managing research projects involving genomic data, as learned through the evolution of the Cancer Genome Atlas (TCGA) program over the last decade. This project was primarily carried out in the United States, but the impact and lessons learned can be applied to an international audience.
The guide will serve to:
Establish a framework for managing large-scale genomic research projects involving multiple collaborators,
Describe lessons learned through TCGA to prepare for potential roadblocks,
Evaluate policy considerations that are needed to avoid pitfalls, and
Recommend strategies to make project management more efficient.
The guide will cover operational procedures, policy considerations, and lessons learned through TCGA on the following topics:
Sample acquisition
Data generation
Data storage and dissemination
Data analysis
Quality control, auditing, and reporting
Project closure
Communications
Overview of the Cancer Genome Atlas
In 2006, the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) initiated a pilot project to determine the feasibility of comprehensively cataloging the genomic alterations associated with three different human cancers. This initial pilot project demonstrated that cancer-associated genes and genomic regions can be identified by combining diverse genomic information with tumor biology and clinical data, and that the sequencing of selected regions can be conducted efficiently and cost-effectively. In 2009, TCGA expanded to characterize 33 different types of human cancers, including nine rare cancers. The strength of TCGA lay in producing unprecedented multidimensional data sets from enough samples to provide statistically robust results.
TCGA taught three overarching lessons about what is needed to successfully interpret results generated by various genomic characterization platforms. Data generation centers had to (1) utilize high-quality molecular analytes isolated from well-characterized tissue specimens, (2) perform experiments utilizing strictly standardized protocols, and (3) deposit the results in structured and well-described formats. The last lesson strongly affected the ability of the various analytical groups to extract meaningful results from the genomic data generated.
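As a rough illustration of the third lesson, the sketch below checks that a submitted result record carries the metadata an analyst would need before it is accepted. It is a minimal example under stated assumptions: the field names in REQUIRED_FIELDS and the sample values are hypothetical and are not taken from any TCGA data standard.

```python
# Hypothetical metadata fields; the guide does not prescribe these names.
REQUIRED_FIELDS = (
    "sample_id",
    "center",
    "platform",
    "protocol_version",
    "data_level",
)


def missing_fields(record):
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def is_well_described(record):
    """A record is accepted only if every required field is filled in."""
    return not missing_fields(record)


# Illustrative submission with one blank field (values are made up).
submission = {
    "sample_id": "S-0001",
    "center": "Center-A",
    "platform": "DNA-Seq",
    "protocol_version": "1.2",
    "data_level": "",
}

print(missing_fields(submission))
print(is_well_described(submission))
```

Rejecting incomplete records at submission time, rather than during analysis, keeps the burden of describing the data with the center that generated it.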
The unique aspect of TCGA was the development and function of an integrated research network. The intent of TCGA was to conduct a coordinated, comprehensive, genome-wide analysis of cancer-relevant alterations by simultaneously applying several technologies to interrogate the genome, epigenome, and transcriptome in large collections of quality-controlled cancer biospecimens derived from specific cancer types. To accomplish this goal, TCGA included multidisciplinary teams of investigators and associated institutions that collectively provided biological data, as well as informed strategies for data analysis through the development of bioinformatics tools. The progress in understanding some cancer-associated molecular alterations and the accompanying advances in technology suggested that it was possible to obtain comprehensive genomic information from multiple tumor types to catalog most, if not all, of the genomic changes associated with cancer. The TCGA Research Network demonstrated that a coordinated pipeline approach for the investigation of cancer is the best way to avoid biases in the data sets, thus allowing for interoperability of the different cancer type–specific analysis projects.
TCGA followed a coordinated pipeline to receive accrued tissues, process analytes, generate and analyze data, and present the results to the community. Components of this pipeline were:
Biospecimen Core Resource (BCR): The BCR served as the tissue processing center and provided the analytes for Genome Characterization Centers (GCCs) and Genome Sequencing Centers (GSCs). Standard operating procedures were used for clinical data collection, sample collection, pathological examination, analyte (e.g., DNA and RNA) extractions, quality control, laboratory data collection, and analyte distribution to the GCCs and GSCs. The samples were required to have informed patient consent for the public release of data or an IRB waiver.
Genome Characterization Centers (GCCs) and Genome Sequencing Centers (GSCs): The GCCs and GSCs produced high-quality genomic, transcriptomic, proteomic, and epigenomic data using validated technologies (e.g., DNA and RNA sequencing and methylation arrays) to reveal the spectrum of alterations that exist in human tumors.
Genome Data Analysis Centers (GDACs): The GDACs worked hand-in-hand with the GCCs and GSCs to perform higher level analyses of the data those centers produced and to develop state-of-the-art tools that assist researchers with processing and integrating data analyses across the entire genome. These analyses took the form of both fully automated pipelines and ad hoc analyses performed at the request of the Analysis Working Groups (AWGs) from each project.
Data Coordination Center (DCC) and Cancer Genomics Hub (CGHub): Data generated by GCCs and GSCs were deposited into central repositories (the DCC and CGHub) as soon as they were validated, generally within a few weeks of generation. CGHub handled raw sequence data generated from the GCCs and GSCs, while the DCC handled higher level interpreted data. Data submitted to each repository followed rigorously developed data standards