Molecular Data Analysis Using R
About this ebook

This book addresses the difficulties experienced by wet lab researchers with the statistical analysis of molecular biology related data.  The authors explain how to use R and Bioconductor for the analysis of experimental data in the field of molecular biology.  The content is based upon two university courses for bioinformatics and experimental biology students (Biological Data Analysis with R and High-throughput Data Analysis with R). The material is divided into chapters based upon the experimental methods used in the laboratories. 

Key features include:
• Broad appeal--the authors target their material at researchers at several levels, ensuring that the basics are always covered.
• First book to explain how to use R and Bioconductor for the analysis of several types of experimental data in the field of molecular biology.
• Focuses on R and Bioconductor, which are widely used for data analysis. One great benefit of R and Bioconductor is that there is a vast user community and very active discussion in place, in addition to the practice of sharing codes. Further, R is the platform for implementing new analysis approaches, therefore novel methods are available early for R users.
Language: English
Publisher: Wiley
Release date: December 29, 2016
ISBN: 9781119165040

    Molecular Data Analysis Using R - Csaba Ortutay

    CHAPTER 1

    Introduction to R statistical environment

    Why R?

    If you work in the field of biological data analysis, or if you are looking for a bioinformatics job, you will see a large number of related job advertisements targeting young professionals. One requirement comes back again and again in those ads: a high degree of familiarity with R/Bioconductor (here I am quoting an actual recent ad from Monster.com).

    Besides, when we have to create and analyze large amounts of data during our careers as bio-researchers, sooner or later we realize that simple approaches using spreadsheets (aka the Excel part of MS Office) are no longer flexible enough to fulfill the needs of our projects. In these situations, we start to look for dedicated statistical software tools, and soon we encounter the countless alternatives from which we can choose. The R statistical environment is one of these possibilities.

    With the exponential spread of high-throughput experimental methods, including microarray and next-generation sequencing (NGS)-based experiments, the skills related to large-scale analysis of data from biological experiments are increasingly valuable. R and Bioconductor offer a free and flexible toolset for these types of analyses; therefore, many research groups and companies select them as their data analysis platform.

    R is open-source software licensed under the GNU General Public License (GPL). This has the advantage that you can install R for free on your desktop computer, regardless of whether you use Windows, Mac OS X, or a Linux distribution.

    Introducing all the features of R thoroughly at a general level exceeds the scope and purpose of this book, which focuses on molecular biology-specific applications. For those who are interested in a deeper introduction to R itself, we suggest the book R for Beginners by Emmanuel Paradis as a reference guide. It is an excellent general guide, which can be found online (Paradis 2005). Here, we use more biology-oriented examples to illustrate the most important topics. The other recommended book for this chapter is R in a Nutshell by Joseph Adler (2012).

    Installing R

    The first task in analyzing data with R is to install R on the computer. There is a nice discussion on bioinformatics blogs about why people so seldom use the knowledge acquired on short bioinformatics courses. One of the main points is that the greatest challenge is often simply installing the software in question.

    There is plenty of information available on the web about how to install R, but the most authentic source is the website of the R project itself. On this page, the official documentation, installers, and other related links from the developers of R themselves are collected. The first step is to navigate to the download section of the page and find the mirror site closest to the location of the user.

    However, there are some differences in the installation process depending on the operating system of the computer in use. Windows users should find the Windows installer for their system on the download pages. It is important to pick the base installer, not the contributed libraries. In the case of a Linux distribution, R can be installed via the package manager. Several Linux distributions provide R (and many R libraries) as part of their repositories; this way, the package manager can take care of the updates. Mac OS X users and Apple fans can find the pkg file containing the R framework, the 64-bit graphical user interface (GUI) (R.app), and Tcl/Tk 8.6.0 X11 libraries for installing the R base system on their computers. Brave users of other UNIX systems (e.g., FreeBSD or OpenBSD) can use R, but they should compile it from the source; this is not a beginner topic. In the case of a computer owned by a company, university, or library, installing R (just like many other programs) most often requires administrator (superuser) rights.

    Interacting with R

    The interface of R is somewhat different from that of other software used for statistics, such as SPSS, S-plus, Prism, or MS Excel (which is not a statistical software tool!). There are neither icons nor sophisticated menus to perform analyses. Instead, commands are typed at the appropriate place in R, called the command prompt, which is marked with >. In this book, the commands to be typed at the prompt are set in a fixed-width (monospaced) font:

    > citation()

    After typing in a command (and hitting Enter), the results appear either under the command or, in the case of graphics, in a separate window. If a command returns nothing, the string NULL appears as the result. Mistyping a command or making an error in its parameters leads to an error message with some information about what went wrong.

    > c()
    NULL
    > a * 5
    Error: object 'a' not found

    From now on, we will omit the > prompt character from the code samples so that you can simply copy and paste the commands. You can leave R with the quit() function (q() is a shorter alias for the same function).

    quit(save='no')
    q()

    Graphical interfaces and integrated development environment (IDE) integration

    A command-line interface is enough for performing the practices in this book. However, some users prefer to have a GUI, and there are multiple choices depending on the operating system in use. The Windows and Mac versions of R start with a very simple GUI, while the Linux/UNIX versions start only with a command-line interface. The Java GUI for R (JGR) is available for any platform capable of running Java, and it sports simple, functional menus to perform the most basic tasks related to an analysis (Helbig, Urbanek, and Fellows 2013).

    For a more advanced GUI, one can experiment with RStudio or R Commander (Fox 2005). There are also several plugins that integrate R into popular development tools, such as Emacs (with the Emacs Speaks Statistics add-on), Eclipse (via StatET for R), and many others.

    Scripting and sourcing

    Doing data analysis in R means typing in commands and experimenting with parameters suitable for the given set of data. At a later stage, the procedure often has to be repeated, either on the same data with slight modifications in the course of the analysis, or on different data with the same analysis. For example, the analyzed data are submitted for publication, but the manuscript reviewers request slight modifications in the analysis. This means repeating almost the entire process, except that parameter x should now be 0.6 instead of the 0.5 used earlier.

    Scripts are used to register the steps of an analysis. Scripts are small text files containing the commands of the analysis one after the other, in the same order as they are issued during the data processing. Traditionally, we use the .R extension (instead of .txt) for these text files to mark that they are R script files. Script files are the solution for

    archiving an analysis,

    automating tasks that take a long time to run.

    Script files can easily be included in an analysis flow by sourcing them (the term is borrowed from other scripting languages), using the source() command. For example, let's have the following script file my_first_script.R:

    a <- rep(5, 5)
    b <- rnorm(5)
    print(a)
    print(b)
    print(a * b)

    Scripts can be created using any text editor (e.g., gedit, mcedit, Notepad), but not with word processor software (e.g., MS Word, LibreOffice Writer, or iWork Pages) unless it can save plain text files rather than .doc, .docx, .odt, or other more complex formats. R should then be pointed to the location where the saved file can be found:

    setwd('/home/path/to/your/files')            # on Linux/UNIX
    setwd('/Users/User Name/Documents/FOLDER')   # on Mac
    setwd('c:/path/to/my/directory/')            # on Windows

    The working directory can be checked using the getwd() command.

    Loading the script file in the working directory is simple:

    source('my_first_script.R')

    If the script is somewhere else, the full path is required:

    source('/path/to/my_first_script.R')

    The R history and the R environment file

    When R is started for the first time, it creates two files to register what was done: the history file and the environment file. If R was started from the command line, these files are saved in the directory where R was started. Launching R from an icon results in saving the history and the environment file to a default place.

    The history file is a text file that saves all the commands issued in a session with R, while the environment file holds the data used during the session. It is worth saving these files for further use with the savehistory(file = '/path/to/Rhistory') and save.image(file = '/path/to/RData') commands. When exiting R with the q() command, it asks whether you want to save these files to the default places. Choosing this option means that the next time R is started from the same directory, it will remember the past work and data.
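
    As a minimal sketch (the file names here are only examples), the current session can be saved and then restored in a later session like this:

    # save the commands issued so far and the objects in the workspace
    savehistory(file = 'my_analysis.Rhistory')
    save.image(file = 'my_analysis.RData')

    # in a later session, restore the commands and the data
    loadhistory(file = 'my_analysis.Rhistory')
    load('my_analysis.RData')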

    Packages and package repositories

    Statistics is a huge field, and many disciplines use it for their specific purposes. All of them have different needs, flavors, and data types designed specifically for their purposes. It would be meaningless and hopeless to put everything into a single piece of software. Honestly, the majority of the code would never be used: a bioinformatician rarely uses statistics designed for particle physics, and likewise a computational chemist rarely reads in data from gene expression microarrays.

    To address this problem, the R developers decided to provide only a common framework and some basic functionality as part of the base installation; subject-specific elements are organized into bundles of code called packages. In reality, the base installation of R is not very useful for molecular data analysis. Its real strength is that suitable packages can be found for most of the commonly applied analysis types.

    R packages are collected into so-called package repositories on the web. These sites are dedicated to the maintenance and distribution of the packages; the concept is probably familiar to Linux users. R uses its own internal package management system to find, install, and update packages. There are two important package repositories, which are also used in this book: the Comprehensive R Archive Network (CRAN) and Bioconductor.

    Comprehensive R Archive Network

    CRAN (R Core Team 2015) is a place for general-purpose packages, but many biology-related packages can be found here too. One can search for packages related to the topic of interest (left side of the page, Software/packages/table of available packages, sorted by name) by keyword. For example, if packages related to biological sequences are required, searching (Ctrl+F) for the keyword biological sequence on this page will quickly locate them.

    Here, we introduce the sequences package (Gatto and Stojnic 2014). Clicking on the name of the package leads to a general information page. The most relevant documents here are the vignettes (if available), which provide a quick introduction to the package, and the reference manual, which gives an extensive explanation of all the commands and datasets provided by the package.
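
    If an installed package ships vignettes, they can also be listed and opened from within R itself; a small sketch, assuming the sequences package introduced above is already installed:

    # list the vignettes shipped with an installed package
    vignette(package = 'sequences')

    # open all vignettes of the package in the web browser
    browseVignettes('sequences')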

    Installing and managing CRAN packages is best done within R itself. Most GUIs provide some assistance for package management in the Packages menu. It is simple to install packages using the install.packages() command. Downloading the packages and their dependencies requires Internet access, and the installation can take a long time if the selected package depends on many other packages.

    install.packages('sequences')

    On Linux, install.packages() works properly if it is issued in an R session run as root, or if a user-writable library directory is specified as the place to install the package:

    install.packages('sequences', lib='/home/mydir/Rpackages/')

    The full list of available packages can be checked using available.packages(). This command lists all the packages compatible with the R version and operating system of the computer in use, which often means many thousands of packages.

    ap <- available.packages()
    row.names(ap)

    Loading a successfully installed package (e.g., the sequences package in the previous example) is done using the library() command (without quotation marks around the package name this time).

    library(sequences)

    Bioconductor

    There is another R package repository dedicated mostly to the analysis of high‐throughput data from molecular biology, called Bioconductor (Gentleman et al. 2004). It contains more than 1500 packages dedicated to this exciting field of bioinformatics. The packages are divided into three groups:

    Software—This section contains the most interesting packages, which assist with different kinds of analyses. This sub-repository is roughly analogous to CRAN in the sense that the packages here provide the statistical methods and procedures, such as microarray normalization functions or enrichment analysis approaches.

    AnnotationData—This is a collection of very important supporting information concerning genome, microarray platform, and database annotations. These packages are used mostly as input data for the packages in the Software section.

    ExperimentData—Prepared experimental data are available here for further analysis. It is a good idea to test a new statistical or analysis approach on data from here first; this ensures that the code in use is compatible with the rest of the Bioconductor framework.

    The packages are listed in a logical and hierarchical system, and it is relatively easy to find relevant packages for a certain type of analysis. For example, if mass spectrometry is the focus of interest, the relevant packages can be found in the Software -> Assay Technologies -> Mass Spectrometry branch of the hierarchy, while in the case of inferring networks from experimental data, the Software -> Bioinformatics -> Networks -> Network Inference branch should be checked. The vignette and the reference manual appear on the dedicated page of the chosen package in the same way as in CRAN.

    There is another, perhaps even more practical, way to find suitable packages in Bioconductor. The Workflows section of the Bioconductor Help menu contains complete recipes for the more popular data analysis tasks, which not only list the needed packages but also demonstrate how to use them.

    Bioconductor uses its own package management system, which works somewhat differently from the stock R system. It is based on a script called biocLite.R, which can be sourced directly from the Internet:

    source("http://bioconductor.org/biocLite.R")

    This script contains everything needed for managing Bioconductor packages. For example, to install the affy package (Gautier et al. 2004), the biocLite() command should be called:

    source("http://bioconductor.org/biocLite.R) biocLite(affy")

    This command resolves the dependencies and installs everything that is needed. The annotation and experimental data packages tend to be huge, so a high-speed Internet connection (or a lot of patience) and a sufficient amount of disk space are needed to install them.
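
    Installing annotation data follows the same pattern; as an illustration only, the package name below is just one example of a large annotation package, and you should pick the one matching your own platform:

    source("http://bioconductor.org/biocLite.R")
    # an annotation package for one Affymetrix array type (example choice)
    biocLite("hgu133plus2.db")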

    Loading of the installed packages happens in the same way as with CRAN packages:

    library(affy)

    Working with data

    For a data analysis project, well, data are needed. How to load data into R is a crucial question, and it is often the second biggest challenge for a newbie bioinformatician. Several R tutorials start to explain this topic by introducing the c(), edit(), and fix() commands. These commands and functions are used to type in numbers and information in a tabular format. They are also the commands that are rarely used in a real-life project. The reason is simple: no one types in the gene expression values of 40,000 gene probes for a few dozen samples.

    Most often, data are loaded from files. Files might come from databases, from measurement instruments, or from other software. Often data tables are assembled in MS Excel. MS Excel and other spreadsheet software can also export data tables as .csv files, which are easy to load into R. Depending on the operating system in use and the exact installation of R, there are multiple possibilities for reading .xls files. The package gdata (Warnes et al. 2015) contains the read.xls() command, which can access the content of both .xls and .xlsx files:

    library(gdata)
    my.data <- read.xls('data_file.xlsx', sheet=1)

    This code reads a table from the first sheet of the .xlsx file into the my.data data frame. It is an excellent tool, but it requires the installation of Perl (a scripting language) on the computer. In Linux/Unix installations this is not a problem, but in Windows environments it is not easy to arrange. A universal solution to this problem is to read data from exported .csv files. This approach works on all platforms, and it does not require the installation of additional packages:

    my.data <- read.csv('data_file.csv', sep='\t', row.names=1)

    The first step is to prepare a data table using Excel or another spreadsheet program. The data are then exported to a .csv file, called a tabular text file or comma-separated text file in different software. It is important to specify a tab as the field separator (sep='\t') in the export settings instead of the comma that is usually the default field separator for .csv files.

    To handle the problem of transferring data from MS Excel to R, a new package called readxl was released recently (Wickham 2015). The goal of this package is to provide access to data saved in Excel files without further dependencies, and in an operating system-independent way.
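
    A minimal sketch of the same import using readxl (assuming the package is installed; the file name is only an example):

    library(readxl)
    # read the first sheet of the workbook into a data frame-like object
    my.data <- read_excel('data_file.xlsx', sheet = 1)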

    There are many proprietary file formats produced by different instruments, and dedicated packages have been developed to read their content and load it into proper data structures in R for further analysis. For example, the ReadAffy() command from the affy package is designed to import Affymetrix GeneChip CEL files. Similarly, the read.fasta() command of the seqinr package (Charif and Lobry 2007) or the readFASTA() command of the Biostrings package (Pages et al. 2015) can import FASTA-formatted sequence files.
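
    As a small sketch (the file name is hypothetical, and the seqinr package must be installed first), importing a FASTA file could look like this:

    library(seqinr)
    # read all sequences from a FASTA file into a list (one element per sequence)
    my.seqs <- read.fasta(file = 'sequences.fasta')
    length(my.seqs)      # number of sequences in the file
    names(my.seqs)[1]    # name of the first sequence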

    This book has a dedicated support webpage, where all the R scripts and data needed for the practices discussed in the following chapters are available. As bonus material, the scripts used for generating the figures are also available from the same place.

    Save the file furin_data.csv from the webpage of the book, and open it with a text editor. The file contains rows and columns of data. The first step now is to set the exact path to the location of the file in the furin.file variable and use the read.csv() command to read its content into the my.data variable. The structure of the data can be checked using the str() command.

    furin.file <- '/path/to/your/file/furin_data.csv'
    my.data <- read.csv(furin.file, sep='\t')
    str(my.data)

    Basic operations in R

    All scripting languages provide simple ways to perform basic computational operations on data, and R is no different in that sense. The most basic things, like arithmetic operations, work as expected; for example, adding and multiplying numbers works the same way as in math class:

    4 + 7
    6 * 2

    Of course, R is not the most suitable choice if only a calculator is needed. R is used to store numbers, information, and data, and also to perform various manipulations and calculations on them. For those who know one programming language or another, it is clear that variables should be used. For those who are not familiar with these concepts: variables are similar to labeled shoe-boxes containing data items. During an analysis, the data items can be stored in these shoe-boxes instead of being read from a file for each operation. The arrow mark (or assignment operator) is used for loading any data, for example a number, into these variables.

    my.data <- 5

    Now the number 5 is loaded into the my.data variable. The direction in which the arrow points tells the story. The result of an operation can be stored in a variable, and later the data inside the variable can be the subject of further operations.

    my.data <- 5 + 3
    my.other.data <- my.data * 2

    Typing the name of the variable will show what is inside it:

    > my.data
    [1] 8
    > my.other.data
    [1] 16

    Variables in R can store a great many different types of things: numbers, lists of numbers, strings, sequences, data tables, data matrices, entire genomes, or multiple sequence alignments. Several operations have different meanings depending on the kind of data they are applied to, and R is smart enough to figure out whether a command has a version specifically suited to a particular data type.

    a <- 5
    a + 3
    b <- c(5, 6, 7)
    b + 3

    In the previous example, there are two very different variables: a and b. Variable a holds a single number (5), while variable b holds a vector of three numbers. When applied to a single number (variable a), the addition (+) operator adds 3 to that number. However, in the case of a vector (b), it adds 3 to all the numbers in the vector. This distinction is crucial, as the result of the first operation is a single value, while the result of the second one is itself a vector.
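
    Showing the prompt and R's printed answers makes the difference visible:

    > a + 3
    [1] 8
    > b + 3
    [1]  8  9 10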

    R has this kind of smart flexibility, which is especially handy with the plot() command. Several data types are represented in graphs and figures, which are very often generated by the plot() command. Specific data types have their specific plots, and R packages are well prepared to draw the appropriate plots for them. Using the furin_data.csv file again as an example, different graphs can be generated by plotting one column of the data table (Figure 1.1), or all of them to check their correlation (Figure 1.2):

    furin.file <- '/path/to/your/file/furin_data.csv'
    my.data <- read.csv(furin.file, sep='\t')
    plot(my.data$Naive.KO.1)
    plot(my.data)


    Figure 1.1 The plot() function produces a scatter plot when it is called on a column of a data frame. Unless specified differently, the values appear in their order in the data frame itself.

    [Matrix of scatter plots for the columns Naive.WT.1, Naive.WT.2, Naive.KO.1, and Naive.KO.2.]

    Figure 1.2 Calling plot() on multiple columns of a data frame results in a matrix of pairwise scatter plots (a correlogram) among the columns of the data frame.

    When complex data tables and data structures are loaded or created, the variables contain specific types of data, such as vectors, lists, matrices, or data frames. Many commands require one of these data types, since some operations only make sense for particular data; for example, calculating the mean works with a vector of numbers but not with DNA sequences. Getting the data into the right format for a certain command is a distinctive part of working with R, and it is the problem that causes a lot of headaches for R newbies as well as long-time R users.
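
    A small illustration of how the data type determines what a command can do (the values here are made up for the example):

    expression.values <- c(2.4, 3.1, 5.8, 4.2)    # a numeric vector
    dna.sequence <- c('a', 't', 'g', 'c')         # a character vector

    class(expression.values)    # "numeric"
    class(dna.sequence)         # "character"

    mean(expression.values)     # works: returns 3.875
    mean(dna.sequence)          # warning: returns NA, the argument is not numeric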

    Some basics of graphics in R

    A picture is worth a thousand words, as the old saying goes. In the context of data analysis, this means that very often the goal of using R is to summarize the results of an experiment with plots and images. R has a versatile graphics facility that is a bit difficult at first. After taming it, however, it will help to generate publication-quality images with the possibility of adjusting all their important aspects.

    The topic is covered by shorter articles as well as several books. The main idea is simple, although it is somewhat unusual. Calling the plot() command (or one of its relatives) results in an image appearing on the screen. This image is not the one that is saved into a file! The secret behind the system is called the graphical device. Calling a plot() command in R sends the graphics to the graphical device that is currently open; the default is a window on the screen. However, other devices can be opened, such as a pdf, a jpeg, or a tiff file, and then the next plot() command will draw into those open devices. These file devices must be closed at the end, so that R can take care of their proper formatting.

    b <- rnorm(1000)
    hist(b)

    The first command creates a vector of 1000 random, normally distributed numbers and stores them in the b variable. The hist() command (a close relative of plot()) shows the frequency distribution of those numbers, so the shape of the histogram is not really surprising. Now, how can the same plot be written to a jpeg file?

    setwd('/home/path/to/your/files')            # on Linux/UNIX
    setwd('/Users/User Name/Documents/FOLDER')   # on Mac
    setwd('c:/path/to/my/directory/')            # on Windows
    jpeg(filename='my_first_plot.jpg')
    hist(b)
    dev.off()

    After issuing these commands, a new jpeg file called my_first_plot.jpg appears in the working directory (Figure 1.3). It contains a histogram similar to the one seen before, but now it can be included in a manuscript or project report. Many aspects of the file itself can also be adjusted, including its size and resolution.


    Figure 1.3 The frequency distribution of 1000 random numbers generated by the hist() function.

    jpeg(filename='my_first_plot.jpg', width=640, height=480, res=100, quality=100)

    The pdf(), bmp(), png(), and tiff() devices can be used in a very similar manner.
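
    These devices follow the same open-plot-close pattern; a minimal sketch with the pdf() device (the file name and dimensions are arbitrary):

    # open a pdf device; width and height are given in inches
    pdf(file='my_first_plot.pdf', width=7, height=5)
    hist(b)
    dev.off()    # close the device so the file is written properly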

    Getting help in R

    As shown in the calls of jpeg(), read.csv(), and other commands, functions can have many parameters and requirements. Often an explanation of these parameters, or of the details of how a particular function works, is needed. R provides an excellent help system to assist with these issues. To learn about the suitable input data and the different parameters of a particular function (known by its name), either the help() command can be used, or the function name can be prefixed with a question mark:

    help(jpeg)
    help(read.csv)
    ?jpeg

    These commands provide all the necessary details, and it is extremely useful to keep the help page open while experimenting with the different parameters of a command. However, this approach has a limitation: there are cases when the name of a function is not known, although the task to perform is clear (e.g., Student's t-test). In these situations, it is a good idea to turn to the more advanced help system built into R. Calling the help.start() function opens a local page in the web browser. Here, the local manuals, which provide detailed information about the locally installed packages, can be browsed, and one can also perform searches in these pages.

    help.start()

    Diving into the provided help pages reveals a huge amount of very detailed help, supported by manuals and examples. After a little practice, one realizes that almost all possible questions are already answered here; what remains is figuring out how to dig out those answers. The help facility of R can assist in this task.
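
    When only the task is known but not the function name, a keyword search can also be started from the prompt; a small sketch (the search terms are just examples):

    # search the installed documentation for a keyword or phrase
    help.search('Student t-test')
    ??'t.test'    # shorthand for help.search()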

    In addition, there are two more education-oriented services built into R: the commands example() and demo(). The former is provided for the most important commands, while the latter is designed to demonstrate selected topics and how to perform them with the installed packages.

    example(read.csv)

    This command prints several lines of text with instructions and explanations about the read.csv() command.

    demo()

    This command will show the list of available demos coming from the installed packages on the computer in use. The base installation contains quite a few of them (e.g., about recursion and scoping). It is easy to call them and check the possibilities that R provides:

    demo(recursion)
    demo(image)

    Files for practicing

    furin_data.csv—This file is an excerpt of a gene expression microarray dataset. It contains normalized gene expression values, with genes in the rows and samples in the columns. The file is in CSV format, but the field separators are tabulator (tab) characters, which are denoted in R as \t. The first row contains the column names. This file can be read into R either with the read.csv() function using the sep='\t' parameter, or with the read.delim() function.
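
    Both import routes look like this in practice (the path is a placeholder for the actual location of the file):

    furin.file <- '/path/to/your/file/furin_data.csv'

    # option 1: read.csv() with a tab as the field separator
    my.data <- read.csv(furin.file, sep='\t')

    # option 2: read.delim(), which uses the tab separator by default
    my.data <- read.delim(furin.file)

    str(my.data)    # check the structure of the imported data frame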

    Study exercises and questions

    How to assign values to different variables in R?

    What is a data frame?

    What are packages in R, and how to load a suitable package?

    How to save a sequence of R commands to a file?

    What are package repositories for R?

    What is Bioconductor, and how is it related to R?

    Give an example of assigning a vector to a variable.

    How do we get help on a function? List multiple ways.

    If you had to name at least two useful R packages for bioinformaticians, which ones would you choose? Provide a brief description of their functionality.

    If you are plotting into a file, how to instruct R to close the graphic file?

    In your opinion, what are the advantages/disadvantages of working with RStudio compared to the regular R console?

    Which are the commands to check and set your working directory?

    Explain the difference between the R history file and the R environment file.

    How to install a certain package in R?

    What happens if scripts are not saved as text files, but as .doc/.docx files? Why?

    Describe how we can load a script located in the working directory, and a script located outside of the working directory.

    References

    Adler, J. 2012. R in a Nutshell. O'Reilly Media. http://shop.oreilly.com/product/0636920022008.do (accessed May 4, 2016).

    Charif, D., and J. R. Lobry. 2007. SeqinR 1.0‐2: A Contributed Package to the R Project for Statistical Computing Devoted to Biological Sequences Retrieval and Analysis. In Structural Approaches to Sequence Evolution: Molecules, Networks, Populations, edited by U. Bastolla, M. Porto, H. E. Roman, and M. Vendruscolo, 207–32. Biological and Medical Physics, Biomedical Engineering. New York: Springer Verlag.

    Fox, J. 2005. The R Commander: A Basic Statistics Graphical User Interface to R. Journal of Statistical Software 14 (9): 1–42.

    Gatto, L., and R. Stojnic. 2014. Sequences: Generic and Biological Sequences. http://CRAN.R‐project.org/package=sequences (accessed May 4, 2016).

    Gautier, L., L. Cope, B. M. Bolstad, and R. A. Irizarry. 2004. Affy‐Analysis of Affymetrix GeneChip Data at the Probe Level. Bioinformatics 20 (3): 307–15. doi:http://dx.doi.org/10.1093/bioinformatics/btg405.

    Gentleman, R. C., V. J. Carey, D. M. Bates et al. 2004. Bioconductor: Open Software Development for Computational Biology and Bioinformatics. Genome Biology 5: R80.

    Helbig, M., S. Urbanek, and I. Fellows. 2013. JGR: JGR ‐ Java GUI for R. http://CRAN.R‐project.org/package=JGR (accessed May 4, 2016).

    Pages, H., P. Aboyoun, R. Gentleman, and S. DebRoy. 2015. Biostrings: String Objects Representing Biological Sequences, and Matching Algorithms.

    Paradis, E. 2005. R for Beginners. http://cran.r‐project.org/doc/contrib/Paradis‐rdebuts_en.pdf (accessed May 4, 2016).

    R Core Team. 2015. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R‐project.org/ (accessed May 4, 2016).

    Warnes, G. R., B. Bolker, G. Gorjanc, G. Grothendieck, A. Korosec, T. Lumley, D. MacQueen, A. Magnusson, J. Rogers et al. 2015. Gdata: Various R Programming Tools for Data Manipulation. http://CRAN.R-project.org/package=gdata (accessed May 4, 2016).
