Translating Diverse Environmental Data into Reliable Information: How to Coordinate Evidence from Different Sources
Ebook · 1,080 pages · 9 hours


About this ebook

Translating Diverse Environmental Data into Reliable Information: How to Coordinate Evidence from Different Sources is a resource for building environmental knowledge, particularly in the era of Big Data. Environmental scientists, engineers, educators and students will find it essential to determine data needs, assess their quality, and efficiently manage their findings. Decision makers can explore new open access databases and tools, especially portals and dashboards. The book demonstrates how environmental knowledgebases are built, and how they can be built, to meet the needs of modern students and professionals. Topics covered include concepts and principles that underpin air, water, and other public health and ecological topics. Integrated and systems perspectives are woven throughout, with clues on how to build and apply interdisciplinary data, which can increasingly be obtained from sources ranging from peer-reviewed research appearing in scientific journals to information gathered by citizen scientists. This opens the door to using vast amounts of open data and the necessary quality assurance and metadata considerations for their countless applications.

  • Provides tools to manage data of varying sizes and quality
  • Identifies both opportunities and cautions in using “other people’s data”
  • Updates physical, chemical and biological factors that must be considered in risk evaluations and life cycle assessments
  • Applies to data collected by academic, government, business, and citizen scientists across environmental systems
  • Improves readers’ ability to organize and visualize their work in the age of Big Data
Language: English
Release date: Sep 15, 2017
ISBN: 9780128124475
Author

Daniel A. Vallero

Professor Daniel A. Vallero is an internationally recognized author and expert in environmental science and engineering. He has devoted decades to conducting research, teaching, and mentoring future scientists and engineers. He is currently developing tools and models to predict potential exposures to chemicals in consumer products. He is a full adjunct professor of civil and environmental engineering at Duke University’s Pratt School of Engineering. He has authored 20 environmental textbooks, with the most recent addressing the importance of physical principles in environmental science and engineering. His books have addressed all environmental compartments and media within the earth’s atmosphere, hydrosphere, lithosphere, and biosphere.



    Translating Diverse Environmental Data into Reliable Information

    How to Coordinate Evidence from Different Sources

    Daniel Vallero

    Table of Contents

    Cover image

    Title page

    Copyright

    Dedication

    Foreword

    Part I. Data and the Environment

    Introduction

    Systems thinking

    Translational science

    Precision science

    Chapter 1. Building a New Environmental Knowledgebase

    Data-intensive scientific discovery

    The role of data in environmental protection

    Chapter 2. The Environmental Knowledge Cascade

    Knowledge-building and decision-making

    Sound science in decision space

    Multicriteria decision analysis

    The scientific method

    Objectivity

    Reproducibility

    Coherence

    Part II. Environmental Knowledgebases

    Introduction

    Chapter 3. Stressors

    Inherency

    Pollutant transport

    Inherent property data

    Chapter 4. Pathways

    Stressor complexities

    Adverse outcome pathways

    Biomonitoring data

    Toxicokinetic data and models

    Chapter 5. Air

    Properties of the atmosphere

    The troposphere

    Air pollution

    Air pollutant transport and fate

    Types of air quality data

    Pollutant transport to the atmosphere

    Emissions data

    Ambient air quality data

    Air exposure data

    Modeling data

    Chapter 6. Water

    Properties of water systems

    Hydrologic cycle

    Physical properties of environmental fluids

    Discharge and flow at hydrologic units

    Streamflow example

    Pressure

    Acceleration

    Water pollution

    Types of water quality data

    Effluent data

    Storage and retrieval and water quality exchange

    Modeling data

    Chapter 7. Contaminant Storage Systems

    Sinks

    Unconsolidated materials

    Solid waste

    Movement within matrices

    Environmental aspects of soil

    Active sequestration

    Solid matrix partitioning

    Part III. Managing Environmental Knowledge Building

    Introduction

    Data, knowledge, and ethics

    Chaos in decision-making

    Communicating decision-making methods and results

    Chapter 8. Environmental Models

    Model development

    Models to extend data

    Transport

    Modeling pollutants within and between ecosystems

    Pollutant modeling within the human body

    Model quality

    Chapter 9. Environmental Data Analysis

    Metrics of data reliability

    Data quality and decision making

    Applying experimental data

    Extrapolating from the known to the unknown

    Uncertainty, safety, and risk

    Data interpretation

    Chapter 10. Data Interpretation and Presentation

    Events

    Expressions of risk

    Causal links between risk factors and adverse outcomes

    Rare events: perfect storms and black swans

    Tools for interpreting mined data

    Chapter 11. Case Studies and Examples

    Expert elicitation and multiobjective decision making

    Life cycle analysis example

    Emission scenarios

    Sustainable design

    Physiological models

    Environmental interoperability

    Environmental impact of heat

    Disaster knowledgebases: a personal reflection

    Knowledge

    Appendix 1. Physicochemical Data Sources

    Appendix 2. OECD's Emission Scenario Document Method for Calculating Environmental Releases

    Appendix 3. Key Terms and Notations

    Appendix 4. Default Exposure Factors for Human Health Evaluations Under the Comprehensive Environmental Response, Compensation and Liability Act, as Amended (CERCLA), and as Implemented by the National Oil and Hazardous Substances Pollution Contingency Plan (U.S. Environmental Protection Agency, 2014)

    Appendix 5. Little Blue District Ground Water Plan—Metadata

    Index

    Copyright

    Academic Press is an imprint of Elsevier

    125 London Wall, London EC2Y 5AS, United Kingdom

    525 B Street, Suite 1800, San Diego, CA 92101-4495, United States

    50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

    The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

    Copyright © 2018 Elsevier Inc. All rights reserved.

    No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

    This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

    Notices

    Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

    Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

    To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

    Library of Congress Cataloging-in-Publication Data

    A catalog record for this book is available from the Library of Congress

    British Library Cataloguing-in-Publication Data

    A catalogue record for this book is available from the British Library

    ISBN: 978-0-12-812446-8

    For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

    Publisher: Candice Janco

    Acquisition Editor: Anneka Hess

    Editorial Project Manager: Tasha Frank

    Production Project Manager: Surya Narayanan Jayachandran

    Designer: Mathew Limbert

    Typeset by TNQ Books and Journals

    Dedication

    For my Grandchildren:

    Chloe Jayne; Daniel Alexander; and Samuel Joseph.

    Photo credit: Amelia Christine Randall; used with permission.

    Foreword

    The practice of environmental science and engineering has changed much since I entered the field in the mid-1970s. The goals are similar. We still know we need clean air and water. We still know we need to protect ecosystems. However, the way we go about achieving these goals is quite different now. To a much greater extent, we rely on others. We must not only share one another's data and knowledge but also depend on these data being generated with integrity. Otherwise, we will propagate uncertainties and compound errors to the point that our studies would be worse than useless; they may well lead to wrong decisions.

    There are numerous reasons that today's environmental knowledge is so different from what it was a few decades ago. In the early 1970s, the field was embarking on uncharted paths, so all knowledge was new. Just look at the PhD dissertations of the 1970s. Chances are that the middle school science fair down the road has students who can report better data, use more advanced technologies, and cover topics we could have only imagined. Sensors, mobile devices, the Internet, communications, and the like can generate more data and provide greater spatial and temporal coverage than some of the largest environmental projects back then.

    Of course, other changes are driving this new way of building knowledge, including smaller budgets for observational field studies, epidemiologic studies, experimentation, and other investigator-driven knowledge building. Now, the data are coming at us from all directions and of varying quality and representativeness. More is not necessarily better. Just because I can pull data from hundreds of other studies does not mean they are relevant to my interests. Even if they are, I cannot assume that they were gathered and handled with the care and documentation I will need for my specific work.

    I have also tried to capture another difference, i.e., that our approach to science has also changed. We are no longer content to stay within tightly drawn disciplinary lines. Nobel Prize winners in medicine frequently are not exclusively biomedical researchers; winners also include those from the more basic sciences. Science now recognizes that the whole is greater than its parts. Every field seems to have added systems to its name, e.g., systems medicine, systems engineering, systems biology, systems carpentry (well, I made that one up). This means that the sciences also have had to become translational. If we are going to be systematic, we need to understand thoroughly what these other folks are doing. Only then can we translate their science into what we need.

    Environmental science has somewhat of an advantage over its basic science forerunners. From its beginning, we knew that everything is interconnected. The living and nonliving parts of the ecosystem all had to work together for it to function properly and efficiently. Take away or harm one part of the system and the whole thing falls apart.

    In a sense, we are experiencing the law of diminishing returns. Today, we have pretty much gathered most of the low-hanging fruit. The concentrations of the most obvious pollutants in much of the industrialized parts of the world are substantially lower than what they were in the 20th century. However, as science has advanced, we have kept learning of new, previously unknown pollutants, including those that cause cancer, birth defects, and neurologic problems. These toxics are harder to measure and model than the handful of pollutants we worried about in the 1970s. And, as detection limits decrease, we keep finding more. Indeed, we would not have even been aware of the extent of certain environmental problems and the presence of these toxic substances had science not advanced exponentially in the past few decades. Chemical analysis, computational horsepower, modeling and statistical tools, and web applications have steadily lowered the levels of detection and quantitation. We have become able to identify chemical, physical, and biological agents at extremely low levels. Finding them is the first step. Preventing and controlling them to reduce risk is the next.

    The nature of risk and uncertainty has also changed. We continue to base many environmental decisions on risks, but the manner in which we use science in risk assessment is changing. We must deal with unconventional and emerging technologies that provide both risk and reward. It seems to me that we as scientists are being asked to consider hazards and exposure more systematically than when the risk assessment process was basically a stepwise approach. That is, we started worrying about the extent of exposure only after realizing that something was harmful. Only after substantially completing formal processes for identifying hazards and determining the amount of harm per dose did exposure become important. Now, we must do this all at the same time to gauge risk. Add to this that risk is no longer the only metric of environmental acceptability. Sustainability is a measure of our success. Precaution is now mandated in many large-scale decisions, especially when a threat is potentially severe and irreversible. In these cases, evidence is hard to find and, if found, it can be even harder to assure the quality of the underlying data.

    I argue that we have moved from an era of identifying acceptable risk to an era of acceptable uncertainty. How certain must we be to take an action and to decide that an action is not needed or when an action is even worse than no action (e.g., the opportunity risk of banning something that may have provided greater societal value)? When am I confident that I can move from the data-rich known to the data-poor unknown?

    In light of these new challenges, there is a real value to being able to find out that someone in another lab has conducted a well-designed study of one of these pollutants or processes. And, with the Internet and friendly searches, we are now able to find this study that would have had very limited currency a few decades ago.

    This book considers both the challenges and opportunities in this new scientific era. In a sense, it is both a compilation of key information needed for knowledge building and an update of my previous work, especially Environmental Contaminants, Fundamentals of Air Pollution, and Environmental Biotechnology. In fact, I updated all three in the process of writing this one. But it is meant to be much more. I have also drawn on my recent collaborations with the next generation of environmental researchers and professionals to write about the transition to the era of Big Data, and the cautions that go with these changes.

    From a teaching perspective, the book can be adopted in toto for an environmental management course. It can also be used as the text for an introductory environmental science course, with most emphasis on Part II. For those interested in environmental big data, I recommend using Parts I and III, with Part II providing detailed information about specific processes.

    As was true for my previous books, I am grateful to my wife, Janis. She was not only patient with my late hours and weekends dedicated to this project, but was so generous and insightful in sharing perspectives that have enriched this book. Her understanding of human behavior, as always, greatly enhanced the final product. I also want to thank my oldest grandchild, Chloe Jayne Randall, who helps me keep up with trends that I would have otherwise missed. She is truly wise beyond her years. I fully expect to continue to receive such intergenerational guidance from her, along with my two grandsons, Daniel Alexander Vallero and Samuel Joseph Vallero in future projects. I thank God for these profound, yet undeserved, blessings.

    DAV

    Part I

    Data and the Environment

    Outline

    Introduction

    Chapter 1. Building a New Environmental Knowledgebase

    Chapter 2. The Environmental Knowledge Cascade

    Introduction

    Never before has so much data and information been available to support environmental decision-making. Scientists and engineers can now avail themselves of a wealth of methods for producing these data. The challenge for these scientists and engineers is that they must increasingly rely on an array of data sources that were never intended to be used in any way other than their original purpose. A survey of doctoral dissertations would show that many were based on very specific information under the complete control of the researcher, from the hypothesis to the design of the experiments, to the quality control and assurance, to the data collection and reduction and, ultimately, to the conclusions. Every subsequent user of these data will not have the luxury of the intimacy and control of the originator. However, the large accumulation of these data provides the potential for extending knowledge, e.g., any future researcher's mining and analyzing disparate datasets for utility in whatever scientific need is at hand. This is the promise of using other people's data (OPD). It is what has come to be known as big data.
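    The caution above — that OPD may not have been gathered and handled with the care a reuser needs — can be made concrete as a screening step before pooling. The sketch below is illustrative only; the record fields (`units`, `qa_flag`, `value`) are hypothetical stand-ins for the metadata a real repository would document.

```python
# Sketch: screening "other people's data" (OPD) before pooling it with your own.
# The field names ("units", "qa_flag", "value") are hypothetical; in practice,
# a repository's metadata defines what must be checked before reuse.

def usable(record, required_units="ug/m3"):
    """Accept a record only if its metadata supports reuse."""
    return (
        record.get("units") == required_units     # comparable measurement basis
        and record.get("qa_flag") == "validated"  # passed the originator's QA
        and record.get("value") is not None       # no missing observation
    )

opd = [
    {"value": 12.1, "units": "ug/m3", "qa_flag": "validated"},
    {"value": 8.4,  "units": "ppb",   "qa_flag": "validated"},    # wrong units: excluded
    {"value": None, "units": "ug/m3", "qa_flag": "validated"},    # missing value: excluded
    {"value": 15.0, "units": "ug/m3", "qa_flag": "provisional"},  # unvalidated: excluded
]

pooled = [r["value"] for r in opd if usable(r)]
print(pooled)  # → [12.1]
```

The point is not the particular checks but that reuse requires explicit, documented acceptance criteria rather than simply concatenating whatever data can be found.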

    Besides big data, another trend in science and engineering is to move away from generalizations and toward individualized, personal information. For example, not long ago, the best one could say about human sensitivities to adverse outcomes, such as diseases, was that they might fall into certain exposure classes (e.g., living in urban areas) or polymorphisms, i.e., genetic variations in rather large subpopulations [1]. Today, however, each person's exposure is becoming more precisely described, e.g., the exposome [2–5], and each person's genetic information is readily available. As evidence, firms advertise offers not only to describe a person's genome but also to predict their life expectancy and disease potential [6].

    The individualization and large repository of data lend themselves to precise environmental science. Such precision environmental science can tailor data, information, and knowledge to the individual receptor, which can be a single person, a community, an ecosystem, or any environmental entity. I admit to coopting the term from precision medicine, i.e., health care that is not off-the-rack, but is individualized and unique to each patient [7].

    Of course, having the data about a person is only the beginning. The data can only represent a person if they are properly interpreted. Such interpretation is a rational exercise. Finding meaning and applying it require sound reasoning. Science and engineering depend on reasoning, which can be deductive or inductive. Deductive reasoning, arguably the most common in the physical and natural sciences, works its way from general knowledge, i.e., scientific principles, toward the specific circumstance. Conversely, inductive reasoning moves from the specific circumstance to generalizations. Both are needed in precision medicine and, by extension, precision environmental science.

    Similar to precision medicine, precision environmental science must apply knowledge in a more focused way. It must account for how data about an individual's biology, environment, and lifestyle differ from those of others. Just as precision medicine seeks to reconsider disease onset and progression, treatment response, and health outcomes through the more precise measurement of molecular, environmental, and behavioral factors that contribute to health and disease [8], environmental protection must better understand all of the steps that lead to a desirable or adverse environmental outcome. This understanding would help to target environmental assessment; protect air, water, and other resources through rational pollution prevention strategies; improve studies to support cleanup; and lead to better technologies, e.g., sensors that track individual exposures to environmental agents. It would also be consistent with the current trends toward greater nonscientist involvement and the deployment of citizen science, in a time of dwindling governmental funding, to expand spatial and temporal coverage of environmental conditions and public health status [9].

    For much of the history of modern science, information has been gathered by experimentation, whether in a laboratory with conditions controlled to identify the behavior of one or a few variables or in the field where variability and uncertainty are sufficiently known or can be assumed to be randomly assigned, i.e., the natural experiment [10]. The experiment is often very specific but has been intended to provide results that can be extended to a larger population or other scenarios. If we learn something about how a compound is absorbed and metabolized in one rat, we may be able to extend that knowledge not only to other rats, but to other species, including humans. Increasing the number of rats observed helps to reduce uncertainty about the compound's potential effects on the rat species. Unfortunately, uncertainties increase as we move to other species, so additional data and information will be needed if we are to say much about the compound's impacts beyond the single species. So, science has cared exceedingly less about that single rat than about what the rat tells us about some important area of science.¹ These findings begin to form a knowledgebase that grows with each study. The goal is to have a knowledgebase that applies to a population, subpopulation, soil classification, habitat, etc. Thus, the information becomes less individualized and less specific to that particular animal; it is designed to apply to a larger population, in this case one that may not even include the species being tested. Ironically, to build a personalized, precise knowledgebase for an individual person requires that a general knowledgebase first be built from composites of information derived from very specific studies of other people and lower species.
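    The claim that more rats reduce uncertainty within the species follows directly from sampling statistics: the standard error of a sample mean falls as one over the square root of the sample size. A minimal sketch, assuming a fixed within-species standard deviation purely for illustration:

```python
import math

# Sketch: why adding rats reduces within-species uncertainty. The standard
# error of a sample mean falls as 1/sqrt(n), so quadrupling the sample size
# halves the uncertainty about the rat population. Note this says nothing
# about the additional uncertainty introduced by extrapolating to another
# species, which no amount of rats can remove.

sigma = 2.0  # assumed within-species standard deviation of the measured response

def standard_error(n):
    """Standard error of the mean for a sample of size n."""
    return sigma / math.sqrt(n)

for n in (4, 16, 64):
    print(n, standard_error(n))  # n=4 → 1.0, n=16 → 0.5, n=64 → 0.25
```

This diminishing return per added animal is one reason the knowledgebase grows by compositing many specific studies rather than by enlarging any single one.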

    The sciences in many fields that directly or indirectly support environmental and public health decisions are undergoing a transformation. The shift is leading to science that is both systematic and translational.

    Systems thinking

    Ecology is arguably the archetypical systems science. After all, it is all about relationships among and between species and their living and nonliving surroundings. Early in the 20th century, Poincaré and his ilk prepared the scientific community for chaos theory and other systems thinking [11]. He likened factual data to stones and science to the house built from those stones [12,13]. Simply aggregating interesting data in mindless searches is insufficient and often counterproductive to explaining how systems behave and change. Selecting the relevant and credible facts about the risk of any technology is difficult; constructing these facts systematically is more difficult still, and it is even more difficult for living systems. Thus, all organisms, from single-celled species to mammals, must be understood as highly complex, complicated, and changing systems [14].

    Studying the environment and living systems is fraught with uncertainties, requiring objectivity and insight. Oftentimes, environmental engineers and scientists are already part of a team managing some risk, e.g., treating a disease or cleaning up a hazardous waste site. Therefore, they must be able to detach themselves emotionally and intellectually so that they can separate the scientific methods of the risk assessment process from the design, management, and policy considerations [15,16]. Environmental and public health decisions are made within a milieu of often competing perspectives. Risk management decisions are based on credible risk assessments, raising the question of what constitutes an acceptable risk. One standard of risk acceptability is that an operation, a product, or a system should pose risks that are as low as reasonably practicable (ALARP), a concept formulated by the United Kingdom Health and Safety Commission [17]. The range of possibilities fostered by this standard can be envisioned as a diagram (see Fig. 1). The upper area (highest actual risk) is clearly where the risk is unacceptable. Below this intolerable level is the ALARP region. Risks in this region require measures to reduce risk to the point where costs disproportionately outweigh benefits. Direct or short-term risks are relatively easy to identify compared with indirect or long-term risks that may occur long after the immediate benefits are attained.
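    The three regions described above amount to a pair of thresholds on estimated risk. The sketch below illustrates that structure only; the numeric boundaries are illustrative placeholders, not the HSE's actual criteria, which differ for workers and the general public.

```python
# Illustrative sketch of the ALARP regions as a simple classifier.
# The thresholds below are placeholder values for illustration, not the
# UK HSE's actual tolerability criteria (which differ by population).

INTOLERABLE = 1e-3         # annual individual risk above this: unacceptable
BROADLY_ACCEPTABLE = 1e-6  # below this: no further reduction normally required

def alarp_region(annual_risk):
    """Classify an annual individual risk estimate into an ALARP region."""
    if annual_risk > INTOLERABLE:
        return "unacceptable"
    if annual_risk > BROADLY_ACCEPTABLE:
        return "ALARP: reduce unless costs grossly outweigh benefits"
    return "broadly acceptable"

print(alarp_region(5e-3))  # above the intolerable boundary
print(alarp_region(1e-4))  # in the ALARP region
print(alarp_region(1e-7))  # broadly acceptable
```

The design point is that only the top and bottom boundaries are fixed; inside the ALARP band, acceptability is a cost–benefit judgment rather than a threshold test.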

    Scientifically and socially acceptable outcomes can be seen from a utility perspective. Akin to Mill's utilitarian principle that a moral decision is one that provides the greatest good for the greatest number, the utility of a particular application of a microbial population, for example, is based on the greatest good that any goods and services provide to human populations and ecosystems everywhere [19]. However, even Mill realized that this decision must also account for the potential harm it may cause, i.e., his harm principle. For example, a genetically modified microbial population that degrades an organic contaminant that has seeped into the ground water more efficiently than other available techniques (e.g., extracting the ground water and treating it aboveground by air stripping) would appear to be morally acceptable [14]. Single-variable, single-value assessments like these are uncommon and can cause new problems [20]. This is where the harm principle comes into play. For example, in addition to the beneficial biodegradation, the decision must also account for any downstream and side effects introduced by the microbial population's growth and metabolism, including the production of harmful metabolites or whether they could change the diversity and condition of neighboring microbial populations, i.e., horizontal gene transfer² [21,22].

    Figure 1  Regions of risk tolerance. Note that allowable risk decreases from top to bottom. In this case, i.e., the United Kingdom, workers are allowed three orders of magnitude greater risk, partially because the workplace standard requires risk reduction and personal protective equipment. Environmental regulations must often be more stringent than occupational regulations for these reasons, i.e., the whole population must be protected. In addition, at least in most developed nations, the working population has much lower percentages of sensitive subpopulations, e.g., children and aged. From: Health and Safety Executive. Guidance on ALARP decisions in COMAH. 2017 Available from: http://www.hse.gov.uk/foi/internalops/hid_circs/permissioning/spc_perm_37/.

    Another aspect of ALARP is that a margin of safety should be sought, i.e., the principle assumes that marginal improvements in safety can be compared with the marginal costs of the increases in reliability [18]. Margins of safety will be discussed at length in Part II; for now, consider the margin a risk reduction and management approach that goes beyond scientifically credible risk assessment. That is, the margin must be both protective and reasonable. Unfortunately, the means of determining reasonable actions are open to interpretation. Lawyers use a similar stratagem, i.e., the reasonable person standard. Engineers and other environmental professionals also directly or indirectly apply such a standard, e.g., when determining whether conduct has been ethical. The reasonable person is actually a legal fiction that applies Kant's categorical imperative [23]. The ethical review must determine whether universalizing the decision and action (e.g., cleanup) would be perceived to be a good or bad approach. That is, if every environmental professional did it this way, would this be the ideal way to handle the situation? For example, a review board may ask whether the environmental engineer knew or should have known the adverse aspects of the design in advance, based on the best available information, including both qualitative and quantitative data [24,25]. Thus, ensuring the quality and representativeness of OPD and secondary sources of information is both a scientific and an ethical requirement.

    Translational science

    Today's scientist is less likely to operate in an intellectual silo and more likely to network with various experts, many outside of her or his chosen field. In this manner of thinking, translational science is an extension of systems thinking. The term has at least two different meanings. First, it is a way of harnessing knowledge from basic sciences to produce new drugs, devices, and treatment options…. [26].

    Translating the basic sciences can definitely be extended to the environmental sciences, as emerging technologies improve measurements and models, which in turn lead to advances in environmental understanding, i.e., translation at the interfaces between the basic sciences and the applied sciences and engineering. This is the second connotation of translational science, i.e., bringing basic findings to the practice of environmental engineering and technology [26]. Reliable data are needed to translate physical, chemical, and biological research results so that they can be incorporated into analyses and assessments.

    Precision science

    The advancement of biomedical science has predominantly and progressively been to improve the diagnosis and treatment of diseases (See Fig. 1). This has led to an impressive knowledgebase about diseases, including linkages to environmental contaminants and genetic predisposition. However, until recently this has mainly been to move closer to the onset of symptoms. More recently, techniques were developed to begin to identify the onset of a disease presymptomatically, e.g., using biomarkers. Now, medical science has begun to identify ways an individual may deviate from the typical patient [8].

    Precision environmental science would find the various types of information, evaluate their quality and representativeness for the need at hand, and tailor the data to a single person, ecosystem, or other entity. Part I's two chapters consider the value and challenges of using existing data from various perspectives, given their strengths and weaknesses.

    References

    [1] Natarajan A, Obe G. Screening of human populations for mutations induced by environmental pollutants: use of human lymphocyte system. Ecotoxicology and Environmental Safety. 1980;4(4):468–481.

    [2] Dennis K.K, et al. The importance of the biological impact of exposure to the concept of the exposome. Environmental Health Perspectives. 2016;124(10):1504–1510.

    [3] Rappaport S. The exposome. Monterey, California; 2013.

    [4] Wild C.P. Complementing the genome with an exposome: the outstanding challenge of environmental exposure measurement in molecular epidemiology. Cancer Epidemiology Biomarkers and Prevention. 2005;14(8):1847–1850.

    [5] Wild C.P. The exposome: from concept to utility. International Journal of Epidemiology. 2012;41(1):24–32.

    [6] Annas G.J, Elias S. 23andMe and the FDA. New England Journal of Medicine. 2014;370(11):985–988.

    [7] Hodson R. Precision medicine. Nature. 2016;537(7619):S49.

    [8] Hudson K, Lifton R, Patrick-Lake B. The precision medicine initiative cohort program–building a research foundation for 21st century medicine. In: Precision Medicine Initiative (PMI) Working Group Report to the Advisory Committee to the Director. 2015.

    [9] Conrad C.C, Hilchey K.G. A review of citizen science and community-based environmental monitoring: issues and opportunities. Environmental Monitoring and Assessment. 2011;176(1):273–291.

    [10] Gerber A.S, Green D.P. Field experiments and natural experiments. In: The Oxford handbook of political methodology. Oxford University Press; 2008.

    [11] Holmes P. Poincaré, celestial mechanics, dynamical-systems theory and chaos. Physics Reports. 1990;193(3):137–163.

    [12] Poincaré H. La Science et l'Hypothèse. Paris: Flammarion; English translation, New York: Dover; 1952.

    [13] Poincaré H. La Science et l'Hypothèse. Paris: Flammarion; 1902 [CR Acad Sci Paris 1905;140:1504].

    [14] Vallero D. Environmental biotechnology: a biosystems approach. Elsevier Science; 2015.

    [15] Byrd III D.M., Cothern R.C. Introduction to risk analysis: a systematic approach to science-based decision making. Government Institutes; 2000.

    [16] Beierle T.C. The quality of stakeholder-based decisions. Risk Analysis. 2002;22(4):739–749.

    [17] Faber M.H, Stewart M.G. Risk assessment for civil engineering facilities: critical overview and discussion. Reliability Engineering and System Safety. 2003;80(2):173–184.

    [18] Health and Safety Executive. Guidance on ALARP decisions in COMAH. 2017. Available from: http://www.hse.gov.uk/foi/internalops/hid_circs/permissioning/spc_perm_37/.

    [19] Mill J.S. Utilitarianism. London: Parker, Son and Bourn; 1863.

    [20] Hancock J. Toxic pollution as a right to harm others: contradictions in Feinberg's formulation of the harm principle. Capitalism Nature Socialism. 2007;18(2):91–108.

    [21] Cases I, de Lorenzo V. Genetically modified organisms for the environment: stories of success and failure and what we have learned from them. International Microbiology. 2010;8(3):213–222.

    [22] Keese P. Risks from GMOs due to horizontal gene transfer. Environmental Biosafety Research. 2008;7(3):123–149.

    [23] Kant I. The metaphysics of morals (1797). 1996.

    [24] Driscoll D.L, et al. Merging qualitative and quantitative data in mixed methods research: how to and why not. Ecological and Environmental Anthropology (University of Georgia). 2007:18.

    [25] Lazer D, et al. The parable of Google Flu: traps in big data analysis. Science. 2014;343(6176):1203–1205.

    [26] Woolf S.H. The meaning of translational research and why it matters. JAMA. 2008;299(2):211–213.


    ¹ Although extending knowledge of a toxic substance from one species to another is more common, there are also ample examples of extending knowledge of one toxic substance to another. This is done using a toxic equivalent (TEQ) or toxic equivalency factor (TEF). Often, these are used when a class of chemicals has similar biological responses and other properties, but with very large ranges in potency. For example, there are many congeners of dioxins, but one form, i.e., 2,3,7,8-tetrachlorodibenzo-para-dioxin (TCDD), is the most toxic; so other dioxin forms are compared to TCDD, and the total dioxin toxicity is often reported as a TEQ. Thus, knowledge is extended not only from one biological species to other species, but also from one chemical compound to other chemical compounds.
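    The TEQ arithmetic in this footnote can be sketched in a few lines: each congener's concentration is weighted by its TEF relative to TCDD, and the weighted values are summed. The congener concentrations and TEF values below are illustrative only; regulatory work should use the current WHO TEF tables.

```python
# Sketch of a toxic-equivalent (TEQ) calculation for a dioxin-like mixture.
# Concentrations and TEF values are illustrative, not authoritative.

def teq(concentrations, tefs):
    """Total TEQ = sum over congeners of concentration x TEF."""
    return sum(concentrations[c] * tefs[c] for c in concentrations)

# Hypothetical measured congener concentrations (pg/g)
sample = {"2,3,7,8-TCDD": 0.5, "1,2,3,7,8-PeCDD": 1.0, "OCDD": 200.0}
# Illustrative toxic equivalency factors relative to TCDD (TEF = 1.0)
tef = {"2,3,7,8-TCDD": 1.0, "1,2,3,7,8-PeCDD": 1.0, "OCDD": 0.0003}

print(teq(sample, tef))  # total mixture toxicity expressed as pg TEQ/g
```

Note how the abundant but weakly potent congener (OCDD) contributes far less to the total than the trace amounts of the potent congeners, which is exactly why TEQ reporting is preferred over raw summed concentrations.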

    ² Although both of the cited references consider the risks of horizontal gene transfer to be negligible, there remains uncertainty sufficient to warrant caution in any application of genetic engineering and synthetic biology. This holds for any emerging technology, e.g., nanotechnology, and even proven approaches when they are applied in unprecedented settings and/or under untested, data-poor conditions.

    Chapter 1

    Building a New Environmental Knowledgebase

    Abstract

    This chapter explains how data and information are gathered and used to build environmental knowledge. Environmental protection and ecology are not synonymous, nor are environmental protection and public health. Environmental protection embodies ecosystems and human populations, as well as other resources, such as material integrity of monuments and other cultural resources. In addition, environmental studies and assessments often address societal needs, such as economics and justice. This means that the knowledgebases that underpin these studies and the decisions they support are diverse.

    Keywords

    Data; Environmental knowledgebase; Environmental protection; Information; Information technology; Precaution; Scientific method; Uncertainty

    Environmental protection and ecology are not synonymous, nor are environmental protection and public health. Environmental protection embodies ecosystems and human populations, as well as other resources, such as material integrity of monuments and other cultural resources. In addition, environmental studies and assessments often address societal needs, such as economics and justice. This means that the knowledgebases that underpin these studies and the decisions they support are diverse.

    This text is designed to provide information from which to build environmental knowledgebases of all types. Most of these are scientific, but depending on the decision to be made and the stakeholders involved, even the very scientific decisions include numerous nonscientific data and information. The scientific data must be rigorous and credible, following the scientific method. In addition, however, the other data sets being combined with these scientific data must also be of high quality, meeting standards of their associated disciplines, e.g., economics, anthropology,¹ etc.

    Data-intensive scientific discovery

    Many regard Aristotle as the first scientist, based on his ground-breaking work in the 4th century BC, laying out methods for observing nature, inferring meaning from these observations, discovering new concepts by demonstrating results, and applying reason and logic [1,2]. Aristotle used forms and other concepts to explain natural phenomena and, as was common for the Greeks of the time, with a heavy dose of a priori knowledge [1,3–5]. However, what is generally accepted as science in modern times is actually the latest phase in an evolution from natural philosophy in the 14th and 15th centuries to a more structured Renaissance science that blossomed in the 16th and 17th centuries and remains fairly intact today. Another way of describing this scientific evolution is adherence to the major paradigms held by the community of scientists. In the first paradigm to appear in modern science [6], the natural philosophers took bold steps to distinguish science from other ways of knowing and the manner of explaining natural phenomena.

    In the Western hemisphere, it was during the Renaissance that science began to be transformed into modern science. Galileo Galilei, Johannes Kepler, Francis Bacon, Robert Boyle, and their contemporaries were concerned with describing events in nature using a posteriori methods, especially the controlled experiment and field observation. This empirical paradigm was the foundation and codification of the scientific method, which continues to be crucial. This group of pioneering empiricists was soon joined by scientists² using rational tools and models, i.e., the theoretical paradigm.

    The scientific approach remained fairly well accepted and intact for the next four centuries, until the arrival of the computational paradigm, which allows scientists to simulate the physical and natural world and is arguably an extension of information gained from the empirical paradigm and an enhancement of the theoretical paradigm. None of these paradigms was supplanted by its successor; rather, each was improved as the scientist's tool box gained new approaches to uncovering the mysteries of natural phenomena.

    Indeed, they have extensively overlapped and merged. As evidence, there are numerous examples of the empirical and computational paradigms evolving simultaneously and interdependently. The ascendance of mathematics led to proofs of scientific laws; although mathematics is the language of science, it also underpins scientific advancement. It was no accident, for example, that Kepler and Galileo were mathematical geniuses. Even before dropping a ball from a ship's mast, Galileo arrived at his hypothesis for falling objects by using a thought experiment [7], which is an intellectual and, arguably, computational model. In the process, this combination of empirical and computational paradigms disproved the a priori expectation of Aristotle that heavier objects fall faster than lighter ones [8]. Likewise, Kepler needed mathematical simulations to make use of Tycho Brahe's astronomical observational data set to arrive at his three laws of planetary motion, in the process disproving another a priori explanation, i.e., the epicyclic theory [9,10].

    Of course, the divergent and unprecedented approaches introduced by each shift have required at least a modicum of time and energy before acceptance, especially if they are at odds with currently held scientific orthodoxy [11].

    We are now in the fourth paradigm, known as data-intensive scientific discovery [12]. The onset of this new paradigm was accompanied by changes in the conduct of science, generally, and environmental science, specifically.

    The role of data in environmental protection

    Environmental science and environmental engineering address some of society's most vexing and seemingly intractable problems, from hazardous waste and toxic substances to water and air pollution to habitat destruction and endangered species to natural and human disasters. Decisions about these and other environmental and public health issues must be informed with reliable information built from credible and sound data [13]. The types of data needed are diverse and include the inherent properties of chemical and biological agents, spatial descriptions of habitat and land use, epidemiological and sociological factors, control technologies, and regulatory successes and failures.

    The sources of these data are myriad. Research publications, industrial reports, government documents, medical records, legal findings, periodicals, and other documents generate and use enormous amounts of data that span a full range from completely unreliable and undocumented blogs to highly reliable, well-documented, peer-reviewed research. Indeed, of late, the scientific community has stressed the need not only for high-quality, peer-reviewed publications, but also for publications to include the supporting information (SI) needed to ensure that any reported results can be replicated. This was not possible not so long ago, when journals' limited print space dictated brevity in word count. Now, many journals are digital, and even those that are not may require that the SI be available digitally.

    It is becoming increasingly rare for anyone engaging in environmental decision and policy-making to own all of the data needed. Even environmental researchers must find and use data other than those they personally generate and must ensure their quality. Often, the researcher has only a small subset of all the data needed to prepare an assessment report, publish a journal article, or even present results at a conference, workshop, seminar, or webinar. Many of the necessary sources of data have varying degrees of documentation. Increasingly, environmental data users completely lack control over the data. Therefore, using other peoples' data (OPD) presents two major challenges:

    1. The data are usually generated for uses other than those intended in future scientific and engineering endeavors.

    2. The quality and other characteristics of the data, i.e., the metadata, are unknown or incomplete.
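    The second challenge can at least be made explicit before OPD are reused: a data set's metadata can be screened against the fields the new application requires. The required fields and the record below are hypothetical, kept deliberately minimal; real metadata standards define far richer schemas.

```python
# Minimal sketch of a metadata completeness check before reusing OPD.
# The required fields and the example record are hypothetical.

REQUIRED = {"source", "collection_method", "units", "collection_date",
            "quality_flags"}

def missing_metadata(record):
    """Return the required metadata fields absent from a data record."""
    return sorted(REQUIRED - set(record))

# An OPD record as it might arrive: partially documented
opd_record = {"source": "state monitoring network",
              "units": "ug/m3",
              "collection_date": "2015-07-01"}

print(missing_metadata(opd_record))  # gaps to resolve before reuse
```

Such a check does not establish that the data are fit for the new purpose, but it forces the gaps in documentation into the open rather than leaving them as silent assumptions.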

    The promise of OPD is huge, but these and other challenges require that they be employed with humility and care. Data themselves are useless to scientists and engineers. With regard to environmental science and engineering, data become useful only when transformed into information that can add to an environmental knowledgebase and to wise decision-making [14]. Evaluating the quality of data and turning these data into useful information is a daunting challenge for the 21st century scientist, engineer and, really, anyone who wants to make reliable decisions about how the environment is changing and what can be done to intervene against adverse environmental events. The first step in solving a problem is understanding it. Too often, decisions must be made based on inadequate information. Sometimes, the information is inadequate simply because the problem was not well characterized. Other times, sufficient information exists to characterize the problem adequately, but there is not enough relevant information to compare possible solutions to these problems and project future conditions expected if each of these alternative solutions was realized.

    Promise and cautions

    The big data era is already ensconced in the scientific psyche of practically every academic discipline, not just those steeped in information technology (IT). The promise of informatics and data mining now envelops a large and growing range, from even the most basic of the sciences to practically every area of human inquiry, whether scientific or nonscientific. As the demand has grown, so has the need for thoughtful interrogation and critique, especially with regard to pitfalls and faulty assumptions about the quality and relevance of any data used for purposes beyond the original motivation for gathering these data [15].

    The writer envies the humanities in their creativity, especially in using ambiguity as a tool for the reader's interest. Creative writers take great pride in allowing the reader to work his or her way through a story. Ambiguity makes for interesting reading. However, ambiguity detracts from logic and is a mortal enemy of the scientific method. Scientists and professionals fight against ambiguity. The scientist's publications must be sufficiently clear and unambiguous as to allow any other competent scientist to replicate the results [16]. Omitting key information, assuming something about apparatus or methods to be understood by future readers, or implying a sequence of events different from what actually occurred are examples of ambiguities in peer review. The ambiguity increases as others accept the results but are unaware that they come with caveats, e.g., they may apply only under carefully controlled laboratory conditions and lose quality and relevance in the real world, such as with increasing scale or the complications introduced by myriad real-world variables. Furthermore, even if the data are sound and relevant, the scientific professional must apply them properly, such as when the engineer builds this knowledge and then transfers it unambiguously to drawings, blueprints, and reports; the physician writes prescriptions unambiguously; and the airline mechanic ensures that the plane's engine parts diagram is the appropriate one.

    In many ways, technology has helped to improve clarity and to reduce ambiguity. Pharmacists who receive an online script from the physician no longer must decipher the handwriting (does ‘As’ mean aspirin or arsenic?). Engineers who share computer-aided drafting systems are less likely to have the units misunderstood (recall one of the funniest scenes in the movie, This Is Spinal Tap [17], where the plans called for feet [‘], but the recorder wrote the unit for inches ["], making for a very small Stonehenge around which the band was to perform).

    Technology has been a boon to sharing information. One challenge, however, is the condition of the information as it is shared. Ample examples exist of biomedical, environmental, and other information that is prematurely shared, when data are incomplete or conclusions are drawn beyond what the data actually reveal. This phenomenon, known disparagingly as science by press conference, can result from conflicts of interest and attempts to be first in line for funding. Fears of being preempted and losing credit are also reasons for premature releases of data. The biomedical community employs the Ingelfinger rule [18,19] to try to prevent such data releases, but they still occur and are likely being exacerbated by IT [20]. Only after the work is actually completed or when others try to replicate the results may the flaws and incorrect conclusions be exposed. Unfortunately for future users, the original data may remain in cyberspace indefinitely, where they may continue to be acquired.

    Reality: extending the allegory of the cave

    We must remind ourselves as researchers and practitioners that data are merely representations of reality, not reality itself. Most scientists pride themselves on being the arbiters of what is real. Scientists distinguish themselves from nonscientists in that a scientist's conclusions are only supposed to come after a rigorous, systematic consideration of facts. Even a nonscientist is thought of as scientific as long as the reasoning is sound and fact based. The reasoning must not only be logical but also fact based, i.e., real. That is, one can be logical and yet unscientific if the logic is based on flawed information. This is particularly the case for the physical and natural sciences but also extends to the social sciences. In fact, the social scientists are often challenged by an even greater diversity and a diminished quality of data (in terms of precision and accuracy) from which to apply reasoning, arguably to a much greater extent than in the physical sciences.³

    This begs the question, then, What is real? Modern science actually has to depend on the work of others, past and present, to define the margins of reality. This requires many articles of faith (or at least trust in one's sages and forbearers). To wit, consider reality from the perspective of one of the most scientifically rigorous components of an environmental investigation, i.e., chemical analysis. When an extract is injected into a gas chromatograph (GC), the sample travels through a column (actually a hollow, capillary tube with a coating inside, e.g., fused silica). The tube is in an oven, so it can be heated so that the compounds in the sample will volatilize at different times (compounds with lower boiling points generally reach the detector first, i.e., they have lower retention times). Depending on the detector, something happens to the compounds, which is recorded as a chromatogram. If the detection is by electron capture (ECD), a radioactive beta particle collides with the carrier gas to generate ions that capture electrons from the compounds in the sample [21]. With mass spectrometry (MS), the compound is bombarded by electrons and fragmented. Each fragment is classified by what is known as the mass-to-charge ratio, from which the concentration of each compound is determined [22]. For these and other detectors, the chromatogram does not, in reality, show the concentration of a compound, but it shows only the representation that must be interpreted as the concentration from a standard or calibration curve that is built from known quantities of whatever compounds one wants to measure [23]. Thus, even for well-established, scientific systems such as GC-ECD and GC-MS, there has been a cascade of interpretations and calibrations. Any mistakes along the way detract from the representation of what is actually there.
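    The calibration step described above, interpreting a chromatogram's response against standards of known quantity, can be sketched numerically: fit a line to the responses of the standards, then invert it to read an unknown sample's concentration. All values below are fictitious, and a real method would also verify linear range, blanks, and detection limits.

```python
# Sketch: a calibration curve built from standards of known concentration.
# Detector response (e.g., peak area) is mapped back to concentration via
# ordinary least squares; all values are fictitious.

def fit_line(x, y):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    m = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return m, my - m * mx

# Calibration standards: known concentrations (ng/mL) vs. peak areas
conc = [0.0, 1.0, 2.0, 5.0, 10.0]
area = [2.0, 52.0, 103.0, 251.0, 502.0]  # fictitious detector responses

m, b = fit_line(conc, area)

def to_concentration(peak_area):
    """Interpret an unknown sample's peak area as a concentration."""
    return (peak_area - b) / m

print(round(to_concentration(150.0), 2))  # ng/mL for an unknown sample
```

The point of the sketch is the chapter's: the instrument never reports a concentration; it reports a signal that becomes a concentration only through this cascade of calibration and interpretation, and any error in the standards propagates into every "measured" value.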

    Many scientific pursuits are less precise and accurate than chromatography. The data must be transformed to begin to be representative of physical reality. Almost every major public health and environmental decision is made under enormous uncertainty as to the effect that an action or policy will have on outcome. Indeed, even smaller scale decisions are fraught with uncertainty, other than an overall, general sense of how a choice will affect an outcome. For example, one is fairly certain that eating less sugar is often a good choice, but directly linking a precise amount of reduced sugar intake with diabetes or other risks is much less feasible.

    Similarly, researchers must deal with the incompleteness of understanding of data. With apologies to philosophers who know that Plato was addressing the Theory of Forms, the writer will liken scientific and methodological uncertainties to Plato's Allegory of the Cave [24]. Plato argued that people are ignorant of many processes, using the metaphor of chained prisoners in a cave, whose heads are restrained so that they cannot see the reality behind them. A fire behind them is the source of light, but the prisoners can only see what is projected on the cave wall in front of them. Between the fire and the prisoners is a parapet, on which puppeteers are walking. The puppeteers periodically hold up puppets that cast shadows on the wall of the cave. The prisoners cannot see these puppets (i.e., they cannot see reality). Plato tells us that we, the data users, are the prisoners. What we do see and hear are the shadows cast and echoes from the objects (i.e., measurements and other data) [25]. Thus, even the more accepted scientific approaches are biased and inaccurate to varying extents. The uncertainties and errors can be further propagated as we engage in mathematical modeling, given the needed assumptions about initial and boundary conditions, values given to parameters, and the unknowns.⁴
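    The propagation of parameter uncertainty through a model, mentioned at the end of the paragraph above, is often handled by Monte Carlo simulation: sample the uncertain inputs, run the model for each draw, and examine the spread of outputs. The first-order decay model and the parameter range below are hypothetical, chosen only to illustrate the technique.

```python
# Sketch: propagating parameter uncertainty through a simple model by
# Monte Carlo sampling. Model and parameter range are hypothetical.

import math
import random
import statistics

random.seed(1)  # reproducible draws

def remaining_fraction(k, t):
    """First-order decay: fraction of a contaminant remaining after time t."""
    return math.exp(-k * t)

# The rate constant k (1/day) is uncertain; assume only a plausible range
samples = [remaining_fraction(random.uniform(0.05, 0.15), t=10.0)
           for _ in range(10_000)]

# Input uncertainty becomes a distribution of outputs, not a single answer
print(round(statistics.mean(samples), 3), round(statistics.stdev(samples), 3))
```

Even this toy model shows the chapter's point: a modest range on one parameter yields a wide spread in the prediction, and that spread, not a single number, is the honest representation of what the model "knows."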

    Indeed, Plato's prisoners actually believe that the shadows are reality, so they do not strive to break away. Moreover, Plato contends that even if a prisoner is freed, he rejects the reality because it differs from his knowledgebase, although with time he can adjust to the new knowledge [26]. It may be somewhat heartening that data users are often not completely prisoners to the secondary data. As users of OPD, most of us are aware of some of the pitfalls and shortcomings, and we strive to avoid misuse.

    Addressing uncertainty

    According to decision theory as it applies to engineering optimization and operations research, decisions can be partitioned into two broad categories [27,28]. The first category is a decision under risk; the second, a decision under uncertainty. A decision under risk is one where the probability of all possible outcomes is known. When these outcomes cannot be known or are only partially known, the decision is said to be under uncertainty. When optimizing for best approaches to clean up an environmental mess or prevent the mess in the first place, the first thing one needs to know is whether there are sufficient data to construct an event tree, from which the probability of each possible outcome can be drawn (see Fig. 1.1). Fig. 1.2 shows that if beneficial outcomes increase, so may detrimental outcomes. Obviously, identifying every contributing event is an impossible task for all but the simplest decisions; so such event trees often include very broad assimilations of possible outcomes. Even if there is a wealth of information about possible outcomes, these outcomes occurred in the past. None of the scenarios and contingencies that led to a particular outcome will ever again occur in exactly the same way.

    Figure 1.1  Hypothetical example of an event tree representing decision under risk (fictitious data). From: Vallero DA, Letcher TM. Unraveling environmental disasters. Newnes; 2012.
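    The mechanics of an event tree are simple to sketch: each branch carries a probability conditional on its parent, so an outcome's probability is the product of the branch probabilities along its path, and the exhaustive outcomes must sum to one. The cleanup scenario and all probabilities below are fictitious, in the spirit of Fig. 1.1.

```python
# Sketch: outcome probabilities in a small event tree (fictitious data).
# Each branch probability is conditional on its parent, so an outcome's
# probability is the product along its path.

paths = {
    "cleanup succeeds, no recontamination": [0.7, 0.9],
    "cleanup succeeds, recontamination":    [0.7, 0.1],
    "cleanup fails":                        [0.3],
}

def path_probability(branches):
    p = 1.0
    for b in branches:
        p *= b
    return p

outcomes = {label: path_probability(b) for label, b in paths.items()}
assert abs(sum(outcomes.values()) - 1.0) < 1e-9  # outcomes are exhaustive
print(outcomes)
```

The exhaustiveness check is the practical discipline here: if the outcome probabilities do not sum to one, the tree has omitted or double-counted a contingency.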

    The definitions in the previous paragraph come from operations research, engineering optimization, and the decision sciences. They may give some pause to environmental knowledge builders who have not engaged in this aspect of decision theory because these definitions are not at all consistent with the same terms applied in environmental risk assessment and management. Environmental risk is the likelihood of an adverse environmental or public health event. In this sense, risk is a function of hazard and exposure to that hazard. Uncertainty entails our inability to understand and predict these hazards and exposures precisely and accurately.

    Environmental uncertainty results from variability and lack of knowledge [30]. Perhaps, more than most scientific venues, the factors that lead to environmental risk are highly variable, given the large number of habitats, sources of pollution, diseases in populations, geographic diversity, and many other environmental circumstances. For example, within a square meter of soil, the microbial populations, soil texture, organic matter, and chemical makeup can be highly variable. Extending this to the effect of dredging near a 10-ha wetland or trying to determine the impact of a leaking underground gasoline tank propagates these uncertainties. This is compounded by the measurement imprecision, voids and gaps in observations, practical obstacles (e.g., no access to private property and physical barriers), lack of consistency in measured and modeled results, and simply not being able to understand what the information means, i.e., structural uncertainty.

    Figure 1.2  Event tree where probabilities in Fig. 1.1 increase for both desirable and adverse outcomes (fictitious data). From: Vallero DA, Letcher TM. Unraveling environmental disasters. Newnes; 2012.

    Shifting paradigms

    Within an environmental context, then, uncertainty is the degree to which we cannot define this risk. The lack of reliable information from which to build knowledge is the major source of uncertainty and a major barrier to making the right decisions to manage and communicate actual and potential risks [30]. Thus, we are uncertain about how we are doing, about what to do to make things better, and about what will happen if we choose to adopt a certain action to address the problem. Indeed, we may even make things worse than if we had done nothing at all, i.e., the no action alternative can sometimes be the best, or at least better than some or all of the other available actions. In this sense, environmental decision-making and the information it requires differ from many business decisions, which can be more easily modified if things go wrong.

    Lee Iacocca, the savvy business guru and former chairman of the Chrysler Corporation, famously quipped that one needs to lead, follow, or get out of the way. That advice, like the Nike Corporation's motto of Just do it, may be dangerous if universally applied to environmental and public health decisions. Sometimes, immediate action must be taken; other times the action must await better supporting information. This quandary is prevalent in some of the larger and controversial environmental issues, such as climate change, food supply and quality, water use, and genetically modified organisms (GMOs). Indeed, the traditional risk assessment process used by the United States and other governmental bodies has recently been reconsidered. This system, especially as applied to chemicals, followed a stepwise approach, beginning with the identification of hazard, calculation of its danger as expressed by a dose-response curve, assessing exposures to the hazard, estimating the effects, and ultimately characterizing the risks [31]. This stepwise process has weaknesses. For example, microbial and nonchemical agents do not necessarily adhere to this paradigm. Even for chemical agents, the linear approach is not conducive to systems thinking [32,33].
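    At its simplest, the stepwise chemical paradigm described above collapses to a screening-level hazard quotient: an estimated exposure dose compared against a reference dose derived from the dose-response step. The sketch below uses entirely fictitious values; it illustrates the arithmetic of the linear approach, not any actual assessment.

```python
# Sketch of the linear chemical risk paradigm reduced to a screening-level
# hazard quotient (HQ). All values are fictitious.

def hazard_quotient(dose, reference_dose):
    """HQ = exposure dose / reference dose; HQ > 1 flags potential concern."""
    return dose / reference_dose

# Hypothetical chronic daily intake and reference dose (mg/kg-day)
cdi = 0.004
rfd = 0.002

hq = hazard_quotient(cdi, rfd)
print(hq)  # 2.0 -> exceeds the screening threshold of 1
```

The simplicity of this ratio is precisely the weakness the text notes: it strings hazard, dose-response, and exposure into one linear chain, with no natural place for the feedbacks and interactions that systems thinking demands.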

    Applying experimental data

    Data and information are neutral commodities. That is, they have value only in their usage. Data are seldom useful in raw form and must be validated, verified, and reduced according to the needs of the user. The scientific method since Robert Boyle's air pump has focused on trying to control the conditions for all but one or a few variables, i.e., to obtain knowledge about the unknown via the experiment. Indeed, Renaissance science as articulated by Boyle and the Royal Society had three inviolable components [34,35]:

    1. A posteriori knowledge deduced from observations, i.e., experimentation;

    2. Technical publication of the observations; and

    3. Peer review.

    To advance knowledge, the experiment, if successful, broadens the domain of the known. After experimental results are found, the experimenter strives to derive something that had previously been unknown from the domain of the known. This is the essence of the first scientific paradigm mentioned earlier. In the modern era, the advancement of science and the understanding of physical phenomena have rested on experimentation, the first of the three components. The other two are designed to ensure the quality of the results. Quality is a function of the precision of the observed data, i.e., how close the measured data points lie to one another. A tightly grouped set of data points is more precise than a loosely arranged one. Each scientific discipline has a particular way of defining precision, but it is the general expression of data exactness.

    From the perspective of the fourth paradigm, the second and third components of modern science are crucial to support the data intensity needed for scientific discovery. The acts of doing the experiment and making observations must be followed by careful documentation in the scientific literature, which, in turn, must be reviewed for quality and meaning by other objective and competent scientists. Interestingly, modern science has really never reached complete unanimity regarding the extent, type, or even the authority of peer review. For example, biomedical science has had much diversity of thought about the techniques, roles, and responsibilities of the experts asked to conduct the reviews. In the United States in the early part of the 20th century, many publishing physicians resented criticisms of their writings by any other physician and argued that membership in a medical organization was a sufficient reason for an article to be accepted for publication [36]. This has changed substantially, with a much larger consensus regarding the need to accept only high-quality and meaningful manuscripts [37]. Some of this consensus, no doubt, is merely lip service, but for most of the scientific community, peer review is essential in building and maintaining the reputation of science organizations and publishers. Unanimity is impossible, given that what one journal editor considers high quality and meaningful differs from what others do. Indeed, the explosion in the number of scientific journals in biomedical, public health, and environmental disciplines requires great care in choosing only the data and conclusions from the highest-tier publications and data sources.

    One means of assessing data precision is to obtain a sufficiently large sample and repeat the same conditions numerous times to see how much the data deviate from a central metric; commonly, this is the mean. If the data points all lie close to the mean and your experiment does not have any design flaws, you can be more confident of the data precision. However, the caveat of no design flaws is commonly not satisfied.

    Obviously, precision is a necessary, but wholly insufficient, aspect of data quality. You may shoot 20 arrows at the wall, all spaced within a 10 cm diameter, but if the target is 10 m away, you missed it entirely. One can be very exact, yet exactly wrong! Thus, data precision is irrelevant without its companion, accuracy. In fact, scientists commonly refer to the initialism, P&A. Precision and accuracy are required in tandem when deciding whether a data set is good enough for a particular use.
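    The archery analogy can be quantified: precision as relative standard deviation (RSD) of repeated measurements, accuracy as bias against a known reference value. The measurements and reference value below are fictitious, constructed to show a data set that is precise yet inaccurate.

```python
# Sketch: quantifying precision (RSD) and accuracy (bias) for repeated
# measurements against a known reference value. Data are fictitious.

import statistics

true_value = 10.0  # known reference, e.g., a certified standard
measured = [10.4, 10.5, 10.6, 10.5, 10.4]  # tightly grouped, but offset

precision_rsd = statistics.stdev(measured) / statistics.mean(measured) * 100
bias_pct = (statistics.mean(measured) - true_value) / true_value * 100

print(f"RSD: {precision_rsd:.1f}%  bias: {bias_pct:+.1f}%")
# Low RSD (precise) alongside a systematic positive bias (inaccurate):
# the arrows are tightly grouped, but off target.
```

Reporting the two metrics together is the practical meaning of P&A: either number alone can make a flawed data set look acceptable.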

    Data accuracy is much more difficult to attain than precision. Accuracy is the extent to which a measurement agrees with the true value [38]. Only
