Discrimination Testing in Sensory Science: A Practical Handbook
Ebook, 879 pages, 5 hours

About this ebook

Discrimination Testing in Sensory Science: A Practical Handbook is a one-stop-shop for practical advice and guidance on the performance and analysis of discrimination testing in sensory science. The book covers all aspects of difference testing: the history and origin of different methods, the practicalities of setting up a difference test, replications, the statistics behind each test, dealing with the analysis, action standards, and the statistical analysis of results with R.

The book is written by sensory science experts from both academia and industry, and edited by an independent sensory scientist with over twenty years of experience in planning, running and analyzing discrimination tests. This is an essential text for academics in sensory and consumer science and any sensory scientist working in research and development in food, home, and personal care products, new product development, or quality control.

  • Contains practical guidance on the performance and analysis of discrimination testing in sensory and consumer science for both food and non-food products
  • Includes the latest developments in difference testing, including both new methods and state-of-the-art approaches
  • Features extensive coverage of analysis with a variety of software systems
  • Provides essential insight for academics in sensory and consumer science and any sensory scientist working in research and development in food, home, and personal care products, new product development, or quality control
Language: English
Release date: Sep 29, 2017
ISBN: 9780081011164

    Discrimination Testing in Sensory Science

    A Practical Handbook

    Editor

    Lauren Rogers

    Table of Contents

    Cover image

    Title page

    Related Titles

    Copyright

    Dedication

    List of Contributors

    Preface

    Acknowledgments

    Part I. Introduction to Discrimination Testing

    Chapter 1. Introduction and History of Sensory Discrimination Testing

    Chapter 2. Statistics for Use in Discrimination Testing

    1. Business Risk

    2. Data Arising From Sensory Discrimination Test Methods

    3. Analysis of Data Arising From Tests With a Chance Bound (e.g., Triangle Test)

    4. Analysis of Data Arising From Simple Classification Tasks Without a Chance Bound (e.g., A-Not-A Test)

    5. Analysis of Data Arising From Difference From Control/Degree of Difference Test Methods

    6. Analysis of Data Arising From a Ranking Test Method

    7. Evaluating Sensory Equivalency

    8. Contextualizing Sensory Discrimination Results to Make Business Decisions

    9. Summary

    10. Recommended Reading

    Chapter 3. Deciding Which Test to Use in Discrimination Testing

    1. The Objective/Business Need

    2. Considering All Possible Test Methods

    3. Generating a Hypothesis

    4. The Action Standard and Possible Outcomes

    5. Assessors and Statistical Power

    6. Budget

    7. Product Considerations

    8. When Not to Use Discrimination Testing

    9. Summary

    Chapter 4. Applications and Limitations of Discrimination Testing

    1. Introduction

    2. Categorizing Discrimination Tests Within Sensory Methodology

    3. Applications of Discrimination Tests

    4. Limitations of Discrimination Tests

    5. Using Consumers in Discrimination Tests

    6. Applications and Limitations of Commonly Used Discrimination Tests

    7. Conclusion

    Part II. Methods and Analysis in Discrimination Testing: Practical Guidance

    Chapter 5. Paired Comparison/Directional Difference Test/2-Alternative Forced Choice (2-AFC) Test, Simple Difference Test/Same-Different Test

    1. Introduction

    2. Same-Different Test: Comparing Two Samples

    3. Directional Paired Comparison: Comparing Two Samples

    4. Multiple Paired Comparison: Comparing Multiple Samples

    Chapter 6. A-Not-A Test

    1. What Is the A-Not-A Test?

    2. Procedure

    3. When to Use the A-Not-A Test

    4. Analysis of A-Not-A Results

    5. Conclusion

    6. Case Study

    Chapter 7. Triangle Test

    1. Test Principle

    2. Why and When to Use It

    3. Advantages and Disadvantages

    4. Terms and Definitions (BS ISO 4120)

    5. Setting up the Test

    6. Assessors

    7. Product Preparation and Serving

    8. Test Layout

    9. Analysis and Reporting

    Chapter 8. Two-Out-of-Five Test

    1. Introduction

    2. Experimental Design

    3. Data Analysis

    4. Analysis Interpretation

    5. Two-Out-of-Five Method in Use

    6. Handy Hints

    7. Case Study 1

    8. Case Study 2

    Chapter 9. Tetrad Test

    1. Why the Upsurge in Popularity of the Tetrad?

    2. When to Use a Tetrad

    3. Setting Your Objective

    4. Assessors

    5. Setting Up the Test

    6. Case Study

    Chapter 10. Duo-Trio

    1. Introduction

    2. Origin

    3. Principle of the Test

    4. Assessors

    5. Facilities and Best Practice

    6. Why Choose a Duo-Trio Test

    7. Duo-Trio Additional Research

    8. Statistics—Definitions

    9. Case Studies

    Chapter 11. Difference From Control (DFC) Test

    1. Method Outline

    2. Why and When to Use This Method

    3. Advantages

    4. Disadvantages

    5. Test Procedure

    6. Test Layout and Setup

    7. Assessors

    8. Number of Samples

    9. Practicalities

    10. Reporting

    11. Constraints

    12. Case Studies

    Chapter 12. Ranking Test

    1. Method Outline

    2. Why and When to Use This Method

    3. Advantages

    4. Disadvantages

    5. Test Procedure

    6. Assessors

    7. Number of Samples

    8. Practicalities

    9. Reporting

    10. Constraints

    11. Other Uses

    12. Case Studies

    Chapter 13. ABX Discrimination Task

    1. Introduction

    2. Method Outline

    3. A Brief History

    4. Advantages and Disadvantages of the ABX Discrimination Task

    5. ABX Discrimination Task Methodology

    6. Data Analysis

    7. Case Study

    8. Conclusion

    Chapter 14. Dual-Standard Test

    1. Introduction

    2. Dual-Standard Test

    3. Experimental Design

    4. Results and Data Analysis

    5. Conclusion

    Chapter 15. Analysis of the Data Using the R Package sensR

    1. Introduction

    2. Basic Single Proportion of Correct Data

    3. Analysis of A-Not-A Tests

    4. Analysis of Same-Different Tests

    5. Difference From Control Data

    6. Ranking Data

    7. ABX and Dual-Standard Data

    8. Overview of the sensR Package

    Part III. The Future of Sensory Discrimination Testing

    Chapter 16. The Future of Sensory Discrimination Testing

    1. The Implication of Technology

    2. Memory-Based Monadic Testing

    3. Optimizing Testing: Using the Right Method for the Product and Getting More Power From Fewer Assessors

    4. Important Versus Significant

    5. Changing Global Consumer Markets

    6. Authenticity

    7. Impact of Global Climate Change

    8. A Future Perspective on Equivalence

    Appendix 1. International Sensory Science Standards

    Appendix 2. Statistical Tables

    Index

    Related Titles

    Developing Food Products for Consumers with Specific Dietary Needs

    (978-0-08-100329-9)

    Individual Differences in Sensory and Consumer Science

    (978-0-08-101000-6)

    Sensory Panel Management

    (978-0-08-101001-3)

    Copyright

    Woodhead Publishing is an imprint of Elsevier

    The Officers’ Mess Business Centre, Royston Road, Duxford, CB22 4QH, United Kingdom

    50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

    The Boulevard, Langford Lane, Kidlington, OX5 1GB, United Kingdom

    Copyright © 2017 Elsevier Ltd. All rights reserved.

    No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

    This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

    Notices

    Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

    Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

    To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

    Library of Congress Cataloging-in-Publication Data

    A catalog record for this book is available from the Library of Congress

    British Library Cataloguing-in-Publication Data

    A catalogue record for this book is available from the British Library

    ISBN: 978-0-08-101009-9 (print)

    ISBN: 978-0-08-101116-4 (online)

    For information on all Woodhead publications visit our website at https://www.elsevier.com/books-and-journals

    Publisher: Andre Gerhard Wolff

    Acquisition Editor: Rob Sykes

    Editorial Project Manager: Karen R. Miller

    Production Project Manager: Lisa M. Jones

    Designer: Greg Harris

    Typeset by TNQ Books and Journals

    Dedication

    This book is dedicated to Eric and André.

    List of Contributors

    Maame Y.B. Adjei, University of Ghana, Accra, Ghana

    Sarah Billson, Reading, United Kingdom

    Per B. Brockhoff, Technical University of Denmark, Lyngby, Denmark

    John C. Castura, Compusense Inc., Guelph, ON, Canada

    Antoine G. de Bouillé, Philip Morris Products S.A., Neuchâtel, Switzerland

    Chris Findlay, Compusense Inc., Guelph, ON, Canada

    Rebecca A. Ford, University of Nottingham, Nottingham, United Kingdom

    Brian C. Franczak, MacEwan University, Edmonton, AB, Canada

    Ruth Elizabeth Greenaway, Sensory Dimensions Ltd, Bulwell, Nottinghamshire, United Kingdom

    Christine B. Linander, Technical University of Denmark, Lyngby, Denmark

    May L. Ng, Pepsico, Leicestershire, United Kingdom

    Michael Plater Findlay, Compusense Inc., Guelph, ON, Canada

    Sue Purcell, Campden BRI, Chipping Campden, United Kingdom

    Lauren Rogers, Sensory Science Consultant

    Tracey Sanderson, Sensory Dimensions Ltd, Reading, United Kingdom

    Cécile Sinkinson, JTI (Japan Tobacco International), Geneva, Switzerland

    Vladimir Vietoris, SUA, Nitra, Slovakia

    Victoria J. Whelan, British American Tobacco, Southampton, United Kingdom

    Qian Yang, University of Nottingham, Leicestershire, United Kingdom

    Preface

    The main reason for writing this book was to give more detail on each of the sensory discrimination methods than would normally be found in a standard sensory textbook. I had been very interested in finding out more about the origin of the various discrimination tests and how they all compared, and that started the ball rolling. I also wanted to provide a more detailed reference for the statistical analysis of data from discrimination tests. Two chapters (Chapters 2 and 15) complement each other in this regard, and the method chapters (Chapters 5 to 14) include case studies that show the analyses in action.

    While pulling together all the various chapters for this book, it was interesting to see how different people and different companies do things in different ways, especially in the statistical analysis and interpretation of the results for similarity testing. Chapter 2 gives an excellent account of the use of discrimination tests for similarity testing, and there is a really interesting and useful way to deal with testing for similarity in Chapter 15. If your discrimination tests are pretty much all about making sure there is no difference between your products, I can recommend that you read the relevant parts of both chapters before reading more about the specific methods. Using R for the analysis of your data really is as simple as copying and pasting the script provided on my website, http://www.laurenlrogers.com/discrimination-testing-in-sensory-science.html. At first glance it may appear rather daunting, but RStudio is actually quite easy to use!

    Writing the history chapter for this book gave me the opportunity to see how the various discrimination tests developed over time and this was incredibly interesting and enlightening. It also made me realize that there is an infinite number of tests available to us and that there is no magic associated with the triangle test. It will be really interesting to see sensory scientists trying out different discrimination tests, chosen to meet the requirements of the decision-making process, as opposed to relying on the company's method of choice for each and every decision. I hope the detailed method chapters written by the various authors in this book, as well as the useful information given in Chapters 3 and 4, will help you in your choice of test for each individual project that requires a discrimination test and that you will enjoy trialing new approaches. Do write and tell me all about it!

    Acknowledgments

    I am extremely grateful to all the authors who have contributed to this book; I could not have done it without you! The book is definitely a team effort. Thank you for all your contributions, ideas, and hard work in pulling together the various chapters.

    Thanks to Compusense for being kind enough to help with the cover photo, and a huge thank you to Per Bruun Brockhoff for all his statistical analysis advice and support. Thanks also to Joshua Brain for helping me with the proofreading.

    Special thanks to Lawrence Blackburn for the majority of the figures contained in Chapter 1 and for supplying me with copious cups of tea while reading, writing, and editing.

    Part I

    Introduction to Discrimination Testing

    Outline

    Chapter 1. Introduction and History of Sensory Discrimination Testing

    Chapter 2. Statistics for Use in Discrimination Testing

    Chapter 3. Deciding Which Test to Use in Discrimination Testing

    Chapter 4. Applications and Limitations of Discrimination Testing

    Chapter 1

    Introduction and History of Sensory Discrimination Testing

    Lauren Rogers, Sensory Science Consultant

    Abstract

    This chapter includes an introduction to sensory discrimination testing and the history of the origin of many of the sensory discrimination tests. The chapter also introduces the idea that there is no magic associated with the triangle test, for example, and that the actual number of tests available to the sensory scientist is infinite! Adjustment of the test instructions for the assessor and the different ways in which products and samples might be presented has the potential to create many new test designs or resurrect some previous tests for further evaluation.

    Keywords

    ABX; A-not-A; Difference; Difference from control; Directional paired comparison; Dual pair; Dual standard; Duo-trio; History; m-Alternative forced choice (2-AFC, 3-AFC, 4-AFC, etc.); Multiple standards; One-out-of-four; One-out-of-three; Paired comparison; Ranking; Same–different; Sensory discrimination tests; Similarity; Tetrad; Triangle; Two-out-of-five

    There are probably more than 20 sensory discrimination tests in use today, and the standard sensory texts (e.g., Kemp et al., 2009; Lawless and Heymann, 2010; Stone et al., 2012; Meilgaard et al., 2016) all contain a wealth of information and case studies about the main tests in use. The aim of this chapter is to detail the history of the creation of the various sensory discrimination tests¹ and also to introduce the idea that there is no magic associated with the triangle test and that the actual number of tests available to the sensory scientist is infinite!

    The main problem with all sensory discrimination tests lies in the contradiction between the following two statements (Frijters, 1984): (1) two products with different formulations can result in the same sensory response from an assessor and (2) the same sample can give a variety of sensory responses from the same assessor. This illustrates why we need to recruit several assessors for each test and be especially careful about our experimental design and methodology.

    The majority of discrimination tests involve the comparison of two products, which we often refer to as A and B. For example, in the triangle test (see Fig. 1.1) there are two products: A and B. Products will be variations on a theme: for example, a fat-reduced yogurt, a new supplier for ingredient X, a new factory location for a washing detergent, a new improved flavor for a pizza.

    The number of samples (A's and B's) in the test can be symmetrical, e.g., AA-BB as in the tetrad or dual standard tests, or asymmetrical, e.g., AAA-B as in the one-out-of-four [also known as 4-alternative forced choice (4-AFC)] or the dual-pair tests. Note that we have the same sample layout with two different symmetrical test names and three different asymmetrical test names. By samples we are referring to the number of A's and B's (you can think of this as the number of cups, plates, laundry swatches, hair switches, etc.) presented to the assessor: for example, in the triangle test the number of samples would be three (see Fig. 1.1) and for the two-out-of-five test this number would be five (e.g., AA-BBB).

    Figure 1.1  The triangle test layout showing that the number of products is two (A and B) and the number of samples (of these products) is three (ABA for example).
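
    The distinction between products and samples can be made concrete by listing the distinct serving orders that a given layout allows. The following is a minimal sketch, not taken from the book: the function name presentation_orders is hypothetical, and the layouts are written as strings of A's and B's as in the text above.

```python
from itertools import permutations


def presentation_orders(layout: str) -> list[str]:
    """Return the distinct serving orders for a layout such as "AAB".

    Each character is one sample; repeated characters are samples of the
    same product, so duplicate orderings are collapsed with a set.
    """
    return sorted({"".join(order) for order in permutations(layout)})


# Triangle test: two products, three samples, with either product as the odd one.
triangle_orders = presentation_orders("AAB") + presentation_orders("ABB")
print(triangle_orders)

# Symmetrical four-sample layout (AA-BB) and the two-out-of-five layout (AA-BBB).
print(len(presentation_orders("AABB")))
print(len(presentation_orders("AABBB")))
```

    Listing the orders in this way is one possible starting point for a balanced design, in which each order is served equally often across assessors.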

    Methodological exceptions to the two product rule include ranking and the difference from control test (DFC), where any number of products, with certain limitations, might be compared. There are also options to conduct tests with more than two products such as AA-BB-CC. Basker (1980) called these tests polyhedral difference tests but they do not seem to have been taken up in practice. Richardson's (1938) and Torgerson's (1958) method of triads is another test design where three products can be presented. In the first example (Richardson, 1938) the subject's task is to decide which of the three simultaneously presented products are the most alike and which are the most different. Torgerson's method (1958) involves the presentation of the three products three times. The first time the subject is asked whether A is more like B or C, the second time whether B is more like A or C, and the third time whether C is more like A or B. These tests are actually quite similar to the duo-trio and ABX procedures with the main difference being that three different products can be presented (Moskowitz et al., 2003).

    One of the main differences between the sensory discrimination tests is related to whether the test has a specified attribute (e.g., sweetness, softness) or not. For example, in the 4-AFC the assessor will be asked which of the four samples is the most bitter (the attribute of interest, bitter, is specified), while the dual-pair is an unspecified test and hence asks which pair contains the two different samples. In the tetrad the assessor is asked to sort the samples into two similar pairs in the unspecified version, while the specified version of the same test will ask the assessor to group samples based on a specific attribute. Having specified and unspecified versions of most tests takes us to more than 40 different named tests, but many tests are very similar as they are based on the same principle. All that really varies for the assessor is the number of samples, the task, and whether or not a reference sample is identified; therefore, discrimination tests can be grouped in a number of different ways. For example:

    • Type: whether they involve a specified attribute such as sweetness or if they are unspecified. Unspecified tests are also known as overall discrimination tests;

    • Reference: if a reference or control sample is identified in the test;

    • Task/action: the manner in which the assessor makes the judgment: answering yes/no, e.g., same-different test and A-not-A; matching, e.g., to a reference; oddity, e.g., picking the different or odd sample; choosing, e.g., the most intense sample or the different pair; or sorting, e.g., putting samples into groups (Gridgeman, 1959a);

    • The number of samples presented: from 1 to 12, e.g., 1 sample in the A-not-A test through to 12 samples in the six-out-of-twelve test;

    • The number of products presented: the majority of tests involve two products; however, tests such as ranking, DFC, and polyhedral tests can contain any number (within reason);

    • Whether or not there is a response bias (see later) associated with the test, e.g., same-different test and A-not-A;

    • Whether some form of rating scale is included as part of the methodology.

    For example, look at the five tests in Fig. 1.2: they all involve the comparison of two products with the use of four samples. The dual standard differs from the other four tests, as it is the only one to contain identified references. The assessor's task in the dual standard is to match each of the two coded samples to a different reference sample. In the tetrad, which can be specified or unspecified, the assessor's task is to group the samples into two similar groups, while in the 4-AFC the assessor's task is to choose the most intense sample for a specified attribute. In the dual-pair the assessor is presented with two pairs of samples, both coded: a matched pair and an unmatched pair, and the assessor's task is to choose the pair that is unmatched. And finally, in the one-out-of-four test, the assessor is asked to pick out the odd sample (similar to the triangle test, which could also be referred to as a one-out-of-three test).
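
    One consequence of the different tasks is that the five four-sample tests do not share the same probability of a correct answer by guessing alone. The sketch below is an illustration under stated assumptions rather than part of the original chapter: it assumes an assessor who responds completely at random, and the function names are hypothetical. The tetrad probability is derived by enumerating the three possible ways of sorting four samples into two pairs; the other tests are treated as a single choice among equally likely options.

```python
from fractions import Fraction

LABELS = ("A", "A", "B", "B")  # hypothetical tetrad serving, positions 0..3


def tetrad_splits():
    """Enumerate the three ways to sort four samples into two pairs."""
    splits = []
    for partner in (1, 2, 3):  # pair position 0 with each other position in turn
        first = (0, partner)
        second = tuple(i for i in range(4) if i not in first)
        splits.append((first, second))
    return splits


def tetrad_guess_probability(labels=LABELS) -> Fraction:
    """Chance of a correct grouping if the assessor picks a split at random."""
    splits = tetrad_splits()
    correct = [s for s in splits if all(labels[a] == labels[b] for a, b in s)]
    return Fraction(len(correct), len(splits))


def single_choice_guess_probability(n_options: int) -> Fraction:
    """Chance of a correct answer when one of n equally likely options is right."""
    return Fraction(1, n_options)


guessing = {
    "tetrad": tetrad_guess_probability(),                    # one of three groupings
    "4-AFC": single_choice_guess_probability(4),             # one of four samples
    "one-out-of-four": single_choice_guess_probability(4),   # one of four samples
    "dual-pair": single_choice_guess_probability(2),         # one of two pairs
    "dual standard": single_choice_guess_probability(2),     # one of two matchings
}

for test_name, probability in guessing.items():
    print(f"{test_name}: {probability}")
```

    The same number of samples on the tray can therefore correspond to quite different chance levels, which is one reason the analysis has to follow the task and not just the layout.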

    The majority of sensory discrimination tests are included in Table 1.1. They are categorized in columns by the type of test, and notes next to the test names give the task associated with the test. Note that some tests appear in multiple columns as they belong to both categories. As you can see from the table, there are an infinite number of tests available to us and the potential to create many more designs with different numbers of references, presentation order of the samples, and task instructions.

    Figure 1.2  Comparing five tests each with four samples and two products.

    Going back in time to gather the information for this chapter was not as easy as it might look, mainly due to the differences in terminology over the years, as well as getting access to the various publications from more than 80 years ago. It was not as simple as doing a Google search for "first sensory discrimination test" or "first record of triangle test"! Sensory science was often referred to as eating quality, organoleptic testing, palatability testing, and taste tests; sensory assessors as tasters, judges, panelists, and subjects; the various methods were called all different sorts of names (for example, the triangle test was referred to as: the trio comparison, triangular test, odd sample method, and the three glass test) and were not always the same method even when called by the same name; and discrimination testing was often referred to as differential tests, difference testing, identification testing, as well as subjective tests and comparison tests.

    Table 1.1

    An Overview of Sensory Discrimination Tests Sorted by Type

    The number after each test name gives an idea of the panelist task (but should not be used to develop the panelist questionnaire: check the relevant chapter or literature for the exact wording for each method). The ellipses (…) indicate that the sequence can be continued where relevant for the product type and experimental objectives. AFC, alternative forced choice.

    a When the reference or reminder is present in the test.

    b No labeled reference is provided—the two initial coded samples serve as blind references.

    c Generally unspecified but can be specified by attribute or by modality.

    d These questions simply summarize the panelist task: they should not be used to develop panelist questionnaires. Please check the relevant chapter or literature for the exact wording for each method.

    The first discrimination tests were really those used by Weber and Fechner in the early 19th century to examine the relationships between physical stimuli and sensory experience—known today as the study of psychophysics. Interestingly, the researchers in the 19th and early 20th centuries believed that sensations could not be measured directly and hence they constructed all sorts of methods and experiments to measure perception indirectly. Weber's experimentation in 1834 explored the just noticeable difference between different weights with blindfolded subjects and found that the difference that was detectable was proportional to the original weight (for a good summary, see Holden et al., 2011). The method has since been called the method of constant stimuli and you may recognize some elements of the paired comparison methods, e.g., the 2-AFC test we use in sensory science today.
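
    Weber's finding that the detectable difference is proportional to the starting intensity can be written as a one-line calculation. The following is a minimal sketch: the function name and the value of the proportionality constant (often called the Weber fraction) are hypothetical and chosen purely for illustration, not taken from Weber's data.

```python
def just_noticeable_difference(baseline: float, weber_fraction: float) -> float:
    """Return the smallest detectable change for a given baseline intensity.

    Assumes strict proportionality: detectable change = fraction * baseline.
    """
    return weber_fraction * baseline


# Hypothetical Weber fraction, for illustration only.
HYPOTHETICAL_WEBER_FRACTION = 0.05

for weight_g in (100, 200, 400):
    jnd = just_noticeable_difference(weight_g, HYPOTHETICAL_WEBER_FRACTION)
    print(f"baseline {weight_g} g -> detectable difference about {jnd:.1f} g")
```

    Doubling the baseline weight doubles the difference needed for detection, which is the proportionality described above.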

    Fechner developed Weber's findings into Weber's law and also added two more methods: the method of limits and the method of adjustment or average error (Fechner, 1860; Lawless and Heymann, 2010). The latter method has not found much use in food research as it is difficult to create easy ways for a subject to adjust levels of, say, salt in baked goods; however, the method of limits, which involves changing the stimulus by successive stages and asking the subject if they detect any sensation or not, is very similar to the same-different test and the ascending forced-choice method of limits used to determine thresholds we use today. Maybe we can say then that the technique of comparing pairs of stimuli was started by Weber and developed further by Fechner (David, 1963). Thurstone (1927, 1954) also used the paired comparison extensively in his psychophysical work to describe the discrimination process, which has recently been of great interest in the field of sensory discrimination testing [i.e., Thurstonian modeling and signal detection theory (SDT)]. These psychophysical methods were developed to help the researchers study differences in people's sensitivity to certain stimuli, but, although sensory scientists may also be interested in this area (after all, people are our instrument of choice), our main focus tends to be on sample or product differences and not on the person per se.

    Maybe the first example of a sensory discrimination test in the literature with the focus on food is Fisher's famous article about tea tasting, which was to become the fundamental statistics reference for hypothesis testing. The test originated in the 1920s in Cambridge, when a group of friends, including Ronald Fisher, were discussing the merits of pouring tea and whether the milk should be put into the cup before or after the tea. One of the party stated that she would be able to tell if the milk had been added first or last and so Fisher went about designing an experiment to determine if she could (Fisher, 1935). The original documentation is interesting as it describes the design of the experiment and the statistics behind the analysis, as well as the outcome. The method used for the tea testing was not named, although Gridgeman (1959b) referred to it as a double-tetrad sorting design; today we would probably call it an octad or classify it under M + N tests (Lockhart, 1951).
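
    The logic of the tea-tasting analysis can be sketched in a few lines. The example below assumes the design as it is usually recounted from Fisher (1935): eight cups, four with milk added first and four with tea added first, with the taster told to pick out the four milk-first cups. The function name is hypothetical, and the calculation simply counts how many of the equally likely selections of four cups would contain a given number of correct picks if the taster were guessing.

```python
from fractions import Fraction
from math import comb


def prob_exact_correct(k: int, cups: int = 8, milk_first: int = 4) -> Fraction:
    """P(exactly k milk-first cups are picked) when four cups are chosen at random.

    Assumes the taster knows how many cups are milk-first and selects that many.
    """
    tea_first = cups - milk_first
    ways_with_k_correct = comb(milk_first, k) * comb(tea_first, milk_first - k)
    return Fraction(ways_with_k_correct, comb(cups, milk_first))


distribution = {k: prob_exact_correct(k) for k in range(5)}

for k, probability in distribution.items():
    print(f"{k} correct: {probability}")
```

    Under these assumptions only one of the possible selections is completely correct, so a perfect score is the only outcome that would be unlikely enough under guessing to count as convincing evidence in a test this small.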

    Of course, discrimination testing has been around for much longer than the last 80 or so years. Sensory tests, albeit informal, would have been used for the assessment of the edibility of food and for checking drinking water, but even nonfood testing was conducted all those years ago. Examples include checking the suitability of housing (read caves) and weapons, like assessing the sharpness of flint tools (Meilgaard et al., 2016).

    Perhaps the first sensory discrimination method, although there is no publication to back this up (as mentioned by Dove, 1947), was ranking, as this method was undoubtedly used to rank food in terms of quality and also preference. In fact, Henry VIII is rumored to have ranked his wives in order of preference! Early references to the use of ranking for consumers' preference of eggs (Morse, 1942) and sweet corn (Dove, 1943) imply that there were many previous studies relating to preference ranking, but neither study is thorough enough to list them. Dove (1947) states that the reason for this omission is that there was too much literature and it covered too many disciplines; sensory science even then had many links with, for example, psychology, genetics, statistics, nutrition, and chemistry.

    Although not a discrimination test as such, grading was one of the first more formal sensory tests. A person might assess a sample of a larger batch prior to purchase and make the decision based on a system of grading (https://www.linkedin.com/pulse/tea-grades-taking-mystery-out-how-graded-darlene-green). In fact, some of these grading methods are still in use for tea, wine, and coffee (Kilcast, 2010). Grading generally uses a small number of experts to make the assessment of quality, and these experts are the first example of panel training and maintenance in the literature (Crocker and Platt, 1937). However, the experts checked their own assessments against those of colleagues or standard samples and there was no mention of screening for sensory acuity.

    The first named discrimination testing method was published in 1936 when Sylvia Cover wrote her paper on the assessment of meat tenderness (Cover, 1936). Cover wanted to find out, on behalf of housewives, if the cooking temperature of roasts made the meat more or less tender. She called her new methodology the "paired-eating method" and the paper makes an interesting read. We would now call this method the paired comparison, and this discrimination test is actually referred to as the first published method in sensory science. Cover based her new method on one used in animal husbandry called the "paired-feeding method," where two animals are fed the same amount of the same food (set to the lowest amount consumed by one of the pair) to observe the effects of, for example, dietary supplements.

    Cover presented her judges with two carefully selected samples known as paired bites. Because of the complex nature of testing meat, the cuts were taken from the left and right sides of the same animal and the bite-sized pieces were from the same muscle type. The judges were not aware of the experimental conditions or which sample came from which cooking temperature. Although these controls are excellent from a sampling point of view, three-digit codes and balanced designs had yet to appear. Cover does not state how many judges were employed, nor whether they were screened (unlikely) or trained, although she did conduct 261 paired comparisons over a period of two years! She stated that the advantages of the paired-eating method were that it was easy for the judges to detect and record differences and that it allowed the samples to be compared directly. She also used some simple statistics (the binomial method) to determine the statistical significance of her result but does not mention this as an advantage.
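
    The binomial reasoning behind a paired comparison can be sketched in a few lines of Python. This is only an illustration and not a reconstruction of Cover's own calculation: the counts are hypothetical, "no difference" verdicts are assumed to have been set aside beforehand, and the function name is invented for the example. Under the null hypothesis each sample of the pair is equally likely to be chosen, so the chance probability is 1/2.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k choices out of n when chance is 1/2.

    k is the number of judgments favoring one sample, n the number of
    judgments that expressed a choice (hypothetical inputs).
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    total = 2 ** n
    upper = sum(comb(n, i) for i in range(k, n + 1)) / total  # P(X >= k)
    lower = sum(comb(n, i) for i in range(0, k + 1)) / total  # P(X <= k)
    # With chance = 1/2 the distribution is symmetric, so doubling the
    # smaller tail gives the two-sided p-value (capped at 1).
    return min(1.0, 2 * min(upper, lower))


if __name__ == "__main__":
    # Hypothetical example: 9 of 10 judgments pick the same sample as more tender.
    print(binomial_two_sided_p(9, 10))
```

    A two-sided test is used here because, in a study such as Cover's, either cooking temperature could in principle have produced the more tender meat.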

    The method that Cover used, the paired comparison, had been in use for many years in other disciplines, studied first by psychologists and then by statisticians, mathematicians, and economists (David, 1963). In fact, our old friend Fechner was the first to use the paired comparison method (Fechner, 1860) with his pioneering psychological experiments on just noticeable differences of weights and light intensities. But we can still assign the first discrimination test on food to Sylvia Cover.

    Cover (1940) made some improvements to her method on the basis of colleague suggestions, in order for the method to be useful for other aspects of food research. One of the main changes to the method was in the order that samples were presented. Cover noted that if her technician attempted to arrange the samples randomly for assessment, the technician tended to be biased so, in the absence of any computers or software, she used a deck of playing cards to assign certain samples to certain judges. She also removed the meat identification information from the sheet that the judges were given to minimize any potential expectation error. Cover used the chi-squared method for the analysis of the data alongside the binomial method she had used previously. The selection, training, and number of judges are also discussed in this publication. Cover (1940) states, "No method has yet been devised for detecting persons who will make superior judges for using the paired-eating method" (p. 391); no standards or sensory textbooks were yet available to give her guidance. She does mention that some initial familiarization with the test method is an advantage for the judges and that anyone of average intelligence and with average ability to concentrate ought to make a good judge (Cover, 1940, p. 391). Cover suggests that there should be more than two people taking part in each experiment and she used six judges for the majority of the tests. Each judge assessed between 8 and 11 pairs of the same two samples; discussions about the analysis of replicated discrimination tests were still 60 years in the future.

    It is interesting to note that Cover's paired-eating method later became called the duo test (Frijters, 1984) before it was typically referred to as the paired comparison. This sheds light on the naming of the duo-trio test, which has probably confused a whole host of people, as the test name appears to indicate there are five samples presented (duo 2 + trio 3 = 5). If Cover's experiment were conducted today, we might refer to it as a directional paired comparison or a directional difference test; it would not strictly be a 2-AFC as the judges were allowed to give a "no difference" verdict. Of course, lots more work followed Cover's original use of the method including the round-robin version of the test when more than two samples needed to be compared (David, 1960), comparisons to other methods (e.g., Gridgeman, 1955; Hopkins and Gridgeman, 1955), a whole book on the topic of paired comparisons (David, 1963), and a good bibliography by Davidson and Farquhar (1976), to name a few.

    So if the paired comparison (or paired-eating method) was the first sensory (discrimination) test, what was the second? The award for the second test probably goes to the triangle test, which was developed independently by two groups of researchers: in 1941–1942 at the Seagram Quality Research Laboratory, although they did not publish the details until later (Peryam and Swartz, 1950), and in 1946 at the Carlsberg Breweries Research Laboratory (Helm and Trolle, 1946; Bengtsson and Helm, 1946). In fact the 1946 papers refer to earlier publications that also discuss methodologies and statistics, but as the cited papers were written in Swedish and not easily accessible, it is difficult to know if the triangle test was mentioned. Both sets of authors (Seagram and Carlsberg) refer to the triangular test as having been in use for several years, so perhaps we might guess at its origin a few years before.

    The two papers published by Carlsberg Breweries (Bengtsson and Helm, 1946; Helm and Trolle, 1946) are both noteworthy in a historical sense, and not just for discrimination testing; the references to consumer testing, or mass testing as it was referred to, are incredibly interesting, as are the photographs. There is a very nice example of a consumer questionnaire which has just two questions (six if we were to include name, age, profession, and address), definitely keeping to the requirements of a short and simple questionnaire! The publications also describe all the elements that should be considered prior to setting up a sensory study that we now take for granted. In fact it is difficult to imagine what life would have been like for the food scientists at the time trying to decide whether a new product had any potential or if the change in an ingredient made a noticeable difference to the perceiver. The introduction to the two papers (Editor, 1946) contains some very useful points for us in our attempt to travel back in time:

    It would be hard to conceive conclusions normally more subject to doubt than those concerning relatively minor differences in the odour and taste of beers. The problem is highly complex… Yet, even these developments promise marked improvements ultimately in the reliability with which flavour and taste judgements can be made (p. 167).

    The authors' objectives in the study described in the second paper (Helm and Trolle, 1946) were to select a panel of expert tasters and also to conduct a scientific study investigating the taste (and by this they meant both odor and taste) of beer. They were interested in the impact that aspects such as age, smoking, occupation, and previous experience had on taste sensitivity. They decided to conduct the experiments using what they termed differential tests as opposed to the grading tests that tended to be used at the time, for three reasons:

    1. To determine if the tasters were able to differentiate between four pairs of beers;

    2. To avoid the grading-type tests because the statements from the tasters were too vague and difficult to summarize or analyze;

    3. And because they felt that the triangular test is particularly suitable for differential tests, since it can be established with certainty whether the tasters have judged correctly.

    The reason for the interest of the paper's authors in developing the triangle test can be summarized in a couple of sentences taken directly from the paper (Helm and Trolle, 1946):

    The traditional manner in which taste tests were conducted was not satisfactory. In most cases we were able to establish only the fact that it was not possible to discern the difference between samples with any certainty (p. 181).

    The triangular test was conducted by giving each taster three bottles of beer identified with a number or letter. Two bottles contained the same beer and the third contained a different beer. The authors were also interested to find out whether the use of three samples, an increase of one from the two-bottle test they used most often, had an impact on the results because of fatigue. The appearance of the bottles and the beers was identical, other than the identifying letter or number. The tasters poured their own beer into three glasses provided. The triangular tests were carried out with the familiar presentation design of the six possible A and B combinations, and this was also randomized across four replicate tests so that no one taster saw the same presentation design. The tasters worked independently and in a room kept at 20°C. The questionnaire asked the assessor to identify "Which two samples are identical?" as opposed to the selection of the odd sample (see Fig. 1.3).
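
    The six possible A and B combinations are easy to enumerate, and a short Python sketch shows one way they might be spread across tasters. The function names, the seed, and the allocation rule (use every order equally often, then draw any remainder at random) are assumptions made for illustration; the 1946 papers do not describe the allocation at this level of detail.

```python
import random
from itertools import permutations


def triangle_orders(a: str = "A", b: str = "B") -> list[str]:
    """Return the six presentation orders of a triangle test.

    Each triad holds two identical samples and one odd sample, and
    either product may be the odd one.
    """
    orders = set()
    for odd, pair in ((a, b), (b, a)):
        # permutations() repeats arrangements because two items are equal;
        # the set keeps each distinct order once.
        for p in permutations((odd, pair, pair)):
            orders.add("".join(p))
    return sorted(orders)


def assign_orders(n_tasters: int, seed: int = 0) -> list[str]:
    """Hypothetical allocation: give each taster one order, keeping the
    six orders as evenly used as possible."""
    orders = triangle_orders()
    rng = random.Random(seed)
    full_blocks, remainder = divmod(n_tasters, len(orders))
    pool = orders * full_blocks + rng.sample(orders, remainder)
    rng.shuffle(pool)
    return pool


if __name__ == "__main__":
    print(triangle_orders())
    print(assign_orders(8))
```

    With this rule no order is used more than once more than any other, whatever the number of tasters.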

    The test conducted in this early paper differs markedly from the triangle test as performed nowadays, in that the tasters were informed about the nature of the difference, for example, bitterness or original gravity. We might refer to this as a specified triangle test if using this method today. The tasters were also asked for their preference after the test and were told immediately whether they were correct or incorrect in their choice, neither of which is now recommended. The statistical analysis was carried out by comparison to a table of values drawn up by Bengtsson (Helm and Trolle, 1946) when he adapted the chi-squared analysis for use with triangular tests. The table has some errors but is very similar to what we would use today. The panelists were allowed to specify that they could not detect a difference (see Fig. 1.3) and hence there were three allowed answers for each test (number correct, number incorrect, and those who could detect no difference).

    Figure 1.3  Triangle test questionnaire layout from Carlsberg brewery ( Bengtsson and Helm, 1946 ).

    The other main difference between how the test was conducted then and now was that each taster took part in each test around 24 times, with no provision for replicated testing in the analysis. However, one of the aims of the testing was to select expert tasters and therefore the authors needed this type of data to determine each individual's tasting ability.

    The conclusions from the analysis of the 6878 triangular tests are interesting. Firstly, they state that "it is not easy to conduct taste tests" (p. 194) as many people find it difficult to remember from one sample to the next. They found the triangular taste test worked well and was not subject to fatigue despite the increase in the sample number from the two-bottle test. Also, the authors found that if the experiment involved determining whether there was a difference between experimental and commercial beers, selecting tasters and using a differential test was a good option in comparison to a quality type test. This was because the latter requires a larger number of people who should be representative of the target consumer. Another conclusion was that there were two main test types in taste testing, differential and quality, and that they should be approached in a different manner.

    Another group (Peryam and Swartz, 1950) appears to have created the triangle test method at a similar time to the Carlsberg Brewery group, and it seems that these authors were also concerned about the difference between quality analysis and discrimination tests (a new term coined by the authors and still in evidence today). The authors state that human behavior can be dealt with scientifically, which was often disputed or simply not understood at the time. The authors created three tests for measuring sensory differences because they wanted more objective methods that were discriminative and not judgmental, and that used statistical analysis to give a simpler, more direct, and actionable answer.

    The description of the triangle test (Peryam and Swartz, 1950) is similar to how the method might be conducted today: there are three samples, two are identical and one is different, and the judge is asked to pick out the different sample. However, one major difference was that the control sample would always be presented twice and therefore there were only three presentation designs in total, so you can see that this is quite different from Bengtsson, Helm, and Trolle's description of the test. The triangle test would also be used for preference, which is now generally avoided. It also appears that it was common practice to present one warm-up sample prior to the test itself and the test would often be repeated directly after the first presentation of three samples.

    ), to the standard error of the result that would be obtained by chance. In 1948 the first table of critical values for the triangular taste test was drawn up by Roessler et al. (1948) at the University of California, which, in the absence of computers, must have made life a lot easier for the researchers.
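
    To give a feel for what a table of critical values contains, the Python sketch below computes the minimum number of correct answers needed for significance in a triangle test. It assumes the exact binomial distribution with a chance probability of 1/3 and a one-sided test, which is the approach commonly used today; it is not a reconstruction of the 1948 table, and the function names and default significance level are illustrative.

```python
from math import comb
from typing import Optional


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct(n: int, alpha: float = 0.05, p_chance: float = 1 / 3) -> Optional[int]:
    """Smallest number of correct answers out of n that is significant
    at level alpha (one-sided), or None if even n correct is not enough.

    p_chance = 1/3 is the guessing probability in a triangle test.
    """
    for k in range(n + 1):
        if upper_tail(k, n, p_chance) <= alpha:
            return k
    return None


if __name__ == "__main__":
    # Print a small table of critical values for a few panel sizes.
    for n in (6, 12, 18, 24, 30):
        print(n, min_correct(n))
```

    Changing p_chance to 1/2 gives the corresponding one-sided values for tests where guessing succeeds half the time, such as the duo-trio.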

    In the early days the triangle test was referred to by several different names: the Helm technique, the trio comparison, the triad, triangular test, triangle test, odd sample method, oddity (this name understandably went out of favor quite quickly), and the three glass test. In fact, the ISO standard was still called the triangular test until the name triangle was adopted in the title in 2004.

    A further two discrimination tests are described in the Seagram Quality Research Laboratory paper (Peryam and Swartz, 1950): the duo-trio and the dual standard, so we can assume that these were also developed in 1941–42 alongside the triangle test. Again, the tests are described in a similar manner to how they would be conducted today but with a warm-up sample for the duo-trio, and a second replicate for both test types. In the duo-trio, the judges were presented with three samples, one of which was assigned the control and labeled as such, and of the other two coded samples, one was the test sample and the other the control. The task for the judge was to decide which sample was different to the control (and hence the other coded sample).

    In the dual standard, four samples were presented. For example, the first pair was labeled as standard 1 and standard 2 and the judge was allowed to acquaint themselves with the differences between the samples. These two samples were then presented again but coded. The judge had to decide which of the two coded samples was like standard 1 and which was like standard 2. For more information please see Chapter 14 in this book.

    So we have the first sensory discrimination method published on foods in 1936, followed by the three methods devised in 1941–42: triangle, duo-trio, and dual standard (Fig. 1.4). The next test to be devised was the difference-preference test (Dove, 1947) as part of the subjective-objective approach suggested by the author. The author uses this terminology to elevate the importance of the subjective assessments, which at the time were being discredited and overlooked by the use of instrumental or objective measures. The test described is basically the paired comparison with an added preference question using a 10-point scale: five equal degrees of acceptability and five equal degrees of nonacceptability are allowed.

    Figure 1.4  The early years.

    The author lists requirements for the laboratory where the tests are to be conducted (e.g., air conditioned, segregated booths, prescribed lighting), requirements for sample preparation (e.g., controlled quantity and temperature, hidden codes), and requirements for the judges (selection based on vocabulary, experience, and ability in detecting small differences as opposed to screening with basic tastes—something we are revisiting today). Some other authors had begun this task, but this is one of the most complete lists of the time. Another reason to read the paper is to enjoy the description of conducting taste tests with animals instead of humans on products such as lettuce and cabbage, where humans are confused by the taste!

    So what happened next? To describe this we have to travel back to 1932 when Arthur Fox first discovered the taste anomaly with phenylthiocarbamide (PTC) in his famous dust flying experiment (Fox, 1932). Harris and Kalmus (1949) discussed the various methods available to assess whether people were tasters or nontasters of PTC to try to determine why the published results seemed to be in conflict. They
