Computer Vision: Principles, Algorithms, Applications, Learning

Ebook · 1,788 pages · 25 hours

About this ebook

Computer Vision: Principles, Algorithms, Applications, Learning (previously entitled Computer and Machine Vision) clearly and systematically presents the basic methodology of computer vision, covering the essential elements of the theory while emphasizing algorithmic and practical design constraints. This fully revised fifth edition brings in more of the concepts and applications of computer vision, making it a comprehensive and up-to-date text suitable for undergraduate and graduate students, researchers, and R&D engineers working in this vibrant subject.

See an interview with the author explaining his approach to teaching and learning computer vision: http://scitechconnect.elsevier.com/computer-vision/

  • Three new chapters on machine learning emphasize the way the subject has been developing: two cover Basic Classification Concepts and Probabilistic Models, and the third covers the principles of Deep Learning Networks and shows their impact on computer vision, reflected in a new chapter on Face Detection and Recognition.
  • A new chapter on Object Segmentation and Shape Models reflects the methodology of machine learning and gives practical demonstrations of its application.
  • In-depth discussions have been included on geometric transformations, the EM algorithm, boosting, semantic segmentation, face frontalisation, RNNs and other key topics.
  • Examples and applications—including the location of biscuits, foreign bodies, faces, eyes, road lanes, surveillance, vehicles and pedestrians—give the ‘ins and outs’ of developing real-world vision systems, showing the realities of practical implementation.
  • Necessary mathematics and essential theory are made approachable by careful explanations and well-illustrated examples.
  • The ‘recent developments’ sections included in each chapter aim to bring students and practitioners up to date with this fast-moving subject.
  • Tailored programming examples are provided—code, methods, illustrations, tasks, hints, and solutions (mainly involving MATLAB and C++).
Language: English
Release date: Nov 15, 2017
ISBN: 9780128095751
Author

E. R. Davies

Roy Davies is Emeritus Professor of Machine Vision at Royal Holloway, University of London. He has worked on many aspects of vision, from feature detection to robust, real-time implementations of practical vision tasks. His interests include automated visual inspection, surveillance, vehicle guidance, crime detection and neural networks. He has published more than 200 papers and three books. Machine Vision: Theory, Algorithms, Practicalities (1990) has been widely used internationally for more than 25 years and is now out in this much enhanced fifth edition. Roy holds a DSc from the University of London, and he has been awarded the Distinguished Fellowship of the British Machine Vision Association and Fellowship of the International Association for Pattern Recognition.


    Computer Vision

    Principles, Algorithms, Applications, Learning

    Fifth Edition

    E.R. Davies

    Royal Holloway, University of London, United Kingdom

    Table of Contents

    Cover image

    Title page

    Copyright

    Dedication

    About the Author

    Foreword

    Preface to the Fifth Edition

    Preface to the First Edition

    Acknowledgments

    Topics Covered in Application Case Studies

    Influences Impinging Upon Integrated Vision System Design

    Glossary of Acronyms and Abbreviations

    Chapter 1. Vision, the challenge

    Abstract

    1.1 Introduction—Man and His Senses

    1.2 The Nature of Vision

    1.3 From Automated Visual Inspection to Surveillance

    1.4 What This Book Is About

    1.5 The Part Played by Machine Learning

    1.6 The Following Chapters

    1.7 Bibliographical Notes

    Part 1: Low-level vision

    Chapter 2. Images and imaging operations

    Abstract

    2.1 Introduction

    2.2 Image Processing Operations

    2.3 Convolutions and Point Spread Functions

    2.4 Sequential Versus Parallel Operations

    2.5 Concluding Remarks

    2.6 Bibliographical and Historical Notes

    2.7 Problems

    Chapter 3. Image filtering and morphology

    Abstract

    3.1 Introduction

    3.2 Noise Suppression by Gaussian Smoothing

    3.3 Median Filters

    3.4 Mode Filters

    3.5 Rank Order Filters

    3.6 Sharp–Unsharp Masking

    3.7 Shifts Introduced by Median Filters

    3.8 Shifts Introduced by Rank Order Filters

    3.9 The Role of Filters in Industrial Applications of Vision

    3.10 Color in Image Filtering

    3.11 Dilation and Erosion in Binary Images

    3.12 Mathematical Morphology

    3.13 Morphological Grouping

    3.14 Morphology in Grayscale Images

    3.15 Concluding Remarks

    3.16 Bibliographical and Historical Notes

    3.17 Problems

    Chapter 4. The role of thresholding

    Abstract

    4.1 Introduction

    4.2 Region-Growing Methods

    4.3 Thresholding

    4.4 Adaptive Thresholding

    4.5 More Thoroughgoing Approaches to Threshold Selection

    4.6 The Global Valley Approach to Thresholding

    4.7 Practical Results Obtained Using the Global Valley Method

    4.8 Histogram Concavity Analysis

    4.9 Concluding Remarks

    4.10 Bibliographical and Historical Notes

    4.11 Problems

    Chapter 5. Edge detection

    Abstract

    5.1 Introduction

    5.2 Basic Theory of Edge Detection

    5.3 The Template Matching Approach

    5.4 Theory of 3×3 Template Operators

    5.5 The Design of Differential Gradient Operators

    5.6 The Concept of a Circular Operator

    5.7 Detailed Implementation of Circular Operators

    5.8 The Systematic Design of Differential Edge Operators

    5.9 Problems With the Above Approach—Some Alternative Schemes

    5.10 Hysteresis Thresholding

    5.11 The Canny Operator

    5.12 The Laplacian Operator

    5.13 Concluding Remarks

    5.14 Bibliographical and Historical Notes

    5.15 Problems

    Chapter 6. Corner, interest point, and invariant feature detection

    Abstract

    6.1 Introduction

    6.2 Template Matching

    6.3 Second-Order Derivative Schemes

    6.4 A Median Filter–Based Corner Detector

    6.5 The Harris Interest Point Operator

    6.6 Corner Orientation

    6.7 Local Invariant Feature Detectors and Descriptors

    6.8 Concluding Remarks

    6.9 Bibliographical and Historical Notes

    6.10 Problems

    Chapter 7. Texture analysis

    Abstract

    7.1 Introduction

    7.2 Some Basic Approaches to Texture Analysis

    7.3 Graylevel Co-occurrence Matrices

    7.4 Laws’ Texture Energy Approach

    7.5 Ade’s Eigenfilter Approach

    7.6 Appraisal of the Laws and Ade Approaches

    7.7 Concluding Remarks

    7.8 Bibliographical and Historical Notes

    Part 2: Intermediate-level vision

    Chapter 8. Binary shape analysis

    Abstract

    8.1 Introduction

    8.2 Connectedness in Binary Images

    8.3 Object Labeling and Counting

    8.4 Size Filtering

    8.5 Distance Functions and Their Uses

    8.6 Skeletons and Thinning

    8.7 Other Measures for Shape Recognition

    8.8 Boundary Tracking Procedures

    8.9 Concluding Remarks

    8.10 Bibliographical and Historical Notes

    8.11 Problems

    Chapter 9. Boundary pattern analysis

    Abstract

    9.1 Introduction

    9.2 Boundary Tracking Procedures

    9.3 Centroidal Profiles

    9.4 Problems With the Centroidal Profile Approach

    9.5 The (s,ψ) Plot

    9.6 Tackling the Problems of Occlusion

    9.7 Accuracy of Boundary Length Measures

    9.8 Concluding Remarks

    9.9 Bibliographical and Historical Notes

    9.10 Problems

    Chapter 10. Line, circle, and ellipse detection

    Abstract

    10.1 Introduction

    10.2 Application of the Hough Transform to Line Detection

    10.3 The Foot-of-Normal Method

    10.4 Using RANSAC for Straight Line Detection

    10.5 Location of Laparoscopic Tools

    10.6 Hough-Based Schemes for Circular Object Detection

    10.7 The Problem of Unknown Circle Radius

    10.8 Overcoming the Speed Problem

    10.9 Ellipse Detection

    10.10 Human Iris Location

    10.11 Concluding Remarks

    10.12 Bibliographical and Historical Notes

    10.13 Problems

    Chapter 11. The generalized Hough transform

    Abstract

    11.1 Introduction

    11.2 The Generalized Hough Transform

    11.3 The Relevance of Spatial Matched Filtering

    11.4 Gradient Weighting Versus Uniform Weighting

    11.5 Use of the GHT for Ellipse Detection

    11.6 Comparing the Various Methods for Ellipse Detection

    11.7 A Graph-Theoretic Approach to Object Location

    11.8 Possibilities for Saving Computation

    11.9 Using the GHT for Feature Collation

    11.10 Generalizing the Maximal Clique and Other Approaches

    11.11 Search

    11.12 Concluding Remarks

    11.13 Bibliographical and Historical Notes

    11.14 Problems

    Chapter 12. Object segmentation and shape models

    Abstract

    12.1 Introduction

    12.2 Active Contours

    12.3 Practical Results Obtained Using Active Contours

    12.4 The Level-Set Approach to Object Segmentation

    12.5 Shape Models

    12.6 Concluding Remarks

    12.7 Bibliographical and Historical Notes

    Part 3: Machine learning and deep learning networks

    Chapter 13. Basic classification concepts

    Abstract

    13.1 Introduction

    13.2 The Nearest Neighbor Algorithm

    13.3 Bayes’ Decision Theory

    13.4 Relation of the Nearest Neighbor and Bayes’ Approaches

    13.5 The Optimum Number of Features

    13.6 Cost Functions and Error–Reject Tradeoff

    13.7 Supervised and Unsupervised Learning

    13.8 Cluster Analysis

    13.9 The Support Vector Machine

    13.10 Artificial Neural Networks

    13.11 The Back-Propagation Algorithm

    13.12 Multilayer Perceptron Architectures

    13.13 Overfitting to the Training Data

    13.14 Concluding Remarks

    13.15 Bibliographical and Historical Notes

    13.16 Problems

    Chapter 14. Machine learning: Probabilistic methods

    Abstract

    14.1 Introduction

    14.2 Mixtures of Gaussians and the EM Algorithm

    14.3 A More General View of the EM Algorithm

    14.4 Some Practical Examples

    14.5 Principal Components Analysis

    14.6 Multiple Classifiers

    14.7 The Boosting Approach

    14.8 Modeling AdaBoost

    14.9 Loss Functions for Boosting

    14.10 The LogitBoost Algorithm

    14.11 The Effectiveness of Boosting

    14.12 Boosting with Multiple Classes

    14.13 The Receiver Operating Characteristic

    14.14 Concluding Remarks

    14.15 Bibliographical and Historical Notes

    14.16 Problems

    Chapter 15. Deep-learning networks

    Abstract

    15.1 Introduction

    15.2 Convolutional Neural Networks

    15.3 Parameters for Defining CNN Architectures

    15.4 LeCun et al.’s LeNet Architecture

    15.5 Krizhevsky et al.’s AlexNet Architecture

    15.6 Zeiler and Fergus’s Work on CNN Architectures

    15.7 Zeiler and Fergus’s Visualization Experiments

    15.8 Simonyan and Zisserman’s VGGNet Architecture

    15.9 Noh et al.’s DeconvNet Architecture

    15.10 Badrinarayanan et al.’s SegNet Architecture

    15.11 Recurrent Neural Networks

    15.12 Concluding Remarks

    15.13 Bibliographical and Historical Notes

    Part 4: 3D vision and motion

    Chapter 16. The three-dimensional world

    Abstract

    16.1 Introduction

    16.2 Three-Dimensional Vision—The Variety of Methods

    16.3 Projection Schemes for Three-Dimensional Vision

    16.4 Shape from Shading

    16.5 Photometric Stereo

    16.6 The Assumption of Surface Smoothness

    16.7 Shape from Texture

    16.8 Use of Structured Lighting

    16.9 Three-Dimensional Object Recognition Schemes

    16.10 Horaud’s Junction Orientation Technique

    16.11 An Important Paradigm—Location of Industrial Parts

    16.12 Concluding Remarks

    16.13 Bibliographical and Historical Notes

    16.14 Problems

    Chapter 17. Tackling the perspective n-point problem

    Abstract

    17.1 Introduction

    17.2 The Phenomenon of Perspective Inversion

    17.3 Ambiguity of Pose Under Weak Perspective Projection

    17.4 Obtaining Unique Solutions to the Pose Problem

    17.5 Concluding Remarks

    17.6 Bibliographical and Historical Notes

    17.7 Problems

    Chapter 18. Invariants and perspective

    Abstract

    18.1 Introduction

    18.2 Cross Ratios: The Ratio of Ratios Concept

    18.3 Invariants for Noncollinear Points

    18.4 Invariants for Points on Conics

    18.5 Differential and Semidifferential Invariants

    18.6 Symmetric Cross-Ratio Functions

    18.7 Vanishing Point Detection

    18.8 More on Vanishing Points

    18.9 Apparent Centers of Circles and Ellipses

    18.10 Perspective Effects in Art and Photography

    18.11 Concluding Remarks

    18.12 Bibliographical and Historical Notes

    18.13 Problems

    Chapter 19. Image transformations and camera calibration

    Abstract

    19.1 Introduction

    19.2 Image Transformations

    19.3 Camera Calibration

    19.4 Intrinsic and Extrinsic Parameters

    19.5 Correcting for Radial Distortions

    19.6 Multiple View Vision

    19.7 Generalized Epipolar Geometry

    19.8 The Essential Matrix

    19.9 The Fundamental Matrix

    19.10 Properties of the Essential and Fundamental Matrices

    19.11 Estimating the Fundamental Matrix

    19.12 An Update on the Eight-Point Algorithm

    19.13 Image Rectification

    19.14 3-D Reconstruction

    19.15 Concluding Remarks

    19.16 Bibliographical and Historical Notes

    19.17 Problems

    Chapter 20. Motion

    Abstract

    20.1 Introduction

    20.2 Optical Flow

    20.3 Interpretation of Optical Flow Fields

    20.4 Using Focus of Expansion to Avoid Collision

    20.5 Time-to-Adjacency Analysis

    20.6 Basic Difficulties with the Optical Flow Model

    20.7 Stereo from Motion

    20.8 The Kalman Filter

    20.9 Wide Baseline Matching

    20.10 Concluding Remarks

    20.11 Bibliographical and Historical Notes

    20.12 Problem

    Part 5: Putting computer vision to work

    Chapter 21. Face detection and recognition: The impact of deep learning

    Abstract

    21.1 Introduction

    21.2 A Simple Approach to Face Detection

    21.3 Facial Feature Detection

    21.4 The Viola–Jones Approach to Rapid Face Detection

    21.5 The Eigenface Approach to Face Recognition

    21.6 More on the Difficulties of Face Recognition

    21.7 Frontalization

    21.8 The Sun et al. DeepID Face Representation System

    21.9 Fast Face Detection Revisited

    21.10 The Face as Part of a 3-D Object

    21.11 Concluding Remarks

    21.12 Bibliographical and Historical Notes

    Chapter 22. Surveillance

    Abstract

    22.1 Introduction

    22.2 Surveillance—The Basic Geometry

    22.3 Foreground–Background Separation

    22.4 Particle Filters

    22.5 Use of Color Histograms for Tracking

    22.6 Implementation of Particle Filters

    22.7 Chamfer Matching, Tracking, and Occlusion

    22.8 Combining Views from Multiple Cameras

    22.9 Applications to the Monitoring of Traffic Flow

    22.10 License Plate Location

    22.11 Occlusion Classification for Tracking

    22.12 Distinguishing Pedestrians by Their Gait

    22.13 Human Gait Analysis

    22.14 Model-Based Tracking of Animals

    22.15 Concluding Remarks

    22.16 Bibliographical and Historical Notes

    22.17 Problem

    Chapter 23. In-vehicle vision systems

    Abstract

    23.1 Introduction

    23.2 Locating the Roadway

    23.3 Location of Road Markings

    23.4 Location of Road Signs

    23.5 Location of Vehicles

    23.6 Information Obtained by Viewing License Plates and Other Structural Features

    23.7 Locating Pedestrians

    23.8 Guidance and Egomotion

    23.9 Vehicle Guidance in Agriculture

    23.10 Concluding Remarks

    23.11 More Detailed Developments and Bibliographies Relating to Advanced Driver Assistance Systems

    23.12 Problem

    Chapter 24. Epilogue—Perspectives in vision

    Abstract

    24.1 Introduction

    24.2 Parameters of Importance in Machine Vision

    24.3 Tradeoffs

    24.4 Moore’s Law in Action

    24.5 Hardware, Algorithms, and Processes

    24.6 The Importance of Choice of Representation

    24.7 Past, Present, and Future

    24.8 The Deep Learning Explosion

    24.9 Bibliographical and Historical Notes

    Appendix A. Robust statistics

    A.1 Introduction

    A.2 Preliminary Definitions and Analysis

    A.3 The M-Estimator (Influence Function) Approach

    A.4 The Least Median of Squares Approach to Regression

    A.5 Overview of the Robustness Problem

    A.6 The RANSAC Approach

    A.7 Concluding Remarks

    A.8 Bibliographical and Historical Notes

    A.9 Problems

    Appendix B. The sampling theorem

    B.1 The Sampling Theorem

    Appendix C. The representation of color

    C.1 Introduction

    C.2 Details of the HSI Color Representation

    C.3 A Typical Example of the Use of Color

    C.4 Bibliographical and Historical Notes

    Appendix D. Sampling from distributions

    D.1 Introduction

    D.2 The Box–Muller and Related Methods

    D.3 Bibliographical and Historical Notes

    References

    Index

    Copyright

    Academic Press is an imprint of Elsevier

    125 London Wall, London EC2Y 5AS, United Kingdom

    525 B Street, Suite 1800, San Diego, CA 92101-4495, United States

    50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

    The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

    Copyright © 2018 Elsevier Inc. All rights reserved.

    No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

    This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

    Notices

    Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

    Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

    To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

    British Library Cataloguing-in-Publication Data

    A catalogue record for this book is available from the British Library

    Library of Congress Cataloging-in-Publication Data

    A catalog record for this book is available from the Library of Congress

    ISBN: 978-0-12-809284-2

    For Information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

    Publisher: Mara Conner

    Acquisition Editor: Tim Pitts

    Editorial Project Manager: Charlotte Kent

    Production Project Manager: Sruthi Satheesh

    Cover Designer: Greg Harris

    Typeset by MPS Limited, Chennai, India

    Dedication

    This book is dedicated to my family.

    To my late mother, Mary Davies, to record her never-failing love and devotion.

    To my late father, Arthur Granville Davies, who passed on to me his appreciation of the beauties of mathematics and science.

    To my wife, Joan, for love, patience, support, and inspiration.

    To my children, Elizabeth, Sarah, and Marion, the music in my life.

    To my grandchildren, Jasper, Jerome, Eva, and Tara, for constantly reminding me of the carefree joys of youth!

    About the Author

    Roy Davies is Emeritus Professor of Machine Vision at Royal Holloway, University of London, United Kingdom. He has worked on many aspects of vision, from feature detection and noise suppression to robust pattern matching and real-time implementations of practical vision tasks. His interests include automated visual inspection, surveillance, vehicle guidance, and crime detection. He has published more than 200 papers and three books—Machine Vision: Theory, Algorithms, Practicalities (1990), Electronics, Noise and Signal Recovery (1993), and Image Processing for the Food Industry (2000); the first of these has been widely used internationally for more than 25 years, and is now out in this much enhanced fifth edition. Roy is a fellow of the IoP and the IET, and a senior member of the IEEE. He is on the Editorial Boards of Pattern Recognition Letters, Real-Time Image Processing, Imaging Science, and IET Image Processing. He holds a DSc from the University of London; he was awarded BMVA Distinguished Fellow in 2005 and Fellow of the International Association for Pattern Recognition in 2008.

    Foreword

    Mark S. Nixon, University of Southampton, Southampton, United Kingdom

    It is an honor to write a foreword for Roy Davies’ new edition of Computer and Machine Vision, now entitled Computer Vision: Principles, Algorithms, Applications, Learning. This is one of the major books in Computer Vision and not just for its longevity, having now reached its Fifth Edition. It is actually a splendid achievement to reach this status and it reflects not only on the tenacity and commitment of its author, but also on the achievements of the book itself.

    Computer Vision has shown awesome progress in its short history. This is in part due to technology: computers are much faster and memory is now much cheaper than they were in the early days when Roy started his research. There have been many achievements and many developments. All of this can affect the evolution of a textbook. There have been excellent textbooks in the past, which were neither continued nor maintained. That has been avoided here, as the textbook has continued to mature with the field and its many developments.

    We can look forward to a future where automated computer vision systems will make our lives easier while enriching them too. There are already many applications of Computer Vision in the food industry, and robotic cars will be with us very soon. Then there are continuing advancements in medical image analysis, where Computer Vision techniques can be used to aid in diagnosis and therapy by automated means. Even accessing a mobile phone is considerably more convenient when using a fingerprint, and access by face recognition continues to improve. These have all come about due to advancements in computers, Computer Vision, and applied artificial intelligence.

    Adherents of Computer Vision will know it to be an exciting field indeed. It manages to cover many aspects of technology from human vision to machine learning requiring electronic hardware, computer implementations, and a lot of computer software. Roy continues to cover these in excellent detail.

    I remember the First Edition when it was first published in 1990 with its unique and pragmatic blend of theory, implementation, and algorithms. I am pleased to see that the Fifth Edition maintains this unique approach, much appreciated by students in previous editions who wanted an accessible introduction to Computer Vision. It has certainly increased in size with age, and that is often the way with books. It is most certainly the way with Computer Vision since many of its researchers continue to improve, refine, and develop new techniques.

    A major change here is the inclusion of Deep Learning. Indeed, this has been a major change in the field of Computer Vision and Pattern Recognition. One implication of the increase in computing power and the reduction of memory cost is that techniques can become considerably more complex, and that complexity lends itself to application in the analysis of big data. One cannot ignore the performance of deep learning and convolutional neural networks: one only has to peruse the program of top international conferences to perceive their revolutionary effect on research direction. Naturally, it is early days but it is good to have guidance as we have here. The nature of performance is always in question in any system in artificial intelligence and part of the way to answer those questions is to consider more deeply the architectures and their basis. That again is the function of a textbook for it is the distillation of research and practice in a ratiocinated exposition. It is a brave move to include Deep Learning in this edition, but a necessary one.

    And what of Roy Davies himself? Following his DPhil in Solid State Physics at Oxford, he later developed a new sensitive method in Nuclear Resonance called Davies-ENDOR (Electron and Nuclear Double Resonance) which avoided the blind spots of its predecessor Mims-ENDOR. In 1970 he was appointed as a lecturer at Royal Holloway and a long series of publications in pattern recognition and its applications led to the award of his Personal Chair, his DSc and then the Distinguished Fellow of the British Machine Vision Association (BMVA), 2005. He has served the BMVA in many ways, latterly editing its Newsletter. Clearly the level of his work and his many contacts and papers have contributed much to the material that is found herein.

    I look forward to having this Fifth Edition sitting proudly on my shelf, replacing the Fourth, which will in turn pass to one of my students' shelves. It will not stop there for long, for it is one of the textbooks I often turn to for the information I need. Unlike the snapshots to be found on the Web, in a textbook I find information placed in context and in sequence, with extension to other material. That is the function of a textbook, and it will be well served by this Fifth Edition.

    July 2017

    Preface to the Fifth Edition

    Roy Davies, Royal Holloway, University of London, United Kingdom

    The first edition of this book came out in 1990, and was welcomed by many researchers and practitioners. However, in the subsequent two decades the subject moved on at a rapidly accelerating rate, and many topics that hardly deserved a mention in the first edition had to be solidly incorporated into subsequent editions. For example, it seemed particularly important to bring in significant amounts of new material on feature detection, mathematical morphology, texture analysis, inspection, artificial neural networks, 3D vision, invariance, motion analysis, object tracking, and robust statistics. And in the fourth edition, cognizance had to be taken of the widening range of applications of the subject: in particular, two chapters had to be added on surveillance and in-vehicle vision systems. Since then, the subject has not stood still. In fact, the past four or five years have seen the onset of an explosive growth in research on deep neural networks, and the practical achievements resulting from this have been little short of staggering. It soon became abundantly clear that the fifth edition would have to reflect this radical departure—both in fundamental explanation and in practical coverage. Indeed, it necessitated a new part in the book—Part 3, Machine Learning and Deep Learning Networks—a heading which affirms that the new content reflects not only Deep Learning (a huge enhancement over the older Artificial Neural Networks) but also an approach to pattern recognition that is based on rigorous probabilistic methodology.

    All this is not achieved without presentation problems: for probabilistic methodology can only be managed properly within a rather severe mathematical environment. Too little maths, and the subject could be so watered down as to be virtually content-free: too much maths, and many readers might not be able to follow the explanations. Clearly, one should not protect readers from the (mathematical) reality of the situation. Hence, Chapter 14 had to be written in such a way as to demonstrate in full what type of methodology is involved, while providing paths that would take readers past some of the mathematical complexities—at least, on first encounter. Once past the relatively taxing Chapter 14, Chapters 15 and 21 take the reader through two accounts consisting largely of case studies, the former through a crucial development period (2012–2015) for deep learning networks, and the latter through a similar period (2013–2016) during which deep learning was targeted strongly at face detection and recognition, enabling remarkable advances to be made. It should not go unnoticed that these additions have so influenced the content of the book that the title had to be modified to reflect them. Interestingly, the organization of the book was further modified by collecting three applications chapters into the new Part 5, Putting Computer Vision to Work.

    It is worth remarking that, at this point in time, computer vision has attained a level of maturity that has made it substantially more rigorous, reliable, generic, and—in the light of the improved hardware facilities now available for its implementation (in particular, extremely powerful GPUs)—capable of real-time performance. This means that workers are more than ever before using it in serious applications, and with fewer practical difficulties. It is intended that this edition of the book will reflect this radically new and exciting state of affairs at a fundamental level.

    A typical final-year undergraduate course on vision for Electronic Engineering and Computer Science students might include much of the work of Chapters 1–13 and Chapter 16, plus a selection of sections from other chapters, according to requirements. For MSc or PhD research students, a suitable lecture course might go on to cover Parts 3 or 4 in depth, and several of the chapters in Part 5, with many practical exercises being undertaken on image analysis systems. (The importance of the appendix on robust statistics should not be underestimated once one gets onto serious work, though this will probably be outside the restrictive environment of an undergraduate syllabus.) Here much will depend on the research programme being undertaken by each individual student. At this stage the text may have to be used more as a handbook for research, and indeed, one of the prime aims of the volume is to act as a handbook for the researcher and practitioner in this important area.

    As mentioned in the original Preface, this book leans heavily on experience I have gained from working with postgraduate students: in particular, I would like to express my gratitude to Mark Edmonds, Simon Barker, Daniel Celano, Darrel Greenhill, Derek Charles, Mark Sugrue, and Georgios Mastorakis, all of whom have in their own ways helped to shape my view of the subject. In addition, it is a pleasure to recall very many rewarding discussions with my colleagues Barry Cook, Zahid Hussain, Ian Hannah, Dev Patel, David Mason, Mark Bateman, Tieying Lu, Adrian Johnstone, and Piers Plummer, the last two of whom were particularly prolific in generating hardware systems for implementing my research group’s vision algorithms. Next, I would like to record my thanks to my British Machine Vision Association colleagues for many wide-ranging discussions on the nature of the subject: in particular, I am hugely grateful to Majid Mirmehdi, Adrian Clark, Neil Thacker, and Mark Nixon, who, over time, have strongly influenced the development of the book and left a permanent mark on it. Next, I would like to thank the anonymous reviewers for making insightful comments and what have turned out to be extremely valuable suggestions. Finally, I am indebted to Tim Pitts of Elsevier Science for his help and encouragement, without which this fifth edition might never have been completed.

    Supporting materials:

    Elsevier’s website for the book contains programming and other resources to help readers and students using this text. Please check the publisher’s website for further information: https://www.elsevier.com/books-and-journals/book-companion/9780128092842.

    Preface to the First Edition

    Over the past 30 years or so, machine vision has evolved into a mature subject embracing many topics and applications: these range from automatic (robot) assembly to automatic vehicle guidance, from automatic interpretation of documents to verification of signatures, and from analysis of remotely sensed images to checking of fingerprints and human blood cells; currently, automated visual inspection is undergoing very substantial growth, necessary improvements in quality, safety, and cost-effectiveness being the stimulating factors. With so much ongoing activity, it has become a difficult business for the professional to keep up with the subject and with relevant methodologies: in particular, it is difficult for them to distinguish accidental developments from genuine advances. It is the purpose of this book to provide background in this area.

    The book was shaped over a period of 10–12 years, through material I have given on undergraduate and postgraduate courses at London University, and contributions to various industrial courses and seminars. At the same time, my own investigations coupled with experience gained while supervising PhD and postdoctoral researchers helped to form the state of mind and knowledge that is now set out here. Certainly it is true to say that if I had had this book 8, 6, 4, or even 2 years ago, it would have been of inestimable value to myself for solving practical problems in machine vision. It is therefore my hope that it will now be of use to others in the same way. Of course, it has tended to follow an emphasis that is my own—and in particular one view of one path towards solving automated visual inspection and other problems associated with the application of vision in industry. At the same time, although there is a specialism here, great care has been taken to bring out general principles—including many applying throughout the field of image analysis. The reader will note the universality of topics such as noise suppression, edge detection, principles of illumination, feature recognition, Bayes’ theory, and (nowadays) Hough transforms. However, the generalities lie deeper than this. The book has aimed to make some general observations and messages about the limitations, constraints, and tradeoffs to which vision algorithms are subject. Thus there are themes about the effects of noise, occlusion, distortion, and the need for built-in forms of robustness (as distinct from less successful ad hoc varieties and those added on as an afterthought); there are also themes about accuracy, systematic design, and the matching of algorithms and architectures. Finally, there are the problems of setting up lighting schemes which must be addressed in complete systems, yet which receive scant attention in most books on image processing and analysis. 
These remarks will indicate that the text is intended to be read at various levels—a factor that should make it of more lasting value than might initially be supposed from a quick perusal of the contents.

    Of course, writing a text such as this presents a great difficulty in that it is necessary to be highly selective: space simply does not allow everything in a subject of this nature and maturity to be dealt with adequately between two covers. One solution might be to dash rapidly through the whole area mentioning everything that comes to mind, but leaving the reader unable to understand anything in detail or to achieve anything having read the book. However, in a practical subject of this nature this seemed to me a rather worthless extreme. It is just possible that the emphasis has now veered too much in the opposite direction, by coming down to practicalities (detailed algorithms, details of lighting schemes, and so on): individual readers will have to judge this for themselves. On the other hand, an author has to be true to himself and my view is that it is better for a reader or student to have mastered a coherent series of topics than to have a mishmash of information that he is later unable to recall with any accuracy. This, then, is my justification for presenting this particular material in this particular way and for reluctantly omitting from detailed discussion such important topics as texture analysis, relaxation methods, motion, and optical flow.

    As for the organization of the material, I have tried to make the early part of the book lead into the subject gently, giving enough detailed algorithms (especially in Chapter 2: Images and imaging operations and Chapter 6: Corner, interest point, and invariant feature detection) to provide a sound feel for the subject—including especially vital, and in their own way quite intricate, topics such as connectedness in binary images. Hence Part I provides the lead-in, although it is not always trivial material and indeed some of the latest research ideas have been brought in (e.g., on thresholding techniques and edge detection). Part II gives much of the meat of the book. Indeed, the (book) literature of the subject currently has a significant gap in the area of intermediate-level vision; while high-level vision (AI) topics have long caught the researcher’s imagination, intermediate-level vision has its own difficulties which are currently being solved with great success (note that the Hough transform, originally developed in 1962, and by many thought to be a very specialist topic of rather esoteric interest, is arguably only now coming into its own). Part II and the early chapters of Part III aim to make this clear, while Part IV gives reasons why this particular transform has become so useful. As a whole, Part III aims to demonstrate some of the practical applications of the basic work covered earlier in the book, and to discuss some of the principles underlying implementation: it is here that chapters on lighting and hardware systems will be found. As there is a limit to what can be covered in the space available, there is a corresponding emphasis on the theory underpinning practicalities. Probably this is a vital feature, since there are many applications of vision both in industry and elsewhere, yet listing them and their intricacies risks dwelling on interminable detail, which some might find insipid; furthermore, detail has a tendency to date rather rapidly. 
Although the book could not cover 3D vision in full (this topic would easily consume a whole volume in its own right), a careful overview of this complex mathematical and highly important subject seemed vital. It is therefore no accident that Chapter 16, The three-dimensional world, is the longest in the book. Finally, Part IV asks questions about the limitations and constraints of vision algorithms and answers them by drawing on information and experience from earlier chapters. It is tempting to call the last chapter the Conclusion. However, in such a dynamic subject area any such temptation has to be resisted, although it has still been possible to draw a good number of lessons on the nature and current state of the subject. Clearly, this chapter presents a personal view but I hope it is one that readers will find interesting and useful.

    Acknowledgments

    The author would like to credit the following sources for permission to reproduce tables, figures, and extracts of text from earlier publications:

    Elsevier

    For permission to reprint portions of the following papers from Image and Vision Computing as text in Chapter 5; as Tables 5.1–5.5; and as Figs. 3.31, 5.2:

    Davies (1984b, 1987b)

    For permission to reprint portions of the following paper from Pattern Recognition as text in Chapter 8; and as Fig. 8.11:

    Davies and Plummer (1981)

    For permission to reprint portions of the following papers from Pattern Recognition Letters as text in Chapters 3, 5, 10, 11, 13; as Tables 3.2; 10.4; 11.1; and as Figs. 3.6, 3.8, 3.10, 5.1, 5.3, 10.1, 10.10, 10.11, 10.12, 10.13, 11.1, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 11.9, 11.10, 11.11:

    Davies (1986, 1987a,c,d, 1988b,c,e, 1989a)

    For permission to reprint portions of the following paper from Signal Processing as text in Chapter 3; and as Figs. 3.15, 3.17, 3.18, 3.19, 3.20:

    Davies (1989b)

    For permission to reprint portions of the following paper from Advances in Imaging and Electron Physics as text in Chapter 3:

    Davies (2003c)

    For permission to reprint portions of the following article from Encyclopedia of Physical Science and Technology as Figs. 8.9, 8.12, 9.1, 9.4:

    Davies, E.R., 1987. Visual inspection, automatic (robotics). In: Meyers, R.A. (Ed.) Encyclopedia of Physical Science and Technology, vol. 14. Academic Press, San Diego, pp. 360–377.

    IEEE

    For permission to reprint portions of the following paper as text in Chapter 3; and as Figs. 3.4, 3.5, 3.7, 3.11:

    Davies (1984a)

    IET

    For permission to reprint portions of the following papers from the IET Proceedings and Colloquium Digests as text in Chapters 3, 4, 6, 13, 21, 22, 23; as Tables 3.3, 4.2; and as Figs. 3.21, 3.28, 3.29, 4.6, 4.7, 4.8, 4.9, 4.10, 6.5, 6.6, 6.7, 6.8, 6.9, 6.12, 11.20, 14.16, 14.17, 22.16, 22.17, 22.18, 23.1, 23.3, 23.4:

    Davies (1988a, 1999c, 2000a, 2005, 2008)

    Sugrue and Davies (2007)

    Mastorakis and Davies (2011)

    Davies et al. (1998)

    Davies et al. (2003)

    IFS Publications Ltd

    For permission to reprint portions of the following paper as text in Chapters 12, 20; and as Figs. 10.7, 10.8:

    Davies (1984c)

    The Royal Photographic Society

    For permission to reprint portions of the following papers (see also the Maney website: www.maney.co.uk/journals/ims) as text in Chapter 3; and as Figs. 3.12, 3.13, 3.22, 3.23, 3.24:

    Davies (2000c)

    Charles and Davies (2004)

    Springer-Verlag

    For permission to reprint portions of the following papers as text in Chapter 6; and as Figs. 6.2, 6.4:

    Davies (1988d), Figs. 1–3

    World Scientific

    For permission to reprint portions of the following book as text in Chapters 7, 22, 23; and as Figs. 3.25, 3.26, 3.27, 5.4, 22.20, 23.15, 23.16:

    Davies, 2000. Image Processing for the Food Industry. World Scientific, Singapore.

    The Committee of the Alvey Vision Club

    To acknowledge that extracts of text in Chapter 11 and Figs. 11.12, 11.13, 11.17 were first published in the Proceedings of the 4th Alvey Vision Conference:

    Davies, E.R., 1988. An alternative to graph matching for locating objects from their salient features. In: Proceedings of 4th Alvey Vision Conference, Manchester, 31 August–2 September, pp. 281–286.

    F.H. Sumner

    For permission to reprint portions of the following article from State of the Art Report: Supercomputer Systems Technology as text in Chapter 8; and as Fig. 8.4:

    Davies, E.R., 1982. Image processing. In: Sumner, F.H. (Ed.), State of the Art Report: Supercomputer Systems Technology. Pergamon Infotech, Maidenhead, pp. 223–244.

    Royal Holloway, University of London

    For permission to reprint extracts from the following examination questions, originally written by E.R. Davies:

    EL385/97/2; EL333/98/2; EL333/99/2, 3, 5, 6; EL333/01/2, 4–6; PH5330/98/3, 5; PH5330/03/1–5; PH4760/04/1–5.

    University of London

    For permission to reprint extracts from the following examination questions, originally written by E.R. Davies:

    PH385/92/2, 3; PH385/93/1–3; PH385/94/1–4; PH385/95/4; PH385/96/3, 6; PH433/94/3, 5; PH433/96/2, 5.

    Collectors of publicly available image databases and utilities

    To acknowledge use of the following image databases and utilities for generating a number of images presented in Chapters 15 and 21:

    The Cambridge semantic segmentation online demo

    The images in Fig. 15.14 were processed using the online demo available from the University of Cambridge, UK (see Badrinarayanan et al., 2015) at

    http://mi.eng.cam.ac.uk/projects/segnet/ (website accessed 07.10.16).

    The CMU image dataset

    The newsradio image used to obtain Fig. 21.6 was taken from Test Set C—collected at CMU by Rowley, H.A., Baluja, S., and Kanade, T.—and is described in their paper:

    Rowley, H.A., Baluja, S., Kanade, T., 1998. Neural network-based face detection. IEEE Trans. Pattern Anal. Mach. Intell. 20(1), 23–38.

    It may be downloaded from the website:

    http://vasc.ri.cmu.edu/idb/html/face/frontal_images/ (website accessed 20.04.17).

    The Bush LFW dataset

    The images of George W. Bush used in Chapter 21 were taken from the set collected at the University of Massachusetts:

    Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E., 2007. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. University of Massachusetts, Amherst, Technical Report 07-49, October.

    The database may be downloaded from the website:

    http://vis-www.cs.umass.edu/lfw/ (website accessed 20.04.17).

    Topics Covered in Application Case Studies

    Influences Impinging Upon Integrated Vision System Design

    Glossary of Acronyms and Abbreviations

    1-D one dimension/one-dimensional

    2-D two dimensions/two-dimensional

    3-D three dimensions/three-dimensional

    AAM active appearance model

    ACM Association for Computing Machinery (USA)

    ADAS advanced driver assistance system

    AFW annotated faces in the wild

    AI artificial intelligence

    ANN artificial neural network

    AP average precision

    APF auxiliary particle filter

    ASCII American Standard Code for Information Interchange

    ASIC application specific integrated circuit

    ASM active shape model

    ATM automated teller machine

    AUC area under curve

    AVI audio video interleave

    BCVM between-class variance method

    BDRF bidirectional reflectance distribution function

    BetaSAC beta [distribution] sampling consensus

    BMVA British Machine Vision Association

    BPTT backpropagation through time

    CAD computer-aided design

    CAM computer-aided manufacture

    CCTV closed-circuit television

    CDF cumulative distribution function

    CLIP cellular logic image processor

    CNN convolutional neural network

    CPU central processor unit

    CRF conditional random field

    DCSM distinct class based splitting measure

    DET Beaudet determinant operator

    DG differential gradient

    DN Dreschler–Nagel corner detector

    DNN deconvolution network

    DoF degree of freedom

    DoG difference of Gaussians

    DPM deformable parts models

    EM expectation maximization

    EURASIP European Association for Signal Processing

    f.c. fully connected

    FAR frontalization for alignment and recognition

    FAST features from accelerated segment test

    FCN fully convolutional network

    FDDB face detection data set and benchmark

    FDR face detection and recognition

    FFT fast Fourier transform

    FN false negative

    fnr false negative rate

    FoE focus of expansion

    FoV field of view

    FP false positive

    FPGA field programmable gate array

    FPP full perspective projection

    fpr false positive rate

    GHT generalized Hough transform

    GLOH gradient location and orientation histogram

    GMM Gaussian mixture model

    GPS global positioning system

    GPU graphics processing unit

    GroupSAC group sampling consensus

    GVM global valley method

    HOG histogram of orientated gradients

    HSI hue, saturation, intensity

    HT Hough transform

    IBR intensity extrema-based region detector

    IDD integrated directional derivative

    IEE Institution of Electrical Engineers (UK)

    IEEE Institute of Electrical and Electronics Engineers (USA)

    IET Institution of Engineering and Technology (UK)

    ILSVRC ImageNet large-scale visual recognition object challenge

    ILW iterated likelihood weighting

    IMPSAC importance sampling consensus

    IoP Institute of Physics (UK)

    IRLFOD image-restricted, label-free outside data

    ISODATA iterative self-organizing data analysis

    JPEG/JPG Joint Photographic Experts Group

    k-NN k-nearest neighbor

    KL Kullback–Leibler

    KR Kitchen–Rosenfeld corner detector

    LED light emitting diode

    LFF local-feature-focus method

    LFPW labeled face parts in the wild

    LFW labeled faces in the wild

    LIDAR light detection and ranging

    LMedS least median of squares

    LoG Laplacian of Gaussian

    LRN local response normalization

    LS least squares

    LSTM long short-term memory

    LUT lookup table

    MAP maximum a posteriori

    MDL minimum description length

    ML machine learning

    MLP multi-layer perceptron

    MoG mixture of Gaussians

    MP microprocessor

    MSER maximally stable extremal region

    NAPSAC n adjacent points sample consensus

    NIR near infra-red

    NN nearest neighbor

    OCR optical character recognition

    OVR one versus the rest

    PASCAL Network of Excellence on pattern analysis, statistical modeling and computational learning

    PC personal computer

    PCA principal components analysis

    PE processing element

    PnP perspective n-point

    PPR probabilistic pattern recognition

    PR pattern recognition

    PROSAC progressive sample consensus

    PSF point spread function

    R-CNN regions with CNN features

    RAM random access memory

    RANSAC random sample consensus

    RBF radial basis function [classifier]

    RELU rectified linear unit

    RGB red, green, blue

    RHT randomized Hough transform

    RKHS reproducible kernel Hilbert space

    RMS root mean square

    RNN recurrent neural network

    ROC receiver–operator characteristic

    RoI region of interest

    RPS Royal Photographic Society (UK)

    s.d. standard deviation

    SFC Facebook social face classification

    SFOP scale-invariant feature operator

    SIFT scale invariant feature transform

    SIMD single instruction stream, multiple data stream

    SIR sampling importance resampling

    SIS sequential importance sampling

    SISD single instruction stream, single data stream

    SOC sorting optimization curve

    SOM self-organizing map

    SPIE Society of Photo-optical Instrumentation Engineers

    SPR statistical pattern recognition

    STA spatiotemporal attention [neural network]

    SURF speeded-up robust features

    SUSAN smallest univalue segment assimilating nucleus

    SVM support vector machine

    TM template matching

    TMF truncated median filter

    TN true negative

    tnr true negative rate

    TP true positive

    tpr true positive rate

    TV television

    USEF unit step edge function

    VGG Visual Geometry Group (Oxford)

    VJ Viola–Jones

    VLSI very large scale integration

    VMF vector median filter

    VOC visual object classes

    VP vanishing point

    WPP weak perspective projection

    YOLO you only look once

    YTF YouTube faces

    ZH Zuniga–Haralick corner detector

    Chapter 1

    Vision, the challenge

    Abstract

    This chapter introduces the subject of computer vision. It shows how recognition may be performed partly by image processing, although abstract pattern recognition methods are usually needed to complete the task. Important in this process is normalization of the image content to reduce variability so that statistical pattern recognizers such as the nearest neighbor algorithm can carry out their task with limited training requirements and low error rates. It extends the discussion by introducing machine learning and the recently prominent deep learning networks. This chapter also discusses the various applications of vision, contrasting automated visual inspection and surveillance.

    Keywords

    Computer vision; process of recognition; nearest neighbor algorithm; template matching; image preprocessing; need for normalization; machine learning; deep learning networks; automated visual inspection; surveillance

    1.1 Introduction—Man and His Senses

    Of the five senses—vision, hearing, smell, taste, and touch—vision is undoubtedly the one that man has come to depend upon above all others, and indeed the one that provides most of the data he receives. Not only do the input pathways from the eyes provide megabits of information at each glance but also the data rates for continuous viewing probably exceed 10 Mbps. However, much of this information is redundant and is compressed by the various layers of the visual cortex, so that the higher centers of the brain have to interpret abstractly only a small fraction of the data. Nonetheless, the amount of information the higher centers receive from the eyes must be at least two orders of magnitude greater than all the information they obtain from the other senses.

    Another feature of the human visual system is the ease with which interpretation is carried out. We see a scene as it is—trees in a landscape, books on a desk, widgets in a factory. No obvious deductions are needed and no overt effort is required to interpret each scene; in addition, answers are effectively immediate and are normally available within a tenth of a second. Just now and again some doubt arises—e.g., a wire cube might be seen correctly or inside out. This and a host of other optical illusions are well known, although for the most part we can regard them as curiosities—irrelevant freaks of nature. Somewhat surprisingly, illusions are quite important, since they reflect hidden assumptions that the brain is making in its struggle with the huge amounts of complex visual data it is receiving. We have to pass by this story here (although it resurfaces now and again in various parts of this book). However, the important point is that we are for the most part unaware of the complexities of vision. Seeing is not a simple process: it is just that vision has evolved over millions of years, and there was no particular advantage in evolution giving us any indication of the difficulties of the task (if anything, to have done so would have cluttered our minds with irrelevant information and slowed our reaction times).

    In the present day and age, man is trying to get machines to do much of his work for him. For simple mechanistic tasks this is not particularly difficult, but for more complex tasks the machine must be given the sense of vision. Efforts have been made to achieve this, sometimes in modest ways, for well over 40 years. At first, schemes were devised for reading, for interpreting chromosome images, and so on; but when such schemes were confronted with rigorous practical tests, the problems often turned out to be more difficult than had been anticipated. Generally, researchers react to finding that apparent trivia are getting in the way by intensifying their efforts and applying great ingenuity, and this was certainly so with early efforts at vision algorithm design. However, it soon became plain that the task really is a complex one, in which numerous fundamental problems confront the researcher, and the ease with which the eye can interpret scenes turned out to be highly deceptive.

    Of course, one of the ways in which the human visual system gains over the machine is that the brain possesses more than 10¹⁰ cells (or neurons), some of which have well over 10,000 contacts (or synapses) with other neurons. If each neuron acts as a type of microprocessor, then we have an immense computer in which all the processing elements can operate concurrently. Taking the largest single man-made computer to contain several hundred million rather modest processing elements, the majority of the visual and mental processing tasks that the eye–brain system can perform in a flash have no chance of being performed by present-day man-made systems. Added to these problems of scale, there is the problem of how to organize such a large processing system and also how to program it. Clearly, the eye–brain system is partly hard-wired by evolution but there is also an interesting capability to program it dynamically by training during active use. This need for a large parallel processing system with the attendant complex control problems shows that computer vision must indeed be one of the most difficult intellectual problems to tackle.

    So what are the problems involved in vision that make it apparently so easy for the eye, yet so difficult for the machine? In the next few sections an attempt is made to answer this question.

    1.2 The Nature of Vision

    1.2.1 The Process of Recognition

    This section illustrates the intrinsic difficulties of implementing computer vision, starting with an extremely simple example—that of character recognition. Consider the set of patterns shown in Fig. 1.1A. Each pattern can be considered as a set of 25 bits of information, together with an associated class indicating its interpretation. In each case imagine a computer learning the patterns and their classes by rote. Then any new pattern may be classified (or recognized) by comparing it with this previously learnt training set, and assigning it to the class of the nearest pattern in the training set. Clearly, test pattern (1) (Fig. 1.1B) will be allotted to class U on this basis. Chapter 13, Basic Classification Concepts, shows that this method is a simple form of the nearest neighbor approach to pattern recognition.

    Figure 1.1 Some simple 25-bit patterns and their recognition classes used to illustrate some of the basic problems of recognition: (A) training set patterns (for which the known classes are indicated); (B) test patterns.
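    The rote-learning scheme just described can be sketched in a few lines (an illustrative sketch, not code from the book; the miniature 5×5 patterns and class labels below are invented for the demonstration):

```python
# Illustrative sketch: rote-learning nearest neighbor classification of
# small binary patterns, using Hamming distance as the similarity measure.

def hamming(a, b):
    """Count the bits that differ between two flattened bit patterns."""
    return sum(x != y for x, y in zip(a, b))

def classify(test, training_set):
    """Assign the test pattern the class of the nearest training pattern."""
    pattern, label = min(training_set, key=lambda pc: hamming(test, pc[0]))
    return label

# Hypothetical training set: one 'U' and one 'C', flattened row by row.
U = [1,0,0,0,1,
     1,0,0,0,1,
     1,0,0,0,1,
     1,0,0,0,1,
     0,1,1,1,0]
C = [0,1,1,1,0,
     1,0,0,0,0,
     1,0,0,0,0,
     1,0,0,0,0,
     0,1,1,1,0]
training = [(U, 'U'), (C, 'C')]

# A test pattern with one noisy bit is still nearest to U,
# so it is classified correctly.
noisy_U = U.copy()
noisy_U[0] ^= 1
print(classify(noisy_U, training))  # → U
```

    As the text notes, the same scheme fails when a test pattern is displaced or rotated, since the bitwise comparison then no longer lines up with any training pattern.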

    The scheme outlined above seems straightforward and is indeed highly effective, even being able to cope with situations where distortions of the test patterns occur or where noise is present: this is illustrated by test patterns (2) and (3). However, this approach is not always foolproof. First, there are situations where distortions or noise is excessive, so errors of interpretation arise. Second, there are situations where patterns are not badly distorted or subject to obvious noise, yet are misinterpreted: this seems much more serious, since it indicates an unexpected limitation of the technique rather than a reasonable result of noise or distortion. In particular, these problems arise where the test pattern is displaced or misorientated relative to the appropriate training set pattern, as with test pattern (6).

    As will be seen in Chapter 13, Basic Classification Concepts, there is a powerful principle that indicates why the unlikely limitation given above can arise: it is simply that there are insufficient training set patterns, and that those that are present are insufficiently representative of what will arise in practical situations. Unfortunately, this presents a major difficulty, since providing enough training set patterns incurs a serious storage problem and an even more serious search problem when patterns are tested. Furthermore, it is easy to see that these problems are exacerbated as patterns become larger and more real (obviously, the examples of Fig. 1.1 are far from having enough resolution even to display normal type-fonts). In fact, a combinatorial explosion takes place: this is normally taken to mean that one or more parameters produce fast-varying (often exponential) effects, which explode as the parameters increase by modest amounts. Forgetting for the moment that the patterns of Fig. 1.1 have familiar shapes, let us temporarily regard them as random bit patterns. Now the number of bits in these N×N patterns is N², so the number of possible patterns is 2^(N²): even in a case where N=20, remembering all these patterns and their interpretations would be impossible on any practical machine, and searching systematically through them would take impracticably long (involving times of the order of the age of the universe). Thus it is not only impracticable to consider such brute force means of solving the recognition problem, but is also effectively impossible theoretically. These considerations show that other means are required to tackle the problem.
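    A quick calculation makes the explosion concrete: an N×N binary pattern contains N² bits, so there are 2^(N²) distinct patterns of that size.

```python
# The number of distinct N x N binary patterns is 2**(N*N); even N = 20
# gives a number with well over a hundred decimal digits.
N = 20
num_patterns = 2 ** (N * N)      # 2**400
print(len(str(num_patterns)))    # → 121, i.e., roughly 10**120 patterns
```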

    1.2.2 Tackling the Recognition Problem

    An obvious means of tackling the recognition problem is to standardize the images in some way. Clearly, normalizing the position and orientation of any 2D picture object would help considerably: indeed this would reduce the number of degrees of freedom by three. Methods for achieving this involve centralizing the objects—arranging that their centroids are at the center of the normalized image—and making their major axes (e.g., deduced by moment calculations) vertical or horizontal. Next, we can make use of the order that is known to be present in the image—and here it may be noted that very few patterns of real interest are indistinguishable from random dot patterns. This approach can be taken further: if patterns are to be nonrandom, isolated noise points may be eliminated. Ultimately, all these methods help by making the test pattern closer to a restricted set of training set patterns (although care must also be taken to process the training set patterns initially so that they are representative of the processed test patterns).
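    The centralization and orientation steps above can be sketched via image moments (a minimal illustration, assuming a binary image stored as a 2D list of 0/1 values; the function name and representation are invented for this sketch):

```python
import math

def centroid_and_orientation(image):
    """Return (xc, yc, theta): the centroid of the object pixels and the
    angle of the major axis, from second-order central moments."""
    pts = [(x, y) for y, row in enumerate(image)
                  for x, v in enumerate(row) if v]
    n = len(pts)
    xc = sum(x for x, _ in pts) / n
    yc = sum(y for _, y in pts) / n
    mu20 = sum((x - xc) ** 2 for x, _ in pts)
    mu02 = sum((y - yc) ** 2 for _, y in pts)
    mu11 = sum((x - xc) * (y - yc) for x, y in pts)
    # Standard moment-based orientation formula.
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return xc, yc, theta

# A horizontal bar: centroid at its middle, major axis at angle 0,
# so normalization would translate it to the image center unrotated.
bar = [[0, 0, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 0, 0]]
xc, yc, theta = centroid_and_orientation(bar)
print(xc, yc, theta)  # → 3.0 1.0 0.0
```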

    It is useful to consider character recognition further. Here we can make additional use of what is known about the structure of characters—namely, that they consist of limbs of roughly constant width. In that case the width carries no useful information, so the patterns can be thinned to stick figures (called skeletons—see Chapter 8: Binary Shape Analysis); then, hopefully, there is an even greater chance that the test patterns will be similar to appropriate training set patterns (Fig. 1.2). This process can be regarded as another instance of reducing the number of degrees of freedom in the image, and hence of helping to minimize the combinatorial explosion—or, from a practical point of view, to minimize the size of the training set necessary for effective recognition.

    Figure 1.2 Use of thinning to regularize character shapes. Here character shapes of different limb widths—or even varying limb widths—are reduced to stick figures or skeletons. Thus irrelevant information is removed and at the same time recognition is facilitated.
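    Thinning itself is treated in depth in Chapter 8; purely as a foretaste, the sketch below uses the classic Zhang-Suen algorithm (one common choice of thinning method, assumed here rather than taken from the text):

```python
def thin(img):
    """Zhang-Suen thinning of a binary image (2D list of 0/1 with a zero
    border). Iteratively peels boundary pixels until a skeleton remains."""
    img = [row[:] for row in img]
    h, w = len(img), len(img[0])

    def nbrs(y, x):
        # Neighbors P2..P9, clockwise starting from the pixel above.
        return [img[y-1][x], img[y-1][x+1], img[y][x+1], img[y+1][x+1],
                img[y+1][x], img[y+1][x-1], img[y][x-1], img[y-1][x-1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if not img[y][x]:
                        continue
                    P = nbrs(y, x)
                    B = sum(P)                      # object neighbors
                    A = sum(P[i] == 0 and P[(i + 1) % 8] == 1
                            for i in range(8))      # 0 -> 1 transitions
                    if step == 0:
                        ok = P[0]*P[2]*P[4] == 0 and P[2]*P[4]*P[6] == 0
                    else:
                        ok = P[0]*P[2]*P[6] == 0 and P[0]*P[4]*P[6] == 0
                    if 2 <= B <= 6 and A == 1 and ok:
                        to_delete.append((y, x))
            for y, x in to_delete:
                img[y][x] = 0
            changed = changed or bool(to_delete)
    return img

# A 3-pixel-thick bar is reduced to a 1-pixel-wide remnant: the limb
# width is discarded, as in Fig. 1.2.
bar = [[0] * 9 for _ in range(7)]
for y in (2, 3, 4):
    for x in range(2, 7):
        bar[y][x] = 1
skeleton = thin(bar)
```

    Note that deletions within each subiteration are applied simultaneously, which is what stops the object from being eroded away entirely.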

    Next, consider a rather different way of looking at the problem. Recognition is necessarily a problem of discrimination—i.e., of discriminating between patterns of different classes. However, in practice, considering the natural variation of patterns, including the effects of noise and distortions (or even the effects of breakages or occlusions), there is also a problem of generalizing over patterns of the same class. In practical problems there is a tension between the need to discriminate and the need to generalize. Nor is this a fixed situation. Even for the character recognition task, some classes are so close to others (n’s and h’s will be similar) that less generalization is possible than in other cases. On the other hand, extreme forms of generalization arise when, for example, an A is to be recognized as an A whether it is a capital or small letter, or in italic, bold, suffix, or other form of font—even if it is handwritten. The variability is determined largely by the training set initially provided. What we emphasize here, however, is that generalization is as necessary a prerequisite to successful recognition as is discrimination.

    At this point it is worth considering more carefully the means whereby generalization was achieved in the examples cited above. First, objects were positioned and orientated appropriately; second, they were cleaned of noise spots; and third, they were thinned to skeleton figures (although the latter process is relevant only for certain tasks such as character recognition). In the last case, we are generalizing over characters drawn with all possible limb widths, width being an irrelevant degree of freedom for this type of recognition task. Note that we could have generalized the characters further by normalizing their size and saving another degree of freedom. The common feature of all these processes is that they aim to give the characters a high level of standardization against known types of variability before finally attempting to recognize them.

    The standardization (or generalization) processes outlined above are all realized by image processing, i.e., the conversion of one image into another by suitable means. The result is a two-stage recognition scheme: first, images are converted into more amenable forms containing much the same amount of data; and second, they are classified, with the result that their data content is reduced to very few bits (Fig. 1.3). In fact, recognition is a process of data abstraction, the final data being abstract and totally unlike the original data. Thus we must imagine a letter A starting as an array of perhaps 20×20 bits arranged in the form of an A, and then ending as the 7 bits in an ASCII representation of an A, namely 1000001 (which is essentially a random bit pattern bearing no resemblance to an A).
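The bit counts quoted above are easy to verify: ASCII 'A' has code 65, i.e., the 7-bit pattern 1000001, while the input array carried 400 bits.

```python
# Data abstraction in recognition: a 20x20 binary image of a letter
# occupies 400 bits; after classification the same letter is 7 bits of ASCII.
code = ord('A')              # 65
bits = format(code, '07b')   # '1000001'
reduction = 20 * 20 / 7      # ~57-fold reduction in data content
print(code, bits)            # prints: 65 1000001
```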

    Figure 1.3 The two-stage recognition paradigm: C, input from camera; G, grab image (digitize and store); P, preprocess; R, recognize (i, image data; a, abstract data). The classical paradigm for object recognition is that of (1) preprocessing (image processing) to suppress noise or other artefacts and to regularize the image data and (2) applying a process of abstract (often statistical) pattern recognition to extract the very few bits required to classify the object.

    The last paragraph reflects to a large extent the history of image analysis. Early on, a good proportion of the image analysis problems being tackled were envisaged as consisting of an image preprocessing task carried out by image processing techniques, followed by a recognition task undertaken by pure pattern recognition methods (see Chapter 13: Basic Classification Concepts). These two topics—image processing and pattern recognition—consumed much research effort and effectively dominated the subject of image analysis, while intermediate-level approaches such as the Hough transform were, for a time, slower to develop. One of the aims of this book is to ensure that such intermediate-level processing techniques are given due emphasis, and indeed that the best range of techniques is applied to any computer vision task.

    1.2.3 Object Location

    The problem that was tackled above—that of character recognition—is a highly constrained one. In a great many practical applications it is necessary to search pictures for objects of various types, rather than just interpreting a small area of a picture.

    Search is a task that can involve prodigious amounts of computation and is also subject to a combinatorial explosion. Imagine the task of searching for a letter E in a page of text. An obvious way of achieving this is to move a suitable template of size n×n over the whole image, of size N×N, and to find where a match occurs (Fig. 1.4). A match can be defined as a position where there is exact agreement between the template and the local portion of the image but, in keeping with the ideas of Section 1.2.1, it will evidently be more relevant to look for a best local match (i.e., a position where the match is locally better than in adjacent regions) and where the match is also good in some more absolute sense, indicating that an E is present.

    Figure 1.4 Template matching, the process of moving a suitable template over an image to determine the precise positions at which a match occurs, hence revealing the presence of objects of a particular type.

    One of the most natural ways of checking for a match is to measure the Hamming distance between the template and the local n×n region of the image, i.e., to sum the number of differences between corresponding bits. This is essentially the process described in Section 1.2.1. Then places with a low Hamming distance are places where the match is good. These template-matching ideas can be extended to cases where the corresponding bit positions in the template and the image do not just have binary values but may have intensity values over a range 0–255. In that case the sums obtained are no longer Hamming distances but may be generalized to the form:

    D = Σ |Iᵢ − Iₜ|     (1.1)

    Iₜ being the local template value, Iᵢ being the local image value, and the sum being taken over the area of the template. This makes template matching practicable in many situations: the possibilities are examined in more detail in subsequent chapters.
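A brute-force search with this distance measure can be sketched as below. The function name `match` and the restriction to square arrays are mine; real systems would normalize intensities and prune the search, as later chapters discuss. Note that on binary images the quantity computed reduces exactly to the Hamming distance.

```python
def match(image, template):
    """Slide an n x n template over an N x N image and return
    (distance, row, col) of the position minimizing D = sum |I_i - I_t|
    (Eq. 1.1). On binary data D is exactly the Hamming distance.
    A minimal sketch for square lists-of-lists of intensities."""
    N, n = len(image), len(template)
    best = None
    for r in range(N - n + 1):
        for c in range(N - n + 1):
            d = sum(abs(image[r + i][c + j] - template[i][j])
                    for i in range(n) for j in range(n))
            if best is None or d < best[0]:
                best = (d, r, c)
    return best
```

A position where `d` is locally (and absolutely) small is then taken as evidence that an instance of the object is present there, in line with the best-local-match idea of Section 1.2.1.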

    We referred above to a combinatorial explosion in this search problem too. The reason this arises is as follows. First, when a 5×5 template is moved over an N×N image in order to look for a match, the number of operations required is of the order of 5²N², totaling some 1.6 million operations for a 256×256 image. The problem is that when larger objects are being sought in an image, the number of operations increases as the square of the size of the object, the total number of operations being N²n² when an n×n template is used. For a 30×30 template and a 256×256 image, the number of operations required rises to ~60 million. Note that, in general, a template will be larger than the object it is used to search for, because some background will have to be included to help demarcate the object.
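These operation counts are simply the product of template area and image area, which the following check confirms (the helper name `ops` is mine):

```python
def ops(n, N):
    # One compare/accumulate per template pixel per image position,
    # ignoring the small border correction (N - n + 1 vs N).
    return n * n * N * N

print(ops(5, 256))   # prints: 1638400   (~1.6 million)
print(ops(30, 256))  # prints: 58982400  (~60 million)
```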

    Next, recall that in general, objects may appear in many orientations in an image (E’s on a printed page are exceptional). If we imagine a possible 360 orientations (i.e., one per degree of rotation), then a corresponding number of templates will in principle have to be applied in order to locate the object. This additional degree of freedom pushes the search effort and time to enormous levels, so far away from the possibility of real-time implementation that new approaches must be found for tackling the task. [Real-time is a commonly used phrase meaning that the information has to be processed as it becomes available, i.e., fast enough to keep up with events as they occur in the real world.]
