Object Detection and Recognition in Digital Images: Theory and Practice
Ebook, 910 pages, 9 hours

About this ebook

Object detection, tracking and recognition in images are key problems in computer vision. This book provides the reader with a balanced treatment between the theory and practice of selected methods in these areas to make the book accessible to a range of researchers, engineers, developers and postgraduate students working in computer vision and related fields.

Key features:

  • Explains the main theoretical ideas behind each method (augmented with rigorous mathematical derivations of the formulas), their implementation in C++, and their use in real applications.
  • Places an emphasis on tensor and statistical based approaches within object detection and recognition.
  • Provides an overview of image clustering and classification methods which includes subspace and kernel based processing, mean shift and Kalman filter, neural networks, and k-means methods.
  • Contains numerous case study examples of mainly automotive applications.
  • Includes a companion website hosting a full C++ implementation of the topics presented in the book as a software library, and an accompanying manual to the software platform.
Language: English
Publisher: Wiley
Release date: May 20, 2013
ISBN: 9781118618363

    Object Detection and Recognition in Digital Images - Boguslaw Cyganek

    Contents

    Cover

    Title Page

    Copyright Page

    Dedication

    Preface

    Acknowledgements

    Notations and Abbreviations

    Chapter 1: Introduction

    1.1 A Sample of Computer Vision

    1.2 Overview of Book Contents

    References

    Chapter 2: Tensor Methods in Computer Vision

    2.1 Abstract

    2.2 Tensor – A Mathematical Object

    2.3 Tensor – A Data Object

    2.4 Basic Properties of Tensors

    2.5 Tensor Distance Measures

    2.6 Filtering of Tensor Fields

    2.7 Looking into Images with the Structural Tensor

    2.8 Object Representation with Tensor of Inertia and Moments

    2.9 Eigendecomposition and Representation of Tensors

    2.10 Tensor Invariants

    2.11 Geometry of Multiple Views: The Multifocal Tensor

    2.12 Multilinear Tensor Methods

    2.13 Closure

    References

    Chapter 3: Classification Methods and Algorithms

    3.1 Abstract

    3.2 Classification Framework

    3.3 Subspace Methods for Object Recognition

    3.4 Statistical Formulation of the Object Recognition

    3.5 Parametric Methods – Mixture of Gaussians

    3.6 The Kalman Filter

    3.7 Nonparametric Methods

    3.8 The Mean Shift Method

    3.9 Neural Networks

    3.10 Kernels in Vision Pattern Recognition

    3.11 Data Clustering

    3.12 Support Vector Domain Description

    3.13 Appendix – MATLAB® and other Packages for Pattern Classification

    3.14 Closure

    Problems and Exercises

    References

    Chapter 4: Object Detection and Tracking

    4.1 Introduction

    4.2 Direct Pixel Classification

    4.3 Detection of Basic Shapes

    4.4 Figure Detection

    4.5 CASE STUDY – Road Signs Tracking and Recognition

    4.6 CASE STUDY – Framework for Object Tracking

    4.7 Pedestrian Detection

    4.8 Closure

    Problems and Exercises

    References

    Chapter 5: Object Recognition

    5.1 Abstract

    5.2 Recognition from Tensor Phase Histograms and Morphological Scale Space

    5.3 Invariant Based Recognition

    5.4 Template Based Recognition

    5.5 Recognition from Deformable Models

    5.6 Ensembles of Classifiers

    5.7 CASE STUDY – Ensemble of Classifiers for Road Sign Recognition from Deformed Prototypes

    5.8 Recognition Based on Tensor Decompositions

    5.9 Eye Recognition for Driver's State Monitoring

    5.10 Object Category Recognition

    5.11 Closure

    Problems and Exercises

    References

    A Appendix

    A.1 Abstract

    A.2 Morphological Scale-Space

    A.3 Morphological Tensor Operators

    A.4 Geometry of Quadratic Forms

    A.5 Testing Classifiers

    A.6 Code Acceleration with OpenMP

    A.7 Useful MATLAB® Functions for Matrix and Tensor Processing

    A.8 Short Guide to the Attached Software

    A.9 Closure

    Problems and Exercises

    References

    Index

    Title Page

    This edition first published 2013

    © 2013 John Wiley & Sons, Ltd

    Registered office

    John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

    For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

    The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

    All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

    Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

    Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

    MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

    Library of Congress Cataloging-in-Publication Data

    Cyganek, Boguslaw.

     Object detection and recognition in digital images : theory and practice / Boguslaw Cyganek.

       pages cm

     Includes bibliographical references and index.

     ISBN 978-0-470-97637-1 (cloth)

      1. Pattern recognition systems.  2. Image processing-Digital techniques.  3. Computer vision.  I. Title.

     TK7882.P3C94 2013

     621.39′94–dc23

    2012050754

    A catalogue record for this book is available from the British Library

    ISBN: 978-0-470-97637-1

    To my family with love

    Preface

    We live in an era of technological revolution in which developments in one domain frequently entail breakthroughs in another. Similar to the nineteenth century industrial revolution, the last decades can be termed an epoch of computer revolution. For years we have been witnessing the rapid development of microchip technologies which has resulted in a continuous growth of computational power at ever decreasing costs. This has been underpinned by the recent developments of parallel computational systems of graphics processing units and field programmable gate arrays. All these hardware achievements also open up new application areas and possibilities in the quest to make a computer see and understand what it sees, which is a primary goal in the domain of computer vision. However, although fast computers are of great help in this respect, what really makes a difference are new and better processing methods and their implementations.

    The book presents selected methods of object detection and recognition with special stress on statistical and (relatively new to this domain) tensor based approaches. However, the number of interesting and important methods is growing rapidly, making it difficult to offer a complete coverage of these methods in one book. Therefore the goal of this book is slightly different: the methods chosen here have been used by me and my colleagues in many projects and have proved to be useful in practice. Our main areas concern automotive applications in which we try to develop vision systems for road sign recognition or driver monitoring. When starting this book my main purpose was not only to give an overview of these methods, but also to provide the necessary, though concise, mathematical background. However, just as important are implementations of the discussed methods. I'm convinced that the connection of detailed theory and its implementation is a prerequisite for the in-depth understanding of the subject. In this respect the choice of the implementation platform is also not a surprise. The C++ programming language used throughout this book and in the attached software library is a worldwide industry standard. This does not mean that implementations cannot be done using different programming platforms, for which the provided code examples can be used as a guide or for direct porting. The book is accompanied by a companion website at www.wiley.com/go/cyganekobject which contains the code, color figures, as well as slides, errata and other useful links.

    This book grew as a result of my fascination with modern computer vision methods and also after writing my previous book, co-authored with J. Paul Siebert and devoted mostly to the processing of 3D images. Thus, in some sense it can be seen as a continuation of our previous work, although both can be read as standalone texts.

    Thus, the book can be used by all scientists and industry practitioners related to computer vision and machine pattern recognition, but can also be used as a tutorial for students interested in this rapidly developing area.

    Bogusław Cyganek

    Poland

    Acknowledgements

    Writing a book is a tremendous task. It would not be possible if not for the help of friends, colleagues, cooperators, and many other people, whose names I sometimes don't even know, but I know they did wonderful work to make this book happen.

    I would particularly like to thank numerous colleagues from the AGH University of Science and Technology, as well as the Academic Computer Centre Cyfronet, in Kraków, Poland. Special thanks go to Professor Ryszard Tadeusiewicz and Professor Kazimierz Wiatr for their continuous encouragement and support.

    I would also like to express my thanks to Professor Ralf Reulke from the Humboldt-Universität zu Berlin, and Deutsches Zentrum für Luft- und Raumfahrt, as well as to all the colleagues from his team, for our fruitful cooperation in interesting scientific endeavours.

    I'm very grateful to the Wiley team who have helped to make this book possible. I'd like to express my special thanks to Richard Davies, Alex King, Nicky Skinner, Simone Taylor, Liz Wingett, as well as to Nur Wahidah Binte Abdul Wahid, Shubham Dixit, Caroline McPherson, and all the others whose names I don't know but I know they did a brilliant job to make this book happen. Once again – many thanks!

    I'm also very grateful to many colleagues around the world, and especially readers of my previous book on 3D computer vision, for their e-mails, questions, suggestions, bug reports, and all the discussions we've had. All these helped me to develop better text and software. I also ask for your support now and in the future!

    I would like to kindly express my gratitude to the National Science Centre NCN, Republic of Poland, for their financial support in scientific research projects conducted over the years 2007–2009, as well as 2011–2013 under the contract no. DEC-2011/01/B/ST6/01994, which greatly contributed to this book. I would also like to express my gratitude to the AGH University of Science and Technology Press for granting the rights to use parts of my previous publication.

    Finally, I would like to thank my family: my wife Magda, my children Nadia and Kamil, as well as my mother, for their patience, support, and encouragement during all the days I worked on this book.

    Notations and Abbreviations

    1

    Introduction

    Look in, let not either the proper quality, or the true worth of anything pass thee, before thou hast fully apprehended it.

    —MARCUS AURELIUS Meditations, 170–180 AD (Translated by Meric Casaubon, 1634)

    This book presents selected object detection and recognition methods in computer vision, joining theory, implementation, and applications. The majority of the selected methods were used in real automotive vision systems. Two groups of methods are distinguished. The first group contains methods which are based on tensors, which in the last decade have opened new frontiers in image processing and pattern analysis. The second group of methods builds on mathematical statistics. In many cases, object detection and recognition methods draw from both groups. As indicated in the title, explaining the main concepts of the methods and presenting their mathematical derivations is as important as their implementation and usage in real applications. Object detection and recognition are closely connected: to some extent both can be seen as pattern classification, and detection frequently precedes recognition. Nevertheless, we make a distinction between the two. Object detection in our definition mostly concerns answering the question of whether a given type of object is present in images. Sometimes, the object's current appearance and position are also important. On the other hand, the goal of object recognition is to tell its particular type. For instance, we can detect a face, and after that identify a specific person. Similarly, in a road sign recognition system, for some signs detection alone unambiguously reveals their category, such as Yield. However, for the majority of them, we first detect their characteristic shapes, then we identify their particular type, such as a 40 km/h speed limit, and so forth.

    Detection and recognition of objects in the observed scenes is a natural biological ability. People and animals perform this effortlessly in daily life to move without collisions, to find food, avoid threats, and so on. However, similar computer methods and algorithms for scene analysis are not so straightforward, despite their unprecedented development. Nevertheless, close observation and analysis of biological systems provide some hints for their machine realizations. A good example is artificial neural networks, which in their diversity resemble biological systems of neurons and which, in their software realization, are frequently used by computers to recognize objects. This is how the branch of computer science called computer vision (CV) developed. Its main objective is to make computers see as humans do, or even better. Sometimes it becomes possible.

    Due to technological breakthroughs, the domains of object detection and recognition have changed so dynamically that preparing even a multivolume publication on the majority of important subjects in this area seems impossible. Each month hundreds of new papers are published with new ideas, theorems, algorithms, etc. On the other hand, the fastest and most ample source of information is the Internet. One can easily look up almost all subjects on a myriad of webpages, such as Wikipedia. So, nowadays the purpose of writing a book on computer vision has to be stated somewhat differently than even a few years ago. The difference between an ample set of information on the one hand and knowledge and experience on the other becomes especially important when we face a new technological problem and our task is to solve it or design a system which will do this for us. In this case we need a way of thinking which helps us to understand the state of nature, as well as a methodology which takes us closer to a potential solution. This book grew in just this way, alongside my work on different projects related to object recognition in images. To be able to apply a given method we first need to understand it. At this stage not just a final formula summarizing a method, but also its detailed mathematical background, is of great use. On the other hand, bare formulas don't yet solve the problem. We need their implementations. This is the second stage, sometimes requiring more time and work than the former. One of the main goals of this book is to join the two domains on a selected set of useful methods of object detection and recognition. In this respect I hope this book will be of practical use, both for self study and also as a reference when working on a concrete problem. Nevertheless, we are not able to go through all stages of all the methods, but I hope the book will provide at least a solid start for further study and development in this fascinating and dynamically changing area.

    As indicated in the title, one of my goals was to join theory and practice. My experience is that such composition leads to an in-depth understanding of the subject. This is further underpinned by case studies of mostly automotive applications of object detection and recognition. Thus, sections of this book can be grouped as follows:

    Presentations of methods, their main concepts, and mathematical background.

    Method implementations which contain C++ code listings (sections of this type are indicated with the word IMPLEMENTATION).

    Analysis of special applications (their names start with CASE STUDY).

    Apart from this we have some special entries which contain brief explanations of some mathematical concepts, with examples whose aim is to help in understanding the mathematical derivations in the surrounding sections.

    A comment on code examples. I have always been convinced that in a book like this we should not spoil pages with an introduction to C, C++ or other basic principles of computer science, as sometimes is the case. The reasons are at least twofold. The first is that for computer science there are a lot of good books available, for which I provide the references. The second is not to divert the Reader from the main purpose of this book, which is an in-depth presentation of modern computer vision methods and algorithms. On the other hand, Readers who are not familiar with C++ can skip detailed code explanations and focus on implementation in other platforms. However, there is no better way of learning a method than through practical testing and usage in applications.

    This book is based on my experience gathered while working on many scientific projects. Results of these were published in a number of conference and journal articles. In this respect, two previous books are special. The first, An Introduction to 3D Computer Vision Techniques and Algorithms, written together with J. Paul Siebert, was published by Wiley in 2009 [1]. The second is my habilitation thesis [2], also issued in 2009 by the AGH University of Science and Technology Press in Kraków, Poland. Extended parts of the latter are contained in different sections of this book, permission for which was granted by the AGH University Press.

    Most of all, I have always found being involved in scientific and industry projects real fun and an adventure leading to self-development. I wish the same to you.

    1.1 A Sample of Computer Vision

    In this section let us briefly take a look at some applications of computer vision in the systems of driver monitoring, as well as scene analysis. Both belong to the on-car Driver Assisting System aimed at facilitating driving, for example by notifying drivers of incoming road signs, and most of all by preventing car accidents, for example due to the driver falling asleep.

    Figure 1.1 depicts a system of cameras mounted in a test car. The cameras can observe the driver and allow the system to monitor his or her state. Cameras can also observe the front of the car for pedestrian detection or road sign recognition, in which case they can send an image like the one presented in Figure 1.2.

    Figure 1.1 System of cameras mounted in a car. The cameras can observe a driver to monitor his/her state. Cameras can also observe the front of a car for pedestrian detection or road sign recognition. Such vision modules will probably soon become standard equipment, being a part of the on-board Driver Assisting System of a car.

    Figure 1.2 A traffic scene. A car-mounted computer with cameras can provide information on the road scene to help safe driving. But computer vision can also help you identify where the picture was taken.

    What type of information can we draw from such an image? This depends on our goal, certainly. In the real traffic situation depicted we are mainly interested in driving the car safely, avoiding pedestrians and other vehicles in motion or parked, as well as spotting and reacting to traffic signals and signs. However, in a situation where someone sent us this image we might be interested in finding out the name of that street, for instance. What can computer vision do for us? To some extent all of the above, and soon driving a car, at least in special conditions. Let us look at some stages of processing by computer vision methods, details of which are discussed in the next chapters.

    Let us first observe that even a single color image has three dimensions, as shown in Figure 1.3(a). In the case of multiple images or a video stream, the number of dimensions grows. Thus, we need tools to analyze such structures. As we will see, tensors offer new possibilities in this respect. Also, their recently developed decompositions allow insight into information contained in such multidimensional structures, as well as their compression or extraction of features for further classification. Much research in computer vision and pattern recognition concerns the detection of features and their properties. In this respect, transformations are investigated which change the original intensity or color pixels into a new representation which provides some knowledge about image contents or is more appropriate for finding specific objects. An example of applying the structural tensor to the image in Figure 1.2 to detect areas with strong local structures is shown in Figure 1.3(b). Found structures are encoded with color: their orientation is represented by different colors, whereas their strength is represented by color saturation. Let us observe that areas with no prominent structures show no response of this filter; in Figure 1.3(b) they are simply black. As will be shown, such a representation proves very useful in finding specific figures in images, such as pedestrians, cars, or road signs, and so forth.

    Figure 1.3 A color image can be seen as a 3D structure (a). Internal properties of such multidimensional signals can be analyzed with tensors. Local structures can be detected with the structural tensor (b). Here different colors encode orientations of areas with strong signal variations, such as edges. Areas with weak texture are in black. These features can be used to detect pedestrians, cars, road signs and other objects.
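    To make the idea of a structural tensor response more concrete, the following is a minimal sketch and not the implementation from the book's library. It assumes a grayscale image, central differences for the gradients, and a plain 3×3 averaging window; the names GrayImage, LocalStructure and structuralTensorAt are hypothetical. It computes the three independent tensor components at one pixel, the dominant gradient orientation, and a coherence value between 0 and 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container: a grayscale image stored row by row.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<float> data;  // width * height intensities

    float at(int x, int y) const {
        return data[static_cast<std::size_t>(y) * width + x];
    }
};

// Averaged structural tensor components at one pixel, plus derived features.
struct LocalStructure {
    float txx = 0.0f, txy = 0.0f, tyy = 0.0f;
    float orientation = 0.0f;  // radians, direction of the dominant gradient
    float coherence = 0.0f;    // 0 = no dominant orientation, 1 = exactly one
};

// Averages the gradient products over a 3x3 window centred at (x, y).
// Assumption: central differences and a box window; other derivative and
// smoothing filters are equally possible.
LocalStructure structuralTensorAt(const GrayImage& img, int x, int y) {
    // The window (+-1) plus the gradient stencil (+-1) must stay inside.
    assert(x >= 2 && y >= 2 && x + 2 < img.width && y + 2 < img.height);

    LocalStructure s;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            const int px = x + dx;
            const int py = y + dy;
            const float gx = 0.5f * (img.at(px + 1, py) - img.at(px - 1, py));
            const float gy = 0.5f * (img.at(px, py + 1) - img.at(px, py - 1));
            s.txx += gx * gx;
            s.txy += gx * gy;
            s.tyy += gy * gy;
        }
    }
    s.txx /= 9.0f;
    s.txy /= 9.0f;
    s.tyy /= 9.0f;

    // Angle of the eigenvector that belongs to the larger eigenvalue.
    s.orientation = 0.5f * std::atan2(2.0f * s.txy, s.txx - s.tyy);

    // Squared ratio (l1 - l2) / (l1 + l2) of the two eigenvalues.
    const float trace = s.txx + s.tyy;
    if (trace > 1e-6f) {
        const float diff = s.txx - s.tyy;
        s.coherence = (diff * diff + 4.0f * s.txy * s.txy) / (trace * trace);
    }
    return s;
}
```

    For a visualization in the spirit of Figure 1.3(b), the orientation could be mapped to hue and the strength (for example the trace or the coherence) to saturation, so that flat areas with a near-zero trace stay black. That mapping is an illustration only and is left out of the sketch.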

    Let us now briefly show the possible steps that lead to detection of road signs in the image in Figure 1.2. In this method signs are first detected with fast segmentation by specific colors characteristic of different groups of expected signs. For instance, red color segmentation is used to spot all red objects, among which could also be the red rims of the prohibitive signs, and so on for all colors of interest.

    Figure 1.4 shows binary maps obtained from the image in Figure 1.2 after red and blue segmentations, respectively. Many segmentation methods are discussed in this book. In this case we used manually gathered color samples to train the support vector classifiers.

    Figure 1.4 Segmentation of image in Figure 1.2. Red (a), blue color segmentation (b).
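    The per-pixel nature of such a segmentation can be sketched as follows. This is only an illustration under stated assumptions: the text above uses support vector classifiers trained on color samples, whereas the sketch swaps in a hand-written threshold rule (looksRed) as a stand-in, and its thresholds are made up for the example. The names Rgb, looksRed and segmentByColor are hypothetical. Any callable that decides on a single pixel, including a trained classifier, could be passed in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit RGB pixel.
struct Rgb {
    std::uint8_t r, g, b;
};

// Stand-in for a trained color classifier: accepts strongly red pixels.
// The thresholds are illustrative assumptions, not values from the book.
bool looksRed(const Rgb& p) {
    return p.r > 120 && p.r > 2 * p.g && p.r > 2 * p.b;
}

// Direct pixel classification: every pixel is judged independently and the
// result is a binary map (1 = object color, 0 = background).
template <typename PixelClassifier>
std::vector<std::uint8_t> segmentByColor(const std::vector<Rgb>& pixels,
                                         std::size_t width, std::size_t height,
                                         PixelClassifier isObject) {
    assert(pixels.size() == width * height);
    std::vector<std::uint8_t> mask(pixels.size(), 0);
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        if (isObject(pixels[i])) {
            mask[i] = 1;
        }
    }
    return mask;
}
```

    Calling segmentByColor(pixels, width, height, looksRed) yields a map comparable in kind to Figure 1.4(a); a second predicate for blue would give the counterpart of Figure 1.4(b).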

    From the maps in Figure 1.4 we need to find a way of selecting objects whose shape and size potentially correspond to the road signs we are looking for. This is done by specific methods which rely on detection of salient points, as well as on fuzzy logic rules which define the potential shape and size of the candidate objects.

    Figure 1.5 shows the detected areas of the signs. These now need to be fed to the next classifier, which will provide a final response: first whether we are really observing a sign and not, for instance, a traffic light, and then what particular type of sign it is. However, observed signs can be of any size and can also be rotated. Classifiers which can cope with such patterns are, for instance, the cooperating groups of neural networks or the decomposition of tensors of deformed prototypes. Both of the aforementioned classifiers respond with the correct type of signs visible in Figure 1.2. These, as well as many other methods of object detection and recognition, are discussed in this book.

    Figure 1.5 Circular signs are found by outlining all red objects detected in the scene. Then only those which fulfill the definition and relative size expected for a sign are left (a). Triangular and rectangular shapes are found based on their corner points. The points are checked for all possible rectangles and again only those which comply with fuzzy rules defining sought figures are left (b).

    Figure 1.6 Organization of the book.

    1.2 Overview of Book Contents

    Organizing a book is not straightforward due to the many interrelations between the topics discussed. Such relations are not linear, and in this respect electronic texts with inner links show many benefits. The printed version has its own features. On the one hand, the book can be read linearly, from the beginning to its end. On the other, selected topics can be read independently, especially when looking for a specific method or its implementation. The book is organized into six chapters, starting with the Introduction.

    Chapter 2 is entirely devoted to different aspects of tensor methods applied to numerous tasks of computer vision and pattern recognition. We start with basic explanations of what tensors are, as well as their different definitions. Then basic properties of tensors, and especially their distances, are discussed. The next section provides some information on filtering of tensor data. Then the structural tensor is discussed, which proves very useful in many different tasks and different types of images. A further important topic is the tensor of inertia, as well as statistical moments, which can be used at different stages of object detection and recognition. Eigendecomposition of tensors, as well as their invariants, are discussed next. A separate topic is multi-focal tensors, which are used to represent relations among corresponding points in multiple views of the same scene.

    The second part of Chapter 2 is devoted to multilinear methods. First the most important concepts are discussed, such as the k-mode product, tensor flattening, as well as different ranks of tensors. These are followed by the three main tensor decompositions, namely the Higher Order Singular Value Decomposition, the best rank-1, and the best rank-(R1, …, RP) decompositions, where R1 to RP represent the desired ranks of each of the P dimensions of the tensor. The chapter ends with a discussion of subspace data representation, as well as nonnegative decompositions of tensors.
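    Tensor flattening, mentioned above, rearranges the elements of a tensor into a matrix whose rows are indexed by one chosen mode. The following is a minimal sketch for a dense third-order tensor, not the library code that accompanies the book. The types Tensor3 and Matrix and the function flatten are hypothetical, and the column ordering (remaining indices in increasing order, the lower one varying fastest) is just one possible convention; Chapter 2 may define a different one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense 3rd-order tensor. Element (i0, i1, i2) is stored at
// i0 + dims[0] * (i1 + dims[1] * i2), so the first index varies fastest.
struct Tensor3 {
    std::array<std::size_t, 3> dims{};
    std::vector<double> data;

    double at(std::size_t i0, std::size_t i1, std::size_t i2) const {
        return data[i0 + dims[0] * (i1 + dims[1] * i2)];
    }
};

// Hypothetical row-major matrix.
struct Matrix {
    std::size_t rows = 0, cols = 0;
    std::vector<double> data;

    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Flattens the tensor along `mode` (0, 1 or 2): the chosen index becomes the
// row index, the two remaining indices are merged into the column index.
Matrix flatten(const Tensor3& t, std::size_t mode) {
    assert(mode < 3);
    assert(t.data.size() == t.dims[0] * t.dims[1] * t.dims[2]);

    // The two remaining modes, in increasing order (assumed convention).
    const std::size_t a = (mode == 0) ? 1 : 0;
    const std::size_t b = (mode == 2) ? 1 : 2;

    Matrix m;
    m.rows = t.dims[mode];
    m.cols = t.dims[a] * t.dims[b];
    m.data.resize(m.rows * m.cols);

    std::array<std::size_t, 3> idx{};
    for (idx[2] = 0; idx[2] < t.dims[2]; ++idx[2]) {
        for (idx[1] = 0; idx[1] < t.dims[1]; ++idx[1]) {
            for (idx[0] = 0; idx[0] < t.dims[0]; ++idx[0]) {
                const std::size_t col = idx[a] + t.dims[a] * idx[b];
                m.data[idx[mode] * m.cols + col] = t.at(idx[0], idx[1], idx[2]);
            }
        }
    }
    return m;
}
```

    With such a flattening, the k-mode product of a tensor with a matrix reduces to an ordinary matrix product with the flattened tensor followed by the inverse rearrangement, which is why the two concepts are usually introduced together.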

    Chapter 3 presents an overview of classification methods. We start with a presentation of subspace methods with one of the most important data representation methods, Principal Component Analysis. The majority of the methods have their roots in mathematical statistics, so the next sections present a concise introduction to the statistical framework of object recognition. Not surprisingly the key concept here is the Bayes theorem. Then we discuss the parametric methods as well as the Kalman filter, frequently used in tracking systems but whose applications reach far beyond this. A discussion of the nonparametric methods follows, starting with simple, but surprisingly useful, histogram methods. Then the Parzen approach is discussed with its connections to nearest-neighbor methods. Mean shift methods are discussed in the subsequent parts of Chapter 3. Then the probabilistic, Hamming, as well as morphological neural networks are presented.

    A separate topic within Chapter 3 concerns kernel processing. These are important novel classification methods which rely on smart data transformation into a higher dimensional space in which linear classification is possible. From this group come Support Vector Machines, one of the most important types of data classifier.

    The last part of Chapter 3 is devoted to the family of k-means data clustering methods which find broad application in many areas of data processing. They are used in many of the discussed applications. Special attention is also deserved by ensembles of classifiers, such as the one discussed at the end of Chapter 3.

    Chapter 4 deals with object detection and tracking. It starts with a discussion on the various methods of direct pixel classification, used mostly for fast image segmentation, as shown with the help of two applications. Methods of detection of basic shapes and figures follow. These are discussed mostly in the context of automotive applications. Chapter 4 ends with a brief overview of the recent methods of pedestrian detection.

    Object recognition is discussed in Chapter 5. We start with recognition methods that are based on analysis of phase histograms of objects which come from the structural tensor. Discussion on scale-space template matching in the log-polar domain follows. This technique has found many applications in CV. From these, two are discussed. Two very important topics are discussed next. The first is the idea of object recognition in the domain of deformable prototypes. The second concerns ensembles of classifiers. As was shown, these show superior results even compared to very sophisticated but single classifiers.

    Chapter 5 continues with a presentation of the road sign classification systems based on ensembles of classifiers and deformable patterns, but realized in two different ways. The first employs Hamming neural networks. The second is based on decomposition of a tensor of deformable prototype patterns. The latter is also shown in the context of handwritten digit recognition.

    A very specific topic discussed at the end of Chapter 5 is eye recognition, used for monitoring the driver's state to prevent dangerous situations arising from the driver falling asleep. Chapter 5 concludes with a discussion on the recent methods of object category recognition.

    Appendix A discusses a number of auxiliary topics. It starts with a presentation of the morphological scale-space. Then the domain of morphological tensor operators is briefly discussed. Next, the geometry of quadratic forms is provided. Then the problem of testing classifiers is discussed. This section gathers different approaches to classifier testing and contains a list of parameters and measures frequently used to assess classifiers. The rest of Appendix A briefly presents the OpenMP library used to convert serial codes into functionally corresponding but concurrent versions. In the last section some useful MATLAB® functions for matrix and tensor processing are presented.

    As already mentioned, the majority of the presented topics are accompanied by their full C++ implementations. Their main parts are also discussed in the book. The full implementation in the form of a software library can be downloaded from the book webpage [3]. This webpage also contains some additional materials, such as the manual to the software platform, color images, and other useful links.

    Last but not least, I will be very grateful to hear your opinion of the book.

    References

    [1] Cyganek B., Siebert J.P.: An Introduction to 3D Computer Vision Techniques and Algorithms, Wiley, 2009.

    [2] Cyganek B.: Methods and Algorithms of Object Recognition in Digital Images. Habilitation Thesis. AGH University of Science and Technology Press, 2009.

    [3] http://www.wiley.com/go/cyganekobject

    2

    Tensor Methods in Computer Vision

    2.1 Abstract

    This chapter gathers different computer vision techniques which make use of tensors, as well as their decomposition and analysis. As will be shown, the discussed methods have found application in many methods for object detection and recognition in images. Although tensors have been known in mathematics for over a hundred years, their application in computer vision (CV) and pattern recognition (PR) has been a matter of the last two decades. The real power of tensor processing in these areas comes from their natural ability to represent the multidimensional nature of processed data well.

    Based on the fundamental sampling theorem, continuous signals, when sampled with sufficient frequency, can be unambiguously represented by their discrete samples [1, 2]. This fundamental property links physical measurements with the world of computer processing, since digital signals are just data in computer memory. As will be shown, tensors are the right tools for processing a variety of digital signals, such as sound, vision, seismic, medical electroencephalogram (EEG), as well as magnetic resonance imaging (MRI), which opens vast possibilities in medical diagnosis. In MRI, for instance, it is assumed that the motion of water molecules in tissues can be approximated by a Brownian motion in the voxels of the image. In turn, the Brownian motion is entirely described by a symmetric and positive definite matrix, called the diffusion tensor. Processing and visualization of diffusion tensors is one of the most rapidly growing domains, joining mathematics, physics, medicine, and computer vision.

    The goal of this chapter is to present different areas of CV and PR which can be well represented and analyzed with tensors. We start with definitions of tensors, as well as their basic properties. Tensors have two most pronounced characteristics. The first is their transformation rules with respect to changes of the coordinate system. The other is their multidimensionality, which makes them the right tool to process data which depend on many factors, as will be discussed. We present the structural tensor and its variants, as well as the tensor of inertia. The former is based on signal differentiation, whereas the latter is related to the statistical moments computed from the signal. Both are useful to represent local areas, as well as whole objects, in the images. We also discuss methods of filtering of tensor data, as well as their eigendecomposition and invariants. Tensors are also the right tool to represent mutual relations between features of real objects imaged in multiple views. The next part of this chapter is devoted to the second aspect of tensors – their ability to represent and analyze multidimensional data. Presented are the most important tensor decompositions, the Higher-Order Singular Value Decomposition (HOSVD), best rank-1, as well as best rank-(R1, …, RP), where R1 to RP are the desired ranks of each of the P dimensions of the tensor. Finally, the nonnegative matrix and tensor factorizations, as well as the subspace data representation, are discussed.

    Apart from a presentation of the mathematical background of the methods, their object-oriented implementations are also presented and discussed. Since implementations are generic, that is they can be used with user specified data types, they can be directly used in other projects. Also, some applications are provided which aim to exemplify the most important features of the methods. The subsequent chapters of the book contain further examples of applications of the presented tensor methods in real computer vision systems (CVS).

    2.2 Tensor – A Mathematical Object

    The tensor was developed to facilitate mathematical representation of physical laws in changing coordinate systems. One of the most famous applications of tensors is the theory of relativity originally provided by A. Einstein at the beginning of the 20th century [3]. In the following sections we outline their basic properties which constitute a foundation for further concepts presented in this book. Many valuable readings on tensors are available, among which the classical text by Bishop and Goldberg is especially interesting [4], as well as a dissertation on the frontiers of physics, relativity and mathematical physics, including tensor analysis and manifolds, by Penrose [5]. A unique treatment of tensors and their nonlinear functions is given in the book by Dimitrienko [6]. Regarding computer vision, tensors have found broad applications in such tasks as multiple view analysis [7, 8], structural tensor [11, 12], as well as multidimensional pattern recognition and view synthesis [11, 12], to name a few. These are further analyzed in the subsequent sections. An introduction to the domain of tensor analysis for the purpose of CV can be found, for example, in the papers by Triggs [13, 14], or in the book by Cyganek and Siebert [15].

    2.2.1 Main Properties of Linear Spaces

    A characteristic feature of tensor notation is the existence of upper and lower indices, which do not denote powers as in the case of polynomials. The number and position of the indices determine a type (valence) of a tensor. The other characteristic property of tensors is a summation convention, originally proposed by Einstein, to shorten mathematical formulas [3]. It simply assumes elimination of the summation symbol when there are two opposite indices which uniquely indicate summation. Hence, instead of $\sum_{i=1}^{n} a^{i} x_{i}$ simply $a^{i} x_{i}$ is written, assuming that summation spans through the same index i which for a is in the contravariant (upper) position, whereas for x it is in the covariant (lower) position.
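
    As a small worked illustration of the summation convention (the choice n = 3 is made only for this example), the explicit sum and its shorthand can be written as:

```latex
\sum_{i=1}^{3} a^{i} x_{i} \;=\; a^{1} x_{1} + a^{2} x_{2} + a^{3} x_{3} \;\equiv\; a^{i} x_{i}
```

    Here the superscripts on a are index labels rather than exponents, and the repeated index i, appearing once in the upper and once in the lower position, is what signals the summation.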

    A vector x in the L dimensional space with basis bi can be expressed as

    (2.1) $x = \sum_{i=1}^{L} x^{i} b_{i}$

    which in Einstein's notation takes the following form

    (2.2) $x = x^{i} b_{i}$

    Hence, knowing a base, we can write $x = (x^{1}, x^{2}, \ldots, x^{L})$. The linear form (2.2), as well as its dual, constitute the base of tensor algebra which is especially useful if the base of a coordinate system changes. The tensor algebra is mainly constructed on two mathematical concepts:

    Linear operators.

    Change of the coordinate system (the Jacobian).

    Explanation of the above concepts can be found in the majority of the books dealing with concepts of vector spaces, e.g. [4, 15, 16, 17].
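
    The expansion of a vector in a basis, referred to above as Equation (2.2), can be sketched in a few lines of C++. This is only an illustrative sketch: the function name FromComponents and the representation of basis vectors by their Cartesian coordinates are assumptions made here, not part of the software library accompanying the book.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (an assumption, not the book's library code):
// rebuilds a vector from its components x^i in a basis b_i, i.e. it
// evaluates the sum x^i b_i. basis[i] holds the Cartesian coordinates
// of the i-th basis vector.
std::vector<double> FromComponents(const std::vector<double>& components,
                                   const std::vector<std::vector<double>>& basis)
{
    assert(!basis.empty() && components.size() == basis.size());
    std::vector<double> x(basis.front().size(), 0.0);
    for (std::size_t i = 0; i < basis.size(); ++i) {   // implied summation over i
        assert(basis[i].size() == x.size());
        for (std::size_t k = 0; k < x.size(); ++k)
            x[k] += components[i] * basis[i][k];
    }
    return x;
}
```

    For instance, with the basis vectors (1, 1) and (0, 2) and the components (3, 0.5), the function returns the Cartesian vector (3, 4). The same components paired with a different basis would give a different vector, which is why the behavior under a change of basis matters for what follows.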

    2.2.2 Concept of a Tensor

    A definition of a tensor is based on the concept of a vector space and its dual space, as follows [4]:

    Definition 2.1 Let W and W* be the vector space and its dual, respectively. A tensor over the base W is a multilinear scalar function f with arguments from W and W*

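    (2.3) $f: \underbrace{W \times \cdots \times W}_{p} \times \underbrace{W^{*} \times \cdots \times W^{*}}_{q} \rightarrow \mathbb{R}$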

    The numbers of arguments p and q denote the covariant and contravariant degrees (the valence) of a tensor.

    From the above definition we easily notice that a tensor of valence (0, 0) is a scalar, whereas tensors (1, 0) and (0, 1) are contra- and co-variant vectors, respectively. Moreover, tensors defined in the above manner create a vector space by themselves. It is called a tensor space over W.

    Definition 2.1 provides a very convenient explanation of a tensor as a multilinear function. Thanks to this, the concept of a tensor is a very versatile one. However, in some respects such an ample definition is cumbersome for interpretation and other equivalent representations become more common. For example, knowing that all linear functions can be uniquely determined exclusively based on their values at base vectors, an analogous property for tensors is expressed as

    Theorem 2.1 A tensor is uniquely determined by its values on the basis and its dual basis. The values are products of the tensor and the elements of the base and the dual base.

    Based on the above, tensors, being multilinear functions, are represented by indexed values, such as $t_{i \ldots j}^{m \ldots n}$, in the basis and its dual. In accordance with Equation (2.3), there are p lower (covariant) indices and q upper (contravariant) indices. Such a representation is frequently used in physics. However, the most important thing is the way these values change when the base of the coordinate space changes. This is a key property of tensors which is sometimes used as a second means of definition. That is, a tensor is defined by its values $t_{i \ldots j}^{m \ldots n}$ which change in a way characteristic of the type of the tensor, i.e. its degree and valence (1.1). This also paves the way to a method of checking whether a physical value is a tensor or not and, if the answer is positive, what its valence is [18]. We would like to point out that this is the proper way to determine whether a given mathematical object is, or is not, a tensor.

    Let us now specify the dimension of a tensor, based on the values used in Equation (2.3). It is given by:

    Theorem 2.2 The dimension of a tensor with p covariant and q contravariant indices is $L^{p+q}$, where L = dim(W) denotes the dimension of the vector space W.

    The above two theorems allow us to treat a tensor as a P-dimensional cube of data, where P = p + q denotes its total dimension. In computer science terms, this is a P-dimensional array in which P indices need to be provided to access an element. In this interpretation all indices are treated in the same way, i.e. we do not distinguish between contra- and co-variant ones, since the transformation laws are not considered in this case, but only the number of independent indices. This way tensor analysis can be used in domains that require representation and manipulation of multidimensional data, which we encounter frequently in physics, chemistry, data mining, as well as image processing, to name a few.
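
    The array interpretation can be made concrete with a minimal C++ sketch of a dense container addressed by P indices. The class name DenseTensor, its member names, and the row-major storage order are assumptions chosen for this illustration; the class from the book's software library is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal dense tensor (not the book's library class): a
// P-dimensional array kept in one contiguous buffer, last index fastest.
template <typename T>
class DenseTensor
{
public:
    explicit DenseTensor(const std::vector<std::size_t>& dims)
        : fDims(dims), fData(CountElements(dims), T())
    {}

    // Number of indices P needed to address one element.
    std::size_t Order() const { return fDims.size(); }

    // Total number of stored values, N1 * N2 * ... * NP.
    std::size_t Size() const { return fData.size(); }

    // Element access: exactly P indices must be supplied.
    T& At(const std::vector<std::size_t>& idx) { return fData[Offset(idx)]; }
    const T& At(const std::vector<std::size_t>& idx) const { return fData[Offset(idx)]; }

private:
    static std::size_t CountElements(const std::vector<std::size_t>& dims)
    {
        std::size_t n = 1;
        for (std::size_t d : dims) n *= d;
        return n;
    }

    // Row-major linear offset of a multi-index.
    std::size_t Offset(const std::vector<std::size_t>& idx) const
    {
        assert(idx.size() == fDims.size());
        std::size_t offset = 0;
        for (std::size_t k = 0; k < fDims.size(); ++k) {
            assert(idx[k] < fDims[k]);
            offset = offset * fDims[k] + idx[k];
        }
        return offset;
    }

    std::vector<std::size_t> fDims;   // N1, ..., NP
    std::vector<T> fData;             // all values, row-major
};
```

    A tensor with dimensions 3 × 4 × 2, for example, has Order() equal to 3 and Size() equal to 24, and each element is reached with three indices. Note that the container stores no information about covariant or contravariant positions, which mirrors the remark that the transformation laws are not considered in this interpretation.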

    Summarizing, we encounter three definitions of tensors:

    1. A multilinear function which takes p arguments from the vector space and q arguments from its dual into the space of real values, as defined in Equation (2.3);

    2. A mathematical object described by the indexed values $t_{i \ldots j}^{m \ldots n}$, in which there are p covariant and q contravariant indices. These values transform in accordance with the transformation laws with a change of the base of the spaces [4, 15, 18];

    3. A multidimensional array of data with P = p + q independent indices.

    As already indicated, the first interpretation is rather mathematical but allows a coherent introduction of the tensor algebra, discussed in the next section. The second interpretation is used frequently in physics, especially in mechanics, relativity, and so forth. The third simplifies the definition given in point (2) and is mostly used in multidimensional data representation and analysis. In CV the second and third interpretations are the most common, as will be discussed. However, once again we would like to stress that the proper way to check whether a given mathematical object is, or is not, a tensor is to check its behavior on the change of the basis of the space. Only if this follows the tensor transformation laws for tensors of a given valence is such an object a tensor. On the other hand, having multidimensional data controlled by P independent indices does not mean that this is a tensor in the sense of definitions (1) or (2). Nevertheless, in data mining a multidimensional array is customarily called a tensor due to its many similarities with definition (2).
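
    To make the transformation test concrete, the standard laws for the lowest valences can be written out explicitly. The notation below is an assumption introduced only for this illustration and is not taken from the book: the new basis is $b'_{j} = A^{i}_{j} b_{i}$ with an invertible matrix A, x denotes a contravariant vector, w a covariant vector, and t a tensor with one upper and one lower index.

```latex
% Change of basis (assumed notation): b'_j = A^i_j b_i, with A invertible.
\begin{aligned}
x'^{\,j}   &= (A^{-1})^{j}_{\ i}\, x^{i}
           && \text{(contravariant components)} \\
w'_{j}     &= A^{i}_{\ j}\, w_{i}
           && \text{(covariant components)} \\
t'^{\,i}_{\ j} &= (A^{-1})^{i}_{\ k}\, A^{l}_{\ j}\, t^{k}_{\ l}
           && \text{(one upper and one lower index)}
\end{aligned}
```

    The first line follows from requiring $x^{i} b_{i} = x'^{\,j} b'_{j}$, and the second from evaluating the linear form on the new basis vectors. An array of numbers whose entries do not change in this way when the basis changes is a multidimensional data array only, in the sense of interpretation (3).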

    2.3 Tensor – A Data Object

    As alluded to previously, tensors can be relaxed from their strict transformation laws, and can be regarded as multidimensional arrays of data, in which each dimension separately corresponds to a different feature of the described objects or phenomena. Following this idea, Figure 2.1 depicts a 3D tensor of dimensions 3 × 4 × 2. This point of view complements the previous one rather than replacing it, since even with explicitly provided n covariant and m contravariant indices, a tensor in a given base is unambiguously described by n·m real (or complex) values. This way we obtain a multilinear approach which constitutes an extension to the classical two-dimensional (matrix) analysis. Such an approach was recently undertaken to tackle problems of data mining, pattern recognition, neuroscience and psychometrics, chemistry, signal processing, computer vision, and many more [11, 19, 20, 21, 22].

    Figure 2.1 An example of a 3D tensor with the total of P = 3 dimensions N1 = 3, N2 = 4, and N3 = 2.

    A simple color image together with its three color channels red, green, and blue is shown in Figure 2.2. Since each element of an image, a pixel, has exactly three independent coordinates, which are columns, rows, and channels, it can be interpreted as a three-dimensional array of data or just a 3D tensor. For a video sequence there is a fourth free index, time (frame number).

    Figure 2.2 An image and its three color channels as an example of a 3D tensor.

    This interpretation is a fundamental one if the internal structure of an image has to be analyzed, e.g. for object recognition. On the other hand, an approach in which images are vectorized can lead to loss of information, frequently resulting in limited robustness of such methods.
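
    The view of a color image as a 3D array can be sketched in C++ as follows. The type ColorImage, the function ExtractChannel, and the interleaved channel layout are assumptions made for this illustration only; they do not come from the book's software platform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type (an assumption, not the book's image class):
// a color image seen as a rows x cols x channels array, channels interleaved.
struct ColorImage
{
    std::size_t rows, cols, channels;
    std::vector<unsigned char> data;   // rows * cols * channels values, zeroed

    ColorImage(std::size_t r, std::size_t c, std::size_t ch = 3)
        : rows(r), cols(c), channels(ch), data(r * c * ch)
    {}

    // Three independent indices address a single value.
    unsigned char& At(std::size_t r, std::size_t c, std::size_t ch)
    {
        assert(r < rows && c < cols && ch < channels);
        return data[(r * cols + c) * channels + ch];
    }
};

// Fixes the channel index and returns the resulting rows x cols slice,
// stored row by row (e.g. the red channel if channel 0 holds red).
std::vector<unsigned char> ExtractChannel(ColorImage& img, std::size_t ch)
{
    std::vector<unsigned char> slice;
    slice.reserve(img.rows * img.cols);
    for (std::size_t r = 0; r < img.rows; ++r)
        for (std::size_t c = 0; c < img.cols; ++c)
            slice.push_back(img.At(r, c, ch));
    return slice;
}
```

    Fixing one index, as ExtractChannel does, yields a two-dimensional slice, which corresponds to a single color channel of the image. A video sequence would add a fourth index for the frame number in the same manner. Flattening the whole buffer into one long vector, by contrast, discards the information about which values are neighbors in rows, columns, or channels.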

    However, we can give yet another interpretation of an image, such as the one in Figure 2.2. It can be seen as a single point in a multidimensional space of dimensions being a product of all allowable values of columns, rows, and channels [23]. Thus, we can suspect that similar images in such a space will tend to be placed close together or, in other words, they will be contained in subspaces. But how can we measure a distance between points in this space? This requires a geometrical notion of a distance. However, to set some geometric properties such as a distance, an angle, etc. we need to join the space with a coordinate system. We can do this locally, obtaining so-called manifolds which in some neighborhoods of the points from the space provide real coordinate functions. This way it becomes possible to determine the position of a point in that space and the topological properties of its neighborhood. In other words, a space becomes locally Cartesian. These issues are further discussed in Sections 2.5 and 5.5 devoted to deformable models.

    Let us analyze another type of image. Hyperspectral imaging (HI) is the technology of acquiring a series of images, each in a different electromagnetic spectral band, ranging from the ultraviolet up to the long-infrared spectrum. The main reason is that different objects reveal different properties at different wavelengths. Thanks to this property, hyperspectral images can show what is inaccessible in just the human visible spectrum. Different objects have unique characteristics in hyperspectra, which are called an object's spectral signature. Interestingly, many living creatures possess the ability to perceive wide wavebands. For instance, mantis shrimps are sensitive to the spectrum from infrared up to ultraviolet, which allows them to detect their prey [24]. Humans too, at a certain stage of evolution, thanks to the acquired ability to detect colors, became able to distinguish ripe fruits from others, which obviously affected their diet. Hyperspectral images are used in many areas, such as medical imaging for tumor detection, dentistry to detect tooth decay, remote sensing for Earth surface monitoring, agriculture for monitoring of a crop's health, the food industry for food quality assessment, pharmacy for detection of chemical components, and forensic science for ink examination on checks, to name a few [24, 25]. Acquisition of hyperspectral images requires special devices, such as push-broom cameras. One of the most famous is the Airborne Visible InfraRed Imaging Spectrometer (AVIRIS), an optical sensor developed by NASA [26]. It delivers calibrated images of the upwelling spectral radiance in 224 contiguous spectral channels in the range from 400 to 2500 nm. Another example is the eye monitoring system operating in the near infrared (NIR) spectrum, discussed in Section 5.9. NIR images, being invisible to a driver, allow eye monitoring for detection of a driver's fatigue, sleepiness or inattention in order to react in time.

    From the above discussion we easily conclude that HI naturally leads to 3D tensors, with two spatial
