Guided Randomness in Optimization, Volume 1

Ebook, 463 pages
About this ebook

The performance of an algorithm depends on the random number generator (RNG) used. This book focuses on the comparison of optimizers; it defines an effort/outcome approach from which all the classic criteria (median, average, etc.) and other, more sophisticated ones can be derived. The source codes used for the examples are also presented, allowing a reflection on "superfluous chance", succinctly explaining why and how the stochastic aspect of optimization could be avoided in some cases.

Language: English
Publisher: Wiley
Release date: May 4, 2015
ISBN: 9781119136453

    Guided Randomness in Optimization, Volume 1 - Maurice Clerc

    Preface

    About this book

    This book is the final fruit of a long process, spanning almost 20 years. In 1997, I was involved in translating Earl D. Cox’s seminal book on fuzzy logic [COX 95] into French. At that time, I was a regular contributor to an online discussion board on this topic, through which I met James Kennedy, the co-inventor of particle swarm optimization (PSO), with Russell Eberhart [EBE 95]. I rapidly became involved in work on this method, with a focus on its mathematical aspects.

    The original idea was to model interpersonal relationships (Jim is a social psychologist) and not, as is often stated, the social behavior of bees, fish or other animals. The model was too simplistic to be usable; however, the resulting algorithm seemed to be able to identify the minimum of numerical functions, even in relatively complex cases, at a remarkable speed.

    The situation was, in fact, too good to be true: solutions were being found too quickly. For the problems we were dealing with (still in 1995), the minimum was situated at the origin of the coordinate system, and PSO has a strong tendency to prioritize this area of the search space. The very first code also included an error which reinforced this bias.

    This tendency was confirmed using more systematic experimentation, which also highlighted an additional bias, favoring the axes and diagonals of the coordinate system. A theoretical explanation was established considerably later [SPE 10]. Improved versions were created in the meantime, making PSO an efficient and widely used metaheuristic.

    Observation of the bias led me to wonder whether this type of effect was a result of the way in which chance was used. One thing led to another, and I started to consider a number of different elements: the possibility of using tools to detect intrinsic biases in optimizers, the random number generators used in the context of optimization, the validity of comparisons of stochastic optimizers, and even the very necessity of chance and the possibility of replacing this element with deterministic techniques.

    Most of these questions have also been considered by other researchers, and at least partial responses have been provided. Generally speaking, I receive around two articles on the subject of optimization to read each week: some already published and some awaiting publication. During the time I have been working on the subject, and particularly since the publication of my book on particle swarm optimization [CLE 06], I have read more than one thousand research papers, in addition to a number of theses and monographs.

    In the case of pseudo-random number generators, authors generally either describe and use a stochastic algorithm, or prefer a deterministic method. The basic questions underpinning this technique, however, are rarely discussed at any length. For example, all of these algorithms presume that a structure is present to allow a problem to be solved, without the structure being explicitly discussed. Moreover, there is no systematic detection of intrinsic bias. To give a third example, in reality, there is no clear-cut distinction between random and deterministic elements, but rather a continuum of levels of randomness produced by generators; consequently, the same algorithm may perform differently based on the choice of pseudo-random generator. Furthermore, the best performance is not always obtained using the best generator.

    Issues also exist concerning the reliability of the tests used to compare two algorithms, or to compare results when an arbitrary parameter, defined by the user, is introduced (e.g. a threshold value used to decide whether or not a test may be considered successful).

    The aim of this book is to address these issues in greater detail.

    In the course of my reading, I have noted a certain number of methodological errors or unfounded claims made in published works on the subject (including my own). As part of this book, I have compiled a list of these errors, which will be discussed in detail, for the benefit of future authors.

    Organization of the book

    This book is split into three main parts: reflections on the nature of chance, a comparison of optimizers and a comparison of tests. The book also includes a number of appendices. Readers are strongly advised to read chapters in this order. However, the book may also be considered as a collection of instructions and source codes (some of which are included in full and others are available online). Certain chapters may also be taken in isolation, such as Chapters 9 and 10.

    Certain information that might be expected to be present in this book has been excluded, as it is readily available in published books or articles or online. This essentially concerns detailed information on certain pseudo-random number generators and the details of statistical tests which may be used to analyze the results produced by stochastic optimizers.

    Any scientific document should provide the elements necessary for reproduction of its experiments. In this case, the source codes used throughout the book are included.

    Tools

    A number of free programs were used in creating this book, and they come highly recommended. A significant part of the documentation for the book was collected online using the Startpage meta-search engine (https://startpage.com). The main advantage of this tool is that the same request is sent to several search engines simultaneously; results are then aggregated and presented to the user. Only a bare minimum of information concerning the origin of the request is transmitted to the search engines themselves.

    The actual writing of the book was carried out using LyX (http://lyx.org), which allows simple generation of LaTeX files without requiring knowledge of the language. The bibliography was generated using Zotero (https://www.zotero.org/) with the LyZ extension to facilitate interfacing with LyX. Graphics were created using LibreOffice Calc (http://www.libreoffice.org/), Gimp (http://www.gimp.org/), Inkscape (http://www.inkscape.org), and SciLab (http://www.scilab.org), which is almost identical to MATLAB®.

    The programs used were also written using SciLab.

    All of the computer processes discussed in the book were carried out using the Linux Ubuntu operating system, on a 32-bit micro-computer, with a machine epsilon of 2.22 × 10−16 (this means that any calculation involving positive numbers smaller than this value should be treated with extreme suspicion).
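    The machine epsilon mentioned above can be checked directly. The book's own code uses SciLab; as an illustration only, here is a Python sketch (the variable names are mine):

```python
import sys

# On any platform with IEEE 754 double precision, the machine epsilon
# is 2^-52, i.e. approximately 2.22e-16.
eps = sys.float_info.epsilon
print(eps)                       # 2.220446049250313e-16

# Below this relative scale, additions to 1.0 are silently absorbed,
# which is why such small positive values deserve extreme suspicion.
print(1.0 + eps / 2 == 1.0)      # True
print(1.0 + eps > 1.0)           # True
```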

    Key points

    Following the principle of the elevator pitch, readers may, in our opinion, wish to retain three key points:

    – stochastic optimization does not require the use of sophisticated pseudo-random number generators;

    – several criteria should be taken into consideration when evaluating the performance of an optimizer, and reasonable convergence should be obtained in relation to the number of tests;

    – as with any publication (scientific or otherwise), readers should always attempt to read between the lines, decoding information which is not explicitly set down on paper.

    Contact the author

    For comments, suggestions or to report errors, the author can be contacted:

    – by e-mail: Maurice.Clerc@WriteMe.com;

    – via the publisher.

    Maurice CLERC

    March 2015

    Introduction

    The part played by chance in solving optimization problems is essentially due to the use of metaheuristics. Metaheuristics, by definition, are used for solving difficult problems for which no definitive method exists, although, as we will see, this definition is not clear-cut. Random drawing is a natural way of making certain choices or applying certain rules; to do this, metaheuristics use one or more random number generators (RNGs).

    The first part of this book includes a discussion of the risks inherent to the use of chance in the context of optimization; the rest of this part essentially deals with RNGs. A distinction is made between three classes: (1) truly random generators, based on physical phenomena; (2) coded generators, which attempt to simulate physical phenomena as closely as possible, resulting in highly complex algorithms; and (3) simple codes, used to generate lists of numbers which may be used by metaheuristics. A discussion of the way in which RNGs can be manipulated to produce specific distributions, for example multimodal distributions, will then follow. Finally, to conclude this part, different techniques for the use of guided randomness will be considered.
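    A generator of the third class can be very short indeed. As an illustration only (this is my sketch, not one of the book's codes), a linear congruential generator with the classic Numerical Recipes constants fits in a few lines of Python:

```python
def lcg(seed, a=1664525, c=1013904223, m=2 ** 32):
    """Minimal linear congruential generator (Numerical Recipes constants).

    A class (3) generator: a simple code producing a list of numbers
    usable by a metaheuristic, with no pretension to physical realism.
    """
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m              # a crude uniform draw in [0, 1)

gen = lcg(12345)
sample = [next(gen) for _ in range(5)]
print(sample)
```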

    In the second part of the book, we will show how the performance of an algorithm is dependent on the selected RNG; consequently, an optimizer is made up of an algorithm/RNG pairing. The way in which optimizers may be compared will also be considered, using an effort/outcome approach; this approach can be used to derive all of the classic criteria (medians, means, etc.) alongside more sophisticated criteria, for example using the notion of result quality. The interpretation of criteria values highlights the notions of estimation convergence and significant difference. Consideration will also be given to test cases, notably the biases which they may present in favor of certain types of optimizers.

    The third and final part contains appendices, including source codes. It notably includes reflections on unnecessary randomness, with a brief explanation of why and how the stochastic aspects of optimization could be avoided in certain cases. This discussion may be developed at a later date into an additional volume on the subject of deterministic optimization.

    PART 1

    Randomness in Optimization

    1

    Necessary Risk

    Il arrive souvent de ne rien obtenir parce que l'on ne tente rien (Often, nothing is gained because nothing is attempted)

    Jacques Deval

    In using chance to solve a problem, there is always a risk of failure, unless an unlimited number of attempts are permitted: this is rarely possible. The basic idea involved in stochastic optimization is that this risk is necessary, for the simple reason that no other solution is available; however, it may be reduced by carefully controlling the use of random elements. This is generally true, in that a correctly-defined optimizer will produce better results than a purely random search for most test cases. However, this is not always the case, and the ability to identify these anomalous situations is valuable.

    1.1. No better than random search

    Let us take a set of permutation tests. A precise definition is given in the Appendices (section 7.1). Here, note simply that, starting from one finite discrete function, all of the other functions can be generated by permuting its values over the points of the definition space.

    The definition space is E = (0, 1, 2, 3) and the value space is V = (1, 2, 3). A function is therefore defined by its values at the points of E, for example f1 ≡ (1, 3, 2, 2). One possible permutation of this function is f2 ≡ (1, 2, 3, 2); there are 12 such functions in total, each of which is a permutation of the others, shown in the first column of Table 1.1. Each function has a minimum value of 1 (to simplify our discussion, optimization in this case will always be taken to mean minimization). Now, let us consider three iterative algorithms, and calculate the probability that they will find the minimum of each function. These algorithms are all without repetition, and conserve the best position obtained along with the associated value (the ratchet effect). A brief, informal description of these algorithms is given below. For each, the result is given as a pair (x∗, f (x∗ )), where x∗ is the proposed solution.
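    The full test case can be enumerated directly. The book's codes are in SciLab; the short Python sketch below (mine, for illustration) builds the 12 functions as the distinct permutations of the value tuple (1, 3, 2, 2):

```python
from itertools import permutations

# The 12 distinct permutations of the values (1, 3, 2, 2) over the four
# points of E = (0, 1, 2, 3); each tuple is one test function.
functions = sorted(set(permutations((1, 3, 2, 2))))
print(len(functions))                       # 12
# Every function has the same minimum value, 1.
print(all(min(f) == 1 for f in functions))  # True
```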

    1.1.1. Uniform random search

    This algorithm, like those which follow, includes an initialization phase, followed by an iteration phase (see Algorithm 1.1). Let us calculate the probability p(t) of finding the solution after t position draws. As there is only one solution, p(1) = 1/4, and the probability of not obtaining the solution on the first try is therefore 1 − p(1). In this case, as three unsampled positions remain, the probability of obtaining the solution on the second try is 1/3. Thus, the probability of obtaining the solution on the first or second try is p(2) = p(1) + (1 − p(1)) × 1/3 = 1/4 + 3/4 × 1/3 = 1/2. Similarly, the probability of obtaining the solution on the first, second or third try is p(3) = p(2) + (1 − p(2)) × 1/2 = 3/4. Evidently, as the algorithm is without repetition, the probability of having found the solution by the fourth try is 1, as an exhaustive search will have been carried out.

    Algorithm 1.1. Random search without repetition

    Initialization

    – Draw a position x∗ at random, following a uniform distribution (each position has the same selection probability).

    Iterations

    As long as the STOP criterion (for example a maximum number of iterations) has not been reached:

    – draw a position x at random from the unsampled population;

    – if f (x) < f (x∗ ), then replace x∗ by x.
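    Algorithm 1.1 can be sketched as follows. The book's source codes use SciLab; this Python version (the function and variable names are mine) also estimates p(t) on the example function f1 to check the values 1/4, 1/2 and 3/4 computed above:

```python
import random

f1 = (1, 3, 2, 2)                # the example function: minimum 1 at position 0

def random_search(f, budget, rng):
    """Algorithm 1.1: draw positions without repetition, keep the best seen."""
    order = list(range(len(f)))
    rng.shuffle(order)           # uniform random draws without repetition
    best = order[0]
    for x in order[1:budget]:
        if f[x] < f[best]:       # ratchet effect: keep the best position
            best = x
    return best, f[best]

rng = random.Random(1)
trials = 100_000
estimates = {}
for t in (1, 2, 3, 4):
    hits = sum(random_search(f1, t, rng)[1] == 1 for _ in range(trials))
    estimates[t] = hits / trials
print(estimates)                 # close to 1/4, 1/2, 3/4, and exactly 1 at t = 4
```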

    1.1.2. Sequential search

    This method consists of drawing positions one by one, not at random but in a predefined order (still without repetition), for example position 1, position 2, etc. To calculate p(t), each function must be considered individually. For f4 ≡ (3, 1, 2, 2), for example, the solution will definitely be found after two tries, compared to a probability of 1/2 using the previous method.

    However, the opposite situation also occurs, for example for f6 ≡ (3, 2, 2, 1). After two tries, the solution cannot have been found, whereas the random method may find it with a probability of 1/2. Overall, this method is therefore equivalent to the previous one in terms of the probabilities p(t). Improvements are thus required.
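    As sequential search is deterministic, p(t) for each function is simply 1 or 0, depending on whether the minimum sits within the first t positions. A short Python sketch (mine, not one of the book's SciLab codes) confirms both the per-function values and the equivalence in average with random search:

```python
from itertools import permutations

functions = sorted(set(permutations((1, 3, 2, 2))))

def p_sequential(f, t):
    """Deterministic search: success iff the minimum lies in the first t positions."""
    return 1.0 if min(f[:t]) == 1 else 0.0

f4 = (3, 1, 2, 2)
f6 = (3, 2, 2, 1)
print(p_sequential(f4, 2), p_sequential(f6, 3))    # 1.0 0.0
# Averaged over the 12 functions, p(t) matches uniform random search.
averages = [sum(p_sequential(f, t) for f in functions) / 12 for t in (1, 2, 3)]
print(averages)                                    # [0.25, 0.5, 0.75]
```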

    1.1.3. Partial gradient

    Using this method, the first two positions are drawn sequentially. Next, if the two values obtained are decreasing, the sequential approach is retained, as the direction of the search appears to be correct. Otherwise, positions are drawn at random from the remaining population. This means that differences from the previous method only emerge at the third draw, p(3). Once again, each function must be examined individually for calculation purposes. Take, for example, a function such as f6 ≡ (3, 2, 2, 1). The first two draws give values of 3 and 2. As the direction appears promising, the third position is drawn sequentially: the value is 2. This is not the minimum, so p(3) = 0. With a function such as f9 ≡ (2, 2, 1, 3), there is no clear preferred direction, so the third point is drawn at random from the two remaining points, giving a probability of success of 1/2.

    The probabilities of success for these three methods, p(t) for t = 1, 2, 3, using the 12 function test case defined above, are given in Table 1.1. Naturally, all of these algorithms obtain the solution with the same probability, 1 (certainty), after four attempts, as they are repetition-free. However, their average performances will not necessarily be identical after one, two or three attempts. The partial gradient algorithm, which is slightly more sophisticated, might be expected to be somewhat more efficient; it is the only method which has a chance of finding a solution to f10 or f11 after three attempts. However, success is not guaranteed for f9 and f12. Finally, as demonstrated in the final line of the table, the three algorithms give the same average performance over the set of test cases.
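    The equality of average performance can be checked exactly for the partial gradient method as well. The following Python sketch (mine; the book's codes are in SciLab) computes the exact p(3) per function and its average over the test case:

```python
from itertools import permutations

functions = sorted(set(permutations((1, 3, 2, 2))))

def p_gradient(f, t):
    """Exact success probability within t <= 3 draws for the partial gradient method."""
    if min(f[:min(t, 2)]) == 1:   # minimum reached during the two sequential draws
        return 1.0
    if t < 3:
        return 0.0
    if f[1] < f[0]:               # decreasing values: stay sequential for draw 3
        return 1.0 if f[2] == 1 else 0.0
    return 0.5                    # otherwise: random draw among the two remaining points

print(p_gradient((3, 2, 2, 1), 3))   # 0.0  (f6: misled by the promising direction)
print(p_gradient((2, 2, 1, 3), 3))   # 0.5  (f9: no preferred direction)
# The average over the 12 functions equals random search's 3/4, as the
# conditions of the No Free Lunch Theorem are fulfilled.
average = sum(p_gradient(f, 3) for f in functions) / 12
print(average)                       # 0.75
```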

    Table 1.1. Permutation test cases. Probability of success after one, two and three attempts. The three algorithms are repetition-free and present the same average performance, as the conditions of the No Free Lunch Theorem (NFLT) are fulfilled
