Optimization Techniques and Applications with Examples
Ebook · 721 pages · 5 hours

About this ebook

A guide to modern optimization applications and techniques in newly emerging areas spanning optimization, data science, machine intelligence, engineering, and computer sciences 

Optimization Techniques and Applications with Examples introduces the fundamentals of all the commonly used optimization techniques, covering the breadth and diversity of the methods (traditional and new) and their algorithms. The author—a noted expert in the field—covers a wide range of topics including mathematical foundations, optimization formulation, optimality conditions, algorithmic complexity, linear programming, convex optimization, and integer programming. In addition, the book discusses artificial neural networks, clustering and classification, constraint-handling, queueing theory, support vector machines, multi-objective optimization, evolutionary computation, nature-inspired algorithms, and many other topics.

Designed as a practical resource, the book explains all topics in detail with step-by-step examples that show how each method works. Its exercises test the acquired knowledge, which can then be applied to real problem solving. By taking an informal approach to the subject, the author helps readers rapidly acquire the basic knowledge in optimization, operational research, and applied data mining. This important resource:

  • Offers an accessible and state-of-the-art introduction to the main optimization techniques
  • Contains both traditional optimization techniques and the most current algorithms and swarm intelligence-based techniques
  • Presents a balance of theory, algorithms, and implementation
  • Includes more than 100 worked examples with step-by-step explanations 

Written for upper undergraduates and graduates in a standard course on optimization, operations research and data mining, Optimization Techniques and Applications with Examples is a highly accessible guide to understanding the fundamentals of all the commonly used techniques in optimization.

Language: English
Publisher: Wiley
Release date: Sep 24, 2018
ISBN: 9781119490623
Author

Xin-She Yang

Xin-She Yang obtained his DPhil in Applied Mathematics from the University of Oxford. He then worked at Cambridge University and the National Physical Laboratory (UK) as a Senior Research Scientist. He is currently a Reader in Modelling and Simulation at Middlesex University London, a Fellow of the Institute of Mathematics and its Applications (IMA), and a Book Series Co-Editor of the Springer Tracts in Nature-Inspired Computing. He has published more than 25 books and more than 400 peer-reviewed research publications with over 82,000 citations, and he has been on the prestigious list of highly cited researchers (Web of Science) for seven consecutive years (2016–2022).


    Book preview

    Optimization Techniques and Applications with Examples - Xin-She Yang

    Preface

    Optimization and operations research are part of many university courses because they are important to many disciplines and applications, such as engineering design, business planning, computer science, data mining, machine learning, artificial intelligence, and many industries.

    There are many books on the market, but most are specialized in the sense that they target a particular audience in a given subject such as applied mathematics, business management, computer science, or engineering. Most of these books, though detailed and often more than 500 pages long, focus on the traditional optimization techniques commonly used in operations research. In some sense, courses in the traditional operations research areas have focused too much on these traditional topics, which are already served by many excellent textbooks. However, optimization techniques have evolved over the last few decades, and some new techniques have started to show their effectiveness and have become an integral part of new mainstream methods. Though important traditional topics will still be introduced in this book, the coverage of some techniques may be brief to avoid overlapping with other books, and we will provide references to the appropriate literature when necessary. Therefore, this book tries to address modern topics, as used mostly by researchers working in newly emerging areas spanning optimization, data science, machine intelligence, engineering, and computer sciences, so that the topics covered are most relevant to current research and to syllabus topics in optimization, data mining, machine learning, and operations research.

    This book aims at an introductory level, introducing the fundamentals of all the commonly used techniques so that students can see the broadness and diversity of the methods and algorithms and sample the flavor of each method (both traditional and new). The book covers almost all major optimization techniques with many worked examples. Topics include mathematical foundations, optimization formulation, optimality conditions, algorithmic complexity, linear programming, convex optimization, integer programming, artificial neural networks, clustering and classification, regression and least squares, constraint‐handling, queueing theory, support vector machines, multiobjective optimization, evolutionary computation, nature‐inspired algorithms, and others. Each topic is explained in detail with step‐by‐step examples to show how each method works, and exercises are also provided to test the acquired knowledge and apply it to real problem solving.

    With an informal approach and more than 100 worked examples and exercises, this introductory book is especially suitable for both undergraduates and graduates to rapidly acquire the basic knowledge in optimization, operational research, machine learning, and data mining. This book can also serve as a reference for researchers and professionals in engineering, computer sciences, operations research, data mining, and management science.

    Xin‐She Yang

    Cambridge and London

    March 2018

    Acknowledgements

    I would like to thank my students who took my relevant courses and provided some feedback on the contents concerning optimization algorithms and examples. I also thank the participants for their comments at international conferences where I gave tutorials on nature‐inspired algorithms. I would also like to thank the editors, Kathleen Pagliaro, Vishnu Narayanan, Mindy Okura‐Marszycki, and staff at Wiley for their help and professionalism. Last but not least, I thank my family for their support.

    Xin‐She Yang

    Acronyms

    Introduction

    Optimization is everywhere, though it can mean different things from different perspectives. From basic calculus, optimization can simply be used to find the maximum or minimum of a function such as f(x) = x⁴ + 2x² + 1 in the real domain x ∈ ℝ. In this case, we can either use a gradient‐based method or simply spot the solution due to the fact that x⁴ and x² are always nonnegative; the minimum values of x⁴ and x² are zero, thus f(x) has a minimum f(0) = 1 at x = 0. This can be confirmed easily by taking the first derivative of f(x) with respect to x:

    (I.1) $f'(x) = 4x^3 + 4x = 4x(x^2 + 1) = 0,$

    which has only one solution x = 0 because x is a real number. The condition f′(x) = 0 seems to be sufficient to determine the optimal solution in this case. In fact, this function is convex with only one minimum in the whole real domain.
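    This calculation is easy to verify symbolically. The following is a minimal sketch of ours, assuming Python with the SymPy library is available (not code from the book):

        import sympy as sp

        x = sp.symbols('x', real=True)
        f = x**4 + 2 * x**2 + 1

        df = sp.diff(f, x)  # first derivative: 4*x**3 + 4*x
        stationary = sp.solveset(sp.Eq(df, 0), x, domain=sp.S.Reals)
        print(stationary)    # {0}: the only real stationary point
        print(f.subs(x, 0))  # 1: the minimum value f(0) = 1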

    However, things become more complicated when f(x) is highly nonlinear with multiple optima. For example, if we try to find the maximum value of f(x) = sinc(x) = sin(x)/x in the real domain, we can naively use

    (I.2) $f'(x) = \frac{x\cos x - \sin x}{x^2} = 0,$

    which has an infinite number of solutions for x ≠ 0. There is no simple formula for these solutions, thus a numerical method has to be used to calculate them. In addition, even with all the efforts to find these solutions, care has to be taken because the actual global maximum f(x) = 1 occurs at x = 0. However, this solution can only be found by taking the limit x → 0, and it is not part of the solutions from the above condition f′(x) = 0. This highlights the potential difficulty for nonlinear problems with multiple optima or multi‐modality.
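    Numerically, these stationary points can be located by bracketing sign changes of the numerator of (I.2), g(x) = x cos x − sin x, and refining each root. A minimal sketch of ours, assuming NumPy and SciPy are available:

        import numpy as np
        from scipy.optimize import brentq

        # Stationary-point condition for sinc(x): g(x) = x*cos(x) - sin(x) = 0
        g = lambda x: x * np.cos(x) - np.sin(x)

        # Bracket sign changes on a grid, then refine with Brent's method
        xs = np.linspace(0.1, 20.0, 2000)
        roots = [brentq(g, a, b) for a, b in zip(xs[:-1], xs[1:])
                 if g(a) * g(b) < 0]
        for r in roots:
            print(f"x = {r:8.4f},  sinc(x) = {np.sin(r) / r: .5f}")
        # Every stationary value found is below 1; the global maximum
        # sinc(0) = 1 is only reached in the limit x -> 0.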

    Furthermore, not all functions are smooth. For example, if we try to use f′(x) = 0 to find the minimum of

    (I.3) $f(x) = |x| \exp\bigl(-\sin(x^2)\bigr),$

    we will realize that f(x) is not differentiable at x = 0, though the global minimum f(x) = 0 occurs at x = 0. In this case, optimization techniques that require the calculation of derivatives will not work.
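    A gradient-free method sidesteps the kink at x = 0. Below is a minimal sketch using SciPy's Nelder–Mead implementation (the starting point is an arbitrary choice of ours; for a multimodal function like this, a poor start can still lead to a local mode):

        import numpy as np
        from scipy.optimize import minimize

        # Non-smooth objective |x| * exp(-sin(x^2)): not differentiable at x = 0
        f = lambda x: np.abs(x[0]) * np.exp(-np.sin(x[0] ** 2))

        # Nelder-Mead is gradient-free, so the kink at x = 0 is not a problem
        res = minimize(f, x0=[0.5], method='Nelder-Mead')
        print(res.x, res.fun)  # x near 0, f near 0 (the global minimum)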

    Problems become more challenging in higher‐dimensional spaces. For example, the nonlinear function

    (I.4) $f(\mathbf{x}) = \Bigl(\sum_{i=1}^{n} |x_i|\Bigr) \exp\Bigl(-\sum_{i=1}^{n} \sin(x_i^2)\Bigr),$

    where xᵢ ∈ ℝ for i = 1, 2, …, n, has a minimum at x∗ = (0, 0, …, 0), but this function is not differentiable at x∗.

    Therefore, optimization techniques have to be diverse, using gradient information when appropriate and avoiding it when it is not defined or not easily calculated. Though the above nonlinear optimization problems can be challenging to solve, constraints on the search domain and on certain independent variables can make the search domain much more complicated, which can consequently make the problem even harder to solve. In addition, sometimes we have to optimize several objective functions instead of just one, which in turn makes a problem more challenging to solve.

    In general, it is possible to write most optimization problems in the following general mathematical form:

    (I.5) $\text{minimize } f_i(\mathbf{x}), \quad (i = 1, 2, \ldots, M),$

    subject to

    (I.6) $\phi_j(\mathbf{x}) = 0, \quad (j = 1, 2, \ldots, J),$

    (I.7) $\psi_k(\mathbf{x}) \leq 0, \quad (k = 1, 2, \ldots, K),$

    where fi(x), ϕj(x), and ψk(x) are functions of the design vector

    (I.8) $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T.$

    Here, the components xi of x are called design or decision variables, and they can be real continuous, discrete, or a mixture of these two. The functions fi(x), where i = 1, 2, …, M, are called the objective functions; in the case of M = 1, there is only a single objective, and the problem becomes single‐objective optimization. The objective function is sometimes called the cost function, loss function, or energy function in the literature. The space spanned by the decision variables is called the design space or search space ℝn, while the space formed by the objective function values is called the response space or objective landscape. The equalities for ϕj and inequalities for ψk are called constraints. It is worth pointing out that we can also write the inequalities the other way, ψk(x) ≥ 0, and we can also formulate the objectives as maximization.
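    To make the general form (I.5)–(I.7) concrete, the sketch below encodes a single objective (M = 1) with one equality and one inequality constraint using SciPy; the particular functions are illustrative choices of ours, not from the text:

        from scipy.optimize import minimize

        # Objective: f(x) = (x1 - 1)^2 + (x2 - 2)^2
        f = lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2

        # One equality constraint phi(x) = 0 and one inequality psi(x) <= 0.
        # SciPy's convention is g(x) >= 0, so psi <= 0 is passed as -psi >= 0.
        constraints = [
            {'type': 'eq',   'fun': lambda x: x[0] + x[1] - 2},  # phi = 0
            {'type': 'ineq', 'fun': lambda x: -(x[0] - x[1])},   # psi <= 0
        ]
        res = minimize(f, x0=[0.0, 0.0], method='SLSQP',
                       constraints=constraints)
        print(res.x)  # about [0.5, 1.5]: the feasible minimizer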

    If a solution vector x satisfies all the constraints ϕj(x) = 0 and ψk(x) ≤ 0, it is called a feasible solution; otherwise, it is infeasible. All the feasible points or vectors form the feasible region in the search space.

    In a rare but extreme case where there is no objective at all, there are only constraints. Such a problem is called a feasibility problem and any feasible solution is an optimal solution to such problems.

    If all the problem functions are linear, the problem becomes a linear programming (LP) problem. Here, the term programming means planning; it has nothing to do with computer programming in this context. Thus, mathematical optimization is also called mathematical programming. In addition, if the variables in an LP can take only integer values, the LP becomes an integer LP or simply an integer programming (IP) problem. If the variables can take only two values (0 or 1), such an IP is called a binary IP.
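    As a small illustrative LP (the coefficients below are arbitrary choices of ours for demonstration), SciPy's linprog solves such problems directly; note that linprog minimizes, so a maximization objective is negated:

        from scipy.optimize import linprog

        # Maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x, y >= 0
        c = [-3, -2]                  # negated for minimization
        A_ub = [[1, 1], [1, 3]]
        b_ub = [4, 6]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0, None), (0, None)])
        print(res.x, -res.fun)  # [4, 0] with objective value 12

    If x and y were further restricted to integers, the LP solver alone would not suffice; it would have to be combined with branch and bound, as discussed below.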

    If there are no constraints (i.e. J = 0 and K = 0), the problem becomes an unconstrained optimization problem. If J ≥ 1 but K = 0, it becomes an equality‐constrained problem. Conversely, if J = 0 but K ≥ 1, it becomes an inequality‐constrained problem.

    The classification of optimization problems can also be done by looking at the modality of the objective landscape, which can be divided into multimodal problems and unimodal problems, including convex optimization. In addition, classification can also be about the determinacy of the problem. If there is NO randomness in the formulation, the problem is called deterministic, and in fact all the above problems are essentially deterministic. However, if there is uncertainty in the variables or function forms, then optimization involves probability distributions and expectations; such problems are often called stochastic optimization or robust optimization.

    Classification of optimization problems and the terminologies used in the optimization literature can be diverse and confusing. We summarize most of these terms in Figure I.1.

    Figure I.1 Classification of optimization problems.

    Whether an optimization problem is considered easy or hard can depend on many factors, including the actual mathematical formulation used. In fact, three factors that make a problem more challenging are: the nonlinearity of the objective function, the high dimensionality of the problem, and the complex shape of the search domain.

    Nonlinearity and multimodality: The high nonlinearity of a function to be optimized usually means multimodality, with multiple local modes and multiple (even infinitely many) optima. In most cases, algorithms for such problems are more likely to get trapped in local modes.

    High dimensionality: High dimensionality is associated with the so‐called curse of dimensionality, where distances between points become large while the volume that any algorithm can actually search becomes insignificant compared to the vast feasible space (see the short numerical sketch after this list).

    Constraints: Multiple complex constraints can make the feasible region complicated, with irregular geometry or shapes. In some cases, feasible regions can be split into multiple disconnected regions with isolated islands, which makes it harder for algorithms to search all the feasible regions (thus potentially missing the true optimality).
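    The following back-of-the-envelope sketch (plain Python, our own illustration rather than the book's) makes the high-dimensionality point concrete: the fraction of the unit hypercube [0, 1]ⁿ covered by a slightly smaller cube of side 0.9 collapses as the dimension n grows.

        # Fraction of the unit hypercube [0, 1]^n covered by a cube of side 0.9.
        # The covered fraction is 0.9**n, which vanishes as n grows, so any
        # fixed "searchable volume" becomes insignificant in high dimensions.
        for n in (1, 2, 10, 100, 1000):
            print(n, 0.9 ** n)
        # 1 0.9, 2 0.81, 10 ~0.35, 100 ~2.7e-05, 1000 ~1.7e-46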

    Other factors, such as the evaluation time of an objective, are also important. In many applications such as protein folding, bioinformatics, aerospace engineering, and deep machine learning (ML), the evaluation of a single objective can take a long time (from a few hours to days or even weeks); therefore, the computational costs can be very high. Thus, the choice of efficient algorithms becomes paramount.

    Algorithms for solving optimization problems tend to be iterative, and thus multiple evaluations of objectives are needed, typically hundreds or thousands (or even millions) of evaluations. Different algorithms may have different efficiency and different requirements. For example, Newton's method, which is gradient‐based, is very efficient for solving smooth objective functions, but it can get stuck in local modes if the objective is highly multimodal. If the objective is not smooth or has a kink, then the Nelder–Mead simplex method can be used because it is a gradient‐free method and can work well even for problems with discontinuities, but it can become slow and get stuck in a local mode. Algorithms for solving nonlinear optimization are diverse, including the trust‐region method, the interior‐point method, and others, but they are mostly local search methods. In a special case when the objective becomes convex, they can be very efficient. Quadratic programming (QP) and sequential quadratic programming use such convexity properties to their advantage.
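    For a smooth objective such as f(x) = x⁴ + 2x² + 1 from earlier, Newton's method iterates x ← x − f′(x)/f″(x). Below is a minimal sketch in plain Python (our own illustrative implementation, not code from the book):

        # Newton's method for minimization: x_{k+1} = x_k - f'(x_k) / f''(x_k).
        # Applied to the smooth convex function f(x) = x^4 + 2x^2 + 1.
        def newton_minimize(df, d2f, x, tol=1e-10, max_iter=50):
            for _ in range(max_iter):
                step = df(x) / d2f(x)
                x -= step
                if abs(step) < tol:
                    break
            return x

        df = lambda x: 4 * x**3 + 4 * x   # f'(x)
        d2f = lambda x: 12 * x**2 + 4     # f''(x), always positive (convex)
        print(newton_minimize(df, d2f, x=2.0))  # converges rapidly to x* = 0

    Because f″(x) > 0 everywhere here, each Newton step is well defined; for a multimodal objective, the same iteration would simply converge to whichever local mode is nearest the starting point.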

    The simplex method for solving LP can be efficient in practice, but it requires all the problem functions to be linear. If an LP problem has integer variables, the simplex method will not work directly; it has to be combined with branch and bound to solve IP problems.

    As traditional methods are usually local search algorithms, one of the current trends is to use heuristic and metaheuristic algorithms. Here, meta‐ means beyond or higher level, and metaheuristics generally perform better than simple heuristics. In addition, all metaheuristic algorithms use a certain tradeoff between randomization and local search. It is worth pointing out that there are no agreed definitions of heuristics and metaheuristics in the literature; some use the two terms interchangeably. However, the recent trend is to name all stochastic algorithms with randomization and local search as metaheuristics, and we will also use this convention here. Randomization provides a good way to move away from local search toward search on a global scale. Therefore, almost all metaheuristic algorithms are intended to be suitable for global optimization, though global optimality may still be challenging to achieve for most problems in practice.

    Most metaheuristic algorithms are nature‐inspired as they have been developed based on some abstraction of nature. Nature has evolved over millions of years and has found perfect solutions to almost all the problems she met. We can thus learn the success of problem‐solving from nature and develop nature‐inspired heuristic and/or metaheuristic algorithms. More specifically, some nature‐inspired algorithms are inspired by Darwin’s evolutionary theory. Consequently, they are said to be biology‐inspired or simply bio‐inspired.

    Two major components of any metaheuristic algorithm are: selection of the best solutions and randomization. The selection of the best ensures that the solutions will converge to optimality, while randomization avoids the solutions being trapped at local optima and, at the same time, increases the diversity of the solutions. A good combination of these two components will usually ensure that global optimality is achievable.

    Metaheuristic algorithms can be classified in many ways. One way is to classify them as population‐based or trajectory‐based. For example, genetic algorithms are population‐based as they use a set of strings; so are particle swarm optimization (PSO) and the firefly algorithm (FA), which use multiple agents or particles. PSO and FA are also referred to as agent‐based algorithms.

    On the other hand, simulated annealing (SA) uses a single agent or solution which moves through the design space or search space in a piecewise style. A better move or solution is always accepted, while a not‐so‐good move can be accepted with a certain probability. The steps or moves trace a trajectory in the search space, with a nonzero probability that this trajectory can reach the global optimum.
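    The acceptance rule just described can be written down in a few lines. The following is a minimal simulated-annealing sketch in plain Python (the test function, cooling rate, and step size are our own illustrative choices):

        import math
        import random

        # Minimal simulated annealing: always accept a better move; accept a
        # worse move with probability exp(-delta / T), lowering T gradually.
        def anneal(f, x, T=1.0, cooling=0.995, steps=5000, step_size=0.5):
            fx = f(x)
            for _ in range(steps):
                x_new = x + random.uniform(-step_size, step_size)
                delta = f(x_new) - fx
                if delta < 0 or random.random() < math.exp(-delta / T):
                    x, fx = x_new, fx + delta
                T *= cooling  # cooling schedule
            return x, fx

        # Multimodal test function with global minimum f(0) = 0
        f = lambda x: x**2 + 10 * math.sin(x) ** 2
        print(anneal(f, x=3.0))  # typically ends near x = 0

    Early on, when T is large, even poor moves are often accepted, which lets the trajectory escape local modes; as T cools, the search settles into (ideally) the global mode.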

    The types of algorithms that solve relevant optimization problems are summarized in Figure I.2 where we also include some machine learning algorithms such as artificial neural network (ANN), regression, and support vector machine (SVM).

    Figure I.2 Classification of optimization algorithms.

    We will introduce all the above optimization problems and related algorithms in this book. The first two chapters review the fundamental mathematics relevant to optimization, complexity, and algorithms. Then, Chapters 3–5 in Part II cover the formulation of optimization problems and traditional techniques. Chapters 6–10 in Part III focus on applied optimization, including LP, IP, regression, and machine learning algorithms such as ANN and SVM, as well as data mining techniques, queueing theory, and management. Part IV has two chapters, introducing multiobjective optimization and the handling of constraints, and Part V introduces the latest techniques of evolutionary algorithms and nature‐inspired optimization algorithms. Finally, the appendices briefly discuss the commonly used optimization software packages. In addition, each chapter has some exercises with answers, and a detailed bibliography or further reading list is provided at the end of each chapter.

    Further Reading

    Boyd, S.P. and Vandenberghe, L. (2004). Convex Optimization. Cambridge, UK: Cambridge University Press.

    Fletcher, R. (2000). Practical Methods of Optimization, 2e. New York: Wiley.

    Gill, P.E., Murray, W., and Wright, M.H. (1982). Practical Optimization. Bingley: Emerald Publishing.

    Nocedal, J. and Wright, S.J. (2006). Numerical Optimization, 2e. New York: Springer.

    Yang, X.S. (2010). Engineering Optimization: An Introduction with Metaheuristic Applications. Hoboken, NJ: Wiley.

    Yang, X.S. (2014). Nature‐Inspired Optimization Algorithms. London: Elsevier.

    Part I

    Fundamentals

    1

    Mathematical Foundations

    The basic requirements of this book are the fundamental knowledge of functions, basic calculus, and vector algebra. However, we will review here the most relevant fundamentals of functions, vectors, differentiation, and integration. Then, we will introduce some useful concepts such as eigenvalues, complexity, convexity, probability distributions, and optimality conditions.

    1.1 Functions and Continuity

    1.1.1 Functions

    Loosely speaking, a function is a quantity (say y) which varies with another independent quantity or variable x in a deterministic way. For example, a simple quadratic function is

    (1.1) $y = x^2.$

    For any given value of x, there is a unique corresponding value of y. By varying x smoothly, we can vary y in such a manner that the point (x, y) will trace out a curve on the x–y plane (see Figure 1.1). Thus, x is called the independent variable, and y is called the dependent variable or function. Sometimes, in order to emphasize the relationship as a function, we use f(x) to express a generic function, showing that it is a function of x. This can also be written as y = f(x).

    Figure 1.1 A simple quadratic function y = x².

    The domain of a function is the set of numbers x for which the function f(x) is valid (that is, f(x) gives a valid value for a corresponding value of x). If a function is defined over a range a ≤ x ≤ b, we say its domain is [a, b], which is called a closed interval. If both a and b are not included, we have a < x < b, which is denoted by (a, b), and we call this interval an open interval. If b is included while a is not, we have a < x ≤ b, and we often write this half‐open and half‐closed interval as (a, b]. Thus, the domain of the function f(x) = x² is the whole set of all real numbers ℝ, so we have

    (1.2) $-\infty < x < +\infty.$

    Here, the notation ∞ means infinity. In this case, the domain of x² is all the real numbers, which can be simply written as ℝ, that is, x ∈ ℝ.

    All the values that a function can take for a given domain form the range of the function. Thus, the range of y = x² is 0 ≤ y < +∞ or [0, +∞). Here, the closed bracket [ means that the value (here 0) is included as part of the range, while the open round bracket ) means that the value (here +∞) is not part of the range.

    1.1.2 Continuity

    A function is called continuous if an infinitely small change δ of the independent variable x always leads to an infinitely small change in f(x + δ) − f(x). Alternatively, we can loosely view the graph representing the function as a single, unbroken curve. More formally, we can say that for any small ε > 0, there is a δ > 0 such that

    (1.3) $|f(x + h) - f(x)| < \epsilon \quad \text{whenever } |h| < \delta.$

    This is the continuity condition. Obviously, functions such as x and x² are continuous.

    If a function does not satisfy the continuity condition, then the function is called discontinuous. For example, the Heaviside step function

    (1.4) $H(x) = \begin{cases} 1, & x \geq 0, \\ 0, & x < 0, \end{cases}$

    is discontinuous at x = 0, as shown in Figure 1.2, where the solid dot means that x = 0 is included in the right branch H(x) = 1, and the hollow dot means that it is not included in the left branch. In this case, this function is called a right‐continuous function.

    Figure 1.2 Discontinuity of the Heaviside step function at x = 0.

    1.1.3 Upper and Lower Bounds

    For a given non‐empty set S of real numbers, we now introduce some important concepts such as the supremum and infimum. A number U is called an upper bound for S if x ≤ U for all x ∈ S. An upper bound β is said to be the least (or smallest) upper bound for S, or the supremum, if β ≤ U for any upper bound U. This is often written as

    (1.5) $\beta \equiv \sup S = \sup_{x \in S} x.$

    All such notations are widely used in the literature of mathematical analysis. Here, ≡ denotes both equality and definition, which means "is identical to."

    On the other hand, a number L is called a lower bound for S if x ≥ L for all x in S (that is, for all x ∈ S, denoted by ∀x ∈ S). A lower bound α is referred to as the greatest (or largest) lower bound, or the infimum, if α ≥ L for any lower bound L, which is written as

    (1.6) $\alpha \equiv \inf S = \inf_{x \in S} x.$

    In general, both the supremum β and the infimum α, if they exist, may or may not belong to S.

    Example 1.1

    For example, any number greater than 5, say, 7.2 or 500, is an upper bound for the interval 2 ≤ x ≤ 5 or [2, 5]. However, its smallest upper bound (or sup) is 5. Similarly, numbers such as 0 and 1 are lower bounds of the interval, but 2 is the greatest lower bound (or inf). In addition, the interval [15, ∞) has an infimum of 15, but it has no upper bound. That is to say, its supremum does not exist, or, loosely speaking, sup S = ∞.

    There is an important completeness axiom which says that if a non‐empty set of real numbers is bounded above, then it has a supremum. Similarly, if a non‐empty set of real numbers is bounded below, then it has an infimum.

    Furthermore, the maximum for S is the largest value of all elements x ∈ S, often written as max(S) or max S, while the minimum, min(S) or min S, is the smallest value among all x ∈ S. For the same interval [2, 5], the maximum of this interval is 5, which is equal to its supremum, while its minimum 2 is equal to its infimum. Though the supremum and infimum are not necessarily part of the set S, the maximum and minimum (if they exist) always belong to the set.

    However, the concepts of supremum (or infimum) and maximum (or minimum) are not the same, and maximum/minimum may not always exist.

    Example 1.2

    For example, the interval S = [2, 7) or 2 ≤ x < 7 has the supremum of 7, but S has no maximum.

    Similarly, the interval (2, 7] does not have a minimum, though its infimum is 2. Furthermore, the open interval (2, 7) has no maximum or minimum; however, its supremum is 7, and its infimum is 2.

    It is worth pointing out that the problems we will discuss in this book will always have at least a maximum or a minimum.

    1.2 Review of Calculus

    1.2.1 Differentiation

    The gradient or first derivative of a function f(x) at point x is defined by

    (1.7) $f'(x) = \frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}.$

    From the definition and basic function operations, it is straightforward to show that

    (1.8) $(x^n)' = n x^{n-1}.$

    In addition, for any two functions f(x) and g(x), we have

    (1.9) $[a f(x) + b g(x)]' = a f'(x) + b g'(x),$

    where a, b are two real constants. Therefore, it is easy to show that

    (1.10) $[f(x) + k]' = f'(x),$

    where k is a constant. This means that a family of functions shifted by a different constant k will have the same gradient at the same point x, as shown in Figure 1.3.

    Figure 1.3 The gradients of a family of curves f(x) + k at any point x are the same.

    Some useful differentiation rules are the product rule

    (1.11) $[f(x) g(x)]' = f'(x) g(x) + f(x) g'(x)$

    and the chain rule

    (1.12) $\frac{d f(g(x))}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx},$

    where f(g(x)) is a composite function, which means that f is a function of g, and g is a function of x.

    Example 1.3

    For example, from f(x) = x² and g(x) = sin(x), the product rule gives

    $[x^2 \sin x]' = 2x \sin x + x^2 \cos x.$

    Similarly, using the chain rule, we have

    $\frac{d}{dx} \sin(x^2) = \cos(x^2) \cdot 2x = 2x \cos(x^2).$

    From the product rule (1.11), if we replace g(x) by 1/g(x), we have

    (1.13) $\left[\frac{1}{g(x)}\right]' = -\frac{g'(x)}{g^2(x)},$

    and

    (1.14) $\left[\frac{f(x)}{g(x)}\right]' = \frac{f'(x) g(x) - f(x) g'(x)}{g^2(x)},$

    which is the well‐known quotient rule. For example, we have

    (1.15) $\left[\frac{\sin x}{x}\right]' = \frac{x \cos x - \sin x}{x^2}.$
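    These rules are easy to verify symbolically. The following is a minimal sketch of ours, assuming the SymPy library is available (not code from the book):

        import sympy as sp

        x = sp.symbols('x', real=True)
        f, g = sp.sin(x), x**2 + 1

        # Product rule: (f*g)' = f'*g + f*g'
        print(sp.simplify(sp.diff(f * g, x)
                          - (sp.diff(f, x) * g + f * sp.diff(g, x))))  # 0

        # Quotient rule: (f/g)' = (f'*g - f*g') / g^2
        print(sp.simplify(sp.diff(f / g, x)
                          - (sp.diff(f, x) * g - f * sp.diff(g, x)) / g**2))  # 0

        # Chain rule: d f(g(x)) / dx = f'(g) * g'(x)
        print(sp.diff(sp.sin(x**2 + 1), x))  # 2*x*cos(x**2 + 1)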

    It is worth pointing out that a continuous function may not have well‐defined derivatives. For example, the absolute or modulus function

    (1.16) $|x| = \begin{cases} x, & x \geq 0, \\ -x, & x < 0, \end{cases}$

    does not have a well‐defined gradient at x = 0 because the gradient of |x| is +1 if x approaches 0 from x > 0 (using the notation x → 0⁺). However, if x approaches 0 from x < 0 (using the notation x → 0⁻), the gradient of |x| is −1. That is,

    (1.17) $\lim_{x \to 0^+} \frac{d|x|}{dx} = +1 \neq \lim_{x \to 0^-} \frac{d|x|}{dx} = -1.$

    In this case, we say that |x| is not differentiable at x = 0. However, for x ≠ 0, we can write

    (1.18) $\frac{d|x|}{dx} = \operatorname{sign}(x) = \frac{x}{|x|}.$

    Example 1.4

    A natural extension of this is that for a function f(x), we have

    (1.19) $\frac{d|f(x)|}{dx} = \operatorname{sign}(f(x)) \, f'(x), \quad f(x) \neq 0.$

    As an example, for f(x) = x³, we have

    $\frac{d|x^3|}{dx} = \operatorname{sign}(x^3) \cdot 3x^2 = 3x^2 \operatorname{sign}(x).$

    Higher derivatives of a univariate real function can be defined as

    (1.20) $f^{(n)}(x) = \frac{d^n f}{dx^n} = \frac{d}{dx}\left[f^{(n-1)}(x)\right],$

    for all positive integers n ≥ 1, with f⁽⁰⁾(x) = f(x).

    Example 1.5

    The first, second, and third derivatives of f(x) = x⁴ + x are

    $f'(x) = 4x^3 + 1, \quad f''(x) = 12x^2,$

    and

    $f'''(x) = 24x.$

    For a continuous function f(x), if its first derivative is well defined at every point in the whole domain, the function is called differentiable; if that derivative is also continuous, it is a continuously differentiable function, said to be of class C¹. Similarly, a function is said to be of class C² if both its first and second derivatives exist and are continuous. This can be extended to class Cᵏ in a similar manner. If the derivatives of all orders (all positive integers) exist everywhere in its domain, the function is called smooth.

    It is straightforward to check that x², sin(x), and exp(x) are all smooth functions, but |x| and H(x) are not smooth. Some of these functions are shown in Figure 1.4.

    Figure 1.4 Smooth functions (left) and non‐smooth (but continuous) functions (right).

    1.2.2 Taylor Expansions

    In numerical methods and some mathematical analysis, series expansions make some calculations easier. For example, we can write the exponential function eˣ as a series about x = 0 as

    (1.21) $e^x = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots.$

    Now let us try to determine these coefficients. At x = 0, we have

    (1.22) $e^0 = 1 = a_0,$

    which gives a₀ = 1. In order to reduce the power or order of the expansion so that we can determine the next coefficient, we first differentiate both sides of Eq. (1.21) once; we have

    (1.23) $e^x = a_1 + 2a_2 x + 3a_3 x^2 + \cdots.$

    By setting x = 0 again, we have

    (1.24) $e^0 = 1 = a_1,$

    which gives a₁ = 1. Similarly, differentiating it again, we have

    (1.25) $e^x = 2a_2 + 6a_3 x + \cdots.$

    At x = 0, we get

    (1.26) $e^0 = 1 = 2a_2,$

    or a₂ = 1/2!. Here, 2! = 2 × 1 is the factorial of 2. In general, the factorial n! is defined as

    $n! = n \times (n-1) \times \cdots \times 2 \times 1.$

    Following the same procedure and differentiating n times, we have

    (1.27) $e^x = n!\, a_n + \frac{(n+1)!}{1!}\, a_{n+1} x + \cdots,$

    and setting x = 0 leads to aₙ = 1/n!. Therefore, the final series expansion can be written as

    (1.28) $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!},$

    which is an infinite series. Obviously, we can follow a similar process to expand other functions. We have seen here the importance of differentiation and derivatives.
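    The partial sums of (1.28) converge quickly, which is easy to check numerically. A minimal sketch in plain Python (the evaluation point x = 1.5 is an arbitrary choice of ours):

        import math

        # Partial sums of e^x = sum_{n>=0} x^n / n!, compared with math.exp
        def exp_series(x, n_terms):
            total, term = 0.0, 1.0      # term starts at x^0 / 0! = 1
            for n in range(n_terms):
                total += term
                term *= x / (n + 1)     # x^n / n!  ->  x^(n+1) / (n+1)!
            return total

        x = 1.5
        for n in (2, 5, 10, 20):
            print(n, exp_series(x, n), math.exp(x))
        # The partial sums approach exp(1.5) = 4.4816... within a few terms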

    If we know the value of f(x) at x₀, we can use some approximations in a small interval around x₀ (see Figure 1.5). Following the same idea as Eq. (1.21), we can first write the approximation in the following general form:

    (1.29) $f(x) \approx b_0 + b_1 (x - x_0) + b_2 (x - x_0)^2 + \cdots.$
