Multi-armed Bandit Allocation Indices
About this ebook

In 1989 the first edition of this book set out Gittins' pioneering index solution to the multi-armed bandit problem and his subsequent investigation of a wide class of sequential resource allocation and stochastic scheduling problems. Since then there has been a remarkable flowering of new insights, generalizations and applications, to which Glazebrook and Weber have made major contributions.

This second edition brings the story up to date. There are new chapters on the achievable region approach to stochastic optimization problems, the construction of performance bounds for suboptimal policies, Whittle's restless bandits, and the use of Lagrangian relaxation in the construction and evaluation of index policies. Some of the many varied proofs of the index theorem are discussed along with the insights that they provide. Many contemporary applications are surveyed, and over 150 new references are included.

Over the past 40 years the Gittins index has helped theoreticians and practitioners to address a huge variety of problems within chemometrics, economics, engineering, numerical analysis, operational research, probability, statistics and website design. This new edition will be an important resource for others wishing to use this approach.

Language: English
Publisher: Wiley
Release date: Feb 18, 2011
ISBN: 9781119990215


    Multi-armed Bandit Allocation Indices - John Gittins

    Title Page

    This edition first published 2011

    © 2011 John Wiley & Sons, Ltd

    Registered office

    John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

    For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

    The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

    All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

    Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

    Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

    MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

    Library of Congress Cataloging-in-Publication Data

    Gittins, John C., 1938-

    Multi-armed bandit allocation indices / John Gittins, Richard Weber, Kevin Glazebrook.—2nd ed.

    p. cm.

    Includes bibliographical references and index.

    ISBN 978-0-470-67002-6 (cloth)

    1. Resource allocation—Mathematical models. 2. Mathematical optimization. 3. Programming (Mathematics) I. Weber, Richard, 1953- II. Glazebrook, Kevin D., 1950- III. Title.

    QA279.G55 2011

    519.5—dc22

    2010044409

    A catalogue record for this book is available from the British Library.

    Print ISBN: 978-0-470-67002-6

    ePDF ISBN: 978-0-470-98004-0

    oBook ISBN: 978-0-470-98003-3

    ePub ISBN: 978-1-119-99021-5

    Foreword

    John Gittins' first edition of this book marked the end of an era, the era in which a succession of investigators struggled for an understanding and ‘solution’ of the multi-armed bandit problem. My foreword to that edition celebrated the gaining of this understanding, and so it seems fitting that this should be retained.

    The opening of a new era was like the stirring of an ant-heap, with the sudden emergence of an avid multitude and a rush of scurrying activity. The first phase was one of exploitation, in which each worker tried to apply his special expertise in this new context. This yielded, among other things, a remarkable array of proofs of optimality of the Gittins index policy. The most elegant and insightful was certainly Richard Weber's ‘prevailing charge’ proof (see section 2.5), expressible in a single paragraph of verbal reasoning. I must confess, however, to a lingering attachment to the dynamic programming proof (see section 4.3) which provided also the value function of the Gittins policy and a treatment immediately generalizable to the case of superprocesses.

    The phase of real interest was the subsequent one, of exploration. To what range of models can the Gittins technique be extended? Here the simile latent in the terms ‘bandit’ and ‘arm’ (with their gambling machine origins) begins to become quite strained. I myself spoke rather of a population of ‘projects’, only one of which could be engaged at any given time. One wishes to operate only the high-value projects, but can determine which these are only by trying them all—it is to this that the phrase ‘exploitation and exploration’ first referred. The process is then one of allocation and learning. Any project does not itself change, but its ‘informational state’ does—one's knowledge of it as expressed by a Bayesian updating.

    However, situations occur for which the projects do also have a physical state, which may change by stochastic rules, or for which the population of projects changes by immigration or emigration. These are cases one would wish to cover. It is then more natural to think of the projects as ‘activities’, having their own dynamic structure, and whose performance one would wish to optimize by the appropriate allocation of given resources over activities. This is the classical economic activity analysis in a dynamic setting. However, we are now effectively adding to this the feature that the current physical states of the activities are incompletely known, and must be inferred from observation. ‘Observation time’ is then an additional resource to be allocated. Section 6.8 gives the clearest indication of such an ambition.

    Extension to such cases requires new tools, and Chapters 5 and 6 consider two such: study of the achievable region and Lagrangian relaxation of the optimization. These central chapters present very clearly the current state of theory in these areas, a considerable enhancement of understanding and many examples of cases of practical interest for which an optimal policy—of index character—is determined. It must be confessed, however, that the general problems of indexability and optimality remain beached on the research frontier, although one senses a rising tide which will lift them.

    Explicitness is also a feature of Chapters 7 and 8, which go into hard detail on the determination of indices under different statistical assumptions.

    This scholarly and modern work gives a satisfyingly complete, rigorous and illuminating account of the current state of the subject. It also conveys a historical perspective, from the work of Klimov and von Olivier (which appeared in print before appreciation of John Gittins' advance had become general) to the present day. All three authors of the text have made key and recent contributions to the subject, and are in the best position to lay its immediate course.

    Peter Whittle

    Foreword to the First Edition

    The term ‘Gittins index’ now has firm currency in the literature, denoting the concept which first proved so crucial in the solution of the long-standing multi-armed bandit problem and since then has provided a guide for the deeper understanding of all such problems. The author is, nevertheless, too modest to use the term so I regard it as my sole role to reassure the potential reader that the author is indeed the Gittins of the index, and that this book sets forth his pioneering work on the solution of the multi-armed bandit problem and his subsequent investigation of a wide class of sequential allocation problems.

    Such allocation problems are concerned with the optimal division of resources between projects which need resources in order to develop and which yield benefit at a rate depending upon their degree of development. They embody in essential form a conflict evident in all human action. This is the conflict between taking those actions which yield immediate reward and those (such as acquiring information or skill, or preparing the ground) whose benefit will come only later.

    The multi-armed bandit is a prototype of this class of problems, propounded during the Second World War, and soon recognized as so difficult that it quickly became a classic, and a byword for intransigence. In fact, John Gittins had solved the problem by the late sixties, although the fact that he had done so was not generally recognized until the early eighties. I can illustrate the mode of propagation of this news, when it began to propagate, by telling of an American friend of mine, a colleague of high repute, who asked an equally well-known colleague ‘What would you say if you were told that the multi-armed bandit problem had been solved?’ The reply was somewhat in the Johnsonian form: ‘Sir, the multi-armed bandit problem is not of such a nature that it can be solved’. My friend then undertook to convince the doubter in a quarter of an hour. This is indeed a feature of John's solution: that, once explained, it carries conviction even before it is proved.

    John Gittins gives here an account which unifies his original pioneering contributions with the considerable development achieved subsequently both by himself and by other workers. I recommend the book as the authentic and authoritative source-work.

    Peter Whittle

    Preface

    The first edition of this book was published in 1989. A shortened version of the preface to that edition follows this preface. The uninitiated reader might find it helpful to read it at this point.

    Since 1989 the field has developed apace. There has been a remarkable flowering of different proofs of the main index theorem, each giving its own particular insight as to why the result is true. Major bodies of related theory have also emerged, notably the achievable region and restless bandit methodologies, and the discussion in economics of the appropriate balance between exploration and exploitation. These have led, for example, to improved algorithms for calculating indices and to improved bounds on the performance of index strategies when they are not necessarily optimal. These developments, together with continuing interest in a book that is now difficult to buy, form the case for a new edition.

    There are now three authors rather than just John Gittins. Kevin Glazebrook and Richard Weber bring familiarity with more recent developments, to which they have made important contributions. Their expertise has allowed us to include new chapters on achievable regions and on restless bandits.

    We have also taken the opportunity to substantially revise the core earlier chapters. Our aim has been to provide an accessible introduction to the main ideas, taking advantage of more recent work, before proceeding to the more challenging material in Chapters 4, 5 and 6. Overall we have tried to produce an expository account rather than just a research monograph. The exercises are designed as an aid to a deeper understanding of the material.

    The Gittins index, as it is commonly known, for a project competing for investment with other projects allows both for the immediate expected reward, and for the value of the information gained from the initial investment, which may be useful in securing later rewards. These factors are often called exploitation and exploration, respectively.

    In this spirit Chapter 1, which is a taster for the rest of the book, is subtitled ‘Exploration’. The mainstream of the book then continues through Chapters 2 to 4. It breaks into independent substreams represented by Chapter 5, Chapter 6, Chapters 7 and 8, and Chapter 9, which looks briefly at five further application areas.

    Readers should have some prior knowledge of university-level probability theory, including Markov processes, and for a full understanding of Chapters 5 and 6 will need to know the basic ideas of linear programming. We hope the book will be of interest to researchers in chemometrics, combinatorics, economics, management science, numerical analysis, operational research, probability theory and statistics. Few readers will want to read from cover to cover. For example, both chemometricians and numerical analysts are likely to be interested mainly in Chapter 8 and the tables which it describes: chemometricians as potential users of the tables, and numerical analysts for the sake of the methods of calculation. As another example, Chapters 1, 2 and 3 could form the basis for a specialized lecture course in applied probability for final-year undergraduates.

    In addition to those colleagues acknowledged in the preface to the first edition we are very grateful to Anne-Marie Oreskovich and Morton Sorenson for further help in the calculation of index values.

    John Gittins

    Department of Statistics, University of Oxford

    Kevin Glazebrook

    Department of Management Science, Lancaster University

    Richard Weber

    Statistical Laboratory, University of Cambridge

    August 2010

    Preface to the First Edition

    A prospector looking for gold in the Australian outback has to decide where to look, and if he prospects for any length of time must repeatedly take further such decisions in the light of his success to date. His decision problem is how to allocate his time, in a sequential manner, so as to maximize his likely reward. A similar problem faces a student about to graduate from university who is looking for employment, or the manager of a commercial laboratory with several research projects competing for the attention of the scientists at his or her disposal.

    It is to problems like these that the indices described in this book may be applied. The problems are characterized by alternative independent ways in which time or effort may be consumed, for each of which the outcome is uncertain and may be determined only in the course of applying the effort. The choice at each stage is therefore determined partly on the basis of maximizing the expected immediate rate of return, and partly by the need to reduce uncertainty, and thereby provide a basis for better choices later on. It is the tension between these two requirements that makes the decision problem both interesting and difficult.

    The classic problem of this type is the multi-armed bandit problem, in which several Bernoulli processes with different unknown success probabilities are available, each success yielding the same reward. There are already two good books on this subject: by Presman and Sonin (1982), and by Berry and Fristedt (1985). Both books go well beyond the simple version of the problem just stated: most notably to include, respectively, an exponentially distributed rather than constant interval between trials, and a study of the consequences of discounting rewards in different ways.

    The aims of this book are to expound the theory of dynamic allocation indices, and to explore the class of problems for which they define optimal policies. These include sampling problems, like the multi-armed bandit problem, and stochastic scheduling problems, such as the research manager's problem. Tables of index values are given for a number of sampling problems, including the classical Bayesian multi-armed bandit problem. For the most part, and except where otherwise indicated, the book is an account of original work, though much of it has appeared before in the publications bearing the author's name which are listed in the references.

    It is a pleasure to acknowledge the stimulus and encouragement which this work has received over the years by conversations, and in some cases collaboration, with Joao Amaral, Tony Baker, John Bather, Sten Bergman, Owen Davies, Kevin Glazebrook, David Jones, Frank Kelly, Aamer Khan, Dennis Lindley, Peter Nash, David Roberts and Peter Whittle.

    I should particularly like to thank Joao Amaral, Frank Geisler and David Jones for the substantial effort involved in calculating index values, and Brenda Willoughby for her painstaking work in typing the manuscript. The book is dedicated to Hugh and Anna, who have grown up during its preparation.

    John Gittins

    Mathematical Institute, Oxford University

    March 1989

    Chapter 1

    Introduction or Exploration

    This book is about mathematical models for optimizing in a sequential manner the allocation of effort between a number of competing projects. The effort and the projects may take a variety of forms. Examples are: an industrial processor and jobs waiting to be processed; a server with a queue of customers; an industrial laboratory with research projects; any busy person with jobs to do; a stream of patients and alternative treatments (yes, the patients do correspond to effort—see Problem 5 later in this chapter); a searcher who may look in different places. In every case effort is treated as being homogeneous, and the problem is to allocate it between the different projects so as to maximize the expected total reward which they yield. It is a sequential problem, as effort is allowed to be reallocated in a feedback manner, taking account of the pattern of rewards so far achieved. The reallocations are assumed to be costless, and to take a negligible time, since the alternative is to impose a travelling-salesman-like feature, thereby seriously complicating an already difficult problem.

    The techniques which come under the heading of dynamic programming have been devised for sequential optimization problems. The key idea is a recurrence equation relating the expected total reward (call this the payoff) at a given decision time to the distribution of its possible values at the next decision time (see equation (2.2)). Sometimes this equation may be solved analytically. Otherwise a recursive numerical solution may, at any rate in principle, be carried out. This involves making an initial approximation to the payoff function, and then successive further approximations by substituting in the right-hand side of the recurrence equation. As Bellman (1957), for many years the chief protagonist of this methodology, pointed out, using the recurrence equation involves less computing than a complete enumeration of all policies and their corresponding payoffs, but none the less soon runs into the sands of intractable storage and processing requirements as the number of variables on which the payoff function depends increases.
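    The recursive numerical solution described above is easy to sketch. The fragment below is an illustrative toy, not taken from the book: it invents a two-state, two-action discounted problem and repeatedly substitutes in the right-hand side of the recurrence equation until the payoff function stops changing.

```python
# Invented two-state, two-action example for illustration.
# P[a][s][s2]: transition probability from state s to s2 under action a.
# r[s][a]: immediate reward for taking action a in state s.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.4, 0.6]]]
r = [[1.0, 0.5], [0.0, 2.0]]
beta = 0.9  # discount factor

def value_iteration(P, r, beta, tol=1e-10):
    """Successive approximation of the payoff function via the recurrence equation."""
    n = len(r)
    V = [0.0] * n
    while True:
        # Q[a][s]: immediate reward plus discounted expected payoff at the next decision time.
        Q = [[r[s][a] + beta * sum(P[a][s][s2] * V[s2] for s2 in range(n))
              for s in range(n)] for a in range(len(P))]
        V_new = [max(Q[a][s] for a in range(len(P))) for s in range(n)]
        if max(abs(V_new[s] - V[s]) for s in range(n)) < tol:
            policy = [max(range(len(P)), key=lambda a: Q[a][s]) for s in range(n)]
            return V_new, policy
        V = V_new

V, policy = value_iteration(P, r, beta)
```

    Discounting makes each substitution a contraction, so the iteration converges; but, as the text notes, the storage and processing requirements grow rapidly with the number of state variables.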

    For the problem of allocating effort to projects, the number of variables is at least equal to the number of projects. An attractive idea, therefore, is to establish priority indices for each project, depending on its past history but not that of any other project, and to allocate effort at each decision time only to the project with the highest current index value. To calculate these indices it should be possible to calibrate a project in a given state against some set of standard projects with simple properties. If this could be done we should have a reasonable policy without having to deal with any function of the states of more than one project.

    That optimal policies for some problems of effort allocation are expressible in terms of such indices is well known. The first three problems described in this chapter are cases in point. Chapters 2, 3 and 4 set out a general theory which puts these results into context. They include several theorems asserting the optimality of index policies under different conditions. In fact five of the seven problems described in this chapter may be solved fairly rapidly by using the index theorem (Theorem 2.1), and its Corollary 3.5. Problem 4 requires Theorem 3.1, which is a continuous-time version of the index theorem.

    Since they may change as more effort is allocated, these priority indices may aptly be termed dynamic allocation indices. The policies that they define include, in relevant circumstances, the cμ-rule, Smith's rule, and the shortest processing time rule (SPT). The main aim of this chapter is to exhibit their range of application by describing a number of particular instances. A second aim is to give an informal introduction to the main methods available for determining indices. These are by (i) interchange arguments; (ii) exploiting any special features of the bandit processes concerned, in particular those which lead to the optimality of myopic policies; (iii) calibration by reference to standard bandit processes, often involving iteration using the dynamic programming recurrence equation; and (iv) using the fact that a dynamic allocation index may be regarded as a maximized equivalent constant reward rate.

    Problem 1 Single-Machine Scheduling (see also Section 2.12, Section 3.2 and Exercise 4.1)

    There are n jobs ready to be processed on a single machine. A cost ci is incurred per unit of time until job i has been completed, and the service time (or processing time) for job i is si. In what order should the jobs be processed so as to minimize the total cost?

    Problem 2 Gold-Mining (see also Section 3.5.2)

    A woman owns n gold mines and a gold-mining machine. Each day she must assign the machine to one of the mines. When the machine is assigned to mine i there is a probability pi that it extracts a proportion qi of the gold left in the mine, and a probability 1 − pi that it extracts no gold and breaks down permanently. To what sequence of mines on successive days should the machine be assigned so as to maximize the expected amount of gold mined before it breaks down?

    Problem 3 Search (see also Section 3.5.4 and Exercise 3.6)

    A stationary object is hidden in one of n boxes. The probability that a search of box i finds the object if it is in box i is qi. The probability that the object is in box i is pi, and changes by Bayes' theorem as successive boxes are searched. The cost of a single search of box i is ci. How should the boxes be sequenced for search so as to minimize the expected cost of finding the object?

    For Problem 1 an optimal schedule is one which processes the jobs in decreasing order of ci/si, as may be shown by the following simple interchange argument. If jobs 1 and 2 are processed as the first and second, respectively, then the total holding cost due to these two jobs is c1s1 + c2(s1 + s2). If the processing order of jobs 1 and 2 is interchanged, then this quantity becomes c2s2 + c1(s1 + s2). The holding cost due to the other jobs is unchanged. Thus, the first ordering has the smaller cost if and only if c1/s1 ≥ c2/s2. It follows that the relationship ci/si ≥ cj/sj must hold for any two jobs i and j such that i immediately precedes j in an optimal processing order of the jobs. This implies that the total holding cost is minimized by processing the n jobs in decreasing order of the indices defined as μi = ci/si.
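    The interchange argument is easy to check numerically. The sketch below, with invented data for four jobs, compares the ci/si ordering against every one of the n! schedules:

```python
from itertools import permutations

def holding_cost(c, s, order):
    """Total cost when jobs are processed in the given order:
    job i incurs cost c[i] per unit of time until it completes."""
    total, elapsed = 0.0, 0.0
    for i in order:
        elapsed += s[i]
        total += c[i] * elapsed
    return total

# Invented example data: costs c_i and service times s_i.
c = [3.0, 1.0, 4.0, 2.0]
s = [2.0, 1.0, 1.0, 3.0]

# Index policy: process in decreasing order of c_i / s_i.
index_order = sorted(range(4), key=lambda i: c[i] / s[i], reverse=True)
best_cost = min(holding_cost(c, s, o) for o in permutations(range(4)))
```

    For these data the index order achieves the brute-force minimum, as the theory predicts; the point of the index is that it does so without enumerating the n! alternatives.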

    For Problem 2, an optimal policy is one which, when xi is the amount of gold remaining in mine i on a particular day, allocates the machine to a mine j such that

    pjqjxj/(1 − pj) = max{piqixi/(1 − pi) : 1 ≤ i ≤ n}.

    For Problem 3, if box j is searched next under an optimal policy, then

    pjqj/cj = max{piqi/ci : 1 ≤ i ≤ n}.

    For Problems 2 and 3, a policy is specified by a sequence of numbers corresponding to the order of the mines to which the machine is allocated until it breaks down, or the order in which the boxes are to be searched until the object is found. The policies just quoted may therefore be shown to be optimal in the class C of all those policies specified by sequences in which each i (= 1, 2, …, n) occurs an infinite number of times, by arguments involving interchanges of adjacent numbers in such a sequence. Since any policy may be matched by a policy in C for any arbitrarily large initial portion of the specifying number sequence, it follows that the above policies are optimal in the class of all policies.
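    Problem 3 is the one of the three in which the state genuinely evolves: after each unsuccessful search the location probabilities change by Bayes' theorem, and the index piqi/ci must be recomputed. A minimal sketch, with invented numbers for three boxes:

```python
def bayes_update(p, q, j):
    """Posterior location probabilities given that a search of box j failed
    (an event of probability 1 - p[j] * q[j])."""
    miss = 1.0 - p[j] * q[j]
    post = [pi / miss for pi in p]
    post[j] = p[j] * (1.0 - q[j]) / miss  # searched box becomes less likely
    return post

def next_box(p, q, c):
    """Index rule: search the box with the largest p_i q_i / c_i."""
    return max(range(len(p)), key=lambda i: p[i] * q[i] / c[i])

# Invented example data.
p = [0.5, 0.3, 0.2]   # prior probabilities that the object is in each box
q = [0.4, 0.9, 0.8]   # detection probabilities
c = [1.0, 2.0, 1.5]   # search costs

searches = []
for _ in range(6):    # plan the first six searches, supposing each one fails
    j = next_box(p, q, c)
    searches.append(j)
    p = bayes_update(p, q, j)
```

    Note how the policy can return to a box searched earlier once its posterior probability has recovered relative to the others; this is why each box appears infinitely often in the specifying sequence of the class C described above.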

    The optimal policies for Problems 1, 2 and 3 may be described as follows. At each stage where a job, mine or box is to be selected there is a real-valued function defined on the set of available alternatives, and the one selected is such as to maximize this function. These functions, then, are dynamic allocation indices for the three problems.

    We have described these indices as functions on the sets of available alternatives. More precisely, the index for Problem 2, for example, is a function of the three variables, xi, qi and pi, which constitute the information available for mine i when the amount of gold which it contains is xi.

    The point of this observation is that our indices do not simply tell us which alternative to choose at a given stage in a particular problem with n alternatives. Rather, they specify the optimal policy for a problem with any set of alternatives of a given type. This means that if we know that such an index exists for a given class of problems, without as yet having an explicit expression for it, then it is sufficient, in order to find an explicit expression, to consider problems within the class for which n = 2, and indeed even this restricted class of problems need not be solved in full generality, as our next example shows.

    Problem 4 Industrial Research (see also Section 3.2)

    The manager of a team of industrial scientists has n research projects which may be carried out in any order. Switches from project to project cause negligible loss of time, whether these occur when a project has been successfully completed or not. The value to the manager's employer of completing project i at time t is Vie−γt (i = 1, 2, …, n; γ > 0). The time which the team would need to spend on project i in order to complete it is a random variable σi, which has distribution function Fi(t) and density function fi(t). What policy should the manager follow in order to maximize the expected total value generated by the n projects?

    Following standard usage, a switch from one project to another project is called a preemption. We say that we have the preemptive case if preemption is allowed at any time, even if the project that is currently receiving effort has not yet been completed. We say that we have the non-preemptive case if a project, once started, must continue to receive processing effort until it has been completed. In the non-preemptive case, the optimal schedule can be found with an interchange argument, just as in Problem 1. Suppose that projects 1 and 2 are the first two projects processed. The expected value obtained from these projects when 1 is processed before 2 is at least as great as when their order of processing is reversed if

    V1 Ee−γσ1 + V2 Ee−γσ1 Ee−γσ2 ≥ V2 Ee−γσ2 + V1 Ee−γσ1 Ee−γσ2,

    where, by the independence of σ1 and σ2, Ee−γσ1 Ee−γσ2 = Ee−γ(σ1+σ2),

    which is equivalent to

    V1 Ee−γσ1/(1 − Ee−γσ1) ≥ V2 Ee−γσ2/(1 − Ee−γσ2).

    We may conclude that the expected total value obtained from the n projects is maximized by processing them in decreasing order of indices defined as

    μi = Vi Ee−γσi/(1 − Ee−γσi).

    We can make a connection with Problem 1. Suppose that under a given order of processing, project i completes at time ti. Then the expected total value obtained from the projects can be written as

    ∑i Vi Ee−γti.

    Thus, in the limit as γ → 0, the problem of maximizing ∑i Vi Ee−γti is identical to that of minimizing ∑i Vi Eti, a measure that is identical to that in Problem 1 if we identify Vi with ci. Also, as γ → 0 we have γμi → Vi/Eσi, an index analogous to ci/si in Problem 1.
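    The non-preemptive index is easy to evaluate when the σi have tractable transforms. The sketch below invents exponential completion times, for which Ee−γσi = θi/(θi + γ) is a standard transform, and checks the ordering by μi against all orderings by brute force:

```python
from itertools import permutations

gamma = 0.1
V = [5.0, 3.0, 4.0]                     # completion values (invented)
theta = [0.5, 1.0, 0.3]                 # sigma_i ~ Exponential(theta_i)
phi = [t / (t + gamma) for t in theta]  # E exp(-gamma * sigma_i)

def expected_value(order):
    """Expected total discounted value when the projects are run non-preemptively
    in the given order; by independence, E exp(-gamma * t_i) is the product of
    the transforms of the service times completed up to and including project i."""
    total, disc = 0.0, 1.0
    for i in order:
        disc *= phi[i]
        total += V[i] * disc
    return total

# The index mu_i = V_i * E exp(-gamma sigma_i) / (1 - E exp(-gamma sigma_i)).
mu = [V[i] * phi[i] / (1.0 - phi[i]) for i in range(3)]
index_order = sorted(range(3), key=lambda i: mu[i], reverse=True)
```

    For exponential σi the index simplifies to Viθi/γ, so the ordering depends only on the products Viθi, consistent with the limit γμi → Vi/Eσi noted above.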

    The characterization of an optimal policy is not as straightforward in the preemptive case, that is when arbitrary switching amongst projects is allowed. However, let us take advantage of the fact that the problem may be shown (Theorem 3.1) to have an index, and proceed to find an expression for the index by considering the case n = 2. Project 1 will be an arbitrary project of the type described. Project 2 we suppose to require a non-random time Δ to complete, and V2 = λΔ. Let CΔ denote the class of projects of this type for a given Δ and different positive values of λ.

    For a set of projects all of which are in CΔ it is best to work on them in order of decreasing λ-values. This is almost obvious, as such a policy results in the larger undiscounted values λΔ being discounted least by the factors e−γs, and is easily checked by an interchange argument. It is also not difficult to convince oneself that in this case work on a project should not be interrupted before the project has been completed. Thus, for CΔ the λ-values define an index.

    Consider then the case n = 2, where project 1 is arbitrary and project 2 is in CΔ. If we can find a λ for which it is optimal to work initially either on project 1 or on project 2 then, since an index is known to exist for the entire set of projects, it follows that one such index function may be defined by assigning the index value λ to project 1 as well as to project 2. In effect we may calibrate the entire set of projects by using CΔ as a measuring device. Any monotone function of λ would of course serve equally well as an index.

    By the optimality property of the index rule, it follows that if in a given problem it is optimal to work on a member of CΔ then, if the problem is modified by adding further members of CΔ all with the same λ-value, it remains optimal to work on these in succession until they have all been completed. Thus for calibration purposes we may suppose that several replicates of project 2 are available. For good measure suppose there is an infinite number of replicates. (Discounting ensures that the maximum expected reward over all policies remains finite.) Note, too, that since the value of Δ is arbitrary we may as well work with the limiting case as Δ tends to zero, so that our infinity of replicates of project 2 reduce to a continuous reward stream at the undiscounted rate λ, an object which will appear later, and termed a standard bandit process with parameter λ, or simply S(λ).

    We should like, then, a criterion for deciding whether it is best to work on project 1 first, with the option of switching later to S(λ), or to start on S(λ). Denote these two alternatives by 1λ and λ. Note that the index rule never requires a switch back to project 1 in the case of positive Δ, and such a switch is therefore not required for optimality either for positive Δ or in the limiting case of a standard bandit process.

    Under 1λ, work proceeds on project 1 up to some time t (possibly ∞), or until project 1 terminates, whichever occurs first, and then switches to S(λ), from which point 1λ and λ yield identical reward streams. Thus 1λ is strictly preferable if

    (1.1)    V ∫[0,t] e−γs f(s) ds > λ ∫[0,t] e−γs [1 − F(s)] ds,

    these two expressions being, respectively, the expected reward from 1λ up to the switch to S(λ), and the expected reward from λ up to that point. The uninformative suffix 1 has been dropped in (1.1). The right-hand side of (1.1) takes the given form since λe−γsδs + o(δs) = return from S(λ) in the interval (s, s + δs), and 1 − F(s) = P{switch to S(λ) under 1λ takes place after s}.

    It is strictly better to start work on project 1 if and only if (1.1) holds for some t > 0, that is iff

    (1.2)    λ < sup{t > 0} V ∫[0,t] e−γs f(s) ds / ∫[0,t] e−γs [1 − F(s)] ds.

    It follows that the right-hand side of (1.2) is the required index for a project on which work has not yet started. For a project on which work has already continued for a time x without successful completion, the density f(s) and distribution function F(s) in (1.2) must be replaced by the conditional density f(s)/[1 − F(x)] and the conditional distribution function [F(s) − F(x)]/[1 − F(x)], giving the general expression for the index

    (1.3)    sup{t > x} V ∫[x,t] e−γs f(s) ds / ∫[x,t] e−γs [1 − F(s)] ds,

    the normalizing factors 1 − F(x) having cancelled in the ratio.

    The completion rate (or hazard rate) is defined as ρ(s) = f(s)/[1 − F(s)]. If this is decreasing then (1.3) reduces to Vρ(x). Since ρ(x) is the probability density for completion at time x conditional on no completion before x, using Vρ(x) as an index corresponds to maximizing the expected reward during the next infinitesimal time interval, a policy sometimes termed myopic because of its neglect of what might happen later, or disregard of long-range vision, to continue the optic metaphor. What we have shown is that for Problem 4 there is no
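    The sup in (1.3) can be approximated numerically for any given completion-time distribution by evaluating the ratio on a grid of stopping times t. The sketch below is illustrative only; it uses an exponential completion time, whose constant completion rate ρ(s) = ρ makes the ratio equal to Vρ at every t, which provides a convenient check.

```python
import math

def index_value(V, gamma, f, F, x, t_max=60.0, n=4000):
    """Approximate (1.3): the sup over t > x of
    V * int_x^t e^{-gamma s} f(s) ds  /  int_x^t e^{-gamma s} [1 - F(s)] ds,
    accumulating both integrals by the midpoint rule as t sweeps the grid."""
    h = (t_max - x) / n
    num = den = 0.0
    best = -math.inf
    for k in range(n):
        s = x + (k + 0.5) * h
        num += math.exp(-gamma * s) * f(s) * h
        den += math.exp(-gamma * s) * (1.0 - F(s)) * h
        best = max(best, V * num / den)
    return best

# Check against the constant-hazard case: for sigma ~ Exponential(rho)
# the index at every x should equal V * rho.
rho, gamma, V = 0.7, 0.1, 2.0
nu = index_value(V, gamma,
                 f=lambda s: rho * math.exp(-rho * s),
                 F=lambda s: 1.0 - math.exp(-rho * s),
                 x=0.0)
```

    For a distribution with decreasing completion rate the grid maximum is attained at the first stopping time, recovering the myopic index Vρ(x) discussed above.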
