Multi-armed Bandit Allocation Indices
About this ebook

In 1989 the first edition of this book set out Gittins' pioneering index solution to the multi-armed bandit problem and his subsequent investigation of a wide class of sequential resource allocation and stochastic scheduling problems. Since then there has been a remarkable flowering of new insights, generalizations and applications, to which Glazebrook and Weber have made major contributions.

This second edition brings the story up to date. There are new chapters on the achievable region approach to stochastic optimization problems, the construction of performance bounds for suboptimal policies, Whittle's restless bandits, and the use of Lagrangian relaxation in the construction and evaluation of index policies. Some of the many varied proofs of the index theorem are discussed along with the insights that they provide. Many contemporary applications are surveyed, and over 150 new references are included.

Over the past 40 years the Gittins index has helped theoreticians and practitioners to address a huge variety of problems within chemometrics, economics, engineering, numerical analysis, operational research, probability, statistics and website design. This new edition will be an important resource for others wishing to use this approach.

Language: English
Publisher: Wiley
Release date: Feb 18, 2011
ISBN: 9781119990215


    Multi-armed Bandit Allocation Indices - John Gittins

    Title Page

    This edition first published 2011

    © 2011 John Wiley & Sons, Ltd

    Registered office

    John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

    For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

    The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

    All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

    Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

    Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

    MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

    Library of Congress Cataloging-in-Publication Data

    Gittins, John C., 1938-

    Multi-armed bandit allocation indices / John Gittins, Richard Weber, Kevin Glazebrook.—2nd ed.

    p. cm.

    Includes bibliographical references and index.

    ISBN 978-0-470-67002-6 (cloth)

    1. Resource allocation—Mathematical models. 2. Mathematical optimization. 3. Programming (Mathematics) I. Weber, Richard, 1953- II. Glazebrook, Kevin D., 1950- III. Title.

    QA279.G55 2011

    519.5—dc22

    2010044409

    A catalogue record for this book is available from the British Library.

    Print ISBN: 978-0-470-67002-6

    ePDF ISBN: 978-0-470-98004-0

    oBook ISBN: 978-0-470-98003-3

    ePub ISBN: 978-1-119-99021-5

    Foreword

    John Gittins' first edition of this book marked the end of an era, the era in which a succession of investigators struggled for an understanding and ‘solution’ of the multi-armed bandit problem. My foreword to that edition celebrated the gaining of this understanding, and so it seems fitting that this should be retained.

    The opening of a new era was like the stirring of an ant-heap, with the sudden emergence of an avid multitude and a rush of scurrying activity. The first phase was one of exploitation, in which each worker tried to apply his special expertise in this new context. This yielded, among other things, a remarkable array of proofs of optimality of the Gittins index policy. The most elegant and insightful was certainly Richard Weber's ‘prevailing charge’ proof (see section 2.5), expressible in a single paragraph of verbal reasoning. I must confess, however, to a lingering attachment to the dynamic programming proof (see section 4.3) which provided also the value function of the Gittins policy and a treatment immediately generalizable to the case of superprocesses.

    The phase of real interest was the subsequent one, of exploration. To what range of models can the Gittins technique be extended? Here the simile latent in the terms ‘bandit’ and ‘arm’ (with their gambling machine origins) begins to become quite strained. I myself spoke rather of a population of ‘projects’, only one of which could be engaged at any given time. One wishes to operate only the high-value projects, but can determine which these are only by trying them all—it is to this that the phrase ‘exploitation and exploration’ first referred. The process is then one of allocation and learning. Any project does not itself change, but its ‘informational state’ does—one's knowledge of it as expressed by a Bayesian updating.

    However, situations occur for which the projects do also have a physical state, which may change by stochastic rules, or for which the population of projects changes by immigration or emigration. These are cases one would wish to cover. It is then more natural to think of the projects as ‘activities’, having their own dynamic structure, and whose performance one would wish to optimize by the appropriate allocation of given resources over activities. This is the classical economic activity analysis in a dynamic setting. However, we are now effectively adding to this the feature that the current physical states of the activities are incompletely known, and must be inferred from observation. ‘Observation time’ is then an additional resource to be allocated. Section 6.8 gives the clearest indication of such an ambition.

    Extension to such cases requires new tools, and Chapters 5 and 6 consider two such: study of the achievable region and Lagrangian relaxation of the optimization. These central chapters present very clearly the current state of theory in these areas, a considerable enhancement of understanding and many examples of cases of practical interest for which an optimal policy—of index character—is determined. It must be confessed, however, that the general problems of indexability and optimality remain beached on the research frontier, although one senses a rising tide which will lift them.

    Explicitness is also a feature of Chapters 7 and 8, which go into hard detail on the determination of indices under different statistical assumptions.

    This scholarly and modern work gives a satisfyingly complete, rigorous and illuminating account of the current state of the subject. It also conveys a historical perspective, from the work of Klimov and von Olivier (which appeared in print before appreciation of John Gittins' advance had become general) to the present day. All three authors of the text have made key and recent contributions to the subject, and are in the best position to lay its immediate course.

    Peter Whittle

    Foreword to the First Edition

    The term ‘Gittins index’ now has firm currency in the literature, denoting the concept which first proved so crucial in the solution of the long-standing multi-armed bandit problem and since then has provided a guide for the deeper understanding of all such problems. The author is, nevertheless, too modest to use the term so I regard it as my sole role to reassure the potential reader that the author is indeed the Gittins of the index, and that this book sets forth his pioneering work on the solution of the multi-armed bandit problem and his subsequent investigation of a wide class of sequential allocation problems.

    Such allocation problems are concerned with the optimal division of resources between projects which need resources in order to develop and which yield benefit at a rate depending upon their degree of development. They embody in essential form a conflict evident in all human action. This is the conflict between taking those actions which yield immediate reward and those (such as acquiring information or skill, or preparing the ground) whose benefit will come only later.

    The multi-armed bandit is a prototype of this class of problems, propounded during the Second World War, and soon recognized as so difficult that it quickly became a classic, and a byword for intransigence. In fact, John Gittins had solved the problem by the late sixties, although the fact that he had done so was not generally recognized until the early eighties. I can illustrate the mode of propagation of this news, when it began to propagate, by telling of an American friend of mine, a colleague of high repute, who asked an equally well-known colleague ‘What would you say if you were told that the multi-armed bandit problem had been solved?’ The reply was somewhat in the Johnsonian form: ‘Sir, the multi-armed bandit problem is not of such a nature that it can be solved’. My friend then undertook to convince the doubter in a quarter of an hour. This is indeed a feature of John's solution: that, once explained, it carries conviction even before it is proved.

    John Gittins gives here an account which unifies his original pioneering contributions with the considerable development achieved subsequently both by himself and by other workers. I recommend the book as the authentic and authoritative source-work.

    Peter Whittle

    Preface

    The first edition of this book was published in 1989. A shortened version of the preface to that edition follows this preface. The uninitiated reader might find it helpful to read it at this point.

    Since 1989 the field has developed apace. There has been a remarkable flowering of different proofs of the main index theorem, each giving its own particular insight as to why the result is true. Major bodies of related theory have also emerged, notably the achievable region and restless bandit methodologies, and the discussion in economics of the appropriate balance between exploration and exploitation. These have led, for example, to improved algorithms for calculating indices and to improved bounds on the performance of index strategies when they are not necessarily optimal. These developments, together with continuing interest in a book that is now difficult to buy, form the case for a new edition.

    There are now three authors rather than just John Gittins. Kevin Glazebrook and Richard Weber bring familiarity with more recent developments, to which they have made important contributions. Their expertise has allowed us to include new chapters on achievable regions and on restless bandits.

    We have also taken the opportunity to substantially revise the core earlier chapters. Our aim has been to provide an accessible introduction to the main ideas, taking advantage of more recent work, before proceeding to the more challenging material in Chapters 4, 5 and 6. Overall we have tried to produce an expository account rather than just a research monograph. The exercises are designed as an aid to a deeper understanding of the material.

    The Gittins index, as it is commonly known, for a project competing for investment with other projects allows both for the immediate expected reward, and for the value of the information gained from the initial investment, which may be useful in securing later rewards. These factors are often called exploitation and exploration, respectively.

    In this spirit Chapter 1, which is a taster for the rest of the book, is subtitled ‘Exploration’. The mainstream of the book then continues through Chapters 2 to 4. It breaks into independent substreams represented by Chapter 5, Chapter 6, Chapters 7 and 8, and Chapter 9, which looks briefly at five further application areas.

    Readers should have some prior knowledge of university-level probability theory, including Markov processes, and for a full understanding of Chapters 5 and 6 will need to know the basic ideas of linear programming. We hope the book will be of interest to researchers in chemometrics, combinatorics, economics, management science, numerical analysis, operational research, probability theory and statistics. Few readers will want to read from cover to cover. For example, both chemometricians and numerical analysts are likely to be interested mainly in Chapter 8 and the tables which it describes: chemometricians as potential users of the tables, and numerical analysts for the sake of the methods of calculation. As another example, Chapters 1, 2 and 3 could form the basis for a specialized lecture course in applied probability for final-year undergraduates.

    In addition to those colleagues acknowledged in the preface to the first edition we are very grateful to Anne-Marie Oreskovich and Morton Sorenson for further help in the calculation of index values.

    John Gittins

    Department of Statistics, University of Oxford

    Kevin Glazebrook

    Department of Management Science, Lancaster University

    Richard Weber

    Statistical Laboratory, University of Cambridge

    August 2010

    Preface to the First Edition

    A prospector looking for gold in the Australian outback has to decide where to look, and if he prospects for any length of time must repeatedly take further such decisions in the light of his success to date. His decision problem is how to allocate his time, in a sequential manner, so as to maximize his likely reward. A similar problem faces a student about to graduate from university who is looking for employment, or the manager of a commercial laboratory with several research projects competing for the attention of the scientists at his or her disposal.

    It is to problems like these that the indices described in this book may be applied. The problems are characterized by alternative independent ways in which time or effort may be consumed, for each of which the outcome is uncertain and may be determined only in the course of applying the effort. The choice at each stage is therefore determined partly on the basis of maximizing the expected immediate rate of return, and partly by the need to reduce uncertainty, and thereby provide a basis for better choices later on. It is the tension between these two requirements that makes the decision problem both interesting and difficult.

    The classic problem of this type is the multi-armed bandit problem, in which several Bernoulli processes with different unknown success probabilities are available, each success yielding the same reward. There are already two good books on this subject: by Presman and Sonin (1982), and by Berry and Fristedt (1985). Both books go well beyond the simple version of the problem just stated: most notably to include, respectively, an exponentially distributed rather than constant interval between trials, and a study of the consequences of discounting rewards in different ways.

    The aims of this book are to expound the theory of dynamic allocation indices, and to explore the class of problems for which they define optimal policies. These include sampling problems, like the multi-armed bandit problem, and stochastic scheduling problems, such as the research manager's problem. Tables of index values are given for a number of sampling problems, including the classical Bayesian multi-armed bandit problem. For the most part, and except where otherwise indicated, the book is an account of original work, though much of it has appeared before in the publications bearing the author's name which are listed in the references.

    It is a pleasure to acknowledge the stimulus and encouragement which this work has received over the years by conversations, and in some cases collaboration, with Joao Amaral, Tony Baker, John Bather, Sten Bergman, Owen Davies, Kevin Glazebrook, David Jones, Frank Kelly, Aamer Khan, Dennis Lindley, Peter Nash, David Roberts and Peter Whittle.

    I should particularly like to thank Joao Amaral, Frank Geisler and David Jones for the substantial effort involved in calculating index values, and Brenda Willoughby for her painstaking work in typing the manuscript. The book is dedicated to Hugh and Anna, who have grown up during its preparation.

    John Gittins

    Mathematical Institute, Oxford University

    March 1989

    Chapter 1

    Introduction or Exploration

    This book is about mathematical models for optimizing in a sequential manner the allocation of effort between a number of competing projects. The effort and the projects may take a variety of forms. Examples are: an industrial processor and jobs waiting to be processed; a server with a queue of customers; an industrial laboratory with research projects; any busy person with jobs to do; a stream of patients and alternative treatments (yes, the patients do correspond to effort—see Problem 5 later in this chapter); a searcher who may look in different places. In every case effort is treated as being homogeneous, and the problem is to allocate it between the different projects so as to maximize the expected total reward which they yield. It is a sequential problem, as effort is allowed to be reallocated in a feedback manner, taking account of the pattern of rewards so far achieved. The reallocations are assumed to be costless, and to take a negligible time, since the alternative is to impose a travelling-salesman-like feature, thereby seriously complicating an already difficult problem.

    The techniques which come under the heading of dynamic programming have been devised for sequential optimization problems. The key idea is a recurrence equation relating the expected total reward (call this the payoff) at a given decision time to the distribution of its possible values at the next decision time (see equation (2.2)). Sometimes this equation may be solved analytically. Otherwise a recursive numerical solution may, at any rate in principle, be carried out. This involves making an initial approximation to the payoff function, and then successive further approximations by substituting in the right-hand side of the recurrence equation. As Bellman (1957), for many years the chief protagonist of this methodology, pointed out, using the recurrence equation involves less computing than a complete enumeration of all policies and their corresponding payoffs, but none the less soon runs into the sands of intractable storage and processing requirements as the number of variables on which the payoff function depends increases.
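    The recursive numerical solution described above is easy to sketch. The fragment below is an illustrative toy, not taken from the book: it invents a two-state, two-action discounted problem and repeatedly substitutes in the right-hand side of the recurrence equation until the payoff function stops changing.

```python
# Invented two-state, two-action example for illustration.
# P[a][s][s2]: transition probability from state s to s2 under action a.
# r[s][a]: immediate reward for taking action a in state s.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.4, 0.6]]]
r = [[1.0, 0.5], [0.0, 2.0]]
beta = 0.9  # discount factor

def value_iteration(P, r, beta, tol=1e-10):
    """Successive approximation of the payoff function via the recurrence equation."""
    n = len(r)
    V = [0.0] * n
    while True:
        # Q[a][s]: immediate reward plus discounted expected payoff at the next decision time.
        Q = [[r[s][a] + beta * sum(P[a][s][s2] * V[s2] for s2 in range(n))
              for s in range(n)] for a in range(len(P))]
        V_new = [max(Q[a][s] for a in range(len(P))) for s in range(n)]
        if max(abs(V_new[s] - V[s]) for s in range(n)) < tol:
            policy = [max(range(len(P)), key=lambda a: Q[a][s]) for s in range(n)]
            return V_new, policy
        V = V_new

V, policy = value_iteration(P, r, beta)
```

    Discounting makes each substitution a contraction, so the iteration converges; but, as the text notes, the storage and processing requirements grow rapidly with the number of state variables.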

    For the problem of allocating effort to projects, the number of variables is at least equal to the number of projects. An attractive idea, therefore, is to establish priority indices for each project, depending on its past history but not that of any other project, and to allocate effort at each decision time only to the project with the highest current index value. To calculate these indices it should be possible to calibrate a project in a given state against some set of standard projects with simple properties. If this could be done we should have a reasonable policy without having to deal with any function of the states of more than one project.

    That optimal policies for some problems of effort allocation are expressible in terms of such indices is well known. The first three problems described in this chapter are cases in point. Chapters 2, 3 and 4 set out a general theory which puts these results into context. They include several theorems asserting the optimality of index policies under different conditions. In fact five of the seven problems described in this chapter may be solved fairly rapidly by using the index theorem (Theorem 2.1), and its Corollary 3.5. Problem 4 requires Theorem 3.1, which is a continuous-time version of the index theorem.

    Since they may change as more effort is allocated, these priority indices may aptly be termed dynamic allocation indices. The policies that they define include, in relevant circumstances, the cμ-rule, Smith's rule, and the shortest processing time rule (SPT). The main aim of this chapter is to exhibit their range of application by describing a number of particular instances. A second aim is to give an informal introduction to the main methods available for determining indices. These are by (i) interchange arguments; (ii) exploiting any special features of the bandit processes concerned, in particular those which lead to the optimality of myopic policies; (iii) calibration by reference to standard bandit processes, often involving iteration using the dynamic programming recurrence equation; and (iv) using the fact that a dynamic allocation index may be regarded as a maximized equivalent constant reward rate.

    Problem 1 Single-Machine Scheduling (see also Section 2.12, Section 3.2 and Exercise 4.1)

    There are n jobs ready to be processed on a single machine. A cost ci is incurred per unit of time until job i has been completed, and the service time (or processing time) for job i is si. In what order should the jobs be processed so as to minimize the total cost?

    Problem 2 Gold-Mining (see also Section 3.5.2)

    A woman owns n gold mines and a gold-mining machine. Each day she must assign the machine to one of the mines. When the machine is assigned to mine i there is a probability pi that it extracts a proportion qi of the gold left in the mine, and a probability 1 − pi that it extracts no gold and breaks down permanently. To what sequence of mines on successive days should the machine be assigned so as to maximize the expected amount of gold mined before it breaks down?

    Problem 3 Search (see also Section 3.5.4 and Exercise 3.6)

    A stationary object is hidden in one of n boxes. The probability that a search of box i finds the object if it is in box i is qi. The probability that the object is in box i is pi, and changes by Bayes' theorem as successive boxes are searched. The cost of a single search of box i is ci. How should the boxes be sequenced for search so as to minimize the expected cost of finding the object?

    For Problem 1 an optimal schedule is one which processes the jobs in decreasing order of ci/si, as may be shown by the following simple interchange argument. If jobs 1 and 2 are processed as the first and second, respectively, then the total holding cost due to these two jobs is c1s1 + c2(s1 + s2). If the processing order of jobs 1 and 2 is interchanged, then this quantity becomes c2s2 + c1(s1 + s2). The holding cost due to the other jobs is unchanged. Thus, the first ordering has the smaller cost if and only if c1/s1 ≥ c2/s2. It follows that the relationship ci/si ≥ cj/sj must hold for any two jobs i and j such that i immediately precedes j in an optimal processing order of the jobs. This implies that the total holding cost is minimized by processing the n jobs in decreasing order of the indices defined as μi = ci/si.
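    The interchange argument is easy to check numerically. The sketch below, with invented data for four jobs, compares the ci/si ordering against every one of the n! schedules:

```python
from itertools import permutations

def holding_cost(c, s, order):
    """Total cost when jobs are processed in the given order:
    job i incurs cost c[i] per unit of time until it completes."""
    total, elapsed = 0.0, 0.0
    for i in order:
        elapsed += s[i]
        total += c[i] * elapsed
    return total

# Invented example data: costs c_i and service times s_i.
c = [3.0, 1.0, 4.0, 2.0]
s = [2.0, 1.0, 1.0, 3.0]

# Index policy: process in decreasing order of c_i / s_i.
index_order = sorted(range(4), key=lambda i: c[i] / s[i], reverse=True)
best_cost = min(holding_cost(c, s, o) for o in permutations(range(4)))
```

    For these data the index order achieves the brute-force minimum, as the theory predicts; the point of the index is that it does so without enumerating the n! alternatives.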

    For Problem 2, an optimal policy is one which, when xi is the amount of gold remaining in mine i on a particular day, allocates the machine to a mine j such that

    pjqjxj/(1 − pj) = max{piqixi/(1 − pi) : 1 ≤ i ≤ n}.

    For Problem 3, if box j is searched next under an optimal policy, then

    pjqj/cj = max{piqi/ci : 1 ≤ i ≤ n}.

    For Problems 2 and 3, a policy is specified by a sequence of numbers corresponding to the order of the mines to which the machine is allocated until it breaks down, or the order in which the boxes are to be searched until the object is found. The policies just quoted may therefore be shown to be optimal in the class C of all those policies specified by sequences in which each i (= 1, 2, …, n) occurs an infinite number of times, by arguments involving interchanges of adjacent numbers in such a sequence. Since any policy may be matched by a policy in C for any arbitrarily large initial portion of the specifying number sequence, it follows that the above policies are optimal in the class of all policies.
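    Problem 3 is the one of the three in which the state genuinely evolves: after each unsuccessful search the location probabilities change by Bayes' theorem, and the index piqi/ci must be recomputed. A minimal sketch, with invented numbers for three boxes:

```python
def bayes_update(p, q, j):
    """Posterior location probabilities given that a search of box j failed
    (an event of probability 1 - p[j] * q[j])."""
    miss = 1.0 - p[j] * q[j]
    post = [pi / miss for pi in p]
    post[j] = p[j] * (1.0 - q[j]) / miss  # searched box becomes less likely
    return post

def next_box(p, q, c):
    """Index rule: search the box with the largest p_i q_i / c_i."""
    return max(range(len(p)), key=lambda i: p[i] * q[i] / c[i])

# Invented example data.
p = [0.5, 0.3, 0.2]   # prior probabilities that the object is in each box
q = [0.4, 0.9, 0.8]   # detection probabilities
c = [1.0, 2.0, 1.5]   # search costs

searches = []
for _ in range(6):    # plan the first six searches, supposing each one fails
    j = next_box(p, q, c)
    searches.append(j)
    p = bayes_update(p, q, j)
```

    Note how the policy can return to a box searched earlier once its posterior probability has recovered relative to the others; this is why each box appears infinitely often in the specifying sequence of the class C described above.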

    The optimal policies for Problems 1, 2 and 3 may be described as follows. At each stage where a job, mine or box is to be selected there is a real-valued function defined on the set of available alternatives, and the one selected is such as to maximize this function. These functions, then, are dynamic allocation indices for the three problems.

    We have described these indices as functions on the sets of available alternatives. More precisely, the index for Problem 2, for example, is a function of the three variables, xi, qi and pi, which constitute the information available for mine i when the amount of gold which it contains is xi.

    The point of this observation is that our indices do not simply tell us which alternative to choose at a given stage in a particular problem with n alternatives. Rather, they specify the optimal policy for a problem with any set of alternatives of a given type. This means that if we know that such an index exists for a given class of problems, without as yet having an explicit expression for it, then it is sufficient, in order to find an explicit expression, to consider problems within the class for which n = 2, and indeed even this restricted class of problems need not be solved in full generality, as our next example shows.

    Problem 4 Industrial Research (see also Section 3.2)

    The manager of a team of industrial scientists has n research projects which may be carried out in any order. Switches from project to project cause negligible loss of time, whether these occur when a project has been successfully completed or not. The value to the manager's employer of completing project i at time t is Vie−γt (i = 1, 2, …, n; γ > 0). The time which the team would need to spend on project i in order to complete it is a random variable σi, which has distribution function Fi(t) and density function fi(t). What policy should the manager follow in order to maximize the expected total value generated by the n projects?

    Following standard usage, a switch from one project to another project is called a preemption. We say that we have the preemptive case if preemption is allowed at any time, even if the project that is currently receiving effort has not yet been completed. We say that we have the non-preemptive case if a project, once started, must continue to receive processing effort until it has been completed. In the non-preemptive case, the optimal schedule can be found with an interchange argument, just as in Problem 1. Suppose that projects 1 and 2 are the first two projects processed. The expected value obtained from these projects when 1 is processed before 2 is at least as great as when their order of processing is reversed if

    V1 Ee−γσ1 + V2 Ee−γσ1 Ee−γσ2 ≥ V2 Ee−γσ2 + V1 Ee−γσ1 Ee−γσ2,

    where, by the independence of σ1 and σ2, Ee−γσ1 Ee−γσ2 = Ee−γ(σ1+σ2),

    which is equivalent to

    V1 Ee−γσ1/(1 − Ee−γσ1) ≥ V2 Ee−γσ2/(1 − Ee−γσ2).

    We may conclude that the expected total value obtained from the n projects is maximized by processing them in decreasing order of indices defined as

    μi = Vi Ee−γσi/(1 − Ee−γσi).

    We can make a connection with Problem 1. Suppose that under a given order of processing, project i completes at time ti. Then the expected total value obtained from the projects can be written as

    ∑i Vi Ee−γti.

    Thus, in the limit as γ → 0, the problem of maximizing ∑i Vi Ee−γti is identical to that of minimizing ∑i Vi Eti, a measure that is identical to that in Problem 1 if we identify Vi with ci. Also, as γ → 0 we have γμi → Vi/Eσi, an index analogous to ci/si in Problem 1.
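    The non-preemptive index is easy to evaluate when the σi have tractable transforms. The sketch below invents exponential completion times, for which Ee−γσi = θi/(θi + γ) is a standard transform, and checks the ordering by μi against all orderings by brute force:

```python
from itertools import permutations

gamma = 0.1
V = [5.0, 3.0, 4.0]                     # completion values (invented)
theta = [0.5, 1.0, 0.3]                 # sigma_i ~ Exponential(theta_i)
phi = [t / (t + gamma) for t in theta]  # E exp(-gamma * sigma_i)

def expected_value(order):
    """Expected total discounted value when the projects are run non-preemptively
    in the given order; by independence, E exp(-gamma * t_i) is the product of
    the transforms of the service times completed up to and including project i."""
    total, disc = 0.0, 1.0
    for i in order:
        disc *= phi[i]
        total += V[i] * disc
    return total

# The index mu_i = V_i * E exp(-gamma sigma_i) / (1 - E exp(-gamma sigma_i)).
mu = [V[i] * phi[i] / (1.0 - phi[i]) for i in range(3)]
index_order = sorted(range(3), key=lambda i: mu[i], reverse=True)
```

    For exponential σi the index simplifies to Viθi/γ, so the ordering depends only on the products Viθi, consistent with the limit γμi → Vi/Eσi noted above.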

    The characterization of an optimal policy is not as straightforward in the preemptive case, that is when arbitrary switching amongst projects is allowed. However, let us take advantage of the fact that the problem may be shown (Theorem 3.1) to have an index, and proceed to find an expression for the index by considering the case n = 2. Project 1 will be an arbitrary project of the type described. Project 2 we suppose to require a non-random time Δ to complete, and V2 = λΔ. Let CΔ denote the class of projects of this type for a given Δ and different positive values of λ.

    For a set of projects all of which are in CΔ it is best to work on them in order of decreasing λ-values. This is almost obvious, as such a policy results in the larger undiscounted values λΔ being discounted least by the factors e−γs, and is easily checked by an interchange argument. It is also not difficult to convince oneself that in this case work on a project should not be interrupted before the project has been completed. Thus, for CΔ the λ-values define an index.

    Consider then the case n = 2, where project 1 is arbitrary and project 2 is in CΔ. If we can find a λ for which it is optimal to work initially either on project 1 or on project 2 then, since an index is known to exist for the entire set of projects, it follows that one such index function may be defined by assigning the index value λ to project 1 as well as to project 2. In effect we may calibrate the entire set of projects by using CΔ as a measuring device. Any monotone function of λ would of course serve equally well as an index.

    By the optimality property of the index rule, it follows that if in a given problem it is optimal to work on a member of CΔ then, if the problem is modified by adding further members of CΔ all with the same λ-value, it remains optimal to work on these in succession until they have all been completed. Thus for calibration purposes we may suppose that several replicates of project 2 are available. For good measure suppose there is an infinite number of replicates. (Discounting ensures that the maximum expected reward over all policies remains finite.) Note, too, that since the value of Δ is arbitrary we may as well work with the limiting case as Δ tends to zero, so that our infinity of replicates of project 2 reduce to a continuous reward stream at the undiscounted rate λ, an object which will appear later, and termed a standard bandit process with parameter λ, or simply S(λ).

    We should like, then, a criterion for deciding whether it is best to work on project 1 first, with the option of switching later to S(λ), or to start on S(λ). Denote these two alternatives by 1λ and λ. Note that the index rule never requires a switch back to project 1 in the case of positive Δ, and such a switch is therefore not required for optimality either for positive Δ or in the limiting case of a standard bandit process.

    Under 1λ, work proceeds on project 1 up to some time t (possibly ∞), or until project 1 terminates, whichever occurs first, and then switches to S(λ), from which point 1λ and λ yield identical reward streams. Thus 1λ is strictly preferable if

    (1.1)    V ∫[0,t] e−γs f(s) ds > λ ∫[0,t] e−γs [1 − F(s)] ds,

    these two expressions being, respectively, the expected reward from 1λ up to the switch to S(λ), and the expected reward from λ up to that point. The uninformative suffix 1 has been dropped in (1.1). The right-hand side of (1.1) takes the given form since λe−γsδs + o(δs) = return from S(λ) in the interval (s, s + δs), and 1 − F(s) = P{switch to S(λ) under 1λ takes place after s}.

    It is strictly better to start work on project 1 if and only if (1.1) holds for some t > 0, that is iff

    (1.2)    λ < sup{t > 0} V ∫[0,t] e−γs f(s) ds / ∫[0,t] e−γs [1 − F(s)] ds.

    It follows that the right-hand side of (1.2) is the required index for a project on which work has not yet started. For a project on which work has already continued for a time x without successful completion, the density f(s) and distribution function F(s) in (1.2) must be replaced by the conditional density f(s)/[1 − F(x)] and the conditional distribution function [F(s) − F(x)]/[1 − F(x)], giving the general expression for the index

    (1.3)    sup{t > x} V ∫[x,t] e−γs f(s) ds / ∫[x,t] e−γs [1 − F(s)] ds,

    the normalizing factors 1 − F(x) having cancelled in the ratio.

    The completion rate (or hazard rate) is defined as ρ(s) = f(s)/[1 − F(s)]. If this is decreasing then (1.3) reduces to Vρ(x). Since ρ(x) is the probability density for completion at time x conditional on no completion before x, using Vρ(x) as an index corresponds to maximizing the expected reward during the next infinitesimal time interval, a policy sometimes termed myopic because of its neglect of what might happen later, or disregard of long-range vision, to continue the optic metaphor. What we have shown is that for Problem 4 there is no
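    The sup in (1.3) can be approximated numerically for any given completion-time distribution by evaluating the ratio on a grid of stopping times t. The sketch below is illustrative only; it uses an exponential completion time, whose constant completion rate ρ(s) = ρ makes the ratio equal to Vρ at every t, which provides a convenient check.

```python
import math

def index_value(V, gamma, f, F, x, t_max=60.0, n=4000):
    """Approximate (1.3): the sup over t > x of
    V * int_x^t e^{-gamma s} f(s) ds  /  int_x^t e^{-gamma s} [1 - F(s)] ds,
    accumulating both integrals by the midpoint rule as t sweeps the grid."""
    h = (t_max - x) / n
    num = den = 0.0
    best = -math.inf
    for k in range(n):
        s = x + (k + 0.5) * h
        num += math.exp(-gamma * s) * f(s) * h
        den += math.exp(-gamma * s) * (1.0 - F(s)) * h
        best = max(best, V * num / den)
    return best

# Check against the constant-hazard case: for sigma ~ Exponential(rho)
# the index at every x should equal V * rho.
rho, gamma, V = 0.7, 0.1, 2.0
nu = index_value(V, gamma,
                 f=lambda s: rho * math.exp(-rho * s),
                 F=lambda s: 1.0 - math.exp(-rho * s),
                 x=0.0)
```

    For a distribution with decreasing completion rate the grid maximum is attained at the first stopping time, recovering the myopic index Vρ(x) discussed above.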
