Experimentation for Engineers: From A/B testing to Bayesian optimization
About this ebook

Optimize the performance of your systems with practical experiments used by engineers in the world’s most competitive industries.

In Experimentation for Engineers: From A/B testing to Bayesian optimization you will learn how to:

Design, run, and analyze an A/B test
Break the "feedback loops" caused by periodic retraining of ML models
Increase experimentation rate with multi-armed bandits
Tune multiple parameters experimentally with Bayesian optimization
Clearly define business metrics used for decision-making
Identify and avoid the common pitfalls of experimentation

Experimentation for Engineers: From A/B testing to Bayesian optimization is a toolbox of techniques for evaluating new features and fine-tuning parameters. You’ll start with a deep dive into methods like A/B testing, and then graduate to advanced techniques used to measure performance in industries such as finance and social media. Learn how to evaluate the changes you make to your system and ensure that your testing doesn’t undermine revenue or other business metrics. By the time you’re done, you’ll be able to seamlessly deploy experiments in production while avoiding common pitfalls.

About the technology
Does my software really work? Did my changes make things better or worse? Should I trade features for performance? Experimentation is the only way to answer questions like these. This unique book reveals sophisticated experimentation practices developed and proven in the world’s most competitive industries that will help you enhance machine learning systems, software applications, and quantitative trading solutions.

About the book
Experimentation for Engineers: From A/B testing to Bayesian optimization delivers a toolbox of processes for optimizing software systems. You’ll start by learning the limits of A/B testing, and then graduate to advanced experimentation strategies that take advantage of machine learning and probabilistic methods. The skills you’ll master in this practical guide will help you minimize the costs of experimentation and quickly reveal which approaches and features deliver the best business results.

What's inside

Design, run, and analyze an A/B test
Break the “feedback loops” caused by periodic retraining of ML models
Increase experimentation rate with multi-armed bandits
Tune multiple parameters experimentally with Bayesian optimization

About the reader
For ML and software engineers looking to extract the most value from their systems. Examples in Python and NumPy.

About the author
David Sweet has worked as a quantitative trader at GETCO and a machine learning engineer at Instagram. He teaches in the AI and Data Science master's programs at Yeshiva University.

Table of Contents
1 Optimizing systems by experiment
2 A/B testing: Evaluating a modification to your system
3 Multi-armed bandits: Maximizing business metrics while experimenting
4 Response surface methodology: Optimizing continuous parameters
5 Contextual bandits: Making targeted decisions
6 Bayesian optimization: Automating experimental optimization
7 Managing business metrics
8 Practical considerations
Language: English
Publisher: Manning
Release date: Mar 21, 2023
ISBN: 9781638356905


    Book preview

    Experimentation for Engineers - David Sweet

    inside front cover

    IFC-1: Three stages of an A/B test: Design, Measure, and Analyze

    IFC-2: Four iterations of a Bayesian optimization. In frames (a)–(d), we run four iterations of the optimization. By frame (d), the parameter value (black dots) has stopped changing.

    Experimentation for Engineers

    From A/B testing to Bayesian optimization

    David Sweet

    To comment go to liveBook

    Manning

    Shelter Island

    For more information on this and other Manning titles go to

    www.manning.com

    Copyright

    For online information and ordering of these and other Manning books, please visit www.manning.com. The publisher offers discounts on these books when ordered in quantity.

    For more information, please contact

    Special Sales Department

    Manning Publications Co.

    20 Baldwin Road

    PO Box 761

    Shelter Island, NY 11964

    Email: orders@manning.com

    ©2023 by Manning Publications Co. All rights reserved.

    No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

    Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

    ♾ Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

    ISBN: 9781617298158

    dedication

    To B and Iz.

    contents

    front matter

    preface

    acknowledgments

    about this book

    about the author

    about the cover illustration

    1 Optimizing systems by experiment

    1.1 Examples of engineering workflows

    Machine learning engineer’s workflow

    Quantitative trader’s workflow

    Software engineer’s workflow

    1.2 Measuring by experiment

    Experimental methods

    Practical problems and pitfalls

    1.3 Why are experiments necessary?

    Domain knowledge

    Offline model quality

    Simulation

    2 A/B testing: Evaluating a modification to your system

    2.1 Take an ad hoc measurement

    Simulate the trading system

    Compare execution costs

    2.2 Take a precise measurement

    Mitigate measurement variation with replication

    2.3 Run an A/B test

    Analyze your measurements

    Design the A/B test

    Measure and analyze

    Recap of A/B test stages

    3 Multi-armed bandits: Maximizing business metrics while experimenting

    3.1 Epsilon-greedy: Account for the impact of evaluation on business metrics

    A/B testing as a baseline

    The epsilon-greedy algorithm

    Deciding when to stop

    3.2 Evaluating multiple system changes simultaneously

    3.3 Thompson sampling: A more efficient MAB algorithm

    Estimate the probability that an arm is the best

    Randomized probability matching

    The complete algorithm

    4 Response surface methodology: Optimizing continuous parameters

    4.1 Optimize a single continuous parameter

    Design: Choose parameter values to measure

    Take the measurements

    Analyze I: Interpolate between measurements

    Analyze II: Optimize the business metric

    Validate the optimal parameter value

    4.2 Optimizing two or more continuous parameters

    Design the two-parameter experiment

    Measure, analyze, and validate the 2D experiment

    5 Contextual bandits: Making targeted decisions

    5.1 Model a business metric offline to make decisions online

    Model the business-metric outcome of a decision

    Add the decision-making component

    Run and evaluate the greedy recommender

    5.2 Explore actions with epsilon-greedy

    Missing counterfactuals degrade predictions

    Explore with epsilon-greedy to collect counterfactuals

    5.3 Explore parameters with Thompson sampling

    Create an ensemble of prediction models

    Randomized probability matching

    5.4 Validate the contextual bandit

    6 Bayesian optimization: Automating experimental optimization

    6.1 Optimizing a single compiler parameter, a visual explanation

    Simulate the compiler

    Run the initial experiment

    Analyze: Model the response surface

    Design: Select the parameter value to measure next

    Design: Balance exploration with exploitation

    6.2 Model the response surface with Gaussian process regression

    Estimate the expected CPU time

    Estimate uncertainty with GPR

    6.3 Optimize over an acquisition function

    Minimize the acquisition function

    6.4 Optimize all seven compiler parameters

    Random search

    A complete Bayesian optimization

    7 Managing business metrics

    7.1 Focus on the business

    Don’t evaluate a model

    Evaluate the product

    7.2 Define business metrics

    Be specific to your business

    Update business metrics periodically

    Business metric timescales

    7.3 Trade off multiple business metrics

    Reduce negative side effects

    Evaluate with multiple metrics

    8 Practical considerations

    8.1 Violations of statistical assumptions

    Violation of the iid assumption

    Nonstationarity

    8.2 Don’t stop early

    8.3 Control family-wise error

    Cherry-picking increases the false-positive rate

    Control false positives with the Bonferroni correction

    8.4 Be aware of common biases

    Confounder bias

    Small-sample bias

    Optimism bias

    Experimenter bias

    8.5 Replicate to validate results

    Validate complex experiments

    Monitor changes with a reverse A/B test

    Measure quarterly changes with holdouts

    8.6 Wrapping up

    appendix A Linear regression and the normal equations

    appendix B One factor at a time

    appendix C Gaussian process regression

    index

    front matter

    preface

    When I first entered the industry, I had the training of a theoretician but was presented with the tasks of an engineer. As a theoretician, I had worked with models using pen-and-paper or simulation. Where the model had a parameter, I—the theoretician—would try to understand how the model would behave with different values of it. But now I—the engineer—had to commit to a single value: the one to use in a production system. How could I know what value to choose?

    The short answer I received from more experienced practitioners was, "Just try something." In other words, experiment. This set me off on a course of study of experimentation and experimental methods, with a focus on optimizing engineered systems.

    Over the years, the methods applied by the teams I have been on, and by engineers in trading and technology generally, have become ever more precise and efficient. They have been used to optimize the execution of stock trades, market making, web search, online advertising, social media, online news, low-latency infrastructure, and more. As a result, trade execution has become cheaper and more fairly priced. Users regularly claim that web search and social media recommendations are so good that they worry their phones might be eavesdropping on them (they’re not).

    Statistics-based experimental methods have a relatively short history. Sir R. A. Fisher published the seminal work, The Design of Experiments, in 1935—less than a century ago. In it he discussed the class of experimental methods in which we’d place an A/B test (chapter 2). In 1941, H. Hotelling wrote the paper Experimental determination of the maximum of a function, in which he discussed the modeling of a response surface (chapter 4). Response surface methodology was further explored by G. Box and K. P. Wilson. In 1947, A. Wald published the book Sequential Analysis, which studies the idea of analyzing experimental data measurement by measurement (chapter 3), rather than waiting until all measurements are available (as you would in an A/B test).

    While this research was being done, the methods were being applied in industry: first in agriculture (Fisher’s methods), then in chemical and process industries (response surface methods). Later (from the 1950s to the 1980s) experimentation merged with statistical process control to give us the quality movements in manufacturing, exemplified by Toyota’s Total Quality Management, and later, popularized by Six Sigma.

    From the 1990s onward, internet companies have experienced an explosion of opportunity for experimentation as users have generated views, clicks, purchases, likes—countless interactions—that could be easily modified and measured with software on centralized web servers. In 2005, C.-C. Wang and S. R. Kulkarni wrote Bandit problems with side observations, which combined sequential analysis and supervised learning into a method now called a contextual bandit (chapter 5).

    In 1975, J. Mockus wrote On the Bayes methods for seeking the extremal point, the foundation for Bayesian optimization (chapter 6), which takes an alternative approach to modeling a response surface and combines it with ideas from sequential analysis. The method was developed over the following decades by many researchers, including D. Jones et al., whose 1998 paper Efficient global optimization of expensive black-box functions applied modern ideas to the method, making it look much more like the approach presented in this book.

    In 2017, Vasant Dhar asked me to talk to his Trading Strategies and Systems class about high-frequency trading (HFT). He was gracious enough to allow me to focus specifically on the experimental optimization of HFT strategies. This was valuable to me because it gave me an opportunity to organize my thoughts and understanding of the topic—to pull together the various bits and pieces that I’d collected over the years. Slowly, those notes have grown into this book.

    I hope this book saves you some time by putting all the bits and pieces I’ve collected in one place and stitching them together into a single, coherent unit.

    acknowledgments

    I am grateful to so many people for their hard work, for their support, and for their faith that this book could be brought into existence.

    Thanks to Andrew Waldron, my acquisitions editor, for taking a chance on my proposal and on me. And thanks to Marjan Bace for giving it the thumbs-up.

    Thanks to Katherine Olstein, my first development editor, for tirelessly reading and rereading my drafts and providing invaluable feedback and instruction.

    Thank you to Karen Miller, my second development editor, and to Alain Couniot for technical editing. Thank you to Bert Bates for great high-level advice on writing a technical book, and to my technical proofreader, Ninoslav Čerkez. Thanks also to Matko Hrvatin, MEAP coordinator; Melissa Ice, development administrative support; Rebecca Rinehart, development manager; Mihaela Batinić, review editor; and Rejhana Markanović, development support.

    Thanks to Professor Dhar for entrusting his students to me and my new material. Thanks to Andy Catlin for believing that I could teach a brand-new class based on an incomplete book. And thank you to my students for being gracious beta testers and providing valuable, as-you’re-learning feedback that I couldn’t have found anywhere else.

    Several people sat with me for interviews. I appreciate the time and support of P.B., B.S., M.M., and Yan Wu (of Bond).

    Thank you to the many Manning Early Access Program (MEAP) participants who bought the book before it was finished, asked great questions, located errors, and made helpful suggestions.

    To all the reviewers: Achim Domma, Al Krinker, Amaresh Rajasekharan, Andrei Paleyes, Chris Heneghan, Dan Sheikh, Dimitrios Kouzis-Loukas, Eric Platon, Guillermo Alcantara Gonzalez, Ikechukwu Okonkwo, Ioannis Atsonios, Jeremy Chen, John Wood, Kim Falk, Luis Henrique Imagiire, Marc-Anthony Taylor, Matthew Macarty, Matthew Sarmiento, Maxim Volgin, Michael Kareev, Mike Jensen, Nick Vazquez, Oliver Korten, Patrick Goetz, Richard Tobias, Richard Vaughan, Roger Le, Satej Kumar Sahu, Sergio Govoni, Simone Sguazza, Steven Smith, William Jamir Silva, and Xiangbo Mao; your suggestions helped make this a better book.

    about this book

    Experimentation for Engineers teaches readers how to improve engineered systems using experimental methods. Experiments are run on live production systems, so they need to be done efficiently and with care. This book shows how.

    Who should read this book

    If you want to build things, you should also know how to evaluate them. This book is for machine learning engineers, quantitative traders, and software engineers looking to measure and improve the performance of whatever they’re building. Performance of the systems they build may be gauged by user behavior, revenue, speed, or similar metrics.

    You might already be working with an experimentation system at a tech or finance company and want to understand it more deeply. You might be planning or aspiring to work with or build such a system. Students entering industry might find that this book is an ideal introduction to industry practices.

    A reader should be comfortable with Python, NumPy, and undergraduate math (including basic linear algebra).

    How this book is organized: A road map

    Experimentation for Engineers is loosely organized into three pieces: an introduction (chapter 1), experimental methods (chapters 2-6), and information that applies to all methods (chapters 7 and 8).

    Chapter 1 motivates experimentation, describes how it fits in with other engineering practices, and introduces business metrics.

    Chapter 2 explains A/B testing and the fundamentals of experimentation.

    Chapter 3 shows how to speed up A/B testing with multi-armed bandits.

    Chapter 4 focuses on systems with numerical parameters and introduces the idea of a response surface.

    Chapter 5 uses a multi-armed bandit to optimize many parameters in the special case where metrics can be measured very frequently.

    Chapter 6 combines the concepts of a response surface and multi-armed bandits into a single method called Bayesian optimization.

    Chapter 7 talks more deeply about business metrics.

    Chapter 8 warns the reader about common pitfalls in experimentation and discusses mitigations.

    About the code

    This book contains many examples of source code both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text. Sometimes code is also in bold to highlight code that has changed from previous steps in the chapter, such as when a new feature adds to an existing line of code. In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and listings include line-continuation markers (➥). Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.

    You can get executable snippets of code from the liveBook (online) version of this book at https://livebook.manning.com/book/experimentation-for-engineers. The source code for all listings as well as generated figures is available on GitHub (https://github.com/dsweet99/e4e) inside Jupyter notebooks. You can always find your way there from the book’s web page at www.manning.com/books/experimentation-for-engineers. The code is written to Python 3.6.3, NumPy 1.21.2, and Jupyter 5.4.0.

    liveBook discussion forum

    Purchase of Experimentation for Engineers includes free access to liveBook, Manning’s online reading platform. Using liveBook’s exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It’s a snap to make notes for yourself, ask and answer technical questions, and receive help from the author and other users. To access the forum, go to https://livebook.manning.com/book/experimentation-for-engineers/discussion. You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/discussion.

    Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the author some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

    about the author

    David Sweet worked as a quantitative trader at GETCO and a machine learning engineer at Instagram, where he used experimental methods to optimize trading systems and recommender systems. This book is an extension of his lectures on quantitative trading systems given at NYU Stern. It also forms the basis for Experimental Optimization, a course he teaches in the AI and data science master’s programs at Yeshiva University. Before working in industry, he received a PhD in physics, publishing research in Physical Review Letters and Nature. The latter publication—an experiment demonstrating chaos in geometrical optics—has become a source of inspiration for computer graphics artists, a tool for undergraduate physics instruction, and an exhibit called TetraSphere at the Museum of Mathematics in New York City.

    about the cover illustration

    The figure on the cover of Experimentation for Engineers is Homme Sicilien, or Sicilian, taken from a collection by Jacques Grasset de Saint-Sauveur, published in 1788. Each illustration is finely drawn and colored by hand.

    In those days, it was easy to identify where people lived and what their trade or station in life was just by their dress. Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional culture centuries ago, brought back to life by pictures from collections such as this one.

    1 Optimizing systems by experiment

    This chapter covers

    Optimizing an engineered system

    Exploring what experiments are

    Learning why experiments are uniquely valuable

    The past 20 years have seen a surge in interest in the development of experimental methods used to measure and improve engineered systems, such as web products, automated trading systems, and software infrastructure. Experimental methods have become more automated and more efficient. They have scaled up to large systems like search engines or social media sites. These methods generate continuous, automated performance improvement of live production systems.

    Using these experimental methods, engineers measure the business impact of the changes they make to their systems and determine the optimal settings under which to run them. We call this process experimental optimization.

    This book teaches several experimental optimization methods used by engineers working in trading and technology. We’ll discuss systems built by three specific types of engineers:

    Machine learning engineers

    Quantitative traders (quants)

    Software engineers

    Machine learning engineers often work on web products like search engines, recommender systems, and ad placement systems. Quants build automated trading systems. Software engineers build infrastructure and tooling such as web servers, compilers, and event processing systems.

    These engineers follow a common process, or workflow, that is an endless loop of steady system improvement. Figure 1.1 shows this common workflow.


    Figure 1.1 Common engineering workflow. (1) A new idea is first implemented as a code change to the system. (2) Typically, some offline evaluation is performed that rejects ideas that are expected to negatively impact business metrics. (3) The change is pushed into the production system, and business metrics are measured there, online. Accepted changes become permanent parts of the system. The whole workflow repeats, creating reliable, continuous improvement of the system.

    The common workflow creates progressive improvement of an engineered system. An individual or a team generates ideas that they expect will improve the system, and they pass each idea through the workflow. Good ideas are accepted into the system, and bad ideas are rejected:

    Implement change—First, an engineer implements an idea as a code change, an update to the system’s software. In this stage, the code is subjected to typical software engineering quality controls, like code review and unit testing. If it passes all tests, it moves on to the next stage.

    Evaluate offline—The business impact of the code change is evaluated offline, away from the production system. This evaluation typically uses data previously logged by the production system to produce rough estimates of business metrics such as revenue or the expected number of clicks on an advertisement. If these estimates show that applying this code change to the production system would worsen business metrics, then the code change is rejected. Otherwise, it is passed to the final stage.

    Measure online—The change is pushed into production, where its impact on business metrics is measured (a minimal sketch of this stage follows the list). The code change might require some configuration—the setting of numerical parameters or Boolean flags. If so, the engineer will measure business metrics for multiple configurations to determine which is best. If no improvements to business metrics can be made by applying (and configuring) this code change, then the code change is rejected. Otherwise, the change is made permanent and the system improves.
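    To make this final stage concrete, here is a minimal, illustrative sketch of the accept/reject decision. It assumes per-user measurements of some business metric collected under the current system (control) and under the code change (treatment), and it accepts the change only if the measured improvement is large relative to the measurement noise. The numbers, group sizes, and the two-standard-error rule are assumptions made for illustration; chapter 2 develops the design and analysis of a real A/B test.

    import numpy as np

    rng = np.random.default_rng(17)

    # Hypothetical per-user business-metric measurements (e.g., revenue per session)
    # logged while each group was live in production.
    control = rng.normal(loc=1.00, scale=0.25, size=5000)    # current system
    treatment = rng.normal(loc=1.02, scale=0.25, size=5000)  # system with the code change

    # Difference in means and its standard error.
    delta = treatment.mean() - control.mean()
    se = np.sqrt(treatment.var(ddof=1) / len(treatment)
                 + control.var(ddof=1) / len(control))

    # Illustrative decision rule: accept the change only if the measured
    # improvement exceeds two standard errors.
    accept = delta > 2 * se
    print(f"delta = {delta:.4f}, se = {se:.4f}, accept = {accept}")

    The point is only that the decision is driven by noisy online measurements, so both the size of the effect and its uncertainty matter.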

    This book deals with the final stage, measure online. In this stage, you run an experiment on the live production system. Experimentation is valuable because it produces a measurement from the real system, which is information you couldn’t get any other way. But experimentation on a live system takes time. Some experiments take days or weeks to run. And it is not without risk. When you run an experiment, you may lose money, alienate users, or generate bad press or social media chatter as users notice and complain about the changes you’re making to your system. Therefore, you need to take measurements as quickly and precisely as possible to minimize the ill effects of ideas—call them costs for brevity—that don’t work and to take maximal advantage of ones that do.

    To extract the most value from a new bit of code, you need to configure it optimally. You could liken the process of finding the best configuration to tuning an old AM or FM radio or tuning a guitar string. You typically turn a knob up and down and listen to see whether you’re getting good results. Set the knob too high or too low and your radio will be noisy, or your guitar will be sharp or flat. So it is with code configuration parameters (often referred to as knobs in code your author has read). You want them set to just the right values to give maximal business impact—whether that’s revenue or clicks or some other metric. Note that the need to run costly experiments is what defines experimental optimization methods as a subset of optimization methods more generally.
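    As a toy illustration of tuning a knob by experiment, the sketch below measures a business metric at a handful of candidate knob settings and keeps the setting with the best measurement. The function measure_business_metric is a stand-in for a live production measurement (here, a made-up noisy response), and the grid of candidate values is an arbitrary assumption; chapters 4 and 6 show how to search much more efficiently with response surface methodology and Bayesian optimization.

    import numpy as np

    rng = np.random.default_rng(7)

    def measure_business_metric(knob, n_users=2000):
        # Stand-in for a live production measurement. The (unknown) true response
        # peaks near knob = 0.6; each measurement averages over noisy users.
        true_value = 1.0 - (knob - 0.6) ** 2
        return true_value + rng.normal(scale=0.3 / np.sqrt(n_users))

    # Try a handful of candidate knob settings and record the measured metric.
    candidates = np.linspace(0.0, 1.0, 6)
    measurements = np.array([measure_business_metric(k) for k in candidates])

    best = candidates[measurements.argmax()]
    print(f"best measured knob setting: {best:.2f}")

    Each call to measure_business_metric here costs nothing, but in production each candidate setting costs real time, revenue, and user experience, which is why the later chapters focus on getting the most out of every measurement.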

    In this chapter, we’ll discuss engineering workflows for each of the engineer types listed earlier—machine learning engineer (MLE), quant, and software engineer (SWE). We’ll see what kinds of systems they work on, the business metrics they measure, and how each stage of the generic workflow is implemented.

    In your organization, you might hear of alternative ways of evaluating changes to a system. Common suggestions are domain knowledge, model-based estimates, and simulation. We’ll discuss the reason why these tools, while valuable, can’t substitute for an experimental measurement.

    1.1 Examples of engineering workflows

    While the engineers listed earlier may work in different domains, their overall workflows are similar. Their workflows can be seen as specific cases of the common engineering workflow we described in figure 1.1: implement change, evaluate offline, measure online. Let’s look in detail at an example workflow for an MLE, for a quant, and for an SWE.

    1.1.1 Machine learning engineer’s workflow

    Imagine an MLE who works on a web-based news site. Their workflow might look like figure 1.2.


    Figure 1.2 Example workflow for a machine learning engineer building a news-based website. The site contains an ML component that predicts clicks on news articles. (1) The MLE fits a new predictor. (2) An estimate of ad revenue from the new predictor is made using logs of user clicks and ad rates. (3) The new predictor is deployed to production and actual ad revenue is measured. If it improves ad revenue, then it is accepted into the system.

    The key machine learning (ML) component of the website is a predictor model that predicts which news articles a user will click on. The predictor might take as input many features, such as information about the user’s demographics, the user’s previous activity on the website, and information about the news article’s title or its content. The predictor’s output will be an estimate of the probability that a specific user will click on a given news article. The website could use those predictions to rank and sort news articles on a headlines-summary page, hoping to put more appealing news higher up on the page.
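    As a purely illustrative sketch of such a predictor, the code below scores a few candidate articles for one user with a simple logistic model and ranks them by predicted click probability. The feature names, weights, and model form are assumptions invented for this example, not a production predictor.

    import numpy as np

    def predict_click_probability(features, weights, bias):
        # Simple logistic model: probability that this user clicks each article.
        return 1.0 / (1.0 + np.exp(-(features @ weights + bias)))

    # Made-up weights for three features:
    # [user's historical click rate, topic match with user's interests, article age in days]
    weights = np.array([2.0, 1.5, -0.3])
    bias = -2.0

    # One user, four candidate articles (one row of features per article).
    articles = np.array([
        [0.10, 0.9, 0.5],
        [0.10, 0.2, 0.1],
        [0.10, 0.7, 3.0],
        [0.10, 0.5, 0.2],
    ])

    p_click = predict_click_probability(articles, weights, bias)
    ranking = np.argsort(-p_click)  # most appealing articles first
    print(p_click, ranking)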

    Figure 1.2 depicts the workflow for this system. When the MLE comes up with an idea to improve the predictor—a new feature or a new model type—the idea is subjected to the workflow:

    Implement change—The MLE fits the new predictor to logged data. If it produces better predictions on the logged data than the previous predictor, it passes to the next stage.

    Evaluate offline—The business goal is to increase revenue from ads that run on the website, not simply to improve click predictions. Translating improved predictions into improved revenue is not straightforward, but methods exist that give useful estimates for some systems. If the estimates do not look very bad, the predictor will pass on to the next stage.

    Measure online—The MLE deploys the predictor to production, and real users see their headlines ranked with it. The MLE measures the ad revenue and compares it to the ad revenue produced by
