The Lindahl Letter: 3 Years of AI/ML Research Notes
By Nels Lindahl
The Lindahl Letter: 3 Years of AI/ML Research Notes
ALSO BY NELS LINDAHL
Nonfiction Books
Graduation With Civic Honors
Responsive E-Government
The Lindahl Letter: On Machine Learning
The Lindahl Letter: 104 Machine Learning Posts
Contact Strategy for Campaigns
Fiction Books
United Earth Chronicles: A Novella
Upper Bound Chronicles: A Novella
Dream Chaser Archives: A Novella
Forbidden Stones: A novella from the Taceo Loquaxi Chronicles
Jupiter Darkly: A novella
The Lindahl Letter: 3 Years of AI/ML Research Notes
By Nels Lindahl
It takes someone willing to work just beyond the edge of what is possible to accomplish something meaningful.
~ Nels Lindahl ~
The Lindahl Letter: 3 Years of AI/ML Research Notes
All Rights Reserved © 2024 by Nels Lindahl
No part of this book may be reproduced or transmitted in any form or by any means, graphic, electronic, or mechanical, including photocopying, recording, taping, or by any information storage or retrieval system, without the permission in writing from the publisher.
Published by Nels Lindahl
Printed in the United States of America
ISBN: 978-1-304-69279-5
Imprint: Lulu.com
Dedication
2021: To those who look at what could be and thread the needle to get there…
2022: To those who work toward meeting opportunity with a singular devotion to it. That focus should yield some interesting results.
2023: We made it to the end of 3 years. Fun times. Thank you to everybody who was along for the ride.
Acknowledgments
Thanks to all my Substack readers over the last three years who helped inspire this effort to create 156 posts. It was a great bonus to get feedback along the way while writing a book one week at a time. This three-year journey involved a lot of learning, and the journey itself was a huge part of the adventure.
Table of Contents
Dedication
Acknowledgments
Preface
Acronyms
Substack Week 1: Machine learning return on investment
Substack Week 2: Machine learning frameworks & pipelines
Substack Week 3: Machine learning teams
Substack Week 4: Have a machine learning strategy…revisited
Substack Week 5: Let your ROI drive a fact-based decision-making process
Substack Week 6: Understand the ongoing cost and success criteria as part of your machine learning strategy
Substack Week 7: Plan to grow based on successful ROI
Substack Week 8: Is the machine learning we need everywhere now?
Substack Week 9: Valuing machine learning use cases based on scale
Substack Week 10: Model extensibility for few-shot GPT-2
Substack Week 11: What is machine learning scale? The where and the when of machine learning usage
Substack Week 12: Confounding within multiple machine learning model deployments
Substack Week 13: Building out your MLOps
Substack Week 14: My Ai4 Healthcare NYC 2019 talk revisited
Substack Week 15: What are people really doing with machine learning?
Substack Week 16: Ongoing machine learning cloud costs
Substack Week 17: Figuring out machine learning readiness
Substack Week 18: Could machine learning predict the lottery?
Substack Week 19: Fear of missing out on machine learning
Substack Week 20: Week 20 Lindahl Letter recap edition
Substack Week 21: Doing machine learning work
Substack Week 22: Machine learning graphics
Substack Week 23: Fairness and machine learning
Substack Week 24: Evaluating machine learning
Substack Week 25: Teaching kids
Substack Week 26: Machine learning as a service
Substack Week 27: The future of machine learning
Substack Week 28: Machine learning certifications?
Substack Week 29: Machine learning feature selection
Substack Week 30: Integrations and your machine learning layer
Substack Week 31: Edge machine learning integrations
Substack Week 32: Federating your machine learning models
Substack Week 33: Where are AI investments coming from?
Substack Week 34: Where are the main AI labs?
Substack Week 35: Explainability in modern machine learning
Substack Week 36: AIOps/MLOps: Consumption of AI services versus operations
Substack Week 37: Reverse engineering GPT-2 or GPT-3
Substack Week 38: Do most machine learning projects fail?
Substack Week 39: Machine learning security
Substack Week 40: Applied machine learning skills
Substack Week 41: Machine learning and the metaverse
Substack Week 42: Time crystals and machine learning
Substack Week 43: Practical machine learning
Substack Week 44: Machine learning salaries
Substack Week 45: Prompt engineering and machine learning
Substack Week 46: Machine learning and deep learning
Substack Week 47: Anomaly detection and machine learning
Substack Week 48: Machine learning applications revisited
Substack Week 49: Machine learning assets
Substack Week 50: Is machine learning the new oil?
Substack Week 51: What is scientific machine learning?
Substack Week 52: That one with a machine learning post
Substack Week 53: Machine learning interview questions
Substack Week 54: What is a chief AI officer?
Substack Week 55: Who is acquiring machine learning patents?
Substack Week 56: Comparative analysis of national AI strategies
Substack Week 57: How would I compose a machine learning syllabus?
Substack Week 58: Teaching or training machine learning skills
Substack Week 59: Multimodal machine learning revisited
Substack Week 60: General AI
Substack Week 61: AI network platforms
Substack Week 62: Touching the singularity
Substack Week 63: Sentiment and consensus analysis
Substack Week 64: Language models revisited
Substack Week 65: Ethics in machine learning
Substack Week 66: Does a digital divide in machine learning exist?
Substack Week 67: My thoughts on NFTs
Substack Week 68: Publishing a model or selling the API?
Substack Week 69: A machine learning cookbook?
Substack Week 70: Web3, the decentralized internet
Substack Week 71: What are the best machine learning newsletters?
Substack Week 72: Open source machine learning security plus the machine learning and surveillance bonus issue
Substack Week 73: Symbolic machine learning
Substack Week 74: Machine learning content automation
Substack Week 75: Is machine learning destroying engineering colleges?
Substack Week 76: What is post-theory science?
Substack Week 77: Is quantum machine learning gaining momentum?
Substack Week 78: Trust and the future of digital photography
Substack Week 79: Why is diffusion so popular?
Substack Week 80: Bayesian optimization (Introduction to Machine Learning syllabus edition 1 of 8)
Substack Week 81: A machine learning literature review (Introduction to Machine Learning syllabus edition 2 of 8)
Substack Week 82: Machine learning algorithms (Introduction to Machine Learning syllabus edition 3 of 8)
Substack Week 83: Machine learning approaches (Introduction to Machine Learning syllabus edition 4 of 8)
Substack Week 84: Neural networks (Introduction to Machine Learning syllabus edition 5 of 8)
Substack Week 85: Neuroscience (Introduction to Machine Learning syllabus edition 6 of 8)
Substack Week 86: Ethics, fairness, bias, and privacy (Introduction to Machine Learning syllabus edition 7 of 8)
Substack Week 87: MLOps (Introduction to Machine Learning syllabus edition 8 of 8)
Substack Week 88: The future of academic publishing
Substack Week 89: Your machine learning model is not an AGI
Substack Week 90: What is probabilistic machine learning?
Substack Week 91: What are ensemble machine learning models?
Substack Week 92: We have a National AI Advisory Committee
Substack Week 93: Papers critical of machine learning
Substack Week 94: AI hardware (reduced instruction set computer [RISC]-V AI Chips)
Substack Week 95: Getting to quantum machine learning
Substack Week 96: Generative AI: Where are large language models going?
Substack Week 97: MIT’s Twist Quantum programming language
Substack Week 98: My thoughts on ChatGPT
Substack Week 99: Deep generative models
Substack Week 100: Overcrowding and machine learning
Substack Week 101: Back to the ROI for machine learning
Substack Week 102: Machine learning pracademics
Substack Week 103: Rethinking the future of machine learning
Substack Week 104: That second year of posting recap
Substack Week 105: Building out a better backlog
Substack Week 106: Code generating systems
Substack Week 107: Highly cited AI papers
Substack Week 108: Twitter as a company probably would not happen today
Substack Week 109: Robots in the house
Substack Week 110: Chatbots and understanding knowledge graphs
Substack Week 111: Natural language processing
Substack Week 112: Autonomous vehicles
Substack Week 113: Structuring an introduction to AI ethics
Substack Week 114: How does confidential computing work?
Substack Week 115: A literature review of modern polling methodology
Substack Week 116: A literature study of mail polling methodology
Substack Week 117: A literature study of non-mail polling methodology
Substack Week 118: A paper on political debt as a concept vs. technical debt
Substack Week 119: All that bad data abounds
Substack Week 120: That one with an obligatory AI trends post
Substack Week 121: Considering an independent study applied AI syllabus
Substack Week 122: AIaaS: Will AI be a platform or a service? Auto-GPT will disrupt both
Substack Week 123: We are wholesale oversubscribed on AI related content
Substack Week 124: Pivoting to AI + Security
Substack Week 125: Profiling OpenAI Security
Substack Week 126: Profiling Hugging Face Security
Substack Week 127: Profiling Google DeepMind Security
Substack Week 128: Democratizing AI system security
Substack Week 129: How do you use Colab in a generative way?
Substack Week 130: Build captain fractal using Colab
Substack Week 131: Bulk image improvement
Substack Week 132: Synthetic data notebooks
Substack Week 133: Automated survey methods
Substack Week 134: The chalk model for predicting elections
Substack Week 135: Polling aggregation models
Substack Week 136: Econometric election models
Substack Week 137: Tracking political registrations
Substack Week 138: Election prediction markets
Substack Week 139: Machine learning election models
Substack Week 140: Proxy models for elections
Substack Week 141: Building generative AI chatbots
Substack Week 142: Learning LangChain
Substack Week 143: Synthetic social media analysis
Substack Week 144: Knowledge graphs vs. vector databases
Substack Week 145: Delphi method & Door-to-door canvassing
Substack Week 146: Election simulations & Expert opinions
Substack Week 147: Bayesian Models and Elections
Substack Week 148: Happy Thanksgiving 2023
Substack Week 149: All that 48th week of the year wildness
Substack Week 150: Unraveling Emotions in the Digital Age
Substack Week 151: Decoding the Electorate
Substack Week 152: Beyond the Ballot
Substack Week 153: That one with a Happy Holidays Post
Substack Week 154: Revolutionizing Predictions
Substack Week 155: 3 Years on Substack
Substack Week 156: My 2024 Predictions
Postlogue
About the author
Preface
Greetings, still inspired reader of now additional machine learning glory. Here we are again this year with a fresh collection of machine learning related posts from my Substack series, The Lindahl Letter. Last year, when the first version of this manuscript was being drafted, I thought that everybody seemed to be writing a Substack series. This year some drama on Twitter happened, and a flood of new writers tumbled over into the world of Substack. Within the machine learning space, more Substack series are now being written than I can possibly read each week. Thanks to generative models like DALL-E 2 from OpenAI and the release of the interesting ChatGPT interface, machine learning and artificial intelligence (AI) have recently camped out in the public mind.
This series of weekly posts was compiled into a manuscript. Each post has been edited from its original form into a more publication-friendly format. Substack as a platform provides a lot of freedom to include links and embedded content, which does not translate to the printed page.
One of the things I did notice is that the first few posts were much longer than the ones at the end of the series. It appears that when I started writing about machine learning in general, I had much more to say. Given that each week is written to be independently consumed, all of my references are handled in a straightforward footnote style. Any aside, link, or acknowledgment of another author happens in the footnotes ending the chapter.
Things in this highly technology-driven space are changing rapidly. I included the date of publication as a frame of reference.
Dr. Nels Lindahl
Broomfield, Colorado
December 12, 2022 @ 6:02 AM
Acronyms
AGI artificial general intelligence
AI artificial intelligence
AIOps artificial intelligence operations
ANN artificial neural network
API application programming interface
AWS Amazon Web Services
CAIO chief artificial intelligence officer
CNN convolutional neural network
CSAIL Computer Science and Artificial Intelligence Laboratory
DAIR Distributed Artificial Intelligence Research Institute
DBN deep belief network
DQNN deeply quantized neural network
GAN generative adversarial network
GCP Google Cloud Platform
GPT generative pre-trained transformer
KPI key performance indicator
MLOps machine learning operations
MNN modular neural network
NFT non-fungible token
RISC reduced instruction set computer
RNN recurrent neural network
ROI return on investment
SciML scientific machine learning
SNN simulated neural network
SONN self-organizing neural network
TFX TensorFlow Extended
Substack Week 1: Machine learning return on investment
Published on January 29, 2021
Be strategic with your machine learning efforts.
Be strategic with your machine learning efforts.
Seriously, those seven words should guide your next steps along the machine learning journey. Take a moment and let that direction (strong guidance) sink in and reflect on what it really means for your organization. You have to take a moment and work backward from building strategic value for your organization to the actual machine learning effort you are undertaking. Inside that effort you will quickly discover that operationalizing machine learning efforts to generate strategic value will rely on a solid plan for return on investment (ROI). Make sure you are beginning with that end in mind to increase your chances of success. Taking actions within an organization of any kind at the scale machine learning is capable of delivering, without understanding the potential ROI or potential loss, is highly questionable. That is why you have to be strategic with your machine learning efforts from start to finish.
You have to set up and run a machine learning strategy from the top down. Executive leadership has to understand and be invested in guiding things toward the right path (a truly strategic path) from the start. Begin with a solid strategy in the machine learning space; it might sound harder than it is in practice. You don't need a complicated center of excellence or a massive investment to develop a strategy. Your strategy just needs to be linked to the budget, ideally through a budget key performance indicator (KPI). Every budget spends precious funds, and keeping a solid KPI around machine learning ROI levels will help ensure your strategy rests on a strong financial footing for years to come. All spending of an organization's precious resources should translate to a KPI of some type. That is how your results will confirm that funding is being spent well and that solid decision-making is occurring. When you operationalize the organization's strategic vision to align financially with the budget, you have to ensure that all spending is tied to that framework.
That means that the machine learning strategy you are investing in has to be driven to achieve a certain ROI tied directly to solid budget-level KPIs. You might feel like that line has been repeated. If you noticed that repetition, then you are paying attention and well on your way to future success. Reading comprehension goes a long way toward translating written argument into action. That KPI-related tieback is only going to happen with a solid machine learning strategy in place, one based on prioritizing and planning for ROI. Your machine learning pipelines and frameworks have to be aligned toward that goal. That is ultimately the cornerstone of a solid strategic plan when it comes to implementing machine learning as part of a long-term strategy.
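As a concrete illustration, the budget-level tieback can be as simple as computing ROI from the costs and benefits attributed to a use case and tracking the result as a KPI. A minimal sketch in Python, where every figure is a hypothetical placeholder rather than a real benchmark:

```python
# A minimal sketch of tying a machine learning effort to a budget-level
# ROI KPI. All figures below are hypothetical placeholders.

def roi(total_benefit: float, total_cost: float) -> float:
    """Classic ROI: net gain divided by cost."""
    return (total_benefit - total_cost) / total_cost

# Hypothetical yearly figures for one ML use case.
annual_benefit = 180_000.0  # value the model adds (hours saved, revenue)
annual_cost = 120_000.0     # cloud, licensing, and team time

kpi = roi(annual_benefit, annual_cost)
print(f"ROI KPI: {kpi:.0%}")  # prints "ROI KPI: 50%"
```

The point is not the arithmetic; it is that the number lands in the same reporting framework as every other budget KPI, so the machine learning spend is judged the same way all other spend is.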
We are about 500 words into this book, and it might be time to recap the message delivered so far. Be ready to do things in a definable and repeatable way; that is the essence of where quality comes from when executing a strategy. You have to know what plan is being executed and focus on and support the plan in ways that make it successful at your desired run rate. In terms of deploying machine learning efforts within an enterprise, you have to figure out how the technology is going to be set up and invested in, and how that investment is going to translate to use cases with the right ROI.
Know the business value for the use case instead of letting solutions chase problems. Just because you can do a thing does not always mean that you should. Having the ability to deploy a technology creates the temptation to let a technology-based solution go chasing a problem. Building up machine learning technology in a very theoretical, lab-based way and then hunting for use cases is a terrible way to accidentally stumble on an ROI model that works. The better way forward is to know the use cases and have a solid strategy for applying your technology. That means finding the right machine learning frameworks and pipelines to support your use cases in powerful ways across the entire organization.
This is a time to be planful. Right here, right now, in this moment of consideration you can elect to be planful going forward. Technology for machine learning is becoming increasingly available and plentiful. No code, low code, and solidly integrated solutions are becoming omnipresent in the technology landscape. Teams from all over the organization are probably wanting to try proofs of concept, and vendors are bringing in a variety of options. People are always ready to pitch the value of machine learning to the organization. Both internal and external options are plentiful. It is an amazing time for applied machine learning. You can get into the game in a variety of ways rapidly and without a ton of effort. Getting your implementation right and having the data, pipeline, and frameworks aligned to your maximum possible results involves planning and solid execution.
Your machine learning strategy cannot be a back-of-the-desk project. You have to be strategic. It has to be part of a broader strategy. You cannot let all of the proofs of concept and vendor plays drive the adoption of machine learning technology in your organization. If adoption happens that way, the overall strategic vision is never defined. Bottom-up adoption might still stumble on a solid ROI, and the right use case might even get selected by chance, but that is not a planful strategy.
Know the workflow you want to augment with machine learning and drive beyond the buzzwords to see the technology in action. You really have to know where in the workflow machine learning fits and which pipelines are going to enable your use cases to provide that solid ROI.
At some point along the machine learning journey you are going to need to make some decisions…
Q: Where are you going to serve the machine learning model from?
Q: Is this your first model build and deployment?
Q: What actual deployments of model serving are being managed?
Q: Are you training on-premises, or calling an application programming interface (API) for model serving in your workflow?
Q: Have you elected to use a pre-trained model via an external API call?
Q: Did you buy a model from a marketplace, or are you buying access to a commercial API?
Q: How long before the model efficiency drops off and adjustment is required?
Q: Have you calculated where the point of no return is for model efficiency where ROI falls below break-even?
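The last two questions above can be approached with a simple projection. A minimal sketch, assuming model value decays by a constant fraction each month; the decay rate, dollar figures, and horizon are all hypothetical placeholders you would replace with measurements from your own deployment:

```python
def months_until_roi_breakeven(monthly_value, monthly_cost,
                               monthly_decay, horizon=60):
    """Return the first month where the model's value drops below its
    running cost (the 'point of no return' for ROI), or None if it
    stays above break-even for the whole horizon.

    monthly_decay is the fraction of value lost each month, e.g. 0.05.
    """
    value = monthly_value
    for month in range(1, horizon + 1):
        if value < monthly_cost:
            return month
        value *= (1.0 - monthly_decay)
    return None

# Hypothetical example: $20k/month of value, $12k/month of cost,
# and 5% efficiency loss per month without retraining.
print(months_until_roi_breakeven(20_000, 12_000, 0.05))  # prints 11
```

Even a crude projection like this turns "the model will degrade eventually" into a date on the calendar, which is what a budget-level KPI conversation needs.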
Substack Week 2: Machine learning frameworks & pipelines
Published on February 5, 2021
Ecosystems are beginning to develop around machine learning pipelines. Different platforms (companies) are building out different methods to manage the machine learning frameworks and pipelines they support. Now is the time for your organization to get that effort going. You can build out an easy-to-manage, end-to-end method for feeding model updates to production. If you stopped reading this manuscript for a moment and started doing research or spinning things up, then you probably ended up using a TensorFlow Serving instance you installed, an Amazon SageMaker pipeline, or an Azure Machine Learning pipeline [1]. Any of those methods will get you up and running, and they have communities of practice to provide support [2]. That is to say, the road you are traveling has been used before and used at scale. The path toward using machine learning frameworks and pipelines is pretty clearly established. People are doing that right now. They are building things for fun. They have things in production. While all of that is occurring in the wild, a ton of orchestration and pipeline management companies are jumping to the forefront of the business world right now [3].
Get going on your machine learning journey. One way to get going very quickly and start to really think about how to make this happen is to download TensorFlow Extended (TFX) from GitHub as your pipeline platform, on your own hardware or on some type of cloud instance [4]. You can just as easily go cloud native and build out your technology without boxes in your datacenter or at your desk. You could spin up on Google Cloud Platform (GCP), Azure, or Amazon Web Services (AWS) without any real friction. Some of your folks might just set up local versions of these things to mess around and do some development along the way.
Build models. You could of course buy a model [5]. Steps exist to help you build a model, but without models that exercise the entire apparatus, all of the machine learning pipeline setup steps are rather academic. One way to introduce machine learning to the relevant workflow for your use case is to integrate with an API and avoid having to set up frameworks and pipelines at all. That is one way to go about it, and for some things it makes a lot of sense. For other machine learning efforts, complexity will preclude using an out-of-the-box solution with a callable API. You would be surprised at how many complex APIs are being offered these days, but they do not provide comprehensive coverage for all use cases [6].
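An API-based integration can be as small as one authenticated POST per prediction. Here is a minimal sketch using only the Python standard library; the endpoint URL, bearer-token auth scheme, and JSON request shape are hypothetical stand-ins for whatever your chosen provider actually documents:

```python
import json
import urllib.request

# Hypothetical endpoint: substitute your provider's real URL,
# auth scheme, and request schema.
ENDPOINT = "https://api.example.com/v1/sentiment"

def build_request(text, api_key):
    """Package one prediction call against a hosted model API."""
    payload = json.dumps({"instances": [{"text": text}]}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def predict(text, api_key):
    """Send the call and decode the JSON response (network required)."""
    with urllib.request.urlopen(build_request(text, api_key)) as resp:
        return json.load(resp)

# Inspect the assembled call without sending it anywhere.
req = build_request("The rollout went smoothly.", api_key="demo-key")
print(req.get_method(), req.full_url)
```

The appeal of this route is that the pipeline, serving, and retraining complexity all live on the provider's side; the trade-off is that your ROI now depends on their pricing and their model quality.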
What are you going to do with all those models? You are going to need to save them for serving. Getting set up with a solid framework and machine learning pipeline is all about serving up those models within workflows that fulfill use cases with defined and predictable ROI models.
From the point you implement, it is going to be a race against time to figure out when those models from the marketplace suffer an efficiency drop and some type of adjustment is required. You have to understand the potential model degradation and calculate at what point you have to shut down the effort due to ROI conditions being violated [7]. That might sound a little bit hard, but if your model efficiency degrades to the point that financial outcomes are being negatively impacted, you will want to know how to flip the off switch, and you might be wondering why that switch was not automated.
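That "off switch" can be automated with a simple rule over measured ROI. A minimal sketch, where the break-even threshold and the grace period are hypothetical knobs you would tune to your own use case and measurement cadence:

```python
def should_disable(recent_roi, breakeven=0.0, grace_periods=2):
    """Flip the 'off switch' when measured ROI has stayed below
    break-even for more than `grace_periods` consecutive periods.

    recent_roi: per-period ROI measurements, oldest first.
    Thresholds here are hypothetical; tune them for your use case.
    """
    streak = 0
    for value in recent_roi:
        streak = streak + 1 if value < breakeven else 0
    return streak > grace_periods

# One bad month is tolerated; a sustained slide triggers shutdown.
print(should_disable([0.4, 0.3, -0.1, 0.2]))    # prints False
print(should_disable([0.3, -0.1, -0.2, -0.3]))  # prints True
```

The grace period exists so a single noisy measurement does not take a healthy workload offline; a sustained slide below break-even is the signal worth acting on automatically.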
Along the way some type of adjustment to a model or parameters is going to be required. To recap, the way I look at ROI is pretty straightforward. You have to consider the value of the machine learning model in terms of what was invested in it and what you can potentially get out of it. It’s just going to give you a positive or negative look at whether that ROI is going to be there for you. At that point you are just following your strategy and thinking about the ROI model.
So again, strict ROI modeling may not be the method that you want to use. I would caution against working for long periods without understanding the financial consequences. At scale, you can very quickly create breakdowns and other problems within a machine learning use case. It could even go so far that you may not find it worthwhile for your business case. Inserting machine learning into a workflow might not be the right thing to do, and that is why calculating results and making fact-based decisions is so important.
Really, any way you do it in a planful way that is definable and repeatable is going to work out great. That is fairly easy to say, given that fact-based decision-making and a willingness to hit the off switch if necessary help prevent runaway problems from becoming existential threats to the business. So having a machine learning strategy, doing things in a definable and repeatable way, and being ruthlessly fact based is what I'm suggesting.
Obviously, you have to take everything that I say with a grain of salt; you should know up front that I'm a big TensorFlow enthusiast. That's one of the reasons why I use it as my primary example, but it doesn't mean that it's the absolute right answer for you. It's just the answer that I look at most frequently and always look to first before branching out to other solutions. That choice is always based on the use case, and I avoid letting technology go searching for problems at all costs. You need to let the solution fit the use case and the problem at hand instead of applying solutions until one works or you give up.
At this point in the story, you are thinking about or beginning to build this out, and you're starting to get ramped up. The excitement is probably building to a crescendo of some sort. Now you need somewhere to manage your models. You may need to imagine for a moment that you do have models. Maybe you bought them from a marketplace and skipped training altogether. It's an exciting time, and you are ready to get going. So in this example, you're going from just building (or having recently acquired) a machine learning model to doing something with it. At that moment, you are probably realizing that you need to serve that model out over and over again to create an actual machine-learning-driven workload. That means you are not only going to manage those models, but also going to need to serve out different models over time.
As you make adjustments and corrections that introduce different modeling techniques, you get more advanced with what you are trying to implement. One of the things you’ll find is that even the perfect model, the one that was right where you wanted it to be when you launched, is slowly waiting to betray you and your confidence in it by degrading. You have to be ready to monitor and evaluate performance based on your use case. That is what lets you make sound decisions about model quality and how outcomes are being impacted.
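One common way to quantify that degradation is a drift statistic such as the Population Stability Index (PSI), computed between a baseline sample of model scores and a recent one. A minimal pure-Python sketch follows; the bin count, the 1e-6 floor, the rule-of-thumb thresholds, and the sample data are illustrative assumptions:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two score samples.

    A common rule of thumb (an assumption here, not a universal standard):
    PSI < 0.1 is stable, 0.1-0.25 is worth watching, > 0.25 is major drift.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def frac(sample, i):
        # clamp the top edge into the last bin
        bin_of = lambda v: min(int((v - lo) / width), bins - 1)
        count = sum(1 for v in sample if bin_of(v) == i)
        return max(count / len(sample), 1e-6)  # floor avoids log(0)

    return sum(
        (frac(actual, i) - frac(expected, i))
        * math.log(frac(actual, i) / frac(expected, i))
        for i in range(bins)
    )

baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # scores at launch
today = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]     # scores this week
print(f"PSI: {psi(baseline, today):.3f}")
```

Scheduling a check like this against live scores is one concrete way to catch the model quietly betraying you before outcomes are visibly impacted.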
I have a few takeaways to conclude this installment of The Lindahl Letter. You have to remember that at this point machine learning models and pipelines are pretty much democratized. You can get them. They are out in the wild. People are using them in all kinds of different ways. You can just go ahead and introduce this technology to your organization with relatively little friction.
Footnotes:
[1] Links to the referenced machine learning pipelines: https://www.tensorflow.org/tfx, https://aws.amazon.com/sagemaker/pipelines/ or https://docs.microsoft.com/en-us/azure/machine-learning/concept-ml-pipelines
[2] One of the best places to start to learn about machine learning communities would be https://www.kaggle.com/
[3] Read this if you have a few minutes, it is worth the read: Agrawal, A., Gans, J., & Goldfarb, A. (2020, September). How to win with machine learning. Harvard Business Review. https://hbr.org/2020/09/how-to-win-with-machine-learning
[4] https://github.com/tensorflow/tfx
[5] This is one of the bigger ones: https://aws.amazon.com/marketplace/solutions/machine-learning
[6] This is one example of services that are open for business right now: https://cloud.google.com/products/ai
[7] This is a wonderful site and the article is spot on: Shendre, S. (2020, May 13). Model drift in machine learning. Towards Data Science. https://towardsdatascience.com/model-drift-in-machine-learning-models-8f7e7413b563
Substack Week 3: Machine learning teams
Published on February 12, 2021
Learning. Keep learning. Being a lifelong learner is a solid plan. I do honestly believe we live in a golden age of machine learning training and talent. People are selflessly sharing world-class training materials and code on GitHub. The number of academic papers being shared is skyrocketing [1]. If you are able to start sprinkling in a little bit of expertise as you go, you can start building teams from the ground up. That starts by helping people invest in understanding this type of machine learning knowledge. That sets the foundation for them to build the machine learning pipelines you need with something like the TFX examples [2]. That is one way to help them understand how those pipelines work for building and deploying models. I think all the training to be able to do that is amazingly available online; people have been super gracious in sharing that kind of knowledge.
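The shape of a pipeline like the TFX examples can be sketched with plain-Python stand-ins for the component stages (ExampleGen, Transform, Trainer, Evaluator). To be clear, this is a conceptual sketch, not the actual TFX API, and the taxi-style data and the closed-form "training" are toy assumptions:

```python
# A toy pipeline mirroring the component flow of a TFX example
# (ExampleGen -> Transform -> Trainer -> Evaluator).

def ingest():
    # stand-in for ExampleGen: pull raw records
    return [{"miles": 2.0, "fare": 8.0}, {"miles": 5.0, "fare": 17.0}]

def transform(rows):
    # stand-in for Transform: derive (feature, label) pairs
    return [(r["miles"], r["fare"]) for r in rows]

def train(examples):
    # stand-in for Trainer: fit fare ~ rate * miles with a closed-form estimate
    rate = sum(f for _, f in examples) / sum(m for m, _ in examples)
    return lambda miles: rate * miles

def evaluate(model, examples):
    # stand-in for Evaluator: mean absolute error
    return sum(abs(model(m) - f) for m, f in examples) / len(examples)

examples = transform(ingest())
model = train(examples)
print("MAE:", evaluate(model, examples))
```

Seeing the stages laid out this way makes the real TFX components much less intimidating when a new team first opens the example pipelines.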
The hard part is pairing that knowledge with deep machine learning skills. That means you are going to need to start finding the subject-matter expert knowledge within your organization. When you start pairing people who know the domain at a super granular, deep level with people who can work with models and do deployments, things will take off. It will probably end up being a team effort. You have deep knowledge from subject-matter experts and from machine learning experts, and combining the two groups in a team is how things get done in practice. You may also want to bring in someone who is a real expert at building different layers and networks; somebody able to build truly refined models can augment your team and speed things up. Sometimes you have to bring in the right folks to jumpstart things along the way.
1. Where does the talent come from?
Personally, I believe you can build the talent from within your organization. If you doubt that assertion, then make a mental note to really challenge that bias against internal growth and development. Throughout my career, I have built a proven track record of helping grow internal talent. It works, but it requires investing time and having the right programs aligned to building toolkits. One of my proudest professional accomplishments is seeing somebody get promoted. It is an amazing thing to see people start bringing advanced methods and techniques to different parts of the organization. You can capture that feeling as well. You just need to go out and start building out the toolkits of the people that you have. Really take the time to invest in them growing and developing as teammates and as individual contributors.
We are in the golden age of learning about machine learning [3]. More training than you can possibly consume now exists online, and it exists in a variety of different forms. Among my favorites are the hands-on labs now available online. The platform I have used the most is Coursera. People have built out well-tooled examples of how to do machine learning. Not only can you read about it, but you can get into examples and kick the tires. That is the thing that has drawn me to TensorFlow since the product launched. So many people have been so generous with their knowledge, skills, and abilities [4]. They are sharing the keys to the machine learning kingdom online in accessible classes, lectures, and even a few certificates. I have taken over 50 courses. You can see them on my LinkedIn profile if you really want to dig into the road I traveled. That will show you which ones I invested my own time in completing. You can also just check the links in the footnotes to find a place to start.
Sometimes building internal teams is just not fast enough. It takes time to help internal talent develop world-class skills in machine learning—or anything, for that matter. I recognize that is a long-term goal and something you have to build toward along the way. You have a few options to start looking for ways to supplement talent. One of those ways is to hire contractors and have them help you kickstart your endeavor. Another way is to find the right product or company to help you get going fast. Several companies are doing that right now, and some can be impactful for your organization.
Typically, the data sources in an organization are not well indexed with clearly mapped features and associations. Even getting off-the-shelf data sources is a real challenge. For the most part, the data sets people use were purpose-built for machine learning; they did not occur naturally in the wild. Even making custom-tailored synthetic datasets can be a challenge for an organization trying to operationalize machine learning at scale. That is where using external products to manage the data, and even accessing APIs, requires planning and sustained dedication. Data going to the APIs must be consistent. Constantly changing data streams are a nightmare to manage internally or externally. A lot of companies like Databricks are showing up to the party and helping make sense of complex data stores [5].
That might have been a lot to consider all in one stretch of thought, but it will all come into context the first time building a team to solve a machine learning problem becomes a necessity. My answer to where the talent comes from involves blending great professionals together over time to create high-functioning teams. That may involve hiring in key skill sets to help supplement a team or investing in training the team if enough ramp-up time exists. The shorter the amount of ramp-up time, the greater the need to quickly bring in external talent.
2. How do you get the talent to work together?
Now that we talked about where the talent comes from and how to think about investing in your teams, let’s switch gears and talk about how to get the teams to work together. This is one of those things that is much easier to talk about than to manage in practice. You can think about the mantra: let your leaders lead, let your managers manage, and let your employees succeed. That works well enough when you have agile teams who self-organize and rapidly get work done. If that is where you are sitting right now, then congratulations and appreciate what you have.
Teams are about how the different players work together. I try to think about machine learning engagements as having two key pillars. First, you need to figure out who has the deep knowledge of the product, the data, and how the data relate to the customer journey. This is going to be either obvious or hard. Sometimes the folks with the greatest institutional knowledge of the data are key subject-matter experts who play an impactful role; they could be buried deeper in the organization in an analyst role, or maybe they have moved on to another role.
Second, find that person with deep knowledge and help them work with the machine learning expert you found. Pairing these two together is going to be the most critical lynchpin of what you are doing. Most organizations do not have data structures that were architected from the start to work for machine learning. Figuring out the right places to start, what data to label, and what relates to what is really the beginning of the journey. This is one of the reasons why people with full stack machine learning skills are so important. What does that even mean, full stack machine learning skills? I can walk into your organization, set up TensorFlow, and even get the team sharing some Jupyter notebooks today. Having the right feeds, having the right machine learning hardware, and having access to the right production-side infrastructure to swiftly move data without crushing or breaking things are where full stack skills are essential.
Maybe truly agile teams are supposed to be self-organizing, but that is probably not going to happen the first time out the gate. Finding a common or shared purpose sounds a lot easier than it really is in practice. Getting people to self-organize around that common or shared purpose probably requires some type of ground rules or spark.
Sometimes high-functioning teams just embrace the challenge and work to knock down any barriers or obstacles they might face. Most teams do not have that level of dedication, persistence, or fortitude. Typically, the project needs or general business problems bring a group together to take some type of action. Managing during those types of situations is always interesting and generally includes trying to bring people with diverse skill sets together.
You will encounter two types of teams: high-performing teams that are already assembled and teams that come together around a specific business problem. Outside of those two common scenarios, the other type of talent situation you will face might very well be a solution chasing a problem. It happens now more than ever when the market is saturated with open-source projects that let people jump in and start working with complex tools. The next step in that pattern is wanting to do something with that new and exciting tooling. To that end, you may find a solution just waiting for a problem to tackle. However, it might not be the right solution or even remotely close to the course of action that should be taken.
Getting talent to work together, for me, revolves around the business problem and what the team is trying to achieve. It is hard to rally around an end goal that is nebulous or has otherwise been co-opted into something other than a resolution to the business problem in question.
We should probably jump in and spend a little bit of time on understanding the tooling necessary to allow the machine learning expert to work with the team in a productive way. You can probably tell by now that my preference is for using something robust like TensorFlow to dig in and start doing machine learning at scale. You could just start out with log files and dig in with an off-the-shelf product like the machine learning toolkit from Splunk. That is an example of a way to open the door for the team to start using a common platform to get things done.
This is a topic that I’m super passionate about. I am always happy to talk to people about how to take the first steps to build up internal talent for machine learning.
Footnotes:
[1] Here is a good look at machine learning papers. The volume of publication in the space is expanding exponentially. https://www.technologyreview.com/2019/01/25/1436/we-analyzed-16625-papers-to-figure-out-where-ai-is-headed-next/
[2] https://github.com/tensorflow/tfx/tree/master/tfx/examples/chicago_taxi_pipeline
[3] Check out https://www.coursera.org/browse/data-science/machine-learning and https://www.qwiklabs.com/focuses/3391?parent=catalog
[4] Two of my favorite people to follow, @DynamicWebPaige and @lak_gcp: https://twitter.com/DynamicWebPaige and https://twitter.com/lak_gcp
[5] I’m working daily to learn more about this company: https://databricks.com/
Substack Week 4: Have a machine learning strategy…revisited
Published on February 19, 2021
Welcome to the fourth post in this ongoing Substack series. This is the post where I’m going to go back and revisit two very important machine learning questions. First, I’ll take a look back at my answers to the question, What exactly is a machine learning strategy?
Second, that will set the foundation to really dig in and answer, Do you even need a machine learning strategy?
Obviously,