The Lindahl Letter: 3 Years of AI/ML Research Notes

Ebook · 652 pages · 7 hours

About this ebook

Greetings, inspired reader of technology, artificial intelligence, and machine learning glory. If you made it here, then thank you for the time and consideration. This now complete and expanded weighty tome of my insights covers my weekly thoughts on machine learning for 156 total weeks. The manuscript you are reading the description for right now contains all my missives from 2021 through 2023 in one curated collection for hours of reading joy. If you have the digital version, the links should be easy to click; they are all shared as footnotes within the chapters. If you have a physical copy of this book, which I appreciate, then you will need to type the links into an internet browser manually. My series of weekly posts, or research notes, was compiled into this manuscript. Each post has been edited from its original form into a more publication-friendly format.
Language: English
Publisher: Lulu.com
Release date: February 8, 2024
ISBN: 9781304692795

    The Lindahl Letter: 3 Years of AI/ML Research Notes

    ALSO BY NELS LINDAHL

    Nonfiction Books

    Graduation With Civic Honors

    Responsive E-Government

    The Lindahl Letter: On Machine Learning

    The Lindahl Letter: 104 Machine Learning Posts

    Contact Strategy for Campaigns

    Fiction Books

    United Earth Chronicles: A Novella

    Upper Bound Chronicles: A Novella

    Dream Chaser Archives: A Novella

    Forbidden Stones: A novella from the Taceo Loquaxi Chronicles

    Jupiter Darkly: A Novella

    The Lindahl Letter: 3 Years of AI/ML Research Notes

    By Nels Lindahl

    It takes someone willing to work just beyond the edge of what is possible to accomplish something meaningful.

    ~ Nels Lindahl ~

    The Lindahl Letter: 3 Years of AI/ML Research Notes

    All Rights Reserved © 2024 by Nels Lindahl

    No part of this book may be reproduced or transmitted in any form or by any means, graphic, electronic, or mechanical, including photocopying, recording, taping, or by any information storage or retrieval system, without permission in writing from the publisher.

    Published by Nels Lindahl

    Printed in the United States of America

    978-1-304-69279-5

    Imprint: Lulu.com

    Dedication

    2021: To those who look at what could be and thread the needle to get there…

    2022: To those who work toward meeting opportunity with a singular devotion to it. That focus should yield some interesting results.

    2023: We made it to the end of 3 years. Fun times. Thank you to everybody that was along for the ride.

    Acknowledgments

    Thanks to all my Substack post readers over the last three years who helped inspire this effort to create 156 posts. It was a great bonus to get feedback along the way while writing a book one week at a time. This three-year journey involved a lot of learning, and the journey itself was a huge part of the adventure.

    Table of Contents

    Dedication

    Acknowledgments

    Preface

    Acronyms

    Substack Week 1: Machine learning return on investment

    Substack Week 2: Machine learning frameworks & pipelines

    Substack Week 3: Machine learning teams

    Substack Week 4: Have a machine learning strategy…revisited

    Substack Week 5: Let your ROI drive a fact-based decision-making process

    Substack Week 6: Understand the ongoing cost and success criteria as part of your machine learning strategy

    Substack Week 7: Plan to grow based on successful ROI

    Substack Week 8: Is the machine learning we need everywhere now?

    Substack Week 9: Valuing machine learning use cases based on scale

    Substack Week 10: Model extensibility for few-shot GPT-2

    Substack Week 11: What is machine learning scale? The where and the when of machine learning usage

    Substack Week 12: Confounding within multiple machine learning model deployments

    Substack Week 13: Building out your MLOps

    Substack Week 14: My Ai4 Healthcare NYC 2019 talk revisited

    Substack Week 15: What are people really doing with machine learning?

    Substack Week 16: Ongoing machine learning cloud costs

    Substack Week 17: Figuring out machine learning readiness

    Substack Week 18: Could machine learning predict the lottery?

    Substack Week 19: Fear of missing out on machine learning

    Substack Week 20: Week 20 Lindahl Letter recap edition

    Substack Week 21: Doing machine learning work

    Substack Week 22: Machine learning graphics

    Substack Week 23: Fairness and machine learning

    Substack Week 24: Evaluating machine learning

    Substack Week 25: Teaching kids

    Substack Week 26: Machine learning as a service

    Substack Week 27: The future of machine learning

    Substack Week 28: Machine learning certifications?

    Substack Week 29: Machine learning feature selection

    Substack Week 30: Integrations and your machine learning layer

    Substack Week 31: Edge machine learning integrations

    Substack Week 32: Federating your machine learning models

    Substack Week 33: Where are AI investments coming from?

    Substack Week 34: Where are the main AI labs?

    Substack Week 35: Explainability in modern machine learning

    Substack Week 36: AIOps/MLOps: Consumption of AI services versus operations

    Substack Week 37: Reverse engineering GPT-2 or GPT-3

    Substack Week 38: Do most machine learning projects fail?

    Substack Week 39: Machine learning security

    Substack Week 40: Applied machine learning skills

    Substack Week 41: Machine learning and the metaverse

    Substack Week 42: Time crystals and machine learning

    Substack Week 43: Practical machine learning

    Substack Week 44: Machine learning salaries

    Substack Week 45: Prompt engineering and machine learning

    Substack Week 46: Machine learning and deep learning

    Substack Week 47: Anomaly detection and machine learning

    Substack Week 48: Machine learning applications revisited

    Substack Week 49: Machine learning assets

    Substack Week 50: Is machine learning the new oil?

    Substack Week 51: What is scientific machine learning?

    Substack Week 52: That one with a machine learning post

    Substack Week 53: Machine learning interview questions

    Substack Week 54: What is a chief AI officer?

    Substack Week 55: Who is acquiring machine learning patents?

    Substack Week 56: Comparative analysis of national AI strategies

    Substack Week 57: How would I compose a machine learning syllabus?

    Substack Week 58: Teaching or training machine learning skills

    Substack Week 59: Multimodal machine learning revisited

    Substack Week 60: General AI

    Substack Week 61: AI network platforms

    Substack Week 62: Touching the singularity

    Substack Week 63: Sentiment and consensus analysis

    Substack Week 64: Language models revisited

    Substack Week 65: Ethics in machine learning

    Substack Week 66: Does a digital divide in machine learning exist?

    Substack Week 67: My thoughts on NFTs

    Substack Week 68: Publishing a model or selling the API?

    Substack Week 69: A machine learning cookbook?

    Substack Week 70: Web3, the decentralized internet

    Substack Week 71: What are the best machine learning newsletters?

    Substack Week 72: Open source machine learning security plus the machine learning and surveillance bonus issue

    Substack Week 73: Symbolic machine learning

    Substack Week 74: Machine learning content automation

    Substack Week 75: Is machine learning destroying engineering colleges?

    Substack Week 76: What is post-theory science?

    Substack Week 77: Is quantum machine learning gaining momentum?

    Substack Week 78: Trust and the future of digital photography

    Substack Week 79: Why is diffusion so popular?

    Substack Week 80: Bayesian optimization (Introduction to Machine Learning syllabus edition 1 of 8)

    Substack Week 81: A machine learning literature review (Introduction to Machine Learning syllabus edition 2 of 8)

    Substack Week 82: Machine learning algorithms (Introduction to Machine Learning syllabus edition 3 of 8)

    Substack Week 83: Machine learning approaches (Introduction to Machine Learning syllabus edition 4 of 8)

    Substack Week 84: Neural networks (Introduction to Machine Learning syllabus edition 5 of 8)

    Substack Week 85: Neuroscience (Introduction to Machine Learning syllabus edition 6 of 8)

    Substack Week 86: Ethics, fairness, bias, and privacy (Introduction to Machine Learning syllabus edition 7 of 8)

    Substack Week 87: MLOps (Introduction to Machine Learning syllabus edition 8 of 8)

    Substack Week 88: The future of academic publishing

    Substack Week 89: Your machine learning model is not an AGI

    Substack Week 90: What is probabilistic machine learning?

    Substack Week 91: What are ensemble machine learning models?

    Substack Week 92: We have a National AI Advisory Committee

    Substack Week 93: Papers critical of machine learning

    Substack Week 94: AI hardware (reduced instruction set computer [RISC]-V AI Chips)

    Substack Week 95: Getting to quantum machine learning

    Substack Week 96: Generative AI: Where are large language models going?

    Substack Week 97: MIT’s Twist Quantum programming language

    Substack Week 98: My thoughts on ChatGPT

    Substack Week 99: Deep generative models

    Substack Week 100: Overcrowding and machine learning

    Substack Week 101: Back to the ROI for machine learning

    Substack Week 102: Machine learning pracademics

    Substack Week 103: Rethinking the future of machine learning

    Substack Week 104: That second year of posting recap

    Substack Week 105: Building out a better backlog

    Substack Week 106: Code generating systems

    Substack Week 107: Highly cited AI papers

    Substack Week 108: Twitter as a company probably would not happen today

    Substack Week 109: Robots in the house

    Substack Week 110: Chatbots and understanding knowledge graphs

    Substack Week 111: Natural language processing

    Substack Week 112: Autonomous vehicles

    Substack Week 113: Structuring an introduction to AI ethics

    Substack Week 114: How does confidential computing work?

    Substack Week 115: A literature review of modern polling methodology

    Substack Week 116: A literature study of mail polling methodology

    Substack Week 117: A literature study of non-mail polling methodology

    Substack Week 118: A paper on political debt as a concept vs. technical debt

    Substack Week 119: All that bad data abounds

    Substack Week 120: That one with an obligatory AI trends post

    Substack Week 121: Considering an independent study applied AI syllabus

    Substack Week 122: AIaaS: Will AI be a platform or a service? Auto-GPT will disrupt both

    Substack Week 123: We are wholesale oversubscribed on AI related content

    Substack Week 124: Pivoting to AI + Security

    Substack Week 125: Profiling OpenAI Security

    Substack Week 126: Profiling Hugging Face Security

    Substack Week 127: Profiling Google DeepMind Security

    Substack Week 128: Democratizing AI system security

    Substack Week 129: How do you use Colab in a generative way?

    Substack Week 130: Build captain fractal using Colab

    Substack Week 131: Bulk image improvement

    Substack Week 132: Synthetic data notebooks

    Substack Week 133: Automated survey methods

    Substack Week 134: The chalk model for predicting elections

    Substack Week 135: Polling aggregation models

    Substack Week 136: Econometric election models

    Substack Week 137: Tracking political registrations

    Substack Week 138: Election prediction markets

    Substack Week 139: Machine learning election models

    Substack Week 140: Proxy models for elections

    Substack Week 141: Building generative AI chatbots

    Substack Week 142: Learning LangChain

    Substack Week 143: Synthetic social media analysis

    Substack Week 144: Knowledge graphs vs. vector databases

    Substack Week 145: Delphi method & Door-to-door canvassing

    Substack Week 146: Election simulations & Expert opinions

    Substack Week 147: Bayesian Models and Elections

    Substack Week 148: Happy Thanksgiving 2023

    Substack Week 149: All that 48th week of the year wildness

    Substack Week 150: Unraveling Emotions in the Digital Age

    Substack Week 151: Decoding the Electorate

    Substack Week 152: Beyond the Ballot

    Substack Week 153: That one with a Happy Holidays Post

    Substack Week 154: Revolutionizing Predictions

    Substack Week 155: 3 Years on Substack

    Substack Week 156: My 2024 Predictions

    Postlogue

    About the author

    Preface

    Greetings, still inspired reader of now additional machine learning glory. Here we are again this year with a fresh collection of machine learning related posts from my Substack series, The Lindahl Letter. Last year, when the first version of this manuscript was being drafted, it seemed that everybody was writing a Substack series. This year some drama on Twitter happened, and a flood of new writers tumbled over into the world of Substack. Within the machine learning space, more Substack series are now being written than I can possibly read each week. With the release of generative models like DALL-E 2 from OpenAI and the interesting ChatGPT interface, machine learning and artificial intelligence (AI) have camped out in the public mind.

    This series of weekly posts was compiled into a manuscript. Each post has been edited from the original form into a more publication-friendly format. Substack as a platform provides a lot of freedom to include links and embedded content, which does not translate well to the printed page.

    One of the things I did notice is that the first few posts were much longer than the ones at the end of the series. When I started out talking about machine learning in general, I simply had much more to say on each topic. Given that each week is written to be independently consumed, all of my references are handled in a straightforward footnote method. Any aside, link, or acknowledgment of another author happens in the footnotes ending the chapter.

    Things in this highly technology-driven space are changing rapidly. I included the date of publication as a frame of reference.

    Dr. Nels Lindahl

    Broomfield, Colorado

    December 12, 2022 @ 6:02 AM

    Acronyms

    AGI      artificial general intelligence

    AI      artificial intelligence

    AIOps      artificial intelligence operations

    ANN      artificial neural network

    API      application programming interface

    AWS      Amazon Web Services

    CAIO      chief artificial intelligence officer

    CNN      convolutional neural network

    CSAIL      Computer Science and Artificial Intelligence Laboratory

    DAIR      Distributed Artificial Intelligence Research Institute

    DBN      deep belief network

    DQNN      deeply quantized neural network

    GAN      generative adversarial network

    GCP      Google Cloud Platform

    GPT      generative pre-trained transformer

    KPI      key performance indicator

    MLOps      machine learning operations

    MNN      modular neural network

    NFT      non-fungible token

    RISC      reduced instruction set computer

    RNN      recurrent neural network

    ROI      return on investment

    SciML      scientific machine learning

    SNN      simulated neural network

    SONN      self-organizing neural network

    TFX      TensorFlow Extended

    Substack Week 1: Machine learning return on investment

    Published on January 29, 2021

    Be strategic with your machine learning efforts.

    Be. Strategic. With. Your. Machine. Learning. Efforts.

    Seriously, those seven words should guide your next steps along the machine learning journey. Take a moment and let that direction (strong guidance) sink in and reflect on what it really means for your organization. You have to take a moment and work backward from building strategic value for your organization to the actual machine learning effort you are undertaking. Inside that effort you will quickly discover that operationalizing machine learning efforts to generate strategic value will rely on a solid plan for return on investment (ROI). Make sure you are beginning with that end in mind to increase your chances of success. Taking actions within an organization of any kind at the scale machine learning is capable of delivering, without understanding the potential ROI or potential loss, is highly questionable. That is why you have to be strategic with your machine learning efforts from start to finish.   

    You have to set up and run a machine learning strategy from the top down. Executive leadership has to understand and be invested in guiding things toward the right path (a truly strategic path) from the start. Start by making an effort to begin with a solid strategy in the machine learning space. It might sound harder than it is in practice. You don’t need a complicated center of excellence or a massive investment to develop a strategy. Your strategy just needs to be linked to the budget and ideally to a budget key performance indicator (KPI). Every budget involves spending precious funds, and keeping a solid KPI around machine learning ROI will help ensure your strategy stays on a strong financial footing for years to come. All spending of an organization’s precious resources should translate to a KPI of some type. That is how your results confirm that the funding is being spent well and that solid decision-making is occurring. When you operationalize the organization’s strategic vision and align it financially to the budget, you have to focus on ensuring that all spending is tied to that framework.
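
    To make that budget linkage concrete, here is a minimal sketch (my own illustration, not something from the original post) of tying machine learning initiatives to an ROI KPI; the initiative names, dollar figures, and the 20% target are all hypothetical.

# Hypothetical illustration: tie each machine learning initiative to a budget KPI.
# The figures and the 20% ROI target are made up for the example.

initiatives = [
    {"name": "churn model", "annual_cost": 250_000, "annual_benefit": 400_000},
    {"name": "demand forecast", "annual_cost": 180_000, "annual_benefit": 150_000},
]

ROI_KPI_TARGET = 0.20  # the budget-level KPI: at least a 20% return on spend

for item in initiatives:
    roi = (item["annual_benefit"] - item["annual_cost"]) / item["annual_cost"]
    status = "meets KPI" if roi >= ROI_KPI_TARGET else "below KPI - revisit"
    print(f'{item["name"]}: ROI {roi:.0%} ({status})')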

    That means that the machine learning strategy you are investing in has to be driven to achieve a certain ROI tied directly to solid budget-level KPIs. You might feel like that line has been repeated. If you noticed that repetition, then you are paying attention and well on your way to future success. Reading comprehension goes a long way to translating written argument to action. That KPI-related tieback you are creating is only going to happen with a solid machine learning strategy in place. It has to be based on prioritizing and planning for ROI. Your machine learning pipelines and frameworks have to be aligned toward that goal. That is ultimately the cornerstone of a solid strategic plan when it comes to implementing machine learning as part of a long-term strategy.

    We are about 500 words into this book, and it might be time to simply recap the message being delivered so far. Be ready to do things in a definable and repeatable way. Part of executing a strategy with quality is doing things in a definable and repeatable way. That is the essence of where quality comes from. You have to know what plan is being executed and focus on and support the plan in ways that make it successful at your desired run rate. In terms of deploying machine learning efforts within an enterprise, you have to figure out how the technology is going to be set up and invested in and how that investment is going to translate to use cases with the right ROI.  

    Know the business value for the use case instead of letting solutions chase problems. Just because you can do a thing does not always mean that you should. Having the ability to deploy a technology does create the potential of letting a technology-based solution chase a problem. Building up technology for machine learning in a very theoretical and lab-based way and then chasing use cases is a terrible way to accidentally stumble on an ROI model that works. The better way forward is to know the use cases and have a solid strategy to apply your technology. That means finding the right machine learning frameworks and pipelines to support your use cases in powerful ways across the entire organization. 

    This is a time to be planful. Right here, right now, in this moment of consideration you can elect to be planful going forward. Technology for machine learning is becoming increasingly available and plentiful. No-code, low-code, and solidly integrated solutions are becoming omnipresent in the technology landscape. Teams from all over the organization are probably eager to try proofs of concept, and vendors are bringing in a variety of options. People are always ready to pitch the value of machine learning to the organization. Both internal and external options are plentiful. It is an amazing time for applied machine learning. You can get into the game in a variety of ways rapidly and without a ton of effort. Getting your implementation right and having the data, pipelines, and frameworks aligned for your best possible results involves planning and solid execution.

    Your machine learning strategy cannot be a back-of-the-desk project. You have to be strategic. It has to be part of a broader strategy. You cannot let all of the proofs of concept and vendor plays drive the adoption of machine learning technology in your organization. If that happens, the adoption was not defined by the overall strategic vision; it happened because something with a seemingly solid ROI, and maybe even the right use case, was selected by chance from the bottom up in the organization. That is not a planful strategy.

    Know the workflow you want to augment with machine learning and drive beyond the buzzwords to see the technology in action. You really have to know where machine learning fits in the workflow and which pipelines are going to enable your use cases to provide that solid ROI.

    At some point along the machine learning journey you are going to need to make some decisions…

    Q: Where are you going to serve the machine learning model from? 

    Q: Is this your first model build and deployment? 

    Q: What actual deployments of model serving are being managed?

    Q: Are you working on-premise for training or calling an application programming interface (API) and model serving in your workflow?

    Q: Have you elected to use a pre-trained model via an external API call? 

    Q: Did you buy a model from a marketplace, or are you buying access to a commercial API? 

    Q: How long before the model efficiency drops off and adjustment is required?

    Q: Have you calculated where the point of no return is for model efficiency where ROI falls below break-even?
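
    That last question about break-even deserves a rough worked example. The sketch below is mine, not from the original post, and every figure in it is an assumption; the point is the arithmetic for finding the accuracy at which ROI crosses zero.

# Hypothetical break-even calculation for a degrading model.
# Assumed figures: each correct prediction is worth $0.50, each prediction
# costs $0.12 to serve (infrastructure, licensing, support).

value_per_correct = 0.50
cost_per_prediction = 0.12

# Expected profit per prediction at a given accuracy:
#   profit(acc) = acc * value_per_correct - cost_per_prediction
# Break-even is where profit(acc) = 0.
break_even_accuracy = cost_per_prediction / value_per_correct
print(f"Break-even accuracy: {break_even_accuracy:.0%}")  # 24% under these assumptions

# If the model launched at 92% accuracy and loses roughly one point per month,
# this gives a crude countdown to the point of no return.
launch_accuracy, monthly_drop = 0.92, 0.01
months = (launch_accuracy - break_even_accuracy) / monthly_drop
print(f"Months until break-even at that drift rate: {months:.0f}")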

    Substack Week 2: Machine learning frameworks & pipelines

    Published on February 5, 2021

    Ecosystems are beginning to develop around machine learning pipelines. Different platforms (companies) are building out different methods to manage the machine learning frameworks and pipelines they support. Now is the time for your organization to get that effort going. You can go build out an easy-to-manage end-to-end method for feeding model updates to production. If you stopped reading this manuscript for a moment and started doing research or spinning things up, then you probably ended up using a TensorFlow Serving instance you installed, an Amazon SageMaker pipeline, or an Azure Machine Learning pipeline [1]. Any of those methods will get you up and running. They have communities of practice to provide support [2]. That is to say, the road you are traveling has been traveled before and at scale. The path toward using machine learning frameworks and pipelines is pretty clearly established. People are doing that right now. They are building things for fun. They have things in production. While all that is occurring in the wild, a ton of orchestration and pipeline management companies are jumping to the forefront right now in the business world [3].

    Get going on your machine learning journey. One way to get going very quickly and start to really think about how to make this happen is to go and download TensorFlow Extended (TFX) from GitHub as your pipeline platform on your own hardware or some type of cloud instance [4]. You can just as easily go cloud native and build out your technology without boxes in your datacenter or at your desk. You could spin up on Google Cloud Platform (GCP), Azure, or Amazon Web Services (AWS) without any real friction against realizing your dream. Some of your folks might just set up local versions of these things to mess around and do some development along the way. 
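
    For readers who want to see what spinning up TFX looks like, here is a minimal sketch using the TFX 1.x Python API. The pipeline name, the ./data and ./pipeline_root paths, and the two-component pipeline are placeholders; a real pipeline would add a trainer, an evaluator, and a pusher.

# Minimal TFX pipeline sketch (TFX 1.x API). Paths and names are placeholders;
# ./data is assumed to contain one or more CSV files.
from tfx import v1 as tfx

example_gen = tfx.components.CsvExampleGen(input_base="./data")
statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs["examples"])

pipeline = tfx.dsl.Pipeline(
    pipeline_name="lindahl_letter_demo",
    pipeline_root="./pipeline_root",
    components=[example_gen, statistics_gen],
    metadata_connection_config=tfx.orchestration.metadata.sqlite_metadata_connection_config(
        "./metadata.sqlite"
    ),
)

# Run the pipeline locally; orchestrators like Kubeflow or Airflow can swap in later.
tfx.orchestration.LocalDagRunner().run(pipeline)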

    Build models. You could of course buy a model [5]. Steps exist to help you build a model. All of the machine learning pipeline setup steps are rather academic, without models that utilize the entire apparatus. One way to introduce machine learning to the relevant workflow based on your use case is to just integrate with an API to avoid having to set up frameworks and pipelines. That is one way to go about it, and for some things it makes a lot of sense. For other machine learning efforts, complexity will preclude using an out-of-the-box solution that has a callable API. You would be surprised at how many complex APIs are being offered these days, but they do not provide comprehensive coverage for all use cases [6]. 
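
    As a contrast to running your own pipeline, here is a sketch of the integrate-with-an-API route. The endpoint URL, token, and payload shape are entirely hypothetical stand-ins for whatever commercial or internal service you would actually call.

# Hypothetical example of leaning on a hosted model API instead of building
# frameworks and pipelines. The URL, token, and payload are placeholders.
import requests

API_URL = "https://api.example.com/v1/score"   # hypothetical endpoint
API_TOKEN = "replace-with-your-key"

record = {"customer_tenure_months": 18, "support_tickets_90d": 3}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"instances": [record]},
    timeout=10,
)
response.raise_for_status()
print(response.json())  # e.g. a churn score returned by the service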

    What are you going to do with all those models? You are going to need to save them for serving. Getting set up with a solid framework and machine learning pipeline is all about serving up those models within workflows that fulfill use cases with defined and predictable ROI models. 
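
    Saving models for serving usually means exporting a versioned SavedModel directory that a serving layer such as TensorFlow Serving can watch. This sketch assumes the TensorFlow 2.x Keras API and a local ./serving/my_model path; the tiny untrained model is just a stand-in.

# Sketch: export a trained Keras model as a versioned SavedModel for serving.
# The model, path, and version number are placeholders.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Serving layers such as TensorFlow Serving pick up numeric version folders,
# so each retrain can land in .../my_model/2, .../my_model/3, and so on.
export_path = "./serving/my_model/1"
tf.saved_model.save(model, export_path)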

    From the point you implement, it is going to be a race against time to figure out when those models from the marketplace suffer an efficiency drop and some type of adjustment is required. You have to understand the potential model degradation and calculate at what point you have to shut down the effort due to ROI conditions being violated [7]. That might sound a little bit hard, but if your model efficiency degrades to the point that financial outcomes are being negatively impacted, you will want to know how to flip the off switch, and you might be wondering why that switch was not automated. 
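
    Since the post asks why that off switch was not automated, here is a hedged sketch of what an automated ROI guardrail could look like. The threshold, window size, and the idea of feeding labeled outcomes back in are my assumptions; a real system would pull these numbers from your monitoring stack.

# Hypothetical guardrail: flip the off switch when measured accuracy drifts
# below the break-even threshold. Figures are illustrative only.
from collections import deque

BREAK_EVEN_ACCURACY = 0.24   # e.g. from the break-even arithmetic earlier
WINDOW = 500                 # evaluate over the last 500 labeled predictions

recent_outcomes = deque(maxlen=WINDOW)  # 1 if the prediction was correct, else 0
serving_enabled = True

def record_outcome(correct: bool) -> None:
    """Log one labeled outcome and disable serving if ROI conditions are violated."""
    global serving_enabled
    recent_outcomes.append(1 if correct else 0)
    if len(recent_outcomes) == WINDOW:
        rolling_accuracy = sum(recent_outcomes) / WINDOW
        if rolling_accuracy < BREAK_EVEN_ACCURACY:
            serving_enabled = False  # the automated off switch
            print(f"Serving disabled: rolling accuracy {rolling_accuracy:.0%} is below break-even")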

    Along the way some type of adjustment to a model or parameters is going to be required. To recap, the way I look at ROI is pretty straightforward. You have to consider the value of the machine learning model in terms of what was invested in it and what you can potentially get out of it. It’s just going to give you a positive or negative look at whether that ROI is going to be there for you. At that point you are just following your strategy and thinking about the ROI model.

    So again, strict ROI modeling may not be the method that you want to use. I would caution against working for long periods without understanding the financial consequences. At scale, you can very quickly create breakdowns and other problems within a machine learning use case. It could even go so far that you may not find it worthwhile for your business case. Inserting machine learning into a workflow might not be the right thing to do, and that is why calculating results and making fact-based decisions is so important. 

    Really, any way you do it in a planful way that’s definable and repeatable is going to work out great. That is fairly easy to say given that fact-based decision-making and being willing to hit the off switch if necessary help prevent runaway problems from becoming existential threats to the business. So having a machine learning strategy, doing things in a definable and repeatable way, and being ruthlessly fact based is what I’m suggesting.

    Obviously, you have to take everything that I say with a grain of salt; you should know up front that I’m a big TensorFlow enthusiast. That’s one of the reasons why I use it as my primary example, but it doesn’t mean that that’s the absolute right answer for you. It’s just the answer that I look at most frequently and always look to first before branching out to other solutions. That is always based on the use case, and I avoid letting technology search for problems at all costs. You need to let the use case and the problem at hand fit the solution instead of applying solutions until one works or you give up.

    At this point in the story, you are thinking about or beginning to build this out, and you’re starting to get ramped up. The excitement is probably building to a crescendo of some sort. Now you need somewhere to manage your models. You may need to imagine for a moment that you do have models. Maybe you bought them from a marketplace and you skipped training all together. It’s an exciting time, and you are ready to get going. So in this example, you’re going from just building (or having recently acquired) a machine learning model to doing something. At that moment, you are probably realizing that you need to serve that model out over and over again to create an actual machine-learning-driven workload. That probably means that you are not only getting to manage those models, but also going to need to serve out different models over time. 

    As you make adjustments and corrections that introduce different modeling techniques, you get more advanced with what you are trying to implement. One of the things you’ll find is that even the perfect model that you had, which was right where you wanted it to be when you launched, is slowly waiting to betray you and your confidence in it by degrading. You have to be ready to model and evaluate performance based on your use case. That is what lets you make sound decisions about model quality and how outcomes are being impacted.

    I have a few takeaways to conclude this installment of The Lindahl Letter. You have to remember that at this point machine learning models and pipelines are pretty much democratized. You can get them. They are out in the wild. People are using them in all kinds of different ways. You can just go ahead and introduce this technology to your organization with relatively little friction.

    Footnotes:

    [1] Links to the referenced machine learning pipelines: https://www.tensorflow.org/tfx, https://aws.amazon.com/sagemaker/pipelines/ or https://docs.microsoft.com/en-us/azure/machine-learning/concept-ml-pipelines

    [2] One of the best places to start to learn about machine learning communities would be https://www.kaggle.com/

    [3] Read this if you have a few minutes; it is worth the read: Agrawal, A., Gans, J., & Goldfarb, A. (2020, September). How to win with machine learning. Harvard Business Review. https://hbr.org/2020/09/how-to-win-with-machine-learning

    [4] https://github.com/tensorflow/tfx

    [5] This is one of the bigger ones: https://aws.amazon.com/marketplace/solutions/machine-learning

    [6] This is one example of services that are open for business right now: https://cloud.google.com/products/ai

    [7] This is a wonderful site and the article is spot on: Shendre, S. (2020, May 13). Model drift in machine learning. Towards Data Science. https://towardsdatascience.com/model-drift-in-machine-learning-models-8f7e7413b563

    Substack Week 3: Machine learning teams

    Published on February 12, 2021

    Learning. Keep learning. Being a lifelong learner is a solid plan. I do honestly believe we live in a golden age of machine learning training and talent. People are selflessly sharing world-class training materials and code on GitHub. The number of academic papers being shared is skyrocketing [1]. If you are able to start sprinkling in a little bit of expertise as you go, you can start building teams from the ground up. That starts by helping people invest in understanding this type of machine learning knowledge. That sets the foundation so they can build the machine learning pipelines you need with something like the TFX examples [2]. That is one way to help them understand how those pipelines work for building and deploying. I think all the training needed to do that is amazingly available online; people have been super gracious with sharing that kind of knowledge.

    The hard part is pairing that knowledge with deep machine learning skills. That means you are going to need to start finding the subject-matter expert knowledge within your organization. When you start pairing people who know your business at a super granular, deep level with people who can work with models and do deployments, things will take off. It will probably end up being a team effort. You have deep knowledge from subject-matter experts and from machine learning experts, and the combination of the two groups in one team is how things get done in practice. You may want to bring in someone who is a real expert at building different layers and networks and can produce refined models; they may augment your team and speed things up. Sometimes you have to bring in the right folks to jumpstart things along the way.

    1. Where does the talent come from?

    Personally, I believe you can build the talent from within your organization. If you doubted that assertion, then make a mental note to really challenge that bias against internal growth and development. Throughout my career, I have built a proven track record of helping grow internal talent. It works, but it requires investing time and having the right programs aligned to building toolkits. One of my proudest professional accomplishments is seeing somebody get promoted. It is an amazing thing to see people start bringing advanced methods and techniques to different parts of the organization. You can capture that feeling as well. You just need to go out and start building out the toolkits of the people that you have. Really take the time to invest in them growing and developing as teammates and as individual contributors.

    We are in the golden age of learning about machine learning [3]. More training than you can possibly consume now exists online, and it exists in a variety of different forms. Among my favorites are the interactive labs now available online. The platform I have used the most is called Coursera. People have built out well-tooled examples of how to do machine learning. Not only can you read about it, but you can get into examples and kick the tires. That is the thing that has drawn me to TensorFlow since the product launched. So many people have been so generous with their knowledge, skills, and abilities [4]. They are sharing the keys to the machine learning kingdom online in accessible classes, lectures, and even a few certificates. I have taken over 50 courses. You can see them on my LinkedIn profile if you really want to dig into the road I traveled. That will show you which ones I invested my own time in completing. You can also just check the links in the footnotes to find a place to start.

    Sometimes building internal teams is just not fast enough. It takes time to help internal talent develop world-class skills in machine learning—or anything, for that matter. I recognize that is a long-term goal and something you have to build toward along the way. You have a few options to start looking for ways to supplement talent. One of those ways is to hire contractors and have them help you kickstart your endeavor. Another way is to find the right product or company to help you get going fast. Several companies are doing that right now, and some can be impactful for your organization.

    Typically, the data sources in an organization are not well indexed with clearly mapped features and associations. Even getting off-the-shelf data sources is a real challenge. For the most part, the ones that people use were created to be used that way. Those data sets did not occur naturally in the wild. Even making custom-tailored synthetic datasets can be a challenge for an organization trying to operationalize machine learning at scale. That is where using external products to manage the data and even accessing APIs requires planning and sustained dedication. Data going to the APIs must be consistent. Constantly changing data streams are a nightmare to manage internally or externally. A lot of companies like Databricks are showing up to the party and helping make sense of complex data stores [5].
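
    On the synthetic data point, here is a small sketch of the kind of custom-tailored dataset a team might generate while the real sources are still being cleaned up. It leans on scikit-learn's make_classification, and the column names and class balance are invented for the example.

# Sketch: generate a synthetic tabular dataset to stand in for messy internal data.
# Column names and class balance are invented for the example.
import pandas as pd
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=5_000,
    n_features=6,
    n_informative=4,
    weights=[0.9, 0.1],   # roughly 10% positive class, e.g. churned customers
    random_state=42,
)

columns = ["tenure", "monthly_spend", "tickets", "logins", "age", "discount_rate"]
df = pd.DataFrame(X, columns=columns)
df["churned"] = y
print(df.head())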

    That might have been a lot to consider all in one stretch of thought, but it will all come into context the first time building a team to solve a machine learning problem becomes a necessity. My answer to where the talent comes from involves blending great professionals together over time to create high-functioning teams. That may involve hiring in key skill sets to help supplement a team or investing in training the team if enough ramp-up time exists. The shorter the amount of ramp-up time, the greater the need to quickly bring in external talent.

    2. How do you get the talent to work together?

    Now that we talked about where the talent comes from and how to think about investing in your teams, let’s switch gears and talk about how to get the teams to work together. This is one of those things that is much easier to talk about than to manage in practice. You can think about the mantra: let your leaders lead, let your managers manage, and let your employees succeed. That works well enough when you have agile teams who self-organize and rapidly get work done. If that is where you are sitting right now, then congratulations and appreciate what you have.

    Teams are about how the different players work together. I try to think about machine learning engagements as having two key pillars. First, you need to figure out who has the deep knowledge of the product, the data, and how the data relate to the customer journey. This is going to be either obvious or hard. Sometimes the folks with the greatest institutional knowledge of the data are key subject-matter experts who play an impactful role; they could be buried deeper in the organization in an analyst role, or maybe they have moved on to another role.

    Second, find that person with deep knowledge and help them work with the machine learning expert you found. Pairing these two together is going to be the most critical lynchpin to what you are doing. Most organizations do not have data structures that were architected from the start to work for machine learning. Figuring out the right places to start, what data to label, and what relates to what is really the beginning of the journey. This is one of the reasons why people with full stack machine learning skills are so important. What does that even mean? Full stack machine learning skills. I can walk into your organization and set up TensorFlow and even get the team sharing some Jupyter notebooks today. Having the right feeds, having the right machine learning hardware, and having access to the right production-side infrastructure to swiftly move data without crushing or breaking things are where full stack skills are essential.

    Maybe truly agile teams are supposed to be self-organizing, but that is probably not going to happen the first time out the gate. Finding a common or shared purpose sounds a lot easier than it really is in practice. Getting people to self-organize around that common or shared purpose probably requires some type of ground rules or spark.

    Sometimes high-functioning teams just embrace the challenge and work to knock down any barriers or obstacles they might face. Most teams do not have that level of dedication, persistence, or fortitude. Typically, the project needs or general business problems bring a group together to take some type of action. Managing during those types of situations is always interesting and generally includes trying to bring people with diverse skill sets together.

    You will encounter two types of teams: high-performing teams that are already assembled and teams that come together based on a specific business problem. Outside of those two common scenarios, the other type of talent situation you will face might very well be a solution chasing a problem. It happens now more than ever when the market is saturated with open-source projects that let people jump in and start working with complex tools. The next step in that pattern is wanting to do something with that new and exciting tooling. To that end, you may find a solution just waiting for a problem to tackle. However, it might not be the right solution or even remotely close to the course of action that should be taken.

    Getting talent to work together, for me, revolves around the business problem and what the team is trying to achieve. It is hard to rally around an end goal that is nebulous or that has otherwise been co-opted into something other than a resolution to the business problem in question.

    We should probably jump in and spend a little bit of time on understanding the tooling necessary to allow the machine learning expert to work with the team in a productive way. You can probably tell by now that my preference is for using something robust like TensorFlow to dig in and start doing machine learning at scale. You could just start out with log files and dig in with an off-the-shelf product like the machine learning toolkit from Splunk. That is an example of a way to open the door for the team to start using a common platform to get things done. 
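
    To make the start-with-log-files idea concrete, here is a hedged sketch in plain Python rather than the Splunk toolkit mentioned above. It counts events per minute and flags unusual spikes; the app.log file name, the ISO-timestamp log format, and the z-score threshold are all assumptions.

# Hypothetical first step with log data: count events per minute and flag
# minutes that deviate sharply from the mean. The log format is an assumption
# (ISO timestamp at the start of each line in app.log).
from collections import Counter
from statistics import mean, stdev

counts = Counter()
with open("app.log") as handle:
    for line in handle:
        # e.g. "2021-02-12T10:15:42Z ERROR something happened"
        minute = line[:16]  # truncate to YYYY-MM-DDTHH:MM
        counts[minute] += 1

values = list(counts.values())
if len(values) > 1:
    mu, sigma = mean(values), stdev(values)
    for minute, n in sorted(counts.items()):
        if sigma and (n - mu) / sigma > 3:  # crude z-score threshold
            print(f"{minute}: {n} events looks anomalous")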

    This is a topic that I’m super passionate about. I am always happy to talk to people about how to take the first steps to build up internal talent for machine learning.  

    Footnotes:

    [1] Here is a good look at machine learning papers. The volume of publication in the space is expanding exponentially. https://www.technologyreview.com/2019/01/25/1436/we-analyzed-16625-papers-to-figure-out-where-ai-is-headed-next/

    [2] https://github.com/tensorflow/tfx/tree/master/tfx/examples/chicago_taxi_pipeline

    [3] Check out https://www.coursera.org/browse/data-science/machine-learning and https://www.qwiklabs.com/focuses/3391?parent=catalog

    [4] Two of my favorite people to follow, @DynamicWebPaige and @lak_gcp: https://twitter.com/DynamicWebPaige and https://twitter.com/lak_gcp

    [5] I’m working daily to learn more about this company: https://databricks.com/

    Substack Week 4: Have a machine learning strategy…revisited

    Published on February 19, 2021

    Welcome to the fourth post in this ongoing Substack series. This is the post where I’m going to go back and revisit two very important machine learning questions. First, I’ll take a look back at my answers to the question, What exactly is a machine learning strategy? Second, that will set the foundation to really dig in and answer, Do you even need a machine learning strategy? Obviously,
