Practical Java Machine Learning: Projects with Google Cloud Platform and Amazon Web Services
Ebook · 625 pages · 4 hours


About this ebook

Build machine learning (ML) solutions for Java development. This book shows you that when designing ML apps, data is the key driver and must be considered throughout all phases of the project life cycle. Practical Java Machine Learning helps you understand the importance of data and how to organize it for use within your ML project. You will be introduced to tools that can help you identify and manage your data, including JSON, visualization, NoSQL databases, and cloud platforms including Google Cloud Platform and Amazon Web Services.
Practical Java Machine Learning includes multiple projects, with particular focus on the Android mobile platform and features such as sensors, camera, and connectivity, each of which produces data that can power unique machine learning solutions. You will learn to build a variety of applications that demonstrate the capabilities of the Google Cloud Platform machine learning API, including data visualization for Java; document classification using the Weka ML environment; audio file classification for Android using ML with spectrogram voice data; and machine learning using device sensor data.
After reading this book, you will come away with case study examples and projects that you can reuse and extend as templates for your own machine learning programming projects with Java.
What You Will Learn
  • Identify, organize, and architect the data required for ML projects
  • Deploy ML solutions in conjunction with cloud providers such as Google and Amazon
  • Determine which algorithm is the most appropriate for a specific ML problem
  • Implement Java ML solutions on Android mobile devices
  • Create Java ML solutions to work with sensor data
  • Build Java streaming based solutions
Who This Book Is For
Experienced Java developers who have not implemented machine learning techniques before.
Language: English
Publisher: Apress
Release date: Oct 23, 2018
ISBN: 9781484239513

    Practical Java Machine Learning - Mark Wickham

    © Mark Wickham 2018

Mark Wickham, Practical Java Machine Learning, https://doi.org/10.1007/978-1-4842-3951-3_1

    1. Introduction

    Mark Wickham, Irving, TX, USA

    Chapter 1 establishes the foundation for the book.

    It describes what the book will achieve, who the book is intended for, why machine learning (ML) is important, why Java makes sense, and how you can deploy Java ML solutions.

    The chapter includes the following:

    A review of the terminology of AI and its subfields, including machine learning

    Why ML is important and why Java is a good choice for implementation

    Setup instructions for the most popular development environments

    An introduction to ML-Gates, a development methodology for ML

    The business case for ML and monetization strategies

    Why this book does not cover deep learning, and why that is a good thing

    When and why you may need deep learning

    How to think creatively when exploring ML solutions

    An overview of key ML findings

    1.1 Terminology

    As artificial intelligence and machine learning have surged in popularity, a great deal of confusion has arisen around the associated terminology. It seems that everyone uses the terms differently and inconsistently.

    Some quick definitions for some of the abbreviations used in the book:

    Artificial intelligence (AI): Anything that pretends to be smart.

    Machine learning (ML): A generic term that includes the subfields of deep learning (DL) and classic machine learning (CML).

    Deep learning (DL): A class of machine learning algorithms that utilize neural networks.

    Reinforcement learning (RL): A learning style in which the system receives feedback on its actions, but not necessarily for each input.

    Neural networks (NN): A computer system modeled on the human brain and nervous system.

    Classic machine learning (CML): A term that more narrowly defines the set of ML algorithms that excludes the deep learning algorithms.

    Data mining (DM): Finding hidden patterns in data, a task typically performed by people.

    Machine learning gate (MLG): The book will present a development methodology called ML-Gates. The gate numbers start at ML-Gate 5 and conclude at ML-Gate 0. MLG3, for example, is the abbreviation for ML-Gate 3 of the methodology.

    Random forest (RF) algorithm: An ensemble learning method for classification, regression, and other tasks that operates by constructing many decision trees at training time.

    Naive Bayes (NB) algorithm: A family of probabilistic classifiers based on applying Bayes’ theorem with strong (naive) independence assumptions between the features.

    K-nearest neighbor (KNN) algorithm: A non-parametric method used for classification and regression where the input consists of the k closest training examples in the feature space.

    Support vector machine (SVM) algorithm: A supervised learning model with associated learning algorithms that analyze data for classification and regression.
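    To make one of these definitions concrete, the following is a minimal sketch of the KNN idea in plain Java, with no ML library required. The training points, labels, and choice of k are invented for illustration; later chapters use Weka rather than hand-rolled code like this.

```java
import java.util.*;

// Minimal k-nearest-neighbor classifier sketch (plain Java, no libraries).
// Training data and k are illustrative assumptions, not from the book.
public class KnnSketch {

    // Classify a query point by majority vote among the k closest training points.
    static String classify(double[][] train, String[] labels, double[] query, int k) {
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // Sort training indices by Euclidean distance to the query point.
        Arrays.sort(idx, Comparator.comparingDouble(i -> dist(train[i], query)));
        // Majority vote over the k nearest labels.
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < k; i++)
            votes.merge(labels[idx[i]], 1, Integer::sum);
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        double[][] train = {{1, 1}, {1, 2}, {8, 8}, {9, 8}};
        String[] labels  = {"low", "low", "high", "high"};
        // The query (2, 1) sits near the "low" cluster, so 2 of its 3 nearest
        // neighbors vote "low".
        System.out.println(classify(train, labels, new double[]{2, 1}, 3));
    }
}
```

    Note that this is "non-parametric" in exactly the sense the definition gives: the model is simply the stored training examples, and all work happens at query time.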

    Much of the confusion stems from the various factions or domains that use these terms. In many cases, they created the terms and have been using them for decades within their domain.

    Table 1-1 shows the domains that have historically claimed ownership to each of the terms. The terms are not new. Artificial intelligence is a general term. AI first appeared back in the 1970s.

    Table 1-1

    AI Definitions and Domains

    The definitions in Table 1-1 represent my consolidated understanding after reading a vast amount of research and speaking with industry experts. You can find huge philosophical debates online supporting or refuting these definitions.

    Do not get hung up on the terminology. Usage of the terms often comes down to domain perspective of the entity involved. A mathematics major who is doing research on DL algorithms will describe things differently than a developer who is trying to solve a problem by writing application software. The following is a key distinction from the definitions:

    Data mining is all about humans discovering the hidden patterns in data, while machine learning automates the process and allows the computer to perform the work through the use of algorithms.

    It is helpful to think about each of these terms in context of infrastructure and algorithms. Figure 1-1 shows a graphical representation of these relationships. Notice that statistics are the underlying foundation, while artificial intelligence on the right-hand side includes everything within each of the additional subfields of DM, ML, and DL.

    Machine learning is all about the practice of selecting and applying algorithms to our data.

    I will discuss algorithms in detail in Chapter 3. The algorithms are the secret sauce that enables the machine to find the hidden patterns in our data.


    Figure 1-1

    Artificial intelligence subfield relationships

    1.2 Historical

    The term artificial intelligence is hardly new. It has actually been in use since the 1970s. A quick scan of reference books will provide a variety of definitions that have in fact changed over the decades. Figure 1-2 shows a representation of 1970s AI, a robot named Shakey, alongside a representation of what it might look like today.


    Figure 1-2

    AI, past and present

    Most historians agree that there have been a couple of AI winters. They represent periods of time when AI fell out of favor for various reasons, something akin to a technological ice age. They are characterized by a trend that begins with pessimism in the research community, followed by pessimism in the media, and finally followed by severe cutbacks in funding. These periods, along with some historical context, are summarized in Table 1-2.

    Table 1-2

    History of AI and Winter Periods

    It is important to understand why these AI winters happened. If we are going to make an investment to learn and deploy ML solutions, we want to be certain another AI winter is not imminent.

    Is another AI winter on the horizon? Some people believe so, and they raise three possibilities:

    Blame it on statistics: AI is headed in the wrong direction because of its heavy reliance on statistical techniques. Recall from Figure 1-1 that statistics are the foundation of AI and ML.

    Machines run amok: Top researchers suggest another AI winter could happen because misuse of the technology will lead to its demise. In 2015, an open letter to ban development and use of autonomous weapons was signed by Elon Musk, Stephen Hawking, Steve Wozniak, and 3,000 AI and robotics researchers.

    Fake data: Data is the fuel for machine learning (more about this in Chapter 2). Proponents of this argument suggest that ever increasing entropy will continue to degrade global data integrity to a point where ML algorithms will become invalid and worthless. This is a relevant argument in 2018. I will discuss the many types of data in Chapter 2.

    It seems that another AI winter is not likely in the near future because ML is so promising and because of the availability of high-quality data with which we can fuel it.

    Much of our existing data today is not high quality, but we can mitigate this risk by retaining control of the source data our models will rely upon.

    Cutbacks in government funding caused the previous AI winters. Today, private sector funding is enormous. Just look at some of the VC funding being raised by AI startups. Similar future cutbacks in government support would no longer have a significant impact. For ML, it seems the horse is out of the barn for good this time around.

    1.3 Machine Learning Business Case

    Whether you are a freelance developer or you work for a large organization with vast resources available, you must consider the business case before you start to apply valuable resources to ML deployments.

    Machine Learning Hype

    ML is certainly not immune from hype. The book preface listed some of the recent hype in the media. The goal of this book is to help you overcome the hype and implement real solutions for problems.

    ML and DL are not the only recent technology developments that suffer from excessive hype. Each of the following technologies has seen some recent degree of hype:

    Virtual reality (VR)

    Augmented reality (AR)

    Bitcoin

    Blockchain

    Connected home

    Virtual assistants

    Internet of Things (IoT)

    3D movies

    4K television

    Machine learning (ML)

    Deep learning (DL)

    Some technologies become widespread and commonly used, while others simply fade away. Recall that just a few short years ago 3D movies were expected to totally overtake traditional films for cinematic release. It did not happen.

    It is important for us to continue to monitor the ML and DL technologies closely. It remains to be seen how things will play out, but ultimately, we can convince ourselves about the viability of these technologies by experimenting with them, building, and deploying our own applications.

    Challenges and Concerns

    Table 1-3 lists some of the top challenges and concerns highlighted by IT executives when asked what worries them the most when considering ML and DL initiatives. As with any IT initiative, there is an opportunity cost associated with implementing it, and the benefit derived from the initiative must outweigh the opportunity cost, that is, the cost of forgoing another potential opportunity by proceeding with AI/ML.

    Fortunately, there are mitigation strategies available for each of the concerns. These strategies, summarized below, are available even to small organizations and individual freelance developers.

    Table 1-3

    Machine Learning Concerns and Mitigation Strategies

    Using the above mitigation strategies, developers can produce some potentially groundbreaking ML software solutions with a minimal learning curve investment. It is a great time to be a software developer.

    Next, I will take a closer look at ML data science platforms. Such platforms can help us with the goal of monetizing our machine learning investments. The monetization strategies can further alleviate some of these challenges and concerns.

    Data Science Platforms

    If you ask business leaders about their top ML objectives, you will hear variations of the following:

    Improve organizational efficiency

    Make predictive insights into future scenarios or outcomes

    Gain a competitive advantage by using AI/ML

    Monetize AI/ML

    Regardless of whether you are an individual or freelance developer, monetization is one of the most important objectives.

    Regardless of organizational size, monetizing ML solutions requires two building blocks: deploying a data science platform and following an ML development methodology.

    When it comes to the data science platforms, there are myriad options. It is helpful to think about them by considering a build vs. buy decision process. Table 1-4 shows some of the typical questions you should ask when making the decision. The decisions shown are merely guidelines.

    Table 1-4

    Data Science Platform: Build vs. Buy Decision

    So what does it actually mean to buy a data science platform? Let’s consider an example.

    You wish to create a recommendation engine for visitors to your website. You would like to use machine learning to build and train a model using historical product description data and customer purchase activity on your website. You would then like to use the model to make real-time recommendations for your site visitors. This is a common ML use case. You can find offerings from all of the major vendors to help you implement this solution. Even though you will be building your own model using the chosen vendor’s product, you are actually buying the solution from the provider. Table 1-5 shows how the pricing might break down for this project for several of the cloud ML providers.

    Table 1-5

    Example ML Cloud Provider Pricing (sources: https://cloud.google.com/ml-engine/docs/pricing, https://aws.amazon.com/aml/pricing/, https://azure.microsoft.com/en-us/pricing/details/machine-learning-studio/)

    In this example, you accrue costs because of the compute time required to build your model. With very large data sets and construction of deep learning models, these costs become significant.

    Another common example of buying an ML solution is accessing a prebuilt model using a published API. You can use this method for image detection or natural language processing where huge models exist which you can leverage simply by calling the API with your input details, typically using JSON. You will see how to implement this trivial case later in the book. In this case, most of the service providers charge by the number of API calls over a given time period.
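    A call like this can be sketched in plain Java. The endpoint URL, the "instances"/"text" field names, and the bearer-token header below are illustrative assumptions, not any particular vendor's actual API; each provider's documentation defines the real request format.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of calling a prebuilt cloud ML model over HTTP with a JSON payload.
// Endpoint, field names, and auth scheme are hypothetical placeholders.
public class MlApiSketch {

    // Assemble a minimal JSON request body for a single text instance.
    static String buildRequest(String text) {
        return "{\"instances\": [{\"text\": \"" + text.replace("\"", "\\\"") + "\"}]}";
    }

    // POST the payload to a (hypothetical) prediction endpoint.
    // Not invoked by main() because it requires a live service and credentials.
    static String sendPrediction(String endpoint, String apiKey, String payload) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();  // the provider returns its predictions as JSON
    }

    public static void main(String[] args) {
        System.out.println(buildRequest("great product, fast shipping"));
    }
}
```

    Because the provider bills per API call, a design like this keeps the billable request in one small method that is easy to meter and cache.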

    So what does it mean to build a data science platform? Building in this case refers to acquiring a software package that will provide the building blocks needed to implement your own AI or ML solution.

    The following list shows some of the popular data science platforms:

    MathWorks: Creators of the legendary MATLAB package, MathWorks is a long-time player in the industry.

    SAP: The large database player has a complete big data services and consulting business.

    IBM: IBM offers Watson Studio and the IBM Data Science Platform products.

    Microsoft: Microsoft Azure provides a full spectrum of data and analytics services and resources.

    KNIME: KNIME analytics is a Java-based, open, intuitive, integrative data science platform.

    RapidMiner: A commercial Java-based solution.

    H2O.ai: A popular open source data science and ML platform.

    Dataiku: A collaborative data science platform that allows users to prototype, deploy, and run at scale.

    Weka: The Java-based solution you will explore extensively in this book.

    The list includes many of the popular data science platforms, and most of them are commercial data science platforms. The keyword is commercial. You will take a closer look at RapidMiner later in the book because it is Java based. The other commercial solutions are full-featured and have a range of pricing options, from license-based to subscription-based pricing.

    The good news is you do not have to make a capital expenditure in order to build a data science platform because there are some open source alternatives available. You will take a close look at the Weka package in Chapter 3. Whether you decide to build or buy, open source alternatives like Weka are a very useful way to get started because they allow you to build your solution while you are learning, without locking you into an expensive technology solution.

    ML Monetization

    One of the best reasons to add ML into your projects is increased potential to monetize. You can monetize ML in two ways: directly and indirectly.

    Indirect monetization: Making ML a part of your product or service.

    Direct monetization: Selling ML capabilities to customers who in turn apply them to solve particular problems or create their own products or services.

    Table 1-6 highlights some of the ways you can monetize ML.

    Table 1-6

    ML Monetization Approaches

    Many of the direct strategies employ DL approaches. In this book, the focus is mainly on the indirect ML strategies. You will implement several integrated ML apps later in the book. This strategy is indirect because the ML functionality is not visible to your end user.

    Customers are not going to pay more just because you include ML in your application. However, if you can solve a new problem or provide them capability that was not previously available, you greatly improve your chances to monetize.

    There is not much debate about the rapid growth of AI and ML. Table 1-7 shows estimates from Bank of America Merrill Lynch and Transparency Market Research. Both firms show a double-digit cumulative annual growth rate, or CAGR. This impressive CAGR is consistent with all the hype previously discussed.

    Table 1-7

    AI and ML Explosive Growth

    These CAGRs represent impressive growth. Some of the growth is attributed to DL; however, you should not discount the possible opportunities available to you with CML, especially for mobile devices.

    The Case for Classic Machine Learning on Mobile

    Classic machine learning is not a very commonly used term. I will use the term to indicate that we are excluding deep learning. Figure 1-3 shows the relationship. These two approaches employ different algorithms, and I will discuss them in Chapter 4.

    This book is about implementing CML for widely available computing devices using Java. In a sense, we are going after the low-hanging fruit. CML is much easier to implement than DL, but many of the functions we can achieve are no less astounding.


    Figure 1-3

    Classic machine learning relationship diagram

    There is a case for mastering the tools of CML before attempting to create DL solutions. Table 1-8 highlights some of the key differences between development and deployment of CML and DL solutions.

    Table 1-8

    Comparison of Classic Machine Learning and Deep Learning

    For mobile devices and embedded devices, CML makes a lot of sense. CML outperforms DL for smaller data sets, as shown on the left side of the chart in Figure 1-7.

    It is possible to create CML models with a single modern CPU.
