Microsoft Azure Machine Learning
Ebook, 363 pages

About this ebook

About This Book
  • Learn how to build predictive models using only a web browser, such as IE
  • Explore the different machine learning algorithms available
  • Get started with predictive analytics with confidence, without any prior knowledge or experience
Who This Book Is For

The book is intended for those who want to learn how to use Azure Machine Learning. Perhaps you already know a bit about machine learning but have never used ML Studio in Azure, or perhaps you are an absolute newbie. In either case, this book will get you up and running quickly.

Language: English
Release date: Jun 16, 2015
ISBN: 9781784398514

Reviews for Microsoft Azure Machine Learning

  • Rating: 5 out of 5 stars
    This is a fine book for its purpose: a very basic introduction to Azure ML / ML Studio. This is not an introduction to machine learning or data mining. Even though the book provides some very basic explanations of the different ML techniques it presents, you'll get more out of it if you already understand what the different techniques and algorithms do. These techniques form the basic organization of the book (regression, classification, clustering, recommender systems, and publishing). These chapters are preceded by chapters introducing ML Studio's browser interface, getting data in and out of it, and the always-essential data preparation stage.
    Beyond that, the book requires no preexisting knowledge of coding, since Azure ML has a drag-and-drop interface where you drop modules on a canvas, customize them, and connect them together in experiments. Thankfully, the book provides a lot of screenshots of the procedures it describes, a few "do it yourself" exercises at the end of each chapter, and several exercise chapters at the end.
    So, this is all very handy, with very few errors (with so many references to modules and models, there are bound to be a few slip-ups here or there). This highlights the convenience and ease of use of Azure ML for a non-technical audience.
    Extremely useful for readers who are not complete beginners. Remember, you have to know some machine learning / data mining if you want to make the best use of this book, which is very clearly written.
    Highly recommended, especially if coding is not your forte.

Book preview

Microsoft Azure Machine Learning - Sumit Mund

Table of Contents

Microsoft Azure Machine Learning

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Instant updates on new Packt books

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the color images of this book

Errata

Piracy

Questions

1. Introduction

Introduction to predictive analytics

Problem definition and scoping

Data collection

Data exploration and preparation

Model development

Model deployment

Machine learning

Types of machine learning problems

Classification

Regression

Clustering

Common machine learning techniques/algorithms

Linear regression

Logistic regression

Decision tree-based ensemble models

Neural networks and deep learning

Introduction to Azure Machine Learning

ML Studio

Summary

2. ML Studio Inside Out

Introduction to ML Studio

Getting started with Microsoft Azure

Microsoft account and subscription

Creating and managing ML workspaces

Inside ML Studio

Experiments

Creating and editing an experiment

Running an experiment

Creating and running an experiment – do it yourself

Workspace as a collaborative environment

Summary

3. Data Exploration and Visualization

The basic concepts

The mean

The median

Standard deviation and variance

Understanding a histogram

The box and whiskers plot

The outliers

A scatter plot

Data exploration in ML Studio

Visualizing an automobile price dataset

A histogram

The box and whiskers plot

Comparing features

A snapshot

Do it yourself

Summary

4. Getting Data in and out of ML Studio

Getting data in ML Studio

Uploading data from a PC

The Enter Data module

The Data Reader module

Getting data from the Web

Fetching a public dataset – do it yourself

Getting data from Azure

Data format conversion

Getting data from ML Studio

Saving a dataset on a PC

Saving results in ML Studio

The Writer module

Summary

5. Data Preparation

Data manipulation

Clean Missing Data

Removing duplicate rows

Project columns

The Metadata Editor module

The Add Columns module

The Add Rows module

The Join module

Splitting data

Do it yourself

The Apply SQL Transformation module

Advanced data preprocessing

Removing outliers

Data normalization

The Apply Math Operation module

Feature selection

The Filter Based Feature Selection module

The Fisher Linear Discriminant Analysis module

Data preparation beyond ready-made modules

Summary

6. Regression Models

Understanding regression algorithms

Train, score, and evaluate

The test and train dataset

Evaluating

The mean absolute error

The root mean squared error

The relative absolute error

The relative squared error

The coefficient of determination

Linear regression

Optimizing parameters for a learner – the sweep parameters module

The decision forest regression

The train neural network regression – do it yourself

Comparing models with the evaluate model

Comparing models – the neural network and boosted decision tree

Other regression algorithms

No free lunch

Summary

7. Classification Models

Understanding classification

Evaluation metrics

True positive

False positive

True negative

False negative

Accuracy

Precision

Recall

The F1 score

Threshold

Understanding ROC and AUC

Motivation for the matrix to consider

Training, scoring, and evaluating modules

Classifying diabetes or not

Two-class Bayes point machine

Two-class neural network with parameter sweeping

Predicting adult income with decision-tree-based models

Do it yourself – comparing models to choose the best

Multiclass classification

Evaluation metrics – multiclass classification

Multiclass classification with the Iris dataset

Multiclass decision forest

Comparing models – multiclass decision forest and logistic regression

Multiclass classification with the Wine dataset

Multiclass neural network with parameter sweep

Do it yourself – multiclass decision jungle

Summary

8. Clustering

Understanding the K-means clustering algorithm

Creating a K-means clustering model using ML Studio

Do it yourself

Clustering versus classification

Summary

9. A Recommender System

The Matchbox recommender

Types of recommendations

Understanding the recommender modules

The Train Matchbox recommender

The number of traits

The number of recommendation algorithm iterations

The Score Matchbox recommender

The evaluate recommender

Building a recommendation system

Summary

10. Extensibility with R and Python

Introduction to R

Introduction to Python

Why should you extend through R/Python code?

Extending experiments using the Python language

Understanding the Execute Python Script module

Creating visualizations using Python

A simple time series analysis with the Python script

Importing the existing Python code

Do it yourself – Python

Extending experiments using the R language

Understanding the Execute R Script module

A simple time series analysis with the R script

Importing existing R code

Including an R package

Understanding the Create R Model module

Do it yourself – R

Summary

11. Publishing a Model as a Web Service

Preparing an experiment to be published

Saving a trained model

Creating a scoring experiment

Specifying the input and output of the web service

Publishing a model as a web service

Visually testing a web service

Consuming a published web service

Web service configuration

Updating the web service

Summary

12. Case Study Exercise I

Problem definition and scope

The dataset

Data exploration and preparation

Feature selection

Model development

Model deployment

Summary

13. Case Study Exercise II

Problem definition and scope

The dataset

Data exploration and preparation

Model development

Model deployment

Summary

Index

Microsoft Azure Machine Learning

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, nor its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: June 2015

Production reference: 1100615

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78439-079-2

www.packtpub.com

Cover image by Kamal Kanta Majhi

Credits

Author

Sumit Mund

Reviewers

Grigor Aslanyan

Alisson Sol

Abhishek Sur

Radu Tudoran

Commissioning Editor

Ashwin Nair

Acquisition Editor

Meeta Rajani

Content Development Editor

Adrian Raposo

Technical Editor

Abhishek R. Kotian

Copy Editors

Sonia Michelle Cheema

Neha Vyas

Project Coordinator

Sanchita Mandal

Proofreaders

Stephen Copestake

Safis Editing

Indexer

Monica Ajmera Mehta

Production Coordinator

Conidon Miranda

Cover Work

Conidon Miranda

About the Author

Sumit Mund is a BI/analytics consultant with about a decade of industry experience. He works at his own company, Mund Consulting Ltd., where he is a director and lead consultant. He is an expert in machine learning, predictive analytics, and C#, R, and Python programming; he also has an active interest in artificial intelligence. He has extensive experience working with most of Microsoft's data analytics tools and with Big Data platforms such as Hadoop and Spark. He is a Microsoft Certified Solutions Expert (MCSE: Business Intelligence).

Sumit regularly engages on social media platforms through his tweets, blogs, and LinkedIn profile, and often gives talks at industry conferences and local user group meetings.

Acknowledgments

I may have written this book, but this project would never have been a success without the active help and support of many people who have contributed to my journey; I would like to thank them all sincerely and from the bottom of my heart.

Firstly, I'd like to thank the acquisition editor, Meeta Rajani, for approaching me and convincing me to write this title. Time and again, the book improved in many ways through the valuable comments of all the reviewers. Adrian Raposo did a commendable job of helping develop the content as well as coordinating the overall project management. This book would not be in its current shape had it not received the perfect touch of the technical editor, Abhishek Kotian, and of all the proofreaders.

Special thanks to my colleagues, Kamal and Mahananda. Kamal took the time to get the cover image for the book, while Mahananda took the trouble to go through the drafts, making sure that all the examples ran correctly. He also made suggestions wherever screenshots or steps had changed. When you write a book about a product that has been around since its beta days and keeps changing up to its final release, making sure that all the screenshots and steps are correct and up to date is a challenge. Mahananda really made it easy for me.

Last but not least, I'd like to point out that, if anyone has suffered because of this project, it's my dear wife, Pallabi. Whether it meant making late-night coffee or sacrificing weekends and bank holidays, whenever I implored her to bear with me by saying "It's the book," she always responded with a smile, without asking any questions. Thank you for all your love, understanding, patience, and support.

I would also like to sincerely thank all those, though not mentioned here, who have helped me in this project directly or indirectly.

About the Reviewers

Grigor Aslanyan is a theoretical cosmologist who mainly focuses on computational methods for data analysis. He has a PhD in physics from the University of California, San Diego, and is currently a postdoctoral research fellow at the University of Auckland in New Zealand.

Grigor was born and raised in Armenia. He obtained his bachelor's and master's degrees in physics and computer science at Yerevan State University, Armenia, before moving to California for his PhD studies. He has also worked as a software engineer for 3 years at Ponté Solutions (which was later acquired by Mentor Graphics).

Grigor's research focuses on studying the theory of the early universe by using experimental data from Cosmic Microwave Background radiation and galaxy surveys. His research requires the development and implementation of complex numerical tools used to analyze the data on large computational clusters, with the ultimate goal of learning about the theory of the early universe. Grigor's current research is focused on applying advanced data science and machine learning techniques to improve the data analysis methods in cosmology, making it possible to analyze large amounts of data expected from current and future generation experiments.

He has implemented the publicly available numerical library, Cosmo++, which includes general mathematical and statistical tools for data analysis as well as cosmology-specific packages. The library is written in C++, and it is publicly available at http://cosmopp.com.

I thank the University of Auckland and my supervisor, Richard Easther, for supporting my work on this book.

Alisson Sol is currently a Group Engineering Manager for Microsoft in Bellevue, Washington. He has many years of experience in software development, having hired and managed several software teams that shipped many applications and frameworks, with focus on image processing, computer vision, ERP, business intelligence, big data, machine learning, and distributed systems. Alisson has been working for Microsoft and Microsoft Research in the USA and UK since 2000, and was previously a cofounder of 3 software companies. He has published several technical papers and has several patent applications and granted patents. He has a B.Sc. in physics and an M.Sc. in Computer Science from the Federal University of Minas Gerais, Brazil, and General Management training from the University of Cambridge, UK. When not coding, he likes to play soccer or disassemble hardware, put it back to work, and reuse the spare parts elsewhere!

Abhishek Sur has been a Microsoft MVP since 2011. He is currently working as a product head with Insync Tech-Fin Solutions Pvt Ltd. He has profound theoretical insight and years of hands-on experience in different .NET products and languages. Over the years, he has helped developers all over the world through his experience and knowledge. He owns a Microsoft User Group in Kolkata called Kolkata Geeks, and regularly organizes events and seminars in various places to spread .NET awareness. He is a renowned public speaker, voracious reader, and a technology buff. Abhishek's main interest lies in exploring the new realms of .NET technology and coming up with priceless write-ups on the unexplored domains of .NET. He is associated with the Microsoft Insider list on WPF and C# and stays in touch
