Fun Q: A Functional Introduction to Machine Learning in Q

Ebook, 828 pages

About this ebook

Bring the power of machine learning to the fastest time-series database. Fun Q uses the powerful q programming language to implement many of the most famous machine-learning algorithms. Using a meticulously factored machine-learning library, each algorithm is broken into its basic building blocks and then rebuilt from scratch.

Language: English
Publisher: Vector Sigma
Release date: Dec 21, 2020
ISBN: 9781734467512


    Fun Q

    A Functional Introduction to Machine Learning in Q

    Nick Psaris

    Copyright © 2020 Nick Psaris

    All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher.

    Kx and kdb+ are registered trademarks of Kx Systems, Inc.

    Books may be purchased in quantity and/or for special sales by contacting the publisher, Vector Sigma, at sales@vector-sigma.com.


    Table of Contents

    Preface
    1. Introduction
        1.1. Who is This Book For?
        1.2. How to Read This Book
        1.3. Getting Started
        1.4. What is in This Book?
        1.5. Naming Conventions
    2. K-Nearest Neighbors
        2.1. K-Nearest Neighbors Example
        2.2. Iris Data Set
        2.3. Downloading Data Sets
        2.4. Loading Data Sets
        2.5. Partitioning Data
        2.6. Distance Metrics
        2.7. Norm Metrics
        2.8. Minimize Flipping
        2.9. Finding K Neighbors
        2.10. Weighted Average or Mode
        2.11. Weighting Schemes
        2.12. Euclidean Distance Squared
        2.13. Pairwise Euclidean Distance Squared
        2.14. K-Fold Cross Validation
    3. K-Means
        3.1. K-Means Example
        3.2. Centroid Initialization
        3.3. K-Means++ Initialization
        3.4. Lloyd’s Algorithm
        3.5. Ungrouping Clusters
        3.6. The Elbow Method
        3.7. Partitional Clustering
    4. Hierarchical Agglomerative Clustering
        4.1. Hierarchical Agglomerative Clustering Example
        4.2. Cluster Benchmark Data Sets
        4.3. Dissimilarity Matrix
        4.4. Lance-Williams Algorithm
        4.5. Linkage Functions
        4.6. Building Clusters
        4.7. The Silhouette Method
    5. Expectation-Maximization
        5.1. Bernoulli Distribution
        5.2. Expectation-Maximization Algorithm
        5.3. Binomial Distribution
        5.4. Multinomial Distribution
        5.5. Bernoulli Mixture Models
        5.6. Gaussian Mixture Model
    6. Naive Bayes
        6.1. Fit Naive Bayes
        6.2. Gaussian Naive Bayes
        6.3. Multinomial Naive Bayes
        6.4. Binary Classification
        6.5. Text Classification
    7. Decision Trees
        7.1. Impurity Functions
        7.2. Iterative Dichotomizer 3
        7.3. Visualizing Decision Trees
        7.4. Decision Tree Classification
        7.5. Building the Decision Tree
        7.6. Gain Functions
        7.7. Cost Complexity Pruning
        7.8. K-Fold Cross Validation
        7.9. Regression Trees
        7.10. One Hot
    8. Decision Tree Ensembles
        8.1. Ensemble Example
        8.2. Wisconsin Diagnosis Breast Cancer Data Set
        8.3. Bootstrap Aggregation
        8.4. Random Forest
        8.5. Adaptive Boosting
        8.6. Discrete Adaboost
        8.7. AdaBoost Cross Validation
    9. Linear Regression
        9.1. Matrices
        9.2. Matrix Operators
        9.3. Least Squares Regression
        9.4. Multi-Dimensional Least Squares
        9.5. Custom Matrix Operators
        9.6. Normal Equations
        9.7. Gradient Descent
        9.8. Linear Regression Cost
        9.9. Linear Regression Gradient
        9.10. Feature Scaling
        9.11. FMINCG
        9.12. Stopping Conditions
        9.13. Stochastic Gradient Descent
        9.14. Regularization
        9.15. Optimal Regularization
    10. Logistic Regression
        10.1. Logistic Regression Example
        10.2. Wisconsin Diagnosis Breast Cancer Data Set
        10.3. Sigmoid
        10.4. Log-Loss Cost
        10.5. Logistic Regression Cost
        10.6. Logistic Regression Gradient
        10.7. Optimal Regularization
        10.8. Receiver Operating Characteristic
        10.9. Area Under the Curve
        10.10. Multi-Class Classification
        10.11. One-vs.-All Classifier
        10.12. Fitting One vs. All
        10.13. Predicting One vs. All
    11. Neural Networks
        11.1. Perceptron
        11.2. Neural Network
        11.3. Neural Network Example
        11.4. Neural Network Matrices Initialization
        11.5. Neural Network Topology
        11.6. Neural Network Weight Initialization
        11.7. Neural Network Cost
        11.8. Neural Network Gradient
        11.9. Neural Network Activation Functions
        11.10. Neural Network Cost and Gradient
        11.11. Neural Network Multinomial Classification
        11.12. Neural Network Regression
    12. Recommender Systems
        12.1. MovieLens Data Set
        12.2. Content-Based Filtering
        12.3. Rating Data Summary
        12.4. Null-Aware Aggregations
        12.5. User-User Collaborative Filtering
        12.6. Item-Item Collaborative Filtering
        12.7. Collaborative Filtering
        12.8. Collaborative Filtering Cost
        12.9. Collaborative Filtering Gradient
        12.10. Stochastic Gradient Descent
        12.11. Alternating Least Squares
    13. PageRank
        13.1. Adjacency Matrix
        13.2. Markov Matrix
        13.3. Google Matrix
        13.4. Power Method
        13.5. Algebraic PageRank
        13.6. Iterative PageRank
        13.7. Sparse Matrices
        13.8. Sparse Matrix Iterative PageRank
    14. Graphics
        14.1. ASCII plots
        14.2. Mandelbrot Set
        14.3. Portable Image Formats
        14.4. Sparklines
        14.5. Charting Stock Prices
    A. Source Code
        A.1. Utilities
        A.2. Machine-Learning Algorithms
        A.3. External Libraries Ported to Q
        A.4. Data Libraries
    Index

    List of Figures

    7.1. Weather Decision Tree
    11.1. Perceptron with Heaviside Activation
    11.2. Neural Network with Two Hidden Layers
    13.1. Example Web Page Links
    14.1. Black and White Mandelbrot Set
    14.2. Grayscale Mandelbrot Set

    List of Examples

    2.1. Distance Metrics
    3.1. K-Means
    3.2. K-Means Initial Clusters
    3.3. K-Means Centroids After One Round
    3.4. K-Means Clusters After One Round
    3.5. K-Means Centroids After Two Rounds
    3.6. K-Means Centroids After Three Rounds
    3.7. K-Means Clusters After Convergence
    3.8. Contrived K-Means
    3.9. Sum of Squared Errors Within
    3.10. Percentage of Total Variance
    4.1. Large HAC Clusters
    4.2. Silhouette I
    4.3. Small HAC Clusters
    4.4. Silhouette II
    5.1. Binomial EM Confusion Matrix
    5.2. Multinomial EM Confusion Matrix
    5.3. MNIST Digits
    5.4. Prototypical Digits with Random Noise
    5.5. Prototypical Digits
    5.6. Bernoulli Mixture Model EM Confusion Matrix
    5.7. Gaussian Mixture Model EM Confusion Matrix
    6.1. Naive Bayes Mistakes
    6.2. Binary Classification Confusion Matrix
    7.1. Mean Squared Errors
    7.2. Misclassification
    7.3. Entropy
    7.4. Gini
    8.1. Bootstrap Aggregation Accuracy
    8.2. Random Forest Regression Training Error
    8.3. Random Forest Regression Test Error
    8.4. Random Forest Classification Training Error
    8.5. Random Forest Classification Test Error
    8.6. AdaBoost Training Accuracy
    8.7. AdaBoost Test Accuracy
    9.1. Sine
    9.2. Linear Regression Cost Convergence
    9.3. Optimal Linear Regression Regularization
    10.1. Sigmoid
    10.2. Log-Loss
    10.3. Optimal Logistic Regression Regularization
    10.4. Logistic Regression Binary Classification Confusion Matrix
    10.5. Low Threshold Binary Classification Confusion Matrix
    10.6. High Threshold Binary Classification Confusion Matrix
    10.7. Discriminating Receiver Operating Characteristic
    10.8. Non-discriminating Receiver Operating Characteristic
    10.9. Negative Discriminating Receiver Operating Characteristic
    10.10. MNIST Digits
    10.11. Logistic Regression One-vs.-All Confusion Matrix
    10.12. Incorrectly Classified MNIST Digits
    11.1. Neural Network Multinomial Classification Confusion Matrix
    11.2. Row-Normalized Confusion Matrix
    11.3. Column-Normalized Confusion Matrix
    14.1. Heat Map
    14.2. Axis Labels
    14.3. Black and White Mandelbrot Set
    14.4. Gray Scale Mandelbrot Set

    List of Equations

    2.1. Manhattan Distance
    2.2. Euclidean Distance
    2.3. Minkowski Distance
    2.4. Manhattan Norm
    2.5. Euclidean Norm
    2.6. Parameterized Norm
    2.7. Cosine Similarity
    2.8. Pairwise Euclidean Distance Squared
    4.1. Lance-Williams Recursive Update
    5.1. Binomial Likelihood
    5.2. Binomial Maximum Likelihood Estimator
    5.3. Multinomial Likelihood
    5.4. Multinomial Mixture Model Maximum Likelihood Estimator
    5.5. Bernoulli Mixture Model Likelihood
    5.6. Binomial Mixture Model Maximum Likelihood Estimator
    5.7. Multivariate Gaussian Likelihood
    5.8. Multivariate Gaussian Maximum Likelihood Estimator
    7.1. Decision Tree Cost Complexity
    8.1. AdaBoost Strong Classifier
    8.2. Exponential Loss
    9.1. Normal Equations
    9.2. Regularized Linear Regression Cost
    9.3. Regularized Linear Regression Gradient
    9.4. Ridge Regression
    10.1. Sigmoid
    10.2. Log-Loss
    10.3. Regularized Logistic Regression Cost
    10.4. Regularized Logistic Regression Gradient
    11.1. Heaviside Step
    11.2. Neural Network Cost
    11.3. Sigmoid
    11.4. Sigmoid Gradient
    11.5. Hyperbolic Tangent
    11.6. Hyperbolic Tangent Gradient
    11.7. ReLU
    11.8. ReLU Gradient
    11.9. LReLU
    11.10. LReLU Gradient
    11.11. Softmax
    11.12. Cross Entropy
    12.1. Collaborative Filtering Cost
    12.2. Collaborative Filtering Gradient
    12.3. Matrix-Factorization Optimization
    12.4. Matrix-Factorization Updates
    14.1. Complex Multiplication


    Preface

    I love data. Let me clarify that. I love structured data. My first job was to administer a Sybase database and the Perl scripts that loaded client positions. I then transitioned to a trading desk where we used Visual Basic for Applications (VBA) to enter trades into a Microsoft Access database. From there, I moved to an automated trading team that stored all data in proprietary self-describing text messages.

    Lucky for me, I switched to a team that used kdb+ as the data warehouse and therefore q as the programming language. Over the ensuing years, I continued to use q as my development language of choice, and spent many days—if not weeks—writing functions and C-bindings to import data from (and export data to) different formats including fixed-width binary files and proprietary middleware APIs. The goal in each case was to bring more data to the q ecosystem.

    Convinced that q was the language of choice for big data analysis, I turned my attention to bringing analytics to the language as well. Data scientists often treat kdb+ as nothing more than a source of data. To accommodate the limited capacity for large data sets in languages such as R and Python, they are required to aggregate the data and, in the process, lose information. Instead of bringing summary statistics into languages that were not designed for scale, wouldn’t it be better to perform the analysis directly on the data stored in kdb+?

    I was first introduced to machine-learning algorithms in 2015 when a friend suggested I take the Coursera Machine Learning course[1] taught by Coursera’s co-founder, Andrew Ng. After completing all the homework assignments in Octave (an open source alternative to Matlab), I decided to re-implement the whole course in q.

    My goal was to push q to its limits, find its Achilles heel and keep pushing until it begged for mercy. In the process, I found the functional vector implementation of many algorithms quite elegant. Others, however, were downright embarrassing. I found that q’s internal data structures were not designed for fast matrix multiplication. With carefully factored code, however, and the use of an open-source math library, the performance could be improved dramatically. I found the lack of complex data types frustrating and wished q had native sparse matrix operators. I found that transposing large matrices severely impacts performance and algorithms must be designed to prevent this. And finally, I found that q’s limits on function, argument and branch length make translating code from other languages very difficult without refactoring.[2]

    I became addicted to implementing machine-learning algorithms and continued to read the original papers for algorithms such as decision trees, adaptive boosting and recommender systems. As I added each new algorithm, I refactored the library to maximize code performance and reuse. I wrote each algorithm on the free 32-bit version of kdb+ and limited all interaction (including visualization) to the q terminal. To make the code suitable for inclusion in this book, all functions and comments were required to be less than 77 characters wide. I believe this constraint improved the code clarity by forcing algorithms to be refactored into functions with descriptive names.

    I learned that there exists a problem domain where q implementations of machine-learning algorithms, such as k-means, perform well. Others, such as support vector machines, are best delegated to the canonical implementation written in C/C++. This book is dedicated to machine-learning algorithms implemented in pure q.

    I enjoy coding in q. In fact, I truly believe coding in q is fun. My goal is to share this passion with you while simultaneously teaching you machine-learning algorithms and concepts as well as many q coding paradigms. Along the way, I hope you have fun too.


    [1] Machine Learning, Coursera. [Online]. Available: https://www.coursera.org/learn/machine-learning. [Accessed: 12-Mar-2020].

    [2] Some of these limits have been relaxed with the release of kdb+ 3.6. Instructions embedded within if, do, and while branching control structures were previously limited to 255 bytecodes. This has now been extended to 65,025 bytecodes. The number of permitted constants, as well as the number of local and global variables, has also been increased. The limit of 8 function arguments, however, still remains.

    Chapter 1. Introduction

    Data comes in many shapes: bytes, text, integral and floating. Data comes in many sizes: boolean flags, text config files, binary images and partitioned databases. Stored and ignored, data is expensive to maintain and back up. The quality of the data is only revealed once it is analyzed. And the value of the data is only determined once it is used to make and test predictions.

    Analysis can take the form of visual inspection or statistical summary. Predictions are routinely performed and tested in academia and business. The goal of accurate prediction varies slightly between these two fields, however. While academia strives for truth, business searches for profit. And where there is incentive in academia to share data and analysis, business has historically held it captive.

    Recent advances in computational analysis and computing power, and the reduction in storage costs, have ignited a data revolution. Whether we call it data science, machine learning, deep learning or artificial intelligence, the combined power of data and computational analysis reconfirms that we live in the information age.

    Data is worthless if you can’t extract information from it. As Clifford Stoll and Gary Schubert are quoted as saying, data is not information, information is not knowledge, knowledge is not understanding, and understanding is not wisdom.

    Each step along the path from data to wisdom requires the expenditure of effort and results in an increase in organization and structure. Recent developments in machine-learning techniques have enabled computers to automate the first step from data to information. The success of these algorithms can be attributed to a 10- to 100-fold improvement in algorithm design, an increase in computer speeds by at least 100-fold and an explosion in the amount of data (up to 1,000-fold).[3]

    Given the universal availability of algorithms and hardware, a remaining problem is accessing the volumes of data. Kdb+ is known not only for its database performance, but also for its flexible programming language q. Would it not make sense, therefore, to bring the machine-learning algorithms to the data stored in a kdb+ database?

    1.1. Who is This Book For?

    This book assumes an introductory knowledge of kdb+. It also assumes you have q installed (or know how to install it). Most q developers use the language to build tickerplants, databases or CEP engines. Few use q as an analytics platform, however. This is beginning to change. This book is for the community of developers who, like me, have fallen in love with programming in q and want to extend their computing knowledge into the world of big data and machine-learning algorithms while simultaneously learning how to write better q. It is also for the data scientists who find that the data they need—or want—to analyze is increasingly being stored in a kdb+ database. Using their existing machine-learning knowledge, they will be able to learn q programming idioms that will improve their ability to write efficient code.

    1.2. How to Read This Book

    This book was written to be read linearly from beginning to end. All code required to run the examples in these chapters is explained when first introduced. You may, of course, jump to future chapters if you are comfortable using functions without understanding their implementation. The last chapter, Chapter 14, is an exception and can be read at any time you are looking for some fun.

    I am excited that you purchased this book and are interested in both machine learning and q. Having the book on your desk is a great first step, but the knowledge embedded within won’t magically jump to your brain. I suggest that you run each of the examples yourself. Play with the inputs, examine the outputs and perhaps step through the algorithms by declaring variables with the same name as the function parameters. This is made easier by the consistent naming conventions used in the funq project and listed in Section 1.5. I wrote this book for you and hope you make it through each chapter.

    1.3. Getting Started

    The source code for this book is included in Appendix A for reference, but can—and should—be downloaded from https://www.fun-q.net. To ensure compatibility with the examples in this book, download release v1.0.0. Once downloaded, you should start your q process from the funq project folder. In this example, I’ve downloaded the 32-bit version of kdb+ 4.0. I always use the -s command line option to start q with multiple secondary threads to enable parallel code execution.

    bash-3.2$ cd funq

    bash-3.2$ q -s 4

    KDB+ 4.0 2020.06.18 Copyright (C) 1993-2020 Kx Systems

    m32/ 4(4)core 8192MB nick nicks-macbookpro.local 192.168.1.3 NONEXPIRE

     

    q)

    There are four main libraries that include the utilities and machine-learning algorithms: ut.q, ml.q, fmincg.q and porter.q. The funq.q library loads each of these files.[4]

    q)\l funq.q

    q)

    In addition to the directories distributed with q, the ut, ml, fmincg and porter directories should now be visible.

    q)key `

    `q`Q`h`j`o`ut`ml`fmincg`porter

    The funq project also includes many other libraries dedicated to downloading data from the internet and loading them into memory. We can, for example, load the full text of Pride and Prejudice by Jane Austen.

    q)\l pandp.q

    [down]loading pride and prejudice text

    http://www.gutenberg.org/files/1342/1342_0.txt

    The text, with each chapter represented by a vector of characters, can then be found in pandp.s.

    q)pandp.s

    "It is a truth universally acknowledged, that a single man in\npossession..

    "Mr. Bennet was among the earliest of those who waited on Mr.\nBingley. H..

    "Not all that Mrs. Bennet, however, with the assistance of her\nfive daug..

    "When Jane and Elizabeth were alone, the former, who had been\ncautious i..

    "Within a short walk of Longbourn lived a family with whom the\nBennets w..

    "The ladies of Longbourn soon waited on those of Netherfield. The\nvisit ..

    "Mr. Bennet's property consisted almost entirely in an estate of\ntwo tho..

    ..

    Other data sets automatically load their contents as tables, matrices and vectors—depending on the appropriate abstraction.

    Writing a book that requires data is a brittle process. The examples will break if the data sets move location or change format. All data sets are, therefore, loaded from a dedicated q source file. If future modifications are required, the funq project will be updated to ensure the examples remain valid. Also, if you find that a data set required in the examples below has changed location, please raise an issue in the funq repository.

    Installing 7-Zip

    The funq project was written with the expectation that it can be run on any platform on which kdb+ is supported: Windows, Linux, Solaris and Mac operating systems. Although Linux, Solaris, and Mac come with utilities to uncompress data files in various formats such as .zip, .gz and .tar.gz, Windows does not. To uncompress files on a machine running the Windows operating system, the ut.q library uses the 7-Zip utility (7z.exe). This utility can be downloaded from https://www.7-zip.org.

    1.4. What is in This Book?

    We begin our journey through machine-learning algorithms in Chapter 2 with the most intuitive supervised learning algorithm: k-nearest neighbors (k-NN). The chapter starts by introducing the funq project, which includes all the code required to run the examples in this book. It then demonstrates the project’s ability to download external data sets by loading, and performing k-NN analysis on, the ever-popular Iris data set. It then takes a step back to explain different distance metrics that are used in machine learning and their relationship with norm metrics. The k-NN implementation is then discussed and examples of using it with different distance metrics are presented. A few common optimizations are then reviewed and the chapter concludes with a presentation of the confusion matrix and a discussion on using cross validation to pick the best value for k (the number of neighbors).
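
    To give a flavor of what is coming, each of these distance metrics can be written in a single line of q. The definitions below are illustrative sketches only (the names and the funq implementations presented in Chapter 2 may differ), and they operate on two numeric vectors of equal length.

    q)mdist:{sum abs x-y}                      / Manhattan (L1) distance
    q)edist:{sqrt sum d*d:x-y}                 / Euclidean (L2) distance
    q)mkdist:{(sum abs[x-y] xexp z) xexp 1%z}  / Minkowski distance of order z
    q)edist[1 2 3f;4 6 3f]
    5f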

    We next turn to a simple, yet elegant, unsupervised clustering algorithm in Chapter 3 called k-means. To explain the k-means algorithm, we discuss the implementation of the Lloyd method of iterative assignment and update steps, which results in the convergence of k distinct clusters. Three different initialization techniques are covered including random cluster selection, random cluster assignment and an algorithm specifically designed to produce distant clusters: k-means++. The elbow method is then used to pick the optimal number of clusters. To demonstrate the flexibility of the algorithm, different distance metrics are used to implement k-medians and k-harmonic means. The chapter ends with a discussion of k-medoids, which requires that the centroids—the center of each cluster—be actual points within the data set.
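
    The assignment and update steps of the Lloyd method are compact enough to sketch here. The snippet below is an illustration under simplified assumptions (X is a hypothetical list of numeric points stored as rows, C0 a list of k initial centroids, and every centroid is assumed to keep at least one point); it is not the funq implementation.

    q)dist2:{sum d*d:x-y}                        / squared Euclidean distance
    q)nearest:{[C;x]first iasc dist2[x] each C}  / index of the closest centroid
    q)lloyd:{[X;C]{avg x where y=z}[X;nearest[C] each X] each til count C}  / assign, then update
    q)C:lloyd[X] over C0                         / iterate to convergence from the initial centroids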

    We continue our discussion on clustering techniques in Chapter 4 by presenting hierarchical agglomerative clustering. The chapter begins by downloading and describing the University of Eastern Finland’s clustering benchmark data set. After first building a dissimilarity matrix, the iterative Lance-Williams algorithm is dissected and its efficiency and flexibility are demonstrated. Different linkage functions are defined and Ward’s method is explained. Methods for generating and ungrouping clusters are reviewed before the chapter concludes by demonstrating how the silhouette cluster quality method can be used to determine the optimal number of clusters.

    The final clustering algorithm, expectation-maximization (EM), is described in Chapter 5. The expectation-maximization algorithm, which is used to estimate distribution parameters, can be used as a generalization of the k-means clustering algorithm. Where the k-means algorithm assigns each observation to a single cluster, the expectation-maximization algorithm assigns each observation a probability of belonging to each of the specified clusters. Probabilities require distributions and the chapter introduces the binomial, multinomial, Bernoulli mixture model and Gaussian mixture model distributions using coin flips, dice rolls, the MNIST handwritten-digit data set and the Iris data set respectively. The concepts of likelihood and log likelihood estimators are also introduced.

    Chapter 6 returns to supervised learning algorithms and the use of probability distributions with the naive Bayes algorithm. The Iris data set is used again to demonstrate the implementation of Gaussian naive Bayes while the SMS spam data set is analyzed with multinomial naive Bayes. The SMS spam data set is our first introduction to text processing—also referred to as natural language processing (NLP). We then introduce a q implementation of the Porter stemmer, which allows us to combine English words with the same meaning: transformer and transformers, for example. We then review binary classification statistics that allow us to quantify how good our binary predictions are. The chapter concludes by introducing the term-frequency inverse document frequency (TF-IDF) statistic and uses it to predict which Jane Austen book a randomly selected chapter comes from.
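
    The binary classification statistics mentioned above reduce to one-liners in q. The definitions below are rough sketches over 0/1 vectors of actual labels y and predictions p (illustrative names, not the funq definitions).

    q)prec:{[y;p]sum[y&p]%sum p}    / precision: true positives over predicted positives
    q)recall:{[y;p]sum[y&p]%sum y}  / recall: true positives over actual positives
    q)f1:{[pr;rc]2*pr*rc%pr+rc}     / F1 score: harmonic mean of precision and recall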

    Chapter 7 describes a powerful (and perhaps my favorite) machine-learning algorithm—decision trees. By continuously splitting a data set based on the values of each feature, we build a tree that, most importantly, creates an intuitive model of the underlying data. Decision trees have a long history and many competing designs. The funq project provides a general-purpose tree-building algorithm that is parameterized to allow the generation of multiple implementations. After the chapter introduces the impurity functions, which are used to determine how much information is in each of the data set features, the ID3 classification-tree algorithm is examined and applied to the famous weather data set. Before proceeding to review more complex classification and regression trees (CART), the chapter demonstrates two visualization tools: the first produces a terminal-appropriate text tree while the second generates output that can be used to produce an image with the Graphviz tool. Digging into the algorithm, the chapter continues to discuss different information-gain calculations, which are used to identify which feature should be split at each branch in the tree. The Iris data set is once again used to demonstrate the power of decision trees. To prevent overly bushy trees and their tendency to overfit, a pruning technique and cross validation are used to pick the appropriate tree complexity. Regression trees that can be used to predict numeric values are then introduced and the chapter concludes by explaining how one-hot encoding can be used to transform categorical features into multiple features with binary values.
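
    The impurity functions at the core of the chapter are also worth previewing. The following are rough sketches over a vector of class labels, not the funq definitions.

    q)p:{(count each group x)%count x}   / class proportions
    q)entropy:{neg sum p[x]*2 xlog p x}  / Shannon entropy
    q)gini:{1-sum p[x] xexp 2}           / Gini impurity
    q)(entropy;gini)@\:"aabb"
    1 0.5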

    If one decision tree is good, multiple decision trees must be great. Chapter 8 introduces the concept of ensemble learning to distinguish between benign and malignant tumors in the Wisconsin Diagnosis Breast Cancer data set. By combining multiple fully grown trees, we can reduce the tendency of single trees to overfit. This is explained in the section about random forests. The second ensemble method combines multiple decision stumps—decision trees with a single branch—in an algorithm called AdaBoost. By expertly combining multiple weak classifiers, we build a powerful strong classifier. The chapter concludes by using cross validation to determine the optimal number of decision stumps to combine.

    We next take our first step towards the magic of deep learning. The journey starts in Chapter 9 with linear regression. Matrices and matrix operators are reviewed and used with the wine-quality data set to predict the quality of red wine. We then review abstractions built into the funq project that enable faster matrix operations. After presenting the normal equations, gradient descent is introduced so optimization problems without closed-form solutions can be solved. After describing the concept of feature scaling, we present functions to perform conjugate gradient and stochastic gradient descent. L1 and L2 regularization functions are reviewed before their relationship with LASSO, ridge regression and elastic net is explained. The chapter ends with a review of the cross-validation technique used to determine the best regularization parameters.
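
    The gradient-descent update at the heart of the chapter fits in a few lines. The sketch below assumes a hypothetical float matrix X whose rows are observations (with a leading column of ones for the intercept), a float target vector y and a starting coefficient vector th0; it is an illustration, not the funq implementation with its feature scaling, regularization and conjugate-gradient machinery.

    q)grad:{[X;y;th](flip[X] mmu (X mmu th)-y)%count y}  / least-squares cost gradient
    q)step:{[a;X;y;th]th-a*grad[X;y;th]}                 / one update with learning rate a
    q)th:step[0.01;X;y]/[1000;th0]                       / 1000 iterations from the starting guess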

    A popular binary classification algorithm called logistic regression is used with the Wisconsin Diagnosis Breast Cancer data set in Chapter 10 to model tumor malignancy. The chapter begins by introducing the sigmoid function and the log-loss cost function. In order to find optimal model parameters with gradient descent, the logistic regression cost and gradient functions are defined. Multi-class classification is then explained and a one-vs.-all classifier is implemented. The chapter concludes by testing the accuracy of the algorithm on the MNIST handwritten-digit data set.
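
    Both building blocks are one-liners in q. The definitions below are hedged sketches (illustrative names, not necessarily the funq definitions), where y is a vector of 0/1 labels and p the predicted probabilities.

    q)sigmoid:{1%1+exp neg x}                         / squashes any real value into (0;1)
    q)logloss:{[p;y]neg avg (y*log p)+(1-y)*log 1-p}  / penalizes confident wrong predictions
    q)sigmoid 0 -2 2f
    0.5 0.1192029 0.8807971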

    Having warmed up with a lap through logistic regression, Chapter 11 sprints across the neural network finish line. The chapter begins with a brief history of the perceptron before describing a fully-connected neural network with a single hidden layer. To prevent the common vanishing-gradient problem, two different parameter-initialization techniques are reviewed. The cost and gradient functions for a neural network with a sigmoid activation function and log-loss cost function are introduced along with the concepts of feed-forward loss calculation and backward propagation. To demonstrate the flexibility of the functional neural network implementation, different activation functions—such as the hyperbolic tangent and rectified linear unit (ReLU)—and cost functions—such as the cross entropy and softmax functions—are used to analyze the MNIST handwritten-digit data set. The chapter concludes by demonstrating how, in addition to classification problems, neural networks can be used to model regression problems as well.

    With the concepts of linear regression, k-nearest neighbors, gradient descent and regularization firmly in our tool belt, Chapter 12 leverages these techniques to implement movie recommender systems. The chapter begins by reviewing the MovieLens data set and uses regularized linear regression to create a content-based recommender system. By using genre labels such as Romance and Comedy, the model is able to recommend movies that match the genres that we have ranked the highest. We then implement a user-user collaborative filtering recommender system that uses k-nearest neighbors to recommend movies other people with similar tastes also liked. We then introduce two matrix-factorization techniques that decompose the rating matrix into user and item matrices that reflect the user’s interest in, and the movie’s exposure to, hidden features. The first technique uses gradient descent and the second uses a method called alternating least squares.

    We finish our discussion of machine learning with an algorithm that made it to the top 10 machine-learning algorithms in 2006. It has since dropped off the list but remains a powerful algorithm that affects our everyday lives. Chapter 13 explains the PageRank algorithm, which was developed by Larry Page and Sergey Brin to rank search results. The chapter begins with the construction of an adjacency matrix to model the links between web pages. This is transformed into a transition matrix, also called a stochastic or Markov matrix. By adding random surfing, a Google matrix is created. Three implementations of the PageRank algorithm are then introduced. The power method, which iteratively produces better rankings, is explained first. This is followed by an algebraic solution that uses matrix inversion. The final algorithm uses another iterative approach that allows new observations to adjust a previously stored page rank vector. The chapter concludes with the introduction of sparse matrices and a sparse matrix implementation of the iterative PageRank algorithm.
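
    The power method itself is nothing more than a repeated matrix-vector product. The sketch below is illustrative (M is a hypothetical column-stochastic link matrix of floats and the names are my own), not the funq implementation.

    q)google:{[d;M](d*M)+(1-d)%count M}            / Google matrix with damping factor d
    q)power:{[G;r]G mmu r}                         / one power iteration: r <- G r
    q)r:power[google[0.85;M]]/[50;n#1f%n:count M]  / 50 iterations from a uniform rank vector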

    Admittedly, implementing this many machine-learning algorithms in q may be more fun than you can handle. But if you are ready for more, the last chapter is for you. Chapter 14 is dedicated to using q to visualize data. The chapter begins by reviewing the ASCII plotting tool used throughout the book. The next section uses complex numbers to generate a black and white, as well as grayscale, image of the Mandelbrot set. We then use utility functions to store the Mandelbrot matrices in each of the three Netpbm portable bitmap file formats: black and white, grayscale and color. This allows us to view the images in their full glory with most standard graphics programs. The chapter, book and fun come to an end when we use q to implement an algorithm to generate sparklines and plot financial time-series data.

    1.5. Naming Conventions

    The funq project uses single-letter variables, where possible, for function parameters and local variables. In an attempt at consistency, I list the most common variables and their meanings below. Atoms, vectors, tables and functions use lowercase letters. Matrices, however, are represented with uppercase letters.

    Table 1.1. Naming Conventions


    [3] E. Brynjolfsson and A. McAfee, What’s Driving the Machine Learning Explosion?, Harvard Business Review, 18 Jul 2017.

    [4] The funq.q library also attempts to load a few other libraries, but these require extra packages to be installed and are not used in this book.

    Chapter 2. K-Nearest Neighbors

    Machine-learning algorithms can be divided into two types: supervised and unsupervised. Supervised learning algorithms are trained by giving them a set of inputs and explicitly telling them what the output should be. Unsupervised learning algorithms, however, are only given input data sets and are expected to find models that describe the relationship between data points. Where unsupervised learning algorithms only model the provided data, supervised learning algorithms build models on training data sets so the models can then be used to make predictions on as-yet unseen test data sets.

    Building a model to describe the training data has two benefits:

    The costly process of building a model can be performed offline, while classification and regression can be quickly performed online.

    Large amounts of training data can be used to build the model and then left behind during prediction.

    These algorithms are known as eager learning algorithms because they consume their training data immediately. While the remainder of this book will focus on eager learning examples, this chapter is about the lazy learning algorithm called k-nearest neighbors. It is lazy because all computations on the training data are deferred until the predictions are required. The k-nearest neighbors algorithm is also known as an instance-based algorithm because the training instances are kept with the model and used to make predictions for new observations. Algorithms that build models, and leave the training data behind, are intuitively called model-based algorithms.

    The k-nearest neighbors algorithm is the simplest and most intuitive supervised learning algorithm. It begins by building a matrix of distances between the training and test data sets. The algorithm then attempts to make a classification (regression) prediction by using the labels (values) of the k nearest training data points to classify (predict) the label (value) of the new observations.
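
    Before turning to the funq example, here is a self-contained sketch of the idea. It is not .ml.knn and uses a simplified convention: the hypothetical X and Xt hold training and test observations as rows, y holds the training labels, and ties between labels are broken arbitrarily.

    q)dist2:{sum d*d:x-y}                             / squared Euclidean distance
    q)mode:{first key desc count each group x}        / most frequent label
    q)knn:{[k;X;y;xt]mode y k#iasc dist2[xt] each X}  / label xt by its k nearest rows of X
    q)knn[3;X;y] each Xt                              / classify every test observation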

    2.1. K-Nearest Neighbors Example

    Let’s start with a quick demonstration of how the funq project can use the k-NN algorithm to model the iris data set. This may seem like a cursory treatment of the material. Have no fear, we will dissect every line of code as the chapter progresses.

    The following code loads the funq.q library and the iris data set from the iris.q library.[5] It then splits the data into training and test partitions and extracts the data into matrices X and Xt, representing the training and test observations, and vectors y and yt representing the training and test species labels.

    q)\l funq.q

    q)\l iris.q

    [down]loading iris data set

    q)d:.ut.part[`train`test!7 3;0N?] iris.t

    q)y:first get first `Y`X set' 0 1 cut value flip d`train

    q)yt:first get first `Yt`Xt set' 0 1 cut value flip d`test

    Using the training partition to classify each unseen Iris observation in the test partition with the k-nearest neighbors algorithm obtains an accuracy of roughly 98% when only the nearest neighbor is used.

    q)avg yt=.ml.knn[1f%;1;y] .ml.edist[X] each flip Xt

    0.9777778

    Using three neighbors manages to make a perfect prediction.

    q)avg yt=.ml.knn[1f%;3;y] .ml.edist[X] each flip Xt

    1f

    And finally, the performance degrades when we start using more distant neighbors.

    q)avg yt=.ml.knn[1f%;7;y] .ml.edist[X] each flip Xt

    0.9555556

    This is just one example of how k-nearest neighbors can be parameterized. The algorithm requires us to make three choices:

    Which distance metric should we use?

    How many neighbors should we include?

    How much weight should we give each neighbor?

    Before addressing each
