Ebook, 550 pages, 2 hours

Scala Data Analysis Cookbook

Name: Scala Data Analysis Cookbook
Author: Manivannan Arun
ISBN: 9781784394998

About this ebook

Engineers and scientists who are familiar with Scala and would like to exploit the Spark ecosystem for big data analysis will benefit most from this book.

Language: English

Publisher: Packt Publishing

Release date: Oct 30, 2015

ISBN: 9781784394998

Related categories

Skip carousel

Reviews for Scala Data Analysis Cookbook

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Scala Data Analysis Cookbook - Manivannan Arun

Scala Data Analysis Cookbook

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why Subscribe?

Free Access for Packt account holders

Preface

Apache Flink

Scalding

Saddle

Spire

Akka

Accord

What this book covers

What you need for this book

Who this book is for

Sections

Getting ready

How to do it…

How it works…

There's more…

See also

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Getting Started with Breeze

Introduction

Getting Breeze – the linear algebra library

How to do it...

There's more...

The org.scalanlp.breeze dependency

The org.scalanlp.breeze-natives package

Working with vectors

Getting ready

How to do it...

Creating vectors

Constructing a vector from values

Creating a zero vector

Creating a vector out of a function

Creating a vector of linearly spaced values

Creating a vector with values in a specific range

Creating an entire vector with a single value

Slicing a sub-vector from a bigger vector

Creating a Breeze Vector from a Scala Vector

Vector arithmetic

Scalar operations

Calculating the dot product of two vectors

Creating a new vector by adding two vectors together

Appending vectors and converting a vector of one type to another

Concatenating two vectors

Converting a vector of Int to a vector of Double

Computing basic statistics

Mean and variance

Standard deviation

Find the largest value in a vector

Finding the sum, square root and log of all the values in the vector

The Sqrt function

The Log function

Working with matrices

How to do it...

Creating matrices

Creating a matrix from values

Creating a zero matrix

Creating a matrix out of a function

Creating an identity matrix

Creating a matrix from random numbers

Creating from a Scala collection

Matrix arithmetic

Addition

Multiplication

Appending and conversion

Concatenating matrices – vertically

Concatenating matrices – horizontally

Converting a matrix of Int to a matrix of Double

Data manipulation operations

Getting column vectors out of the matrix

Getting row vectors out of the matrix

Getting values inside the matrix

Getting the inverse and transpose of a matrix

Computing basic statistics

Mean and variance

Standard deviation

Finding the largest value in a matrix

Finding the sum, square root and log of all the values in the matrix

Sqrt

Log

Calculating the eigenvectors and eigenvalues of a matrix

How it works...

Vectors and matrices with randomly distributed values

How it works...

Creating vectors with uniformly distributed random values

Creating vectors with normally distributed random values

Creating vectors with random values that have a Poisson distribution

Creating a matrix with uniformly random values

Creating a matrix with normally distributed random values

Creating a matrix with random values that has a Poisson distribution

Reading and writing CSV files

How it works...

2. Getting Started with Apache Spark DataFrames

Introduction

Getting Apache Spark

How to do it...

Creating a DataFrame from CSV

How to do it...

How it works...

There's more…

Manipulating DataFrames

How to do it...

Printing the schema of the DataFrame

Sampling the data in the DataFrame

Selecting DataFrame columns

Filtering data by condition

Sorting data in the frame

Renaming columns

Treating the DataFrame as a relational table

Joining two DataFrames

Inner join

Right outer join

Left outer join

Saving the DataFrame as a file

Creating a DataFrame from Scala case classes

How to do it...

How it works...

3. Loading and Preparing Data – DataFrame

Introduction

Loading more than 22 features into classes

How to do it...

How it works...

There's more…

Loading JSON into DataFrames

How to do it…

Reading a JSON file using SQLContext.jsonFile

Reading a text file and converting it to JSON RDD

Explicitly specifying your schema

There's more…

Storing data as Parquet files

How to do it…

Load a simple CSV file, convert it to case classes, and create a DataFrame from it

Save it as a Parquet file

Install Parquet tools

Using the tools to inspect the Parquet file

Enable compression for the Parquet file

Using the Avro data model in Parquet

How to do it…

Creation of the Avro model

Generation of Avro objects using the sbt-avro plugin

Constructing an RDD of our generated object from Students.csv

Saving RDD[StudentAvro] in a Parquet file

Reading the file back for verification

Using Parquet tools for verification

Loading from RDBMS

How to do it…

Preparing data in DataFrames

How to do it...

4. Data Visualization

Introduction

Visualizing using Zeppelin

How to do it...

Installing Zeppelin

Customizing Zeppelin's server and websocket port

Visualizing data on HDFS – parameterizing inputs

Running custom functions

Adding external dependencies to Zeppelin

Pointing to an external Spark cluster

Creating scatter plots with Bokeh-Scala

How to do it...

Preparing our data

Creating Plot and Document objects

Creating a marker object

Setting the X and Y axes' data range for the plot

Drawing the x and the y axes

Viewing flower species with varying colors

Adding grid lines

Adding a legend to the plot

Creating a time series MultiPlot with Bokeh-Scala

How to do it...

Preparing our data

Creating a plot

Creating a line that joins all the data points

Setting the x and y axes' data range for the plot

Drawing the axes and the grids

Adding tools

Adding a legend to the plot

Multiple plots in the document

5. Learning from Data

Introduction

Supervised and unsupervised learning

Gradient descent

Predicting continuous values using linear regression

How to do it...

Importing the data

Converting each instance into a LabeledPoint

Preparing the training and test data

Scaling the features

Training the model

Predicting against test data

Evaluating the model

Regularizing the parameters

Mini batching

Binary classification using LogisticRegression and SVM

How to do it...

Importing the data

Tokenizing the data and converting it into LabeledPoints

Factoring the inverse document frequency

Prepare the training and test data

Constructing the algorithm

Training the model and predicting the test data

Evaluating the model

Binary classification using LogisticRegression with Pipeline API

How to do it...

Importing and splitting data as test and training sets

Construct the participants of the Pipeline

Preparing a pipeline and training a model

Predicting against test data

Evaluating a model without cross-validation

Constructing parameters for cross-validation

Constructing cross-validator and fit the best model

Evaluating the model with cross-validation

Clustering using K-means

How to do it...

KMeans.RANDOM

KMeans.PARALLEL

K-means++

K-means||

Max iterations

Epsilon

Importing the data and converting it into a vector

Feature scaling the data

Deriving the number of clusters

Constructing the model

Evaluating the model

Feature reduction using principal component analysis

How to do it...

Dimensionality reduction of data for supervised learning

Mean-normalizing the training data

Extracting the principal components

Preparing the labeled data

Preparing the test data

Classify and evaluate the metrics

Dimensionality reduction of data for unsupervised learning

Mean-normalizing the training data

Extracting the principal components

Arriving at the number of components

Evaluating the metrics

6. Scaling Up

Introduction

Building the Uber JAR

How to do it...

Transitive dependency stated explicitly in the SBT dependency

Two different libraries depend on the same external library

Submitting jobs to the Spark cluster (local)

How to do it...

Downloading Spark

Running HDFS on Pseudo-clustered mode

Running the Spark master and slave locally

Pushing data into HDFS

Submitting the Spark application on the cluster

Running the Spark Standalone cluster on EC2

How to do it...

Creating the AccessKey and pem file

Setting the environment variables

Running the launch script

Verifying installation

Making changes to the code

Transferring the data and job files

Loading the dataset into HDFS

Running the job

Destroying the cluster

Running the Spark Job on Mesos (local)

How to do it...

Installing Mesos

Starting the Mesos master and slave

Uploading the Spark binary package and the dataset to HDFS

Running the job

Running the Spark Job on YARN (local)

How to do it...

Installing the Hadoop cluster

Starting HDFS and YARN

Pushing Spark assembly and dataset to HDFS

Running a Spark job in yarn-client mode

Running Spark job in yarn-cluster mode

7. Going Further

Introduction

Using Spark Streaming to subscribe to a Twitter stream

How to do it...

Using Spark as an ETL tool

How to do it...

Using StreamingLogisticRegression to classify a Twitter stream using Kafka as a training stream

How to do it...

Using GraphX to analyze Twitter data

How to do it...

Index

Scala Data Analysis Cookbook

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: October 2015

Production reference: 1261015

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78439-674-9

www.packtpub.com

Credits

Author

Arun Manivannan

Reviewers

Amir Hajian

Shams Mahmood Imam

Gerald Loeffler

Commissioning Editor

Nadeem N. Bagban

Acquisition Editor

Larissa Pinto

Content Development Editor

Rashmi Suvarna

Technical Editor

Tanmayee Patil

Copy Editors

Ameesha Green

Vikrant Phadke

Project Coordinator

Milton Dsouza

Proofreader

Safis Editing

Indexer

Rekha Nair

Production Coordinator

Manu Joseph

Cover Work

Manu Joseph

About the Author

Arun Manivannan has been an engineer in various multinational companies, tier-1 financial institutions, and start-ups, primarily focusing on developing distributed applications that manage and mine data. His languages of choice are Scala and Java, but he also meddles around with various others for kicks. He blogs at http://rerun.me.

Arun holds a master's degree in software engineering from the National University of Singapore.

He also holds degrees in commerce, computer applications, and HR management. His interests and education could probably be a good dataset for clustering.

I am deeply indebted to my dad, Manivannan, who taught me the value of persistence, hard work and determination in life, and my mom, Arockiamary, without whose prayers and boundless love I'd be nothing. I could never try to pay them back. No words can do justice to thank my loving wife, Daisy. Her humongous faith in me and her support and patience make me believe in lifelong miracles. She simply made me the man I am today.

I can't finish without thanking my 6-year-old son, Jason, for hiding his disappointment in me as I sat in front of the keyboard all the time. In your smiles and hugs, I derive the purpose of my life.

I would like to specially thank Abhilash, Rajesh, and Mohan, who proved that hard times reveal true friends.

It would be a crime not to thank my VCRC friends for being a constant source of inspiration. I am proud to be a part of the bunch.

Also, I sincerely thank the truly awesome reviewers and editors at Packt Publishing. Without their guidance and feedback, this book would have never gotten its current shape. I sincerely apologize for all the typos and errors that could have crept in.

About the Reviewers

Amir Hajian is a data scientist at the Thomson Reuters Data Innovation Lab. He has a PhD in astrophysics, and prior to joining Thomson Reuters, he was a senior research associate at the Canadian Institute for Theoretical Astrophysics in Toronto and a research physicist at Princeton University. His main focus in recent years has been bringing data science into astrophysics by developing and applying new algorithms for astrophysical data analysis using statistics, machine learning, visualization, and big data technology. Amir's research has been frequently highlighted in the media. He has led multinational research team efforts into successful publications. He has published more than 70 peer-reviewed articles with more than 4,000 citations, giving him an h-index of 34.

I would like to thank the Canadian Institute for Theoretical Astrophysics for providing the excellent computational facilities that I enjoyed during the review of this book.

Shams Mahmood Imam completed his PhD in the department of computer science at Rice University, working under Prof. Vivek Sarkar in the Habanero multicore software research project. His research interests mostly include parallel programming models and runtime systems, with the aim of making the writing of task-parallel programs on multicore machines easier for programmers. Shams is currently completing his thesis titled Cooperative Execution of Parallel Tasks with Synchronization Constraints. His work involves building a generic framework that efficiently supports all synchronization patterns (and not only those available in actors or the fork-join model) in task-parallel programs. It includes extensions such as Eureka programming for speculative computations in task-parallel models and selectors for coordination protocols in the actor model. Shams implemented a framework as part of the cooperative runtime for the Habanero-Java parallel programming library. His work has been published at leading conferences, such as OOPSLA, ECOOP, Euro-Par, and PPPJ. Previously, he was involved in projects such as Habanero-Scala, CnC-Scala, CnC-Matlab, and CnC-Python.

Gerald Loeffler holds an MBA. He was trained as a biochemist and has worked in academia and the pharmaceutical industry, conducting research in parallel and distributed biophysical computer simulations and data science in bioinformatics. He then switched to IT consulting and widened his interests to include general software development and architecture, and has focused on JVM-centric enterprise applications, systems, and their integration ever since. Inspired by the practice of commercial software development projects in this context, Gerald has developed a keen interest in team collaboration, the software craftsmanship movement, sound software engineering, type safety, distributed software and system architectures, and the innovations introduced by technologies such as Java EE, Scala, Akka, and Spark. He is employed by MuleSoft as a principal solutions architect in their professional services team, working with EMEA clients on their integration needs and the challenges that spring from them.

Gerald lives with his wife and two cats in Vienna, Austria, where he enjoys music, theatre, and city life.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Table of Contents

Scala Data Analysis Cookbook

Credits

About the Author

About the Reviewers

Support files, eBooks, discount offers, and more

Why Subscribe?