Clojure Data Analysis Cookbook - Second Edition

Ebook · 954 pages · 6 hours

Name: Clojure Data Analysis Cookbook - Second Edition
Author: Eric Rochester
ISBN: 9781784399955

About This Book

Take control of your data, from collection to classification
Troubleshoot and solve data analysis problems using Clojure and a variety of Java libraries
Get clear, practical techniques for every stage of data analysis

Who This Book Is For

This book is for readers with a basic knowledge of Clojure who want to push the language further and use it to excel at data analysis.

Language: English

Publisher: Packt Publishing

Release date: Jan 27, 2015

Related ebooks

Scala Data Analysis Cookbook, by Manivannan Arun
Elixir Cookbook, by Paulo A Pereira
D Cookbook, by Adam D. Ruppe
Python Business Intelligence Cookbook, by Dempsey Robert
Windows Application Development Cookbook, by Marcin Jamro
MongoDB Cookbook - Second Edition, by Dasadia Cyrus
Clojure for Data Science, by Garner Henry
Clojure Programming Cookbook, by Makoto Hashimoto
Mastering Clojure, by Wali Akhil
Clojure Web Development Essentials, by Ryan Baldwin
Learning ClojureScript, by Rafik Naccache
Clojure High Performance Programming - Second Edition, by Kumar Shantanu
Clojure Data Structures and Algorithms Cookbook, by Rafik Naccache
Clojure Reactive Programming, by Leonardo Borges
The Clojure Workshop: Use functional programming to build data-centric applications with Clojure and ClojureScript, by Joseph Fahey
Learning Python Design Patterns - Second Edition, by Giridhar Chetan
Building Python Real-Time Applications with Storm, by Bhatnagar Kartik
Mastering F#, by Alfonso García-Caro Núñez
Scala in Depth, by Josh Suereth
The Way to Go: A Thorough Introduction to the Go Programming Language, by Ivo Balbaert
Real-World Functional Programming: With examples in F# and C#, by Tomas Petricek
Haskell Design Patterns, by Lemmer Ryan
Haskell from Another Site, by Jagoda Górska
PyTorch Recipes: A Problem-Solution Approach, by Pradeepta Mishra
Mastering Clojure Data Analysis, by Eric Rochester
Python 3 Text Processing with NLTK 3 Cookbook, by Jacob Perkins
Clojure for Java Developers, by Díaz Eduardo
Learning Functional Data Structures and Algorithms, by Khot Atul S.
Python Text Processing with NLTK 2.0 Cookbook: LITE, by Jacob Perkins
Statistics with Rust: 50+ Statistical Techniques Put into Action, by Keiko Nakamura

7 min read

Related categories

Skip carousel

Reviews for Clojure Data Analysis Cookbook - Second Edition

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Clojure Data Analysis Cookbook - Second Edition - Eric Rochester

Clojure Data Analysis Cookbook Second Edition

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Importing Data for Analysis

Introduction

Creating a new project

Getting ready

How to do it...

How it works...

Reading CSV data into Incanter datasets

Getting ready

How to do it…

How it works…

There's more…

Reading JSON data into Incanter datasets

Getting ready

How to do it…

How it works…

Reading data from Excel with Incanter

Getting ready

How to do it…

How it works…

Reading data from JDBC databases

Getting ready

How to do it…

How it works…

See also

Reading XML data into Incanter datasets

Getting ready

How to do it…

How it works…

There's more…

Navigating structures with zippers

Processing in a pipeline

Comparing XML and JSON

Scraping data from tables in web pages

Getting ready

How to do it…

How it works…

See also

Scraping textual data from web pages

Getting ready

How to do it…

How it works…

Reading RDF data

Getting ready

How to do it…

How it works…

See also

Querying RDF data with SPARQL

Getting ready

How to do it…

How it works…

There's more…

Aggregating data from different formats

Getting ready

How to do it…

Creating the triple store

Scraping exchange rates

Loading currency data and tying it all together

How it works…

See also

2. Cleaning and Validating Data

Introduction

Cleaning data with regular expressions

Getting ready

How to do it…

How it works…

There's more...

See also

Maintaining consistency with synonym maps

Getting ready

How to do it…

How it works…

See also

Identifying and removing duplicate data

Getting ready

How to do it…

How it works…

There's more…

Regularizing numbers

Getting ready

How to do it…

How it works…

Calculating relative values

Getting ready

How to do it…

How it works…

Parsing dates and times

Getting ready

How to do it…

There's more…

Lazily processing very large data sets

Getting ready

How to do it…

How it works…

Sampling from very large data sets

Getting ready

How to do it…

Sampling by percentage

Sampling exactly

How it works…

Fixing spelling errors

Getting ready

How to do it…

How it works…

There's more…

Parsing custom data formats

Getting ready

How to do it…

How it works…

Validating data with Valip

Getting ready

How to do it…

How it works…

3. Managing Complexity with Concurrent Programming

Introduction

Managing program complexity with STM

Getting ready

How to do it…

How it works…

See also

Managing program complexity with agents

Getting ready

How to do it…

How it works…

See also

Getting better performance with commute

Getting ready

How to do it…

How it works…

Combining agents and STM

Getting ready

How to do it…

How it works…

Maintaining consistency with ensure

Getting ready

How to do it…

How it works…

Introducing safe side effects into the STM

Getting ready

How to do it…

Maintaining data consistency with validators

Getting ready

How to do it…

How it works…

See also

Monitoring processing with watchers

Getting ready

How to do it…

How it works…

Debugging concurrent programs with watchers

Getting ready

How to do it…

There's more...

Recovering from errors in agents

How to do it…

Failing on errors

Continuing on errors

Using a custom error handler

There's more...

Managing large inputs with sized queues

How to do it…

How it works...

4. Improving Performance with Parallel Programming

Introduction

Parallelizing processing with pmap

How to do it…

How it works…

There's more…

See also

Parallelizing processing with Incanter

Getting ready

How to do it…

How it works…

Partitioning Monte Carlo simulations for better pmap performance

Getting ready

How to do it…

How it works…

Estimating with Monte Carlo simulations

Chunking data for pmap

Finding the optimal partition size with simulated annealing

Getting ready

How to do it…

How it works…

There's more…

Combining function calls with reducers

Getting ready

How to do it…

What happened here?

There's more...

See also

Parallelizing with reducers

Getting ready

How to do it…

How it works…

See also

Generating online summary statistics for data streams with reducers

Getting ready

How to do it…

Using type hints

Getting ready

How to do it…

How it works…

See also

Benchmarking with Criterium

Getting ready

How to do it…

How it works…

See also

5. Distributed Data Processing with Cascalog

Introduction

Initializing Cascalog and Hadoop for distributed processing

Getting ready

How to do it…

How it works…

See also

Querying data with Cascalog

Getting ready

How to do it…

How it works…

There's more

Distributing data with Apache HDFS

Getting ready

How to do it…

How it works…

Parsing CSV files with Cascalog

Getting ready

How to do it…

How it works…

There's more

Executing complex queries with Cascalog

Getting ready

How to do it…

Aggregating data with Cascalog

Getting ready

How to do it…

There's more

Defining new Cascalog operators

Getting ready

How to do it…

Creating map operators

Creating map concatenation operators

Creating filter operators

Creating buffer operators

Creating aggregate operators

Creating parallel aggregate operators

Composing Cascalog queries

Getting ready

How to do it…

How it works…

Transforming data with Cascalog

Getting ready

How to do it…

How it works…

6. Working with Incanter Datasets

Introduction

Loading Incanter's sample datasets

Getting ready

How to do it…

How it works…

There's more...

Loading Clojure data structures into datasets

Getting ready

How to do it…

How it works…

See also…

Viewing datasets interactively with view

Getting ready

How to do it…

How it works…

See also…

Converting datasets to matrices

Getting ready

How to do it…

How it works…

There's more…

See also…

Using infix formulas in Incanter

Getting ready

How to do it…

How it works…

Selecting columns with $

Getting ready

How to do it…

How it works…

There's more…

See also…

Selecting rows with $

Getting ready

How to do it…

How it works…

Filtering datasets with $where

Getting ready

How to do it…

How it works…

There's more…

Grouping data with $group-by

Getting ready

How to do it…

How it works…

Saving datasets to CSV and JSON

Getting ready

How to do it…

Saving data as CSV

Saving data as JSON

How it works…

See also…

Projecting from multiple datasets with $join

Getting ready

How to do it…

How it works…

7. Statistical Data Analysis with Incanter

Introduction

Generating summary statistics with $rollup

Getting ready

How to do it…

How it works…

Working with changes in values

Getting ready

How to do it…

How it works…

Scaling variables to simplify variable relationships

Getting ready

How to do it…

How it works…

Working with time series data with Incanter Zoo

Getting ready

How to do it…

There's more...

Smoothing variables to decrease variation

Getting ready

How to do it…

How it works…

Validating sample statistics with bootstrapping

Getting ready

How to do it…

How it works…

There's more…

Modeling linear relationships

Getting ready

How to do it…

How it works…

Modeling non-linear relationships

Getting ready

How to do it…

How it works...

Modeling multinomial Bayesian distributions

Getting ready

How to do it…

How it works…

There's more...

Finding data errors with Benford's law

Getting ready

How to do it…

How it works…

There's more…

8. Working with Mathematica and R

Introduction

Setting up Mathematica to talk to Clojuratica for Mac OS X and Linux

Getting ready

How to do it…

How it works…

There's more…

Setting up Mathematica to talk to Clojuratica for Windows

Getting ready

How to do it...

How it works...

Calling Mathematica functions from Clojuratica

Getting ready

How to do it…

How it works…

Sending matrixes to Mathematica from Clojuratica

Getting ready

How to do it…

How it works…

Evaluating Mathematica scripts from Clojuratica

Getting ready

How to do it…

How it works…

Creating functions from Mathematica

Getting ready

How to do it…

How it works…

Setting up R to talk to Clojure

Getting ready

How to do it…

Setting up R

Setting up Clojure

How it works…

Calling R functions from Clojure

Getting ready

How to do it…

How it works…

There's more…

Passing vectors into R

Getting ready

How to do it…

How it works…

Evaluating R files from Clojure

Getting ready

How to do it…

How it works…

There's more…

Plotting in R from Clojure

Getting ready

How to do it…

How it works…

There's more…

9. Clustering, Classifying, and Working with Weka

Introduction

Loading CSV and ARFF files into Weka

Getting ready

How to do it…

How it works…

There's more…

See also…

Filtering, renaming, and deleting columns in Weka datasets

Getting ready

How to do it…

Renaming columns

Removing columns

Hiding columns

How it works…

Discovering groups of data using K-Means clustering

Getting ready

How to do it…

How it works…

Clustering with K-Means

Analyzing the results

Building macros

See also…

Finding hierarchical clusters in Weka

Getting ready

How to do it…

How it works…

There's more…

Clustering with SOMs in Incanter

Getting ready

How to do it…

How it works…

There's more…

Classifying data with decision trees

Getting ready

How to do it…

How it works…

There's more…

Classifying data with the Naive Bayesian classifier

Getting ready

How to do it…

How it works…

There's more…

Classifying data with support vector machines

Getting ready

How to do it…

There's more…

Finding associations in data with the Apriori algorithm

Getting ready

How to do it…

How it works…

There's more…

10. Working with Unstructured and Textual Data

Introduction

Tokenizing text

Getting ready

How to do it…

How it works…

Finding sentences

Getting ready

How to do it…

How it works…

Focusing on content words with stoplists

Getting ready

How to do it…

Getting document frequencies

Getting ready

How to do it…

Scaling document frequencies by document size

Getting ready

How to do it…

How it works…

Scaling document frequencies with TF-IDF

Getting ready

How to do it…

How it works…

Finding people, places, and things with Named Entity Recognition

Getting ready

How to do it…

How it works…

Mapping documents to a sparse vector space representation

Getting ready

How to do it…

Performing topic modeling with MALLET

Getting ready

How to do it…

How it works…

See also…

Performing naïve Bayesian classification with MALLET

Getting ready

How to do it…

How it works…

There's more…

See also…

11. Graphing in Incanter

Introduction

Creating scatter plots with Incanter

Getting ready

How to do it...

How it works...

There's more...

See also

Graphing non-numeric data in bar charts

Getting ready

How to do it...

How it works...

Creating histograms with Incanter

Getting ready

How to do it...

How it works...

Creating function plots with Incanter

Getting ready

How to do it...

How it works...

See also

Adding equations to Incanter charts

Getting ready

How to do it...

There's more...

Adding lines to scatter charts

Getting ready

How to do it...

How it works...

See also

Customizing charts with JFreeChart

Getting ready

How to do it...

How it works...

See also

Customizing chart colors and styles

Getting ready

How to do it...

Saving Incanter graphs to PNG

Getting ready

How to do it...

How it works...

Using PCA to graph multi-dimensional data

Getting ready

How to do it...

How it works...

There's more...

Creating dynamic charts with Incanter

Getting ready

How to do it...

How it works...

12. Creating Charts for the Web

Introduction

Serving data with Ring and Compojure

Getting ready

How to do it…

Configuring and setting up the web application

Serving data

Defining routes and handlers

Running the server

How it works…

There's more…

Creating HTML with Hiccup

Getting ready

How to do it…

How it works…

There's more…

Setting up to use ClojureScript

Getting ready

How to do it…

How it works…

There's more…

Creating scatter plots with NVD3

Getting ready

How to do it…

How it works…

There's more…

Creating bar charts with NVD3

Getting ready

How to do it…

How it works…

Creating histograms with NVD3

Getting ready

How to do it…

How it works…

Creating time series charts with D3

Getting ready

How to do it…

How it works…

There's more…

Visualizing graphs with force-directed layouts

Getting ready

How to do it…

How it works…

There's more…

Creating interactive visualizations with D3

Getting ready

How to do it…

How it works…

There's more…

Index

Clojure Data Analysis Cookbook Second Edition

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: March 2013

Second edition: January 2015

Production reference: 1220115

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78439-029-7

www.packtpub.com

Credits

Author

Eric Rochester

Reviewers

Vitomir Kovanovic

Muktabh Mayank Srivastava

Federico Tomassetti

Commissioning Editor

Ashwin Nair

Acquisition Editor

Sam Wood

Content Development Editor

Parita Khedekar

Technical Editor

Ryan Kochery

Copy Editors

Dipti Kapadia

Puja Lalwani

Vikrant Phadke

Project Coordinator

Neha Thakur

Proofreaders

Ameesha Green

Joel T. Johnson

Samantha Lyon

Indexer

Priya Sane

Graphics

Sheetal Aute

Disha Haria

Production Coordinator

Nitesh Thakur

Cover Work

Nitesh Thakur

About the Author

Eric Rochester enjoys reading, writing, and spending time with his wife and kids. When he’s not doing these things, he programs in a variety of languages and platforms, including websites and systems in Python, and libraries for linguistics and statistics in C#. Currently, he is exploring functional programming languages, including Clojure and Haskell. He works at Scholars’ Lab in the library at the University of Virginia, helping humanities professors and graduate students realize their digitally informed research agendas. He is also the author of Mastering Clojure Data Analysis, Packt Publishing.

I’d like to thank everyone. My technical reviewers proved invaluable. Also, thank you to the editorial staff at Packt Publishing. This book is much stronger because of all of their feedback, and any remaining deficiencies are mine alone.

A special thanks to Jackie, Melina, and Micah. They’ve been patient and supportive while I worked on this project. It is, in every way, for them.

About the Reviewers

Vitomir Kovanovic is a PhD student at the School of Informatics, University of Edinburgh, Edinburgh, UK. He received an MSc degree in computer science and software engineering in 2011, and BSc in information systems and business administration in 2009 from the University of Belgrade, Serbia. His research interests include learning analytics, educational data mining, and online education. He is a member of the Society for Learning Analytics Research and a member of program committees of several conferences and journals in technology-enhanced learning. In his PhD research, he focuses on the use of trace data for understanding the effects of technology use on the quality of the social learning process and learning outcomes. For more information, visit http://vitomir.kovanovic.info/

Muktabh Mayank Srivastava is a data scientist and the cofounder of ParallelDots.com. Previously, he helped in solving many complex data analysis and machine learning problems for clients from different domains such as healthcare, retail, procurement, automation, Bitcoin, social recommendation engines, geolocation fact-finding, customer profiling, and so on.

His new venture is ParallelDots. It is a tool that allows any content archive to be presented in a story using advanced techniques of NLP and machine learning. For publishers and bloggers, it automatically creates a timeline of any event using their archive and presents it in an interactive, intuitive, and easy-to-navigate interface on their webpage. You can find him on LinkedIn at http://in.linkedin.com/in/muktabh/ and on Twitter at @muktabh / @ParallelDots.

Federico Tomassetti has been programming since he was a child and has a PhD in software engineering. He works as a consultant on model-driven development and domain-specific languages, writes technical articles, teaches programming, and works as a full-stack software engineer.

He has experience working in Italy, Germany, and Ireland, and he is currently working at Groupon International.

You can read about his projects on http://federico-tomassetti.it/ or https://github.com/ftomassetti/.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt’s online digital book library. Here, you can search, access, and read Packt’s entire library of books.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

Welcome to the second edition of Clojure Data Analysis Cookbook! It seems that books become obsolete almost as quickly as software does, so here we have the opportunity to keep things up-to-date and useful.

Moreover, the state of the art of data analysis is also still evolving and changing. The techniques and technologies are being refined and improved. Hopefully, this book will capture some of that. I've also added a new chapter on how to work with unstructured textual data.

In spite of these changes, some things have stayed the same. Clojure has further proven itself to be an excellent environment to work with data. As a member of the Lisp family of languages, it inherits a flexibility and power that is hard to match. The concurrency and parallelization features have further proven themselves as great tools for developing software and analyzing data.

Clojure's usefulness for data analysis is further improved by a number of strong libraries. Incanter provides a practical environment to work with data and perform statistical analysis. Cascalog is an easy-to-use wrapper over Hadoop and Cascading. Finally, when you're ready to publish your results, ClojureScript, an implementation of Clojure that generates JavaScript, can help you to visualize your data in an effective and persuasive way.

Moreover, Clojure runs on the Java Virtual Machine (JVM), so any libraries written for Java are available too. This gives Clojure an incredible amount of breadth and power.
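As a small illustration of that interoperability, here is a minimal sketch (not one of the book's recipes) that calls JDK classes directly from Clojure with no wrapper library. It assumes Java 8 or later, because the `java.time` package was added in Java 8, and the function name `days-between` is hypothetical.

```clojure
;; Hypothetical helper illustrating Java interop; assumes Java 8+ for java.time.
(import '(java.time LocalDate)
        '(java.time.temporal ChronoUnit))

(defn days-between
  "Returns the number of days between two ISO-8601 date strings
  (for example \"2015-01-27\"), using the JDK's LocalDate and
  ChronoUnit classes directly."
  [start end]
  ;; LocalDate/parse is a static method call; .between is an instance
  ;; method call on the ChronoUnit/DAYS enum constant.
  (.between ChronoUnit/DAYS
            (LocalDate/parse start)
            (LocalDate/parse end)))

(println (days-between "2015-01-01" "2015-01-27")) ; prints 26
```

Static methods and fields use the `Class/member` syntax, and instance methods use the leading-dot form, so the same pattern applies to any Java library on the classpath.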

I hope that this book will give you the tools and techniques you need to get answers from your data.

What this book covers

Chapter 1, Importing Data for Analysis, covers how to read data from a variety of sources, including CSV files, web pages, and linked semantic web data.

Chapter 2, Cleaning and Validating Data, presents strategies and implementations to normalize dates, fix spelling, and work with large datasets. Getting data into a usable shape is an important, but often overlooked, stage of data analysis.

Chapter 3, Managing Complexity with Concurrent Programming, covers Clojure's concurrency features and how you can use them to simplify your programs.

Chapter 4, Improving Performance with Parallel Programming, covers how to use Clojure's parallel processing capabilities to speed up the processing of data.

Chapter 5, Distributed Data Processing with Cascalog, covers how to use Cascalog as a wrapper over Hadoop and the Cascading library to process large amounts of data distributed over multiple computers.

Chapter 6, Working with Incanter Datasets, covers the basics of working with Incanter datasets. Datasets are the core data structures used by Incanter, and understanding them is necessary in order to use Incanter effectively.

Chapter 7, Statistical Data Analysis with Incanter, covers a variety of statistical processes and tests used in data analysis. Some of these are quite simple, such as generating summary statistics. Others are more complex, such as performing linear regressions and auditing data with Benford's Law.

Chapter 8, Working with Mathematica and R, talks about how to set up Clojure in order to talk to Mathematica or R. These are powerful data analysis systems, and we might want to use them sometimes. This chapter will show you how to get these systems to work together, as well as some tasks that you can perform once they are communicating.

Chapter 9, Clustering, Classifying, and Working with Weka, covers more advanced machine learning techniques. In this chapter, we'll primarily use the Weka machine learning library. Some recipes will discuss how to use it and the data structures it's built on, while other recipes will demonstrate machine learning algorithms.

Chapter 10, Working with Unstructured and Textual Data, looks at tools and techniques used to extract information from the reams of unstructured, textual data.

Chapter 11, Graphing in Incanter, shows you how to generate graphs and other visualizations in Incanter. These can be important for exploring and learning about your data and also for publishing and presenting your results.

Chapter 12, Creating Charts for the Web, shows you how to set up a simple web application in order to present findings from data analysis. It will include a number of recipes that leverage the powerful D3 visualization library.

What you need for this book

One piece of software required for this book is the Java Development Kit (JDK), which you can obtain from http://www.oracle.com/technetwork/java/javase/downloads/index.html. The JDK is necessary to run and develop on the Java platform.

The other major piece of software that you'll need is Leiningen 2, which you can download and install from http://leiningen.org/. Leiningen 2 is a tool used to manage Clojure projects and their dependencies. It has become the de facto standard project tool in the Clojure community.

Throughout this book, we'll use a number of other Clojure and Java libraries, including Clojure itself. Leiningen will take care of downloading these for us as we need them.

You'll also need a text editor or Integrated Development Environment (IDE). If you already have a text editor of your choice, you can probably use it. See http://clojure.org/getting_started for tips and plugins for using your particular favorite environment. If you don't have a preference, I'd suggest that you take a look at using Eclipse with Counterclockwise. There are instructions for this setup at https://code.google.com/p/counterclockwise/.

That is all that's required. However, at various places throughout the book, some recipes will access other software. The recipes in Chapter 8, Working with Mathematica and R, that are related to Mathematica will require Mathematica, obviously, and those that are related to R will require R. However, these programs won't be used in the rest of the book, and whether you're interested in those recipes might depend on whether you already have this software.

Who this book is for

This book is for programmers or data scientists who are familiar with Clojure and want to use it in their data analysis processes. This isn't a tutorial on Clojure—there are already a number of excellent introductory books out there—so you'll need to be familiar with the language, but you don't need to be an expert.

Likewise, you don't have to be an expert on data analysis, although you should probably be familiar with its tasks, processes, and techniques. While you might glean enough from these recipes to get started, you'll want a more thorough introduction to the field for this book to be truly effective.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Now, there will be a new subdirectory named getting-data."

A block of code is set as follows:

(defproject getting-data "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.6.0"]])

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

;; lazy-read-csv, read-row, watch-caster, debug-watch, and wait-for-it
;; are helper functions defined in the Chapter 3 recipes (not shown here).
(defn watch-debugging
  [input-files]
  (let [reader (agent
                 (seque
                   (mapcat
                     lazy-read-csv
                     input-files)))
        caster (agent nil)
        sink (agent [])
        counter (ref 0)
        done (ref false)]
    (add-watch caster :counter
               (partial watch-caster counter))
    (add-watch caster :debug debug-watch)
    (send reader read-row caster sink done)
    (wait-for-it 250 done)
    {:results @sink
     :count-watcher @counter}))

Any command-line input or output is written as follows:

$ lein new getting-data
Generating a project called getting-data based on the default template.
To see other templates (app, lein plugin, etc), try lein help new.

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "Take a look at the Hadoop website for the Getting Started documentation of your version. Get a single node setup working."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <feedback@packtpub.com>, and mention the book title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Downloading the color images of this book

We also provide you a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from: https://www.packtpub.com/sites/default/files/downloads/B03480_coloredimages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright
