Ebook, 681 pages, 5 hours

Mastering Clojure Data Analysis

Name: Mastering Clojure Data Analysis
Author: Eric Rochester
ISBN: 9781783284146

About this ebook

This book takes a practical, example-oriented approach to help you learn how to use Clojure for data analysis quickly and efficiently. It is aimed at readers who have experience with Clojure and need to use it to perform data analysis, and it will also benefit readers with basic experience in data analysis and statistics.

Programming

Language: English

Publisher: Packt Publishing

Release date: May 26, 2014

Related categories

Skip carousel

Reviews for Mastering Clojure Data Analysis

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Mastering Clojure Data Analysis - Eric Rochester

Mastering Clojure Data Analysis

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Network Analysis – The Six Degrees of Kevin Bacon

Analyzing social networks

Getting the data

Understanding graphs

Implementing the graphs

Loading the data

Measuring social network graphs

Density

Degrees

Paths

Average path length

Network diameter

Clustering coefficient

Centrality

Degrees of separation

Visualizing the graph

Setting up ClojureScript

A force-directed layout

A hive plot

A pie chart

Summary

2. GIS Analysis – Mapping Climate Change

Understanding GIS

Mapping the climate change

Downloading and extracting the data

Downloading the files

Extracting the files

Transforming the data – filtering

Rolling averages

Reading the data

Interpolating sample points and generating heat maps using inverse distance weighting (IDW)

Working with map projections

Finding a base map

Working with ArcGIS

Summary

3. Topic Modeling – Changing Concerns in the State of the Union Addresses

Understanding data in the State of Union addresses

Understanding topic modeling

Preparing for visualizations

Setting up the project

Getting the data

Loading the data into MALLET

Visualizing with D3 and ClojureScript

Exploring the topics

Exploring topic 43

Exploring topic 26

Exploring topic 42

Summary

4. Classifying UFO Sightings

Getting the data

Extracting the data

Dealing with messy data

Visualizing UFO data

Description

Topic modeling descriptions

Hoaxes

Preparing the data

Reading the data into a sequence of data records

Splitting the NUFORC comments

Categorizing the documents based on the comments

Partitioning the documents into directories based on the categories

Dividing them into training and test sets

Classifying the data

Coding the classifier interface

Setting up the Pipe and InstanceList

Training

Classifying

Validating

Tying it all together

Running the classifier and examining the results

Summary

5. Benford's Law – Detecting Natural Progressions of Numbers

Learning about Benford's Law

Applying Benford's law to compound interest

Looking at the world population data

Failing Benford's Law

Case studies

Summary

6. Sentiment Analysis – Categorizing Hotel Reviews

Understanding sentiment analysis

Getting hotel review data

Exploring the data

Preparing the data

Tokenizing

Creating feature vectors

Creating feature vector functions and POS tagging

Cross-validating the results

Calculating error rates

Using the Weka machine learning library

Connecting Weka and cross-validation

Understanding maximum entropy classifiers

Understanding naive Bayesian classifiers

Running the experiment

Examining the results

Combining the error rates

Improving the results

Summary

7. Null Hypothesis Tests – Analyzing Crime Data

Introducing confirmatory data analysis

Understanding null hypothesis testing

Understanding the process

Formulating an initial hypothesis

Stating the null and alternative hypotheses

Determining appropriate tests

Selecting the significance level

Determining the critical region

Calculating the test statistics and its probability

Deciding whether to reject the null hypothesis or not

Flipping coins

Formulating an initial hypothesis

Stating the null and alternative hypotheses

Identifying the statistical assumptions in the sample

Determining appropriate tests

Selecting the significance level

Determining the critical region

Calculating the test statistic and its probability

Deciding whether to reject the null hypothesis or not

Understanding burglary rates

Getting the data

Parsing the Excel files

Pulling out raw data

Growing a data tree

Cutting down the data tree

Putting it all together

Transforming the data

Joining the data sources

Pivoting the data

Filtering the missing data

Putting it all together

Exploring the data

Generating summary statistics

Summarizing UNODC crime data

Summarizing World Bank land area and GNI data

Generating more charts and graphs

Conducting the experiment

Formulating an initial hypothesis

Stating the null and alternative hypotheses

Identifying the statistical assumptions in the sample

Determining which tests are appropriate

Understanding Spearman's rank correlation coefficient

Selecting the significance level

Determining the critical region

Calculating the test statistic and its probability

Deciding whether to reject the null hypothesis or not

Interpreting the results

Summary

8. A/B Testing – Statistical Experiments for the Web

Defining A/B testing

Conducting an A/B test

Planning the experiment

Framing the statistics

Building the experiment

Looking at options to build the site

Implementing A/B testing on the server

Understanding the scaffolded site

Building the test site

Implementing A/B testing

Viewing the results

Looking at A/B testing as a user

Analyzing the results

Understanding the t-test

Testing coin tosses

Testing the results

Summary

9. Analyzing Social Data Participation

Setting up the project

Understanding the analyses

Understanding social network data

Understanding knowledge-based social networks

Introducing the 80/20 rule

Getting the data

Looking at the amount of data

Looking at the data format

Defining and loading the data

Counting frequencies

Sorting and ranking

Finding the patterns of participation

Matching the 80/20 rule

Looking for the 20 percent of questioners

Looking for the 20 percent of respondents

Combining ranks

Looking at those who only post questions

Looking at those who only post answers

Looking at those who post both questions and answers

Finding the up-voted answers

Processing the answers

Predicting the accepted answer

Setting up

Creating the InstanceList object

Training sets and Test sets

Training

Testing

Evaluating the outcome

Summary

10. Modeling Stock Data

Learning about financial data analysis

Setting up the basics

Setting up the library

Getting the data

Getting prepared with data

Working with news articles

Working with stock data

Analyzing the text

Analyzing vocabulary

Stop lists

Hapax and Dis Legomena

TF-IDF

Inspecting the stock prices

Merging text and stock features

Analyzing both text and stock features together with neural nets

Understanding neural nets

Setting up the neural net

Training the neural net

Running the neural net

Validating the neural net

Finding the best parameters

Predicting the future

Loading stock prices

Loading news articles

Creating training and test sets

Finding the best parameters for the neural network

Training and validating the neural network

Running the network on new data

Taking it with a grain of salt

Related to this project

Related to machine learning and market modeling in general

Summary

Index

Mastering Clojure Data Analysis

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: May 2014

Production Reference: 1200514

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78328-413-9

www.packtpub.com

Cover Image by Jarosław Blaminsky (<milak6@wp.pl>)

Credits

Author

Eric Rochester

Reviewers

Masato Hagiwara

Bart Kastermans

Nicholas Quirk

Andrew Stine

Commissioning Editor

Edward Gordon

Acquisition Editor

Greg Wild

Content Development Editor

Athira Laji

Technical Editors

Arwa Manasawala

Mrunmayee Patil

Nachiket Vartak

Copy Editors

Aditya Nair

Stuti Srivastava

Project Coordinator

Neha Thakur

Proofreaders

Simran Bhogal

Ameesha Green

Clyde Jenkins

Indexers

Tejal Soni

Priya Subramani

Graphics

Ronak Dhruv

Yuvraj Mannari

Production Coordinator

Komal Ramchandani

Cover Work

Komal Ramchandani

About the Author

Eric Rochester enjoys reading, writing, and spending time with his wife and kids. When he's not doing these things, he likes to work on programs in a variety of languages and platforms. Currently, he is exploring functional programming languages, including Clojure and Haskell. He has also written Clojure Data Analysis Cookbook, Packt Publishing. He works at the Scholars' Lab library at the University of Virginia, helping the professors and graduate students of humanities realize their digitally informed research agendas.

I'd like to thank almost everyone. My technical reviewers proved invaluable. Also, thank you to the editorial staff at Packt Publishing. This book is much stronger for all of their feedback, and any remaining deficiencies are mine alone.

Thank you to Bethany Nowviskie and Wayne Graham. They've made the Scholars' Lab a great place to work at; they have interesting projects and give us space to explore our own interests as well.

A special thank you to Jackie, Melina, and Micah. They've been exceptionally patient and supportive while I worked on this project. Without them, it wouldn't be worth it.

About the Reviewers

Masato Hagiwara works as a lead scientist at the Rakuten Institute of Technology, New York. He received his PhD in Information Science from Nagoya University in 2009. Before joining Rakuten, he worked at Google and Microsoft Research as an intern, and at Baidu, Japan as a full-time R&D engineer, focusing on Japanese language processing related to search engines. His research interests include Japanese and Chinese word segmentation, knowledge acquisition, transliteration, and language education. He received several awards from Japanese domestic conferences for his work on knowledge acquisition and transliteration. He extensively uses Clojure for his research projects.

To Lynn and Daphne, thank you for filling my life with smiles and happiness.

Bart Kastermans is an academician turned software developer. He has worked in set and computability theory, before giving in to his long-standing interest in information technology. Currently, he is working as a data scientist at AdGoji, a mobile marketing start-up in Amsterdam.

Nicholas Quirk has been a lifelong resident of Massachusetts. He currently works as one of the few in-house programmers for a billion-dollar manufacturing company. Working there for only three years, he was the sole designer and programmer responsible for the rewriting of some legacy applications, most notably, the production scheduling and order entry software. He has a continuous drive for self-improvement. His interests tend to sit in two realms: arts and technology, which he likes to meld when the opportunity presents itself. His art interests include watercolors, drawing (traditional and digital), digital photography, learning languages, and playing the piano. His technical interests include learning about functional programming (Clojure, Haskell, or just about any LISP), language design, compilers, virtual machines, and game design. He also has an unending curiosity in typography, sequential art, text editor color schemes, and knowing how to trick the brain into learning.

You can find more information about him at www.nicholas-quirk.com.

I'd like to thank my partner Caitlin. She has a great set of ears and did a fantastic job editing my biography.

Andrew Stine is a software developer from Northern Virginia. He loves coding and has used a wider variety of technologies than he would care to recall. His favorite language is Clojure.

www.PacktPub.com

Support files, eBooks, discount offers, and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print and bookmark content

On demand and accessible via web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Preface

Data has become increasingly important almost everywhere. It's been said that software is eating the world, but that seems even truer of data. Sometimes, it seems that the focus has shifted: companies no longer seem to want more users in order to show them advertisements. Now they want more users to gather data on them. Having more data is seen as a tremendous business advantage.

However, data by itself isn't really useful. It has to be analyzed, interrogated, and interpreted. Data scientists are settling on a number of great tools to do this, from R and Python to Hadoop and the web browser.

This book looks at 10 data analysis tasks. Unlike Clojure Data Analysis Cookbook, Packt Publishing, this book examines fewer problems and tries to go into more depth. It's more of a case study approach.

Why use Clojure? Clojure was first released in 2007 by Rich Hickey. It's a member of the Lisp family of languages, and it has the strengths and flexibility that they provide. It's also functional, so Clojure programs are easy to reason about. It also has excellent features for working concurrently and in parallel. All of these can help us as we analyze data, while keeping things simple and fast.

Moreover, Clojure runs on the Java Virtual Machine (JVM), so any libraries written for Java are available as well. Throughout this book, we'll see many examples of leveraging Java libraries for machine learning and other tasks. This gives Clojure an incredible amount of breadth and power.

I hope that this book will help you analyze your data more deeply and more effectively, and that it will make the process more fun and enjoyable.

What this book covers

Chapter 1, Network Analysis – The Six Degrees of Kevin Bacon, will discuss how people are socially organized into networks. These networks are reified in interesting ways in online social networks. We'll take the opportunity to get a small dataset from an online social network and analyze and look at how people are related in it.

Chapter 2, GIS Analysis – Mapping Climate Change, will explore how we can work with geographical data. It also walks us through getting the weather data and tying it to a geographical location. It then involves analyzing nearby points together to generate a graphic of a simplified and somewhat naive notion of how climate has changed over the period the weather has been tracked.

Chapter 3, Topic Modeling – Changing Concerns in the State of the Union Addresses, will address how we can scrape free text information off the Internet. It then uses topic modeling to look at the problems that presidents have faced and the themes that they've addressed over the years.

Chapter 4, Classifying UFO Sightings, will take a look at UFO sightings and talk about different ways to explore and get a grasp of what's in the dataset. It will then classify the UFO sightings based on various attributes related to the sightings as well as their descriptions.

Chapter 5, Benford's Law – Detecting Natural Progressions of Numbers, will take a look at the world population data from the World Bank data site. It will discuss Benford's Law and how it can be used to determine whether a set of numbers is naturally generated or artificially or randomly constructed.

Chapter 6, Sentiment Analysis – Categorizing Hotel Reviews, will take a look at the problems and possibilities related to sentiment analysis tasks. These are typically difficult and fraught categorizations of documents based on a notion of positive or negative. In this chapter, we'll also take a look at categorizing, both manually and automatically, a dataset of hotel reviews.

Chapter 7, Null Hypothesis Tests – Analyzing Crime Data, will take a look at planning, constructing, and performing null-hypothesis tests for statistical significance. It will use international crime data to look at the relationship between economic indicators and some types of crime.

Chapter 8, A/B Testing – Statistical Experiments for the Web, will take a look at how to determine which version of a website engages with the users in a better way. Although conceptually simple, this task does have a few pitfalls and danger points to be aware of.

Chapter 9, Analyzing Social Data Participation, will take a look at how people participate in online social networks. We will discuss and demonstrate some ways to analyze this data with an eye toward encouraging more interaction, contributions, and participation.

Chapter 10, Modeling Stock Data, will take a look at how to work with time-series data, stock data, natural language, and neural networks in order to find relationships between news articles and fluctuations in stock prices.

What you need for this book

One piece of software required for this book is the JDK, which you can get from http://www.oracle.com/technetwork/java/javase/downloads/index.html. The JDK is necessary to run and develop on the Java platform.

The other major piece of software that you'll need is Leiningen 2, which you can download and install from https://github.com/technomancy/leiningen. Leiningen 2 is a tool that is used to manage Clojure projects and their dependencies. It's quickly becoming the de facto standard project tool in the Clojure community.

Throughout this book, we'll use a number of other Clojure and Java libraries, including Clojure itself. Leiningen will take care of downloading these for us as and when we need them.
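As a rough illustration of how Leiningen learns which libraries to download, a minimal project.clj might look like the following sketch. The project name follows the network-six namespace used in this book's examples; the version numbers and the dependency list are assumptions chosen for illustration, not the book's actual file.

```clojure
;; Hypothetical project.clj. The versions and the dependency list are
;; assumptions for illustration only.
(defproject network-six "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [org.clojure/data.json "0.2.4"]])
```

With a file like this in place, running lein deps (or any other task, such as lein repl) in the project directory should prompt Leiningen to fetch whatever is listed under :dependencies.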

You'll also need a text editor or Integrated Development Environment (IDE). If you already have a text editor that you like, you can probably use it. Refer to http://dev.clojure.org/display/doc/Getting+Started for tips and plugins for your favorite environment. If you don't have a preference, I'd suggest that you look at using Eclipse with Counterclockwise. There are instructions to get this set up at http://dev.clojure.org/display/doc/Getting+Started+with+Eclipse+and+Counterclockwise.

Who this book is for

If you are a programmer or data scientist who is familiar with Clojure and wants to use it in your data analysis processes, this book is for you. This isn't a tutorial on Clojure—there are already a number of excellent introductory books out there—so you'll need to be familiar with the language; however, you don't need to be an expert at it.

Likewise, you don't need to be an expert on data analysis, although you should probably be familiar with its tasks, processes, and techniques. While you might be able to gain enough from these case studies to get started, you'll want to get a more thorough introduction to this field to be truly effective.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "However, before we start looking at the code, let's check out the Leiningen 2 project.clj file."

A block of code is set as follows:

(ns network-six.graph
  (:require [clojure.set :as set]
            [clojure.core.reducers :as r]
            [clojure.data.json :as json]
            [clojure.java.io :as io]
            [network-six.util :as u]))

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

clojure.lang.PersistentStructMap
(extract-text [x]
  (concat
    (extract-text (:content x))
    (when (contains? #{:span :p} (:tag x))
      ["\n\n"])))

Any command-line input or output is written as follows:

$ cd www
$ python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 …

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: Right-click on the new layer and select Properties.

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <feedback@packtpub.com>, and mention the book title via the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Downloading the color images of this book

We also provide you a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/4139OS_ColoredImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at <questions@packtpub.com> if you are having a problem with any aspect of the book, and we will do our best to address it.

Chapter 1. Network Analysis – The Six Degrees of Kevin Bacon

With the popularity of Facebook, Twitter, LinkedIn, and other social networks, we're increasingly defined by who we know and who's in our network. These websites help us manage who we know—whether personally, professionally, or in some other way—and our interactions with those groups and individuals. In exchange, we tell these sites who we are in the network.

These companies, and many others, spend a lot of time and attention on our social networks. What do they say about us, and how can we sell things to these groups?

In this chapter, we'll walk through learning about and analyzing social networks:

Analyzing social networks

Getting the data

Understanding graphs

Implementing the graphs

Measuring social network graphs

Visualizing social network graphs

Analyzing social networks

Although the Internet and popular games such as Six Degrees of Kevin Bacon have popularized the concept, social network analysis has been around for a long time. It has deep roots in sociology. Although the sociologist John A. Barnes may have been the first person to use the term, in his 1954 article Class and Committees in a Norwegian Island Parish (http://garfield.library.upenn.edu/classics1987/A1987H444300001.pdf), he was building on a tradition, dating from the 1930s and earlier, of looking at social groups and interactions relationally. Researchers in that tradition contended that social phenomena arise from interactions and not from individuals.

Slightly more recently, starting in the 1960s, Stanley Milgram conducted his small-world experiment. He would mail a letter to a volunteer somewhere in the Midwestern United States and ask him or her to get it to a target individual in Boston. If the volunteer knew the target on a first-name basis, he or she could mail it to the target directly. Otherwise, the volunteer would need to pass it to someone he or she knew who might know the target. At each step, the participants were to mail a postcard to Milgram so that he could track the progress of the letter.

This experiment (and other experiments based on it) has been criticized. For one thing, participants may simply have thrown the letter away, so the experiment may have missed huge swathes of the network. However, the results are evocative. Milgram found that the few letters that made it to the target did so in an average of six steps. Similar results have been borne out by later, similar experiments.

Milgram himself did not use the popular phrase six degrees of separation. This was probably taken from John Guare's play and film Six Degrees of Separation (1990 and 1993). Guare said he got the concept from Guglielmo Marconi, who discussed it in his 1909 Nobel Prize address.

The phrase six degrees is synonymous with social networks in the popular imagination, and a large part of this is due to the pop culture game Six Degrees of Kevin Bacon. In this game, people would try to find a link between Kevin Bacon and some other actor by tracing the films in which they've worked together.

In this chapter, we'll take a look at this game more critically. We'll use it to explore a network of Facebook (https://www.facebook.com/) users. We'll visualize this network and look at some of its characteristics.

Specifically, we're going to look at a network that has been gathered from Facebook. We'll find data for Facebook users and their friends, and we'll use that data to construct a social network graph. We'll analyze that information to see whether the observation about the six degrees of separation applies to this network. More broadly, we'll see what we can learn about the relationships represented in the network and consider some possible directions for future research.

Getting the data

A couple of small datasets of the Facebook network data are available on the Internet. None of them are particularly large or complete, but they do give us a reasonable snapshot of part of Facebook's network. As the Facebook graph is a private data source, this partial view is probably the best that we can hope for.

We'll get the data from the Stanford Large Network Dataset Collection (http://snap.stanford.edu/data/). This contains a number of network datasets, from Facebook and Twitter, to road networks and citation networks. To do this, we'll download the facebook.tar.gz file from http://snap.stanford.edu/data/egonets-Facebook.html. Once it's on your computer, you can extract it. When I put it into the folder with my source code, it created a directory named facebook.

The directory contains 10 sets of files. Each group is based on one primary vertex (user), and each contains five files. For vertex 0, these files would be as follows:

0.edges: This contains the vertices that the primary one links to.

0.circles: This contains the groupings that the user has created for his or her friends.

0.feat: This contains the features of the vertices that the user is adjacent to and ones that are listed in 0.edges.

0.egofeat: This contains the primary user's features.

0.featnames: This contains the names of the features described in 0.feat and 0.egofeat. For Facebook, these values have been anonymized.

For these purposes, we'll just use the *.edges files.
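To make the shape of this data concrete, here is a minimal sketch of reading one *.edges file into an adjacency map. It assumes that each line holds two whitespace-separated vertex IDs and that friendships are undirected; the namespace and function names are hypothetical and are not taken from the book's own code.

```clojure
(ns network-six.edges-sketch
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

(defn parse-edge
  "Parses a line such as \"1 2\" into a vector of two longs.
  Assumes two whitespace-separated vertex IDs per line."
  [line]
  (mapv #(Long/parseLong %) (str/split (str/trim line) #"\s+")))

(defn add-edge
  "Adds an undirected edge between a and b to the adjacency map."
  [graph [a b]]
  (-> graph
      (update-in [a] (fnil conj #{}) b)
      (update-in [b] (fnil conj #{}) a)))

(defn lines->graph
  "Builds a map of vertex -> set of neighbors from a seq of lines,
  skipping blank lines."
  [lines]
  (->> lines
       (remove str/blank?)
       (map parse-edge)
       (reduce add-edge {})))

(defn read-edges
  "Reads an edges file (for example, facebook/0.edges) into an
  adjacency map."
  [filename]
  (with-open [r (io/reader filename)]
    (lines->graph (doall (line-seq r)))))
```

A map from each vertex to a set of its neighbors is only one possible representation. It keeps neighbor lookups cheap and makes the degree of a vertex a simple count of its set.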

Now let's turn our attention to the data in the files and what they represent.

Understanding graphs

Graphs are the Swiss army knife of computer science data structures. Theoretically, any other data structure can be represented as a
