Data Analysis in the Cloud: Models, Techniques and Applications

Ebook, 225 pages, 2 hours

Name: Data Analysis in the Cloud: Models, Techniques and Applications
Author: Domenico Talia
ISBN: 9780128029145

By Domenico Talia, Paolo Trunfio and Fabrizio Marozzo

About this ebook

Data Analysis in the Cloud introduces and discusses models, methods, techniques, and systems to analyze the large number of digital data sources available on the Internet using the computing and storage facilities of the cloud.

Coverage includes scalable data mining and knowledge discovery techniques together with cloud computing concepts, models, and systems. Specific sections focus on map-reduce and NoSQL models. The book also includes techniques for conducting high-performance distributed analysis of large data on clouds. Finally, the book examines research trends such as Big Data pervasive computing, data-intensive exascale computing, and massive social network analysis.

Introduces data analysis techniques and cloud computing concepts
Describes cloud-based models and systems for Big Data analytics
Provides examples of the state-of-the-art in cloud data analysis
Explains how to develop large-scale data mining applications on clouds
Outlines the main research trends in the area of scalable Big Data analysis

Language: English

Publisher: Elsevier Science

Release date: Sep 15, 2015

ISBN: 9780128029145

Author

Domenico Talia

Domenico Talia is a professor of computer engineering at the University of Calabria and a partner in two startups, DtoK Lab and Exeura. His research interests include parallel and distributed data mining algorithms, cloud computing, social data analysis, distributed knowledge discovery, mobile computing, green computing systems, peer-to-peer systems, and parallel programming. He is the author of several books, including Service-Oriented Distributed Knowledge Discovery (CRC 2012) and Grid Middleware and Services: Challenges and Solutions (Springer 2010), and of more than 300 papers in international conference proceedings and archival journals such as CACM, IEEE TKDE, ACM Computing Surveys, FGCS, Parallel Computing, and IEEE Internet Computing. He is a member of the editorial boards of many journals, including IEEE Transactions on Cloud Computing, Future Generation Computer Systems, Journal of Cloud Computing, and the International Journal of Web and Grid Services.

Skip carousel

Don’t Be Misled by GPT-4’s Gift of Gab
The Atlantic
Article
Don’t Be Misled by GPT-4’s Gift of Gab
Mar 15, 2023
4 min read
Inform And Enhance Your Business With Open Data
PC Pro Magazine
Article
Inform And Enhance Your Business With Open Data
Jun 10, 2021
7 min read
Team Encodes Digital ‘Hello’ Into Lab-made DNA
Futurity
Article
Team Encodes Digital ‘Hello’ Into Lab-made DNA
Mar 26, 2019
4 min read
How And Where You Use Machine-learning
APC
Article
How And Where You Use Machine-learning
Oct 7, 2019
4 min read
How To Make Sense From And With AI ?
The European Business Review
Article
How To Make Sense From And With AI ?
Sep 25, 2021
4 min read
Quantum Computing and The Rise Of Machine Learning
Techfastly
Article
Quantum Computing and The Rise Of Machine Learning
Oct 1, 2021
2 min read
Data Centers Aren’t The Energy Hogs We Thought
Futurity
Article
Data Centers Aren’t The Energy Hogs We Thought
Feb 28, 2020
2 min read
Quantum Simulators An Overview
Techfastly
Article
Quantum Simulators An Overview
Oct 1, 2021
4 min read
Cryptographers Solve Decades-Old Privacy Problem
Nautilus
Article
Cryptographers Solve Decades-Old Privacy Problem
Nov 17, 2023
4 min read
‘Deep Learning’ Goes Faster With Organized Data
Futurity
Article
‘Deep Learning’ Goes Faster With Organized Data
Jun 5, 2017
Researchers have found that a technique for speedy data lookup, called hashing, can dramatically reduce the amount of computation required for deep learning, a demanding form of machine learning. “This applies to any deep-learning architecture, and t
2 min read
Federated Learning Uses The Data Right On Our Devices
Futurity
Article
Federated Learning Uses The Data Right On Our Devices
Jul 21, 2022
2 min read
Finding Your Data
APC
Article
Finding Your Data
Sep 9, 2019
4 min read
The Future Is All Quantum
Techfastly
Article
The Future Is All Quantum
Oct 1, 2021
2 min read
Powering Costing With Artificial Intelligence: The Case Of Vodafone Procurement
The European Business Review
Article
Powering Costing With Artificial Intelligence: The Case Of Vodafone Procurement
May 25, 2021
8 min read
Building Trends, Building Momentum
Facility Management
Article
Building Trends, Building Momentum
Oct 14, 2019
3 min read
The Cloud Is All Around Us
MoneyWeek
Article
The Cloud Is All Around Us
Mar 17, 2023
The ways the cloud can be used in our day-to-day lives is unlimited, as these examples help to illustrate. Within entertainment, whether it’s Disney+ or Netflix, the television shows and films we watch are stored in the cloud so that millions can sim
2 min read
Safer Cyber
Cosmos Magazine
Article
Safer Cyber
Mar 14, 2024
3 min read
How To Train Computers Faster For ‘Extreme’ Datasets
Futurity
Article
How To Train Computers Faster For ‘Extreme’ Datasets
Dec 12, 2019
4 min read
Facilities Systems
Facility Management
Article
Facilities Systems
Oct 21, 2018
5 min read
Prototype Paves Way For ‘Computer-on-a-chip’
Futurity
Article
Prototype Paves Way For ‘Computer-on-a-chip’
Feb 22, 2019
2 min read
Herd In The Cloud
Linux Format
Article
Herd In The Cloud
Sep 21, 2021
Matt Yonkovit is Percona’s Head of Open Source Strategy and a member of SHA (Silly Hats Anonymous). “Going ‘cloud native’ involves building applications in new ways. Traditional applications are generally designed with a two- or three-tier architectu
1 min read
Is Artificial Intelligence Permanently Inscrutable?: Despite new biology-like tools, some insist interpretation is impossible.
Nautilus
Article
Is Artificial Intelligence Permanently Inscrutable?: Despite new biology-like tools, some insist interpretation is impossible.
Sep 1, 2016
Dmitry Malioutov can’t say much about what he built. As a research scientist at IBM, Malioutov spends part of his time building machine learning systems that solve difficult problems faced by IBM’s corporate clients. One such program was meant for a
13 min read
Business applications For Quantum computing
Rotman Management
Article
Business applications For Quantum computing
May 1, 2022
COMPUTERS DO ARITHMETIC. Underlying every amazing application of computers today is math, calculated using binary digits or ‘bits.’ The original computers of the early 1950s could perform about 465 multiplications per second — much faster than the ‘h
11 min read
Tensor Flow 101
APC
Article
Tensor Flow 101
Jan 27, 2020
4 min read
Is Artificial Intelligence Permanently Inscrutable?
Nautilus
Article
Is Artificial Intelligence Permanently Inscrutable?
Sep 1, 2016
Dmitry Malioutov can’t say much about what he built. As a research scientist at IBM, Malioutov spends part of his time building machine learning systems that solve difficult problems faced by IBM’s corporate clients. One such program was meant for a
13 min read
What Tech Can Learn from the Fruit Fly’s Search Algorithm
Nautilus
Article
What Tech Can Learn from the Fruit Fly’s Search Algorithm
Nov 13, 2017
5 min read
Microchips Use ‘Sparse Coding’ to Recognize Objects Like We Do
Futurity
Article
Microchips Use ‘Sparse Coding’ to Recognize Objects Like We Do
May 26, 2017
3 min read
Saxo Bank And Thoughtworks: Enabling Data Democratization At A Global Investment Bank
Business Today
Article
Saxo Bank And Thoughtworks: Enabling Data Democratization At A Global Investment Bank
Jan 20, 2023
2 min read
Pragmatic Parametricism
Architectural Review Asia Pacific
Article
Pragmatic Parametricism
Nov 13, 2020
4 min read
The Future Of The Data Economy
The European Business Review
Article
The Future Of The Data Economy
Jun 1, 2022
6 min read

Related categories

Skip carousel

Reviews for Data Analysis in the Cloud

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Data Analysis in the Cloud - Domenico Talia

Data Analysis in the Cloud

Models, Techniques and Applications

Domenico Talia

Paolo Trunfio

Fabrizio Marozzo

Cover

Title page

Copyright

Dedication

Preface

Chapter 1: Introduction to Data Mining

Abstract

1.1. Data mining concepts

1.2. Parallel and distributed data mining

1.3. Summary

Chapter 2: Introduction to Cloud Computing

Abstract

2.1. Cloud computing: definition, models, and architectures

2.2. Cloud computing systems for data-intensive applications

2.3. Summary

Chapter 3: Models and Techniques for Cloud-Based Data Analysis

Abstract

3.1. MapReduce for data analysis

3.2. Data analysis workflows

3.3. NoSQL models for data analytics

3.4. Summary

Chapter 4: Designing and Supporting Scalable Data Analytics

Abstract

4.1. Data analysis systems for clouds

4.2. How to design a scalable data analysis framework in clouds

4.3. Programming workflow-based data analysis

4.4. Data analysis case studies

4.5. Summary

Chapter 5: Research Trends in Big Data Analysis

Abstract

5.1. Data-intensive exascale computing

5.2. Massive social network analysis

5.3. Key research areas

5.4. Summary

Copyright

Elsevier

Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands

The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK

225 Wyman Street, Waltham, MA 02451, USA

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

ISBN: 978-0-12-802881-0

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data

A catalog record for this book is available from the Library of Congress

For information on all Elsevier publications visit our website at http://store.elsevier.com/

Dedication

To my beloved parents and to my darling family.

Domenico Talia

To my little daughter, Iris, who joined Erika, Thomas, and me along the way.

Paolo Trunfio

To Laura and to my family.

Fabrizio Marozzo

Preface

The massive amount of digital data currently being generated in all human activities is a precious source of knowledge for both business and science. However, handling and analyzing huge datasets requires very large storage resources and scalable computing facilities. In fact, the wide availability of big data sources demands efficient data analysis tools and techniques for finding and extracting useful knowledge from them. Big data analysis today can be performed by storing data and running compute-intensive data mining algorithms on cloud computing systems to extract value from data in reduced time. Cloud computing systems can be used to run complex applications on dynamic computing servers and deliver them as services over the Internet. Owing to their elastic nature, cloud computing infrastructures can serve as effective platforms for addressing the computational and data storage needs of most big data analytics applications being developed today. Coping with and gaining value from cloud-based big data, however, requires novel software tools and advanced analysis techniques. Indeed, advanced data mining techniques and innovative tools can help users understand and extract what is useful in large and complex datasets, and the knowledge extracted from big data sources today is vital for making informed decisions in many business and scientific applications. This process, which forms the basis for analyzing big data sources and repositories, must be implemented by combining big data analytics and knowledge discovery techniques with scalable computing systems such as clouds.

All these issues are discussed in this book. In fact, the main goal of the book is to introduce and present models, methods, techniques, and systems for analyzing large digital data sources by using the computing and storage facilities of cloud computing systems. This book includes, as key topics, scalable data mining and knowledge discovery techniques, together with cloud computing concepts, models, and systems. After introducing these fields, this book focuses on scalable technologies for cloud-based data analysis such as MapReduce, workflows, and NoSQL models, and discusses how to design high-performance distributed analysis of big data on clouds. Finally, this book examines research trends such as big data exascale computing and massive social network analysis.

This book is for graduate students, researchers, and professionals in cloud computing, big data analysis, distributed data mining, and data analytics. Both readers who are new to these subjects and those experienced in the cloud computing and data mining domains will find many topics of interest. Researchers will find some of the latest achievements in the area, along with significant technologies and examples of the state of the art in cloud-based data analysis and knowledge discovery. Furthermore, graduate students and young researchers will learn useful concepts related to parallel and distributed data mining, cloud computing, data-intensive applications, and scalable data analysis.

Besides introducing the key concepts and systems in the area of cloud-based data analysis, this book presents real case studies that provide a useful guide for developers on issues, prospects, and successful approaches in the practical use of cloud-based data analysis frameworks. The chapters are organized so that the book can also be used as a reference text in graduate and postgraduate courses on parallel/distributed data mining and on cloud computing for big data analysis.

We would like to thank people from the publisher, Elsevier, particularly Lindsay Lawrence, for their support and work during the book publication process.

We hope readers will find this book’s content interesting, attractive, and useful, as we found it stimulating and exciting to write.

Domenico Talia

Paolo Trunfio

Fabrizio Marozzo

Chapter 1

Introduction to Data Mining

Abstract

We introduce in this chapter the main concepts of data mining. This scientific field, together with cloud computing, discussed in Chapter 2, is a basic pillar on which the contents of this book are built. Section 1.1 explores the main notions and principles of data mining, introducing readers to this scientific field and giving them the needed information on sequential data mining techniques and algorithms that will be used in other sections and chapters of this book. Section 1.2 outlines the most important parallel and distributed data mining strategies and techniques.

Keywords

data mining

classification

clustering

association rules

parallel data mining

distributed data mining

meta-learning

collective data mining

ensemble learning

1.1. Data mining concepts

Computers were created to help humans execute long and complex operations automatically. One of the main effects of the invention of computers is the huge amount of digital data that is now stored in computer memory. Those data volumes can be used to know and understand facts, behaviors, and natural phenomena, and to make decisions on the basis of them. Researchers have investigated methods for instructing computers to learn from data. In particular, machine learning is a scientific discipline that deals with the design and implementation of models, procedures, and algorithms that can learn from data. Such techniques build a predictive model from input data, which is then used for making predictions or decisions. More recently, data mining has been defined as an area of computer science where machine learning techniques are used to discover previously unknown properties in large data sets. More formally, data mining is the analysis of data sets to find interesting, novel, and useful patterns, relationships, models, and trends. Data mining tasks include methods at the intersection of artificial intelligence, machine learning, statistics, mathematics, and database systems. The overall practical goal of a data mining task is to extract information from a data set and transform it into an understandable structure for further use. Data mining is also considered the central step of the knowledge discovery in databases (KDD) process, which aims at discovering useful patterns and models for making sense of data. The additional steps in the KDD process are data preparation, data selection, data cleaning, incorporation of appropriate prior knowledge, and interpretation of the results of mining. Together with the data mining step, they are essential to ensure that useful knowledge is derived from the data to be analyzed.
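As a rough illustration of how the KDD steps fit around the mining step, the following Python sketch chains selection, cleaning, mining, and interpretation on a toy data set. The records, field names, and function names are hypothetical and chosen only for this example; the "mining" step is a simple frequency count standing in for a real data mining algorithm, and the prior-knowledge step is omitted.

```python
from collections import Counter

# Hypothetical raw records: (customer_id, product, amount).
# None marks a missing value. This data is invented for illustration.
RAW = [
    (1, "book", 12.0),
    (2, "pen", None),
    (3, "book", 30.0),
    (4, "lamp", 45.0),
    (5, "book", 8.0),
    (6, None, 20.0),
]

def select(records):
    """Data selection: keep only the fields relevant to the analysis."""
    return [(product, amount) for _, product, amount in records]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if None not in r]

def mine(records):
    """Data mining step (stand-in): count how often each product occurs."""
    return Counter(product for product, _ in records)

def interpret(pattern):
    """Interpretation: turn the mined pattern into a readable statement."""
    product, count = pattern.most_common(1)[0]
    return f"Most frequently purchased product: {product} ({count} purchases)"

def kdd(records):
    """Run the simplified pipeline end to end."""
    return interpret(mine(clean(select(records))))

print(kdd(RAW))
```

Each stage is a separate function, so a different mining algorithm could replace `mine` without touching the preparation or interpretation code.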

Many data mining algorithms have been designed and implemented in several research areas such as statistics, machine learning, mathematics, artificial intelligence, pattern recognition, and databases, each of which uses specialized techniques from the respective application field. The most common types of data mining tasks include:

• Classification: the goal is to classify a data set into one or more predefined classes. This is done by models that implement a mapping from a vector of values to a categorical variable. In this way, using classification we can predict the membership of a data instance in a given class from a set of predefined classes. For instance, a set of outlet store clients can be grouped into three classes (high spending, average spending, and low spending clients), or a set of patients can be classified according to a set of diseases. Classification techniques are used in many application domains such as financial services, bioinformatics, document classification, multimedia data, text processing, and social network analysis.

• Regression: this is a predictive technique that associates a data set with a quantitative variable and predicts the value of that variable. There are many applications of regression, such as assessing the likelihood that a patient will get sick from the results of diagnostic tests, or predicting the margin of victory of a sports team based on results and technical data of previous matches. Regression is often used in economics, environmental studies, market trends, meteorology, and epidemiology.

• Clustering: this data mining task aims to identify a finite set of categories or groupings (clusters) to describe the data. Clustering techniques are used when no class to be predicted is available a priori and data instances are to be divided into groups of similar instances. The groups can be mutually exclusive and exhaustive, or consist of a more extensive representation, such as in the case of hierarchical categories. Examples of clustering applications include finding homogeneous subsets of clients in a database of commercial sales, or groups of objects with similar shapes and colors. Among the application domains where clustering is used are gene analysis, network intrusion, medical imaging, crime analysis, climatology, and text mining. Unlike classification, in which classes are predefined, in clustering the classes must be derived from data, looking for clusters based on metrics of similarity between data without the assistance of users.

• Summarization: this data mining task provides a compact description of a subset of data. Summarization methods for unstructured data usually involve text classification that groups together documents sharing similar characteristics. An example of summarization of quantitative data is the tabulation of the mean and standard deviation of each data field. More complex functions involve summary rules and the discovery of functional relationships between variables. Summarization techniques are often used in the interactive analysis of data and the automatic generation of reports.

• Dependency modeling: this task consists of finding a model that describes significant dependencies between variables. Here the goal is to discover how some data values depend on other data values. Dependency models exist at two levels: the structural level of the model specifies which variables are locally dependent on each other, while the quantitative level specifies the strength of the dependencies using a numeric scale. Dependency modeling approaches are used in retail, business process management, software development, and assembly line optimization.

• Association rule discovery: this task aims at finding sets of items that occur together in records of a data set and the relationships among those items in order to derive multiple correlations that meet the specified thresholds. It is intended to identify strong rules discovered in
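The summarization task in the list above mentions tabulating the mean and standard deviation of each data field. A minimal Python sketch of that tabulation follows, using only the standard library. The records and field names are invented for illustration, and the sample standard deviation is used as an assumption, since the text does not say which variant is meant.

```python
from statistics import mean, stdev

# Hypothetical records with two quantitative fields (illustrative data only).
records = [
    {"age": 30, "income": 40000.0},
    {"age": 40, "income": 50000.0},
    {"age": 50, "income": 60000.0},
]

def summarize(rows):
    """Return the mean and sample standard deviation of each field."""
    summary = {}
    for field in rows[0].keys():
        values = [row[field] for row in rows]
        summary[field] = {"mean": mean(values), "stdev": stdev(values)}
    return summary

for field, stats in summarize(records).items():
    print(f"{field}: mean={stats['mean']}, stdev={stats['stdev']}")
```

A compact table like this is the kind of description that could feed interactive analysis or automatic report generation, the two uses the text names for summarization.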
