Ebook, 487 pages, 5 hours

Building Big Data Applications

Name: Building Big Data Applications
Author: Krish Krishnan
ISBN: 9780128158043

About this ebook

Building Big Data Applications helps data managers and their organizations make the most of unstructured data with an existing data warehouse. It gives readers what they need to know to make sense of how Big Data fits into the world of data warehousing. Readers will learn about infrastructure and integration options and come away with a solid understanding of how to leverage various architectures for integration. The book includes a wide range of use cases that will help data managers visualize reference architectures in the context of specific industries (healthcare, big oil, transportation, software, etc.).

Explores various ways to leverage Big Data by effectively integrating it into the data warehouse
Includes real-world case studies that clearly demonstrate Big Data technologies
Provides insights on how to optimize current data warehouse infrastructure and integrate newer infrastructure matching data processing workloads and requirements

Language: English

Publisher: Academic Press

Release date: Nov 15, 2019

ISBN: 9780128158043

Author

Krish Krishnan

Krish Krishnan is a recognized expert worldwide in the strategy, architecture, and implementation of high-performance data warehousing solutions and unstructured data. A sought-after data warehouse thought leader and practitioner, he is ranked as one of the top strategy and architecture consultants in the world in this subject. Krish is also an independent analyst, speaks at conferences around the world on Big Data, and teaches at TDWI on this subject. Along with other experts, Krish is helping drive industry maturity in the next generation of data warehousing, focusing on Big Data, Semantic Technologies, Crowdsourcing, Analytics, and Platform Engineering. Krish is the founder and president of Sixth Sense Advisors Inc., a Chicago-based company providing independent analyst services in Big Data, Analytics, Data Warehouse, and Business Intelligence.

Skip carousel

Building Big Data Applications

Krish Krishnan

Cover image

Title page

Copyright

Dedication

Preface

1. Big Data introduction

Big Data delivers business value

Big Data applications—processing data

Critical factors for success

Risks and pitfalls

2. Infrastructure and technology

Introduction

Distributed data processing

Big data processing requirements

Technologies for big data processing

MapReduce

MapReduce programming model

MapReduce Google architecture

History

Hadoop core components

NameNode

DataNode

Image

Journal

Checkpoint

HDFS startup

Block allocation and storage

HDFS client

Replication and recovery

NameNode and DataNode—communication and management

Heartbeats

CheckPointNode and BackupNode

CheckPointNode

BackupNode

Filesystem snapshots

YARN scalability

YARN execution flow

Zookeeper features

Locks and processing

Failure and recovery

Programming with Pig Latin

Pig data types

Running Pig programs

Pig program flow

Common Pig command

HBASE architecture

HBASE architecture implementation

Hive architecture

Execution—how does Hive process queries?

Hive data types

Hive examples

HCatalog

CAP theorem

A keyspace has configurable properties that are critical to understand

Cassandra ring architecture

The design features of document-oriented databases include the following:

3. Building big data applications

Data storyboard

4. Scientific research applications and usage

Accelerators

Big data platform and application

XRootD filesystem interface project

Service for web-based analysis (SWAN)

The result—Higgs Boson discovery

5. Pharmacy industry applications and usage

The complexity design for data applications

Complexities in transformation of data

Google deep mind

Case study

6. Visualization, storyboarding and applications

Let us look at some of the use cases of big data applications

Visualization

The evolving role of the data scientist

7. Banking industry applications and usage

The coming of age with uber banking

The use cases of analytics and big data applications in banking today

Fraud and compliance tracking

Client chatbots for call center

Antimoney laundering detection

Algorithmic trading

Recommendation engines

8. Travel and tourism industry applications and usage

Travel and big data

Real-time conversion optimization

Optimized disruption management

Niche targeting and unique selling propositions

Smart social media listening and sentiment analysis

Hospitality industry and big data

Analytics and travel industry

Examples of the use of predictive analytics

Develop applications using data and agile API

9. Governance

Definition

Metadata and master data

Master data

Data management in big data infrastructure

Processing complexity of big data

Processing limitations

Governance model for building an application

Use cases of governance

10. Building the big data application

Risk assessment questions

Business continuity management

11. Data discovery and connectivity

Challenges before you start with AI

Strategies you can follow to start with AI

Compliance and regulations

Use cases from industry vendors

Index

Copyright

Academic Press is an imprint of Elsevier

125 London Wall, London EC2Y 5AS, United Kingdom

525 B Street, Suite 1650, San Diego, CA 92101, United States

50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data

A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

ISBN: 978-0-12-815746-6

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Mara Conner

Acquisition Editor: Mara Conner

Editorial Project Manager: Joanna Collett

Production Project Manager: Punithavathy Govindaradjane

Cover Designer: Mark Rogers

Typeset by TNQ Technologies

Dedication

Dedicated to all my teachers

Preface

In the world that we live in today, it is very easy to manifest and analyze data at any given instant. Insightful analytics are worth every executive's time when making decisions that impact the organization today and tomorrow. These analytics are what we have called Big Data analytics since 2010, and our teams have been struggling to understand how to integrate data with the right metadata and master data in order to produce a meaningful platform that can be used to produce these insightful analytics.

Not only is the commercial space interested in this; we also have scientific research and engineering teams very much wanting to study the data and build applications on top of it. The efforts taken to produce Big Data applications have been sporadic when measured in terms of success. Why is that? It is a question being asked by folks across the industry. In my experience of working in this specific space, what I have realized is that we are still working with data which is vast in terms of volume and is produced very fast, on demand, by any consumer, leading to metadata integration issues. This metadata integration issue can be handled if we make it an enterprise solution, and all renters in the space need not necessarily worry about their integration with a Big Data platform. This integration is handled through integration tools that have been built for data integration and transformation. Another interesting perspective is that while the data is voluminous and is produced very fast, it can be integrated and harvested like any enterprise data segment. We require the new data architecture to be flexible and scalable to accommodate new additions, updates, and integrations in order to be successful in building a foundation platform. This data architecture will differ from the third normal form and star schema forms that we built the data warehouse from. The new architecture will require more integration and just-in-time additions, which are better represented by NoSQL database architectures and Hadoop architectures. How do we get to this success factor? And how do we make the enterprise realize that new approaches are needed to ensure success and reach the tipping point of a successful implementation?

Our executives are always known for asking questions about the lineage of data and its traceability. These questions today can be handled in the data architecture and engineering, provided we as an enterprise take a few minutes to step back and analyze why our past journeys were not successful enough and how we can be impactful in the future journey of delivering the Big Data application. The hidden secret here rests in the form of governance within the enterprise. Governance is not about measuring people; it is about ensuring that all processes have been followed and completed as required and that all specifics are in place for delivering on-demand lineage and traceability.

In writing this book there are specific points that have been discussed about the architecture and governance required to ensure success in Big Data applications. The goal of the book is to share the secrets that have been leveraged by different segments of people in their big data application projects and the risks that they had to overcome to become successful.

The chapters in the book present different types of scenarios that we all encounter, and in this process the goals of reproducibility and repeatability for ensuring experimental success have been demonstrated. If you have ever wondered what the foundational difference in building a Big Data application is, it is that the datasets can be harvested and an experimental stage can be repeated if all of the steps are documented and implemented as specified in the requirements. Any team that wants to become successful in the new world needs to remember that we have to follow and implement governance in order to become measurable. Measuring process completion is mandatory to become successful; as you read the book, revisit this point and draw the highlights from it.

In developing this book, I have had several discussions with teams from both commercial enterprises and research organizations, and I thank all contributors for their time, their insights, and for sharing their endeavors. It did take time to ensure that all the relevant people across these teams were sought out and that the tipping points of failure were discussed, in order to understand the risks that could be identified and avoided in the journey. Several reference points have been added to the chapters, and while the book is not all-encompassing by any means, it does provide any team that wants to understand how to build a Big Data application with choices of how success can be accomplished, as well as case studies that vendors have shared showcasing how companies have implemented technologies to build the final solution.

I thank all vendors who provided material for the book and in particular IO-Tahoe, Teradata, and Kinetica for access to teams to discuss the case studies.

I thank my entire editorial and publishing team at Elsevier for their continued support in this journey; their patience and support in ensuring completion is why this book is in your hands today.

Last but not least, I thank my wife and our two sons for the continued inspiration and motivation for me to write. Your love and support is a motivation.

1

Big Data introduction

Abstract

This chapter presents an introduction to Big Data. The world we live in today is flooded with data. Big Data delivers business value across services ranging from personal care to beauty, healthy eating, clothing, perfumes, watches, jewelry, medicine, travel, tours, and investments. Big Data applications are the answer to leveraging the analytics from complex events and getting articulate insights for the enterprise. We should define a metadata-driven architecture to integrate the data for creating analytics. More opportunities exist in terms of space exploration, smart cars and trucks, and new forays into energy research, as well as smart wearable devices and devices for pet monitoring, remote communications, healthcare monitoring, sports training, and many other innovations.

Keywords

Analytics; Big Data; Hadoop technology; Healthcare monitoring; Remote communications; SAP

This chapter will be a brief introduction to Big Data, providing readers with the history, where we are today, and the future of data. The reader will get a refresher view of the topic.

The world we live in today is flooded with data all around us, produced at rates that we have not experienced before, and analyzed for usage at rates that we had previously heard only as requirements and can now fulfill. What is the phenomenon called Big Data, and how has it transformed our lives today? Let us take a look back at history. In 2001, when Doug Laney was working with Meta Group, he forecast a trend that would create a new wave of innovation and articulated that the trend would be driven by the three V's, namely the volume, velocity, and variety of data. Continuing in 2009, he wrote the first premise on how Big Data, as the term was coined by him, would impact the lives of all consumers using it. A more radical rush was seen in the industry with the embrace of Hadoop technology, followed by NoSQL technologies of different varieties, ultimately driving the evolution of new data visualization, analytics, storyboarding, and storytelling.

In a lighter vein, SAP published a cartoon that read out the four words that Big Data brings: "Make Me More Money."

This is the confusion we need to steer clear of; we need to be ready to understand how to monetize Big Data.

First, to understand how to build applications with Big Data, we need to look at Big Data from both the technology and data perspectives.

Big Data delivers business value

The e-Commerce market has shaped businesses around the world into a competitive platform where we can sell and buy what we need based on cost, quality, and preference. The spread of services ranges from personal care, beauty, healthy eating, clothing, perfumes, watches, jewelry, medicine, travel, tours, and investments, and the list goes on. All of this activity has resulted in data of various formats, sizes, languages, symbols, currencies, volumes, and additional metadata, which we collectively call Big Data today. The phenomenon has driven unprecedented value to business and can deliver insights like never before.

The business value did not and does not stop here; we are seeing the use of the same techniques of Big Data processing across insurance, healthcare, research, physics, cancer treatment, fraud analytics, manufacturing, retail, banking, mortgage, and more. The biggest question is how to realize the value repeatedly. What formula will bring success and value, and how do we monetize the effort?

Take a step back for a moment and assess the same question with investments that have been made into a Salesforce, Unica, or Endeca implementation and the business value that you can derive from them. Chances are you will not have an accurate picture of the return on investment, the percentage of impact in terms of increased revenue or decreased spend, or process optimization percentages from any such prior experiences. It is not that your teams did not measure the impact, but they are unsure how to express the actual benefit in quantified metrics. In the case of a Big Data implementation, however, there are techniques to establish a quantified measurement strategy and associate the overall program with such cost benefits and process optimizations.

The interesting question to ask is what are organizations doing with Big Data? Are they collecting it, studying it, and working with it for advanced analytics? How exactly does the puzzle called Big Data fit into an organization's strategy and how does it enhance corporate decision-making?

To understand this picture better, there are some key questions to think about. These are a few; you can add more to this list:

• How many days does it take on average to get answers to the question "why?"

• How many cycles of research does the organization do for understanding the market, competition, sales, employee performance, and customer satisfaction?

• Can your organization provide an executive dashboard along the Zachman Framework model to provide insights and business answers on who, what, where, when, and how?

• Can we have a low code application that will be orchestrated with a workflow and can provide metrics and indicators on key processes?

• Do you have volumes of data but have no idea how to use it or do not collect it at all?

• Do you have issues with historical analysis?

• Do you experience issues with how to replay events? Simple or complex events?

Answering these questions through the eyes of data is essential. Any organization today has an abundance of data, and there is a lot of hidden information in these nuggets that has to be harvested. Consider the following data:

• Traditional business systems—ERP, SCM, CRM, SFA

• Content management platforms

• Portals

• Websites

• Third-party agency data

• Data collected from social media

• Statistical data

• Research and competitive analysis data

• Point of sale data—retail or web channel

• Legal contracts

• Emails

If you observe a pattern here, there is data about customers, products, services, sentiments, competition, compliance, and much more available. The question is, does the organization leverage all the data that is listed here? And more important is the question, can you access all this data with relative ease and implement decisions? This is where the platforms and analytics of Big Data come into the picture within the enterprise. Of the data nuggets that we have described, 50% or more come from internal systems and data producers that have been used for gathering data but not for harnessing analytical value (the data here is structured, semistructured, and unstructured); the other 50% or less is the new data that is called Big Data (web data, machine data, and sensor data).

Big Data Applications are the answer to leveraging the analytics from complex events and getting the articulate insights for the enterprise. Consider the following example:

• Call center optimization: The worst fear of a customer is dealing with the call center. The fundamental frustration for the customer is the need to explain all the details about their transactions with the company they are calling, the current situation, and what they are expecting as a resolution, not once but (in most cases) many times, to many people, and maybe in more than one conversation. All of this frustration can be vented on their Facebook page, Twitter, or a social media blog, causing multiple issues:

• They will have an influence in their personal network that will cause potential attrition of prospects and customers

• Their frustration may be shared by many others and eventually result in class action lawsuits

• Their frustration will provide an opportunity for the competition to pursue and sway customers and prospects

• All of these actions lead to one outcome: revenue loss

If this company continues to persist with poor quality of service, eventually the losses will be large, even leading to closure of the business and loss of brand reputation. It is in situations like this that you can find a lot of knowledge by connecting the dots with data and creating a powerful set of analytics to drive business transformation. Business transformation does not mean you need to change your operating model; rather, it provides opportunities to create new service models built on data-driven decisions and analytics.

Let us assume that the company we are discussing here decides that the current solution needs an overhaul and that the customer needs to be provided the best quality of service. It will need to have the following types of data ready for analysis and usage:

• Customer profile, lifetime value, transactional history, segmentation models, social profiles (if provided)

• Customer sentiments, survey feedback, call center interactions

• Product analytics

• Competitive research

• Contracts and agreements—customer specific

We should define a metadata-driven architecture to integrate the data for creating these analytics. There is nuance in selecting the right technology and architecture for the physical deployment. Suppose that a few days later the customer calls for support, and the call center agent now has a mash-up showing different types of analytics. The agent is able to ask the customer guided questions on the current call and apprise them of the solutions and timelines; rather than asking for information, the agent is providing a knowledge service. In this situation the customer feels more privileged, and even if there are issues with the service or product, the customer is not likely to attrite. Furthermore, the same customer can now share positive feedback and report their satisfaction, thus creating a potential opportunity for more revenue. The agent feels more empowered and can start having conversations on cross-sell and up-sell opportunities. In this situation, there is a likelihood of additional revenue and fewer opportunities for loss of revenue. These are the types of business opportunities that Big Data analytics (internal and external) will bring to the organization, in addition to improving efficiencies, creating optimizations, and reducing risks and overall costs. There is some initial investment involved in creating this data strategy and architecture and in implementing additional technology solutions. The return on investment will offset these costs and may even save on license costs from technologies that are retired after the new solution is in place.
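To make the idea of a metadata-driven integration a little more concrete, the following Python sketch shows one possible shape of it: a small metadata registry describes, for each source system, which field identifies the customer and how source fields map onto a unified view for the call center agent. The source names, field names, and the `build_customer_view` helper are all hypothetical and chosen only for illustration; they do not come from the book or from any specific product.

```python
# Hypothetical metadata registry (illustrative assumption, not from the book).
# For each source: the field holding the customer key, and a mapping from
# source field names to canonical names in the unified view.
SOURCE_METADATA = {
    "crm": {
        "key": "cust_id",
        "fields": {"name": "customer_name", "ltv": "lifetime_value"},
    },
    "call_center": {
        "key": "customer",
        "fields": {"last_issue": "last_issue"},
    },
    "survey": {
        "key": "respondent_id",
        "fields": {"score": "satisfaction_score"},
    },
}


def build_customer_view(customer_id, records_by_source, metadata=SOURCE_METADATA):
    """Merge records from several sources into one view for a single customer.

    The integration logic never hard-codes a source schema; it only reads the
    metadata registry, so a new source can be added by registering it there.
    """
    view = {"customer_id": customer_id}
    for source, records in records_by_source.items():
        meta = metadata.get(source)
        if meta is None:
            continue  # source is not registered in the metadata, so skip it
        for record in records:
            if record.get(meta["key"]) != customer_id:
                continue  # record belongs to a different customer
            for source_field, canonical_name in meta["fields"].items():
                if source_field in record:
                    view[canonical_name] = record[source_field]
    return view


# Made-up sample data for the sketch.
records = {
    "crm": [{"cust_id": "C42", "name": "A. Rivera", "ltv": 1800.0}],
    "call_center": [
        {"customer": "C42", "last_issue": "billing dispute"},
        {"customer": "C07", "last_issue": "late delivery"},
    ],
    "survey": [{"respondent_id": "C42", "score": 2}],
}

view = build_customer_view("C42", records)
print(view)
```

The design choice worth noting is that the mapping lives in data rather than in code, which is one reading of what "metadata-driven" means here; a production architecture would also need to address master data matching, lineage, and governance, which this sketch leaves out.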

We see the absolute clarity that can be leveraged from an implementation of the Big Data-driven call center, which will provide the customer with confidence, the call center associate with clarity, and the enterprise with fine details including competition, noise, campaigns, social media presence, the ability to see what customers in the same age group and location are sharing, similar calls, and results. All of this can be easily accomplished if we set the right strategy in motion for implementing Big Data applications. This requires us to understand the underlying infrastructure and how to leverage it for the implementation. This is the next segment of this chapter.

Healthcare example

In the past few years, a significant debate has emerged around healthcare and its costs. There are almost 80 million baby boomers approaching retirement, and economists forecast this trend will likely bankrupt Medicare and Medicaid in the near future. While healthcare reform and its new laws have ignited a number of important changes, the core issues are not resolved. It's critical we fix our system now, or else our $2.6 trillion in annual healthcare spending will grow to $4.6 trillion by 2020—one-fifth of our gross domestic product.

Data-rich and information-poor

Healthcare has always been data-rich. Medicine has developed so quickly in the past 30 years that, along with preventive and diagnostic developments, we have generated a lot of data: clinical trials, doctors' notes, patient therapies, pharmacists' notes, medical literature and, most importantly, structured analysis of the data sets in analytical models.

On the payer side, while insurance rates are skyrocketing, insurance companies are trying hard to vie for wallet share. However, you cannot ignore the strong influence of social media.

On the provider side, the small number of physicians and specialists available versus the growing need for them is becoming a larger problem. Additionally, obtaining second and third expert opinions for any situation to avoid medical malpractice lawsuits has created a need for sharing knowledge and seeking advice. At the same time, however, there are several laws being passed to protect patient privacy and data security.

On the therapy side, there are several smart machines capable of sending readings to multiple receivers, including doctors' mobile phones. We have become successful in reducing or eliminating latencies and have many treatment alternatives, but we do not know where best to apply them. Treatments that work well for some do not work well for others. We do not have statistics that can point to successful interventions, show which patients benefited from them, or predict how and where to apply them in a suggestion or recommendation to a physician.

There is a lot of data available, but not all of it is being harnessed into powerful information. Clearly, healthcare remains one of our nation's data-rich, yet information-poor industries. It is clear that we must start producing better information, at a faster rate and on a larger scale.

Before cost reductions and meaningful improvements in outcomes can be delivered, relevant information is necessary. The challenge is that while the data is available today, the systems to harness it have not been available.

Big Data and healthcare

Big Data is information that is both traditionally available (doctors' notes, clinical trials, insurance claims data, and drug information) and newly generated from social media, forums, and hosted sites (for example, WebMD), along with machine data. In healthcare, there are three characteristics of Big Data:

1. Volume: The data sizes are varied and range from megabytes to multiple terabytes

2. Velocity: Data from machines, doctors' notes, nurses' notes, and clinical trials is produced at different speeds and is highly unpredictable

3. Variety: The data is available or produced in a variety of formats but not all formats are based on similar standards

Over the past 5 years, there have been a number of technology innovations to handle Web 2.0-based data
