Next-Generation Big Data: A Practical Guide to Apache Kudu, Impala, and Spark
By Butch Quinto
About this ebook
Utilize this practical and easy-to-follow guide to modernize traditional enterprise data warehouse and business intelligence environments with next-generation big data technologies.
Next-Generation Big Data takes a holistic approach, covering the most important aspects of modern enterprise big data. The book covers not only the main technology stack but also the next-generation tools and applications used for big data warehousing, data warehouse optimization, real-time and batch data ingestion and processing, real-time data visualization, big data governance, data wrangling, big data cloud deployments, and distributed in-memory big data computing. Finally, the book provides extensive and detailed coverage of big data case studies from Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard.
What You’ll Learn
- Install Apache Kudu, Impala, and Spark to modernize enterprise data warehouse and business intelligence environments, complete with real-world, easy-to-follow examples and practical advice
- Integrate HBase, Solr, Oracle, SQL Server, MySQL, Flume, Kafka, HDFS, and Amazon S3 with Apache Kudu, Impala, and Spark
- Use StreamSets, Talend, Pentaho, and CDAP for real-time and batch data ingestion and processing
- Utilize Trifacta, Alteryx, and Datameer for data wrangling and interactive data processing
- Turbocharge Spark with Alluxio, a distributed in-memory storage platform
- Deploy big data in the cloud using Cloudera Director
- Perform real-time data visualization and time series analysis using Zoomdata, Apache Kudu, Impala, and Spark
- Understand enterprise big data topics such as big data governance, metadata management, data lineage, impact analysis, and policy enforcement, and how to use Cloudera Navigator to perform common data governance tasks
- Implement big data use cases such as big data warehousing, data warehouse optimization, Internet of Things, real-time data ingestion and analytics, complex event processing, and scalable predictive modeling
- Study real-world big data case studies from innovative companies, including Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard
Who This Book Is For
BI and big data warehouse professionals interested in gaining practical, real-world insight into next-generation big data processing and analytics using Apache Kudu, Impala, and Spark, as well as those who want to learn more about other advanced enterprise topics