About this ebook

If you are a developer, data architect, or a data scientist looking for information on how to integrate the Big Data stack architecture and how to choose the correct technology in every layer, this book is what you are looking for.

Language: English
Release date: December 22, 2016
ISBN: 9781786468062
    Book preview

    Fast Data Processing Systems with SMACK Stack - Raúl Estrada

    Table of Contents

    Fast Data Processing Systems with SMACK Stack

    Credits

    About the Author

    About the Reviewers

    www.PacktPub.com

    Why subscribe?

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Downloading the color images of this book

    Errata

    Piracy

    Questions

    1. An Introduction to SMACK

    Modern data-processing challenges

    The data-processing pipeline architecture

    The NoETL manifesto

    Lambda architecture

    Hadoop

    SMACK technologies

    Apache Spark

    Akka

    Apache Cassandra

    Apache Kafka

    Apache Mesos

    Changing the data center operations

    From scale-up to scale-out

    The open-source predominance

    Data store diversification

    Data gravity and data locality

    DevOps rules

    Data expert profiles

    Data architects

    Data engineers

    Data analysts

    Data scientists

    Is SMACK for me?

    Summary

    2. The Model - Scala and Akka

    The language - Scala

    Kata 1 - The collections hierarchy

    Sequence

    Map

    Set

    Kata 2 - Choosing the right collection

    Sequence

    Map

    Set

    Kata 3 - Iterating with foreach

    Kata 4 - Iterating with for

    Kata 5 - Iterators

    Kata 6 - Transforming with map

    Kata 7 - Flattening

    Kata 8 - Filtering

    Kata 9 - Subsequences

    Kata 10 - Splitting

    Kata 11 - Extracting unique elements

    Kata 12 - Merging

    Kata 13 - Lazy views

    Kata 14 - Sorting

    Kata 15 - Streams

    Kata 16 - Arrays

    Kata 17 - ArrayBuffer

    Kata 18 - Queues

    Kata 19 - Stacks

    Kata 20 - Ranges

    The model - Akka

    The Actor Model in a nutshell

    Kata 21 - Actors

    The actor system

    Actor reference

    Kata 22 - Actor communication

    Kata 23 - Actor life cycle

    Kata 24 - Starting actors

    Kata 25 - Stopping actors

    Kata 26 - Killing actors

    Kata 27 - Shutting down the actor system

    Kata 28 - Actor monitoring

    Kata 29 - Looking up actors

    Summary

    3. The Engine - Apache Spark

    Spark in single mode

    Downloading Apache Spark

    Testing Apache Spark

    Spark core concepts

    Resilient distributed datasets

    Running Spark applications

    Initializing the Spark context

    Spark applications

    Running programs

    RDD operation

    Transformations

    Actions

    Persistence (caching)

    Spark in cluster mode

    Runtime architecture

    Driver

    Dividing a program into tasks

    Scheduling tasks on executors

    Executor

    Cluster manager

    Program execution

    Application deployment

    Standalone cluster manager

    Launching the standalone manager

    Submitting our application

    Configuring resources

    Working in the cluster

    Spark Streaming

    Spark Streaming architecture

    Transformations

    Stateless transformations

    Stateful transformations

    Windowed operations

    Update state by key

    Output operations

    Fault-tolerant Spark Streaming

    Checkpointing

    Spark Streaming performance

    Parallelism level

    Window size and batch size

    Garbage collector

    Summary

    4. The Storage - Apache Cassandra

    A bit of history

    NoSQL

    NoSQL or SQL?

    CAP Brewer's theorem

    Apache Cassandra installation

    Data model

    Data storage

    Installation

    DataStax OpsCenter

    Creating a key space

    Authentication and authorization (roles)

    Setting up a simple authentication and authorization

    Backup

    Compression

    Recovery

    Restart node

    Printing schema

    Logs

    Configuring log4j

    Log file rotation

    User activity log

    Transaction log

    SQL dump

    CQL

    CQL commands

    DBMS Cluster

    Deleting the database

    CLI delete commands

    CQL shell delete commands

    DB and DBMS optimization

    Bloom filter

    Data cache

    Java heap tune up

    Java garbage collection tune up

    Views, triggers, and stored procedures

    Client-server architecture

    Drivers

    Spark-Cassandra connector

    Installing the connector

    Establishing the connection

    Using the connector

    Summary

    5. The Broker - Apache Kafka

    Introducing Kafka

    Features of Apache Kafka

    Born to be fast data

    Use cases

    Installation

    Installing Java

    Installing Kafka

    Importing Kafka

    Cluster

    Single node - single broker cluster

    Starting Zookeeper

    Starting the broker

    Creating a topic

    Starting a producer

    Starting a consumer

    Single node - multiple broker cluster

    Starting the brokers

    Creating a topic

    Starting a producer

    Starting a consumer

    Multiple node - multiple broker cluster

    Broker properties

    Architecture

    Segment files

    Offset

    Leaders

    Groups

    Log compaction

    Kafka design

    Message compression

    Replication

    Asynchronous replication

    Synchronous replication

    Producers

    Producer API

    Scala producers

    Step 1: Import classes

    Step 2: Define properties

    Step 3: Build and send the message

    Step 4: Create the topic

    Step 5: Compile the producer

    Step 6: Run the producer

    Step 7: Run a consumer

    Producers with custom partitioning

    Step 1: Import classes

    Step 2: Define properties

    Step 3: Implement the partitioner class

    Step 4: Build and send the message

    Step 5: Create the topic

    Step 6: Compile the programs

    Step 7: Run the producer

    Step 8: Run a consumer

    Producer properties

    Consumers

    Consumer API

    Simple Scala consumers

    Step 1: Import classes

    Step 2: Define properties

    Step 3: Code the SimpleConsumer

    Step 4: Create the topic

    Step 5: Compile the program

    Step 6: Run the producer

    Step 7: Run the consumer

    Multithread Scala consumers

    Step 1: Import classes

    Step 2: Define properties

    Step 3: Code the MultiThreadConsumer

    Step 4: Create the topic

    Step 5: Compile the program

    Step 6: Run the producer

    Step 7: Run the consumer

    Consumer properties

    Integration

    Integration with Apache Spark

    Administration

    Cluster tools

    Adding servers

    Kafka topic tools

    Cluster mirroring

    Summary

    6. The Manager - Apache Mesos

    The Apache Mesos architecture

    Frameworks

    Existing Mesos frameworks

    Frameworks for long running applications

    Frameworks for scheduling

    Frameworks for storage

    Attributes and resources

    Attributes

    Resources

    The Apache Mesos API

    Messages

    The Executor API

    Executor Driver API

    The Scheduler API

    The Scheduler Driver API

    Resource allocation

    The DRF algorithm

    Weighted DRF algorithm

    Resource configuration

    Resource reservation

    Static reservation

    Defining roles

    Assigning frameworks to roles

    Setting policies

    Dynamic reservation

    The reserve operation

    The unreserve operation

    HTTP reserve

    HTTP unreserve

    Running a Mesos cluster on AWS

    AWS instance types

    AWS instances launching

    Installing Mesos on AWS

    Downloading Mesos

    Building Mesos

    Launching several instances

    Running a Mesos cluster on a private data center

    Mesos installation

    Setting up the environment

    Start the master

    Start the slaves

    Process automation

    Common Mesos issues

    Missing library dependencies

    Directory permissions

    Missing library

    Debugging

    Directory structure

    Slaves not connecting with masters

    Multiple slaves on the same machine

    Scheduling and management frameworks

    Marathon

    Marathon installation

    Installing Apache Zookeeper

    Running Marathon in local mode

    Multi-node Marathon installation

    Running a test application from the web UI

    Application scaling

    Terminating the application

    Chronos

    Chronos installation

    Job scheduling

    Chronos and Marathon

    Chronos REST API

    Listing running jobs

    Starting a job manually

    Adding a job

    Deleting a job

    Deleting all the job tasks

    Marathon REST API

    Listing the running applications

    Adding an application

    Changing the application configuration

    Deleting the application

    Apache Aurora

    Installing Aurora

    Singularity

    Singularity installation

    The Singularity configuration file

    Apache Spark on Apache Mesos

    Submitting jobs in client mode

    Submitting jobs in cluster mode

    Advanced configuration

    Apache Cassandra on Apache Mesos

    Advanced configuration

    Apache Kafka on Apache Mesos

    Kafka log management

    Summary

    7. Study Case 1 - Spark and Cassandra

    Spark Cassandra connector

    Requisites

    Preparing Cassandra

    SparkContext setup

    Cassandra and Spark Streaming

    Spark Streaming setup

    Cassandra setup

    Streaming context creation

    Stream creation

    Kafka Streams

    Akka Streams

    Enabling Cassandra

    Write the Stream to Cassandra

    Read the Stream from Cassandra

    Saving datasets to Cassandra

    Saving a collection of tuples to Cassandra

    Saving collections to Cassandra

    Modifying collections

    Saving objects to Cassandra (user-defined types)

    Scala options to Cassandra options conversion

    Saving RDDs as new tables

    Cluster deployment

    Spark Cassandra use cases

    Study case: The Calliope project

    Installing Calliope

    CQL3

    Read from Cassandra with CQL3

    Write to Cassandra with CQL3

    Thrift

    Read from Cassandra with Thrift

    Write to Cassandra with Thrift

    Calliope SQL context creation

    Calliope SQL Configuration

    Loading Cassandra tables programmatically

    Summary

    8. Study Case 2 - Connectors

    Akka and Cassandra

    Writing to Cassandra

    Reading from Cassandra

    Connecting to Cassandra

    Scanning tweets

    Testing the scanner

    Akka and Spark

    Kafka and Akka

    Kafka and Cassandra

    Summary

    9. Study Case 3 - Mesos and Docker

    Mesos frameworks API

    Authentication, authorization, and access control

    Framework authentication

    Authentication configuration

    Framework authorization

    Access control lists

    Spark Mesos run modes

    Coarse-grained

    Fine-grained

    Apache Mesos API

    Scheduler HTTP API

    Requests

    SUBSCRIBE

    TEARDOWN

    ACCEPT

    DECLINE

    REVIVE

    KILL

    SHUTDOWN

    ACKNOWLEDGE

    RECONCILE

    MESSAGE

    REQUEST

    Responses

    SUBSCRIBED

    OFFERS

    RESCIND

    UPDATE

    MESSAGE

    FAILURE

    ERROR

    HEARTBEAT

    Mesos containerizers

    Containers

    Docker containerizers

    Containers and containerizers

    Types of containerizers

    Creating containerizers

    Mesos containerizer

    Launching Mesos containerizer

    Architecture of Mesos containerizer

    Shared filesystem

    PID namespace

    Posix disk

    Docker containerizers

    Docker containerizer setup

    Launching the Docker containerizers

    Composing containerizers

    Summary

    Fast Data Processing Systems with SMACK Stack



    Copyright © 2016 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Production reference: 1151216

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham 

    B3 2PB, UK.

    ISBN 978-1-78646-720-1

    www.packtpub.com

    Credits

    About the Author

    Raúl Estrada has been a programmer since 1996 and a Java developer since 2001. He loves functional languages such as Scala, Elixir, Clojure, and Haskell, as well as all topics related to computer science. With more than 12 years of experience in high availability and enterprise software, he has designed and implemented architectures since 2003.

    He specializes in systems integration and has participated in projects mainly related to the financial sector. He has been an enterprise architect for BEA Systems and Oracle Inc., but he also enjoys mobile programming and game development. He considers himself a programmer before an architect, engineer, or developer.

    He is also a CrossFitter in the San Francisco Bay Area, now focused on open source projects related to data pipelining, such as Apache Flink, Apache Kafka, and Apache Beam. Raúl is a supporter of free software and enjoys experimenting with new technologies, frameworks, languages, and methods.

    I want to thank my family, especially my mom for her patience and dedication.

    I would like to thank Master Gerardo Borbolla and his family for the support and feedback they provided during the writing of this book.

    I want to say thanks to the acquisition editor, Divya Poojari, who believed in this project from the beginning.

    I also thank my editors Deepti Thore and Amrita Noronha. Without their effort and patience, it would not have been possible to write this book.

    And finally, I want to thank all the heroes who contribute (often anonymously and without pay) to the open source projects, specifically Spark, Mesos, Akka, Cassandra, and Kafka; an honorable mention goes to those who build the connectors between these technologies.

    About the Reviewers

    Anton Kirillov started his career as a Java developer in 2007, working at the same time on his PhD thesis in the semantic search domain. After finishing and defending his thesis, he switched to the Scala ecosystem and distributed systems development. He has worked for and consulted for startups focused on big data analytics in various domains (real-time bidding, telecom, B2B advertising, and social networks), where his main responsibilities were designing data platform architectures and validating their performance and stability. Besides helping startups, he has worked in the banking industry building Hadoop/Spark data analytics solutions and at a mobile games company, where he designed and implemented several reporting systems and a backend for a massively parallel online game.

    The main technologies Anton has been using in recent years include Scala, Hadoop, Spark, Mesos, Akka, Cassandra, and Kafka, and there are a number of systems he has built from scratch and successfully released using these technologies. Currently, Anton works as a Staff Engineer on the Ooyala Data Team, focusing on fault-tolerant, fast analytical solutions for the ad serving/reporting domain.

    Sumit Pal has more than 24 years of experience in the software industry, spanning companies from startups to enterprises. He is a big data architect and a visualization and data science consultant, and he builds end-to-end data-driven analytic systems. Sumit has worked for Microsoft (SQL Server), Oracle (OLAP), and Verizon (big data analytics). Currently, he works with multiple clients, building their data architectures and big data solutions using Spark, Scala, Java, and Python. He has extensive experience in building scalable systems, from the middle tier and data tier to visualization for analytics applications, using big data and NoSQL databases. Sumit has expertise in database internals, data warehouses, and dimensional modeling. As an Associate Director for Big Data at Verizon, Sumit strategized, managed, architected, and developed analytics platforms for machine learning applications. Sumit was the Chief Architect at ModelN/LeapfrogRX (2006-2013), where he architected the core analytics platform.

    Sumit recently authored a book with Apress called SQL on Big Data: Technology, Architecture and Roadmap. He regularly speaks on this topic at big data conferences across the USA.

    In October 2016, Sumit hiked to Mt. Everest Base Camp at 18.2K feet. He is also an avid badminton player and won a bronze medal in the men's singles category at the 2015 Connecticut Open in the USA.

    www.PacktPub.com

    For support files and downloads related to your book, please visit www.PacktPub.com.

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    https://www.packtpub.com/mapt

    Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

    Why subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print, and bookmark content

    On demand and accessible via a web browser

    Preface

    The SMACK stack is a generalized web-scale data pipeline. It was popularized in the San Francisco Bay Area data engineering meetups and conferences and has spread around the world. SMACK stands for:

    S = Spark: This is the in-memory distributed computing engine. Think of Apache Flink, Apache Ignite, Google MillWheel, and so on.

    M = Mesos: This is the cluster OS, providing distributed system management, scheduling, and scaling. Think of Apache YARN, Kubernetes, Docker, and so on.

    A = Akka: This is the API, an implementation of the actor model (see the minimal sketch after this list). Think of Scala, Erlang, Elixir, Go, and so on.

    C = Cassandra: This is the persistence layer, a NoSQL database. Think of Apache HBase, Riak, Google Bigtable, MongoDB, and so on.

    K = Kafka: This is a distributed streaming platform, the message broker. Think of Apache Storm, ActiveMQ, RabbitMQ, Kestrel, JMS, and so on.
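
    To make the actor model concrete, here is a minimal sketch using the classic akka-actor API; the Greeter actor and its message are illustrative examples, not code from the book:

    import akka.actor.{Actor, ActorSystem, Props}

    // A minimal actor: it reacts to every String message it receives
    class Greeter extends Actor {
      def receive: Receive = {
        case msg: String => println(s"Received: $msg")
      }
    }

    object GreeterApp extends App {
      // The actor system hosts and supervises all actors
      val system = ActorSystem("smack-demo")
      val greeter = system.actorOf(Props[Greeter], "greeter")
      greeter ! "Hello, SMACK" // fire-and-forget message send
      system.terminate()
    }

    The same fire-and-forget messaging style scales from this toy example to the distributed pipelines built throughout the book.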

    During the years 2014, 2015, and 2016, surveys showed that among all software developers, those with the highest wages were the data engineers, the data scientists, and the data architects. This is because there is a huge demand for technical data professionals and, unfortunately for large organizations and fortunately for developers, a very low supply.

    If you are reading this book, it is for one of two reasons: either you want to join the best-paid IT professionals, or you already belong to them and want to learn how today's trends will become requirements in the not too distant future.

    This book explains how to master the SMACK stack, which is also called Spark++, because it seems to be the open stack most likely to succeed in the near future.

    What this book covers

    Chapter 1, An Introduction to SMACK, speaks about the fundamental SMACK architecture. We review the differences between the SMACK technologies and traditional data technologies, walk through every technology in the stack, and briefly expose each tool's potential.

    Chapter 2, The Model - Scala and Akka, divides the text into two parts: Scala (the language) and Akka (the actor model implementation for the JVM). It is a mini Scala and Akka cookbook taught through several exercises (katas). The first half covers the fundamentals of Scala; the second half focuses on the Akka actor model.

    Chapter 3, The Engine - Apache Spark, describes how to download, test, and run Apache Spark in single and cluster modes, covers the core concepts of resilient distributed datasets (RDDs), transformations, actions, and persistence, and introduces Spark Streaming, fault tolerance, and performance tuning.

    Chapter 4, The Storage - Apache Cassandra, covers a bit of NoSQL history and the CAP theorem, Apache Cassandra installation and data modeling, backup and recovery, logging, CQL, database optimization, and the Spark-Cassandra connector.

    Chapter 5, The Broker - Apache Kafka, introduces Kafka and its use cases, walks through installation and single-node and multi-node cluster configurations, explains the Kafka architecture and design, shows step by step how to build Scala producers and consumers, and covers integration with Apache Spark and the cluster administration tools.

    Chapter 6, The Manager - Apache Mesos, presents the Apache Mesos architecture and frameworks, the Mesos API, resource allocation and reservation, running a Mesos cluster on AWS or on a private data center, the scheduling and management frameworks Marathon, Chronos, Aurora, and Singularity, and how to run Spark, Cassandra, and Kafka on Mesos.

    Chapter 7, Study Case 1 - Spark and Cassandra, shows how to connect Spark and Cassandra: setting up the Spark-Cassandra connector, combining Cassandra with Spark Streaming, saving datasets and collections to Cassandra, and the Calliope project.

    Chapter 8, Study Case 2 - Connectors, presents proven ways to connect the stack's technologies in pairs: Akka and Cassandra, Akka and Spark, Kafka and Akka, and Kafka and Cassandra.

    Chapter 9, Study Case 3 - Mesos and Docker, speaks about the Mesos frameworks API; authentication, authorization, and access control; the Spark run modes on Mesos; the scheduler HTTP API; and the Mesos and Docker containerizers.

    What you need for this book

    The reader should have some experience in programming (Java or Scala), some experience with Linux/Unix operating systems, and the basics of databases:

    For Scala, the reader should know the basics of programming

    For Spark, the reader should know the fundamentals of the Scala programming language

    For Mesos, the reader should know the basics of operating system administration

    For Cassandra, the reader should know the fundamentals of databases

    For Kafka, the reader should have basic knowledge of Scala

    Who this book is for

    This book is for software developers, data architects, and data engineers looking to integrate the most successful open source data stack architecture, to choose the correct technology in every layer, and to understand the practical benefits in every case.

    There are a lot of books that talk about each technology separately. This book is for people looking for alternative technologies and practical examples of how to connect the entire stack.

    Conventions

    In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

    Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: In the case of HDFS, we should change the mesos.hdfs.role in the file mesos-site.xml to the value of role1.

    A block of code is set as follows:

    [default]

    exten => s,1,Dial(Zap/1|30)

    exten => s,2,Voicemail(u100)

    exten => s,102,Voicemail(b100)

    exten => i,1,Voicemail(s0)

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    [default]

    exten => s,1,Dial(Zap/1|30)

    exten => s,2,Voicemail(u100)

     

    exten => s,102,Voicemail(b100)

     

    exten => i,1,Voicemail(s0)

    Any command-line input or output is written as follows:

    # cp /usr/src/asterisk-addons/configs/cdr_mysql.conf.sample

         /etc/asterisk/cdr_mysql.conf

    New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: Clicking the Next button moves you to the next screen.

    Note

    Warnings or important notes appear in a box like this.

    Tip

    Tips and tricks appear like this.

    Reader feedback

    Feedback from our readers is always welcome. Let us know what you think about this book-what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply e-mail feedback@packtpub.com, and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

    Customer support

    Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

    Downloading the example code

    You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

    You can download the code files by following these steps:

    Log in or register to our website using your e-mail address and password.

    Hover the mouse pointer on the SUPPORT tab at the top.

    Click on Code Downloads & Errata.

    Enter the name of the book in the Search box.

    Select the book for which you're looking to download the code files.

    Choose from the drop-down menu where you purchased this book from.

    Click on Code Download.

    Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

    WinRAR / 7-Zip for Windows

    Zipeg / iZip / UnRarX for Mac

    7-Zip / PeaZip for Linux

    The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Fast-Data-Processing-Systems-with-SMACK-Stack. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Downloading the color images of this book

    We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output.
