Hadoop 2.x Administration Cookbook
About this ebook

About This Book
  • Become an expert Hadoop administrator and perform tasks to optimize your Hadoop Cluster
  • Import and export data into Hive and use Oozie to manage workflows
  • Practical recipes will help you plan and secure your Hadoop cluster, and make it highly available
Who This Book Is For

If you are a system administrator with a basic understanding of Hadoop and you want to get into Hadoop administration, this book is for you. It's also ideal if you are a Hadoop administrator who wants a quick reference guide to all the Hadoop administration-related tasks and solutions to commonly occurring problems.

Language: English
Release date: May 26, 2017
ISBN: 9781787126879

    Book preview

    Hadoop 2.x Administration Cookbook - Gurmukh Singh

    Table of Contents

    Hadoop 2.x Administration Cookbook

    Credits

    About the Author

    About the Reviewers

    www.PacktPub.com

    eBooks, discount offers, and more

    Why subscribe?

    Customer Feedback

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Sections

    Getting ready

    How to do it…

    How it works…

    There's more…

    See also

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Downloading the color images of this book

    Errata

    Piracy

    Questions

    1. Hadoop Architecture and Deployment

    Introduction

    Overview of Hadoop Architecture

    Building and compiling Hadoop

    Getting ready

    How to do it...

    How it works...

    Installation methods

    Getting ready

    How to do it...

    How it works...

    Setting up host resolution

    Getting ready

    How to do it...

    How it works...

    Installing a single-node cluster - HDFS components

    Getting ready

    How to do it...

    How it works...

    There's more...

    Setting up ResourceManager and NodeManager

    Installing a single-node cluster - YARN components

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Installing a multi-node cluster

    Getting ready

    How to do it...

    How it works...

    Configuring the Hadoop Gateway node

    Getting ready

    How to do it...

    How it works...

    See also

    Decommissioning nodes

    Getting ready

    How to do it...

    How it works...

    See also

    Adding nodes to the cluster

    Getting ready

    How to do it...

    How it works...

    There's more...

    2. Maintaining Hadoop Cluster – HDFS

    Introduction

    Overview of HDFS

    Configuring HDFS block size

    Getting ready

    How to do it...

    How it works...

    Setting up Namenode metadata location

    Getting ready

    How to do it...

    How it works...

    Loading data in HDFS

    Getting ready

    How to do it...

    How it works...

    Configuring HDFS replication

    Getting ready

    How to do it...

    How it works...

    See also

    HDFS balancer

    Getting ready

    How to do it...

    How it works...

    Quota configuration

    Getting ready

    How to do it...

    How it works...

    HDFS health and FSCK

    Getting ready

    How to do it...

    How it works...

    See also

    Configuring rack awareness

    Getting ready

    How to do it...

    How it works...

    See also

    Recycle or trash bin configuration

    Getting ready

    How to do it...

    How it works...

    There's more...

    Distcp usage

    Getting ready

    How to do it...

    How it works...

    Control block report storm

    Getting ready

    How to do it...

    How it works...

    Configuring Datanode heartbeat

    Getting ready

    How to do it...

    How it works...

    3. Maintaining Hadoop Cluster – YARN and MapReduce

    Introduction

    Running a simple MapReduce program

    Getting ready

    How to do it...

    Hadoop streaming

    Getting ready

    How to do it...

    How it works...

    Configuring YARN history server

    Getting ready

    How to do it...

    How it works...

    There's more...

    Job history web interface and metrics

    Getting ready

    How to do it...

    How it works...

    Configuring ResourceManager components

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    YARN containers and resource allocations

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    ResourceManager Web UI and JMX metrics

    Getting ready

    How to do it...

    How it works...

    Preserving ResourceManager states

    Getting ready

    How to do it...

    How it works...

    There's more...

    4. High Availability

    Introduction

    Namenode HA using shared storage

    Getting ready

    How to do it...

    How it works...

    See also

    ZooKeeper configuration

    Getting ready

    How to do it...

    How it works...

    Namenode HA using Journal node

    Getting ready

    How to do it...

    How it works...

    Resourcemanager HA using ZooKeeper

    Getting ready

    How to do it...

    How it works…

    Rolling upgrade with HA

    Getting ready

    How to do it...

    How it works...

    Configure shared cache manager

    Getting ready

    How to do it...

    There's more...

    See also

    Configure HDFS cache

    Getting ready

    How to do it...

    How it works...

    See also

    HDFS snapshots

    Getting ready

    How to do it...

    How it works...

    Configuring storage based policies

    Getting ready

    How to do it...

    How it works...

    Configuring HA for Edge nodes

    Getting ready

    How to do it...

    How it works...

    5. Schedulers

    Introduction

    Configuring users and groups

    Getting ready

    How to do it...

    How it works...

    See also

    Fair Scheduler configuration

    Getting ready

    How to do it...

    How it works...

    Fair Scheduler pools

    Getting ready

    How to do it...

    How it works...

    Configuring job queues

    Getting ready

    How to do it...

    How it works...

    See also

    Job queue ACLs

    Getting ready

    How to do it...

    How it works...

    See also

    Configuring Capacity Scheduler

    Getting ready

    How to do it...

    How it works...

    See also

    Queuing mappings in Capacity Scheduler

    Getting ready

    How to do it...

    How it works...

    YARN and Mapred commands

    Getting ready

    How to do it...

    How it works...

    YARN label-based scheduling

    Getting ready

    How to do it...

    How it works...

    YARN SLS

    Getting ready

    How to do it...

    How it works...

    6. Backup and Recovery

    Introduction

    Initiating Namenode saveNamespace

    Getting ready

    How to do it...

    How it works...

    Using HDFS Image Viewer

    Getting ready

    How to do it...

    How it works...

    Fetching parameters which are in-effect

    Getting ready

    How to do it...

    How it works...

    Configuring HDFS and YARN logs

    Getting ready

    How to do it...

    How it works...

    See also

    Backing up and recovering Namenode

    Getting ready

    How to do it...

    How it works...

    See also

    Configuring Secondary Namenode

    Getting ready

    How to do it...

    How it works…

    Promoting Secondary Namenode to Primary

    Getting ready

    How to do it...

    How it works...

    See also

    Namenode recovery

    Getting ready

    How to do it...

    How it works...

    Namenode roll edits – online mode

    Getting ready

    How to do it...

    How it works...

    Namenode roll edits – offline mode

    Getting ready

    How to do it...

    How it works...

    Datanode recovery – disk full

    Getting ready

    How to do it...

    How it works...

    Configuring NFS gateway to serve HDFS

    Getting ready

    How to do it...

    How it works...

    Recovering deleted files

    Getting ready

    How to do it...

    How it works...

    7. Data Ingestion and Workflow

    Introduction

    Hive server modes and setup

    Getting ready

    How to do it...

    How it works...

    Using MySQL for Hive metastore

    How to do it…

    How it works...

    Operating Hive with ZooKeeper

    Getting ready

    How to do it...

    How it works...

    Loading data into Hive

    Getting ready

    How to do it...

    How it works...

    See also

    Partitioning and Bucketing in Hive

    Getting ready

    How to do it...

    How it works...

    See also

    Hive metastore database

    Getting ready

    How to do it...

    How it works...

    See also

    Designing Hive with credential store

    Getting ready

    How to do it...

    How it works...

    Configuring Flume

    Getting ready

    How to do it...

    How it works...

    Configure Oozie and workflows

    Getting ready

    How to do it...

    How it works...

    8. Performance Tuning

    Tuning the operating system

    Getting ready

    How to do it...

    How it works...

    See also

    Tuning the disk

    Getting ready

    How to do it...

    How it works...

    Tuning the network

    Getting ready

    How to do it...

    How it works...

    Tuning HDFS

    Getting ready

    How to do it...

    How it works...

    Tuning Namenode

    Getting ready

    How to do it...

    There's more...

    See also

    Tuning Datanode

    Getting ready

    How to do it...

    How it works...

    See also

    Configuring YARN for performance

    Getting ready

    How to do it...

    How it works...

    Configuring MapReduce for performance

    Getting ready

    How to do it...

    How it works...

    Hive performance tuning

    Getting ready

    How to do it...

    There's more...

    How it works...

    Benchmarking Hadoop cluster

    Getting ready

    How to do it...

    Benchmark 1 – Testing HDFS with TestDFSIO

    Benchmark 2 – Stress testing Namenode

    Benchmark 3 – MapReduce testing by generating small files

    Benchmark 4 – TeraGen, TeraSort, and TeraValidate benchmarks

    There's more...

    How it works...

    9. HBase Administration

    Introduction

    Setting up single node HBase cluster

    Getting ready

    How to do it...

    How it works...

    Setting up multi-node HBase cluster

    Getting ready

    How to do it...

    How it works...

    Inserting data into HBase

    Getting ready

    How to do it...

    How it works...

    Integration with Hive

    Getting ready

    How to do it...

    How it works...

    See also

    HBase administration commands

    Getting ready

    How to do it...

    How it works...

    See also

    HBase backup and restore

    Getting ready

    How to do it...

    How it works...

    Tuning HBase

    Getting ready

    How to do it...

    How it works...

    HBase upgrade

    Getting ready

    How to do it...

    How it works...

    Migrating data from MySQL to HBase using Sqoop

    Getting ready

    How to do it...

    10. Cluster Planning

    Introduction

    Disk space calculations

    Getting ready

    How to do it...

    How it works...

    Nodes needed in the cluster

    Getting ready

    How to do it...

    How it works...

    See also

    Memory requirements

    Getting ready

    How to do it...

    How it works...

    See also

    Sizing the cluster as per SLA

    Getting ready

    How to do it...

    How it works...

    See also

    Network design

    Getting ready

    How to do it...

    How it works...

    Estimating the cost of the Hadoop cluster

    How to do it...

    How it works...

    Hardware and software options

    How it works...

    11. Troubleshooting, Diagnostics, and Best Practices

    Introduction

    Namenode troubleshooting

    Getting ready

    How to do it...

    How it works...

    See also

    Datanode troubleshooting

    Getting ready

    How to do it...

    How it works...

    See also

    Resourcemanager troubleshooting

    Getting ready

    How to do it…

    How it works...

    See also

    Diagnose communication issues

    Getting ready

    How to do it...

    How it works...

    Parse logs for errors

    Getting ready

    How to do it...

    How it works...

    Hive troubleshooting

    Getting ready

    How to do it...

    How it works...

    See also

    HBase troubleshooting

    Getting ready

    How to do it...

    How it works...

    Hadoop best practices

    How it works...

    12. Security

    Introduction

    Encrypting disk using LUKS

    Getting ready

    How to do it...

    How it works...

    See also

    Configuring Hadoop users

    Getting ready

    How to do it...

    How it works...

    HDFS encryption at Rest

    Getting ready

    How to do it...

    How it works...

    Configuring SSL in Hadoop

    Getting ready

    How to do it...

    How it works...

    See also

    In-transit encryption

    Getting ready

    How to do it...

    There's more...

    See also

    Enabling service level authorization

    Getting ready

    How to do it...

    How it works...

    See also

    Securing ZooKeeper

    Getting ready

    How to do it...

    How it works...

    Configuring auditing

    Getting ready

    How to do it...

    How it works...

    Configuring Kerberos server

    Getting ready

    How to do it...

    How it works...

    Configuring and enabling Kerberos for Hadoop

    Getting ready

    How to do it...

    How it works...

    Index

    Hadoop 2.x Administration Cookbook

    Copyright © 2017 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: May 2017

    Production reference: 1220517

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78712-673-2

    www.packtpub.com

    Credits

    Author

    Gurmukh Singh

    Reviewers

    Rajiv Tiwari

    Wissem EL Khlifi

    Commissioning Editor

    Amey Varangaonkar

    Acquisition Editor

    Varsha Shetty

    Content Development Editor

    Deepti Thore

    Technical Editor

    Nilesh Sawakhande

    Copy Editors

    Laxmi Subramanian

    Safis Editing

    Project Coordinator

    Shweta H Birwatkar

    Proofreader

    Safis Editing

    Indexer

    Francy Puthiry

    Graphics

    Tania Dutta

    Production Coordinator

    Nilesh Mohite

    Cover Work

    Nilesh Mohite

    About the Author

    Gurmukh Singh is a seasoned technology professional with 14+ years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in the big data domain for the last five years and provides consultancy and training on various technologies.

    He has worked with companies such as HP, JP Morgan, and Yahoo.

    He has authored Monitoring Hadoop, published by Packt Publishing (https://www.packtpub.com/big-data-and-business-intelligence/monitoring-hadoop).

    I would like to thank my wife, Navdeep Kaur, and my lovely daughter, Amanat Dhillon, who have always supported me throughout the journey of this book.

    About the Reviewers

    Rajiv Tiwari is a freelance big data and cloud architect with over 17 years of experience across big data, analytics, and cloud computing for banks and other financial organizations. He is an electronics engineering graduate from IIT Varanasi, and has been working in England for the past 13 years, mostly in the financial city of London. Rajiv can be contacted on Twitter at @bigdataoncloud.

    He is the author of the book Hadoop for Finance, a book dedicated to using Hadoop in banking and financial services.

    I would like to thank my wife, Seema, and my son, Rivaan, for allowing me to spend their quota of time on reviewing this book.

    Wissem El Khlifi is the first Oracle ACE in Spain and an Oracle Certified Professional DBA with over 12 years of IT experience.

    He earned a Computer Science Engineer degree from FST Tunisia, a Master's in Computer Science from UPC Barcelona, and a Master's in Big Data Science from UPC Barcelona.

    His areas of interest include Cloud Architecture, Big Data Architecture, and Big Data Management and Analysis.

    His career has included the roles of Java analyst/programmer, Oracle Senior DBA, and big data scientist. He currently works as a Senior Big Data and Cloud Architect for Schneider Electric / APC.

    He writes numerous articles on his website, http://www.oracle-class.com, and is available on Twitter at @orawiss.

    www.PacktPub.com

    eBooks, discount offers, and more

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

    https://www.packtpub.com/mapt

    Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

    Why subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print, and bookmark content

    On demand and accessible via a web browser

    Customer Feedback

    Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1787126730.

    If you'd like to join our team of regular reviewers, you can e-mail us at customerreviews@packtpub.com. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

    Preface

    Hadoop is a distributed system with a large ecosystem that is growing at an exponential rate, so it is important to get a grip on things and take a deep dive into the workings of a Hadoop cluster in production. Whether you are new to Hadoop or a seasoned Hadoop specialist, this recipe book contains recipes to deep dive into Hadoop cluster configuration and optimization.

    What this book covers

    Chapter 1, Hadoop Architecture and Deployment, covers Hadoop's architecture, its components, the various installation modes, and the important daemons and services that make Hadoop a robust system. It walks through both single-node and multi-node clusters.
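
    As a quick taste of the recipes in that chapter, here is a minimal sketch of bringing up HDFS on a single node; it assumes Hadoop is already installed and its bin and sbin directories are on the PATH:

    # Format the Namenode metadata directory (first run only)
    $ hdfs namenode -format
    # Start the HDFS daemons and confirm they are running
    $ start-dfs.sh
    $ jps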

    Chapter 2, Maintaining Hadoop Cluster – HDFS, covers the storage layer, HDFS: block size, replication, cluster health, quota configuration, rack awareness, and the communication channel between nodes.
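
    For illustration only, a few of the HDFS maintenance commands of the kind used there, sketched with placeholder paths and values:

    # Change the replication factor of a path and wait for it to take effect
    $ hdfs dfs -setrep -w 2 /user/hadoop/data
    # Write a file with a non-default block size (value in bytes; 256 MB here)
    $ hdfs dfs -D dfs.blocksize=268435456 -put largefile /user/hadoop/data/
    # Check the overall health of the filesystem
    $ hdfs fsck /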

    Chapter 3, Maintaining Hadoop Cluster – YARN and MapReduce, talks about the processing layer in Hadoop and the resource management framework, YARN. This chapter covers YARN fundamentals and shows how to configure YARN components, submit jobs, and set up the job history server.
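
    As a hedged example of the kind of job submission covered there, the stock wordcount example that ships with Hadoop can be run as follows (the jar version differs per installation, and the input/output paths are placeholders):

    $ yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /input /output
    # List the applications known to the ResourceManager
    $ yarn application -list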

    Chapter 4, High Availability, covers high availability for the Namenode and ResourceManager, ZooKeeper configuration, HDFS storage-based policies, HDFS snapshots, and rolling upgrades.
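
    A brief sketch of checking and switching Namenode HA state, assuming Namenode IDs nn1 and nn2 are configured in hdfs-site.xml:

    # Query the state of each Namenode in the HA pair
    $ hdfs haadmin -getServiceState nn1
    $ hdfs haadmin -getServiceState nn2
    # Trigger a manual failover from nn1 to nn2
    $ hdfs haadmin -failover nn1 nn2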

    Chapter 5, Schedulers, talks about YARN schedulers such as the Fair and Capacity Schedulers, with detailed recipes on configuring queues, queue ACLs, users and groups, and other queue administration commands.
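
    For example, after editing the scheduler configuration, the queue definitions can be reloaded without restarting the ResourceManager; a minimal sketch:

    # Reload queue definitions from capacity-scheduler.xml or fair-scheduler.xml
    $ yarn rmadmin -refreshQueues
    # Inspect the configured queues and their scheduling information
    $ mapred queue -list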

    Chapter 6, Backup and Recovery, covers the Hadoop metastore, backup and restore procedures for a Namenode, configuration of a Secondary Namenode, and various ways of recovering a lost Namenode. This chapter also talks about configuring HDFS and YARN logs for troubleshooting.
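
    A minimal sketch of a manual Namenode metadata checkpoint and backup, with a placeholder backup path:

    # Enter safe mode, save the namespace, and leave safe mode
    $ hdfs dfsadmin -safemode enter
    $ hdfs dfsadmin -saveNamespace
    $ hdfs dfsadmin -safemode leave
    # Pull a copy of the latest fsimage to a backup location
    $ hdfs dfsadmin -fetchImage /backup/namenode/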

    Chapter 7, Data Ingestion and Workflow, talks about Hive configuration and its various modes of operation. This chapter also covers setting up Hive with a credential store and highly available access using ZooKeeper. The recipes detail loading data into Hive, partitioning and bucketing concepts, and configuration with an external metastore, and also cover Oozie installation and Flume configuration for log ingestion.
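
    As an illustrative sketch only, with the table, file, and agent names as placeholders, loading data into Hive and starting a Flume agent look like this:

    # Load a file already in HDFS into an existing Hive table
    $ hive -e "LOAD DATA INPATH '/data/raw/events.csv' INTO TABLE events;"
    # Start a Flume agent from a local configuration file
    $ flume-ng agent -n agent1 -c conf -f /etc/flume/conf/agent1.conf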

    Chapter 8, Performance Tuning, covers the performance tuning aspects of HDFS, YARN containers, the operating system, and network parameters, as well as optimizing the cluster for production by comparing benchmarks for various configurations.
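
    For instance, the TestDFSIO write benchmark discussed there can be launched as follows (the jar version differs per installation):

    # Write ten 1 GB files to HDFS and record the throughput
    $ yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1GB
    # Remove the benchmark data afterwards
    $ yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -clean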

    Chapter 9, HBase Administration, talks about HBase cluster configuration, best practices, HBase tuning, backup, and restore. It also covers migration of data from MySQL to HBase and the procedure for upgrading HBase to the latest release.
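
    A hedged sketch of the MySQL-to-HBase migration with Sqoop; the connection string, table, column family, and row key are placeholders:

    $ sqoop import --connect jdbc:mysql://dbhost/salesdb --username dbuser -P \
          --table customers --hbase-table customers --column-family cf --hbase-row-key id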

    Chapter 10, Cluster Planning, covers Hadoop cluster planning and best practices for designing clusters in terms of disk storage, network, servers, and placement policy. This chapter also covers costing and the impact of SLA-driven workloads on cluster planning.
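
    As a rough worked example with assumed numbers: 100 TB of data at a replication factor of 3, keeping about 25% of raw capacity free for intermediate data and growth, needs roughly 400 TB of raw disk across the cluster:

    $ echo $(( 100 * 3 * 100 / 75 ))    # 400 (TB of raw capacity)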

    Chapter 11, Troubleshooting, Diagnostics, and Best Practices, talks about troubleshooting steps for the Namenode and Datanode and about diagnosing communication errors. It also covers the daemon logs and how to parse them for errors to extract the key information about the issues faced.
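
    For example, a first pass over the daemon logs for errors can be as simple as the following; the log directory varies by installation:

    $ grep -iE "error|exception" /var/log/hadoop/hdfs/*.log | tail -n 20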

    Chapter 12, Security, covers Hadoop security in terms of data encryption at rest, in-transit encryption, SSL configuration, and, most importantly, configuring Kerberos for the Hadoop cluster. This chapter also covers auditing and a recipe on securing ZooKeeper.
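
    A minimal sketch of the encryption-at-rest workflow on a Kerberized cluster; the keytab, principal, key name, and zone path are placeholders:

    # Obtain a Kerberos ticket for the hdfs service principal
    $ kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/node1.cluster.com@EXAMPLE.COM
    # Create a key in the KMS and an encryption zone backed by it
    $ hadoop key create projectkey
    $ hdfs crypto -createZone -keyName projectkey -path /secure/project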

    What you need for this book

    To go through the recipes in this book, users need a Linux distribution, which could be Ubuntu, CentOS, or any other flavor, as long as it supports running a JVM. We use CentOS in our recipes, as it is the most commonly used operating system for Hadoop clusters.

    Hadoop runs on both virtualized and physical servers, so it is recommended to have at least 8 GB of RAM for the base system, on which about three virtual hosts can be set up. Users do not need to set up everything covered in this book at once; they can run only the daemons needed for a particular recipe and keep the resource requirements to a bare minimum. It is good to have at least four hosts, virtual or physical, to practice all the recipes in this book.

    In terms of software, users need at least JDK 1.7 and an SSH client, such as PuTTY on Windows or Terminal on Linux or macOS, to connect to the Hadoop nodes.
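
    A couple of quick checks before starting, with a placeholder hostname:

    # Confirm the JDK version and passwordless SSH to a cluster node
    $ java -version
    $ ssh hadoop@node1.cluster.com hostname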

    Who this book is for

    If you are a system administrator with a basic understanding of Hadoop and you want to get into Hadoop administration, this book is for you. It's also ideal if you are a Hadoop administrator who wants a quick reference guide to all the Hadoop administration-related tasks and solutions to commonly occurring problems.

    Sections

    In this book, you will find several headings that appear frequently (Getting ready, How to do it, How it works, There's more, and See also).

    To give clear instructions on how to complete a recipe, we use these sections as follows:

    Getting ready

    This section tells you what to expect in the recipe, and describes how to set up any software or any preliminary settings required for the recipe.

    How to do it…

    This section contains the steps required to follow the recipe.

    How it works…

    This section usually consists of a detailed explanation of what happened in the previous section.

    There's more…

    This section consists of additional information about the recipe, intended to make the reader more knowledgeable about it.

    See also

    This section provides helpful links to other useful information for the recipe.

    Conventions

    In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

    Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "You will see a tarball under the hadoop-2.7.3-src/hadoop-dist/target/ folder."

    A block of code is set as follows:

    <property>
        <name>dfs.hosts.exclude</name>
        <value>/home/hadoop/excludes</value>
        <final>true</final>
    </property>

    Any command-line input or output is written as follows:

    $ stop-yarn.sh

    Note

    Warnings or important notes appear in a box like this.

    Tip

    Tips and tricks appear like this.

    Reader feedback

    Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

    To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book's title in the subject of your message.

    If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

    Customer support

    Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
