Ebook, 945 pages, 17 hours

Pentaho Data Integration 4 Cookbook

Name: Pentaho Data Integration 4 Cookbook
Author: Adrián Sergio Pulvirenti
ISBN: 9781849515252

By Adrián Sergio Pulvirenti and María Carina Roldán

About this ebook

This book presents step-by-step recipes for solving data manipulation problems with Pentaho Data Integration (PDI). It includes well-organized tips, screenshots, tables, and examples to aid quick and easy understanding. It is aimed at software developers and anyone involved or interested in developing ETL solutions or, more generally, in doing any kind of data manipulation. It does not cover PDI basics, SQL basics, or database concepts; you are expected to have a basic understanding of the PDI tool, the SQL language, and databases.

Information Technology

Language: English

Publisher: Packt Publishing

Release date: Jun 23, 2011

ISBN: 9781849515252

Author

Adrián Sergio Pulvirenti

Related categories

Skip carousel

Reviews for Pentaho Data Integration 4 Cookbook

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Pentaho Data Integration 4 Cookbook - Adrián Sergio Pulvirenti

Pentaho Data Integration 4 Cookbook

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Working with Databases

Introduction

Sample databases

Pentaho BI platform databases

Connecting to a database

Getting ready

How to do it...

How it works...

There's more...

Avoiding creating the same database connection over and over again

Avoiding modifying jobs and transformations every time a connection changes

Specifying advanced connection properties

Connecting to a database not supported by Kettle

Checking the database connection at run-time

Getting data from a database

Getting ready

How to do it...

How it works...

There's more...

See also

Getting data from a database by providing parameters

Getting ready

How to do it...

How it works...

There's more...

Parameters coming in more than one row

Executing the SELECT statement several times, each for a different set of parameters

See also

Getting data from a database by running a query built at runtime

Getting ready

How to do it...

How it works...

There's more...

See also

Inserting or updating rows in a table

Getting ready

How to do it...

How it works...

There's more...

Alternative solution if you just want to insert records

Alternative solution if you just want to update rows

Alternative way for inserting and updating

See also

Inserting new rows where a simple primary key has to be generated

Getting ready

How to do it...

How it works...

There's more...

Using the Combination lookup/update for looking up

See also

Inserting new rows where the primary key has to be generated based on stored values

Getting ready

How to do it...

How it works...

There's more...

See also

Deleting data from a table

Getting ready

How to do it...

How it works...

See also

Creating or altering a database table from PDI (design time)

Getting ready

How to do it...

How it works...

There's more...

See also

Creating or altering a database table from PDI (runtime)

How to do it...

How it works...

There's more...

See also

Inserting, deleting, or updating a table depending on a field

Getting ready

How to do it...

How it works...

There's more...

Insert, update, and delete all-in-one

Synchronizing after merge

See also

Changing the database connection at runtime

Getting ready

How to do it...

How it works...

There's more...

See also

Loading a parent-child table

Getting ready

How to do it...

How it works...

See also

2. Reading and Writing Files

Introduction

Reading a simple file

Getting ready

How to do it...

How it works...

There's more...

Alternative notation for a separator

About file format and encoding

About data types and formats

Altering the names, order, or metadata of the fields coming from the file

Reading files with fixed width fields

Reading several files at the same time

Getting ready

How to do it...

How it works...

There's more...

Reading unstructured files

Getting ready

How to do it...

How it works...

There's more...

Master/detail files

Log files

Reading files having one field by row

Getting ready

How to do it...

How it works...

There's more...

See also

Reading files with some fields occupying two or more rows

Getting ready

How to do it...

How it works...

See also

Writing a simple file

Getting ready

How to do it...

How it works...

There's more...

Changing headers

Giving the output fields a format

Writing an unstructured file

Getting ready

How to do it...

How it works...

There's more...

Providing the name of a file (for reading or writing) dynamically

Getting ready

How to do it...

How it works...

There's more...

Get System Info

Generating several files simultaneously with the same structure, but different names

Using the name of a file (or part of it) as a field

Getting ready

How to do it...

How it works...

Reading an Excel file

Getting ready

How to do it...

How it works...

See also

Getting the value of specific cells in an Excel file

Getting ready

How to do it...

How it works...

There's more...

Labels and values horizontally arranged

Looking for a given cell

Writing an Excel file with several sheets

Getting ready

How to do it...

How it works...

There's more...

See also

Writing an Excel file with a dynamic number of sheets

Getting ready

How to do it...

How it works...

See also

3. Manipulating XML Structures

Introduction

Reading simple XML files

Getting ready

How to do it...

How it works...

There's more...

XML data in a field

XML file name in a field

ECMAScript for XML

See also

Specifying fields by using XPath notation

Getting ready

How to do it...

How it works...

There's more...

Getting data from a different path

Getting data selectively

Getting more than one node when the nodes share their XPath notation

Saving time when specifying XPath

Validating well-formed XML files

Getting ready

How to do it...

How it works...

See also

Validating an XML file against DTD definitions

Getting ready

How to do it...

How it works...

There's more...

See also

Validating an XML file against an XSD schema

Getting ready

How to do it...

How it works...

There's more...

See also

Generating a simple XML document

Getting ready

How to do it...

How it works...

There's more...

Generating fields with XML structures

See also

Generating complex XML structures

Getting ready

How to do it...

How it works...

See also

Generating an HTML page using XML and XSL transformations

Getting ready

How to do it...

How it works...

There's more...

See also

4. File Management

Introduction

Copying or moving one or more files

Getting ready

How to do it...

How it works...

There's more...

Moving files

Detecting the existence of the files before copying them

Creating folders

See also

Deleting one or more files

Getting ready

How to do it...

How it works...

There's more...

Figuring out which files have been deleted

See also

Getting files from a remote server

Getting ready

How to do it...

How it works...

There's more...

Specifying files to transfer

Some considerations about connecting to an FTP server

Access via SFTP

Access via FTPS

Getting information about the files being transferred

See also

Putting files on a remote server

Getting ready

How to do it...

How it works...

There's more...

See also

Copying or moving a custom list of files

Getting ready

How to do it...

How it works...

See also

Deleting a custom list of files

Getting ready

How to do it...

How it works...

See also

Comparing files and folders

Getting ready

How to do it...

How it works...

There's more...

Comparing folders

Working with ZIP files

Getting ready

How to do it...

How it works...

There's more...

Avoiding zipping files

Avoiding unzipping files

See also

5. Looking for Data

Introduction

Looking for values in a database table

Getting ready

How to do it...

How it works...

There's more...

Taking some action when the lookup fails

Taking some action when there are too many results

Looking for non-existent data

See also

Looking for values in a database (with complex conditions or multiple tables involved)

Getting ready

How to do it...

How it works...

There's more...

See also

Looking for values in a database with extreme flexibility

Getting ready

How to do it...

How it works...

There's more...

See also

Looking for values in a variety of sources

Getting ready

How to do it...

How it works...

There's more...

Looking for alternatives when the Stream Lookup step doesn't meet your needs

Speeding up your transformation

Using the Value Mapper step for looking up from a short list of values

See also

Looking for values by proximity

Getting ready

How to do it...

How it works...

There's more...

Looking for values consuming a web service

Getting ready

How to do it...

How it works...

There's more...

See also

Looking for values over an intranet or Internet

Getting ready

How to do it...

How it works...

There's more...

See also

6. Understanding Data Flows

Introduction

Splitting a stream into two or more streams based on a condition

Getting ready

How to do it...

How it works...

There's more...

Avoiding the use of Dummy steps

Comparing against the value of a Kettle variable

Avoiding the use of nested Filter Rows steps

Overcoming the difficulties of complex conditions

Merging rows of two streams with the same or different structures

Getting ready

How to do it...

How it works...

There's more...

Making sure that the metadata of the streams is the same

Telling Kettle how to merge the rows of your streams

See also

Comparing two streams and generating differences

Getting ready

How to do it...

How it works...

There's more...

Using the differences to keep a table up to date

See also

Generating all possible pairs formed from two datasets

How to do it...

How it works...

There's more...

Getting variables in the middle of the stream

Limiting the number of output rows

See also

Joining two or more streams based on given conditions

Getting ready

How to do it...

How it works...

There's more...

See also

Interspersing new rows between existent rows

Getting ready

How to do it...

How it works...

See also

Executing steps even when your stream is empty

Getting ready

How to do it...

How it works...

There's more...

Processing rows differently based on the row number

Getting ready

How to do it...

How it works...

There's more...

Identifying specific rows

Identifying the last row in the stream

Avoiding using an Add sequence step to enumerate the rows

See also

7. Executing and Reusing Jobs and Transformations

Introduction

Sample transformations

Sample transformation: Hello

Sample transformation: Random list

Sample transformation: Sequence

Sample transformation: File list

Launching jobs and transformations

Executing a job or a transformation by setting static arguments and parameters

Getting ready

How to do it...

How it works...

There's more...

See also

Executing a job or a transformation from a job by setting arguments and parameters dynamically

Getting ready

How to do it...

How it works...

There's more...

See also

Executing a job or a transformation whose name is determined at runtime

Getting ready

How to do it...

How it works...

There's more...

See also

Executing part of a job once for every row in a dataset

Getting ready

How to do it...

How it works...

There's more...

Accessing the copied rows from jobs, transformations, and other entries

Executing a transformation once for every row in a dataset

Executing a transformation or part of a job once for every file in a list of files

See also

Executing part of a job several times until a condition is true

Getting ready

How to do it...

How it works...

There's more...

Implementing loops in a job

Using the JavaScript step to control the execution of the entries in your job

See also

Creating a process flow

Getting ready

How to do it...

How it works...

There's more...

Serializing/De-serializing data

Other means for transferring or sharing data between transformations

Moving part of a transformation to a subtransformation

Getting ready

How to do it...

How it works...

There's more...

8. Integrating Kettle and the Pentaho Suite

Introduction

A sample transformation

Creating a Pentaho report with data coming from PDI

Getting ready

How to do it...

How it works...

There's more...

Configuring the Pentaho BI Server for running PDI jobs and transformations

Getting ready

How to do it...

How it works...

There's more...

See also

Executing a PDI transformation as part of a Pentaho process

Getting ready

How to do it...

How it works...

There's more...

Specifying the location of the transformation

Supplying values for named parameters, variables and arguments

Keeping things simple when it's time to deliver a plain file

See also

Executing a PDI job from the Pentaho User Console

Getting ready

How to do it...

How it works...

There's more...

See also

Generating files from the PUC with PDI and the CDA plugin

Getting ready

How to do it...

How it works...

There's more...

Populating a CDF dashboard with data coming from a PDI transformation

Getting ready

How to do it...

How it works...

There's more...

See also

9. Getting the Most Out of Kettle

Introduction

Sending e-mails with attached files

Getting ready

How to do it...

How it works...

There's more...

Sending logs through an e-mail

Sending e-mails in a transformation

Generating a custom log file

Getting ready

How to do it...

How it works...

There's more...

Filtering the log file

Creating a clean log file

Isolating log files for different jobs or transformations

See also

Programming custom functionality

Getting ready

How to do it...

How it works...

There's more...

Data type's equivalence

Generalizing your code

Looking up information with additional steps

Customizing logs

Scripting alternatives to the UDJC step

Generating sample data for testing purposes

How to do it...

How it works...

There's more...

Using Data grid step to generate specific data

Working with subsets of your data

See also

Working with Json files

Getting ready

How to do it...

How it works...

There's more...

Reading Json files dynamically

Writing Json files

Getting information about transformations and jobs (file-based)

Getting ready

How to do it...

How it works...

There's more...

Transformation XML nodes

Job XML nodes

Steps and entries information

See also

Getting information about transformations and jobs (repository-based)

Getting ready

How to do it...

How it works...

There's more...

Transformation tables

Job tables

Database connections tables

A. Data Structures

Book's data structure

Books

Authors

Museum's data structure

Museums

Cities

Outdoor data structure

Products

Categories

Steel Wheels structure

Index

Pentaho Data Integration 4 Cookbook

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: June 2011

Production Reference: 1170611

Published by Packt Publishing Ltd.

32 Lincoln Road

Olton

Birmingham, B27 6PA, UK.

ISBN 978-1-849515-24-5

www.packtpub.com

Cover Image by Ed Maclean (<edmaclean@gmail.com>)

Credits

Authors

Adrián Sergio Pulvirenti

María Carina Roldán

Reviewers

Jan Aertsen

Pedro Alves

Slawomir Chodnicki

Paula Clemente

Samatar Hassan

Nelson Sousa

Acquisition Editor

Usha Iyer

Development Editor

Neha Mallik

Technical Editors

Conrad Sardinha

Azharuddin Sheikh

Project Coordinator

Joel Goveya

Proofreaders

Stephen Silk

Aaron Nash

Indexer

Tejal Daruwale

Graphics

Nilesh Mohite

Production Coordinator

Kruthika Bangera

Cover Work

Kruthika Bangera

About the Authors

Adrián Sergio Pulvirenti was born in Buenos Aires, Argentina, in 1972. He earned his Bachelor's degree in Computer Sciences at UBA, one of the most prestigious universities in South America.

He has dedicated more than 15 years to developing desktop and web-based software solutions. Over the last few years he has been leading integration projects and development of BI solutions.

I'd like to thank my lovely kids Camila and Nicolas, who understood that I couldn't share the usual videogame sessions with them during the writing process. I'd also like to thank my wife, who introduced me to the Pentaho world.

María Carina Roldán was born in Esquel, Argentina, in 1970. She earned her Bachelor's degree in Computer Science at UNLP in La Plata; after that she did a postgraduate course in Statistics at the University of Buenos Aires (UBA) in Buenos Aires city, where she has lived since 1994.

She has worked as a BI consultant for more than 10 years. Over the last four years, she has been dedicated full time to developing BI solutions using Pentaho Suite. Currently she works for Webdetails, one of the main Pentaho contributors.

She is the author of Pentaho 3.2 Data Integration: Beginner's Guide published by Packt Publishing in April 2010.

You can follow her on Twitter at @mariacroldan.

I'd like to thank those who have encouraged me to write this book. On the one hand, the Pentaho community, who gave me rewarding feedback after the Beginner's book. On the other hand, my husband, who without hesitation agreed to write the book with me. Without them I'm not sure I would have embarked on a new book project.

I'd also like to thank the technical reviewers for the time and dedication that they have put in reviewing the book. In particular, thanks to my colleagues at Webdetails; it's a pleasure and a privilege to work with them every day.

About the Reviewers

Jan Aertsen has worked in IT and decision support for the past 10 years. Since the beginning of his career he has specialized in data warehouse design and business intelligence projects. He has worked on numerous global data warehouse projects within the fashion industry, retail, banking and insurance, telco and utilities, logistics, automotive, and public sector.

Jan holds the degree of Commercial Engineer in international business affairs from the Catholic University of Leuven (Belgium) and furthered his knowledge in the field of business intelligence through a Masters in Artificial Intelligence.

In 1999 Jan started up the business intelligence activities at IOcore together with some of his colleagues, rapidly making this the most important revenue area of the Belgian affiliate. They quickly gained access to a range of customers such as KPN Belgium, Orange (now Base), Mobistar, and other Belgian Telcos.

After this experience Jan joined Cap Gemini Ernst & Young in Italy and rapidly became one of their top BI project managers. After having managed some large BI projects (worth up to €1 million), Jan decided to leave the company and pursue his own ambitions.

In 2002, he founded kJube as an independent platform to develop his ambitions in the world of business intelligence. Since then this has resulted in collaborations with numerous companies such as Volvo, Fendi-LVMH, ING, MSC, Securex, SDWorx, Blinck, and Beate Uhse.

Over the years Jan has worked his way through every possible aspect of business intelligence, from KPI and strategy definition through budgeting, tool selection, and software investment acquisition to project management and all implementation aspects with most of the available tools. He knows the business side as well as the IT side of business intelligence, and is therefore one of the rare people able to give sound, all-round, vendor-independent advice on business intelligence.

He continues to share his experiences in the field through his blog (blog.kjube.be) and can be contacted at .

Pedro Alves is the founder of Webdetails. He is a physicist by training, a serious video gamer, a volleyball player, an open source enthusiast, and a dad of two lovely children.

Since his early professional years he has been responsible for Business Software development and his career led him to work as a Consultant in several Portuguese companies.

In 2008 he decided it was time to put his accumulated experience to work and share his knowledge about the Pentaho Business Intelligence platform on his own. He founded Webdetails and joined the Mozilla metrics team. Now he leads an international team of BI Consultants and keeps nurturing Webdetails as a world reference Pentaho BI solutions provider and community contributor. He is the Ctools (CDF, CDA, CDE, CBF, CST, CCC) architect and, on a daily basis, keeps developing and improving components and features to extend and maximize Pentaho's capabilities.

Slawomir Chodnicki specializes in data warehousing and ETL, with a background in web development using various programming languages and frameworks. He has established his blog at http://type-exit.org to help fellow BI developers embrace the possibilities of PDI and other open source BI tools.

I would like to thank all regular members of the ##pentaho IRC channel for their endless patience and support regarding PDI related questions. Very special thanks go to María Carina and Adrián Sergio for creating the Kettle Cookbook and inviting me to be part of the project.

Paula Clemente was born in Sintra, Portugal, in 1983. Divided between the idea of spending her life caring about people and animals or spending quality time with computers, she started studying Computer Science at IST Engineering College (the Portuguese MIT) at a time when Internet social networking was synonymous with IRC. She graduated in 2008 after completing her Master's thesis on Business Processes Management. Since then she has been proudly working as a BI Consultant for Webdetails, a Portuguese company specialized in delivering Pentaho BI solutions that earned the Pentaho Best Community Contributor 2011 award.

Samatar Hassan is an application developer focusing on data integration and business intelligence. He has been involved in the Kettle project since the year it was open sourced. He tries to help the community by contributing in different ways: taking on the translation effort for the French language, participating in the forums, resolving bugs, and adding new features to the software.

He contributed to the Pentaho Kettle Solutions book published by Wiley and written by Matt Casters, the founder of Kettle.

I would first like to thank Adrián Sergio and María Carina Roldán for taking the time to write this book. It is a great idea to show how to take advantage of Kettle through step-by-step recipes. Kettle users have their own ETL bible now.

Finally, I'd like to thank all community members. They are the real power of open source software.

Nelson Sousa is a business intelligence consultant at Webdetails. He's part of the Metrics team at Mozilla, where he helps develop and maintain Mozilla's Pentaho server and solution. He specializes in Pentaho dashboards using CDF, CDE, and CDA, and also in PDI, processing vast amounts of information that are integrated daily into the various dashboards and reports that are part of the Metrics team's day-to-day life.

www.PacktPub.com

Support files, eBooks, discount offers and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

Why Subscribe?

Fully searchable across every book published by Packt

Copy and paste, print and bookmark content

On demand and accessible via web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

We dedicate this book to our family and especially our adorable kids.

- María Carina and Adrián -

Preface

Pentaho Data Integration (PDI, also called Kettle), one of the leading data integration tools, is widely used for all kinds of data manipulation, such as migrating data between applications or databases, exporting data from databases to flat files, data cleansing, and much more. Do you need quick solutions to the problems you face while using Kettle?

Pentaho Data Integration 4 Cookbook explains Kettle features in detail through clear and practical recipes that you can quickly apply to your solutions. The recipes cover a broad range of topics including processing files, working with databases, understanding XML structures, integrating with Pentaho BI Suite, and more.

Pentaho Data Integration 4 Cookbook shows you how to take advantage of all the aspects of Kettle through a set of practical recipes organized to find quick solutions to your needs. The initial chapters explain the details about working with databases, files, and XML structures. Then you will see different ways for searching data, executing and reusing jobs and transformations, and manipulating streams. Further, you will learn all the available options for integrating Kettle with other Pentaho tools.

Pentaho Data Integration 4 Cookbook has plenty of recipes with easy step-by-step instructions to accomplish specific tasks. There are examples and code that are ready for adaptation to individual needs.

Learn to solve data manipulation problems using the Pentaho Data Integration tool Kettle.

What this book covers

Chapter 1, Working with Databases helps you to deal with databases in Kettle. The recipes cover creating and sharing connections, loading tables under different scenarios, and creating dynamic SQL statements, among other topics.

Chapter 2, Reading and Writing Files shows you not only the basics for reading and writing files, but also all the how-tos for dealing with files. The chapter includes parsing unstructured files, reading master/detail files, generating multi-sheet Excel files, and more.

Chapter 3, Manipulating XML Structures teaches you how to read, write, and validate XML data. It covers both simple and complex XML structures.

Chapter 4, File Management helps you to pick and configure the different options for copying, moving, and transferring lists of files or directories.

Chapter 5, Looking for Data explains the different methods for searching information in databases, text files, web services, and more.

Chapter 6, Understanding Data Flows focuses on the different ways for combining, splitting, or manipulating streams or flows of data in simple and complex situations.

Chapter 7, Executing and Reusing Jobs and Transformations explains in a simple fashion topics that are critical for building complex PDI projects, for example, building reusable jobs and transformations, iterating the execution of a transformation over a list of data, and transferring data between transformations.

Chapter 8, Integrating Kettle and the Pentaho Suite. PDI, also known as Kettle, is part of the Pentaho Business Intelligence Suite. As such, it can interact with other components of the suite, for example, as the data source for reporting or as part of a bigger process. This chapter shows you how to run Kettle jobs and transformations in that context.

Chapter 9, Getting the Most Out of Kettle covers a wide variety of topics, such as customizing a log file, sending e-mails with attachments, or creating custom functionality.

Appendix, Data Structures describes some structures used in several recipes throughout the book.

What you need for this book

PDI is a multiplatform tool, meaning that you will be able to install the tool no matter what your operating system is. The only prerequisite to work with PDI is to have JVM 1.5 or a higher version installed. It is also useful to have Excel or Calc, a nice text editor, and access to a database engine of your preference.
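As a minimal sketch (not taken from the book), the JVM prerequisite above could be checked with a short Python script. The function names are hypothetical, and the parsing assumes the usual `java -version` banner format, which the `java` launcher writes to standard error.

```python
import re
import subprocess


def parse_java_version(version_output):
    """Return (major, minor) parsed from `java -version` style text, or None."""
    # Matches banners such as: java version "1.6.0_24" or openjdk version "17.0.2"
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2) or 0))


def meets_minimum(version_output, minimum=(1, 5)):
    """True if the parsed version is at least `minimum` (1.5, per the text above)."""
    version = parse_java_version(version_output)
    return version is not None and version >= minimum


def read_installed_java_banner():
    """Run `java -version` and return its banner; returns '' if java is not found."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return ""
    # The version banner is printed on stderr, not stdout.
    return result.stderr
```

Calling `meets_minimum(read_installed_java_banner())` reports whether the Java found on the PATH satisfies the stated minimum. It says nothing about which Java installation PDI itself will pick up at launch.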

Having an Internet connection while reading is extremely useful as well. Several links are provided throughout the book that complement what is explained. Besides, there is the PDI forum, where you may search or post questions if you are stuck with something.

Who this book is for

If you are a software developer or anyone involved or interested in developing ETL solutions, or in general, doing any kind of data manipulation, this book is for you. It does not cover PDI basics, SQL basics, or database concepts. You are expected to have a basic understanding of the PDI tool, SQL language, and databases.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: Copy the .jar file containing the driver to the libext/JDBC directory inside the Kettle installation directory.

A block of code is set as follows:

NUMBER, LASTNAME, FIRSTNAME, EXT, OFFICE, REPORTS, TITLE
1188, Firrelli, Julianne,x2174,2,1143, Sales Manager
1619, King, Tom,x103,6,1088,Sales Rep

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

City
Buenos aires, Argentina

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: Add a Delete file entry from the File management category

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <feedback@packtpub.com>, and mention the book title via the subject of your message.

If there is a book that you need and would like to see us publish, please send us a note in the SUGGEST A TITLE form on www.packtpub.com or e-mail .

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you
