Talend Open Studio Cookbook
By Rick Barton
Table of Contents
Talend Open Studio Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Introduction and General Principles
Before you begin
Installing the software
How to do it…
Enabling tHashInput and tHashOutput
How to do it…
2. Metadata and Schemas
Introduction
Schema metadata
Schemas
Repository schemas
Generic schemas
Shared schemas
Generated data sources
Fixed schemas and columns
Hand-cranking a built-in schema
Getting ready
How to do it…
How it works…
There’s more...
Date patterns
Nullable elements
Propagating schema changes
Getting ready
How to do it…
How it works…
There’s more…
Creating a generic schema from the existing metadata
How to do it…
How it works…
Cutting and pasting schema information
Getting ready
How to do it…
How it works…
There’s more…
Dropping schemas to empty components
Getting ready
How to do it…
How it works…
There’s more…
Creating schemas from lists
Getting ready
How to do it...
How it works…
There’s more…
3. Validating Data
Introduction
Enabling and disabling reject flows
Getting ready
How to do it…
How it works…
There's more...
See also
Gathering all rejects prior to killing a job
Getting ready
How to do it…
How it works…
There's more...
See also
Validating against the schema
Getting ready
How to do it…
How it works…
Rejecting rows using tMap
Getting ready
How to do it…
How it works…
There's more…
Checking a column against a list of allowed values
Getting ready
How to do it…
How it works…
There's more…
See also
Checking a column against a lookup
Getting ready
How to do it…
How it works…
Creating validation rules for more complex requirements
Getting ready
How to do it…
How it works…
There's more…
See also
Creating binary error codes to store multiple test results
Getting ready
How to do it…
How it works…
There's more…
Decrypting the error code
4. Mapping Data
Introduction
The tMap component
Single line of code
Batch versus real time
Simple mapping and tMap time savers
Getting ready
How to do it...
How it works...
There's more…
Creating tMap expressions
Getting ready
How to do it...
How it works...
There's more…
Testing expressions
Expression editor
Getting around the 'one line' limitation
See Also
Using the ternary operator for conditional logic
Getting ready
How to do it...
Single ternary expression: if-then-else
Ternary in ternary: if-then-elsif-then-else
How it works…
There's more…
Using intermediate variables in tMap
Getting ready
How to do it…
How it works…
There's more…
Filtering input rows
Getting ready
How to do it...
How it works…
There's more…
Splitting an input row into multiple outputs based on input conditions
Getting ready
How to do it...
How it works…
There's more…
Joining data using tMap
Getting ready
How to do it...
How it works…
There's more…
See Also
Hierarchical joins using tMap
Getting ready
How to do it...
How it works…
Using reload at each row to process real-time / near real-time data
Getting ready
How to do it...
How it works…
Loading the data into memory
The globalMap key
The WHERE clause
The result
There's more…
5. Using Java in Talend
Introduction
Performing one-off pieces of logic using tJava
Getting ready
How to do it…
How it works…
See also
Setting the context and globalMap variables using tJava
Getting ready
How to do it…
How it works…
There's more…
See also
Adding complex logic into a flow using tJavaRow
Getting ready
How to do it…
How it works…
Creating pseudo components using tJavaFlex
Getting ready
How to do it…
How it works…
There's more…
Creating custom functions using code routines
Getting ready
How to do it…
How it works…
There's more…
See also
Importing JAR files to allow use of external Java classes
Getting ready
How to do it…
How it works…
There's more…
6. Managing Context Variables
Introduction
Transportable code
Context variables
Common values in contexts
Passing command line parameters
Setting context variables in the code
Database context variables
Creating a context group
How to do it...
How it works...
There’s more…
Context types
Prompt for variable values using the tree mode
Adding a context group to your job
Getting ready
How to do it...
How it works…
There’s more…
Adding contexts to a context group
Getting ready
How to do it...
There’s more…
Using tContextLoad to load contexts
Getting ready
How to do it...
How it works...
There’s more…
Print operations
Warnings
Context file location
Using implicit context loading to load contexts
Getting ready
How to do it...
How it works...
There’s more…
Turning implicit context loading on and off in a job
Getting ready
How to do it...
How it works...
Setting the context file location in the operating system
Getting ready
How to do it...
How it works…
There’s more…
Variable not present
Implicit context load
7. Working with Databases
Introduction
Setting up a database connection
Getting ready
How to do it...
How it works...
There's more…
Using the connection
Always create database connections
Connection names
Context
Importing the table schemas
Getting ready
How to do it…
How it works...
There's more…
Reading from database tables
Getting ready
How to do it…
Selected rows and columns
Multiple tables and complex queries
How it works…
There's more…
Efficiency versus readability
SQL string
SQL style
Using context and globalMap variables in SQL queries
Getting ready
How to do it…
How it works…
There's more…
The globalMap variables
Developing the query
Reloading at each row
Printing your input query
Getting ready
How to do it…
How it works…
There's more…
Writing to a database table
Getting ready
How to do it…
How it works…
There's more…
Creating tables
Update and delete keys
Batches
Bulk loading
Bulk loading to temp table
Printing your output query
Getting ready
How to do it…
How it works…
There's more…
Managing database sessions
Getting ready
How to do it…
How it works…
Executions
There's more…
Multiple outputs
Don't forget the commit
Committing but not closing
Passing a session to a child job
Getting ready
How to do it…
How it works…
Selecting different fields and keys for insert, update, and delete
Getting ready
How to do it…
How it works…
There's more...
Updating
Deleting
Capturing individual rejects and errors
Getting ready
How to do it…
How it works…
There's more…
Die on error
Efficiency
Error management
Database and table management
Getting ready
How to do it…
How it works…
There's more…
Managing surrogate keys for parent and child tables
Getting ready
How to do it…
How it works…
There's more...
Added efficiency using hashMap key table
Ranges
Sequences
Auto increment keys
The LastInsertId component
Auto increment procedure
Rewritable lookups using an in-process database
Background
Getting ready
How to do it…
How it works…
In-memory components
Initialize the data
tMap
Write back
There's more…
Memory
See also
8. Managing Files
Introduction
Appending records to a file
Getting ready
How to do it...
How it works...
There's more…
Concatenating files using the append method
Reading rows using a regular expression
Getting ready
How to do it...
How it works...
There's more…
Using temporary files
Getting ready
How to do it...
How it works...
There's more…
See also
Storing intermediate data in the memory using tHashMap
Getting ready
How to do it...
How it works...
There's more…
Reading headers and trailers using tMap
Getting ready
How to do it...
How it works...
There's more…
Reading headers and trailers with no identifiers
Getting ready
How to do it...
How it works...
Using the information in the header and trailer
Getting ready
How to do it...
Validation subjob
Use the header information subjob
How it works...
Validating using the trailer information
Using the header information in the detail
There's more…
Adding a header and trailer to a file
Getting ready
How to do it...
How it works...
There's more…
See also
Moving, copying, renaming, and deleting files and folders
Getting ready
How to do it...
Copying a file to another directory
Copying file to a different name
Renaming a file
Moving a file
Deleting a file
How it works...
There's more…
Capturing file information
Getting ready
How to do it...
How it works...
There's more…
Processing multiple files at once
Getting ready
How to do it...
How it works...
There's more…
Processing control/validation files
Getting ready
How to do it...
How it works...
There's more…
Creating and writing files depending on the input data
Getting ready
How to do it...
How it works...
tJavaRow code explained
There's more…
9. Working with XML, Queues, and Web Services
Introduction
Using tXMLMap to read XML
Getting ready
How to do it...
How it works...
There's more…
Document objects
XML Structure
Using tXMLMap to create an XML document
Getting ready
How to do it...
How it works...
There's more…
Reading complex hierarchical XML
Getting ready
How to do it...
How it works...
There's more…
Managing the relationships
File information
XML to database mapping
XPATH
Web service XML
Writing complex XML
Understanding the XML structure
Node
Method
Java DOM
Getting ready
How to do it...
How it works...
So here we go…
tWriteXMLField
Code utilities
tFlowToIterate
tHash components
XPATH Condition
Putting it all together
There's more…
Job shape
Calling a SOAP web service
Getting ready
How to do it...
How it works...
There’s more…
Decoding the response
Using web service calls in-flow
Calling a RESTful web service
Getting ready
How to do it...
How it works...
There's more…
Reading and writing to a queue
Getting ready
How to do it...
How it works...
There's more…
Ensuring lossless queues using sessions
Getting ready
How to do it...
How it works...
There's more…
10. Debugging, Logging, and Testing
Introduction
Debugging
Logging
Testing
Find the location of compilation errors using the Problems tab
Getting ready
How to do it...
How it works...
There's more…
Locating execution errors from the console output
Getting ready
How to do it...
How it works...
There's more…
See also
Using the Talend debug mode – row-by-row execution
Getting ready
How to do it...
How it works...
There's more…
See also
Using the Java debugger to debug Talend jobs
Getting ready
How to do it...
How it works...
There's more…
Using tLogRow to show data in a row
Getting ready
How to do it...
How it works...
There's more…
Using tJavaRow to display row information
Getting ready
How to do it...
How it works...
There's more…
Using tJava to display status messages and variables
Getting ready
How to do it...
How it works...
Printing out the context
Getting ready
How to do it...
How it works...
There's more…
Dumping the console output to a file from within a job
Getting ready
How to do it...
How it works...
There's more…
Creating simple test data using tRowGenerator
Getting ready
How to do it...
How it works...
There's more…
Creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences
Getting ready
How to do it...
How it works...
There's more…
Creating random test data using lookups
Getting ready
How to do it...
How it works...
There's more…
Creating test data using Excel
Getting ready
How to do it...
How it works...
There's more…
Testing logic – the most-used pattern
Getting ready
How to do it...
How it works...
There's more…
Killing a job from within tJavaRow
Getting ready
How to do it...
How it works...
11. Deploying and Scheduling Talend Code
Introduction
Context Variables
Executable code
Managing job dependencies within Talend
Creating compiled executables
How to do it...
How it works…
Using a different context
Getting ready
How to do it…
How it works…
There's more…
Adding command-line context parameters
Getting ready
How to do it…
How it works…
There's more…
Managing job dependencies
Getting ready
How to do it…
How it works…
There's more…
Die on error
Adding error checks to the schedule
Restartability
Capturing and acting on different return codes
Getting ready
How to do it…
How it works…
There's more…
Returning codes from a child job without tDie
Getting ready
How to do it…
How it works…
There's more…
Passing parameters to a child job
Getting ready
How to do it…
How it works…
There's more…
Executing non-Talend objects and operating system commands
Getting ready
How to do it…
How it works…
There's more…
12. Common Mistakes and Other Useful Hints and Tips
Introduction
My tab is missing
How to do it…
Show view:
Reset the perspective
Finding the code routine
How to do it…
Finding a new context variable
How to do it…
Reloads going missing at each row global variable
How to do it...
Dragging component globalMap variables
Some complex date formats
Capturing tMap rejects
Adding job name, project name, and other job specific information
Printing tMap variables
Stopping memory errors in Talend
Increasing the memory allocated to a job
Reducing lookup data
Using hashMap/in-memory tables
Splitting the job
Dropping data to disk
Split the files
Hardware solutions
A. Common Type Conversions
B. Management of Contexts
Introduction
Manipulating contexts in Talend Open Studio
Understanding implicit context loading
Understanding tContextLoad
Manually checking and setting contexts
Index
Talend Open Studio Cookbook
Copyright © 2013 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: October 2013
Production Reference: 2221013
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78216-726-6
www.packtpub.com
Cover Image by Artie Ng (<artherng@yahoo.com.au>)
Credits
Author
Rick Barton
Reviewers
Robert Baumgartner
Mustapha EL HASSAK
Viral Patel
Stéphane Planquart
Acquisition Editor
James Jones
Lead Technical Editor
Amey Varangaonkar
Technical Editors
Monica John
Mrunmayee Patil
Tarunveer Shetty
Sonali Vernekar
Project Coordinator
Abhijit Suvarna
Proofreader
Clyde Jenkins
Indexer
Tejal R. Soni
Production Coordinator
Adonia Jones
Cover Work
Adonia Jones
About the Author
Rick Barton is a freelance consultant who has specialized in data integration and ETL for the last 13 years as part of an IT career spanning over 25 years.
After gaining a degree in Computer Systems from Cardiff University, he began his career as a firmware programmer before moving into Mainframe data processing and then into ETL tools in 1999.
He has provided technical consultancy to some of the UK’s largest companies, including banks and telecommunications companies, and was a founding partner of a Big Data integration consultancy.
Four years ago he moved back into freelance development and has been working almost exclusively with Talend Open Studio and Talend Integration Suite on multiple projects of various sizes in the UK. It is on these projects that he has learned many of the lessons that can be found in this, his first book.
I would like to thank my wife Ange for her support and my children, Alice and Ed, for putting up with my weekend writing sessions.
I’d also like to thank the guys at Packt for keeping me motivated and productive and for making it so easy to get started. Their professionalism, and most especially their confidence in me, have allowed me to do something I never thought I would.
About the Reviewers
Robert Baumgartner has a degree in Business Informatics from Austria, Europe, where he lives today. He began his career in 2002 as a business intelligence consultant working for different service companies. He then worked in the paper industry as a consultant and project manager for an enterprise resource planning (ERP) system. In 2009 he founded his company datenpol, a service integrator specializing in selected open source software products with a focus on business intelligence and ERP. Robert is an open source enthusiast who has given several talks at open source events. The products he works with are OpenERP, Talend Data Integration, and JasperReports. He contributes to the open source community by sharing his knowledge in entries on his company blog at http://www.datenpol.at/blog, and he publishes software on GitHub, such as the OpenERP Talend Connector component, which can be found at https://github.com/baumgaro/OpenERP-Talend-Component.
Mustapha EL HASSAK has been a computer science fanatic for many years. He obtained a Bachelor’s Degree in Mathematics in 2003 and then attended university to study Information Technology. After five years of study, he joined the largest investment bank in Morocco as an IT engineer. After that he worked at EAI, an IT services company specializing in insurance, as a senior developer responsible for data migration. He has always worked with Talend Open Studio and sometimes with Business Objects. This is the first time he has worked on a book, but he has written several articles in French and English about Talend on his personal blog.
I would like to thank my parents, Khadija and Hassan, Said, my brother and Asmae, my