Talend Open Studio Cookbook
Ebook, 834 pages, 3 hours

About this ebook

Designed primarily as a reference book, Talend Open Studio Cookbook offers simple and effective exercises, based on genuine real-world tasks, that help developers reduce the time needed to deliver results. The recipe format makes even complex concepts easy to grasp. The book is principally aimed at relative beginners and intermediate Talend developers who have used the product to perform some simple integration tasks, possibly via a training course or beginner's tutorials.
Language: English
Release date: Oct 25, 2013
ISBN: 9781782167273

    Book preview

    Talend Open Studio Cookbook - Rick Barton

    Table of Contents

    Talend Open Studio Cookbook

    Credits

    About the Author

    About the Reviewers

    www.PacktPub.com

    Support files, eBooks, discount offers and more

    Why Subscribe?

    Free Access for Packt account holders

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Errata

    Piracy

    Questions

    1. Introduction and General Principles

    Before you begin

    Installing the software

    How to do it…

    Enabling tHashInput and tHashOutput

    How to do it…

    2. Metadata and Schemas

    Introduction

    Schema metadata

    Schemas

    Repository schemas

    Generic schemas

    Shared schemas

    Generated data sources

    Fixed schemas and columns

    Hand-cranking a built-in schema

    Getting ready

    How to do it…

    How it works…

    There’s more...

    Date patterns

    Nullable elements

    Propagating schema changes

    Getting ready

    How to do it…

    How it works…

    There’s more…

    Creating a generic schema from the existing metadata

    How to do it…

    How it works…

    Cutting and pasting schema information

    Getting ready

    How to do it…

    How it works…

    There’s more…

    Dropping schemas to empty components

    Getting ready

    How to do it…

    How it works…

    There’s more…

    Creating schemas from lists

    Getting ready

    How to do it...

    How it works…

    There’s more…

    3. Validating Data

    Introduction

    Enabling and disabling reject flows

    Getting ready

    How to do it…

    How it works…

    There's more...

    See also

    Gathering all rejects prior to killing a job

    Getting ready

    How to do it…

    How it works…

    There's more...

    See also

    Validating against the schema

    Getting ready

    How to do it…

    How it works…

    Rejecting rows using tMap

    Getting ready

    How to do it…

    How it works…

    There's more…

    Checking a column against a list of allowed values

    Getting ready

    How to do it…

    How it works…

    There's more…

    See also

    Checking a column against a lookup

    Getting ready

    How to do it…

    How it works…

    Creating validation rules for more complex requirements

    Getting ready

    How to do it…

    How it works…

    There's more…

    See also

    Creating binary error codes to store multiple test results

    Getting ready

    How to do it…

    How it works…

    There's more…

    Decrypting the error code

    4. Mapping Data

    Introduction

    The tMap component

    Single line of code

    Batch versus real time

    Simple mapping and tMap time savers

    Getting ready

    How to do it...

    How it works...

    There's more…

    Creating tMap expressions

    Getting ready

    How to do it...

    How it works...

    There's more…

    Testing expressions

    Expression editor

    Getting around the 'one line' limitation

    See Also

    Using the ternary operator for conditional logic

    Getting ready

    How to do it...

    Single ternary expression: if-then-else

    Ternary in ternary: if-then-elsif-then-else

    How it works…

    There's more…

    Using intermediate variables in tMap

    Getting ready

    How to do it…

    How it works…

    There's more…

    Filtering input rows

    Getting ready

    How to do it...

    How it works…

    There's more…

    Splitting an input row into multiple outputs based on input conditions

    Getting ready

    How to do it...

    How it works…

    There's more…

    Joining data using tMap

    Getting ready

    How to do it...

    How it works…

    There's more…

    See Also

    Hierarchical joins using tMap

    Getting ready

    How to do it...

    How it works…

    Using reload at each row to process real-time / near real-time data

    Getting ready

    How to do it...

    How it works…

    Loading the data into memory

    The globalMap key

    The WHERE clause

    The result

    There's more…

    5. Using Java in Talend

    Introduction

    Performing one-off pieces of logic using tJava

    Getting ready

    How to do it…

    How it works…

    See also

    Setting the context and globalMap variables using tJava

    Getting ready

    How to do it…

    How it works…

    There's more…

    See also

    Adding complex logic into a flow using tJavaRow

    Getting ready

    How to do it…

    How it works…

    Creating pseudo components using tJavaFlex

    Getting ready

    How to do it…

    How it works…

    There's more…

    Creating custom functions using code routines

    Getting ready

    How to do it…

    How it works…

    There's more…

    See also

    Importing JAR files to allow use of external Java classes

    Getting ready

    How to do it…

    How it works…

    There's more…

    6. Managing Context Variables

    Introduction

    Transportable code

    Context variables

    Common values in contexts

    Passing command line parameters

    Setting context variables in the code

    Database context variables

    Creating a context group

    How to do it...

    How it works...

    There’s more…

    Context types

    Prompt for variable values using the tree mode

    Adding a context group to your job

    Getting ready

    How to do it...

    How it works…

    There’s more…

    Adding contexts to a context group

    Getting ready

    How to do it...

    There’s more…

    Using tContextLoad to load contexts

    Getting ready

    How to do it...

    How it works...

    There’s more…

    Print operations

    Warnings

    Context file location

    Using implicit context loading to load contexts

    Getting ready

    How to do it...

    How it works...

    There’s more…

    Turning implicit context loading on and off in a job

    Getting ready

    How to do it...

    How it works...

    Setting the context file location in the operating system

    Getting ready

    How to do it...

    How it works…

    There’s more…

    Variable not present

    Implicit context load

    7. Working with Databases

    Introduction

    Setting up a database connection

    Getting ready

    How to do it...

    How it works...

    There's more…

    Using the connection

    Always create database connections

    Connection names

    Context

    Importing the table schemas

    Getting ready

    How to do it…

    How it works...

    There's more…

    Reading from database tables

    Getting ready

    How to do it…

    Selected rows and columns

    Multiple tables and complex queries

    How it works…

    There's more…

    Efficiency versus readability

    SQL string

    SQL style

    Using context and globalMap variables in SQL queries

    Getting ready

    How to do it…

    How it works…

    There's more…

    The globalMap variables

    Developing the query

    Reloading at each row

    Printing your input query

    Getting ready

    How to do it…

    How it works…

    There's more…

    Writing to a database table

    Getting ready

    How to do it…

    How it works…

    There's more…

    Creating tables

    Update and delete keys

    Batches

    Bulk loading

    Bulk loading to temp table

    Printing your output query

    Getting ready

    How to do it…

    How it works…

    There's more…

    Managing database sessions

    Getting ready

    How to do it…

    How it works…

    Executions

    There's more…

    Multiple outputs

    Don't forget the commit

    Committing but not closing

    Passing a session to a child job

    Getting ready

    How to do it…

    How it works…

    Selecting different fields and keys for insert, update, and delete

    Getting ready

    How to do it…

    How it works…

    There's more...

    Updating

    Deleting

    Capturing individual rejects and errors

    Getting ready

    How to do it…

    How it works…

    There's more…

    Die on error

    Efficiency

    Error management

    Database and table management

    Getting ready

    How to do it…

    How it works…

    There's more…

    Managing surrogate keys for parent and child tables

    Getting ready

    How to do it…

    How it works…

    There's more...

    Added efficiency using hashMap key table

    Ranges

    Sequences

    Auto increment keys

    The LastInsertId component

    Auto increment procedure

    Rewritable lookups using an in-process database

    Background

    Getting ready

    How to do it…

    How it works…

    In-memory components

    Initialize the data

    tMap

    Write back

    There's more…

    Memory

    See also

    8. Managing Files

    Introduction

    Appending records to a file

    Getting ready

    How to do it...

    How it works...

    There's more…

    Concatenating files using the append method

    Reading rows using a regular expression

    Getting ready

    How to do it...

    How it works...

    There's more…

    Using temporary files

    Getting ready

    How to do it...

    How it works...

    There's more…

    See also

    Storing intermediate data in the memory using tHashMap

    Getting ready

    How to do it...

    How it works...

    There's more…

    Reading headers and trailers using tMap

    Getting ready

    How to do it...

    How it works...

    There's more…

    Reading headers and trailers with no identifiers

    Getting ready

    How to do it...

    How it works...

    Using the information in the header and trailer

    Getting ready

    How to do it...

    Validation subjob

    Use the header information subjob

    How it works...

    Validating using the trailer information

    Using the header information in the detail

    There's more…

    Adding a header and trailer to a file

    Getting ready

    How to do it...

    How it works...

    There's more…

    See also

    Moving, copying, renaming, and deleting files and folders

    Getting ready

    How to do it...

    Copying a file to another directory

    Copying file to a different name

    Renaming a file

    Moving a file

    Deleting a file

    How it works...

    There's more…

    Capturing file information

    Getting ready

    How to do it...

    How it works...

    There's more…

    Processing multiple files at once

    Getting ready

    How to do it...

    How it works...

    There's more…

    Processing control/validation files

    Getting ready

    How to do it...

    How it works...

    There's more…

    Creating and writing files depending on the input data

    Getting ready

    How to do it...

    How it works...

    tJavaRow code explained

    There's more…

    9. Working with XML, Queues, and Web Services

    Introduction

    Using tXMLMap to read XML

    Getting ready

    How to do it...

    How it works...

    There's more…

    Document objects

    XML Structure

    Using tXMLMap to create an XML document

    Getting ready

    How to do it...

    How it works...

    There's more…

    Reading complex hierarchical XML

    Getting ready

    How to do it...

    How it works...

    There's more…

    Managing the relationships

    File information

    XML to database mapping

    XPATH

    Web service XML

    Writing complex XML

    Understanding the XML structure

    Node

    Method

    Java DOM

    Getting ready

    How to do it...

    How it works...

    So here we go…

    tWriteXMLField

    Code utilities

    tFlowToIterate

    tHash components

    XPATH Condition

    Putting it all together

    There's more…

    Job shape

    Calling a SOAP web service

    Getting ready...

    How to do it...

    How it works...

    There’s more…

    Decoding the response

    Using web service calls in-flow

    Calling a RESTful web service

    Getting ready

    How to do it...

    How it works...

    There's more…

    Reading and writing to a queue

    Getting ready

    How to do it...

    How it works...

    There's more…

    Ensuring lossless queues using sessions

    Getting ready

    How to do it...

    How it works...

    There's more…

    10. Debugging, Logging, and Testing

    Introduction

    Debugging

    Logging

    Testing

    Find the location of compilation errors using the Problems tab

    Getting ready

    How to do it...

    How it works...

    There's more…

    Locating execution errors from the console output

    Getting ready

    How to do it...

    How it works...

    There's more…

    See also

    Using the Talend debug mode – row-by-row execution

    Getting ready

    How to do it...

    How it works...

    There's more…

    See also

    Using the Java debugger to debug Talend jobs

    Getting ready

    How to do it...

    How it works...

    There's more…

    Using tLogRow to show data in a row

    Getting ready

    How to do it...

    How it works...

    There's more…

    Using tJavaRow to display row information

    Getting ready

    How to do it...

    How it works...

    There's more…

    Using tJava to display status messages and variables

    Getting ready

    How to do it...

    How it works...

    Printing out the context

    Getting ready

    How to do it...

    How it works...

    There's more…

    Dumping the console output to a file from within a job

    Getting ready

    How to do it...

    How it works...

    There's more…

    Creating simple test data using tRowGenerator

    Getting ready

    How to do it...

    How it works...

    There's more…

    Creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences

    Getting ready

    How to do it...

    How it works...

    There's more…

    Creating random test data using lookups

    Getting ready

    How to do it...

    How it works...

    There's more…

    Creating test data using Excel

    Getting ready

    How to do it...

    How it works...

    There's more…

    Testing logic – the most-used pattern

    Getting ready

    How to do it...

    How it works...

    There's more…

    Killing a job from within tJavaRow

    Getting ready

    How to do it...

    How it works...

    11. Deploying and Scheduling Talend Code

    Introduction

    Context Variables

    Executable code

    Managing job dependencies within Talend

    Creating compiled executables

    How to do it...

    How it works…

    Using a different context

    Getting ready

    How to do it…

    How it works…

    There's more…

    Adding command-line context parameters

    Getting ready

    How to do it…

    How it works…

    There's more…

    Managing job dependencies

    Getting ready

    How to do it…

    How it works…

    There's more…

    Die on error

    Adding error checks to the schedule

    Restartability

    Capturing and acting on different return codes

    Getting ready

    How to do it…

    How it works…

    There's more…

    Returning codes from a child job without tDie

    Getting ready

    How to do it…

    How it works…

    There's more…

    Passing parameters to a child job

    Getting ready

    How to do it…

    How it works…

    There's more

    Executing non-Talend objects and operating system commands

    Getting ready

    How to do it…

    How it works…

    There's more…

    12. Common Mistakes and Other Useful Hints and Tips

    Introduction

    My tab is missing

    How to do it…

    Show view:

    Reset the perspective

    Finding the code routine

    How to do it…

    Finding a new context variable

    How to do it…

    Reloads going missing at each row global variable

    How to do it...

    Dragging component globalMap variables

    Some complex date formats

    Capturing tMap rejects

    Adding job name, project name, and other job specific information

    Printing tMap variables

    Stopping memory errors in Talend

    Increasing the memory allocated to a job

    Reducing lookup data

    Using hashMap/in-memory tables

    Splitting the job

    Dropping data to disk

    Split the files

    Hardware solutions

    A. Common Type Conversions

    B. Management of Contexts

    Introduction

    Manipulating contexts in Talend Open Studio

    Understanding implicit context loading

    Understanding tContextLoad

    Manually checking and setting contexts

    Index

    Talend Open Studio Cookbook

    Copyright © 2013 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: October 2013

    Production Reference: 2221013

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78216-726-6

    www.packtpub.com

    Cover Image by Artie Ng (<artherng@yahoo.com.au>)

    Credits

    Author

    Rick Barton

    Reviewers

    Robert Baumgartner

    Mustapha EL HASSAK

    Viral Patel

    Stéphane Planquart

    Acquisition Editor

    James Jones

    Lead Technical Editor

    Amey Varangaonkar

    Technical Editors

    Monica John

    Mrunmayee Patil

    Tarunveer Shetty

    Sonali Vernekar

    Project Coordinator

    Abhijit Suvarna

    Proofreader

    Clyde Jenkins

    Indexer

    Tejal R. Soni

    Production Coordinator

    Adonia Jones

    Cover Work

    Adonia Jones

    About the Author

    Rick Barton is a freelance consultant who has specialized in data integration and ETL for the last 13 years as part of an IT career spanning over 25 years.

    After gaining a degree in Computer Systems from Cardiff University, he began his career as a firmware programmer before moving into Mainframe data processing and then into ETL tools in 1999.

    He has provided technical consultancy to some of the UK’s largest companies, including banks and telecommunications companies, and was a founding partner of a Big Data integration consultancy.

    Four years ago he moved back into freelance development and has since worked almost exclusively with Talend Open Studio and Talend Integration Suite on multiple projects of various sizes in the UK. It is on these projects that he learned many of the lessons found in this, his first book.

    I would like to thank my wife, Ange, for her support, and my children, Alice and Ed, for putting up with my weekend writing sessions.

    I’d also like to thank the guys at Packt for keeping me motivated and productive and for making it so easy to get started. Their professionalism, and most especially their confidence in me, have allowed me to do something I never thought I would.

    About the Reviewers

    Robert Baumgartner has a degree in Business Informatics from Austria, where he lives today. He began his career in 2002 as a business intelligence consultant working for various service companies. He then worked in the paper industry as a consultant and project manager for an enterprise resource planning (ERP) system. In 2009 he founded his company, datenpol, a service integrator specializing in selected open source software products with a focus on business intelligence and ERP. Robert is an open source enthusiast who has spoken at several open source events. The products he works with are OpenERP, Talend Data Integration, and JasperReports. He contributes to the open source community by sharing his knowledge on his company blog at http://www.datenpol.at/blog and by publishing software on GitHub, such as the OpenERP Talend Connector component, which can be found at https://github.com/baumgaro/OpenERP-Talend-Component.

    Mustapha EL HASSAK has been a computer science fanatic for many years. He obtained a bachelor’s degree in Mathematics in 2003 and then attended university to study Information Technology. After five years of study, he joined the largest investment bank in Morocco as an IT engineer. He then worked at EAI, an IT services company specializing in insurance, as a senior developer responsible for data migration. He has always worked with Talend Open Studio and sometimes with Business Objects. This is the first book he has worked on, but he has written several articles in French and English about Talend on his personal blog.

    I would like to thank my parents, Khadija and Hassan, Said, my brother and Asmae, my
