Monday, 18 August 2014

Talend Reject Records

talend Reject Records Capturing


The following post will explain how to capture the reject records from input flow. Lets see capturing reject records from tFileInputDelmited component.


Step 1: Create file with following content

id;CustomerName;RegisterTime
1;Griffith Paving and Sealcoatin;2001-01-17 06:26:40.000
2;Bill's Dive Shop;2002-06-07 09:40:00.000
3;Bill's Dive Shop;2002-06-07 09:40:00.000
3;Childress Child Day Care;1990-04-01 21:00:00.000
4;Facelift Kitchen and Bath;1972-04-23 18:00:00.000
5;Terrinni & Son Auto and Truck;1982-04-19 10:26:40.000
6;Kermit the Pet Shop;2006-05-27 17:00:00.000
7;Tub's Furniture Store;1970-03-27 23:08:16.000
8;Toggle & Myerson Ltd;2005-08-02 01:26:40.000
9;Childress Child Day Care;1994-05-03 11:13:20.000
10;Elle Hypnosis and Therapy Cent;1975-06-10 20:20:00.000
11;Lennox Air Pollution Control;1983-02-26 17:08:16.000
12;Keyth Contracting and Repair;2001-09-10 11:01:36.000
13;Park District Of America;2005-10-02 00:34:56.000
14;Nirabi Auto Service;1997-02-18 21:06:40.000
15;Darcy Frame and Matting Servic;2001-01-07 20:40:00.000
16;Glenwood Credit Union;2003-12-03 19:08:16.000
17;Gourmet the Frog;1983-01-31 22:26:40.000
18;Acturial Enterprises Ltd.;2004-05-31 15:00:00.000
19;Salt & Pepper Catering Service;2006-01-24 10:53:20.000




Step 2: Create Metadata for the above file

Configure metadata like following


Step 3: Create job using this metadata

Create job like following
To create job Right Click on Job Designs and choose Create Job


Step 4: Design job using with metadata

  1. Drag and drop configured metadata "customer" to job select "tFileInputDelimited"
  2. Drag two tLogRow components from panel to job
  3. Connect the tLogRow components with main connection and reject connection 
Design job like following




Step 5: Execute the job

Execute job using with "F6" or "Run" button.
After execution of job you will get 20 records in main flow and 0 records for rejection flow because you don't have reject as per records

Step 6: Change the records in file against the schema

Change some records against to the schema like following

id;CustomerName;RegisterTime
ab;Bill's Dive Shop;2002-06-07 09:40:00.000
4;Childress Child Day Care;07-09-20007 09:40:00


Step 6: Execute job again



Execute job using with "F6" or "Run" button.
After execution of job you will get 18 records in main flow and 2 records for rejection flow.

Friday, 13 September 2013

Talend - tFlowToIterate

About tFlowToIterate:

It is used to convert Main flow to iterate flow. It iterates input and convert into global variables. We can use these global variable at any situation in the job

Difference between Main and Iterate connections

Main Flow: It will process the records as batch wise. 
Iterate Flow: It will process the records as record wise 


If  there are 1000 records to process, batch size has been fixed to 200 then Main connection will process 200 records at a time as a batch, but iterate will process one by one record only.


Sample Job

We will see one sample job about this, In database  "SALES" table which contains month wise sales details. This data need to move to CSV files but One CSV file should be generated per one month. If table contains 12 months data then 12 files should be generated.

SALES Table Data like below table


Job process will be like below image


Step by Step Process;

Step 1: Define one context variable "filename" to store the name of the file.




Step 2: Read the distinct month from database using tOracleInput component
 Query:
                  "Select distinct month from Sales"





Step 3:  Set the file name with month  using with "tJavaRow" code given below
Code:
             output_row.MONTH = input_row.MONTH;
             context.filename=input_row.MONTH+".csv";





Step 4: Convert main flow to Iterate flow

Step 5:  Read the data from SALES table for each month using "tOracleInput" Component
Query:
          "SELECT S.CUST_ID,S.CUST_NAME,S.TIME_ID,S.AMOUNT_SOLD FROM SALES S       
           WHERE MONTH='"+((String)globalMap.get("row4.MONTH"))+"'"




Step 6: Load data in CSV file using with "tFileOutputDelimited" component using the following in File Path

Code:

                   "E:/Notes/"+context.filename




Wednesday, 28 August 2013

Talend File Rank


I have received some queries about apply Dense Rank for the files. Below post will give clear idea.
Please have a look at the below table.
I have to generate Dense Rank depending upon "Total Sales" column values


CustomerID Name Total_Sales
1010 Pivot Point College 1127
1018 Petes Auto Repaire 1127
1022 Green Bay Travel Services 1127
1009 BBQ Smith's Tex Mex 1137
1017 janice Mann Accounting Service 1137
1021 Bee Line Publishing 1137
1007 Vanity Town 1199
1015 Resnick DDS 1199
1019 Michael Montgomery Hair Saloon 1199
1006 Nitelites Party Rental 1227
1014 Acturial Enterprises Ltd. 1227

The following job will give the rank for the above table.

  1. tFineInputExcel: Reading data from excel with above table
  2. tSortRow : Sort the rows based on "Total Sales" either Ascending / Descending order
  3. tMemorizeRows: Memory the last two rows of "Total  Sales" values
  4. tJavaFlex: I have written some java code here for Rank I explained in below
  5. tMap: To map source and destination
  6. tFileoutputExcel: To load data in destination




Declare one variable in context "cnt"

Java code in tJavaFlex:

Start Code:
                              int count=0;

Main Code:

         if  (!TOTAL_SALES_tMemorizeRows_2[0].equals(TOTAL_SALES_tMemorizeRows_2[1]))
             {
                     count++;
              }
        context.cnt=count;



Finally you will get output like below table.

CustomerID Name Total_Sales Rank
1010 Pivot Point College 1127 1
1018 Petes Auto Repaire 1127 1
1022 Green Bay Travel Services 1127 1
1009 BBQ Smith's Tex Mex 1137 2
1017 janice Mann Accounting Service 1137 2
1021 Bee Line Publishing 1137 2
1007 Vanity Town 1199 3
1015 Resnick DDS 1199 3
1019 Michael Montgomery Hair Saloon 11993
1006 Nitelites Party Rental 12274
1014 Acturial Enterprises Ltd. 12274


Friday, 16 August 2013

Talend Tmap Join

About tMap

In Talend tMap is very important component. It is used in map source(s) and destination(s).

It contains the following features.
  • Extract data from multiple sources using with join (Inner Join / Left outer Join)
  • Loading data into multiple destinations
  • Applying different types of transformations to data
  • Applying filter conditions for source / destination data
  • Type casting
  • Capture Reject records
To extract the data from multiple sources we need to join sources. While join the sources we can change tMap settings in that four type of settings are available
  1. Lookup Model (Load once/ Reload at each row/ Reload at each row (Cache))
  2. Match Model (Unique Match/ First Match/ Last Match)
  3. Join Model (Inner Join/ Left outer join
  4. Store Temp Data (True/ False)

Match Model:

This option is useful whenever you have multiple records in "LookUp" table. The following options are available in Match Model

  1. Unique Match  :    It will match with last record of the look up table.
  2. First Match      :    It will match with first record of the loop up table
  3. All Matches     :    It will match with all records with the loop up table
My Video about Match models

The following example will explain you about "Match Models" under tMap component

Suppose I have data like following

Main File

CustomerID CustomerName AddressID
1
Pivot Point College
1001
2
Petes Auto Repaire
1002
3
Green Bay Travel Services
1003
4
BBQ Smith's Tex Mex
1004
5
janice Mann Accounting Service
1005
6
Bee Line Publishing
1006
7
Vanity Town
1007
8
Resnick DDS
1008
9
Michael Montgomery Hair Saloon
1009
10
Nitelites Party Rental
1010
11
Acturial Enterprises Ltd.

Lookup File

AddressId Address
1001
12 Craven Way
1001
929 Brittany Ave.
1002
45 South Deer Creek Drive
1003
11486 Oakwood
1004
3181 Barkwood Ct.
1005
242 Ashland Ave.
1006
3059 Priscilla Court
1007
965 Marion Place Apt. 65C
1008
1535 Knollwood Drive
1009
1658 Second St.
1010
511 Maple Ave. Apt. 1B
    tMap Settings Image

Unique Match Output

CustomerID CustomerName AddressID Address
1
Pivot Point College
1001
929 Brittany Ave.                
2
Petes Auto Repaire
1002
45 South Deer Creek Drive
3
Green Bay Travel Services
1003
1486 Oakwood
4
BBQ Smith's Tex Mex
1004
3181 Barkwood Ct.
5
janice Mann Accounting Service
1005
242 Ashland Ave.
6
Bee Line Publishing
1006
3059 Priscilla Court
7
Vanity Town
1007
965 Marion Place Apt. 65C
8
Resnick DDS
1008
1535 Knollwood Drive
9
Michael Montgomery Hair Saloon
1009
1658 Second St.
10
Nitelites Party Rental
1010
511 Maple Ave.  Apt. 1B

First Match output

CustomerIDCustomerNameAddressIDAddress
1
Pivot Point College
1001
12 Craven Way                    
2
Petes Auto Repaire
1002
45 South Deer Creek Drive
3
Green Bay Travel Services
1003
1486 Oakwood
4
BBQ Smith's Tex Mex
1004
3181 Barkwood Ct.
5
janice Mann Accounting Service
1005
242 Ashland Ave.
6
Bee Line Publishing
1006
3059 Priscilla Court
7
Vanity Town
1007
965 Marion Place Apt. 65C
8
Resnick DDS
1008
1535 Knollwood Drive
9
Michael Montgomery Hair Saloon
1009
1658 Second St.
10
Nitelites Party Rental
1010
511 Maple Ave.  Apt. 1B

All Matches Output

CustomerIDCustomerNameAddressIDAddress
1
Pivot Point College
1001
12 Craven Way                    
1
Pivot Point College
1001
1929 Brittany Ave.               
2
Petes Auto Repaire
1002
45 South Deer Creek Drive
3
Green Bay Travel Services
1003
1486 Oakwood
4
BBQ Smith's Tex Mex
1004
3181 Barkwood Ct.
5
janice Mann Accounting Service
1005
242 Ashland Ave.
6
Bee Line Publishing
1006
3059 Priscilla Court
7
Vanity Town
1007
965 Marion Place Apt. 65C
8
Resnick DDS
1008
1535 Knollwood Drive
9
Michael Montgomery Hair Saloon
1009
1658 Second St.
10
Nitelites Party Rental
1010
511 Maple Ave.  Apt. 1B

Thursday, 28 February 2013

Talend XML - Grouping Elements

Converting CSV file to XML file with grouping of elements for example i have records in CSV 
file like following.
CSV File Content:

                                     CONTINENT,COUNTRY

                                     Asia,India

                                     Asia,Pakistan

                                     Asia,Bangladesh

                                     Asia,Nepal

                                     Asia,Srilanka

                                     Africa,Ethiopia
                                     Africa,Kenya
                                     Africa,Egypt
                                     Africa,Algeria
                                    Africa,Uganda
                                    Africa,Tanzania
Out XML Should be:


                            <?xml version="1.0" encoding="ISO-8859-15"?>

                                <root>

                                   <CONTINENT CONTINENT="Asia">

                                               <row/root COUNTRY="India"/>

                                               <row/root COUNTRY="Pakistan"/>

                                               <row/root COUNTRY="Bangladesh"/>

                                               <row/root COUNTRY="Nepal"/>
                                               <row/root COUNTRY="Srilanka"/>
                                  </CONTINENT>
                                 <CONTINENT CONTINENT="Africa">
                                               <row/root COUNTRY="Ethiopia"/>
                                               <row/root COUNTRY="Kenya"/>
                                               <row/root COUNTRY="Egypt"/>
                                               <row/root COUNTRY="Algeria"/>
                                               <row/root COUNTRY="Uganda"/>
                                               <row/root COUNTRY="Tanzania"/>
                                 </CONTINENT>
                            </root>


Job Process:

Step 1: Create tFileInpuDelimited to read the data from CSV file
Step 2: Take tFileOutputXML map Input and Output using with tMap
Step 3: In tFileOutputXML Change Advance Setting Add the Group By Image given below








Sunday, 3 February 2013

Talend - Merge Files


Scenario 1:
  • One file has 50 rows and another file has 30 rows
  • If you want to combine both and need to get output as 80 rows
Solution:
  • Read the both files with tFileInputFullRow
  • Combine the files with tUnite component
Please find the screenshot below



Scenario 2:
  • One Template File is there 
  • Daily I am getting files and I expected format of Template file
  • If format is same then I want process those.
Solution
  • Read the both files with tFileInputDelimited
  • Join (Left outer Join) the both files using with tJoin / tMap 
  • Note: TemplateFile should be connect with Lookup  and capture the output from Main Section 
Please find the screenshot below




Friday, 25 January 2013

Talend Course Topics

Below is the process of my Talend online training in which i am going to explain from creation of projects to deployment of projects.
  • talend installation
  • Overview of talend architecture
  • Creating project and import/export project
  • Overview about Repository, Designer, Pallete
  • Explanation about different type of components and its settings
         450 components has been divided into 21 types
  • Creating different types of Jobs     - Master jobs / Child jobs / Quality jobs
  • Mapping Data flows
  • Declaration of local and global Context variables
  • Execution logging and Data Viewers
  • Applying security with hashing technology
  • Export & Import the jobs
  • Different types of deployment
  • Scheduling the jobs with batch file and shell script
  • Routines development
  • Using SQL Templates
  • Introduction to Components development
  • Overview about Big Data Components and MongoDB components

Day wise plan

Day Topic
1
Talend Introduction
2
Talend Architecture
3
File Components ( Management)
4
File Components (Input & Output)
5
tMap Component
6
tMap Component
7
Context & Global variable
8
Project on File Components
9
Logs & Error Components
10
Processing Components
11
Internet Components
12
Data Quality components
13
Miscellaneous components
14
Custom code components
15
Introduction about Database
16
Database components
17
Database components
18
DB SCD components
19
ELT components
20
Project on DB Components
21
Orchestration components
22
Global context variables
23
Preparation of Master Jobs
24
Parsing values one job to another
25
Routine Development
26
Introduction about Components Development
27
Export & Import Jobs
28
Deployment of jobs
29
Talend Administration center
30
Talend Administration center