Apache Pig - CONCAT()
The CONCAT() function of Pig Latin is used to concatenate two or more expressions of the same type.
Syntax
grunt> CONCAT (expression, expression, [...expression])
Example
Assume that we have a file named student_details.txt in the HDFS directory /pig_data/ as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad,89
002,siddarth,Battacharya,22,9848022338,Kolkata,78
003,Rajesh,Khanna,22,9848022339,Delhi,90
004,Preethi,Agarwal,21,9848022330,Pune,93
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar,75
006,Archana,Mishra,23,9848022335,Chennai,87
007,Komal,Nayak,24,9848022334,trivendram,83
008,Bharathi,Nambiayar,24,9848022333,Chennai,72
And we have loaded this file into Pig with the relation name student_details as shown below.
grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray, gpa:int);
Concatenating Two Strings
We can use the CONCAT() function to concatenate two or more expressions. First of all, verify the contents of the student_details relation using the Dump operator as shown below.
grunt> Dump student_details;
(1,Rajiv,Reddy,21,9848022337,Hyderabad,89)
(2,siddarth,Battacharya,22,9848022338,Kolkata,78)
(3,Rajesh,Khanna,22,9848022339,Delhi,90)
(4,Preethi,Agarwal,21,9848022330,Pune,93)
(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar,75)
(6,Archana,Mishra,23,9848022335,Chennai,87)
(7,Komal,Nayak,24,9848022334,trivendram,83)
(8,Bharathi,Nambiayar,24,9848022333,Chennai,72)
Then verify the schema using the Describe operator as shown below.
grunt> Describe student_details;
student_details: {id: int, firstname: chararray, lastname: chararray, age: int, phone: chararray, city: chararray, gpa: int}
In the above schema, you can observe that the name of the student is represented by two chararray values, namely firstname and lastname. Let us concatenate these two values using the CONCAT() function.
grunt> student_name_concat = FOREACH student_details GENERATE CONCAT(firstname, lastname);
Verification
Verify the relation student_name_concat using the DUMP operator as shown below.
grunt> Dump student_name_concat;
Output
It will produce the following output, displaying the contents of the relation student_name_concat.
(RajivReddy)
(siddarthBattacharya)
(RajeshKhanna)
(PreethiAgarwal)
(TrupthiMohanthy)
(ArchanaMishra)
(KomalNayak)
(BharathiNambiayar)
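The tuples above carry a single unnamed field. As a minimal sketch (the relation name student_fullname and the alias fullname are hypothetical and not part of the original example), the concatenated value can be named with AS and kept alongside the id, and the resulting schema can then be checked with Describe.
grunt> -- student_fullname and fullname are illustrative names, not from the original example
grunt> student_fullname = FOREACH student_details GENERATE id, CONCAT(firstname, lastname) AS fullname;
grunt> Describe student_fullname;
Naming the field this way lets later statements refer to fullname instead of a positional reference such as $0.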
CONCAT() has no separate delimiter parameter, but because it accepts more than two expressions, we can pass a delimiter string as one of them, as in the expression below.
CONCAT(firstname, '_', lastname)
Now, let us concatenate the first name and last name of each student record in the student_details relation, placing '_' between them as shown below.
grunt> student_name_concat = FOREACH student_details GENERATE CONCAT(firstname, '_', lastname);
Verification
Verify the relation student_name_concat using the DUMP operator as shown below.
grunt> Dump student_name_concat;
Output
It will produce the following output, displaying the contents of the relation student_name_concat.
(Rajiv_Reddy)
(siddarth_Battacharya)
(Rajesh_Khanna)
(Preethi_Agarwal)
(Trupthi_Mohanthy)
(Archana_Mishra)
(Komal_Nayak)
(Bharathi_Nambiayar)
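Since CONCAT() expects its expressions to be of the same type, a non-string field such as age would need an explicit cast before it can be joined to a chararray. The statement below is a sketch under that assumption; the relation name student_name_age is hypothetical.
grunt> -- cast the int field age to chararray so that all three arguments share one type
grunt> student_name_age = FOREACH student_details GENERATE CONCAT(firstname, '_', (chararray)age);
Pig's documentation also states that CONCAT() returns null if any of its subexpressions is null. The sample file contains no nulls, but for data that might, one possible guard is to substitute an empty string using a conditional expression. Again, student_name_safe is a hypothetical relation name.
grunt> -- replace a null lastname with an empty string before concatenating
grunt> student_name_safe = FOREACH student_details GENERATE CONCAT(firstname, '_', (lastname IS NULL ? '' : lastname));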